Package 'SAEval' reference manual

Package 'SAEval'

Title:	Small Area Estimation Evaluation
Description:	Allows users to produce diagnostic procedures and graphic tools for the evaluation of Small Area estimators.
Authors:	Andrea Fasulo [aut, cre]
Maintainer:	Andrea Fasulo <[email protected]>
License:	EUPL
Version:	1.0.0
Built:	2025-03-08 03:46:32 UTC
Source:	https://github.com/cran/SAEval

The R SAEval Package

Description

SAEval is an R package for diagnostic analysis of Small Area Estimation (SAE). It provides a set of tools for evaluating small area estimates against the direct estimates.

Details

When working with SAE, it is good practice to compare different estimators to find the one with the best performance. This package contains functions that compute diagnostic procedures aimed at evaluating the quality of small area estimates. In particular, the package implements some of the methods originally proposed in Brown et al. (2001) to check the quality of SAE.

It is also possible to produce graphical tools that map the chosen indicator for spatial analysis.

For a complete list of functions, use library(help = "SAEval").

Author(s)

Developed by Andrea Fasulo

Bias diagnostic

Description

The bias diagnostic evaluates how close the model-based estimates are to the unbiased direct estimates.

Usage

bias(data,dir,sae,scatterplot=FALSE,main=NULL)

Arguments

`data`	a data frame containing the direct estimates along with the small area estimates, e.g. `SAEval_example`.
`dir`	formula identifying the direct estimates.
`sae`	formula identifying the small area estimates.
`scatterplot`	logical scalar. Should the scatterplot of the estimates be produced (default=FALSE)? See also 'Details'.
`main`	optional string used as the main title of the scatterplot. The default main title is the name of the direct estimator versus the names of the small area estimators.

Details

bias tests whether the model-based estimates are close to the direct estimates. A parametric test on the slope and on the intercept is carried out to check the unbiasedness of the model predictions. A square-root transformation of the estimates is required when the homoscedasticity assumption underpinning the OLS fitting method is not satisfied. The Goldfeld-Quandt homoscedasticity test is provided to check whether the variance is constant.

The use of this diagnostic is straightforward when the focus of interest is on small area totals since unbiased direct estimators of such totals are typically available.

If scatterplot=TRUE, the direct estimates (X-axis) are plotted against the small area estimates (Y-axis) to check whether the regression line between model-based and direct estimates (black line) departs from the Y = X line (red line).

Small areas whose direct estimate is NA are automatically removed from the data.
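The regression check described above can be sketched in base R. This is only an illustration under stated assumptions, not the code used inside bias: the data are synthetic, the variable names (`direct`, `model`, `bias_check`) are hypothetical, and the joint F-test of intercept 0 and slope 1 is one possible form of the parametric test. The Goldfeld-Quandt test and the square-root transformation are not shown.

```r
# Synthetic stand-ins for direct and model-based estimates (not SAEval_example)
set.seed(1)
direct <- runif(50, min = 100, max = 1000)
model  <- direct + rnorm(50, sd = 20)

# Regress the model-based estimates on the direct estimates
fit <- lm(model ~ direct)
b0  <- unname(coef(fit)[1])      # intercept, expected to be near 0
b1  <- unname(coef(fit)[2])      # slope, expected to be near 1
r2  <- summary(fit)$r.squared

# Joint F-test of H0: intercept = 0 and slope = 1 (an assumed form of the test).
# The restricted model is model = direct, so its residuals are model - direct.
rss_restricted <- sum((model - direct)^2)
rss_full       <- sum(resid(fit)^2)
df_resid       <- df.residual(fit)
f_stat  <- ((rss_restricted - rss_full) / 2) / (rss_full / df_resid)
p_value <- pf(f_stat, df1 = 2, df2 = df_resid, lower.tail = FALSE)

bias_check <- data.frame(b0 = b0, b1 = b1, R2 = r2, F = f_stat, p_value = p_value)
bias_check
```

A large p-value here would give no evidence against the hypothesis that the regression line coincides with the Y = X line.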

Value

Object of class list. The list contains up to 2 objects:

`output1`	a data frame containing, for each small area estimator of interest (`methods`), the intercept (`b0`), the slope (`b1`) and the R-squared (`R2`) values along with the F-test (`F`) and the Goldfeld-Quandt test (`GQ_Test`).
`output2`	a data frame containing, for each transformed small area estimator of interest (`methods`), the intercept (`b0`), the slope (`b1`) and the R-squared (`R2`) values along with the F-test (`F`) and the Goldfeld-Quandt test (`GQ_Test`).

Author(s)

Developed by Andrea Fasulo

References

Brown, G., Chambers, R., Heady, P., Heasman, D. (2001), Evaluation of small area estimation methods - An application to unemployment estimates from the UK LFS, in Proceedings of Statistics Canada Symposium 2001: Achieving Data Quality in a Statistical Agency: A Methodological Perspective, Statistics Canada.

Mukhopadhyay, P. K., McDowell, A. (2011). Small area estimation for survey data analysis using SAS software, http://support.sas.com/rnd/app/papers/smallarea.pdf.

Srivastava, A. K., Sud, U. C., Chandra, H. (2007). Small area estimation - An application to National Sample Survey Data, Journal of the Indian Society of Agricultural Statistics, 61(2), 249-254.

Examples

# Load example data
data(SAEval_example)

SAEval.bias<-bias(data=SAEval_example,
dir=~y_d,
sae = ~y_syna+y_eblupa+y_spaznr+y_eblupb+y_synb+y_logis)

SAEval.bias

Calibration diagnostic

Description

The calibration diagnostic refers to the calibration property of model estimates, according to which they should not differ from the direct estimates when aggregated at appropriately large domain levels. The diagnostic provides a measure of the calibration property of the model estimates and also gives evidence of the presence or absence of spatial bias or autocorrelation.

Usage

calibration(data,dir,sae,area)

Arguments

`data`	a data frame containing the direct and small area estimates along with their variances, e.g. `SAEval_example`.
`dir`	formula identifying the direct estimates.
`sae`	formula identifying the small area estimates.
`area`	formula identifying the area for which the `calibration` diagnostic is computed.

Details

calibration computes the relative difference between the aggregated model-based estimates and the aggregated direct estimates.

Small areas for which both the direct estimate and the variance of the direct estimate are NA are automatically removed from the data.
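As a minimal sketch of the relative difference described above, the following base R code aggregates toy estimates to a larger domain and compares the totals. The data frame `toy`, its column names and the choice to express the difference as a proportion of the aggregated direct estimate are assumptions made for illustration; the exact formula used by calibration may differ.

```r
# Toy small area estimates nested in two larger domains (hypothetical data)
toy <- data.frame(
  region = c("A", "A", "B", "B", "B"),
  y_dir  = c(100, 200, 150, 250, 100),
  y_sae  = c(110, 205, 140, 240, 110)
)

# Sum direct and model-based estimates within each larger domain
agg <- aggregate(cbind(y_dir, y_sae) ~ region, data = toy, FUN = sum)

# Relative difference between aggregated model-based and direct estimates
agg$rel_diff <- (agg$y_sae - agg$y_dir) / agg$y_dir
agg
```

Values of `rel_diff` close to zero indicate that the model-based estimates are well calibrated to the direct estimates at that level of aggregation.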

Value

Object of class list. The list contains one object for each larger domain specified in area. Each object contains the calibration diagnostic for all the categories of that domain.

Author(s)

Developed by Andrea Fasulo

References

Mukhopadhyay, P. K., McDowell, A. (2011). Small area estimation for survey data analysis using SAS software, http://support.sas.com/rnd/app/papers/smallarea.pdf.

Srivastava, A. K., Sud, U. C., Chandra, H. (2007). Small area estimation - An application to National Sample Survey Data, Journal of the Indian Society of Agricultural Statistics, 61(2), 249-254.

Examples

# Load example data
data(SAEval_example)

SAEval.calibration<-calibration(data=SAEval_example,
       dir=~y_d,
       sae=~y_syna+y_eblupa+y_spaznr+y_eblupb+y_synb+y_logis,area=~nuts0+nuts1+nuts2)

SAEval.calibration

Confidence interval analysis

Description

cinterval analyzes the direct estimates with respect to the SAE confidence intervals.

Usage

cinterval(data,dir,sae,v.dir,mse.sae,level=0.95,plot=F)

Arguments

`data`	a data frame containing the direct and small area estimates along with their variances, e.g. `SAEval_example`.
`dir`	formula identifying the direct estimates.
`sae`	formula identifying the small area estimates.
`v.dir`	formula identifying the variance of the direct estimates.
`mse.sae`	formula identifying the mean squared error of the small area estimates.
`level`	double. The confidence level, i.e. the proportion of correspondingly constructed confidence intervals that contain the true value of the parameter (default=0.95).
`plot`	logical scalar. Should the plot be produced (default=FALSE)? See also 'Details'.

Details

This diagnostic measures (i) for each small area estimator, the number of direct estimates that fall between the lower and upper bounds of the SAE confidence interval and (ii) the number of overlapping confidence intervals.

If plot=TRUE, the direct estimates are plotted together with the SAE confidence intervals to analyze their distributions.

Small areas for which both the direct estimate and the variance of the direct estimate are NA are automatically removed from the data.
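The two counts described above can be illustrated with a short base R sketch. It assumes normal-theory intervals of the form estimate plus or minus z times the standard error, and it assumes that "overlapping" refers to the direct and SAE intervals of the same area; the toy data and all object names are hypothetical and do not come from the package.

```r
# Toy estimates with their variance / MSE (hypothetical data)
toy <- data.frame(
  y_dir   = c(10, 20, 30, 40),
  v_dir   = c(4, 4, 9, 1),
  y_sae   = c(11, 23, 29, 50),
  mse_sae = c(1, 1, 4, 1)
)

level <- 0.95
z <- qnorm(1 - (1 - level) / 2)

# Normal-theory confidence bounds for both sets of estimates
sae_low <- toy$y_sae - z * sqrt(toy$mse_sae)
sae_up  <- toy$y_sae + z * sqrt(toy$mse_sae)
dir_low <- toy$y_dir - z * sqrt(toy$v_dir)
dir_up  <- toy$y_dir + z * sqrt(toy$v_dir)

# (i) direct estimates falling inside the SAE confidence interval
included <- sum(toy$y_dir >= sae_low & toy$y_dir <= sae_up)

# (ii) areas whose direct and SAE confidence intervals overlap
overlap <- sum(sae_low <= dir_up & dir_low <= sae_up)

data.frame(included = included, overlap = overlap)
```

In this toy example the second area has a direct estimate outside the SAE interval while the two intervals still overlap, which is why the two counts differ.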

Value

Object of class data.frame. The data frame contains information for the small area estimators (methods) about the number of direct estimates included in the SAE confidence interval (included) and the number of overlapping confidence intervals (overlap).

Author(s)

Developed by Andrea Fasulo

References

Mukhopadhyay, P. K., McDowell, A. (2011). Small area estimation for survey data analysis using SAS software, http://support.sas.com/rnd/app/papers/smallarea.pdf.

Srivastava, A. K., Sud, U. C., Chandra, H. (2007). Small area estimation - An application to National Sample Survey Data, Journal of the Indian Society of Agricultural Statistics, 61(2), 249-254.

Examples

# Load example data
data(SAEval_example)

SAEval.cinterval<-cinterval(data=SAEval_example,
       dir=~y_d,
       sae=~y_syna+y_eblupa+y_spaznr+y_eblupb+y_synb+y_logis,
       v.dir=~mse_d,
       mse.sae=~mse_sa+mse_eba2+mse_spaznr+mse_ebb+mse_sb+mse_log)

SAEval.cinterval

Coverage diagnostic

Description

The coverage diagnostic tests the validity of the 95% adjusted confidence intervals of the model-based estimates by comparing them with the corresponding adjusted confidence intervals for the direct estimates.

Usage

coverage(data,dir,sae,v.dir,mse.sae,alfa=0.05)

Arguments

`data`	a data frame containing the direct and small area estimates along with their variances, e.g. `SAEval_example`.
`dir`	formula identifying the direct estimates.
`sae`	formula identifying the small area estimates.
`v.dir`	formula identifying the variance of the direct estimates.
`mse.sae`	formula identifying the mean squared error of the small area estimates.
`alfa`	double. The significance level of the non-parametric Binomial test (default=0.05).

Details

This diagnostic measures the overlap between the confidence intervals. The number of overlapping intervals is expected not to differ significantly from 95% of the number of small areas.

Small areas for which both the direct estimate and the variance of the direct estimate are NA are automatically removed from the data.
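The following base R sketch shows one way such a coverage check can be set up. It assumes the adjusted interval multiplier proposed in Brown et al. (2001), under which two adjusted intervals fail to overlap with nominal probability 5% when both estimators share the same mean, followed by an exact Binomial test from `binom.test`. The synthetic data, the object names and the exact form of the adjustment are assumptions for illustration and may not match what coverage does internally.

```r
# Synthetic direct and model-based estimates sharing the same true values
set.seed(42)
n      <- 200
truth  <- runif(n, min = 100, max = 1000)
se_dir <- runif(n, min = 20, max = 40)   # standard errors of direct estimates
se_sae <- runif(n, min = 5, max = 15)    # root MSE of model-based estimates
y_dir  <- truth + rnorm(n, sd = se_dir)
y_sae  <- truth + rnorm(n, sd = se_sae)

# Adjusted multiplier (assumed, after Brown et al. 2001): chosen so that the
# two adjusted intervals overlap with probability 95% under equal means
z     <- qnorm(0.975)
ratio <- se_dir / se_sae
z_adj <- z * sqrt(1 + ratio^2) / (1 + ratio)

# Two intervals est +/- z_adj * se fail to overlap when the estimates are
# further apart than the sum of the two half-widths
no_overlap   <- abs(y_dir - y_sae) > z_adj * (se_dir + se_sae)
non_coverage <- sum(no_overlap)

# Binomial test of H0: the non-overlap rate equals the nominal 5%
alfa <- 0.05
bt   <- binom.test(non_coverage, n, p = 0.05)
coverage_check <- data.frame(
  non_coverage = non_coverage,
  domains      = n,
  non_overlap  = non_coverage / n,
  p_value      = bt$p.value,
  reject_H0    = bt$p.value < alfa
)
coverage_check
```

A p-value below `alfa` would suggest that the intervals overlap less (or more) often than their nominal level implies, for example because the MSE of the model-based estimates is underestimated.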

Value

Object of class data.frame. The data frame contains information for the small area estimators (methods), non-coverage total (non_coverage), number of small area domains (domains), non-overlap ratio (non_overlap), p-value for Binomial statistic (p_value) and the test result (results).

Author(s)

Developed by Andrea Fasulo

References

Mukhopadhyay, P. K., McDowell, A. (2011). Small area estimation for survey data analysis using SAS software, http://support.sas.com/rnd/app/papers/smallarea.pdf.

Srivastava, A. K., Sud, U. C., Chandra, H. (2007). Small area estimation - An application to National Sample Survey Data, Journal of the Indian Society of Agricultural Statistics, 61(2), 249-254.

Examples

# Load example data
data(SAEval_example)

SAEval.coverage<-coverage(data=SAEval_example,
       dir=~y_d,
       sae=~y_syna+y_eblupa+y_spaznr+y_eblupb+y_synb+y_logis,
       v.dir=~mse_d,
       mse.sae=~mse_sa+mse_eba2+mse_spaznr+mse_ebb+mse_sb+mse_log)

SAEval.coverage

Coefficient of variation table

Description

cv_table is used to analyse the distribution of the coefficient of variation of the chosen indicators.

Usage

cv_table(data,cv,boxplot=FALSE)

Arguments

`data`	a data frame containing the coefficients of variation for the direct and small area estimators.
`cv`	formula identifying the coefficients of variation.
`boxplot`	logical scalar. Should the boxplot of the coefficients of variation be produced (default=FALSE)?

Details

cv_table evaluates the CVs of the different estimators against some well-known thresholds given by Statistics Canada (2009). For CVs below 0.165 there are no restrictions on dissemination; for CVs in the range 0.166-0.333 publication with a warning is suggested; for CVs above 0.333 dissemination is not recommended.
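The classification into the three dissemination classes can be sketched with base R's `cut`. The class labels, the toy CV values and the treatment of values falling exactly on a threshold are assumptions made for this illustration; they are not taken from the package.

```r
# Toy coefficients of variation (hypothetical values)
cv <- c(0.05, 0.12, 0.10, 0.20, 0.33, 0.40)

# Classify each CV using the 0.165 and 0.333 thresholds
classes <- cut(
  cv,
  breaks = c(0, 0.165, 0.333, Inf),
  labels = c("no restriction", "publish with warning", "not recommended"),
  include.lowest = TRUE
)

# Number of CVs falling within each class
cv_counts <- table(classes)
cv_counts
```

Applying the same classification to the CVs of several estimators side by side makes it easy to see which estimator moves the most areas into the publishable classes.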

Value

Object of class data.frame. The data frame contains information about the number of CVs that fall within each class.

Author(s)

Developed by Andrea Fasulo

References

Statistics Canada (2009). Statistics Canada Quality Guidelines, Fifth Edition, October 2009.

Examples

# Load example data
data(SAEval_example)

# cv for the direct estimates
SAEval_example$cvd<-sqrt(SAEval_example$mse_d)/SAEval_example$y_d
#cv for the synthetic estimates
SAEval_example$cvsae<-sqrt(SAEval_example$mse_sa)/SAEval_example$y_syna

cv_data<-SAEval_example[,c("cvd","cvsae")]

SAEval_cvtable<-cv_table(data=cv_data,
cv=~cvd+cvsae)

SAEval_cvtable

Goodness of fit diagnostic

Description

The goodness of fit diagnostic evaluates how close the model-based estimates are to the direct estimates when the direct estimates are reliable.

Usage

gof(data,dir,sae,v.dir,mse.sae,alfa=0.05)

Arguments

`data`	a data frame containing the direct and small area estimates along with their variances, e.g. `SAEval_example`.
`dir`	formula identifying the direct estimates.
`sae`	formula identifying the small area estimates.
`v.dir`	formula identifying the variance of the direct estimates.
`mse.sae`	formula identifying the mean squared error of the small area estimates.
`alfa`	double. The significance level of the Chi-squared test (default=0.05).

Details

As with the bias diagnostic, this procedure checks whether the model estimates are close to the direct estimates. To evaluate this, the squared differences between the model estimates and the direct estimates are weighted inversely by their variance and summed over all the domains. As a check for the lack of bias of the model estimates, this statistic is compared with the quantiles of the Chi-squared distribution. The results are reported as a Wald goodness of fit statistic.

Small areas for which both the direct estimate and the variance of the direct estimate are NA are automatically removed from the data.
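A minimal base R sketch of such a Wald-type statistic follows. It assumes that each squared difference is divided by the sum of the direct variance and the model MSE, and that the reference distribution is Chi-squared with as many degrees of freedom as there are domains, as in Brown et al. (2001). The toy data and object names are hypothetical, and the interpretation of `c2` as the Chi-squared critical value is also an assumption.

```r
# Toy estimates with their variance / MSE (hypothetical data)
toy <- data.frame(
  y_dir   = c(10, 20, 30, 40),
  v_dir   = c(4, 4, 9, 1),
  y_sae   = c(11, 23, 29, 41),
  mse_sae = c(1, 1, 4, 1)
)

alfa <- 0.05
D    <- nrow(toy)   # number of domains

# Squared differences weighted inversely by the (assumed) total variance
W <- sum((toy$y_dir - toy$y_sae)^2 / (toy$v_dir + toy$mse_sae))

# Chi-squared critical value and p-value with D degrees of freedom
c2      <- qchisq(1 - alfa, df = D)
p_value <- pchisq(W, df = D, lower.tail = FALSE)

data.frame(W = W, c2 = c2, p_value = p_value, reject_H0 = W > c2)
```

Here W stays below the critical value, so the toy model-based estimates show no significant departure from the direct estimates.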

Value

Object of class data.frame. The data frame contains information for the small area estimators (methods), Wald statistic (W), Chi-squared statistic (c2), p-value for Wald statistic (p_value) and the test result (results).

Author(s)

Developed by Andrea Fasulo

References

Mukhopadhyay, P. K., McDowell, A. (2011). Small area estimation for survey data analysis using SAS software, http://support.sas.com/rnd/app/papers/smallarea.pdf.

Srivastava, A. K., Sud, U. C., Chandra, H. (2007). Small area estimation - An application to National Sample Survey Data, Journal of the Indian Society of Agricultural Statistics, 61(2), 249-254.

Examples

# Load example data
data(SAEval_example)

SAEval.gof<-gof(data=SAEval_example,
       dir=~y_d,
       sae=~y_syna+y_eblupa+y_spaznr+y_eblupb+y_synb+y_logis,
       v.dir=~mse_d,
       mse.sae=~mse_sa+mse_eba2+mse_spaznr+mse_ebb+mse_sb+mse_log)

SAEval.gof

Map the disaggregated estimates and the coefficients of variation

Description

map_sae produces geographical maps of the small area estimates or the direct estimates along with their CVs.

Usage

map_sae(shapefile,
data,
area,
indicators,
color=c("green","red"),
breaks=FALSE,
main=FALSE,
output_data=FALSE)

Arguments

`shapefile`	object of class `sf` and `data.frame`, as defined by the `sf` package, containing the shapefile information, e.g. `sa_shp`. See also 'Details'.
`data`	data frame containing, for the area of interest, the information to be visualized, e.g. `SAEval_example`.
`area`	formula identifying the area of interest.
`indicators`	formula identifying the variables to be visualized.
`color`	a `vector` of 2 colors defining the lowest and highest values in the plot.
`breaks`	list containing the endpoints for each indicator of interest (default=FALSE).
`main`	logical scalar. Should the maps include a main title (default=FALSE)? See also 'Details'.
`output_data`	logical scalar. Should the function return a data frame including the map data along with the indicators of interest (default=FALSE)? See also 'Details'.

Details

The shapefile object can be created with the sf package using the function st_read. If main is TRUE, the name of the indicator is used as the main title of the map. When output_data is TRUE, a map data object is returned so that it can be easily handled with ggplot for further graphical customization.

Value

Returns maps, and, if selected, a data.frame containing the mapdata enriched with the indicators of interest.

Author(s)

Developed by Andrea Fasulo

References

Pebesma, E., et al. (2021). sf: Simple Features for R, CRAN repository, https://CRAN.R-project.org/package=sf.

Examples


# Load example data and shape file
data(SAEval_example);data(sa_shp)

SAEval_example$cv_d<-sqrt(SAEval_example$mse_d)/SAEval_example$y_d

SAEval_example$cv_sa<-sqrt(SAEval_example$mse_sa)/SAEval_example$y_syna

# Without using breaks
map_sae(shapefile=sa_shp,data=SAEval_example,area=~sa,indicators=~y_d+cv_d+y_syna+cv_sa,main=TRUE)

# Using breaks
map_sae(shapefile=sa_shp,data=SAEval_example,area=~sa,indicators=~y_d+cv_d+y_syna+cv_sa,
        breaks=list(seq(0,40000,5000),seq(0,1.5,0.15),seq(0,40000,5000),seq(0,1.5,0.15)),main=TRUE)

Example dataset to map Small Area Estimates

Description

sa_shp contains an sf object used to map the small area estimates.

Usage

data(sa_shp)

Format

sa_shp is an sf object with the shapefile for the sa domain.

Examples

# Load example data
data(sa_shp)

summary(sa_shp)

Example dataset for the evaluation of Small Area Estimates

Description

SAEval_example contains a data.frame with direct and indirect estimates for unplanned domains along with their variances.

Usage

data(SAEval_example)

Format

SAEval_example is a data frame with 107 domains and 18 variables:

sa: domain of interest codes
nuts1: NUTS1 codes
nuts2: NUTS2 codes
nuts0: NUTS0 codes
y_d: direct estimates
mse_d: variance of direct estimates
y_syna: unit level synthetic estimates
mse_sa: MSE of unit level synthetic estimates
y_eblupa: unit level EBLUP estimates
mse_eba2: MSE of unit level EBLUP estimates
y_spaznr: unit level EBLUP estimates with spatial correlation of random effects
mse_spaznr: MSE of unit level EBLUP estimates with spatial correlation of random effects
y_eblupb: area level EBLUP estimates
mse_ebb: MSE of area level EBLUP estimates
y_synb: area level synthetic estimates
mse_sb: MSE of area level synthetic estimates
y_logis: unit level EBLUP type logit estimates
mse_log: MSE of unit level EBLUP type logit estimates

Examples

# Load example data
data(SAEval_example)
summary(SAEval_example)
# since the domains are unplanned, there are 7 areas without direct estimates
dim(SAEval_example[SAEval_example$y_d==0,])
