Performs a fast and simple (primitive) method of microaggregation, intended for large datasets.

mafast(obj, variables = NULL, by = NULL, aggr = 3, measure = mean)

Arguments

obj

either an sdcMicroObj-class object or a data.frame

variables

variables to microaggregate. If obj is of class sdcMicroObj, the numerical key variables are used by default.

by

grouping variable for microaggregation. If obj is of class sdcMicroObj, the strata variables are used by default.

aggr

aggregation level (default=3)

measure

aggregation statistic: mean, median, trim or onestep (default = mean)
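
To make the roles of aggr, measure and by concrete, here is a minimal base R sketch of one primitive form of microaggregation: values are ranked, cut into consecutive blocks of aggr ranks, and each value is replaced by its block's statistic. The helper micro_sketch is hypothetical and is not part of sdcMicro. It illustrates the general idea under these assumptions and is not a description of what mafast does internally; in particular, how mafast treats a trailing block smaller than aggr is not stated here.

```r
# Hypothetical helper, NOT part of sdcMicro: microaggregate one numeric vector
# by replacing each block of `aggr` consecutive ranks with a summary statistic.
micro_sketch <- function(x, aggr = 3, measure = mean) {
  ord <- order(x)                        # positions of x in increasing order
  grp <- (seq_along(x) - 1) %/% aggr     # block id for each rank: 0,0,0,1,1,1,...
  agg <- ave(x[ord], grp, FUN = measure) # block statistic, repeated per member
  out <- numeric(length(x))
  out[ord] <- agg                        # put results back in original row order
  out                                    # (a short trailing block is kept as-is)
}

x <- c(10, 1, 7, 3, 8, 2)
res_mean   <- micro_sketch(x, aggr = 3)                   # blocks (1,2,3), (7,8,10)
res_median <- micro_sketch(x, aggr = 3, measure = median)

# Analogue of `by`: aggregate separately within each stratum (made-up data).
df <- data.frame(sex    = c(1, 1, 1, 1, 2, 2, 2, 2),
                 income = c(10, 40, 20, 30, 5, 100, 50, 7))
df$income_ma <- ave(df$income, df$sex,
                    FUN = function(v) micro_sketch(v, aggr = 2))
```

With the mean as the statistic, the total of the variable is unchanged (here sum(res_mean) equals sum(x)), while each released value is shared by the records in its block.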

Value

If obj is of class sdcMicroObj, the corresponding slots (such as manipNumVars, risk and utility) are filled. If obj is a data.frame or matrix, an object of the same class is returned.

Author

Alexander Kowarik

Examples

data(Tarragona)
m1 <- mafast(Tarragona, variables=c("GROSS.PROFIT","OPERATING.PROFIT","SALES"),aggr=3)
data(testdata)
m2 <- mafast(testdata,variables=c("expend","income","savings"),aggr=50,by="sex")
summary(m2)
#>      urbrur           roof           walls           water      
#>  Min.   :1.000   Min.   :2.000   Min.   :2.000   Min.   :1.000  
#>  1st Qu.:2.000   1st Qu.:4.000   1st Qu.:2.000   1st Qu.:3.000  
#>  Median :2.000   Median :4.000   Median :3.000   Median :4.000  
#>  Mean   :1.859   Mean   :3.681   Mean   :2.803   Mean   :3.456  
#>  3rd Qu.:2.000   3rd Qu.:4.000   3rd Qu.:3.000   3rd Qu.:4.000  
#>  Max.   :2.000   Max.   :9.000   Max.   :9.000   Max.   :9.000  
#>     electcon         relat           sex             age           hhcivil     
#>  Min.   :1.000   Min.   :1.00   Min.   :1.000   Min.   : 0.00   Min.   :1.000  
#>  1st Qu.:1.000   1st Qu.:2.00   1st Qu.:1.000   1st Qu.: 9.00   1st Qu.:1.000  
#>  Median :1.000   Median :3.00   Median :1.000   Median :19.00   Median :1.000  
#>  Mean   :2.002   Mean   :2.52   Mean   :1.499   Mean   :24.11   Mean   :1.509  
#>  3rd Qu.:4.000   3rd Qu.:3.00   3rd Qu.:2.000   3rd Qu.:36.00   3rd Qu.:2.000  
#>  Max.   :4.000   Max.   :9.00   Max.   :2.000   Max.   :95.00   Max.   :4.000  
#>      expend             income            savings           ori_hid      
#>  Min.   : 1117110   Min.   :  730220   Min.   : 104676   Min.   :   1.0  
#>  1st Qu.:25091280   1st Qu.:24260000   1st Qu.:2370492   1st Qu.: 241.0  
#>  Median :50335953   Median :49976000   Median :4999592   Median : 494.0  
#>  Mean   :50499785   Mean   :50115690   Mean   :4964039   Mean   : 493.4  
#>  3rd Qu.:75280845   3rd Qu.:74544000   3rd Qu.:7486752   3rd Qu.: 742.2  
#>  Max.   :98533212   Max.   :98208333   Max.   :9837175   Max.   :1000.0  
#>  sampling_weight household_weights
#>  Min.   :100     Min.   :  8.333  
#>  1st Qu.:100     1st Qu.: 14.286  
#>  Median :100     Median : 20.000  
#>  Mean   :100     Mean   : 21.834  
#>  3rd Qu.:100     3rd Qu.: 25.000  
#>  Max.   :100     Max.   :100.000  

## for objects of class sdcMicroObj:
data(testdata2)
sdc <- createSdcObj(testdata2,
  keyVars=c('urbrur','roof','walls','water','electcon','relat','sex'),
  numVars=c('expend','income','savings'), w='sampling_weight')
sdc <- dRisk(sdc)
sdc@risk$numeric
#> [1] 1
sdc1 <- mafast(sdc,aggr=4)
sdc1@risk$numeric
#> [1] 0.483871

sdc2 <- mafast(sdc,aggr=10)
sdc2@risk$numeric
#> [1] 0.01075269
### Performance tests
# stack testdata 21 times to build a larger dataset
x <- testdata
for (i in 1:20) {
  x <- rbind(x, testdata)
}
system.time({
  xx <- mafast(
    obj = x,
    variables = c("expend", "income", "savings"),
    aggr = 50,
    by = "sex"
  )
})
#>    user  system elapsed 
#>   0.378   0.000   0.238 
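
The sdcMicroObj examples above show the numeric risk falling as aggr grows. The other side of that trade-off is information loss, which the following base R sketch illustrates on simulated data: larger blocks retain less of the original variance, while the mean is preserved. The helper block_means is hypothetical and not part of sdcMicro, the data are made up, and the retained-variance ratio is only an illustrative measure, not the utility measure stored in an sdcMicroObj.

```r
# Hypothetical helper, NOT part of sdcMicro: replace each block of `aggr`
# consecutive ranks by the block mean, keeping the original row order.
block_means <- function(x, aggr) {
  ord <- order(x)
  grp <- (seq_along(x) - 1) %/% aggr
  out <- numeric(length(x))
  out[ord] <- ave(x[ord], grp, FUN = mean)
  out
}

set.seed(1)
x <- rlnorm(200)                 # simulated skewed variable (illustrative only)
sizes <- c(1, 4, 10, 50)

# Share of the original variance that survives aggregation at each block size.
retained <- sapply(sizes, function(k) var(block_means(x, k)) / var(x))
names(retained) <- paste0("aggr=", sizes)
round(retained, 3)
```

A block size of 1 leaves the data untouched (ratio 1); the ratio cannot exceed 1 for mean aggregation, so it gives a quick, rough check when choosing aggr.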