Performs a fast and simple (primitive) method of microaggregation, intended for large datasets.
mafast(obj, variables = NULL, by = NULL, aggr = 3, measure = mean)
obj: either an sdcMicroObj-class object or a data.frame
variables: variables to microaggregate. If obj is of class sdcMicroObj, the numerical key variables are chosen by default.
by: grouping variable for microaggregation. If obj is of class sdcMicroObj, the strata variables are chosen by default.
aggr: aggregation level (default = 3)
measure: aggregation statistic, one of mean, median, trim or onestep (default = mean)
If ‘obj’ is of class sdcMicroObj, the corresponding slots are filled, such as manipNumVars, risk and utility.
If ‘obj’ is of class “data.frame” or “matrix”, an object of the same class is returned.
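As a rough illustration of the kind of procedure the description and arguments refer to, the following base R sketch microaggregates a data.frame within strata. It is not the sdcMicro implementation: the helper name `simple_microagg`, the choice to sort on the first variable, and the rule that a short tail joins the last full group are all assumptions made for this example.

```r
# Hypothetical helper, not part of sdcMicro: sort each stratum on the first
# variable, cut it into consecutive groups of `aggr` records, and replace
# every value by its group statistic.
simple_microagg <- function(df, variables, by = NULL, aggr = 3, measure = mean) {
  # Treat the whole data set as a single stratum when `by` is not given.
  strata <- if (is.null(by)) rep(1L, nrow(df)) else interaction(df[by], drop = TRUE)
  for (rows in split(seq_len(nrow(df)), strata)) {
    # Sorting on the first variable is an assumption of this sketch.
    ord <- rows[order(df[[variables[1]]][rows])]
    n <- length(ord)
    # Consecutive groups of size `aggr`; leftover records join the last group.
    n_groups <- max(1L, n %/% aggr)
    grp <- pmin((seq_len(n) - 1L) %/% aggr + 1L, n_groups)
    for (v in variables) {
      df[[v]][ord] <- ave(df[[v]][ord], grp, FUN = measure)
    }
  }
  df
}

# Toy data (made up for illustration): two strata of sizes 4 and 3.
toy <- data.frame(
  g = c("a", "a", "a", "a", "b", "b", "b"),
  x = c(5, 1, 3, 10, 2, 8, 4),
  y = c(50, 10, 30, 100, 20, 80, 40)
)
res <- simple_microagg(toy, variables = c("x", "y"), by = "g", aggr = 2)
print(res)
```

Passing `measure = median`, or a trimmed mean such as `function(z) mean(z, trim = 0.1)`, swaps the group statistic in this sketch. With `measure = mean`, the per-stratum means of each variable are unchanged by the aggregation.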
data(Tarragona)
m1 <- mafast(Tarragona, variables=c("GROSS.PROFIT","OPERATING.PROFIT","SALES"),aggr=3)
data(testdata)
m2 <- mafast(testdata,variables=c("expend","income","savings"),aggr=50,by="sex")
summary(m2)
#> urbrur roof walls water
#> Min. :1.000 Min. :2.000 Min. :2.000 Min. :1.000
#> 1st Qu.:2.000 1st Qu.:4.000 1st Qu.:2.000 1st Qu.:3.000
#> Median :2.000 Median :4.000 Median :3.000 Median :4.000
#> Mean :1.859 Mean :3.681 Mean :2.803 Mean :3.456
#> 3rd Qu.:2.000 3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:4.000
#> Max. :2.000 Max. :9.000 Max. :9.000 Max. :9.000
#> electcon relat sex age hhcivil
#> Min. :1.000 Min. :1.00 Min. :1.000 Min. : 0.00 Min. :1.000
#> 1st Qu.:1.000 1st Qu.:2.00 1st Qu.:1.000 1st Qu.: 9.00 1st Qu.:1.000
#> Median :1.000 Median :3.00 Median :1.000 Median :19.00 Median :1.000
#> Mean :2.002 Mean :2.52 Mean :1.499 Mean :24.11 Mean :1.509
#> 3rd Qu.:4.000 3rd Qu.:3.00 3rd Qu.:2.000 3rd Qu.:36.00 3rd Qu.:2.000
#> Max. :4.000 Max. :9.00 Max. :2.000 Max. :95.00 Max. :4.000
#> expend income savings ori_hid
#> Min. : 1117110 Min. : 730220 Min. : 104676 Min. : 1.0
#> 1st Qu.:25091280 1st Qu.:24260000 1st Qu.:2370492 1st Qu.: 241.0
#> Median :50335953 Median :49976000 Median :4999592 Median : 494.0
#> Mean :50499785 Mean :50115690 Mean :4964039 Mean : 493.4
#> 3rd Qu.:75280845 3rd Qu.:74544000 3rd Qu.:7486752 3rd Qu.: 742.2
#> Max. :98533212 Max. :98208333 Max. :9837175 Max. :1000.0
#> sampling_weight household_weights
#> Min. :100 Min. : 8.333
#> 1st Qu.:100 1st Qu.: 14.286
#> Median :100 Median : 20.000
#> Mean :100 Mean : 21.834
#> 3rd Qu.:100 3rd Qu.: 25.000
#> Max. :100 Max. :100.000
## for objects of class sdcMicroObj:
data(testdata2)
sdc <- createSdcObj(testdata2,
  keyVars = c('urbrur','roof','walls','water','electcon','relat','sex'),
  numVars = c('expend','income','savings'), w = 'sampling_weight')
sdc <- dRisk(sdc)
sdc@risk$numeric
#> [1] 1
sdc1 <- mafast(sdc,aggr=4)
sdc1@risk$numeric
#> [1] 0.483871
sdc2 <- mafast(sdc,aggr=10)
sdc2@risk$numeric
#> [1] 0.01075269
# \donttest{
### Performance tests
x <- testdata
for (i in 1:20) {
  x <- rbind(x, testdata)
}
system.time({
  xx <- mafast(
    obj = x,
    variables = c("expend", "income", "savings"),
    aggr = 50,
    by = "sex"
  )
})
#> user system elapsed
#> 0.390 0.000 0.249
# }