Global recoding of variables
globalRecode(obj, ...)
obj: a numeric vector, a data.frame, or an object of class sdcMicroObj-class
...: see possible arguments below
column: which keyVar should be changed. Character vector of length 1 specifying the name of the variable that should be recoded (required if obj is a data.frame or an object of class sdcMicroObj-class).
breaks: either a numeric vector of cut points or a single number giving the number of intervals into which the variable is to be cut.
labels: labels for the levels of the resulting category. By default, labels are constructed using "(a,b]" interval notation. If labels = FALSE, simple integer codes are returned instead of a factor.
method: the algorithm used to compute the break points when breaks is a single number. The following values are supported:
"equidistant" for equal-sized intervals
"logEqui" for equal-sized intervals of log-transformed data
"equalAmount" for intervals with approximately the same number of observations
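To make the three break-point strategies concrete, here is a minimal base R sketch. The helper `sketch_breaks` is hypothetical and is not part of sdcMicro; it only illustrates one plausible reading of each option. The package's own computation evidently differs in detail (for example, the logEqui output in the examples on this page extends beyond the range of the data), so the break points produced here should not be expected to match those of globalRecode.

```r
# Hypothetical helper (NOT sdcMicro's implementation): k + 1 break points
# for k intervals under three different strategies.
sketch_breaks <- function(x, k, method = c("equidistant", "logEqui", "equalAmount")) {
  method <- match.arg(method)
  x <- x[!is.na(x)]
  switch(method,
    # equal width on the original scale
    equidistant = seq(min(x), max(x), length.out = k + 1),
    # equal width on the log scale (assumes all values are positive)
    logEqui     = exp(seq(log(min(x)), log(max(x)), length.out = k + 1)),
    # empirical quantiles, so each interval holds roughly the same count
    equalAmount = as.numeric(quantile(x, probs = seq(0, 1, length.out = k + 1)))
  )
}

set.seed(1)
age <- sample(15:75, 500, replace = TRUE)  # simulated data, for illustration only

b_equi <- sketch_breaks(age, 4, "equidistant")
b_log  <- sketch_breaks(age, 4, "logEqui")
b_q    <- sketch_breaks(age, 4, "equalAmount")

# unique() guards against duplicated quantiles in heavily tied data
table(cut(age, breaks = unique(b_q), include.lowest = TRUE))
```

Quantile-based breaks trade interpretable interval widths for balanced cell counts, which is often the more relevant property when small cells are the concern.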
the modified sdcMicroObj-class object or a factor, unless labels = FALSE, which results in plain integer level codes.
If a labels parameter is specified, its values are used to name the factor levels. If none is specified, the factor level labels are constructed automatically using "(a,b]" interval notation.
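The three labelling behaviours can be illustrated with base R's `cut()`, whose `labels` argument behaves as described here. That globalRecode treats labels in exactly the same way is an assumption of this sketch; the data and the custom label strings are made up for illustration.

```r
x <- c(5, 12, 25, 47, 68)          # illustrative values
breaks <- c(0, 9, 19, 39, 100)

# default: labels built in "(a,b]" interval notation
auto <- cut(x, breaks = breaks)

# user-supplied labels, one per interval
named <- cut(x, breaks = breaks, labels = c("0-9", "10-19", "20-39", "40-100"))

# labels = FALSE: plain integer codes instead of a factor
codes <- cut(x, breaks = breaks, labels = FALSE)

levels(auto)
as.character(named)
codes
```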
As of sdcMicro >= 4.7.0, globalRecode cannot be applied to vectors stored as factors.
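If a numeric variable happens to be stored as a factor, one common base R approach is to convert it back to numeric before recoding. The sketch below assumes the factor levels are numeric strings; the data are made up, and the commented-out call only indicates where globalRecode would be used.

```r
f <- factor(c("23", "41", "67", "41"))   # numeric values stored as a factor

# as.numeric(f) alone returns the internal level codes, not the values
level_codes <- as.numeric(f)

# converting via character recovers the original numbers
x <- as.numeric(as.character(f))

# x is now a plain numeric vector and could be recoded, e.g.
# globalRecode(x, breaks = c(0, 30, 60, 100))
x
```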
Templ, M., Kowarik, A. and Meindl, B. Statistical Disclosure Control for Micro-Data Using the R Package sdcMicro. Journal of Statistical Software, 67 (4), 1–36, 2015. doi:10.18637/jss.v067.i04
Templ, M. Statistical Disclosure Control for Microdata: Methods and Applications in R. Springer International Publishing, 287 pages, 2017. ISBN 978-3-319-50272-4. doi:10.1007/978-3-319-50272-4
data(free1)
free1 <- as.data.frame(free1)
## application to a vector
head(globalRecode(free1$AGE, breaks=c(1,9,19,29,39,49,59,69,100), labels=1:8))
#> [1] 5 3 5 3 3 3
#> Levels: 1 2 3 4 5 6 7 8
table(globalRecode(free1$AGE, breaks=c(1,9,19,29,39,49,59,69,100), labels=1:8))
#>
#> 1 2 3 4 5 6 7 8
#> 0 312 815 968 717 455 511 222
## application to a data.frame
# automatic labels
table(globalRecode(free1, column="AGE", breaks=c(1,9,19,29,39,49,59,69,100))$AGE)
#>
#> (1,9] (9,19] (19,29] (29,39] (39,49] (49,59] (59,69] (69,100]
#> 0 312 815 968 717 455 511 222
## calculation of break-points using different algorithms
table(globalRecode(free1$AGE, breaks=6))
#>
#> [14,24] (24,34] (34,44] (44,55] (55,65] (65,75]
#> 689 910 925 551 496 429
table(globalRecode(free1$AGE, breaks=6, method="logEqui"))
#>
#> [6,10] (10,18] (18,33] (33,61] (61,110] (110,201]
#> 0 251 1248 1870 631 0
table(globalRecode(free1$AGE, breaks=6, method="equalAmount"))
#>
#> [15,24] (24,32] (32,38] (38,47] (47,61] (61,74]
#> 689 725 590 697 668 631
## for objects of class sdcMicroObj:
data(testdata2)
sdc <- createSdcObj(testdata2,
  keyVars=c('urbrur','roof','walls','water','electcon','relat','sex'),
  numVars=c('expend','income','savings'), w='sampling_weight')
sdc <- globalRecode(sdc, column="water", breaks=3)
table(get.sdcMicroObj(sdc, type="manipKeyVars")$water)
#>
#> [0,3] (3,7] (7,10]
#> 47 40 6