Global recoding of variables

globalRecode(obj, ...)

Arguments

obj

a numeric vector, a data.frame or an object of class sdcMicroObj-class

...

see possible arguments below

column:

Character vector of length 1 specifying the name of the variable (a key variable in the case of an sdcMicroObj) that should be recoded. Required if obj is a data.frame or an object of class sdcMicroObj-class.

breaks:

either a numeric vector of cut points or a single number giving the number of intervals into which the variable is to be cut.

labels:

labels for the levels of the resulting category. By default, labels are constructed using "(a,b]" interval notation. If labels = FALSE, simple integer codes are returned instead of a factor.

method:

The algorithm used to calculate the break points. The following values are supported:

  • “equidistant”: for equal-sized intervals

  • “logEqui”: for equal-sized intervals for log-transformed data

  • “equalAmount”: for intervals with approximately the same number of observations
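
The three strategies can be illustrated with base R alone. The sketch below is not the code used inside sdcMicro; the vector x, the interval count k and the names b_equi, b_log and b_quant are made up for illustration, and the package may round or place its break points differently.

```r
# Illustration only: base-R approximations of the three strategies,
# not the implementation used by globalRecode().
x <- c(15, 18, 22, 25, 31, 34, 40, 47, 52, 60, 68, 74)  # made-up ages
k <- 3                                                    # number of intervals

# equal-width intervals on the original scale
b_equi <- seq(min(x), max(x), length.out = k + 1)

# equal-width intervals on the log scale, mapped back with exp()
b_log <- exp(seq(log(min(x)), log(max(x)), length.out = k + 1))
b_log[c(1, k + 1)] <- range(x)  # pin the end points against rounding error

# intervals holding roughly the same number of observations
b_quant <- quantile(x, probs = seq(0, 1, length.out = k + 1), names = FALSE)

# compare how the observations are distributed over the intervals
table(cut(x, breaks = b_equi,  include.lowest = TRUE))
table(cut(x, breaks = b_log,   include.lowest = TRUE))
table(cut(x, breaks = b_quant, include.lowest = TRUE))
```

Equal-width breaks are the easiest to interpret, log-scale breaks give finer intervals for small values, and quantile-based breaks keep the group sizes balanced, which tends to matter most when small cells are a disclosure concern.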

Value

the modified data.frame or sdcMicroObj-class object or, for vector input, a factor. With labels = FALSE, the plain integer level codes are returned instead of a factor.

Details

If a labels parameter is specified, its values are used to name the factor levels. If none is specified, the factor level labels are constructed.
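
The effect of the labels argument can be sketched with cut() from base R, whose labels argument is documented in the same terms. That globalRecode handles labels in exactly the same way is an assumption here; the values in x and b and the labels "low", "mid" and "high" are made up.

```r
# Illustration with base R's cut(); x, b and the label names are made up.
x <- c(3, 12, 25, 40)
b <- c(0, 10, 20, 50)

auto  <- cut(x, breaks = b)                                    # constructed "(a,b]" labels
named <- cut(x, breaks = b, labels = c("low", "mid", "high"))  # user-supplied labels
codes <- cut(x, breaks = b, labels = FALSE)                    # integer codes, no factor

levels(auto)
as.character(named)
codes
```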

Note

As of sdcMicro 4.7.0, globalRecode cannot be applied to vectors stored as factors.
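
If a variable with numeric content happens to be stored as a factor, one common base-R workaround is to convert it back to numeric before recoding. This is a general R idiom rather than something prescribed by the package; the factor f below is made up, and cut() stands in for the recoding step.

```r
# Illustration only: f is a made-up factor whose levels are numbers.
f <- factor(c("10", "25", "40", "25"))

as.integer(f)                     # internal level codes, not the values

# convert via character to recover the actual numbers
x <- as.numeric(as.character(f))

# x is now a plain numeric vector and can be cut into intervals
table(cut(x, breaks = c(0, 20, 50)))
```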

References

Templ, M. and Kowarik, A. and Meindl, B. Statistical Disclosure Control for Micro-Data Using the R Package sdcMicro. Journal of Statistical Software, 67 (4), 1–36, 2015. doi:10.18637/jss.v067.i04

Templ, M. Statistical Disclosure Control for Microdata: Methods and Applications in R. Springer International Publishing, 287 pages, 2017. ISBN 978-3-319-50272-4. doi:10.1007/978-3-319-50272-4

Author

Matthias Templ and Bernhard Meindl

Examples

data(free1)
free1 <- as.data.frame(free1)

## application to a vector
head(globalRecode(free1$AGE, breaks=c(1,9,19,29,39,49,59,69,100), labels=1:8))
#> [1] 5 3 5 3 3 3
#> Levels: 1 2 3 4 5 6 7 8
table(globalRecode(free1$AGE, breaks=c(1,9,19,29,39,49,59,69,100), labels=1:8))
#> 
#>   1   2   3   4   5   6   7   8 
#>   0 312 815 968 717 455 511 222 

## application to a data.frame
# automatic labels
table(globalRecode(free1, column="AGE", breaks=c(1,9,19,29,39,49,59,69,100))$AGE)
#> 
#>    (1,9]   (9,19]  (19,29]  (29,39]  (39,49]  (49,59]  (59,69] (69,100] 
#>        0      312      815      968      717      455      511      222 

## calculation of break-points using different algorithms
table(globalRecode(free1$AGE, breaks=6))
#> 
#> [14,24] (24,34] (34,44] (44,55] (55,65] (65,75] 
#>     689     910     925     551     496     429 
table(globalRecode(free1$AGE, breaks=6, method="logEqui"))
#> 
#>    [6,10]   (10,18]   (18,33]   (33,61]  (61,110] (110,201] 
#>         0       251      1248      1870       631         0 
table(globalRecode(free1$AGE, breaks=6, method="equalAmount"))
#> 
#> [15,24] (24,32] (32,38] (38,47] (47,61] (61,74] 
#>     689     725     590     697     668     631 

## for objects of class sdcMicroObj:
data(testdata2)
sdc <- createSdcObj(testdata2,
  keyVars=c('urbrur','roof','walls','water','electcon','relat','sex'),
  numVars=c('expend','income','savings'), w='sampling_weight')
sdc <- globalRecode(sdc, column="water", breaks=3)
table(get.sdcMicroObj(sdc, type="manipKeyVars")$water)
#> 
#>  [0,3]  (3,7] (7,10] 
#>     47     40      6