riskyCells — riskyCells • sdcMicro

Computes risky (unweighted) combinations of key variables, either up to a specified dimension or using identification levels. This mimics the approach taken in mu-argus.

riskyCells(obj, useIdentificationLevel = FALSE, threshold, ...)

Arguments

obj

a data.frame, data.table or an sdcMicroObj object

useIdentificationLevel

(logical) specifies whether tabulation is done up to a specific dimension (useIdentificationLevel = FALSE, using argument maxDim) or takes identification levels into account (useIdentificationLevel = TRUE, using argument level).

threshold

a numeric vector specifying the thresholds at which cells are considered to be unsafe. If tabulation is done up to a specific dimension (useIdentificationLevel = FALSE), the thresholds may be specified differently for each dimension. Otherwise, the same threshold is used for all tables.

...

see possible arguments below

keyVars: indices or variable names within obj that should be used for tabulation. If obj is an sdcMicroObj object, this argument is ignored and the pre-defined key variables are used.
level: if useIdentificationLevel = TRUE, this numeric vector specifies the importance of the key variables. The construction of output tables follows the implementation in mu-argus, see e.g. mu-argus. The length of this numeric vector must match the number of key variables.
maxDim: if useIdentificationLevel = FALSE, this number specifies the maximum number of variables to tabulate.
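
The interplay of threshold and maxDim can be illustrated with a small base R sketch. It is not the code used by riskyCells; the function risky_sketch, the toy data and the column names of its result are hypothetical. The sketch assumes that a cell is unsafe when its unweighted count is at or below the threshold, that the i-th threshold applies to tables of dimension i, and that a single threshold is recycled for all dimensions.

```r
# Hypothetical sketch: tabulate every combination of key variables up to
# max_dim and count the observed cells at or below the threshold.
risky_sketch <- function(df, key_vars, threshold, max_dim) {
  # assumption: either one threshold for all dimensions or one per dimension
  stopifnot(length(threshold) %in% c(1L, max_dim))
  threshold <- rep_len(threshold, max_dim)
  out <- list()
  for (d in seq_len(max_dim)) {
    # all combinations of d key variables
    combos <- combn(key_vars, d, simplify = FALSE)
    for (vars in combos) {
      # unweighted counts of the observed cells only
      cells <- table(interaction(df[vars], drop = TRUE))
      out[[length(out) + 1L]] <- data.frame(
        vars = paste(vars, collapse = " x "),
        dim = d,
        threshold = threshold[d],
        # assumption: "unsafe" means count <= threshold
        unsafe_cells = sum(cells <= threshold[d])
      )
    }
  }
  do.call(rbind, out)
}

# toy data with two key variables
toy <- data.frame(
  region = c("A", "A", "A", "B", "B", "C"),
  sex = c("m", "f", "m", "f", "f", "m"),
  stringsAsFactors = TRUE
)

res <- risky_sketch(
  df = toy,
  key_vars = c("region", "sex"),
  threshold = c(2, 1),
  max_dim = 2
)
print(res)
```

Each row of res corresponds to one table: the one-dimensional tables are checked against the first threshold and the two-dimensional table against the second.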

Value

a data.table showing the number of unsafe cells and the thresholds for each combination of the key variables. If the input is an sdcMicroObj object and the categorical key variables have already been modified, the output contains the number of unsafe cells for both the original and the modified data.
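
Why counts for both the original and the modified data are useful can be seen in a small base R sketch. It does not use sdcMicro; the helper count_unsafe and the toy factor are hypothetical, and the sketch assumes that a cell is unsafe when its count is at or below the threshold. Merging sparse categories, which is what a recoding step such as groupAndRename does, typically lowers the number of unsafe cells.

```r
# Hypothetical helper: number of observed categories whose unweighted
# count is at or below the threshold (assumed definition of "unsafe").
count_unsafe <- function(x, threshold) {
  sum(table(droplevels(x)) <= threshold)
}

# toy key variable with three sparse categories
roof <- factor(c("1", "1", "1", "1", "5", "5", "6", "9"))

# merge the sparse categories into one, mirroring a recoding step
recoded <- roof
levels(recoded)[levels(recoded) %in% c("5", "6", "9")] <- "5+"

unsafe_orig <- count_unsafe(roof, threshold = 2)
unsafe_mod <- count_unsafe(recoded, threshold = 2)
print(c(original = unsafe_orig, modified = unsafe_mod))
```

Comparing the two numbers shows how much of the disclosure risk in that table the recoding removed.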

Author

Bernhard Meindl

Examples

## data.frame method / all combinations up to maxDim
#riskyCells(
#  obj = testdata2,
#  keyVars = 1:5,
#  threshold = c(50, 25, 10, 5),
#  useIdentificationLevel = FALSE,
#  maxDim = 4
#)
#riskyCells(
#  obj = testdata2,
#  keyVars = 1:5,
#  threshold = 10,
#  useIdentificationLevel = FALSE,
#  maxDim = 3
#)
#
### data.frame method / using identification levels
#riskyCells(
#  obj = testdata2,
#  keyVars = 1:6,
#  threshold = 20,
#  useIdentificationLevel = TRUE,
#  level = c(1, 1, 2, 3, 3, 5)
#)
#riskyCells(
#  obj = testdata2,
#  keyVars = c(1, 3, 4, 6),
#  threshold = 10,
#  useIdentificationLevel = TRUE,
#  level = c(1, 2, 2, 4)
#)
#
### sdcMicroObj-method / all combinations up to maxDim
#testdata2[1:6] <- lapply(testdata2[1:6], as.factor)
#
#sdc <- createSdcObj(
#  dat = testdata2,
#  keyVars = c("urbrur", "roof", "walls", "water", "electcon", "relat", "sex"),
#  numVars = c("expend", "income", "savings"),
#  w = "sampling_weight")
#
#r0 <- riskyCells(
#  obj = sdc,
#  useIdentificationLevel = FALSE,
#  threshold = c(20, 10, 5),
#  maxDim = 3
#)
#
### in case key-variables have been modified, we get counts for
### original and modified data
#sdc <- groupAndRename(
#  obj = sdc,
#  var = "roof",
#  before = c("5", "6", "9"),
#  after = "5+"
#)
#r1 <- riskyCells(
#  obj = sdc,
#  useIdentificationLevel = FALSE,
#  threshold = c(10, 5, 3),
#  maxDim = 3
#)
#
### sdcMicroObj-method / using identification levels
#riskyCells(
#  obj = sdc,
#  useIdentificationLevel = TRUE,
#  threshold = 10,
#  level = c(1, 1, 3, 4, 5, 5, 5)
#)