Computes risky (unweighted) combinations of key variables, either up to a specified dimension or using identification levels. This mimics the approach taken in mu-argus.
riskyCells(obj, useIdentificationLevel = FALSE, threshold, ...)
a data.frame, data.table or an sdcMicroObj object
(logical) specifies whether tabulation should be done up to a specific dimension (useIdentificationLevel = FALSE, using argument maxDim) or by taking identification levels into account (useIdentificationLevel = TRUE, using argument level).
a numeric vector specifying the thresholds at which cells are considered to be unsafe. In case a tabulation is done up to a specific dimension (useIdentificationLevel = FALSE), the thresholds may be specified differently for each dimension. In the other case, the same threshold is used for all tables.
see possible arguments below
keyVars: index or variable names within obj that should be used for tabulation. In case obj is an sdcMicroObj object, this argument is not used and the pre-defined key variables are used.
level: in case useIdentificationLevel = TRUE, this numeric vector specifies the importance of the key variables. The construction of output tables follows the implementation in mu-argus, see e.g. mu-argus. The length of this numeric vector must match the number of key variables.
maxDim: in case useIdentificationLevel = FALSE, this number specifies the maximal number of variables to tabulate.
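The constraints on these arguments can be summarised as a small pre-check. The following is only a sketch: `check_args` is a hypothetical helper that is not part of sdcMicro. The rule that `level` needs one entry per key variable comes from the text above. The rules that `threshold` has length 1 or `max_dim` in the dimension case, and length 1 in the identification-level case, are assumptions inferred from the description and the examples.

```r
# Hypothetical pre-check for the argument combinations described above.
# This is NOT part of sdcMicro; it only illustrates the documented rules.
check_args <- function(key_vars, use_identification_level, threshold,
                       level = NULL, max_dim = NULL) {
  stopifnot(is.numeric(threshold), length(threshold) >= 1)
  if (use_identification_level) {
    # Documented rule: one identification level per key variable
    if (is.null(level) || length(level) != length(key_vars)) {
      stop("`level` must have one entry per key variable")
    }
    # Assumption: a single threshold is applied to all tables
    if (length(threshold) != 1) {
      stop("a single `threshold` is expected with identification levels")
    }
  } else {
    if (is.null(max_dim) || max_dim < 1 || max_dim > length(key_vars)) {
      stop("`max_dim` must be between 1 and the number of key variables")
    }
    # Assumption: either one threshold for all dimensions or one per dimension
    if (!(length(threshold) %in% c(1, max_dim))) {
      stop("`threshold` must have length 1 or length `max_dim`")
    }
  }
  invisible(TRUE)
}

# Mirrors the argument shapes used in the examples of this page
check_args(1:5, FALSE, threshold = c(50, 25, 10, 5), max_dim = 4)
check_args(1:6, TRUE, threshold = 20, level = c(1, 1, 2, 3, 3, 5))
```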
a data.table showing the number of unsafe cells and the thresholds for any combination of the key variables. If the input was an sdcMicroObj object and some modifications have already been applied to the categorical key variables, the resulting output contains the number of unsafe cells both for the original and the modified data.
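To make the dimension-based tabulation concrete, here is a minimal base R sketch of the idea. It is not the sdcMicro implementation: `count_unsafe` and the toy data are hypothetical. Treating a cell as unsafe when it is non-empty and its unweighted count is at or below the threshold is an assumption made for illustration.

```r
# Toy data with three categorical key variables (hypothetical)
dat <- data.frame(
  region = c("a", "a", "a", "b", "b", "b", "b", "c"),
  sex    = c("m", "f", "m", "f", "m", "f", "m", "f"),
  roof   = c("1", "1", "2", "2", "1", "1", "1", "2")
)

# Hypothetical helper: tabulate every combination of key variables up to
# max_dim and count the unsafe cells in each table.
count_unsafe <- function(dat, key_vars, threshold, max_dim) {
  # Allow one threshold for all dimensions or one per dimension
  if (length(threshold) == 1) threshold <- rep(threshold, max_dim)
  stopifnot(length(threshold) == max_dim, max_dim <= length(key_vars))
  res <- list()
  for (d in seq_len(max_dim)) {
    for (vars in combn(key_vars, d, simplify = FALSE)) {
      tab <- table(dat[, vars, drop = FALSE])
      # Assumption: unsafe = non-empty cell with count <= threshold
      unsafe <- sum(tab > 0 & tab <= threshold[d])
      res[[length(res) + 1]] <- data.frame(
        vars = paste(vars, collapse = " x "),
        dim = d,
        threshold = threshold[d],
        unsafe_cells = unsafe
      )
    }
  }
  do.call(rbind, res)
}

out <- count_unsafe(dat, c("region", "sex", "roof"),
                    threshold = c(2, 1), max_dim = 2)
print(out)
```

The result has one row per combination of key variables, with a separate threshold for one-dimensional and two-dimensional tables.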
### data.frame method / all combinations up to maxDim
#riskyCells(
# obj = testdata2,
# keyVars = 1:5,
# threshold = c(50, 25, 10, 5),
# useIdentificationLevel = FALSE,
# maxDim = 4
# )
#riskyCells(
# obj = testdata2,
# keyVars = 1:5,
# threshold = 10,
# useIdentificationLevel = FALSE,
# maxDim = 3
#)
#
### data.frame method / using identification levels
#riskyCells(
# obj = testdata2,
# keyVars = 1:6,
# threshold = 20,
# useIdentificationLevel = TRUE,
# level = c(1, 1, 2, 3, 3, 5)
#)
#riskyCells(
# obj = testdata2,
# keyVars = c(1, 3, 4, 6),
# threshold = 10,
# useIdentificationLevel = TRUE,
# level = c(1, 2, 2, 4)
#)
#
### sdcMicroObj-method / all combinations up to maxDim
#testdata2[1:6] <- lapply(testdata2[1:6], as.factor)
#
#sdc <- createSdcObj(
# dat = testdata2,
# keyVars = c("urbrur", "roof", "walls", "water", "electcon", "relat", "sex"),
# numVars = c("expend", "income", "savings"),
# w = "sampling_weight")
#
#r0 <- riskyCells(
# obj = sdc,
# useIdentificationLevel=FALSE,
# threshold = c(20, 10, 5),
# maxDim = 3
#)
#
### in case key-variables have been modified, we get counts for
### original and modified data
#sdc <- groupAndRename(
# obj = sdc,
# var = "roof",
# before = c("5", "6", "9"),
# after = "5+"
#)
#r1 <- riskyCells(
# obj = sdc,
# useIdentificationLevel = FALSE,
# threshold = c(10, 5, 3),
# maxDim = 3
#)
#
### sdcMicroObj-method / using identification levels
#riskyCells(
# obj = sdc,
# useIdentificationLevel = TRUE,
# threshold = 10,
# level = c(1, 1, 3, 4, 5, 5, 5)
#)