Algorithm to achieve k-anonymity by performing local suppression.
localSuppression(obj, k = 2, importance = NULL, combs = NULL, ...)
kAnon(obj, k = 2, importance = NULL, combs = NULL, ...)
a sdcMicroObj-class
object or a data.frame
Threshold for k-anonymity
Numeric vector of values between 1 and n (n = length(keyVars)
).
This vector defines the "importance" of variables for local suppression.
Variables with importance = 1
will, if possible, not be suppressed;
variables with importance = n
will be prioritized for suppression.
Numeric vector. If specified, the algorithm provides k-anonymity
for each combination of n key variables (with n being the value of the ith
element of this parameter). For example, combs = c(4,3)
means that k-anonymity
will be provided for all combinations of 4 and then 3 key variables.
It is possible to assign different k values for each combination by supplying k
as a vector.
If k
has only one value, it will be used for all subsets.
see additional arguments below:
keyVars
: Names or indices of categorical key variables (for data.frame method)
strataVars
: Name or index of the variable used for stratification.
k-anonymity is ensured within each category of this variable.
alpha
: Numeric value between 0 and 1 specifying how much keys with missing
values (NA
s) contribute to the calculation of fk
and Fk
.
Default is 1
. Used only in the data.frame
method.
nc
: Maximum number of cores used for stratified computations.
Default is 1
. Parallelization is ignored on Windows.
A modified dataset with suppressions that meets k-anonymity based on
the specified key variables, or the modified sdcMicroObj-class
object.
The algorithm provides a k-anonymized data set by suppressing values in key variables. The algorithm tries to find an optimal solution to suppress as few values as possible and considers the specified importance vector. If not specified, the importance vector is constructed in a way such that key variables with a high number of characteristics are considered less important than key variables with a low number of characteristics.
The implementation provides k-anonymity per strata, if slot strataVar
has
been set in sdcMicroObj-class
or if parameter strataVar
is
used when applying the data.frame
method. For details, see the examples provided.
For the parameter alpha
:
alpha = 1
counts all wildcard matches (i.e. NA
s match everything).
alpha = 0
assumes missing values form their own categories.
These are two extremes. With alpha = 0
, frequencies are likely underestimated when
NA
s are present. If combs
is used with alpha = 0
, the heuristic nature of kAnon()
may lead to technically correct, but not always intuitively understandable frequency evaluations.
Deprecated methods localSupp2
and localSupp2Wrapper
are no longer available
in sdcMicro
versions > 4.5.0.
kAnon()
is a more intuitive term for local suppression, since the goal is to achieve k-anonymity.
Templ, M. Statistical Disclosure Control for Microdata: Methods and Applications in R. Springer International Publishing, 287 pages, 2017. ISBN: 978-3-319-50272-4. doi:10.1007/978-3-319-50272-4
Templ, M., Kowarik, A., Meindl, B. Statistical Disclosure Control for Micro-Data Using the R Package sdcMicro. Journal of Statistical Software, 67(4), 1–36, 2015. doi:10.18637/jss.v067.i04
# \donttest{
data(francdat)
## Local Suppression
localS <- localSuppression(francdat, keyVar = c(4, 5, 6))
localS
#>
#> -----------------------
#> Total number of suppressions in the key variables: 4 (new: 4)
#>
#> Number of suppressions by key variables:
#> (in parenthesis, the total number suppressions is shown)
#>
#> Key2 Key3 Key4
#> 1 1 (1) 0 (0) 3 (3)
#>
#> 2-anonymity == TRUE
#> -----------------------
plot(localS)
## for objects of class sdcMicro, no stratification
data(testdata2)
kv <- c("urbrur", "roof", "walls", "water", "electcon", "relat", "sex")
sdc <- createSdcObj(testdata2, keyVars = kv, w = "sampling_weight")
sdc <- localSuppression(sdc)
## for objects of class sdcMicro, with stratification
testdata2$ageG <- cut(testdata2$age, 5, labels = paste0("AG", 1:5))
sdc <- createSdcObj(
dat = testdata2,
keyVars = kv,
w = "sampling_weight",
strataVar = "ageG"
)
sdc <- localSuppression(sdc, nc = 1)
## it is also possible to provide k-anonymity for subsets of key-variables
## with different parameter k!
## in this case we want to provide 10-anonymity for all combinations
## of 5 key variables, 20-anonymity for all combinations with 4 key variables
## and 30-anonymity for all combinations of 3 key variables.
sdc <- createSdcObj(testdata2, keyVars = kv, w = "sampling_weight")
combs <- 5:3
k <- c(10, 20, 30)
sdc <- localSuppression(sdc, k = k, combs = combs)
## data.frame method (no stratification)
inp <- testdata2[, c(kv, "ageG")]
ls <- localSuppression(inp, keyVars = 1:7)
print(ls)
#>
#> -----------------------
#> Total number of suppressions in the key variables: 97 (new: 97)
#>
#> Number of suppressions by key variables:
#> (in parenthesis, the total number suppressions is shown)
#>
#> urbrur roof walls water electcon relat sex
#> 1 3 (3) 10 (10) 7 (7) 56 (56) 0 (0) 16 (16) 5 (5)
#>
#> 2-anonymity == TRUE
#> -----------------------
plot(ls)
## data.frame method (with stratification)
ls <- kAnon(inp, keyVars = 1:7, strataVars = 8)
print(ls)
#>
#> -----------------------
#> Total number of suppressions in the key variables: 109 (new: 109)
#>
#> Number of suppressions by key variables and strata:
#> (in parenthesis, the total number suppressions is shown)
#>
#> urbrur roof walls water electcon relat sex
#> AG5 1 (1) 2 (2) 1 (1) 1 (1) 1 (1) 1 (1) 1 (1)
#> AG3 2 (2) 3 (3) 3 (3) 7 (7) 0 (0) 2 (2) 0 (0)
#> AG1 0 (0) 10 (10) 5 (5) 24 (24) 0 (0) 3 (3) 5 (5)
#> AG2 2 (2) 5 (5) 3 (3) 13 (13) 2 (2) 2 (2) 1 (1)
#> AG4 1 (1) 1 (1) 2 (2) 3 (3) 1 (1) 1 (1) 0 (0)
#> Total 6 (6) 21 (21) 14 (14) 48 (48) 4 (4) 9 (9) 7 (7)
#>
#> 2-anonymity == TRUE in all strata!
#> -----------------------
plot(ls)
# }