Algorithm to achieve k-anonymity by performing local suppression.

localSuppression(obj, k = 2, importance = NULL, combs = NULL, ...)

kAnon(obj, k = 2, importance = NULL, combs = NULL, ...)

Arguments

obj

a sdcMicroObj-class-object or a data.frame

k

threshold for k-anonymity

importance

numeric vector of numbers between 1 and n (n=length of vector keyVars). This vector represents the "importance" of variables that should be used for local suppression in order to obtain k-anonymity. key-variables with importance=1 will - if possible - not suppressed, key-variables with importance=n will be used whenever possible.

combs

numeric vector. if specified, the algorithm will provide k-anonymity for each combination of n key variables (with n being the value of the ith element of this parameter. For example, if combs=c(4,3), the algorithm will provide k-anonymity to all combinations of 4 key variables and then k-anonymity to all combinations of 3 key variables. It is possible to apply different k to these subsets by specifying k as a vector. If k has only one element, the same value of k will be used for all subgroups.

...

see arguments below

keyVars:

names (or indices) of categorical key variables (for data-frame method)

strataVars:

name (or index) of variable which is used for stratification purposes, used in the data.frame method. This means that k-anonymity is provided within each category of the specified variable.

alpha:

numeric value between 0 and 1 specifying how much keys that contain missing values (NAs) should contribute to the calculation of fk and Fk. For the default value of 1, nothing changes with respect to the implementation in prior versions. Each wildcard-match would be counted while for alpha=0 keys with missing values would be basically ignored. Used in the data-frame method only because in the method for sdcMicroObj-class-objects, this value is extracted from slot options.

Value

Manipulated data set with suppressions that has k-anonymity with respect to specified key-variables or the manipulated data stored in the sdcMicroObj-class.

Details

The algorithm provides a k-anonymized data set by suppressing values in key variables. The algorithm tries to find an optimal solution to suppress as few values as possible and considers the specified importance vector. If not specified, the importance vector is constructed in a way such that key variables with a high number of characteristics are considered less important than key variables with a low number of characteristics.

The implementation provides k-anonymity per strata, if slot 'strataVar' has been set in sdcMicroObj-class or if parameter 'strataVar' is used when appying the data.frame method. For details, have a look at the examples provided.

Note

Deprecated methods 'localSupp2' and 'localSupp2Wrapper' are no longer available in sdcMicro > 4.5.0. kAnon is a more intutitive term for localSuppression because the aim is always to obtain k-anonymity for some parts of the data.

References

Templ, M. Statistical Disclosure Control for Microdata: Methods and Applications in R. Springer International Publishing, 287 pages, 2017. ISBN 978-3-319-50272-4. doi:10.1007/978-3-319-50272-4

Templ, M. and Kowarik, A. and Meindl, B. Statistical Disclosure Control for Micro-Data Using the R Package sdcMicro. Journal of Statistical Software, 67 (4), 1--36, 2015. doi:10.18637/jss.v067.i04

Author

Bernhard Meindl, Matthias Templ

Examples

# \donttest{
data(francdat)

## Local Suppression
localS <- localSuppression(francdat, keyVar=c(4,5,6))
localS
#> 
#> -----------------------
#> Total number of suppressions in the key variables: 4 (new: 4)
#> 
#> Number of suppressions by key variables:
#> (in parenthesis, the total number suppressions is shown)
#> 
#>    Key2  Key3  Key4
#> 1 1 (1) 0 (0) 3 (3)
#> 
#> 2-anonymity == TRUE
#> -----------------------
plot(localS)


## for objects of class sdcMicro, no stratification
data(testdata2)
kv <- c("urbrur", "roof", "walls", "water", "electcon", "relat", "sex")
sdc <- createSdcObj(testdata2, keyVars = kv, w = "sampling_weight")
sdc <- localSuppression(sdc)

## for objects of class sdcMicro, with stratification
testdata2$ageG <- cut(testdata2$age, 5, labels=paste0("AG",1:5))
sdc <- createSdcObj(
  dat = testdata2,
  keyVars = kv,
  w = "sampling_weight",
  strataVar = "ageG"
)
sdc <- localSuppression(sdc)

## it is also possible to provide k-anonymity for subsets of key-variables
## with different parameter k!
## in this case we want to provide 10-anonymity for all combinations
## of 5 key variables, 20-anonymity for all combinations with 4 key variables
## and 30-anonymity for all combinations of 3 key variables.
sdc <- createSdcObj(testdata2, keyVars = kv, w = "sampling_weight")
combs <- 5:3
k <- c(10, 20, 30)
sdc <- localSuppression(sdc, k = k, combs = combs)

## data.frame method (no stratification)
inp <- testdata2[,c(kv, "ageG")]
ls <- localSuppression(inp, keyVars = 1:7)
print(ls)
#> 
#> -----------------------
#> Total number of suppressions in the key variables: 97 (new: 97)
#> 
#> Number of suppressions by key variables:
#> (in parenthesis, the total number suppressions is shown)
#> 
#>   urbrur    roof walls   water electcon   relat   sex
#> 1  3 (3) 10 (10) 7 (7) 56 (56)    0 (0) 16 (16) 5 (5)
#> 
#> 2-anonymity == TRUE
#> -----------------------
plot(ls)


## data.frame method (with stratification)
ls <- kAnon(inp, keyVars = 1:7, strataVars = 8)
print(ls)
#> 
#> -----------------------
#> Total number of suppressions in the key variables: 109 (new: 109)
#> 
#> Number of suppressions by key variables and strata:
#> (in parenthesis, the total number suppressions is shown)
#> 
#>       urbrur    roof   walls   water electcon relat   sex
#> AG5    1 (1)   2 (2)   1 (1)   1 (1)    1 (1) 1 (1) 1 (1)
#> AG3    2 (2)   3 (3)   3 (3)   7 (7)    0 (0) 2 (2) 0 (0)
#> AG1    0 (0) 10 (10)   5 (5) 24 (24)    0 (0) 3 (3) 5 (5)
#> AG2    2 (2)   5 (5)   3 (3) 13 (13)    2 (2) 2 (2) 1 (1)
#> AG4    1 (1)   1 (1)   2 (2)   3 (3)    1 (1) 1 (1) 0 (0)
#> Total  6 (6) 21 (21) 14 (14) 48 (48)    4 (4) 9 (9) 7 (7)
#> 
#> 2-anonymity == TRUE in all strata!
#> -----------------------
plot(ls)

# }