Computation and estimation of the sample and population frequency counts.

freqCalc(x, keyVars, w = NULL, alpha = 1)

Arguments

x

data frame or matrix

keyVars

key variables

w

column index of the weight variable. Set to NULL if the data represent a population rather than a sample.

alpha

numeric value between 0 and 1 specifying how much keys that contain missing values (NAs) contribute to the calculation of fk and Fk. With the default alpha = 1, every wildcard match is counted in full, which is the behavior of prior versions. With alpha = 0, keys that contain missing values are essentially ignored.

Value

Object of class freqCalc with the following components:

freqCalc

data set

keyVars

variables used for frequency calculation

w

index of the weight variable. NULL if the data are a population rather than a sample.

alpha

value of parameter alpha

fk

for each observation, the sample frequency of its key, i.e. the number of observations in the sample that share its values in the key variables.

Fk

estimated frequency in the population

n1

number of observations with fk=1

n2

number of observations with fk=2
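For keys without missing values, these components can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: it assumes that fk is the number of records sharing a key and that Fk is the sum of the sampling weights over those records. The key values and weights are copied from the francdat printout in the Examples section, and the names `keys`, `key_id`, `n1` and `n2` are chosen for illustration only.

```r
# Key variables and sampling weights as printed for francdat in the Examples.
keys <- data.frame(
  Key1 = c(1, 1, 1, 3, 4, 4, 6, 1),
  Key2 = c(2, 2, 2, 3, 3, 3, 2, 2),
  Key3 = c(5, 1, 1, 1, 1, 1, 1, 5),
  Key4 = c(1, 1, 1, 5, 4, 1, 5, 1)
)
w <- c(18, 45.5, 39, 17, 541, 8, 5, 92)

# One string per record identifying its combination of key values.
key_id <- do.call(paste, c(keys, sep = "|"))

# Sample frequency: number of records sharing the key.
fk <- ave(rep(1, nrow(keys)), key_id, FUN = sum)

# Assumed population estimate: sum of weights over records sharing the key.
Fk <- ave(w, key_id, FUN = sum)

# Number of observations with fk = 1 and fk = 2.
n1 <- sum(fk == 1)
n2 <- sum(fk == 2)
```

Under these assumptions, `fk` and `Fk` agree with the `f$fk` and `f$Fk` output shown in the Examples, and `sum(fk < 2)` and `sum(fk < 3)` give the 4 and 8 observations reported there as violating 2- and 3-anonymity.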

Details

The function handles missing values in the data. A missing value stands for any of the possible categories of the variable considered, so it acts as a wildcard when keys are compared. The function can be applied to large data sets with many (categorical) key variables, since the computation is done in C.

freqCalc() does not support sdcMicro S4 class objects.
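The wildcard treatment of missing values and the role of alpha can be sketched in base R. This is an illustration only: the weighting rule below (a matching record whose own key contains an NA contributes alpha, every other matching record and the record itself contribute 1) is an assumption inferred from the output in the Examples section, not taken from the C source. The helper `wildcard_freq` is hypothetical and not part of sdcMicro; the data are the francdat key variables and weights with the NAs set as in the Examples.

```r
# francdat key variables after introducing the NAs used in the Examples.
keys <- cbind(
  Key1 = c(1, 1, 1, NA, 4, NA, 6, 1),
  Key2 = c(2, 2, 2, NA, 3, 3, 2, 2),
  Key3 = c(5, 1, NA, 1, 1, 1, 1, 5),
  Key4 = c(1, 1, 1, 5, NA, 1, 5, 1)
)
w <- c(18, 45.5, 39, 17, 541, 8, 5, 92)

# Hypothetical helper (not an sdcMicro function).
wildcard_freq <- function(keys, w = NULL, alpha = 1) {
  n <- nrow(keys)
  if (is.null(w)) w <- rep(1, n)
  # Records whose key contains at least one missing value.
  has_na <- rowSums(is.na(keys)) > 0
  fk <- numeric(n)
  Fk <- numeric(n)
  for (i in seq_len(n)) {
    ki <- keys[i, ]
    # Record j matches record i if, in every key variable, the values are
    # equal or at least one of the two is NA (NA acts as a wildcard).
    is_match <- apply(keys, 1, function(kj) all(is.na(kj) | is.na(ki) | kj == ki))
    # Assumed rule: matching records with an NA in their key count alpha,
    # all other matching records and record i itself count 1.
    contrib <- ifelse(has_na, alpha, 1)
    contrib[i] <- 1
    contrib <- contrib * is_match
    fk[i] <- sum(contrib)
    Fk[i] <- sum(contrib * w)
  }
  list(fk = fk, Fk = Fk)
}

res1  <- wildcard_freq(keys, w, alpha = 1)
res05 <- wildcard_freq(keys, w, alpha = 0.5)
```

For these eight records the sketch gives the same fk and Fk values as the Examples output for alpha = 1 and alpha = 0.5. It is quadratic in the number of records and is meant only to make the matching rule concrete.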

References

See, e.g., https://research.cbs.nl/casc/deliv/12d1.pdf

Templ, M. Statistical Disclosure Control for Microdata Using the R-Package sdcMicro, Transactions on Data Privacy, vol. 1, number 2, pp. 67-85, 2008. https://www.tdp.cat/issues/abs.a004a08.php

Templ, M. New Developments in Statistical Disclosure Control and Imputation: Robust Statistics Applied to Official Statistics, Suedwestdeutscher Verlag fuer Hochschulschriften, 2009, ISBN: 3838108280, 264 pages.

Templ, M. Statistical Disclosure Control for Microdata: Methods and Applications in R. Springer International Publishing, 287 pages, 2017. ISBN 978-3-319-50272-4. doi:10.1007/978-3-319-50272-4

Templ, M. and Meindl, B.: Practical Applications in Statistical Disclosure Control Using R, Privacy and Anonymity in Information Management Systems New Techniques for New Practical Problems, Springer, 31-62, 2010, ISBN: 978-1-84996-237-7.

Author

Bernhard Meindl

Examples


data(francdat)
# \donttest{
# key variables are columns 2, 4, 5, 6 (Key1 to Key4); column 8 holds the weights
f <- freqCalc(francdat, keyVars=c(2,4,5,6), w=8)
f
#> 
#>  --------------------------
#> 4 obs. violate 2-anonymity 
#> 8 obs. violate 3-anonymity 
#>  --------------------------
f$freqCalc
#>   Num1 Key1 Num2 Key2 Key3 Key4 Num3     w
#> 1 0.30    1 0.40    2    5    1    4  18.0
#> 2 0.12    1 0.22    2    1    1   22  45.5
#> 3 0.18    1 0.80    2    1    1    8  39.0
#> 4 1.90    3 9.00    3    1    5   91  17.0
#> 5 1.00    4 1.30    3    1    4   13 541.0
#> 6 1.00    4 1.40    3    1    1   14   8.0
#> 7 0.10    6 0.01    2    1    5    1   5.0
#> 8 0.15    1 0.50    2    5    1    5  92.0
f$fk
#> [1] 2 2 2 1 1 1 1 2
f$Fk
#> [1] 110.0  84.5  84.5  17.0 541.0   8.0   5.0 110.0
## with missings:
x <- francdat
x[3,5] <- NA             # Key3 of record 3
x[4,2] <- x[4,4] <- NA   # Key1 and Key2 of record 4
x[5,6] <- NA             # Key4 of record 5
x[6,2] <- NA             # Key1 of record 6
f2 <- freqCalc(x, keyVars=c(2,4,5,6), w=8)
cbind(f2$fk, f2$Fk)
#>      [,1]  [,2]
#> [1,]    3 149.0
#> [2,]    2  84.5
#> [3,]    4 194.5
#> [4,]    3 563.0
#> [5,]    3 566.0
#> [6,]    2 549.0
#> [7,]    2  22.0
#> [8,]    3 149.0

## test parameter 'alpha'
f3a <- freqCalc(x, keyVars=c(2,4,5,6), w=8, alpha=1)
f3b <- freqCalc(x, keyVars=c(2,4,5,6), w=8, alpha=0.5)
f3c <- freqCalc(x, keyVars=c(2,4,5,6), w=8, alpha=0.1)
data.frame(fka=f3a$fk, fkb=f3b$fk, fkc=f3c$fk)
#>   fka fkb fkc
#> 1   3 2.5 2.1
#> 2   2 1.5 1.1
#> 3   4 4.0 4.0
#> 4   3 2.5 2.1
#> 5   3 2.0 1.2
#> 6   2 1.5 1.1
#> 7   2 1.5 1.1
#> 8   3 2.5 2.1
data.frame(Fka=f3a$Fk, Fkb=f3b$Fk, Fkc=f3c$Fk)
#>     Fka   Fkb   Fkc
#> 1 149.0 129.5 113.9
#> 2  84.5  65.0  49.4
#> 3 194.5 194.5 194.5
#> 4 563.0 292.5  76.1
#> 5 566.0 553.5 543.5
#> 6 549.0 278.5  62.1
#> 7  22.0  13.5   6.7
#> 8 149.0 129.5 113.9
# }