R/ck_params_nums.R
ck_flexparams.Rd
ck_flexparams() defines a flex function that is used to look up perturbation
magnitudes (percentages) when perturbing continuous variables.
ck_flexparams(fp, p = c(0.25, 0.05), epsilon = 1, q = 3)

fp
(numeric scalar); the point at which the noise coefficient
function reaches its desired maximum (defined by the first element of p).
p
a numeric vector of length 2 where both elements specify a percentage.
The first value refers to the desired maximum perturbation percentage for small
cells (depending on fp) while the second element refers to the desired maximum
perturbation percentage for large cells. Both values must be between 0 and 1 and
need to be in descending order.
epsilon
a numeric vector in descending order with all values >= 0 and <= 1, with the first
element forced to equal 1. The length of this vector must equal the value of top_k
specified in ck_params_nums() when creating parameters for type == "top_contr"; this is
checked at runtime. This setting makes it possible to use different flex functions for the largest top_k contributors.
q
(numeric scalar); parameter of the function; q must be >= 1.
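The constraints stated for p, epsilon and q can be written as a handful of base-R checks. The following is a minimal sketch, not the validation code used by the package: the helper name check_flex_inputs() and its top_k argument are hypothetical, bounds are treated as inclusive, and "descending" is read as non-increasing.

```r
# Hypothetical helper (not part of cellKey): checks the documented constraints
# on the arguments of ck_flexparams().
check_flex_inputs <- function(fp, p, epsilon, q, top_k = length(epsilon)) {
  # "descending" is interpreted here as non-increasing (an assumption)
  is_desc <- function(v) all(diff(v) <= 0)
  stopifnot(
    is.numeric(fp), length(fp) == 1,
    # p: two percentages in [0, 1], in descending order
    is.numeric(p), length(p) == 2, all(p >= 0 & p <= 1), is_desc(p),
    # epsilon: values in [0, 1], descending, first element equal to 1
    is.numeric(epsilon), all(epsilon >= 0 & epsilon <= 1), is_desc(epsilon),
    epsilon[1] == 1,
    # one epsilon per top contributor
    length(epsilon) == top_k,
    # q: scalar, at least 1
    is.numeric(q), length(q) == 1, q >= 1
  )
  invisible(TRUE)
}

# passes: three epsilon values for top_k = 3
check_flex_inputs(fp = 1000, p = c(0.30, 0.03), epsilon = c(1, 0.5, 0.2), q = 3, top_k = 3)

# fails: p is not in descending order
res <- try(check_flex_inputs(fp = 1000, p = c(0.03, 0.30), epsilon = 1, q = 3), silent = TRUE)
inherits(res, "try-error")
```

Running such checks before calling ck_params_nums() surfaces a mismatch between length(epsilon) and top_k early, rather than at the runtime check mentioned above.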
an object suitable as input for ck_params_nums().
Details about the flex function can be found in Deliverable D4.2, Part I of the SGA "Open Source tools for perturbative confidentiality methods".
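To give a feeling for how fp, p and q interact, the sketch below draws one curve that matches the verbal description: the percentage stays at p[1] up to fp and then decays towards p[2], with q controlling how fast. The functional form is an assumption chosen for illustration only; the exact definition is the one given in the deliverable cited above, and flex_sketch() is a hypothetical name that does not exist in the package.

```r
# Illustrative curve only: an assumed functional form, not taken from cellKey.
flex_sketch <- function(x, fp, p, q) {
  stopifnot(length(p) == 2, p[1] >= p[2], p[2] > 0, q >= 1, fp > 0)
  # values up to fp receive the maximum percentage p[1]
  out <- rep(p[1], length(x))
  # beyond fp the percentage decays from p[1] towards p[2]
  large <- x > fp
  out[large] <- p[2] * (1 + ((p[1] - p[2]) / p[2]) * (fp / x[large])^q)
  out
}

x <- c(10, 1000, 2000, 1e6)
round(flex_sketch(x, fp = 1000, p = c(0.25, 0.05), q = 3), 4)
```

With this assumed form the curve is continuous at fp, and a larger q makes it approach p[2] more quickly.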
# \donttest{
x <- ck_create_testdata()
# create some 0/1 variables that should be perturbed later
x[, cnt_females := ifelse(sex == "male", 0, 1)]
#> urbrur roof walls water electcon relat sex age hhcivil expend
#> <int> <int> <int> <int> <int> <int> <fctr> <fctr> <int> <num>
#> 1: 2 4 3 3 1 1 male age_group3 2 9093
#> 2: 2 4 3 3 1 2 female age_group3 2 2734
#> 3: 2 4 3 3 1 3 male age_group1 1 2652
#> 4: 2 4 3 3 1 3 male age_group1 1 1807
#> 5: 2 4 2 3 1 1 male age_group4 2 671
#> ---
#> 4576: 2 4 3 4 1 2 female age_group3 2 3696
#> 4577: 2 4 3 4 1 3 male age_group1 1 282
#> 4578: 2 4 3 4 1 3 male age_group1 1 840
#> 4579: 2 4 3 4 1 3 female age_group1 1 6258
#> 4580: 2 4 3 4 1 3 male age_group1 1 7019
#> income savings ori_hid sampling_weight household_weights cnt_females
#> <num> <num> <int> <int> <num> <num>
#> 1: 5780 12 1 60 25.00000 0
#> 2: 2530 28 1 66 25.00000 1
#> 3: 6920 550 1 30 25.00000 0
#> 4: 7960 870 1 98 25.00000 0
#> 5: 9030 20 2 75 16.66667 0
#> ---
#> 4576: 7900 278 1000 41 16.66667 1
#> 4577: 1420 987 1000 30 16.66667 0
#> 4578: 8900 684 1000 43 16.66667 0
#> 4579: 3880 294 1000 41 16.66667 1
#> 4580: 4830 911 1000 39 16.66667 0
x[, cnt_males := ifelse(sex == "male", 1, 0)]
#> urbrur roof walls water electcon relat sex age hhcivil expend
#> <int> <int> <int> <int> <int> <int> <fctr> <fctr> <int> <num>
#> 1: 2 4 3 3 1 1 male age_group3 2 9093
#> 2: 2 4 3 3 1 2 female age_group3 2 2734
#> 3: 2 4 3 3 1 3 male age_group1 1 2652
#> 4: 2 4 3 3 1 3 male age_group1 1 1807
#> 5: 2 4 2 3 1 1 male age_group4 2 671
#> ---
#> 4576: 2 4 3 4 1 2 female age_group3 2 3696
#> 4577: 2 4 3 4 1 3 male age_group1 1 282
#> 4578: 2 4 3 4 1 3 male age_group1 1 840
#> 4579: 2 4 3 4 1 3 female age_group1 1 6258
#> 4580: 2 4 3 4 1 3 male age_group1 1 7019
#> income savings ori_hid sampling_weight household_weights cnt_females
#> <num> <num> <int> <int> <num> <num>
#> 1: 5780 12 1 60 25.00000 0
#> 2: 2530 28 1 66 25.00000 1
#> 3: 6920 550 1 30 25.00000 0
#> 4: 7960 870 1 98 25.00000 0
#> 5: 9030 20 2 75 16.66667 0
#> ---
#> 4576: 7900 278 1000 41 16.66667 1
#> 4577: 1420 987 1000 30 16.66667 0
#> 4578: 8900 684 1000 43 16.66667 0
#> 4579: 3880 294 1000 41 16.66667 1
#> 4580: 4830 911 1000 39 16.66667 0
#> cnt_males
#> <num>
#> 1: 1
#> 2: 0
#> 3: 1
#> 4: 1
#> 5: 1
#> ---
#> 4576: 0
#> 4577: 1
#> 4578: 1
#> 4579: 0
#> 4580: 1
x[, cnt_highincome := ifelse(income >= 9000, 1, 0)]
#> urbrur roof walls water electcon relat sex age hhcivil expend
#> <int> <int> <int> <int> <int> <int> <fctr> <fctr> <int> <num>
#> 1: 2 4 3 3 1 1 male age_group3 2 9093
#> 2: 2 4 3 3 1 2 female age_group3 2 2734
#> 3: 2 4 3 3 1 3 male age_group1 1 2652
#> 4: 2 4 3 3 1 3 male age_group1 1 1807
#> 5: 2 4 2 3 1 1 male age_group4 2 671
#> ---
#> 4576: 2 4 3 4 1 2 female age_group3 2 3696
#> 4577: 2 4 3 4 1 3 male age_group1 1 282
#> 4578: 2 4 3 4 1 3 male age_group1 1 840
#> 4579: 2 4 3 4 1 3 female age_group1 1 6258
#> 4580: 2 4 3 4 1 3 male age_group1 1 7019
#> income savings ori_hid sampling_weight household_weights cnt_females
#> <num> <num> <int> <int> <num> <num>
#> 1: 5780 12 1 60 25.00000 0
#> 2: 2530 28 1 66 25.00000 1
#> 3: 6920 550 1 30 25.00000 0
#> 4: 7960 870 1 98 25.00000 0
#> 5: 9030 20 2 75 16.66667 0
#> ---
#> 4576: 7900 278 1000 41 16.66667 1
#> 4577: 1420 987 1000 30 16.66667 0
#> 4578: 8900 684 1000 43 16.66667 0
#> 4579: 3880 294 1000 41 16.66667 1
#> 4580: 4830 911 1000 39 16.66667 0
#> cnt_males cnt_highincome
#> <num> <num>
#> 1: 1 0
#> 2: 0 0
#> 3: 1 0
#> 4: 1 0
#> 5: 1 1
#> ---
#> 4576: 0 0
#> 4577: 1 0
#> 4578: 1 0
#> 4579: 0 0
#> 4580: 1 0
# a variable with positive and negative contributions
x[, mixed := sample(-10:10, nrow(x), replace = TRUE)]
#> urbrur roof walls water electcon relat sex age hhcivil expend
#> <int> <int> <int> <int> <int> <int> <fctr> <fctr> <int> <num>
#> 1: 2 4 3 3 1 1 male age_group3 2 9093
#> 2: 2 4 3 3 1 2 female age_group3 2 2734
#> 3: 2 4 3 3 1 3 male age_group1 1 2652
#> 4: 2 4 3 3 1 3 male age_group1 1 1807
#> 5: 2 4 2 3 1 1 male age_group4 2 671
#> ---
#> 4576: 2 4 3 4 1 2 female age_group3 2 3696
#> 4577: 2 4 3 4 1 3 male age_group1 1 282
#> 4578: 2 4 3 4 1 3 male age_group1 1 840
#> 4579: 2 4 3 4 1 3 female age_group1 1 6258
#> 4580: 2 4 3 4 1 3 male age_group1 1 7019
#> income savings ori_hid sampling_weight household_weights cnt_females
#> <num> <num> <int> <int> <num> <num>
#> 1: 5780 12 1 60 25.00000 0
#> 2: 2530 28 1 66 25.00000 1
#> 3: 6920 550 1 30 25.00000 0
#> 4: 7960 870 1 98 25.00000 0
#> 5: 9030 20 2 75 16.66667 0
#> ---
#> 4576: 7900 278 1000 41 16.66667 1
#> 4577: 1420 987 1000 30 16.66667 0
#> 4578: 8900 684 1000 43 16.66667 0
#> 4579: 3880 294 1000 41 16.66667 1
#> 4580: 4830 911 1000 39 16.66667 0
#> cnt_males cnt_highincome mixed
#> <num> <num> <int>
#> 1: 1 0 0
#> 2: 0 0 0
#> 3: 1 0 -5
#> 4: 1 0 3
#> 5: 1 1 2
#> ---
#> 4576: 0 0 10
#> 4577: 1 0 -8
#> 4578: 1 0 -3
#> 4579: 0 0 -8
#> 4580: 1 0 -5
# create record keys
x$rkey <- ck_generate_rkeys(dat = x)
# define required inputs
# hierarchy with some bogus codes
d_sex <- hier_create(root = "Total", nodes = c("male", "female"))
d_sex <- hier_add(d_sex, root = "female", "f")
d_sex <- hier_add(d_sex, root = "male", "m")
d_age <- hier_create(root = "Total", nodes = paste0("age_group", 1:6))
d_age <- hier_add(d_age, root = "age_group1", "ag1a")
d_age <- hier_add(d_age, root = "age_group2", "ag2a")
# define the cell key object
countvars <- c("cnt_females", "cnt_males", "cnt_highincome")
numvars <- c("expend", "income", "savings", "mixed")
tab <- ck_setup(
  x = x,
  rkey = "rkey",
  dims = list(sex = d_sex, age = d_age),
  w = "sampling_weight",
  countvars = countvars,
  numvars = numvars)
#> computing contributing indices | rawdata <--> table; this might take a while
# show some information about this table instance
tab$print() # identical to print(tab)
#> ── Table Information ───────────────────────────────────────────────────────────
#> ✔ 45 cells in 2 dimensions ('sex', 'age')
#> ✔ weights: yes
#> ── Tabulated / Perturbed countvars ─────────────────────────────────────────────
#> ☐ 'total'
#> ☐ 'cnt_females'
#> ☐ 'cnt_males'
#> ☐ 'cnt_highincome'
#> ── Tabulated / Perturbed numvars ───────────────────────────────────────────────
#> ☐ 'expend'
#> ☐ 'income'
#> ☐ 'savings'
#> ☐ 'mixed'
# information about the hierarchies
tab$hierarchy_info()
#> $sex
#> code level is_leaf parent
#> <char> <int> <lgcl> <char>
#> 1: Total 1 FALSE Total
#> 2: male 2 FALSE Total
#> 3: m 3 TRUE male
#> 4: female 2 FALSE Total
#> 5: f 3 TRUE female
#>
#> $age
#> code level is_leaf parent
#> <char> <int> <lgcl> <char>
#> 1: Total 1 FALSE Total
#> 2: age_group1 2 FALSE Total
#> 3: ag1a 3 TRUE age_group1
#> 4: age_group2 2 FALSE Total
#> 5: ag2a 3 TRUE age_group2
#> 6: age_group3 2 TRUE Total
#> 7: age_group4 2 TRUE Total
#> 8: age_group5 2 TRUE Total
#> 9: age_group6 2 TRUE Total
#>
# which variables have been defined?
tab$allvars()
#> $cntvars
#> [1] "total" "cnt_females" "cnt_males" "cnt_highincome"
#>
#> $numvars
#> [1] "expend" "income" "savings" "mixed"
#>
# count variables
tab$cntvars()
#> [1] "total" "cnt_females" "cnt_males" "cnt_highincome"
# continuous variables
tab$numvars()
#> [1] "expend" "income" "savings" "mixed"
# create perturbation parameters for the "total" variable and
# write them to a yaml file;
# the ptable is created using functionality from the ptable package
f_yaml <- tempfile(fileext = ".yaml")
p_cnts1 <- ck_params_cnts(
  ptab = ptable::pt_ex_cnts(),
  path = f_yaml)
#> yaml configuration '/tmp/RtmpHZpsuV/file1d2c73f7f41a.yaml' successfully written.
# read parameters from yaml-file and set them for variable `"total"`
p_cnts1 <- ck_read_yaml(path = f_yaml)
tab$params_cnts_set(val = p_cnts1, v = "total")
#> --> setting perturbation parameters for variable 'total'
# create alternative perturbation parameters by specifying parameters
para2 <- ptable::create_cnt_ptable(
  D = 8, V = 3, js = 2, create = FALSE)
p_cnts2 <- ck_params_cnts(ptab = para2)
# use this ptable for the remaining variables
tab$params_cnts_set(val = p_cnts2, v = countvars)
#> --> setting perturbation parameters for variable 'cnt_females'
#> --> setting perturbation parameters for variable 'cnt_males'
#> --> setting perturbation parameters for variable 'cnt_highincome'
# perturb a variable
tab$perturb(v = "total")
#> Count variable 'total' was perturbed.
# multiple variables can be perturbed as well
tab$perturb(v = c("cnt_males", "cnt_highincome"))
#> Count variable 'cnt_males' was perturbed.
#> Count variable 'cnt_highincome' was perturbed.
# return weighted and unweighted results
tab$freqtab(v = c("total", "cnt_males"))
#> sex age vname uwc wc puwc pwc
#> <char> <char> <char> <num> <num> <num> <num>
#> 1: Total Total total 4580 275617 4580 275617.0000
#> 2: Total age_group1 total 1969 118674 1969 118674.0000
#> 3: Total ag1a total 1969 118674 1969 118674.0000
#> 4: Total age_group2 total 1143 68018 1144 68077.5083
#> 5: Total ag2a total 1143 68018 1144 68077.5083
#> 6: Total age_group3 total 864 52588 864 52588.0000
#> 7: Total age_group4 total 423 25354 424 25413.9385
#> 8: Total age_group5 total 168 10148 167 10087.5952
#> 9: Total age_group6 total 13 835 13 835.0000
#> 10: male Total total 2296 138577 2296 138577.0000
#> 11: m Total total 2296 138577 2296 138577.0000
#> 12: male age_group1 total 1015 60945 1015 60945.0000
#> 13: m age_group1 total 1015 60945 1015 60945.0000
#> 14: male ag1a total 1015 60945 1015 60945.0000
#> 15: m ag1a total 1015 60945 1015 60945.0000
#> 16: male age_group2 total 571 34245 570 34185.0263
#> 17: m age_group2 total 571 34245 570 34185.0263
#> 18: male ag2a total 571 34245 570 34185.0263
#> 19: m ag2a total 571 34245 570 34185.0263
#> 20: male age_group3 total 424 25845 425 25905.9552
#> 21: m age_group3 total 424 25845 425 25905.9552
#> 22: male age_group4 total 195 11996 196 12057.5179
#> 23: m age_group4 total 195 11996 196 12057.5179
#> 24: male age_group5 total 84 5080 84 5080.0000
#> 25: m age_group5 total 84 5080 84 5080.0000
#> 26: male age_group6 total 7 466 8 532.5714
#> 27: m age_group6 total 7 466 8 532.5714
#> 28: female Total total 2284 137040 2285 137100.0000
#> 29: f Total total 2284 137040 2285 137100.0000
#> 30: female age_group1 total 954 57729 953 57668.4874
#> 31: f age_group1 total 954 57729 953 57668.4874
#> 32: female ag1a total 954 57729 953 57668.4874
#> 33: f ag1a total 954 57729 953 57668.4874
#> 34: female age_group2 total 572 33773 572 33773.0000
#> 35: f age_group2 total 572 33773 572 33773.0000
#> 36: female ag2a total 572 33773 572 33773.0000
#> 37: f ag2a total 572 33773 572 33773.0000
#> 38: female age_group3 total 440 26743 441 26803.7795
#> 39: f age_group3 total 440 26743 441 26803.7795
#> 40: female age_group4 total 228 13358 230 13475.1754
#> 41: f age_group4 total 228 13358 230 13475.1754
#> 42: female age_group5 total 84 5068 84 5068.0000
#> 43: f age_group5 total 84 5068 84 5068.0000
#> 44: female age_group6 total 6 369 6 369.0000
#> 45: f age_group6 total 6 369 6 369.0000
#> 46: Total Total cnt_males 2296 138577 2297 138637.3558
#> 47: Total age_group1 cnt_males 1015 60945 1015 60945.0000
#> 48: Total ag1a cnt_males 1015 60945 1015 60945.0000
#> 49: Total age_group2 cnt_males 571 34245 570 34185.0263
#> 50: Total ag2a cnt_males 571 34245 570 34185.0263
#> 51: Total age_group3 cnt_males 424 25845 425 25905.9552
#> 52: Total age_group4 cnt_males 195 11996 197 12119.0359
#> 53: Total age_group5 cnt_males 84 5080 84 5080.0000
#> 54: Total age_group6 cnt_males 7 466 8 532.5714
#> 55: male Total cnt_males 2296 138577 2297 138637.3558
#> 56: m Total cnt_males 2296 138577 2297 138637.3558
#> 57: male age_group1 cnt_males 1015 60945 1015 60945.0000
#> 58: m age_group1 cnt_males 1015 60945 1015 60945.0000
#> 59: male ag1a cnt_males 1015 60945 1015 60945.0000
#> 60: m ag1a cnt_males 1015 60945 1015 60945.0000
#> 61: male age_group2 cnt_males 571 34245 570 34185.0263
#> 62: m age_group2 cnt_males 571 34245 570 34185.0263
#> 63: male ag2a cnt_males 571 34245 570 34185.0263
#> 64: m ag2a cnt_males 571 34245 570 34185.0263
#> 65: male age_group3 cnt_males 424 25845 425 25905.9552
#> 66: m age_group3 cnt_males 424 25845 425 25905.9552
#> 67: male age_group4 cnt_males 195 11996 197 12119.0359
#> 68: m age_group4 cnt_males 195 11996 197 12119.0359
#> 69: male age_group5 cnt_males 84 5080 84 5080.0000
#> 70: m age_group5 cnt_males 84 5080 84 5080.0000
#> 71: male age_group6 cnt_males 7 466 8 532.5714
#> 72: m age_group6 cnt_males 7 466 8 532.5714
#> 73: female Total cnt_males 0 0 0 0.0000
#> 74: f Total cnt_males 0 0 0 0.0000
#> 75: female age_group1 cnt_males 0 0 0 0.0000
#> 76: f age_group1 cnt_males 0 0 0 0.0000
#> 77: female ag1a cnt_males 0 0 0 0.0000
#> 78: f ag1a cnt_males 0 0 0 0.0000
#> 79: female age_group2 cnt_males 0 0 0 0.0000
#> 80: f age_group2 cnt_males 0 0 0 0.0000
#> 81: female ag2a cnt_males 0 0 0 0.0000
#> 82: f ag2a cnt_males 0 0 0 0.0000
#> 83: female age_group3 cnt_males 0 0 0 0.0000
#> 84: f age_group3 cnt_males 0 0 0 0.0000
#> 85: female age_group4 cnt_males 0 0 0 0.0000
#> 86: f age_group4 cnt_males 0 0 0 0.0000
#> 87: female age_group5 cnt_males 0 0 0 0.0000
#> 88: f age_group5 cnt_males 0 0 0 0.0000
#> 89: female age_group6 cnt_males 0 0 0 0.0000
#> 90: f age_group6 cnt_males 0 0 0 0.0000
#> sex age vname uwc wc puwc pwc
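# (illustrative addition, not part of the package example) in the output above,
# the perturbed weighted count `pwc` is consistent with scaling the perturbed
# unweighted count `puwc` by the average weight of the cell, `wc / uwc`;
# this reading is inferred from the printed values only
# values copied from row 4 (sex = "Total", age = "age_group2", vname = "total")
uwc <- 1143; wc <- 68018; puwc <- 1144
pwc_check <- puwc * wc / uwc
round(pwc_check, 4)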
# numerical variables (positive variables using flex-function)
# we also write the config to a yaml file
f_yaml <- tempfile(fileext = ".yaml")
# create a ptable using functionality from the ptable-pkg
# a single ptable for all cells
ptab1 <- ptable::pt_ex_nums(parity = TRUE, separation = FALSE)
# a single ptab for all cells except for very small ones
ptab2 <- ptable::pt_ex_nums(parity = TRUE, separation = TRUE)
# different ptables for cells with even/odd number of contributors
# and very small cells
ptab3 <- ptable::pt_ex_nums(parity = FALSE, separation = TRUE)
p_nums1 <- ck_params_nums(
  ptab = ptab1,
  type = "top_contr",
  top_k = 3,
  mult_params = ck_flexparams(
    fp = 1000,
    p = c(0.30, 0.03),
    epsilon = c(1, 0.5, 0.2),
    q = 3),
  mu_c = 2,
  same_key = FALSE,
  use_zero_rkeys = FALSE,
  path = f_yaml)
#> yaml configuration '/tmp/RtmpHZpsuV/file1d2c5b91f12f.yaml' successfully written.
# we read the parameters from the yaml-file
p_nums1 <- ck_read_yaml(path = f_yaml)
# for variables with positive and negative values
p_nums2 <- ck_params_nums(
  ptab = ptab2,
  type = "top_contr",
  top_k = 3,
  mult_params = ck_flexparams(
    fp = 1000,
    p = c(0.15, 0.02),
    epsilon = c(1, 0.4, 0.15),
    q = 3),
  mu_c = 2,
  same_key = FALSE)
# simple perturbation parameters (not using the flex-function approach)
p_nums3 <- ck_params_nums(
  ptab = ptab3,
  type = "mean",
  mult_params = ck_simpleparams(p = 0.25),
  mu_c = 2,
  same_key = FALSE)
# use `p_nums1` for the variables `savings`, `income` and `expend`
tab$params_nums_set(p_nums1, c("savings", "income", "expend"))
#> --> setting perturbation parameters for variable 'savings'
#> --> setting perturbation parameters for variable 'income'
#> --> setting perturbation parameters for variable 'expend'
# use different parameters for variable `mixed`
tab$params_nums_set(p_nums2, v = "mixed")
#> --> setting perturbation parameters for variable 'mixed'
# identify sensitive cells to which extra protection (`mu_c`) is added.
tab$supp_p(v = "income", p = 85)
#> computing contributing indices | rawdata <--> table; this might take a while
#> p%-rule: 0 new sensitive cells (incl. duplicates) found (total: 0)
tab$supp_pq(v = "income", p = 85, q = 90)
#> computing contributing indices | rawdata <--> table; this might take a while
#> pq-rule: 0 new sensitive cells (incl. duplicates) found (total: 0)
tab$supp_nk(v = "income", n = 2, k = 90)
#> computing contributing indices | rawdata <--> table; this might take a while
#> nk-rule: 0 new sensitive cells (incl. duplicates) found (total: 0)
tab$supp_freq(v = "income", n = 14, weighted = FALSE)
#> freq-rule: 5 new sensitive cells (incl. duplicates) found (total: 5)
tab$supp_val(v = "income", n = 10000, weighted = TRUE)
#> val-rule: 0 new sensitive cells (incl. duplicates) found (total: 5)
tab$supp_cells(
  v = "income",
  inp = data.frame(
    sex = c("female", "female"),
    "age" = c("age_group1", "age_group3")
  )
)
#> cell-rule: 2 new sensitive cells (incl. duplicates) found (total: 7)
# perturb variables
tab$perturb(v = c("income", "savings"))
#> Numeric variable 'income' was perturbed.
#> Numeric variable 'savings' was perturbed.
# extract results
tab$numtab("income", mean_before_sum = TRUE)
#> sex age vname uws ws pws
#> <char> <char> <char> <num> <num> <num>
#> 1: Total Total income 22952978 1378728411 1378803432
#> 2: Total age_group1 income 9810547 588645646 588641987
#> 3: Total ag1a income 9810547 588645646 588641987
#> 4: Total age_group2 income 5692119 339113890 339004611
#> 5: Total ag2a income 5692119 339113890 339004611
#> 6: Total age_group3 income 4406946 268059460 268125525
#> 7: Total age_group4 income 2133543 128126535 128233857
#> 8: Total age_group5 income 848151 50544273 50558527
#> 9: Total age_group6 income 61672 4238607 4002653
#> 10: male Total income 11262049 682659125 682642380
#> 11: m Total income 11262049 682659125 682642380
#> 12: male age_group1 income 4877164 292653191 292688150
#> 13: m age_group1 income 4877164 292653191 292688150
#> 14: male ag1a income 4877164 292653191 292688150
#> 15: m ag1a income 4877164 292653191 292688150
#> 16: male age_group2 income 2811379 169879170 169710875
#> 17: m age_group2 income 2811379 169879170 169710875
#> 18: male ag2a income 2811379 169879170 169710875
#> 19: m ag2a income 2811379 169879170 169710875
#> 20: male age_group3 income 2168169 134579789 134535983
#> 21: m age_group3 income 2168169 134579789 134535983
#> 22: male age_group4 income 978510 60132299 60243529
#> 23: m age_group4 income 978510 60132299 60243529
#> 24: male age_group5 income 393134 22911025 22891974
#> 25: m age_group5 income 393134 22911025 22891974
#> 26: male age_group6 income 33693 2503651 2611269
#> 27: m age_group6 income 33693 2503651 2611269
#> 28: female Total income 11690929 696069286 696024828
#> 29: f Total income 11690929 696069286 696024828
#> 30: female age_group1 income 4933383 295992455 295984544
#> 31: f age_group1 income 4933383 295992455 295984544
#> 32: female ag1a income 4933383 295992455 295984544
#> 33: f ag1a income 4933383 295992455 295984544
#> 34: female age_group2 income 2880740 169234720 169277054
#> 35: f age_group2 income 2880740 169234720 169277054
#> 36: female ag2a income 2880740 169234720 169277054
#> 37: f ag2a income 2880740 169234720 169277054
#> 38: female age_group3 income 2238777 133479671 133524702
#> 39: f age_group3 income 2238777 133479671 133524702
#> 40: female age_group4 income 1155033 67994236 68013035
#> 41: f age_group4 income 1155033 67994236 68013035
#> 42: female age_group5 income 455017 27633248 27436426
#> 43: f age_group5 income 455017 27633248 27436426
#> 44: female age_group6 income 27979 1734956 1884549
#> 45: f age_group6 income 27979 1734956 1884549
#> sex age vname uws ws pws
tab$numtab("income", mean_before_sum = FALSE)
#> sex age vname uws ws pws
#> <char> <char> <char> <num> <num> <num>
#> 1: Total Total income 22952978 1378728411 1378765921
#> 2: Total age_group1 income 9810547 588645646 588643817
#> 3: Total ag1a income 9810547 588645646 588643817
#> 4: Total age_group2 income 5692119 339113890 339059246
#> 5: Total ag2a income 5692119 339113890 339059246
#> 6: Total age_group3 income 4406946 268059460 268092490
#> 7: Total age_group4 income 2133543 128126535 128180185
#> 8: Total age_group5 income 848151 50544273 50551399
#> 9: Total age_group6 income 61672 4238607 4118941
#> 10: male Total income 11262049 682659125 682650752
#> 11: m Total income 11262049 682659125 682650752
#> 12: male age_group1 income 4877164 292653191 292670670
#> 13: m age_group1 income 4877164 292653191 292670670
#> 14: male ag1a income 4877164 292653191 292670670
#> 15: m ag1a income 4877164 292653191 292670670
#> 16: male age_group2 income 2811379 169879170 169795002
#> 17: m age_group2 income 2811379 169879170 169795002
#> 18: male ag2a income 2811379 169879170 169795002
#> 19: m ag2a income 2811379 169879170 169795002
#> 20: male age_group3 income 2168169 134579789 134557884
#> 21: m age_group3 income 2168169 134579789 134557884
#> 22: male age_group4 income 978510 60132299 60187888
#> 23: m age_group4 income 978510 60132299 60187888
#> 24: male age_group5 income 393134 22911025 22901497
#> 25: m age_group5 income 393134 22911025 22901497
#> 26: male age_group6 income 33693 2503651 2556894
#> 27: m age_group6 income 33693 2503651 2556894
#> 28: female Total income 11690929 696069286 696047056
#> 29: f Total income 11690929 696069286 696047056
#> 30: female age_group1 income 4933383 295992455 295988500
#> 31: f age_group1 income 4933383 295992455 295988500
#> 32: female ag1a income 4933383 295992455 295988500
#> 33: f ag1a income 4933383 295992455 295988500
#> 34: female age_group2 income 2880740 169234720 169255886
#> 35: f age_group2 income 2880740 169234720 169255886
#> 36: female ag2a income 2880740 169234720 169255886
#> 37: f ag2a income 2880740 169234720 169255886
#> 38: female age_group3 income 2238777 133479671 133502185
#> 39: f age_group3 income 2238777 133479671 133502185
#> 40: female age_group4 income 1155033 67994236 68003635
#> 41: f age_group4 income 1155033 67994236 68003635
#> 42: female age_group5 income 455017 27633248 27534661
#> 43: f age_group5 income 455017 27633248 27534661
#> 44: female age_group6 income 27979 1734956 1808206
#> 45: f age_group6 income 27979 1734956 1808206
#> sex age vname uws ws pws
tab$numtab("savings")
#> sex age vname uws ws pws
#> <char> <char> <char> <num> <num> <num>
#> 1: Total Total savings 2273532 137026795 137032535.3
#> 2: Total age_group1 savings 982386 59436797 59435344.1
#> 3: Total ag1a savings 982386 59436797 59435344.1
#> 4: Total age_group2 savings 552336 32886105 32875905.7
#> 5: Total ag2a savings 552336 32886105 32875905.7
#> 6: Total age_group3 savings 437101 26457789 26452807.1
#> 7: Total age_group4 savings 214661 13014851 13024613.8
#> 8: Total age_group5 savings 80451 4819415 4819584.3
#> 9: Total age_group6 savings 6597 411838 406425.0
#> 10: male Total savings 1159816 70055883 70056754.7
#> 11: m Total savings 1159816 70055883 70056754.7
#> 12: male age_group1 savings 517660 31197472 31200201.3
#> 13: m age_group1 savings 517660 31197472 31200201.3
#> 14: male ag1a savings 517660 31197472 31200201.3
#> 15: m ag1a savings 517660 31197472 31200201.3
#> 16: male age_group2 savings 280923 16723727 16719188.0
#> 17: m age_group2 savings 280923 16723727 16719188.0
#> 18: male ag2a savings 280923 16723727 16719188.0
#> 19: m ag2a savings 280923 16723727 16719188.0
#> 20: male age_group3 savings 214970 13109917 13108526.7
#> 21: m age_group3 savings 214970 13109917 13108526.7
#> 22: male age_group4 savings 99420 6192071 6202017.0
#> 23: m age_group4 savings 99420 6192071 6202017.0
#> 24: male age_group5 savings 43233 2619083 2618672.9
#> 25: m age_group5 savings 43233 2619083 2618672.9
#> 26: male age_group6 savings 3610 213613 213375.7
#> 27: m age_group6 savings 3610 213613 213375.7
#> 28: female Total savings 1113716 66970912 66962502.2
#> 29: f Total savings 1113716 66970912 66962502.2
#> 30: female age_group1 savings 464726 28239325 28241487.4
#> 31: f age_group1 savings 464726 28239325 28241487.4
#> 32: female ag1a savings 464726 28239325 28241487.4
#> 33: f ag1a savings 464726 28239325 28241487.4
#> 34: female age_group2 savings 271413 16162378 16165437.9
#> 35: f age_group2 savings 271413 16162378 16165437.9
#> 36: female ag2a savings 271413 16162378 16165437.9
#> 37: f ag2a savings 271413 16162378 16165437.9
#> 38: female age_group3 savings 222131 13347872 13350884.5
#> 39: f age_group3 savings 222131 13347872 13350884.5
#> 40: female age_group4 savings 115241 6822780 6824733.3
#> 41: f age_group4 savings 115241 6822780 6824733.3
#> 42: female age_group5 savings 37218 2200332 2190989.1
#> 43: f age_group5 savings 37218 2200332 2190989.1
#> 44: female age_group6 savings 2987 198225 200052.5
#> 45: f age_group6 savings 2987 198225 200052.5
#> sex age vname uws ws pws
# results can be reset, too
tab$reset_cntvars(v = "cnt_males")
# we can then set other parameters and perturb again
tab$params_cnts_set(val = p_cnts1, v = "cnt_males")
#> --> setting perturbation parameters for variable 'cnt_males'
tab$perturb(v = "cnt_males")
#> Count variable 'cnt_males' was perturbed.
# write results to a .csv file
tab$freqtab(
  v = c("total", "cnt_males"),
  path = file.path(tempdir(), "outtab.csv")
)
#> File '/tmp/RtmpHZpsuV/outtab.csv' successfully written to disk.
#> NULL
# show weighted and unweighted results
tab$freqtab(v = c("total", "cnt_males"))
#> sex age vname uwc wc puwc pwc
#> <char> <char> <char> <num> <num> <num> <num>
#> 1: Total Total total 4580 275617 4580 275617.0000
#> 2: Total age_group1 total 1969 118674 1969 118674.0000
#> 3: Total ag1a total 1969 118674 1969 118674.0000
#> 4: Total age_group2 total 1143 68018 1144 68077.5083
#> 5: Total ag2a total 1143 68018 1144 68077.5083
#> 6: Total age_group3 total 864 52588 864 52588.0000
#> 7: Total age_group4 total 423 25354 424 25413.9385
#> 8: Total age_group5 total 168 10148 167 10087.5952
#> 9: Total age_group6 total 13 835 13 835.0000
#> 10: male Total total 2296 138577 2296 138577.0000
#> 11: m Total total 2296 138577 2296 138577.0000
#> 12: male age_group1 total 1015 60945 1015 60945.0000
#> 13: m age_group1 total 1015 60945 1015 60945.0000
#> 14: male ag1a total 1015 60945 1015 60945.0000
#> 15: m ag1a total 1015 60945 1015 60945.0000
#> 16: male age_group2 total 571 34245 570 34185.0263
#> 17: m age_group2 total 571 34245 570 34185.0263
#> 18: male ag2a total 571 34245 570 34185.0263
#> 19: m ag2a total 571 34245 570 34185.0263
#> 20: male age_group3 total 424 25845 425 25905.9552
#> 21: m age_group3 total 424 25845 425 25905.9552
#> 22: male age_group4 total 195 11996 196 12057.5179
#> 23: m age_group4 total 195 11996 196 12057.5179
#> 24: male age_group5 total 84 5080 84 5080.0000
#> 25: m age_group5 total 84 5080 84 5080.0000
#> 26: male age_group6 total 7 466 8 532.5714
#> 27: m age_group6 total 7 466 8 532.5714
#> 28: female Total total 2284 137040 2285 137100.0000
#> 29: f Total total 2284 137040 2285 137100.0000
#> 30: female age_group1 total 954 57729 953 57668.4874
#> 31: f age_group1 total 954 57729 953 57668.4874
#> 32: female ag1a total 954 57729 953 57668.4874
#> 33: f ag1a total 954 57729 953 57668.4874
#> 34: female age_group2 total 572 33773 572 33773.0000
#> 35: f age_group2 total 572 33773 572 33773.0000
#> 36: female ag2a total 572 33773 572 33773.0000
#> 37: f ag2a total 572 33773 572 33773.0000
#> 38: female age_group3 total 440 26743 441 26803.7795
#> 39: f age_group3 total 440 26743 441 26803.7795
#> 40: female age_group4 total 228 13358 230 13475.1754
#> 41: f age_group4 total 228 13358 230 13475.1754
#> 42: female age_group5 total 84 5068 84 5068.0000
#> 43: f age_group5 total 84 5068 84 5068.0000
#> 44: female age_group6 total 6 369 6 369.0000
#> 45: f age_group6 total 6 369 6 369.0000
#> 46: Total Total cnt_males 2296 138577 2296 138577.0000
#> 47: Total age_group1 cnt_males 1015 60945 1015 60945.0000
#> 48: Total ag1a cnt_males 1015 60945 1015 60945.0000
#> 49: Total age_group2 cnt_males 571 34245 570 34185.0263
#> 50: Total ag2a cnt_males 571 34245 570 34185.0263
#> 51: Total age_group3 cnt_males 424 25845 425 25905.9552
#> 52: Total age_group4 cnt_males 195 11996 196 12057.5179
#> 53: Total age_group5 cnt_males 84 5080 84 5080.0000
#> 54: Total age_group6 cnt_males 7 466 8 532.5714
#> 55: male Total cnt_males 2296 138577 2296 138577.0000
#> 56: m Total cnt_males 2296 138577 2296 138577.0000
#> 57: male age_group1 cnt_males 1015 60945 1015 60945.0000
#> 58: m age_group1 cnt_males 1015 60945 1015 60945.0000
#> 59: male ag1a cnt_males 1015 60945 1015 60945.0000
#> 60: m ag1a cnt_males 1015 60945 1015 60945.0000
#> 61: male age_group2 cnt_males 571 34245 570 34185.0263
#> 62: m age_group2 cnt_males 571 34245 570 34185.0263
#> 63: male ag2a cnt_males 571 34245 570 34185.0263
#> 64: m ag2a cnt_males 571 34245 570 34185.0263
#> 65: male age_group3 cnt_males 424 25845 425 25905.9552
#> 66: m age_group3 cnt_males 424 25845 425 25905.9552
#> 67: male age_group4 cnt_males 195 11996 196 12057.5179
#> 68: m age_group4 cnt_males 195 11996 196 12057.5179
#> 69: male age_group5 cnt_males 84 5080 84 5080.0000
#> 70: m age_group5 cnt_males 84 5080 84 5080.0000
#> 71: male age_group6 cnt_males 7 466 8 532.5714
#> 72: m age_group6 cnt_males 7 466 8 532.5714
#> 73: female Total cnt_males 0 0 0 0.0000
#> 74: f Total cnt_males 0 0 0 0.0000
#> 75: female age_group1 cnt_males 0 0 0 0.0000
#> 76: f age_group1 cnt_males 0 0 0 0.0000
#> 77: female ag1a cnt_males 0 0 0 0.0000
#> 78: f ag1a cnt_males 0 0 0 0.0000
#> 79: female age_group2 cnt_males 0 0 0 0.0000
#> 80: f age_group2 cnt_males 0 0 0 0.0000
#> 81: female ag2a cnt_males 0 0 0 0.0000
#> 82: f ag2a cnt_males 0 0 0 0.0000
#> 83: female age_group3 cnt_males 0 0 0 0.0000
#> 84: f age_group3 cnt_males 0 0 0 0.0000
#> 85: female age_group4 cnt_males 0 0 0 0.0000
#> 86: f age_group4 cnt_males 0 0 0 0.0000
#> 87: female age_group5 cnt_males 0 0 0 0.0000
#> 88: f age_group5 cnt_males 0 0 0 0.0000
#> 89: female age_group6 cnt_males 0 0 0 0.0000
#> 90: f age_group6 cnt_males 0 0 0 0.0000
#> sex age vname uwc wc puwc pwc
# utility measures for a count variable
tab$measures_cnts(v = "total", exclude_zeros = TRUE)
#> $overview
#> noise cnt pct
#> <fctr> <int> <num>
#> 1: -2 2 0.04444444
#> 2: -1 13 0.28888889
#> 3: 0 21 0.46666667
#> 4: 1 9 0.20000000
#>
#> $measures
#> what d1 d2 d3
#> <char> <num> <num> <num>
#> 1: Min 0.000 0.000 0.000
#> 2: Q10 0.000 0.000 0.000
#> 3: Q20 0.000 0.000 0.000
#> 4: Q30 0.000 0.000 0.000
#> 5: Q40 0.000 0.000 0.000
#> 6: Mean 0.578 0.008 0.021
#> 7: Median 1.000 0.000 0.010
#> 8: Q60 1.000 0.001 0.016
#> 9: Q70 1.000 0.002 0.021
#> 10: Q80 1.000 0.002 0.024
#> 11: Q90 1.000 0.006 0.037
#> 12: Q95 1.000 0.009 0.066
#> 13: Q99 2.000 0.143 0.183
#> 14: Max 2.000 0.143 0.183
#>
#> $cumdistr_d1
#> cat cnt pct
#> <char> <int> <num>
#> 1: 0 21 0.4666667
#> 2: 1 43 0.9555556
#> 3: 2 45 1.0000000
#>
#> $cumdistr_d2
#> cat cnt pct
#> <char> <int> <num>
#> 1: [0,0.02] 43 0.9555556
#> 2: (0.02,0.05] 43 0.9555556
#> 3: (0.05,0.1] 43 0.9555556
#> 4: (0.1,0.2] 45 1.0000000
#> 5: (0.2,0.3] 45 1.0000000
#> 6: (0.3,0.4] 45 1.0000000
#> 7: (0.4,0.5] 45 1.0000000
#> 8: (0.5,Inf] 45 1.0000000
#>
#> $cumdistr_d3
#> cat cnt pct
#> <char> <int> <num>
#> 1: [0,0.02] 29 0.6444444
#> 2: (0.02,0.05] 41 0.9111111
#> 3: (0.05,0.1] 43 0.9555556
#> 4: (0.1,0.2] 45 1.0000000
#> 5: (0.2,0.3] 45 1.0000000
#> 6: (0.3,0.4] 45 1.0000000
#> 7: (0.4,0.5] 45 1.0000000
#> 8: (0.5,Inf] 45 1.0000000
#>
#> $false_zero
#> [1] 0
#>
#> $false_nonzero
#> [1] 0
#>
#> $exclude_zeros
#> [1] TRUE
#>
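# a minimal sketch (an illustration, not cellKey's internal code) of how the
# three distance columns above can be read; it assumes the usual definitions
#   d1 = |orig - pert|, d2 = |orig - pert| / orig, d3 = |sqrt(orig) - sqrt(pert)|
# the vectors `orig` and `pert` are hypothetical and not taken from `tab`
orig <- c(7, 120, 455, 1000)
pert <- c(8, 119, 455, 1002)
d1 <- abs(orig - pert)              # absolute distance
d2 <- d1 / orig                     # relative distance; needs orig > 0, which
                                    # is presumably why zero cells are excluded
d3 <- abs(sqrt(orig) - sqrt(pert))  # distance between square roots
round(cbind(d1, d2, d3), digits = 3)
# a count of 7 perturbed to 8 gives d1 = 1, d2 = 0.143 and d3 = 0.183, which is
# consistent with the largest d2 and d3 values reported above
# cumulative share of cells per absolute distance, similar in spirit to $cumdistr_d1
cumsum(table(d1)) / length(d1)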
# modifications for perturbed count variables
tab$mod_cnts()
#> sex age row_nr pert ckey countvar
#> <char> <char> <num> <int> <num> <char>
#> 1: Total Total 15 0 0.4894645 total
#> 2: Total age_group1 15 0 0.6373559 total
#> 3: Total ag1a 15 0 0.6373559 total
#> 4: Total age_group2 16 1 0.8031745 total
#> 5: Total ag2a 16 1 0.8031745 total
#> ---
#> 131: f age_group4 -1 0 0.0000000 cnt_males
#> 132: female age_group5 -1 0 0.0000000 cnt_males
#> 133: f age_group5 -1 0 0.0000000 cnt_males
#> 134: female age_group6 -1 0 0.0000000 cnt_males
#> 135: f age_group6 -1 0 0.0000000 cnt_males
# display a summary of the utility measures
tab$summary()
#> ┌──────────────────────────────────────────────┐
#> │Utility measures for perturbed count variables│
#> └──────────────────────────────────────────────┘
#> ── Distribution statistics of perturbations ────────────────────────────────────
#> countvar Min Q10 Q20 Q30 Q40 Mean Median Q60 Q70 Q80
#> <char> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
#> 1: total -1 -1.0 -0.2 0 0 0.178 0 0 1 1.0
#> 2: cnt_highincome -3 -1.6 -1.0 0 0 -0.044 0 0 0 1.0
#> 3: cnt_males -1 -1.0 0.0 0 0 0.067 0 0 0 0.2
#> Q90 Q95 Q99 Max
#> <num> <num> <num> <num>
#> 1: 1 1 2.00 2
#> 2: 2 2 2.56 3
#> 3: 1 1 1.00 1
#>
#> ── Distance-based measures ─────────────────────────────────────────────────────
#> ✔ Variable: 'total'
#>
#> what d1 d2 d3
#> <char> <num> <num> <num>
#> 1: Min 0.000 0.000 0.000
#> 2: Q10 0.000 0.000 0.000
#> 3: Q20 0.000 0.000 0.000
#> 4: Q30 0.000 0.000 0.000
#> 5: Q40 0.000 0.000 0.000
#> 6: Mean 0.578 0.008 0.021
#> 7: Median 1.000 0.000 0.010
#> 8: Q60 1.000 0.001 0.016
#> 9: Q70 1.000 0.002 0.021
#> 10: Q80 1.000 0.002 0.024
#> 11: Q90 1.000 0.006 0.037
#> 12: Q95 1.000 0.009 0.066
#> 13: Q99 2.000 0.143 0.183
#> 14: Max 2.000 0.143 0.183
#>
#> ✔ Variable: 'cnt_males'
#>
#> what d1 d2 d3
#> <char> <num> <num> <num>
#> 1: Min 0.000 0.000 0.000
#> 2: Q10 0.000 0.000 0.000
#> 3: Q20 0.000 0.000 0.000
#> 4: Q30 0.000 0.000 0.000
#> 5: Q40 0.000 0.000 0.000
#> 6: Mean 0.556 0.017 0.032
#> 7: Median 1.000 0.002 0.021
#> 8: Q60 1.000 0.002 0.021
#> 9: Q70 1.000 0.002 0.024
#> 10: Q80 1.000 0.005 0.033
#> 11: Q90 1.000 0.060 0.095
#> 12: Q95 1.000 0.143 0.183
#> 13: Q99 1.000 0.143 0.183
#> 14: Max 1.000 0.143 0.183
#>
#> ✔ Variable: 'cnt_highincome'
#>
#> what d1 d2 d3
#> <char> <num> <num> <num>
#> 1: Min 0.0 0.000 0.000
#> 2: Q10 0.0 0.000 0.000
#> 3: Q20 0.0 0.000 0.000
#> 4: Q30 0.0 0.000 0.000
#> 5: Q40 1.0 0.005 0.034
#> 6: Mean 1.0 0.038 0.084
#> 7: Median 1.0 0.009 0.053
#> 8: Q60 1.0 0.011 0.066
#> 9: Q70 1.0 0.024 0.131
#> 10: Q80 2.0 0.052 0.154
#> 11: Q90 2.1 0.143 0.190
#> 12: Q95 3.0 0.150 0.266
#> 13: Q99 3.0 0.286 0.410
#> 14: Max 3.0 0.286 0.410
#>
#> ┌──────────────────────────────────────────────────┐
#> │Utility measures for perturbed numerical variables│
#> └──────────────────────────────────────────────────┘
#> ── Distribution statistics of perturbations ────────────────────────────────────
#> Warning: no non-missing arguments to min; returning Inf
#> Warning: no non-missing arguments to max; returning -Inf
#> Warning: no non-missing arguments to min; returning Inf
#> Warning: no non-missing arguments to max; returning -Inf
#> vname Min Q10 Q20 Q30 Q40 Mean
#> <char> <num> <num> <num> <num> <num> <num>
#> 1: expend Inf NA NA NA NA NaN
#> 2: income -119666.44 -84168.15 -28712.405 -9527.670 -3955.398 -4277.001
#> 3: savings -10199.26 -8409.78 -4538.963 -1440.378 -306.416 -194.113
#> 4: mixed Inf NA NA NA NA NaN
#> Median Q60 Q70 Q80 Q90 Q95 Q99
#> <num> <num> <num> <num> <num> <num> <num>
#> 1: NA NA NA NA NA NA NA
#> 2: -1829.318 17479.098 21165.646 24616.965 53486.978 55589.280 73250.364
#> 3: 871.673 2036.928 2615.948 3012.478 3059.882 8958.283 9945.989
#> 4: NA NA NA NA NA NA NA
#> Max
#> <num>
#> 1: -Inf
#> 2: 73250.364
#> 3: 9945.989
#> 4: -Inf
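# a hedged note on the warnings and the Inf / -Inf / NA / NaN rows above: this is
# what base R returns for summary statistics of an empty vector, presumably
# because no perturbed values exist for 'expend' and 'mixed' at this point
# (an assumption; this sketch does not inspect cellKey's internals)
empty <- numeric(0)
suppressWarnings(min(empty))   # Inf; warns unless suppressed
suppressWarnings(max(empty))   # -Inf; warns unless suppressed
mean(empty)                    # NaN
quantile(empty, probs = 0.1)   # NA
# `safe_min()` is a hypothetical helper that returns NA instead of Inf
safe_min <- function(x) if (length(x) == 0L) NA_real_ else min(x)
safe_min(empty)
safe_min(c(3, 1, 2))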
# }