R/ck_params_nums.R
ck_flexparams.Rd
ck_flexparams()
Defines a flex function that is used to look up the perturbation
magnitudes (percentages) applied when perturbing continuous variables.
ck_flexparams(fp, p = c(0.25, 0.05), epsilon = 1, q = 3)
fp
(numeric scalar); the point at which the noise coefficient function should reach its desired maximum (defined by the first element of p)
p
a numeric vector of length 2 in which both elements specify a percentage. The first value is the desired maximum perturbation percentage for small cells (depending on fp), while the second value is the desired maximum perturbation percentage for large cells. Both values must be between 0 and 1 and must be given in descending order.
epsilon
a numeric vector in descending order with all values >= 0 and <= 1 and with the first element forced to equal 1. The length of this vector must equal the value of top_k specified in ck_params_nums() when creating parameters for type == "top_contr"; this is checked at runtime. This setting allows different flex functions to be used for the largest top_k contributors.
q
(numeric scalar); a parameter of the flex function; q must be >= 1
an object suitable as input for ck_params_nums().
Details about the flex function can be found in Deliverable D4.2, Part I, in SGA "Open Source tools for perturbative confidentiality methods".
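The constraints listed for the arguments can be restated as a small base-R check. This is only a sketch: `check_flex_args()` is a hypothetical helper and not part of the package, and it assumes that "descending order" permits ties. The validation that ck_flexparams() and ck_params_nums() actually perform may differ.

```r
# Hypothetical helper (not part of cellKey): restates the documented
# constraints on fp, p, epsilon and q as plain base-R assertions.
check_flex_args <- function(fp, p, epsilon, q, top_k = length(epsilon)) {
  stopifnot(
    is.numeric(fp), length(fp) == 1,
    is.numeric(p), length(p) == 2,
    all(p >= 0 & p <= 1),
    p[1] >= p[2],                      # descending (ties allowed: assumption)
    is.numeric(epsilon),
    all(epsilon >= 0 & epsilon <= 1),
    epsilon[1] == 1,                   # first element must equal 1
    !is.unsorted(rev(epsilon)),        # descending (ties allowed: assumption)
    length(epsilon) == top_k,          # one value per top contributor
    is.numeric(q), length(q) == 1, q >= 1
  )
  invisible(TRUE)
}

# Values taken from the examples on this page: passes silently.
ok <- check_flex_args(
  fp = 1000, p = c(0.30, 0.03), epsilon = c(1, 0.5, 0.2), q = 3, top_k = 3)

# p given in ascending order violates the documented constraint.
bad <- tryCatch(
  check_flex_args(fp = 1000, p = c(0.03, 0.30), epsilon = 1, q = 3),
  error = function(e) FALSE)
```

Running such a check before building the parameter object is one way to surface a mismatch between the length of epsilon and top_k early instead of at perturbation time.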
# \donttest{
x <- ck_create_testdata()
# create some 0/1 variables that should be perturbed later
x[, cnt_females := ifelse(sex == "male", 0, 1)]
#> urbrur roof walls water electcon relat sex age hhcivil expend
#> <int> <int> <int> <int> <int> <int> <fctr> <fctr> <int> <num>
#> 1: 2 4 3 3 1 1 male age_group3 2 9093
#> 2: 2 4 3 3 1 2 female age_group3 2 2734
#> 3: 2 4 3 3 1 3 male age_group1 1 2652
#> 4: 2 4 3 3 1 3 male age_group1 1 1807
#> 5: 2 4 2 3 1 1 male age_group4 2 671
#> ---
#> 4576: 2 4 3 4 1 2 female age_group3 2 3696
#> 4577: 2 4 3 4 1 3 male age_group1 1 282
#> 4578: 2 4 3 4 1 3 male age_group1 1 840
#> 4579: 2 4 3 4 1 3 female age_group1 1 6258
#> 4580: 2 4 3 4 1 3 male age_group1 1 7019
#> income savings ori_hid sampling_weight household_weights cnt_females
#> <num> <num> <int> <int> <num> <num>
#> 1: 5780 12 1 46 25.00000 0
#> 2: 2530 28 1 82 25.00000 1
#> 3: 6920 550 1 38 25.00000 0
#> 4: 7960 870 1 37 25.00000 0
#> 5: 9030 20 2 36 16.66667 0
#> ---
#> 4576: 7900 278 1000 65 16.66667 1
#> 4577: 1420 987 1000 62 16.66667 0
#> 4578: 8900 684 1000 31 16.66667 0
#> 4579: 3880 294 1000 34 16.66667 1
#> 4580: 4830 911 1000 73 16.66667 0
x[, cnt_males := ifelse(sex == "male", 1, 0)]
#> urbrur roof walls water electcon relat sex age hhcivil expend
#> <int> <int> <int> <int> <int> <int> <fctr> <fctr> <int> <num>
#> 1: 2 4 3 3 1 1 male age_group3 2 9093
#> 2: 2 4 3 3 1 2 female age_group3 2 2734
#> 3: 2 4 3 3 1 3 male age_group1 1 2652
#> 4: 2 4 3 3 1 3 male age_group1 1 1807
#> 5: 2 4 2 3 1 1 male age_group4 2 671
#> ---
#> 4576: 2 4 3 4 1 2 female age_group3 2 3696
#> 4577: 2 4 3 4 1 3 male age_group1 1 282
#> 4578: 2 4 3 4 1 3 male age_group1 1 840
#> 4579: 2 4 3 4 1 3 female age_group1 1 6258
#> 4580: 2 4 3 4 1 3 male age_group1 1 7019
#> income savings ori_hid sampling_weight household_weights cnt_females
#> <num> <num> <int> <int> <num> <num>
#> 1: 5780 12 1 46 25.00000 0
#> 2: 2530 28 1 82 25.00000 1
#> 3: 6920 550 1 38 25.00000 0
#> 4: 7960 870 1 37 25.00000 0
#> 5: 9030 20 2 36 16.66667 0
#> ---
#> 4576: 7900 278 1000 65 16.66667 1
#> 4577: 1420 987 1000 62 16.66667 0
#> 4578: 8900 684 1000 31 16.66667 0
#> 4579: 3880 294 1000 34 16.66667 1
#> 4580: 4830 911 1000 73 16.66667 0
#> cnt_males
#> <num>
#> 1: 1
#> 2: 0
#> 3: 1
#> 4: 1
#> 5: 1
#> ---
#> 4576: 0
#> 4577: 1
#> 4578: 1
#> 4579: 0
#> 4580: 1
x[, cnt_highincome := ifelse(income >= 9000, 1, 0)]
#> urbrur roof walls water electcon relat sex age hhcivil expend
#> <int> <int> <int> <int> <int> <int> <fctr> <fctr> <int> <num>
#> 1: 2 4 3 3 1 1 male age_group3 2 9093
#> 2: 2 4 3 3 1 2 female age_group3 2 2734
#> 3: 2 4 3 3 1 3 male age_group1 1 2652
#> 4: 2 4 3 3 1 3 male age_group1 1 1807
#> 5: 2 4 2 3 1 1 male age_group4 2 671
#> ---
#> 4576: 2 4 3 4 1 2 female age_group3 2 3696
#> 4577: 2 4 3 4 1 3 male age_group1 1 282
#> 4578: 2 4 3 4 1 3 male age_group1 1 840
#> 4579: 2 4 3 4 1 3 female age_group1 1 6258
#> 4580: 2 4 3 4 1 3 male age_group1 1 7019
#> income savings ori_hid sampling_weight household_weights cnt_females
#> <num> <num> <int> <int> <num> <num>
#> 1: 5780 12 1 46 25.00000 0
#> 2: 2530 28 1 82 25.00000 1
#> 3: 6920 550 1 38 25.00000 0
#> 4: 7960 870 1 37 25.00000 0
#> 5: 9030 20 2 36 16.66667 0
#> ---
#> 4576: 7900 278 1000 65 16.66667 1
#> 4577: 1420 987 1000 62 16.66667 0
#> 4578: 8900 684 1000 31 16.66667 0
#> 4579: 3880 294 1000 34 16.66667 1
#> 4580: 4830 911 1000 73 16.66667 0
#> cnt_males cnt_highincome
#> <num> <num>
#> 1: 1 0
#> 2: 0 0
#> 3: 1 0
#> 4: 1 0
#> 5: 1 1
#> ---
#> 4576: 0 0
#> 4577: 1 0
#> 4578: 1 0
#> 4579: 0 0
#> 4580: 1 0
# a variable with positive and negative contributions
x[, mixed := sample(-10:10, nrow(x), replace = TRUE)]
#> urbrur roof walls water electcon relat sex age hhcivil expend
#> <int> <int> <int> <int> <int> <int> <fctr> <fctr> <int> <num>
#> 1: 2 4 3 3 1 1 male age_group3 2 9093
#> 2: 2 4 3 3 1 2 female age_group3 2 2734
#> 3: 2 4 3 3 1 3 male age_group1 1 2652
#> 4: 2 4 3 3 1 3 male age_group1 1 1807
#> 5: 2 4 2 3 1 1 male age_group4 2 671
#> ---
#> 4576: 2 4 3 4 1 2 female age_group3 2 3696
#> 4577: 2 4 3 4 1 3 male age_group1 1 282
#> 4578: 2 4 3 4 1 3 male age_group1 1 840
#> 4579: 2 4 3 4 1 3 female age_group1 1 6258
#> 4580: 2 4 3 4 1 3 male age_group1 1 7019
#> income savings ori_hid sampling_weight household_weights cnt_females
#> <num> <num> <int> <int> <num> <num>
#> 1: 5780 12 1 46 25.00000 0
#> 2: 2530 28 1 82 25.00000 1
#> 3: 6920 550 1 38 25.00000 0
#> 4: 7960 870 1 37 25.00000 0
#> 5: 9030 20 2 36 16.66667 0
#> ---
#> 4576: 7900 278 1000 65 16.66667 1
#> 4577: 1420 987 1000 62 16.66667 0
#> 4578: 8900 684 1000 31 16.66667 0
#> 4579: 3880 294 1000 34 16.66667 1
#> 4580: 4830 911 1000 73 16.66667 0
#> cnt_males cnt_highincome mixed
#> <num> <num> <int>
#> 1: 1 0 -7
#> 2: 0 0 -9
#> 3: 1 0 2
#> 4: 1 0 -7
#> 5: 1 1 -3
#> ---
#> 4576: 0 0 -6
#> 4577: 1 0 -10
#> 4578: 1 0 9
#> 4579: 0 0 1
#> 4580: 1 0 0
# create record keys
x$rkey <- ck_generate_rkeys(dat = x)
# define required inputs
# hierarchy with some bogus codes
d_sex <- hier_create(root = "Total", nodes = c("male", "female"))
d_sex <- hier_add(d_sex, root = "female", "f")
d_sex <- hier_add(d_sex, root = "male", "m")
d_age <- hier_create(root = "Total", nodes = paste0("age_group", 1:6))
d_age <- hier_add(d_age, root = "age_group1", "ag1a")
d_age <- hier_add(d_age, root = "age_group2", "ag2a")
# define the cell key object
countvars <- c("cnt_females", "cnt_males", "cnt_highincome")
numvars <- c("expend", "income", "savings", "mixed")
tab <- ck_setup(
x = x,
rkey = "rkey",
dims = list(sex = d_sex, age = d_age),
w = "sampling_weight",
countvars = countvars,
numvars = numvars)
#> computing contributing indices | rawdata <--> table; this might take a while
# show some information about this table instance
tab$print() # identical to print(tab)
#> ── Table Information ───────────────────────────────────────────────────────────
#> ✔ 45 cells in 2 dimensions ('sex', 'age')
#> ✔ weights: yes
#> ── Tabulated / Perturbed countvars ─────────────────────────────────────────────
#> ☐ 'total'
#> ☐ 'cnt_females'
#> ☐ 'cnt_males'
#> ☐ 'cnt_highincome'
#> ── Tabulated / Perturbed numvars ───────────────────────────────────────────────
#> ☐ 'expend'
#> ☐ 'income'
#> ☐ 'savings'
#> ☐ 'mixed'
# information about the hierarchies
tab$hierarchy_info()
#> $sex
#> code level is_leaf parent
#> <char> <int> <lgcl> <char>
#> 1: Total 1 FALSE Total
#> 2: male 2 FALSE Total
#> 3: m 3 TRUE male
#> 4: female 2 FALSE Total
#> 5: f 3 TRUE female
#>
#> $age
#> code level is_leaf parent
#> <char> <int> <lgcl> <char>
#> 1: Total 1 FALSE Total
#> 2: age_group1 2 FALSE Total
#> 3: ag1a 3 TRUE age_group1
#> 4: age_group2 2 FALSE Total
#> 5: ag2a 3 TRUE age_group2
#> 6: age_group3 2 TRUE Total
#> 7: age_group4 2 TRUE Total
#> 8: age_group5 2 TRUE Total
#> 9: age_group6 2 TRUE Total
#>
# which variables have been defined?
tab$allvars()
#> $cntvars
#> [1] "total" "cnt_females" "cnt_males" "cnt_highincome"
#>
#> $numvars
#> [1] "expend" "income" "savings" "mixed"
#>
# count variables
tab$cntvars()
#> [1] "total" "cnt_females" "cnt_males" "cnt_highincome"
# continuous variables
tab$numvars()
#> [1] "expend" "income" "savings" "mixed"
# create perturbation parameters for "total" variable and
# write to yaml-file
# create a ptable using functionality from the ptable-pkg
f_yaml <- tempfile(fileext = ".yaml")
p_cnts1 <- ck_params_cnts(
ptab = ptable::pt_ex_cnts(),
path = f_yaml)
#> yaml configuration '/tmp/Rtmpu511Tl/file1fa54433c4bb.yaml' successfully written.
# read parameters from yaml-file and set them for variable `"total"`
p_cnts1 <- ck_read_yaml(path = f_yaml)
tab$params_cnts_set(val = p_cnts1, v = "total")
#> --> setting perturbation parameters for variable 'total'
# create alternative perturbation parameters by specifying parameters
para2 <- ptable::create_cnt_ptable(
D = 8, V = 3, js = 2, create = FALSE)
p_cnts2 <- ck_params_cnts(ptab = para2)
# use this ptable for the remaining variables
tab$params_cnts_set(val = p_cnts2, v = countvars)
#> --> setting perturbation parameters for variable 'cnt_females'
#> --> setting perturbation parameters for variable 'cnt_males'
#> --> setting perturbation parameters for variable 'cnt_highincome'
# perturb a variable
tab$perturb(v = "total")
#> Count variable 'total' was perturbed.
# multiple variables can be perturbed as well
tab$perturb(v = c("cnt_males", "cnt_highincome"))
#> Count variable 'cnt_males' was perturbed.
#> Count variable 'cnt_highincome' was perturbed.
# return weighted and unweighted results
tab$freqtab(v = c("total", "cnt_males"))
#> sex age vname uwc wc puwc pwc
#> <char> <char> <char> <num> <num> <num> <num>
#> 1: Total Total total 4580 272686 4581 272745.5384
#> 2: Total age_group1 total 1969 117288 1970 117347.5673
#> 3: Total ag1a total 1969 117288 1970 117347.5673
#> 4: Total age_group2 total 1143 68941 1142 68880.6842
#> 5: Total ag2a total 1143 68941 1142 68880.6842
#> 6: Total age_group3 total 864 51132 863 51072.8194
#> 7: Total age_group4 total 423 24549 421 24432.9291
#> 8: Total age_group5 total 168 10075 168 10075.0000
#> 9: Total age_group6 total 13 701 12 647.0769
#> 10: male Total total 2296 137868 2295 137807.9530
#> 11: m Total total 2296 137868 2295 137807.9530
#> 12: male age_group1 total 1015 60570 1015 60570.0000
#> 13: m age_group1 total 1015 60570 1015 60570.0000
#> 14: male ag1a total 1015 60570 1015 60570.0000
#> 15: m ag1a total 1015 60570 1015 60570.0000
#> 16: male age_group2 total 571 34704 571 34704.0000
#> 17: m age_group2 total 571 34704 571 34704.0000
#> 18: male ag2a total 571 34704 571 34704.0000
#> 19: m ag2a total 571 34704 571 34704.0000
#> 20: male age_group3 total 424 25995 424 25995.0000
#> 21: m age_group3 total 424 25995 424 25995.0000
#> 22: male age_group4 total 195 11353 197 11469.4410
#> 23: m age_group4 total 195 11353 197 11469.4410
#> 24: male age_group5 total 84 4788 82 4674.0000
#> 25: m age_group5 total 84 4788 82 4674.0000
#> 26: male age_group6 total 7 458 6 392.5714
#> 27: m age_group6 total 7 458 6 392.5714
#> 28: female Total total 2284 134818 2284 134818.0000
#> 29: f Total total 2284 134818 2284 134818.0000
#> 30: female age_group1 total 954 56718 953 56658.5472
#> 31: f age_group1 total 954 56718 953 56658.5472
#> 32: female ag1a total 954 56718 953 56658.5472
#> 33: f ag1a total 954 56718 953 56658.5472
#> 34: female age_group2 total 572 34237 573 34296.8549
#> 35: f age_group2 total 572 34237 573 34296.8549
#> 36: female ag2a total 572 34237 573 34296.8549
#> 37: f ag2a total 572 34237 573 34296.8549
#> 38: female age_group3 total 440 25137 441 25194.1295
#> 39: f age_group3 total 440 25137 441 25194.1295
#> 40: female age_group4 total 228 13196 227 13138.1228
#> 41: f age_group4 total 228 13196 227 13138.1228
#> 42: female age_group5 total 84 5287 84 5287.0000
#> 43: f age_group5 total 84 5287 84 5287.0000
#> 44: female age_group6 total 6 243 8 324.0000
#> 45: f age_group6 total 6 243 8 324.0000
#> 46: Total Total cnt_males 2296 137868 2295 137807.9530
#> 47: Total age_group1 cnt_males 1015 60570 1015 60570.0000
#> 48: Total ag1a cnt_males 1015 60570 1015 60570.0000
#> 49: Total age_group2 cnt_males 571 34704 570 34643.2224
#> 50: Total ag2a cnt_males 571 34704 570 34643.2224
#> 51: Total age_group3 cnt_males 424 25995 423 25933.6910
#> 52: Total age_group4 cnt_males 195 11353 198 11527.6615
#> 53: Total age_group5 cnt_males 84 4788 81 4617.0000
#> 54: Total age_group6 cnt_males 7 458 5 327.1429
#> 55: male Total cnt_males 2296 137868 2295 137807.9530
#> 56: m Total cnt_males 2296 137868 2295 137807.9530
#> 57: male age_group1 cnt_males 1015 60570 1015 60570.0000
#> 58: m age_group1 cnt_males 1015 60570 1015 60570.0000
#> 59: male ag1a cnt_males 1015 60570 1015 60570.0000
#> 60: m ag1a cnt_males 1015 60570 1015 60570.0000
#> 61: male age_group2 cnt_males 571 34704 570 34643.2224
#> 62: m age_group2 cnt_males 571 34704 570 34643.2224
#> 63: male ag2a cnt_males 571 34704 570 34643.2224
#> 64: m ag2a cnt_males 571 34704 570 34643.2224
#> 65: male age_group3 cnt_males 424 25995 423 25933.6910
#> 66: m age_group3 cnt_males 424 25995 423 25933.6910
#> 67: male age_group4 cnt_males 195 11353 198 11527.6615
#> 68: m age_group4 cnt_males 195 11353 198 11527.6615
#> 69: male age_group5 cnt_males 84 4788 81 4617.0000
#> 70: m age_group5 cnt_males 84 4788 81 4617.0000
#> 71: male age_group6 cnt_males 7 458 5 327.1429
#> 72: m age_group6 cnt_males 7 458 5 327.1429
#> 73: female Total cnt_males 0 0 0 0.0000
#> 74: f Total cnt_males 0 0 0 0.0000
#> 75: female age_group1 cnt_males 0 0 0 0.0000
#> 76: f age_group1 cnt_males 0 0 0 0.0000
#> 77: female ag1a cnt_males 0 0 0 0.0000
#> 78: f ag1a cnt_males 0 0 0 0.0000
#> 79: female age_group2 cnt_males 0 0 0 0.0000
#> 80: f age_group2 cnt_males 0 0 0 0.0000
#> 81: female ag2a cnt_males 0 0 0 0.0000
#> 82: f ag2a cnt_males 0 0 0 0.0000
#> 83: female age_group3 cnt_males 0 0 0 0.0000
#> 84: f age_group3 cnt_males 0 0 0 0.0000
#> 85: female age_group4 cnt_males 0 0 0 0.0000
#> 86: f age_group4 cnt_males 0 0 0 0.0000
#> 87: female age_group5 cnt_males 0 0 0 0.0000
#> 88: f age_group5 cnt_males 0 0 0 0.0000
#> 89: female age_group6 cnt_males 0 0 0 0.0000
#> 90: f age_group6 cnt_males 0 0 0 0.0000
#> sex age vname uwc wc puwc pwc
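# side note (plain base R, values copied from three rows of the output above):
# in these rows `pwc` equals `puwc` scaled by the average cell weight
# `wc / uwc`; this is an observation about the printed numbers and only an
# assumption about how the package computes weighted perturbed counts
chk <- data.frame(
  uwc = c(4580, 13, 7),
  wc = c(272686, 701, 458),
  puwc = c(4581, 12, 6),
  pwc = c(272745.5384, 647.0769, 392.5714))
chk$pwc_check <- chk$wc / chk$uwc * chk$puwc
stopifnot(isTRUE(all.equal(chk$pwc_check, chk$pwc, tolerance = 1e-6)))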
# numerical variables (positive variables using flex-function)
# we also write the config to a yaml file
f_yaml <- tempfile(fileext = ".yaml")
# create a ptable using functionality from the ptable-pkg
# a single ptable for all cells
ptab1 <- ptable::pt_ex_nums(parity = TRUE, separation = FALSE)
# a single ptab for all cells except for very small ones
ptab2 <- ptable::pt_ex_nums(parity = TRUE, separation = TRUE)
# different ptables for cells with even/odd number of contributors
# and very small cells
ptab3 <- ptable::pt_ex_nums(parity = FALSE, separation = TRUE)
p_nums1 <- ck_params_nums(
ptab = ptab1,
type = "top_contr",
top_k = 3,
mult_params = ck_flexparams(
fp = 1000,
p = c(0.30, 0.03),
epsilon = c(1, 0.5, 0.2),
q = 3),
mu_c = 2,
same_key = FALSE,
use_zero_rkeys = FALSE,
path = f_yaml)
#> yaml configuration '/tmp/Rtmpu511Tl/file1fa56c97848d.yaml' successfully written.
# we read the parameters from the yaml-file
p_nums1 <- ck_read_yaml(path = f_yaml)
# for variables with positive and negative values
p_nums2 <- ck_params_nums(
ptab = ptab2,
type = "top_contr",
top_k = 3,
mult_params = ck_flexparams(
fp = 1000,
p = c(0.15, 0.02),
epsilon = c(1, 0.4, 0.15),
q = 3),
mu_c = 2,
same_key = FALSE)
# simple perturbation parameters (not using the flex-function approach)
p_nums3 <- ck_params_nums(
ptab = ptab3,
type = "mean",
mult_params = ck_simpleparams(p = 0.25),
mu_c = 2,
same_key = FALSE)
# use `p_nums1` for variables `savings`, `income` and `expend`
tab$params_nums_set(p_nums1, c("savings", "income", "expend"))
#> --> setting perturbation parameters for variable 'savings'
#> --> setting perturbation parameters for variable 'income'
#> --> setting perturbation parameters for variable 'expend'
# use different parameters for variable `mixed`
tab$params_nums_set(p_nums2, v = "mixed")
#> --> setting perturbation parameters for variable 'mixed'
# identify sensitive cells to which extra protection (`mu_c`) is added.
tab$supp_p(v = "income", p = 85)
#> computing contributing indices | rawdata <--> table; this might take a while
#> p%-rule: 0 new sensitive cells (incl. duplicates) found (total: 0)
tab$supp_pq(v = "income", p = 85, q = 90)
#> computing contributing indices | rawdata <--> table; this might take a while
#> pq-rule: 0 new sensitive cells (incl. duplicates) found (total: 0)
tab$supp_nk(v = "income", n = 2, k = 90)
#> computing contributing indices | rawdata <--> table; this might take a while
#> nk-rule: 0 new sensitive cells (incl. duplicates) found (total: 0)
tab$supp_freq(v = "income", n = 14, weighted = FALSE)
#> freq-rule: 5 new sensitive cells (incl. duplicates) found (total: 5)
tab$supp_val(v = "income", n = 10000, weighted = TRUE)
#> val-rule: 0 new sensitive cells (incl. duplicates) found (total: 5)
tab$supp_cells(
v = "income",
inp = data.frame(
sex = c("female", "female"),
"age" = c("age_group1", "age_group3")
)
)
#> cell-rule: 2 new sensitive cells (incl. duplicates) found (total: 7)
# perturb variables
tab$perturb(v = c("income", "savings"))
#> Numeric variable 'income' was perturbed.
#> Numeric variable 'savings' was perturbed.
# extract results
tab$numtab("income", mean_before_sum = TRUE)
#> sex age vname uws ws pws
#> <char> <char> <char> <num> <num> <num>
#> 1: Total Total income 22952978 1367927750 1367774737
#> 2: Total age_group1 income 9810547 586291459 586497067
#> 3: Total ag1a income 9810547 586291459 586497067
#> 4: Total age_group2 income 5692119 341887456 342013470
#> 5: Total ag2a income 5692119 341887456 342013470
#> 6: Total age_group3 income 4406946 260895581 261028817
#> 7: Total age_group4 income 2133543 123691715 123775978
#> 8: Total age_group5 income 848151 51976816 51972573
#> 9: Total age_group6 income 61672 3184723 3026105
#> 10: male Total income 11262049 677390855 677499342
#> 11: m Total income 11262049 677390855 677499342
#> 12: male age_group1 income 4877164 290785366 290749848
#> 13: m age_group1 income 4877164 290785366 290749848
#> 14: male ag1a income 4877164 290785366 290749848
#> 15: m ag1a income 4877164 290785366 290749848
#> 16: male age_group2 income 2811379 171754899 171787838
#> 17: m age_group2 income 2811379 171754899 171787838
#> 18: male ag2a income 2811379 171754899 171787838
#> 19: m ag2a income 2811379 171754899 171787838
#> 20: male age_group3 income 2168169 132170466 132182213
#> 21: m age_group3 income 2168169 132170466 132182213
#> 22: male age_group4 income 978510 57876940 57915815
#> 23: m age_group4 income 978510 57876940 57915815
#> 24: male age_group5 income 393134 22740934 22753298
#> 25: m age_group5 income 393134 22740934 22753298
#> 26: male age_group6 income 33693 2062250 2169036
#> 27: m age_group6 income 33693 2062250 2169036
#> 28: female Total income 11690929 690536895 690529591
#> 29: f Total income 11690929 690536895 690529591
#> 30: female age_group1 income 4933383 295506093 295561848
#> 31: f age_group1 income 4933383 295506093 295561848
#> 32: female ag1a income 4933383 295506093 295561848
#> 33: f ag1a income 4933383 295506093 295561848
#> 34: female age_group2 income 2880740 170132557 170012752
#> 35: f age_group2 income 2880740 170132557 170012752
#> 36: female ag2a income 2880740 170132557 170012752
#> 37: f ag2a income 2880740 170132557 170012752
#> 38: female age_group3 income 2238777 128725115 128680643
#> 39: f age_group3 income 2238777 128725115 128680643
#> 40: female age_group4 income 1155033 65814775 65778356
#> 41: f age_group4 income 1155033 65814775 65778356
#> 42: female age_group5 income 455017 29235882 29125162
#> 43: f age_group5 income 455017 29235882 29125162
#> 44: female age_group6 income 27979 1122473 1135584
#> 45: f age_group6 income 27979 1122473 1135584
#> sex age vname uws ws pws
tab$numtab("income", mean_before_sum = FALSE)
#> sex age vname uws ws pws
#> <char> <char> <char> <num> <num> <num>
#> 1: Total Total income 22952978 1367927750 1367851241
#> 2: Total age_group1 income 9810547 586291459 586394254
#> 3: Total ag1a income 9810547 586291459 586394254
#> 4: Total age_group2 income 5692119 341887456 341950457
#> 5: Total ag2a income 5692119 341887456 341950457
#> 6: Total age_group3 income 4406946 260895581 260962191
#> 7: Total age_group4 income 2133543 123691715 123733840
#> 8: Total age_group5 income 848151 51976816 51974694
#> 9: Total age_group6 income 61672 3184723 3104401
#> 10: male Total income 11262049 677390855 677445096
#> 11: m Total income 11262049 677390855 677445096
#> 12: male age_group1 income 4877164 290785366 290767607
#> 13: m age_group1 income 4877164 290785366 290767607
#> 14: male ag1a income 4877164 290785366 290767607
#> 15: m ag1a income 4877164 290785366 290767607
#> 16: male age_group2 income 2811379 171754899 171771367
#> 17: m age_group2 income 2811379 171754899 171771367
#> 18: male ag2a income 2811379 171754899 171771367
#> 19: m ag2a income 2811379 171754899 171771367
#> 20: male age_group3 income 2168169 132170466 132176340
#> 21: m age_group3 income 2168169 132170466 132176340
#> 22: male age_group4 income 978510 57876940 57896374
#> 23: m age_group4 income 978510 57876940 57896374
#> 24: male age_group5 income 393134 22740934 22747115
#> 25: m age_group5 income 393134 22740934 22747115
#> 26: male age_group6 income 33693 2062250 2114969
#> 27: m age_group6 income 33693 2062250 2114969
#> 28: female Total income 11690929 690536895 690533243
#> 29: f Total income 11690929 690536895 690533243
#> 30: female age_group1 income 4933383 295506093 295533969
#> 31: f age_group1 income 4933383 295506093 295533969
#> 32: female ag1a income 4933383 295506093 295533969
#> 33: f ag1a income 4933383 295506093 295533969
#> 34: female age_group2 income 2880740 170132557 170072644
#> 35: f age_group2 income 2880740 170132557 170072644
#> 36: female ag2a income 2880740 170132557 170072644
#> 37: f ag2a income 2880740 170132557 170072644
#> 38: female age_group3 income 2238777 128725115 128702877
#> 39: f age_group3 income 2238777 128725115 128702877
#> 40: female age_group4 income 1155033 65814775 65796563
#> 41: f age_group4 income 1155033 65814775 65796563
#> 42: female age_group5 income 455017 29235882 29180469
#> 43: f age_group5 income 455017 29235882 29180469
#> 44: female age_group6 income 27979 1122473 1129009
#> 45: f age_group6 income 27979 1122473 1129009
#> sex age vname uws ws pws
tab$numtab("savings")
#> sex age vname uws ws pws
#> <char> <char> <char> <num> <num> <num>
#> 1: Total Total savings 2273532 136102860 136097984.5
#> 2: Total age_group1 savings 982386 59040646 59031680.8
#> 3: Total ag1a savings 982386 59040646 59031680.8
#> 4: Total age_group2 savings 552336 33278908 33281863.8
#> 5: Total ag2a savings 552336 33278908 33281863.8
#> 6: Total age_group3 savings 437101 26060353 26065012.5
#> 7: Total age_group4 savings 214661 12539034 12542412.2
#> 8: Total age_group5 savings 80451 4835524 4835218.1
#> 9: Total age_group6 savings 6597 348395 344777.2
#> 10: male Total savings 1159816 70215220 70217700.1
#> 11: m Total savings 1159816 70215220 70217700.1
#> 12: male age_group1 savings 517660 31254331 31250218.6
#> 13: m age_group1 savings 517660 31254331 31250218.6
#> 14: male ag1a savings 517660 31254331 31250218.6
#> 15: m ag1a savings 517660 31254331 31250218.6
#> 16: male age_group2 savings 280923 17085858 17089070.8
#> 17: m age_group2 savings 280923 17085858 17089070.8
#> 18: male ag2a savings 280923 17085858 17089070.8
#> 19: m ag2a savings 280923 17085858 17089070.8
#> 20: male age_group3 savings 214970 13300653 13299129.9
#> 21: m age_group3 savings 214970 13300653 13299129.9
#> 22: male age_group4 savings 99420 5826952 5828408.7
#> 23: m age_group4 savings 99420 5826952 5828408.7
#> 24: male age_group5 savings 43233 2511432 2511919.0
#> 25: m age_group5 savings 43233 2511432 2511919.0
#> 26: male age_group6 savings 3610 235994 236862.2
#> 27: m age_group6 savings 3610 235994 236862.2
#> 28: female Total savings 1113716 65887640 65885982.8
#> 29: f Total savings 1113716 65887640 65885982.8
#> 30: female age_group1 savings 464726 27786315 27785528.7
#> 31: f age_group1 savings 464726 27786315 27785528.7
#> 32: female ag1a savings 464726 27786315 27785528.7
#> 33: f ag1a savings 464726 27786315 27785528.7
#> 34: female age_group2 savings 271413 16193050 16189740.6
#> 35: f age_group2 savings 271413 16193050 16189740.6
#> 36: female ag2a savings 271413 16193050 16189740.6
#> 37: f ag2a savings 271413 16193050 16189740.6
#> 38: female age_group3 savings 222131 12759700 12758283.1
#> 39: f age_group3 savings 222131 12759700 12758283.1
#> 40: female age_group4 savings 115241 6712082 6712501.1
#> 41: f age_group4 savings 115241 6712082 6712501.1
#> 42: female age_group5 savings 37218 2324092 2318978.4
#> 43: f age_group5 savings 37218 2324092 2318978.4
#> 44: female age_group6 savings 2987 112401 112641.0
#> 45: f age_group6 savings 2987 112401 112641.0
#> sex age vname uws ws pws
# results can be reset, too
tab$reset_cntvars(v = "cnt_males")
# we can then set other parameters and perturb again
tab$params_cnts_set(val = p_cnts1, v = "cnt_males")
#> --> setting perturbation parameters for variable 'cnt_males'
tab$perturb(v = "cnt_males")
#> Count variable 'cnt_males' was perturbed.
# write results to a .csv file
tab$freqtab(
v = c("total", "cnt_males"),
path = file.path(tempdir(), "outtab.csv")
)
#> File '/tmp/Rtmpu511Tl/outtab.csv' successfully written to disk.
#> NULL
# show results containing weighted and unweighted results
tab$freqtab(v = c("total", "cnt_males"))
#> sex age vname uwc wc puwc pwc
#> <char> <char> <char> <num> <num> <num> <num>
#> 1: Total Total total 4580 272686 4581 272745.5384
#> 2: Total age_group1 total 1969 117288 1970 117347.5673
#> 3: Total ag1a total 1969 117288 1970 117347.5673
#> 4: Total age_group2 total 1143 68941 1142 68880.6842
#> 5: Total ag2a total 1143 68941 1142 68880.6842
#> 6: Total age_group3 total 864 51132 863 51072.8194
#> 7: Total age_group4 total 423 24549 421 24432.9291
#> 8: Total age_group5 total 168 10075 168 10075.0000
#> 9: Total age_group6 total 13 701 12 647.0769
#> 10: male Total total 2296 137868 2295 137807.9530
#> 11: m Total total 2296 137868 2295 137807.9530
#> 12: male age_group1 total 1015 60570 1015 60570.0000
#> 13: m age_group1 total 1015 60570 1015 60570.0000
#> 14: male ag1a total 1015 60570 1015 60570.0000
#> 15: m ag1a total 1015 60570 1015 60570.0000
#> 16: male age_group2 total 571 34704 571 34704.0000
#> 17: m age_group2 total 571 34704 571 34704.0000
#> 18: male ag2a total 571 34704 571 34704.0000
#> 19: m ag2a total 571 34704 571 34704.0000
#> 20: male age_group3 total 424 25995 424 25995.0000
#> 21: m age_group3 total 424 25995 424 25995.0000
#> 22: male age_group4 total 195 11353 197 11469.4410
#> 23: m age_group4 total 195 11353 197 11469.4410
#> 24: male age_group5 total 84 4788 82 4674.0000
#> 25: m age_group5 total 84 4788 82 4674.0000
#> 26: male age_group6 total 7 458 6 392.5714
#> 27: m age_group6 total 7 458 6 392.5714
#> 28: female Total total 2284 134818 2284 134818.0000
#> 29: f Total total 2284 134818 2284 134818.0000
#> 30: female age_group1 total 954 56718 953 56658.5472
#> 31: f age_group1 total 954 56718 953 56658.5472
#> 32: female ag1a total 954 56718 953 56658.5472
#> 33: f ag1a total 954 56718 953 56658.5472
#> 34: female age_group2 total 572 34237 573 34296.8549
#> 35: f age_group2 total 572 34237 573 34296.8549
#> 36: female ag2a total 572 34237 573 34296.8549
#> 37: f ag2a total 572 34237 573 34296.8549
#> 38: female age_group3 total 440 25137 441 25194.1295
#> 39: f age_group3 total 440 25137 441 25194.1295
#> 40: female age_group4 total 228 13196 227 13138.1228
#> 41: f age_group4 total 228 13196 227 13138.1228
#> 42: female age_group5 total 84 5287 84 5287.0000
#> 43: f age_group5 total 84 5287 84 5287.0000
#> 44: female age_group6 total 6 243 8 324.0000
#> 45: f age_group6 total 6 243 8 324.0000
#> 46: Total Total cnt_males 2296 137868 2295 137807.9530
#> 47: Total age_group1 cnt_males 1015 60570 1015 60570.0000
#> 48: Total ag1a cnt_males 1015 60570 1015 60570.0000
#> 49: Total age_group2 cnt_males 571 34704 571 34704.0000
#> 50: Total ag2a cnt_males 571 34704 571 34704.0000
#> 51: Total age_group3 cnt_males 424 25995 424 25995.0000
#> 52: Total age_group4 cnt_males 195 11353 197 11469.4410
#> 53: Total age_group5 cnt_males 84 4788 82 4674.0000
#> 54: Total age_group6 cnt_males 7 458 6 392.5714
#> 55: male Total cnt_males 2296 137868 2295 137807.9530
#> 56: m Total cnt_males 2296 137868 2295 137807.9530
#> 57: male age_group1 cnt_males 1015 60570 1015 60570.0000
#> 58: m age_group1 cnt_males 1015 60570 1015 60570.0000
#> 59: male ag1a cnt_males 1015 60570 1015 60570.0000
#> 60: m ag1a cnt_males 1015 60570 1015 60570.0000
#> 61: male age_group2 cnt_males 571 34704 571 34704.0000
#> 62: m age_group2 cnt_males 571 34704 571 34704.0000
#> 63: male ag2a cnt_males 571 34704 571 34704.0000
#> 64: m ag2a cnt_males 571 34704 571 34704.0000
#> 65: male age_group3 cnt_males 424 25995 424 25995.0000
#> 66: m age_group3 cnt_males 424 25995 424 25995.0000
#> 67: male age_group4 cnt_males 195 11353 197 11469.4410
#> 68: m age_group4 cnt_males 195 11353 197 11469.4410
#> 69: male age_group5 cnt_males 84 4788 82 4674.0000
#> 70: m age_group5 cnt_males 84 4788 82 4674.0000
#> 71: male age_group6 cnt_males 7 458 6 392.5714
#> 72: m age_group6 cnt_males 7 458 6 392.5714
#> 73: female Total cnt_males 0 0 0 0.0000
#> 74: f Total cnt_males 0 0 0 0.0000
#> 75: female age_group1 cnt_males 0 0 0 0.0000
#> 76: f age_group1 cnt_males 0 0 0 0.0000
#> 77: female ag1a cnt_males 0 0 0 0.0000
#> 78: f ag1a cnt_males 0 0 0 0.0000
#> 79: female age_group2 cnt_males 0 0 0 0.0000
#> 80: f age_group2 cnt_males 0 0 0 0.0000
#> 81: female ag2a cnt_males 0 0 0 0.0000
#> 82: f ag2a cnt_males 0 0 0 0.0000
#> 83: female age_group3 cnt_males 0 0 0 0.0000
#> 84: f age_group3 cnt_males 0 0 0 0.0000
#> 85: female age_group4 cnt_males 0 0 0 0.0000
#> 86: f age_group4 cnt_males 0 0 0 0.0000
#> 87: female age_group5 cnt_males 0 0 0 0.0000
#> 88: f age_group5 cnt_males 0 0 0 0.0000
#> 89: female age_group6 cnt_males 0 0 0 0.0000
#> 90: f age_group6 cnt_males 0 0 0 0.0000
#> sex age vname uwc wc puwc pwc
# utility measures for a count variable
tab$measures_cnts(v = "total", exclude_zeros = TRUE)
#> $overview
#> noise cnt pct
#> <fctr> <int> <num>
#> 1: -2 4 0.08888889
#> 2: -1 9 0.20000000
#> 3: 0 15 0.33333333
#> 4: 1 14 0.31111111
#> 5: 2 3 0.06666667
#>
#> $measures
#> what d1 d2 d3
#> <char> <num> <num> <num>
#> 1: Min 0.000 0.000 0.000
#> 2: Q10 0.000 0.000 0.000
#> 3: Q20 0.000 0.000 0.000
#> 4: Q30 0.000 0.000 0.000
#> 5: Q40 1.000 0.000 0.011
#> 6: Mean 0.822 0.025 0.046
#> 7: Median 1.000 0.001 0.016
#> 8: Q60 1.000 0.001 0.019
#> 9: Q70 1.000 0.002 0.023
#> 10: Q80 1.000 0.006 0.053
#> 11: Q90 2.000 0.056 0.129
#> 12: Q95 2.000 0.143 0.196
#> 13: Q99 2.000 0.333 0.379
#> 14: Max 2.000 0.333 0.379
#>
#> $cumdistr_d1
#> cat cnt pct
#> <char> <int> <num>
#> 1: 0 15 0.3333333
#> 2: 1 38 0.8444444
#> 3: 2 45 1.0000000
#>
#> $cumdistr_d2
#> cat cnt pct
#> <char> <int> <num>
#> 1: [0,0.02] 38 0.8444444
#> 2: (0.02,0.05] 40 0.8888889
#> 3: (0.05,0.1] 41 0.9111111
#> 4: (0.1,0.2] 43 0.9555556
#> 5: (0.2,0.3] 43 0.9555556
#> 6: (0.3,0.4] 45 1.0000000
#> 7: (0.4,0.5] 45 1.0000000
#> 8: (0.5,Inf] 45 1.0000000
#>
#> $cumdistr_d3
#> cat cnt pct
#> <char> <int> <num>
#> 1: [0,0.02] 27 0.6000000
#> 2: (0.02,0.05] 36 0.8000000
#> 3: (0.05,0.1] 38 0.8444444
#> 4: (0.1,0.2] 43 0.9555556
#> 5: (0.2,0.3] 43 0.9555556
#> 6: (0.3,0.4] 45 1.0000000
#> 7: (0.4,0.5] 45 1.0000000
#> 8: (0.5,Inf] 45 1.0000000
#>
#> $false_zero
#> [1] 0
#>
#> $false_nonzero
#> [1] 0
#>
#> $exclude_zeros
#> [1] TRUE
#>
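# A minimal sketch, not cellKey's internal code: it assumes that d1, d2 and d3
# in the output above are the absolute distance, the relative absolute distance
# and the absolute distance between square roots of original and perturbed
# counts. `orig` and `pert` are hypothetical example vectors, not taken from `tab`.
orig <- c(6, 3, 10)                    # hypothetical original (unperturbed) counts
pert <- c(8, 3, 9)                     # hypothetical perturbed counts
d1 <- abs(pert - orig)                 # absolute distance
d2 <- abs(pert - orig) / orig          # relative distance; requires orig > 0
d3 <- abs(sqrt(pert) - sqrt(orig))     # distance between square roots
round(data.frame(orig, pert, d1, d2, d3), 3)
# Under this assumption, exclude_zeros = TRUE matters for d2: cells with an
# original count of 0 would otherwise lead to a division by zero.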
# modifications applied to the perturbed count variables
tab$mod_cnts()
#> sex age row_nr pert ckey countvar
#> <char> <char> <num> <int> <num> <char>
#> 1: Total Total 16 1 0.7027340 total
#> 2: Total age_group1 16 1 0.6973654 total
#> 3: Total ag1a 16 1 0.6973654 total
#> 4: Total age_group2 14 -1 0.1872797 total
#> 5: Total ag2a 14 -1 0.1872797 total
#> ---
#> 131: f age_group4 -1 0 0.0000000 cnt_males
#> 132: female age_group5 -1 0 0.0000000 cnt_males
#> 133: f age_group5 -1 0 0.0000000 cnt_males
#> 134: female age_group6 -1 0 0.0000000 cnt_males
#> 135: f age_group6 -1 0 0.0000000 cnt_males
# display a summary of the utility measures
tab$summary()
#> ┌──────────────────────────────────────────────┐
#> │Utility measures for perturbed count variables│
#> └──────────────────────────────────────────────┘
#> ── Distribution statistics of perturbations ────────────────────────────────────
#> countvar Min Q10 Q20 Q30 Q40 Mean Median Q60 Q70 Q80
#> <char> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
#> 1: total -2 -1 -1.0 -1 0 -0.067 0 0 0 1
#> 2: cnt_highincome -4 -2 -1.0 0 0 0.289 0 1 1 2
#> 3: cnt_males -2 -1 -0.2 0 0 -0.133 0 0 0 0
#> Q90 Q95 Q99 Max
#> <num> <num> <num> <num>
#> 1: 1 2.0 2 2
#> 2: 2 2.0 5 5
#> 3: 0 1.6 2 2
#>
#> ── Distance-based measures ─────────────────────────────────────────────────────
#> ✔ Variable: 'total'
#>
#> what d1 d2 d3
#> <char> <num> <num> <num>
#> 1: Min 0.000 0.000 0.000
#> 2: Q10 0.000 0.000 0.000
#> 3: Q20 0.000 0.000 0.000
#> 4: Q30 0.000 0.000 0.000
#> 5: Q40 1.000 0.000 0.011
#> 6: Mean 0.822 0.025 0.046
#> 7: Median 1.000 0.001 0.016
#> 8: Q60 1.000 0.001 0.019
#> 9: Q70 1.000 0.002 0.023
#> 10: Q80 1.000 0.006 0.053
#> 11: Q90 2.000 0.056 0.129
#> 12: Q95 2.000 0.143 0.196
#> 13: Q99 2.000 0.333 0.379
#> 14: Max 2.000 0.333 0.379
#>
#> ✔ Variable: 'cnt_males'
#>
#> what d1 d2 d3
#> <char> <num> <num> <num>
#> 1: Min 0.000 0.000 0.000
#> 2: Q10 0.000 0.000 0.000
#> 3: Q20 0.000 0.000 0.000
#> 4: Q30 0.000 0.000 0.000
#> 5: Q40 0.000 0.000 0.000
#> 6: Mean 0.667 0.020 0.043
#> 7: Median 0.000 0.000 0.000
#> 8: Q60 1.000 0.000 0.010
#> 9: Q70 1.000 0.010 0.071
#> 10: Q80 1.800 0.021 0.102
#> 11: Q90 2.000 0.071 0.144
#> 12: Q95 2.000 0.143 0.196
#> 13: Q99 2.000 0.143 0.196
#> 14: Max 2.000 0.143 0.196
#>
#> ✔ Variable: 'cnt_highincome'
#>
#> what d1 d2 d3
#> <char> <num> <num> <num>
#> 1: Min 0.000 0.000 0.000
#> 2: Q10 0.000 0.000 0.000
#> 3: Q20 1.000 0.005 0.034
#> 4: Q30 1.000 0.009 0.066
#> 5: Q40 1.000 0.018 0.066
#> 6: Mean 1.475 0.038 0.101
#> 7: Median 1.000 0.020 0.088
#> 8: Q60 2.000 0.022 0.102
#> 9: Q70 2.000 0.025 0.112
#> 10: Q80 2.000 0.051 0.158
#> 11: Q90 2.100 0.143 0.183
#> 12: Q95 4.050 0.143 0.196
#> 13: Q99 5.000 0.230 0.430
#> 14: Max 5.000 0.286 0.579
#>
#> ┌──────────────────────────────────────────────────┐
#> │Utility measures for perturbed numerical variables│
#> └──────────────────────────────────────────────────┘
#> ── Distribution statistics of perturbations ────────────────────────────────────
#> Warning: no non-missing arguments to min; returning Inf
#> Warning: no non-missing arguments to max; returning -Inf
#> Warning: no non-missing arguments to min; returning Inf
#> Warning: no non-missing arguments to max; returning -Inf
#> vname Min Q10 Q20 Q30 Q40 Mean
#> <char> <num> <num> <num> <num> <num> <num>
#> 1: expend Inf NA NA NA NA NaN
#> 2: income -80322.053 -59913.190 -22237.695 -17759.356 -2733.890 5311.218
#> 3: savings -8965.189 -4570.259 -3716.747 -2978.984 -1459.402 -895.135
#> 4: mixed Inf NA NA NA NA NaN
#> Median Q60 Q70 Q80 Q90 Q95 Q99
#> <num> <num> <num> <num> <num> <num> <num>
#> 1: NA NA NA NA NA NA NA
#> 2: 6181.025 16468.481 26187.802 44243.43 59497.280 65888.031 102794.835
#> 3: -786.321 311.611 791.997 2480.08 3212.818 3212.818 4095.742
#> 4: NA NA NA NA NA NA NA
#> Max
#> <num>
#> 1: -Inf
#> 2: 102794.835
#> 3: 4659.522
#> 4: -Inf
# }