This class allows you to define statistical tables and perturb both count and numerical variables.
ck_setup(x, rkey, dims, w = NULL, countvars = NULL, numvars = NULL)
x
an object coercible to a data.frame
rkey
either a column name within x referring to a variable containing record keys, or a single integer(ish) number > 5 that refers to the number of digits for record keys that will be generated internally.
dims
a list containing slots for each variable that should be tabulated. Each slot should be created/modified using sdcHierarchies::hier_create(), sdcHierarchies::hier_add() and other functionality from package sdcHierarchies.
w
(character) a scalar character referring to a variable in x holding sampling weights. If w is NULL (the default), all weights are assumed to be 1.
countvars
(character) an optional vector containing names of binary (0/1 coded) variables within x that should be included in the problem instance. These variables can later be perturbed.
numvars
(character) an optional vector of numerical variables that can later be tabulated.
A new cellkey_obj object. Such objects (internally) contain the fully computed statistical tables given input microdata (x), the hierarchical definitions (dims) as well as the remaining inputs. Intermediate results are stored internally and can only be modified / accessed via the exported public methods described below.
Such objects are typically generated using ck_setup().
new()
Create a new table instance
ck_class$new(x, rkey, dims, w = NULL, countvars = NULL, numvars = NULL)
x
an object coercible to a data.frame
rkey
either a column name within x referring to a variable containing record keys, or a single integer(ish) number > 5 that refers to the number of digits for record keys that will be generated internally.
dims
a list containing slots for each variable that should be tabulated. Each slot should be created/modified using sdcHierarchies::hier_create(), sdcHierarchies::hier_add() and other functionality from package sdcHierarchies.
w
(character) a scalar character referring to a variable in x holding sampling weights. If w is NULL (the default), all weights are assumed to be 1.
countvars
(character) an optional vector containing names of binary (0/1 coded) variables within x that should be included in the problem instance. These variables can later be perturbed.
numvars
(character) an optional vector of numerical variables that can later be tabulated.
A new cellkey_obj object. Such objects (internally) contain the fully computed statistical tables given input microdata (x), the hierarchical definitions (dims) as well as the remaining inputs. Intermediate results are stored internally and can only be modified / accessed via the exported public methods described below.
freqtab()
Extract results from already perturbed count variables as a data.table.
v
a vector of variable names for count variables. If NULL (the default), the results are returned for all available count variables. For variables that have not yet been perturbed, columns puwc and pwc are filled with NA.
path
if not NULL, a scalar character defining a (relative or absolute) path to which the result table should be written. A csv file will be generated and, if specified, path must have ".csv" as file ending.
This method returns a data.table containing all combinations of the dimensional variables in the first n columns. Additionally, the following columns are shown:
vname: name of the perturbed variable
uwc: unweighted counts
wc: weighted counts
puwc: perturbed unweighted counts or NA if vname was not yet perturbed
pwc: perturbed weighted counts or NA if vname was not yet perturbed
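To make the relationship between these columns concrete, the following base R sketch rebuilds three rows of such a table by hand, using values copied from the example output further down this page, and derives the noise added to each unweighted count. The data frame ft and the column diff are illustrative names only; they are not part of what freqtab() returns.

```r
# Hand-built stand-in for a few rows of a freqtab() result (illustrative only)
ft <- data.frame(
  sex = c("Total", "Total", "Total"),
  age = c("Total", "age_group5", "age_group6"),
  vname = "total",
  uwc = c(4580, 168, 13),
  wc = c(274735, 10362, 719),
  puwc = c(4579, 168, 12),
  pwc = c(274675.0142, 10362, 663.6923),
  stringsAsFactors = FALSE
)

# noise added to the unweighted counts by the perturbation
ft$diff <- ft$puwc - ft$uwc

# cells whose unweighted count was left unchanged
unchanged <- ft$age[ft$diff == 0]

print(ft[, c("age", "uwc", "puwc", "diff")])
```

With a real table, the same subtraction can be applied directly to the data.table returned by the method.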
numtab()
Extract results from already perturbed continuous variables as a data.table.
v
a vector of variable names of continuous variables. If NULL (the default), the results are returned for all available numeric variables.
mean_before_sum
(logical); if TRUE, the perturbed values are adjusted by a factor (n + p) / n with
n: the original weighted cell value
p: the perturbed cell value
This makes sense if the accuracy of the variable mean is considered to be more important than the accuracy of sums of the variable. The default value is FALSE (no adjustment is done).
path
if not NULL, a scalar character defining a (relative or absolute) path to which the result table should be written. A csv file will be generated and, if specified, path must have ".csv" as file ending.
This method returns a data.table containing all combinations of the dimensional variables in the first n columns. Additionally, the following columns are shown:
vname: name of the perturbed variable
uws: unweighted sum of the given variable
ws: weighted cell sum
pws: perturbed weighted sum of the given cell or NA if vname has not been perturbed
measures_cnts()
Utility measures for perturbed count variables
v
name of a count variable for which utility measures should be computed.
exclude_zeros
should empty (zero) cells in the original values be excluded when computing distance measures
This method returns a list containing a set of utility measures based on some distance functions. For a detailed description of the computed measures, see ck_cnt_measures().
hierarchy_info()
Information about hierarchies
a list (for each dimensional variable) with information on the hierarchies. This may be used to restrict output tables to specific levels or codes. Each list element is a data.table containing the following variables:
code: the name of a code within the hierarchy
level: number defining the level of the code; the higher the number, the lower the code sits in the hierarchy, with 1 being the overall total
is_leaf: if TRUE, this code is a leaf node, which means no other codes contribute to it
parent: name of the parent code
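One way such information might be used to restrict an output table to leaf codes is sketched below in base R. The two data frames are built by hand: info_sex mirrors the hierarchy listing for sex printed in the examples, and res holds a few unweighted counts copied from the example output. Both names are illustrative assumptions and not objects created by the package.

```r
# Hand-built copy of the hierarchy information for `sex` (illustrative only)
info_sex <- data.frame(
  code = c("Total", "male", "m", "female", "f"),
  level = c(1L, 2L, 3L, 2L, 3L),
  is_leaf = c(FALSE, FALSE, TRUE, FALSE, TRUE),
  parent = c("Total", "Total", "male", "Total", "female"),
  stringsAsFactors = FALSE
)

# Hand-built stand-in for a result table (unweighted counts for age == "Total")
res <- data.frame(
  sex = c("Total", "male", "m", "female", "f"),
  uwc = c(4580, 2296, 2296, 2284, 2284),
  stringsAsFactors = FALSE
)

# keep only rows whose `sex` code is a leaf of the hierarchy
leaf_codes <- info_sex$code[info_sex$is_leaf]
res_leaf <- res[res$sex %in% leaf_codes, ]

print(res_leaf)
```

Filtering on level instead of is_leaf works the same way, for example info_sex$code[info_sex$level <= 2] to keep only the total and its direct children.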
supp_freq()
Identify sensitive cells based on minimum frequency rule
v
a single variable name of a continuous variable (see method numvars())
n
a number defining the threshold. All cells <= n are considered as unsafe.
weighted
if TRUE (the default), the weighted number of contributors to a cell is compared to the threshold specified in n; else the unweighted number of contributors is used.
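The threshold comparison described above can be illustrated with a small base R sketch. It is only a sketch of the rule as documented (cells <= n are unsafe), not the package's internal implementation. The data frame cells is built by hand from unweighted counts shown in the example output, under the assumption that these counts equal the number of contributors per cell.

```r
# Hand-built cells with their unweighted number of contributors (illustrative only)
cells <- data.frame(
  sex = c("Total", "male", "m", "female", "f", "male"),
  age = c(rep("age_group6", 5), "age_group5"),
  uwc = c(13, 7, 7, 6, 6, 84),
  stringsAsFactors = FALSE
)

# threshold of the minimum frequency rule
n <- 14

# a cell is flagged as unsafe if its number of contributors is <= n
cells$unsafe <- cells$uwc <= n

# number of flagged cells, counting duplicated codes such as male / m separately
n_unsafe <- sum(cells$unsafe)
print(cells)
```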
supp_val()
Identify sensitive cells based on weighted or unweighted cell value
v
a single variable name of a continuous variable (see method numvars())
n
a number defining the threshold. All cells <= n are considered as unsafe.
weighted
if TRUE (the default), the weighted cell value of variable v is compared to the threshold specified in n; else the unweighted cell value is used.
supp_cells()
Identify sensitive cells based on their names
v
a single variable name of a continuous variable (see method numvars())
inp
a data.frame where each column represents a dimensional variable. Each row of this input is then used to compute the relevant cells to be identified as sensitive; NA values are possible and match any characteristic of the dimensional variable.
supp_p()
Identify sensitive cells based on the p%-rule. Please note that this rule can only be applied to positive-only variables.
supp_pq()
Identify sensitive cells based on the pq-rule. Please note that this rule can only be applied to positive-only variables.
supp_nk()
Identify sensitive cells based on the nk-dominance rule. Please note that this rule can only be applied to positive-only variables.
params_cnts_set()
Set perturbation parameters for count variables
val
a perturbation object created with ck_params_cnts()
v
a character vector (or NULL). If NULL (the default), the perturbation parameters provided in val are set for all count variables; otherwise one may specify the names of the count variables for which the parameters should be set.
reset_cntvars()
reset results and parameters for already perturbed count variables
reset_numvars()
reset results and parameters for already perturbed numerical variables
params_nums_set()
set perturbation parameters for continuous variables.
val
a perturbation object created with ck_params_nums()
v
a character vector (or NULL); if NULL (the default), the perturbation parameters provided in val are set for all continuous variables; otherwise one may specify the names of the numeric variables for which the parameters should be set.
summary()
some aggregated summary statistics about perturbed variables
print()
prints information about the current table
# \donttest{
x <- ck_create_testdata()
# create some 0/1 variables that should be perturbed later
x[, cnt_females := ifelse(sex == "male", 0, 1)]
#> urbrur roof walls water electcon relat sex age hhcivil expend
#> <int> <int> <int> <int> <int> <int> <fctr> <fctr> <int> <num>
#> 1: 2 4 3 3 1 1 male age_group3 2 9093
#> 2: 2 4 3 3 1 2 female age_group3 2 2734
#> 3: 2 4 3 3 1 3 male age_group1 1 2652
#> 4: 2 4 3 3 1 3 male age_group1 1 1807
#> 5: 2 4 2 3 1 1 male age_group4 2 671
#> ---
#> 4576: 2 4 3 4 1 2 female age_group3 2 3696
#> 4577: 2 4 3 4 1 3 male age_group1 1 282
#> 4578: 2 4 3 4 1 3 male age_group1 1 840
#> 4579: 2 4 3 4 1 3 female age_group1 1 6258
#> 4580: 2 4 3 4 1 3 male age_group1 1 7019
#> income savings ori_hid sampling_weight household_weights cnt_females
#> <num> <num> <int> <int> <num> <num>
#> 1: 5780 12 1 95 25.00000 0
#> 2: 2530 28 1 82 25.00000 1
#> 3: 6920 550 1 66 25.00000 0
#> 4: 7960 870 1 50 25.00000 0
#> 5: 9030 20 2 87 16.66667 0
#> ---
#> 4576: 7900 278 1000 92 16.66667 1
#> 4577: 1420 987 1000 40 16.66667 0
#> 4578: 8900 684 1000 39 16.66667 0
#> 4579: 3880 294 1000 88 16.66667 1
#> 4580: 4830 911 1000 46 16.66667 0
x[, cnt_males := ifelse(sex == "male", 1, 0)]
#> urbrur roof walls water electcon relat sex age hhcivil expend
#> <int> <int> <int> <int> <int> <int> <fctr> <fctr> <int> <num>
#> 1: 2 4 3 3 1 1 male age_group3 2 9093
#> 2: 2 4 3 3 1 2 female age_group3 2 2734
#> 3: 2 4 3 3 1 3 male age_group1 1 2652
#> 4: 2 4 3 3 1 3 male age_group1 1 1807
#> 5: 2 4 2 3 1 1 male age_group4 2 671
#> ---
#> 4576: 2 4 3 4 1 2 female age_group3 2 3696
#> 4577: 2 4 3 4 1 3 male age_group1 1 282
#> 4578: 2 4 3 4 1 3 male age_group1 1 840
#> 4579: 2 4 3 4 1 3 female age_group1 1 6258
#> 4580: 2 4 3 4 1 3 male age_group1 1 7019
#> income savings ori_hid sampling_weight household_weights cnt_females
#> <num> <num> <int> <int> <num> <num>
#> 1: 5780 12 1 95 25.00000 0
#> 2: 2530 28 1 82 25.00000 1
#> 3: 6920 550 1 66 25.00000 0
#> 4: 7960 870 1 50 25.00000 0
#> 5: 9030 20 2 87 16.66667 0
#> ---
#> 4576: 7900 278 1000 92 16.66667 1
#> 4577: 1420 987 1000 40 16.66667 0
#> 4578: 8900 684 1000 39 16.66667 0
#> 4579: 3880 294 1000 88 16.66667 1
#> 4580: 4830 911 1000 46 16.66667 0
#> cnt_males
#> <num>
#> 1: 1
#> 2: 0
#> 3: 1
#> 4: 1
#> 5: 1
#> ---
#> 4576: 0
#> 4577: 1
#> 4578: 1
#> 4579: 0
#> 4580: 1
x[, cnt_highincome := ifelse(income >= 9000, 1, 0)]
#> urbrur roof walls water electcon relat sex age hhcivil expend
#> <int> <int> <int> <int> <int> <int> <fctr> <fctr> <int> <num>
#> 1: 2 4 3 3 1 1 male age_group3 2 9093
#> 2: 2 4 3 3 1 2 female age_group3 2 2734
#> 3: 2 4 3 3 1 3 male age_group1 1 2652
#> 4: 2 4 3 3 1 3 male age_group1 1 1807
#> 5: 2 4 2 3 1 1 male age_group4 2 671
#> ---
#> 4576: 2 4 3 4 1 2 female age_group3 2 3696
#> 4577: 2 4 3 4 1 3 male age_group1 1 282
#> 4578: 2 4 3 4 1 3 male age_group1 1 840
#> 4579: 2 4 3 4 1 3 female age_group1 1 6258
#> 4580: 2 4 3 4 1 3 male age_group1 1 7019
#> income savings ori_hid sampling_weight household_weights cnt_females
#> <num> <num> <int> <int> <num> <num>
#> 1: 5780 12 1 95 25.00000 0
#> 2: 2530 28 1 82 25.00000 1
#> 3: 6920 550 1 66 25.00000 0
#> 4: 7960 870 1 50 25.00000 0
#> 5: 9030 20 2 87 16.66667 0
#> ---
#> 4576: 7900 278 1000 92 16.66667 1
#> 4577: 1420 987 1000 40 16.66667 0
#> 4578: 8900 684 1000 39 16.66667 0
#> 4579: 3880 294 1000 88 16.66667 1
#> 4580: 4830 911 1000 46 16.66667 0
#> cnt_males cnt_highincome
#> <num> <num>
#> 1: 1 0
#> 2: 0 0
#> 3: 1 0
#> 4: 1 0
#> 5: 1 1
#> ---
#> 4576: 0 0
#> 4577: 1 0
#> 4578: 1 0
#> 4579: 0 0
#> 4580: 1 0
# a variable with positive and negative contributions
x[, mixed := sample(-10:10, nrow(x), replace = TRUE)]
#> urbrur roof walls water electcon relat sex age hhcivil expend
#> <int> <int> <int> <int> <int> <int> <fctr> <fctr> <int> <num>
#> 1: 2 4 3 3 1 1 male age_group3 2 9093
#> 2: 2 4 3 3 1 2 female age_group3 2 2734
#> 3: 2 4 3 3 1 3 male age_group1 1 2652
#> 4: 2 4 3 3 1 3 male age_group1 1 1807
#> 5: 2 4 2 3 1 1 male age_group4 2 671
#> ---
#> 4576: 2 4 3 4 1 2 female age_group3 2 3696
#> 4577: 2 4 3 4 1 3 male age_group1 1 282
#> 4578: 2 4 3 4 1 3 male age_group1 1 840
#> 4579: 2 4 3 4 1 3 female age_group1 1 6258
#> 4580: 2 4 3 4 1 3 male age_group1 1 7019
#> income savings ori_hid sampling_weight household_weights cnt_females
#> <num> <num> <int> <int> <num> <num>
#> 1: 5780 12 1 95 25.00000 0
#> 2: 2530 28 1 82 25.00000 1
#> 3: 6920 550 1 66 25.00000 0
#> 4: 7960 870 1 50 25.00000 0
#> 5: 9030 20 2 87 16.66667 0
#> ---
#> 4576: 7900 278 1000 92 16.66667 1
#> 4577: 1420 987 1000 40 16.66667 0
#> 4578: 8900 684 1000 39 16.66667 0
#> 4579: 3880 294 1000 88 16.66667 1
#> 4580: 4830 911 1000 46 16.66667 0
#> cnt_males cnt_highincome mixed
#> <num> <num> <int>
#> 1: 1 0 -2
#> 2: 0 0 0
#> 3: 1 0 8
#> 4: 1 0 4
#> 5: 1 1 -1
#> ---
#> 4576: 0 0 -4
#> 4577: 1 0 7
#> 4578: 1 0 -9
#> 4579: 0 0 -1
#> 4580: 1 0 -10
# create record keys
x$rkey <- ck_generate_rkeys(dat = x)
# define required inputs
# hierarchy with some bogus codes
d_sex <- hier_create(root = "Total", nodes = c("male", "female"))
d_sex <- hier_add(d_sex, root = "female", "f")
d_sex <- hier_add(d_sex, root = "male", "m")
d_age <- hier_create(root = "Total", nodes = paste0("age_group", 1:6))
d_age <- hier_add(d_age, root = "age_group1", "ag1a")
d_age <- hier_add(d_age, root = "age_group2", "ag2a")
# define the cell key object
countvars <- c("cnt_females", "cnt_males", "cnt_highincome")
numvars <- c("expend", "income", "savings", "mixed")
tab <- ck_setup(
x = x,
rkey = "rkey",
dims = list(sex = d_sex, age = d_age),
w = "sampling_weight",
countvars = countvars,
numvars = numvars)
#> computing contributing indices | rawdata <--> table; this might take a while
# show some information about this table instance
tab$print() # identical with print(tab)
#> ── Table Information ───────────────────────────────────────────────────────────
#> ✔ 45 cells in 2 dimensions ('sex', 'age')
#> ✔ weights: yes
#> ── Tabulated / Perturbed countvars ─────────────────────────────────────────────
#> ☐ 'total'
#> ☐ 'cnt_females'
#> ☐ 'cnt_males'
#> ☐ 'cnt_highincome'
#> ── Tabulated / Perturbed numvars ───────────────────────────────────────────────
#> ☐ 'expend'
#> ☐ 'income'
#> ☐ 'savings'
#> ☐ 'mixed'
# information about the hierarchies
tab$hierarchy_info()
#> $sex
#> code level is_leaf parent
#> <char> <int> <lgcl> <char>
#> 1: Total 1 FALSE Total
#> 2: male 2 FALSE Total
#> 3: m 3 TRUE male
#> 4: female 2 FALSE Total
#> 5: f 3 TRUE female
#>
#> $age
#> code level is_leaf parent
#> <char> <int> <lgcl> <char>
#> 1: Total 1 FALSE Total
#> 2: age_group1 2 FALSE Total
#> 3: ag1a 3 TRUE age_group1
#> 4: age_group2 2 FALSE Total
#> 5: ag2a 3 TRUE age_group2
#> 6: age_group3 2 TRUE Total
#> 7: age_group4 2 TRUE Total
#> 8: age_group5 2 TRUE Total
#> 9: age_group6 2 TRUE Total
#>
# which variables have been defined?
tab$allvars()
#> $cntvars
#> [1] "total" "cnt_females" "cnt_males" "cnt_highincome"
#>
#> $numvars
#> [1] "expend" "income" "savings" "mixed"
#>
# count variables
tab$cntvars()
#> [1] "total" "cnt_females" "cnt_males" "cnt_highincome"
# continuous variables
tab$numvars()
#> [1] "expend" "income" "savings" "mixed"
# create perturbation parameters for "total" variable and
# write to yaml-file
# create a ptable using functionality from the ptable-pkg
f_yaml <- tempfile(fileext = ".yaml")
p_cnts1 <- ck_params_cnts(
ptab = ptable::pt_ex_cnts(),
path = f_yaml)
#> yaml configuration '/tmp/RtmpHZpsuV/file1d2ca9e4389.yaml' successfully written.
# read parameters from yaml-file and set them for variable `"total"`
p_cnts1 <- ck_read_yaml(path = f_yaml)
tab$params_cnts_set(val = p_cnts1, v = "total")
#> --> setting perturbation parameters for variable 'total'
# create alternative perturbation parameters by specifying parameters
para2 <- ptable::create_cnt_ptable(
D = 8, V = 3, js = 2, create = FALSE)
p_cnts2 <- ck_params_cnts(ptab = para2)
# use this ptable for the remaining variables
tab$params_cnts_set(val = p_cnts2, v = countvars)
#> --> setting perturbation parameters for variable 'cnt_females'
#> --> setting perturbation parameters for variable 'cnt_males'
#> --> setting perturbation parameters for variable 'cnt_highincome'
# perturb a variable
tab$perturb(v = "total")
#> Count variable 'total' was perturbed.
# multiple variables can be perturbed as well
tab$perturb(v = c("cnt_males", "cnt_highincome"))
#> Count variable 'cnt_males' was perturbed.
#> Count variable 'cnt_highincome' was perturbed.
# return weighted and unweighted results
tab$freqtab(v = c("total", "cnt_males"))
#> sex age vname uwc wc puwc pwc
#> <char> <char> <char> <num> <num> <num> <num>
#> 1: Total Total total 4580 274735 4579 274675.0142
#> 2: Total age_group1 total 1969 117619 1969 117619.0000
#> 3: Total ag1a total 1969 117619 1969 117619.0000
#> 4: Total age_group2 total 1143 68298 1144 68357.7533
#> 5: Total ag2a total 1143 68298 1144 68357.7533
#> 6: Total age_group3 total 864 52685 863 52624.0220
#> 7: Total age_group4 total 423 25052 422 24992.7754
#> 8: Total age_group5 total 168 10362 168 10362.0000
#> 9: Total age_group6 total 13 719 12 663.6923
#> 10: male Total total 2296 139256 2297 139316.6516
#> 11: m Total total 2296 139256 2297 139316.6516
#> 12: male age_group1 total 1015 60996 1015 60996.0000
#> 13: m age_group1 total 1015 60996 1015 60996.0000
#> 14: male ag1a total 1015 60996 1015 60996.0000
#> 15: m ag1a total 1015 60996 1015 60996.0000
#> 16: male age_group2 total 571 34091 572 34150.7040
#> 17: m age_group2 total 571 34091 572 34150.7040
#> 18: male ag2a total 571 34091 572 34150.7040
#> 19: m ag2a total 571 34091 572 34150.7040
#> 20: male age_group3 total 424 26747 425 26810.0825
#> 21: m age_group3 total 424 26747 425 26810.0825
#> 22: male age_group4 total 195 11459 194 11400.2359
#> 23: m age_group4 total 195 11459 194 11400.2359
#> 24: male age_group5 total 84 5548 84 5548.0000
#> 25: m age_group5 total 84 5548 84 5548.0000
#> 26: male age_group6 total 7 415 7 415.0000
#> 27: m age_group6 total 7 415 7 415.0000
#> 28: female Total total 2284 135479 2283 135419.6835
#> 29: f Total total 2284 135479 2283 135419.6835
#> 30: female age_group1 total 954 56623 953 56563.6468
#> 31: f age_group1 total 954 56623 953 56563.6468
#> 32: female ag1a total 954 56623 953 56563.6468
#> 33: f ag1a total 954 56623 953 56563.6468
#> 34: female age_group2 total 572 34207 570 34087.3951
#> 35: f age_group2 total 572 34207 570 34087.3951
#> 36: female ag2a total 572 34207 570 34087.3951
#> 37: f ag2a total 572 34207 570 34087.3951
#> 38: female age_group3 total 440 25938 440 25938.0000
#> 39: f age_group3 total 440 25938 440 25938.0000
#> 40: female age_group4 total 228 13593 226 13473.7632
#> 41: f age_group4 total 228 13593 226 13473.7632
#> 42: female age_group5 total 84 4814 85 4871.3095
#> 43: f age_group5 total 84 4814 85 4871.3095
#> 44: female age_group6 total 6 304 7 354.6667
#> 45: f age_group6 total 6 304 7 354.6667
#> 46: Total Total cnt_males 2296 139256 2298 139377.3031
#> 47: Total age_group1 cnt_males 1015 60996 1014 60935.9054
#> 48: Total ag1a cnt_males 1015 60996 1014 60935.9054
#> 49: Total age_group2 cnt_males 571 34091 572 34150.7040
#> 50: Total ag2a cnt_males 571 34091 572 34150.7040
#> 51: Total age_group3 cnt_males 424 26747 425 26810.0825
#> 52: Total age_group4 cnt_males 195 11459 193 11341.4718
#> 53: Total age_group5 cnt_males 84 5548 84 5548.0000
#> 54: Total age_group6 cnt_males 7 415 7 415.0000
#> 55: male Total cnt_males 2296 139256 2298 139377.3031
#> 56: m Total cnt_males 2296 139256 2298 139377.3031
#> 57: male age_group1 cnt_males 1015 60996 1014 60935.9054
#> 58: m age_group1 cnt_males 1015 60996 1014 60935.9054
#> 59: male ag1a cnt_males 1015 60996 1014 60935.9054
#> 60: m ag1a cnt_males 1015 60996 1014 60935.9054
#> 61: male age_group2 cnt_males 571 34091 572 34150.7040
#> 62: m age_group2 cnt_males 571 34091 572 34150.7040
#> 63: male ag2a cnt_males 571 34091 572 34150.7040
#> 64: m ag2a cnt_males 571 34091 572 34150.7040
#> 65: male age_group3 cnt_males 424 26747 425 26810.0825
#> 66: m age_group3 cnt_males 424 26747 425 26810.0825
#> 67: male age_group4 cnt_males 195 11459 193 11341.4718
#> 68: m age_group4 cnt_males 195 11459 193 11341.4718
#> 69: male age_group5 cnt_males 84 5548 84 5548.0000
#> 70: m age_group5 cnt_males 84 5548 84 5548.0000
#> 71: male age_group6 cnt_males 7 415 7 415.0000
#> 72: m age_group6 cnt_males 7 415 7 415.0000
#> 73: female Total cnt_males 0 0 0 0.0000
#> 74: f Total cnt_males 0 0 0 0.0000
#> 75: female age_group1 cnt_males 0 0 0 0.0000
#> 76: f age_group1 cnt_males 0 0 0 0.0000
#> 77: female ag1a cnt_males 0 0 0 0.0000
#> 78: f ag1a cnt_males 0 0 0 0.0000
#> 79: female age_group2 cnt_males 0 0 0 0.0000
#> 80: f age_group2 cnt_males 0 0 0 0.0000
#> 81: female ag2a cnt_males 0 0 0 0.0000
#> 82: f ag2a cnt_males 0 0 0 0.0000
#> 83: female age_group3 cnt_males 0 0 0 0.0000
#> 84: f age_group3 cnt_males 0 0 0 0.0000
#> 85: female age_group4 cnt_males 0 0 0 0.0000
#> 86: f age_group4 cnt_males 0 0 0 0.0000
#> 87: female age_group5 cnt_males 0 0 0 0.0000
#> 88: f age_group5 cnt_males 0 0 0 0.0000
#> 89: female age_group6 cnt_males 0 0 0 0.0000
#> 90: f age_group6 cnt_males 0 0 0 0.0000
#> sex age vname uwc wc puwc pwc
# numerical variables (positive variables using flex-function)
# we also write the config to a yaml file
f_yaml <- tempfile(fileext = ".yaml")
# create a ptable using functionality from the ptable-pkg
# a single ptable for all cells
ptab1 <- ptable::pt_ex_nums(parity = TRUE, separation = FALSE)
# a single ptab for all cells except for very small ones
ptab2 <- ptable::pt_ex_nums(parity = TRUE, separation = TRUE)
# different ptables for cells with even/odd number of contributors
# and very small cells
ptab3 <- ptable::pt_ex_nums(parity = FALSE, separation = TRUE)
p_nums1 <- ck_params_nums(
ptab = ptab1,
type = "top_contr",
top_k = 3,
mult_params = ck_flexparams(
fp = 1000,
p = c(0.30, 0.03),
epsilon = c(1, 0.5, 0.2),
q = 3),
mu_c = 2,
same_key = FALSE,
use_zero_rkeys = FALSE,
path = f_yaml)
#> yaml configuration '/tmp/RtmpHZpsuV/file1d2c23870ab8.yaml' successfully written.
# we read the parameters from the yaml-file
p_nums1 <- ck_read_yaml(path = f_yaml)
# for variables with positive and negative values
p_nums2 <- ck_params_nums(
ptab = ptab2,
type = "top_contr",
top_k = 3,
mult_params = ck_flexparams(
fp = 1000,
p = c(0.15, 0.02),
epsilon = c(1, 0.4, 0.15),
q = 3),
mu_c = 2,
same_key = FALSE)
# simple perturbation parameters (not using the flex-function approach)
p_nums3 <- ck_params_nums(
ptab = ptab3,
type = "mean",
mult_params = ck_simpleparams(p = 0.25),
mu_c = 2,
same_key = FALSE)
# use `p_nums1` for the positive-only variables
tab$params_nums_set(p_nums1, c("savings", "income", "expend"))
#> --> setting perturbation parameters for variable 'savings'
#> --> setting perturbation parameters for variable 'income'
#> --> setting perturbation parameters for variable 'expend'
# use different parameters for variable `mixed`
tab$params_nums_set(p_nums2, v = "mixed")
#> --> setting perturbation parameters for variable 'mixed'
# identify sensitive cells to which extra protection (`mu_c`) is added.
tab$supp_p(v = "income", p = 85)
#> computing contributing indices | rawdata <--> table; this might take a while
#> p%-rule: 0 new sensitive cells (incl. duplicates) found (total: 0)
tab$supp_pq(v = "income", p = 85, q = 90)
#> computing contributing indices | rawdata <--> table; this might take a while
#> pq-rule: 0 new sensitive cells (incl. duplicates) found (total: 0)
tab$supp_nk(v = "income", n = 2, k = 90)
#> computing contributing indices | rawdata <--> table; this might take a while
#> nk-rule: 0 new sensitive cells (incl. duplicates) found (total: 0)
tab$supp_freq(v = "income", n = 14, weighted = FALSE)
#> freq-rule: 5 new sensitive cells (incl. duplicates) found (total: 5)
tab$supp_val(v = "income", n = 10000, weighted = TRUE)
#> val-rule: 0 new sensitive cells (incl. duplicates) found (total: 5)
tab$supp_cells(
v = "income",
inp = data.frame(
sex = c("female", "female"),
"age" = c("age_group1", "age_group3")
)
)
#> cell-rule: 2 new sensitive cells (incl. duplicates) found (total: 7)
# perturb variables
tab$perturb(v = c("income", "savings"))
#> Numeric variable 'income' was perturbed.
#> Numeric variable 'savings' was perturbed.
# extract results
tab$numtab("income", mean_before_sum = TRUE)
#> sex age vname uws ws pws
#> <char> <char> <char> <num> <num> <num>
#> 1: Total Total income 22952978 1372944443 1372897637
#> 2: Total age_group1 income 9810547 582608576 582622542
#> 3: Total ag1a income 9810547 582608576 582622542
#> 4: Total age_group2 income 5692119 340734756 340529166
#> 5: Total ag2a income 5692119 340734756 340529166
#> 6: Total age_group3 income 4406946 265921524 265894945
#> 7: Total age_group4 income 2133543 128461945 128406083
#> 8: Total age_group5 income 848151 51983953 51921855
#> 9: Total age_group6 income 61672 3233689 3142045
#> 10: male Total income 11262049 678276253 678313867
#> 11: m Total income 11262049 678276253 678313867
#> 12: male age_group1 income 4877164 291355787 291259712
#> 13: m age_group1 income 4877164 291355787 291259712
#> 14: male ag1a income 4877164 291355787 291259712
#> 15: m ag1a income 4877164 291355787 291259712
#> 16: male age_group2 income 2811379 168045577 168258118
#> 17: m age_group2 income 2811379 168045577 168258118
#> 18: male ag2a income 2811379 168045577 168258118
#> 19: m ag2a income 2811379 168045577 168258118
#> 20: male age_group3 income 2168169 134275518 134265587
#> 21: m age_group3 income 2168169 134275518 134265587
#> 22: male age_group4 income 978510 56832655 56961661
#> 23: m age_group4 income 978510 56832655 56961661
#> 24: male age_group5 income 393134 25913404 25975341
#> 25: m age_group5 income 393134 25913404 25975341
#> 26: male age_group6 income 33693 1853312 1863495
#> 27: m age_group6 income 33693 1853312 1863495
#> 28: female Total income 11690929 694668190 694756801
#> 29: f Total income 11690929 694668190 694756801
#> 30: female age_group1 income 4933383 291252789 291194858
#> 31: f age_group1 income 4933383 291252789 291194858
#> 32: female ag1a income 4933383 291252789 291194858
#> 33: f ag1a income 4933383 291252789 291194858
#> 34: female age_group2 income 2880740 172689179 172559568
#> 35: f age_group2 income 2880740 172689179 172559568
#> 36: female ag2a income 2880740 172689179 172559568
#> 37: f ag2a income 2880740 172689179 172559568
#> 38: female age_group3 income 2238777 131646006 131664254
#> 39: f age_group3 income 2238777 131646006 131664254
#> 40: female age_group4 income 1155033 71629290 71642922
#> 41: f age_group4 income 1155033 71629290 71642922
#> 42: female age_group5 income 455017 26070549 26201591
#> 43: f age_group5 income 455017 26070549 26201591
#> 44: female age_group6 income 27979 1380377 1463458
#> 45: f age_group6 income 27979 1380377 1463458
#> sex age vname uws ws pws
tab$numtab("income", mean_before_sum = FALSE)
#> sex age vname uws ws pws
#> <char> <char> <char> <num> <num> <num>
#> 1: Total Total income 22952978 1372944443 1372921040
#> 2: Total age_group1 income 9810547 582608576 582615559
#> 3: Total ag1a income 9810547 582608576 582615559
#> 4: Total age_group2 income 5692119 340734756 340631945
#> 5: Total ag2a income 5692119 340734756 340631945
#> 6: Total age_group3 income 4406946 265921524 265908234
#> 7: Total age_group4 income 2133543 128461945 128434011
#> 8: Total age_group5 income 848151 51983953 51952895
#> 9: Total age_group6 income 61672 3233689 3187538
#> 10: male Total income 11262049 678276253 678295060
#> 11: m Total income 11262049 678276253 678295060
#> 12: male age_group1 income 4877164 291355787 291307746
#> 13: m age_group1 income 4877164 291355787 291307746
#> 14: male ag1a income 4877164 291355787 291307746
#> 15: m ag1a income 4877164 291355787 291307746
#> 16: male age_group2 income 2811379 168045577 168151814
#> 17: m age_group2 income 2811379 168045577 168151814
#> 18: male ag2a income 2811379 168045577 168151814
#> 19: m ag2a income 2811379 168045577 168151814
#> 20: male age_group3 income 2168169 134275518 134270552
#> 21: m age_group3 income 2168169 134275518 134270552
#> 22: male age_group4 income 978510 56832655 56897122
#> 23: m age_group4 income 978510 56832655 56897122
#> 24: male age_group5 income 393134 25913404 25944354
#> 25: m age_group5 income 393134 25913404 25944354
#> 26: male age_group6 income 33693 1853312 1858397
#> 27: m age_group6 income 33693 1853312 1858397
#> 28: female Total income 11690929 694668190 694712494
#> 29: f Total income 11690929 694668190 694712494
#> 30: female age_group1 income 4933383 291252789 291223822
#> 31: f age_group1 income 4933383 291252789 291223822
#> 32: female ag1a income 4933383 291252789 291223822
#> 33: f ag1a income 4933383 291252789 291223822
#> 34: female age_group2 income 2880740 172689179 172624361
#> 35: f age_group2 income 2880740 172689179 172624361
#> 36: female ag2a income 2880740 172689179 172624361
#> 37: f ag2a income 2880740 172689179 172624361
#> 38: female age_group3 income 2238777 131646006 131655130
#> 39: f age_group3 income 2238777 131646006 131655130
#> 40: female age_group4 income 1155033 71629290 71636106
#> 41: f age_group4 income 1155033 71629290 71636106
#> 42: female age_group5 income 455017 26070549 26135988
#> 43: f age_group5 income 455017 26070549 26135988
#> 44: female age_group6 income 27979 1380377 1421311
#> 45: f age_group6 income 27979 1380377 1421311
#> sex age vname uws ws pws
tab$numtab("savings")
#> sex age vname uws ws pws
#> <char> <char> <char> <num> <num> <num>
#> 1: Total Total savings 2273532 136580295 136580287.1
#> 2: Total age_group1 savings 982386 58709757 58713161.9
#> 3: Total ag1a savings 982386 58709757 58713161.9
#> 4: Total age_group2 savings 552336 33059645 33050718.2
#> 5: Total ag2a savings 552336 33059645 33050718.2
#> 6: Total age_group3 savings 437101 26752461 26747416.2
#> 7: Total age_group4 savings 214661 12733051 12730964.4
#> 8: Total age_group5 savings 80451 4955091 4953802.8
#> 9: Total age_group6 savings 6597 370290 368048.3
#> 10: male Total savings 1159816 70657333 70655693.1
#> 11: m Total savings 1159816 70657333 70655693.1
#> 12: male age_group1 savings 517660 31159350 31155059.7
#> 13: m age_group1 savings 517660 31159350 31155059.7
#> 14: male ag1a savings 517660 31159350 31155059.7
#> 15: m ag1a savings 517660 31159350 31155059.7
#> 16: male age_group2 savings 280923 17014658 17025138.7
#> 17: m age_group2 savings 280923 17014658 17025138.7
#> 18: male ag2a savings 280923 17014658 17025138.7
#> 19: m ag2a savings 280923 17014658 17025138.7
#> 20: male age_group3 savings 214970 13514290 13511055.9
#> 21: m age_group3 savings 214970 13514290 13511055.9
#> 22: male age_group4 savings 99420 5885043 5890217.2
#> 23: m age_group4 savings 99420 5885043 5890217.2
#> 24: male age_group5 savings 43233 2838809 2839759.3
#> 25: m age_group5 savings 43233 2838809 2839759.3
#> 26: male age_group6 savings 3610 245183 245585.7
#> 27: m age_group6 savings 3610 245183 245585.7
#> 28: female Total savings 1113716 65922962 65916769.6
#> 29: f Total savings 1113716 65922962 65916769.6
#> 30: female age_group1 savings 464726 27550407 27551599.1
#> 31: f age_group1 savings 464726 27550407 27551599.1
#> 32: female ag1a savings 464726 27550407 27551599.1
#> 33: f ag1a savings 464726 27550407 27551599.1
#> 34: female age_group2 savings 271413 16044987 16040905.0
#> 35: f age_group2 savings 271413 16044987 16040905.0
#> 36: female ag2a savings 271413 16044987 16040905.0
#> 37: f ag2a savings 271413 16044987 16040905.0
#> 38: female age_group3 savings 222131 13238171 13237410.8
#> 39: f age_group3 savings 222131 13238171 13237410.8
#> 40: female age_group4 savings 115241 6848008 6848459.8
#> 41: f age_group4 savings 115241 6848008 6848459.8
#> 42: female age_group5 savings 37218 2116282 2122532.4
#> 43: f age_group5 savings 37218 2116282 2122532.4
#> 44: female age_group6 savings 2987 125107 126806.7
#> 45: f age_group6 savings 2987 125107 126806.7
#> sex age vname uws ws pws
# results can be reset, too
tab$reset_cntvars(v = "cnt_males")
# we can then set other parameters and perturb again
tab$params_cnts_set(val = p_cnts1, v = "cnt_males")
#> --> setting perturbation parameters for variable 'cnt_males'
tab$perturb(v = "cnt_males")
#> Count variable 'cnt_males' was perturbed.
# write results to a .csv file
tab$freqtab(
v = c("total", "cnt_males"),
path = file.path(tempdir(), "outtab.csv")
)
#> File '/tmp/RtmpHZpsuV/outtab.csv' successfully written to disk.
#> NULL
# show a table containing weighted and unweighted results
tab$freqtab(v = c("total", "cnt_males"))
#> sex age vname uwc wc puwc pwc
#> <char> <char> <char> <num> <num> <num> <num>
#> 1: Total Total total 4580 274735 4579 274675.0142
#> 2: Total age_group1 total 1969 117619 1969 117619.0000
#> 3: Total ag1a total 1969 117619 1969 117619.0000
#> 4: Total age_group2 total 1143 68298 1144 68357.7533
#> 5: Total ag2a total 1143 68298 1144 68357.7533
#> 6: Total age_group3 total 864 52685 863 52624.0220
#> 7: Total age_group4 total 423 25052 422 24992.7754
#> 8: Total age_group5 total 168 10362 168 10362.0000
#> 9: Total age_group6 total 13 719 12 663.6923
#> 10: male Total total 2296 139256 2297 139316.6516
#> 11: m Total total 2296 139256 2297 139316.6516
#> 12: male age_group1 total 1015 60996 1015 60996.0000
#> 13: m age_group1 total 1015 60996 1015 60996.0000
#> 14: male ag1a total 1015 60996 1015 60996.0000
#> 15: m ag1a total 1015 60996 1015 60996.0000
#> 16: male age_group2 total 571 34091 572 34150.7040
#> 17: m age_group2 total 571 34091 572 34150.7040
#> 18: male ag2a total 571 34091 572 34150.7040
#> 19: m ag2a total 571 34091 572 34150.7040
#> 20: male age_group3 total 424 26747 425 26810.0825
#> 21: m age_group3 total 424 26747 425 26810.0825
#> 22: male age_group4 total 195 11459 194 11400.2359
#> 23: m age_group4 total 195 11459 194 11400.2359
#> 24: male age_group5 total 84 5548 84 5548.0000
#> 25: m age_group5 total 84 5548 84 5548.0000
#> 26: male age_group6 total 7 415 7 415.0000
#> 27: m age_group6 total 7 415 7 415.0000
#> 28: female Total total 2284 135479 2283 135419.6835
#> 29: f Total total 2284 135479 2283 135419.6835
#> 30: female age_group1 total 954 56623 953 56563.6468
#> 31: f age_group1 total 954 56623 953 56563.6468
#> 32: female ag1a total 954 56623 953 56563.6468
#> 33: f ag1a total 954 56623 953 56563.6468
#> 34: female age_group2 total 572 34207 570 34087.3951
#> 35: f age_group2 total 572 34207 570 34087.3951
#> 36: female ag2a total 572 34207 570 34087.3951
#> 37: f ag2a total 572 34207 570 34087.3951
#> 38: female age_group3 total 440 25938 440 25938.0000
#> 39: f age_group3 total 440 25938 440 25938.0000
#> 40: female age_group4 total 228 13593 226 13473.7632
#> 41: f age_group4 total 228 13593 226 13473.7632
#> 42: female age_group5 total 84 4814 85 4871.3095
#> 43: f age_group5 total 84 4814 85 4871.3095
#> 44: female age_group6 total 6 304 7 354.6667
#> 45: f age_group6 total 6 304 7 354.6667
#> 46: Total Total cnt_males 2296 139256 2297 139316.6516
#> 47: Total age_group1 cnt_males 1015 60996 1015 60996.0000
#> 48: Total ag1a cnt_males 1015 60996 1015 60996.0000
#> 49: Total age_group2 cnt_males 571 34091 572 34150.7040
#> 50: Total ag2a cnt_males 571 34091 572 34150.7040
#> 51: Total age_group3 cnt_males 424 26747 425 26810.0825
#> 52: Total age_group4 cnt_males 195 11459 194 11400.2359
#> 53: Total age_group5 cnt_males 84 5548 84 5548.0000
#> 54: Total age_group6 cnt_males 7 415 7 415.0000
#> 55: male Total cnt_males 2296 139256 2297 139316.6516
#> 56: m Total cnt_males 2296 139256 2297 139316.6516
#> 57: male age_group1 cnt_males 1015 60996 1015 60996.0000
#> 58: m age_group1 cnt_males 1015 60996 1015 60996.0000
#> 59: male ag1a cnt_males 1015 60996 1015 60996.0000
#> 60: m ag1a cnt_males 1015 60996 1015 60996.0000
#> 61: male age_group2 cnt_males 571 34091 572 34150.7040
#> 62: m age_group2 cnt_males 571 34091 572 34150.7040
#> 63: male ag2a cnt_males 571 34091 572 34150.7040
#> 64: m ag2a cnt_males 571 34091 572 34150.7040
#> 65: male age_group3 cnt_males 424 26747 425 26810.0825
#> 66: m age_group3 cnt_males 424 26747 425 26810.0825
#> 67: male age_group4 cnt_males 195 11459 194 11400.2359
#> 68: m age_group4 cnt_males 195 11459 194 11400.2359
#> 69: male age_group5 cnt_males 84 5548 84 5548.0000
#> 70: m age_group5 cnt_males 84 5548 84 5548.0000
#> 71: male age_group6 cnt_males 7 415 7 415.0000
#> 72: m age_group6 cnt_males 7 415 7 415.0000
#> 73: female Total cnt_males 0 0 0 0.0000
#> 74: f Total cnt_males 0 0 0 0.0000
#> 75: female age_group1 cnt_males 0 0 0 0.0000
#> 76: f age_group1 cnt_males 0 0 0 0.0000
#> 77: female ag1a cnt_males 0 0 0 0.0000
#> 78: f ag1a cnt_males 0 0 0 0.0000
#> 79: female age_group2 cnt_males 0 0 0 0.0000
#> 80: f age_group2 cnt_males 0 0 0 0.0000
#> 81: female ag2a cnt_males 0 0 0 0.0000
#> 82: f ag2a cnt_males 0 0 0 0.0000
#> 83: female age_group3 cnt_males 0 0 0 0.0000
#> 84: f age_group3 cnt_males 0 0 0 0.0000
#> 85: female age_group4 cnt_males 0 0 0 0.0000
#> 86: f age_group4 cnt_males 0 0 0 0.0000
#> 87: female age_group5 cnt_males 0 0 0 0.0000
#> 88: f age_group5 cnt_males 0 0 0 0.0000
#> 89: female age_group6 cnt_males 0 0 0 0.0000
#> 90: f age_group6 cnt_males 0 0 0 0.0000
#> sex age vname uwc wc puwc pwc
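# A hedged side note (not package output): the weighted perturbed counts (pwc)
# printed above appear to equal wc * puwc / uwc. This is an observation from
# the printed values, not a documented formula. The sketch checks it on two
# rows copied from the table; `chk` and `pwc_check` are hypothetical names.
chk <- data.frame(
  uwc = c(4580, 13),
  wc = c(274735, 719),
  puwc = c(4579, 12),
  pwc = c(274675.0142, 663.6923)
)
# rescale the weighted count by the ratio of perturbed to original counts
chk$pwc_check <- chk$wc * chk$puwc / chk$uwc
all.equal(chk$pwc, chk$pwc_check, tolerance = 1e-6)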
# utility measures for a count variable
tab$measures_cnts(v = "total", exclude_zeros = TRUE)
#> $overview
#> noise cnt pct
#> <fctr> <int> <num>
#> 1: -1 14 0.3111111
#> 2: 0 13 0.2888889
#> 3: 1 12 0.2666667
#> 4: 2 6 0.1333333
#>
#> $measures
#> what d1 d2 d3
#> <char> <num> <num> <num>
#> 1: Min 0.000 0.000 0.000
#> 2: Q10 0.000 0.000 0.000
#> 3: Q20 0.000 0.000 0.000
#> 4: Q30 1.000 0.000 0.008
#> 5: Q40 1.000 0.001 0.013
#> 6: Mean 0.844 0.011 0.030
#> 7: Median 1.000 0.001 0.016
#> 8: Q60 1.000 0.002 0.021
#> 9: Q70 1.000 0.002 0.024
#> 10: Q80 1.000 0.004 0.042
#> 11: Q90 2.000 0.011 0.062
#> 12: Q95 2.000 0.064 0.126
#> 13: Q99 2.000 0.167 0.196
#> 14: Max 2.000 0.167 0.196
#>
#> $cumdistr_d1
#> cat cnt pct
#> <char> <int> <num>
#> 1: 0 13 0.2888889
#> 2: 1 39 0.8666667
#> 3: 2 45 1.0000000
#>
#> $cumdistr_d2
#> cat cnt pct
#> <char> <int> <num>
#> 1: [0,0.02] 42 0.9333333
#> 2: (0.02,0.05] 42 0.9333333
#> 3: (0.05,0.1] 43 0.9555556
#> 4: (0.1,0.2] 45 1.0000000
#> 5: (0.2,0.3] 45 1.0000000
#> 6: (0.3,0.4] 45 1.0000000
#> 7: (0.4,0.5] 45 1.0000000
#> 8: (0.5,Inf] 45 1.0000000
#>
#> $cumdistr_d3
#> cat cnt pct
#> <char> <int> <num>
#> 1: [0,0.02] 25 0.5555556
#> 2: (0.02,0.05] 38 0.8444444
#> 3: (0.05,0.1] 42 0.9333333
#> 4: (0.1,0.2] 45 1.0000000
#> 5: (0.2,0.3] 45 1.0000000
#> 6: (0.3,0.4] 45 1.0000000
#> 7: (0.4,0.5] 45 1.0000000
#> 8: (0.5,Inf] 45 1.0000000
#>
#> $false_zero
#> [1] 0
#>
#> $false_nonzero
#> [1] 0
#>
#> $exclude_zeros
#> [1] TRUE
#>
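# A minimal sketch, assuming d1 is the absolute distance, d2 the relative
# absolute distance and d3 the absolute distance between the square roots of
# the original and perturbed counts. This reading matches the values printed
# above but is an assumption here. `orig` and `pert` are hypothetical names;
# the values are unweighted counts (uwc, puwc) copied from three rows of the
# frequency table shown earlier.
orig <- c(4580, 572, 6)
pert <- c(4579, 570, 7)
d1 <- abs(pert - orig)
d2 <- d1 / orig # undefined for zero cells, hence exclude_zeros = TRUE
d3 <- abs(sqrt(pert) - sqrt(orig))
round(cbind(d1, d2, d3), 3)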
# modifications for perturbed count variables
tab$mod_cnts()
#> sex age row_nr pert ckey countvar
#> <char> <char> <num> <int> <num> <char>
#> 1: Total Total 14 -1 0.2095876 total
#> 2: Total age_group1 15 0 0.4720923 total
#> 3: Total ag1a 15 0 0.4720923 total
#> 4: Total age_group2 16 1 0.7101392 total
#> 5: Total ag2a 16 1 0.7101392 total
#> ---
#> 131: f age_group4 -1 0 0.0000000 cnt_males
#> 132: female age_group5 -1 0 0.0000000 cnt_males
#> 133: f age_group5 -1 0 0.0000000 cnt_males
#> 134: female age_group6 -1 0 0.0000000 cnt_males
#> 135: f age_group6 -1 0 0.0000000 cnt_males
# display a summary of utility measures
tab$summary()
#> ┌──────────────────────────────────────────────┐
#> │Utility measures for perturbed count variables│
#> └──────────────────────────────────────────────┘
#> ── Distribution statistics of perturbations ────────────────────────────────────
#> countvar Min Q10 Q20 Q30 Q40 Mean Median Q60 Q70 Q80
#> <char> <num> <num> <num> <num> <num> <num> <num> <num> <num> <num>
#> 1: total -2 -2 -1 -1 -0.4 -0.222 0 0 0.8 1
#> 2: cnt_highincome -2 -2 -1 0 0.0 0.000 0 0 0.0 1
#> 3: cnt_males -1 0 0 0 0.0 0.200 0 0 0.0 1
#> Q90 Q95 Q99 Max
#> <num> <num> <num> <num>
#> 1: 1.0 1 1 1
#> 2: 1.6 2 4 4
#> 3: 1.0 1 1 1
#>
#> ── Distance-based measures ─────────────────────────────────────────────────────
#> ✔ Variable: 'total'
#>
#> what d1 d2 d3
#> <char> <num> <num> <num>
#> 1: Min 0.000 0.000 0.000
#> 2: Q10 0.000 0.000 0.000
#> 3: Q20 0.000 0.000 0.000
#> 4: Q30 1.000 0.000 0.008
#> 5: Q40 1.000 0.001 0.013
#> 6: Mean 0.844 0.011 0.030
#> 7: Median 1.000 0.001 0.016
#> 8: Q60 1.000 0.002 0.021
#> 9: Q70 1.000 0.002 0.024
#> 10: Q80 1.000 0.004 0.042
#> 11: Q90 2.000 0.011 0.062
#> 12: Q95 2.000 0.064 0.126
#> 13: Q99 2.000 0.167 0.196
#> 14: Max 2.000 0.167 0.196
#>
#> ✔ Variable: 'cnt_males'
#>
#> what d1 d2 d3
#> <char> <num> <num> <num>
#> 1: Min 0.000 0.000 0.000
#> 2: Q10 0.000 0.000 0.000
#> 3: Q20 0.000 0.000 0.000
#> 4: Q30 0.000 0.000 0.000
#> 5: Q40 0.000 0.000 0.000
#> 6: Mean 0.556 0.001 0.012
#> 7: Median 1.000 0.000 0.010
#> 8: Q60 1.000 0.002 0.021
#> 9: Q70 1.000 0.002 0.021
#> 10: Q80 1.000 0.002 0.024
#> 11: Q90 1.000 0.003 0.029
#> 12: Q95 1.000 0.005 0.036
#> 13: Q99 1.000 0.005 0.036
#> 14: Max 1.000 0.005 0.036
#>
#> ✔ Variable: 'cnt_highincome'
#>
#> what d1 d2 d3
#> <char> <num> <num> <num>
#> 1: Min 0.0 0.000 0.000
#> 2: Q10 0.0 0.000 0.000
#> 3: Q20 0.0 0.000 0.000
#> 4: Q30 0.0 0.000 0.000
#> 5: Q40 0.6 0.003 0.020
#> 6: Mean 1.0 0.035 0.082
#> 7: Median 1.0 0.010 0.053
#> 8: Q60 1.0 0.011 0.062
#> 9: Q70 1.3 0.030 0.113
#> 10: Q80 2.0 0.049 0.124
#> 11: Q90 2.0 0.143 0.183
#> 12: Q95 2.1 0.149 0.288
#> 13: Q99 4.0 0.267 0.486
#> 14: Max 4.0 0.267 0.486
#>
#> ┌──────────────────────────────────────────────────┐
#> │Utility measures for perturbed numerical variables│
#> └──────────────────────────────────────────────────┘
#> ── Distribution statistics of perturbations ────────────────────────────────────
#> Warning: no non-missing arguments to min; returning Inf
#> Warning: no non-missing arguments to max; returning -Inf
#> Warning: no non-missing arguments to min; returning Inf
#> Warning: no non-missing arguments to max; returning -Inf
#> vname Min Q10 Q20 Q30 Q40 Mean
#> <char> <num> <num> <num> <num> <num> <num>
#> 1: expend Inf NA NA NA NA NaN
#> 2: income -102810.592 -64817.625 -48041.257 -28966.707 -17335.115 1912.673
#> 3: savings -8926.759 -4742.982 -4123.655 -3234.077 -1639.889 -51.239
#> 4: mixed Inf NA NA NA NA NaN
#> Median Q60 Q70 Q80 Q90 Q95 Q99
#> <num> <num> <num> <num> <num> <num> <num>
#> 1: NA NA NA NA NA NA NA
#> 2: 5084.576 7839.162 28521.323 44304.255 65438.643 106237.09 106237.09
#> 3: -7.858 651.200 1192.077 3404.915 6250.378 10480.73 10480.73
#> 4: NA NA NA NA NA NA NA
#> Max
#> <num>
#> 1: -Inf
#> 2: 106237.09
#> 3: 10480.73
#> 4: -Inf
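# Note on the Inf / -Inf / NaN rows above: in base R, min() and max() of a
# zero-length vector return Inf and -Inf with a warning, and mean() returns
# NaN. Presumably 'expend' and 'mixed' had not been perturbed at this point
# (an assumption), so their summaries are computed on empty input.
# `no_vals` is a hypothetical stand-in for such an empty vector.
no_vals <- numeric(0)
suppressWarnings(
  c(min = min(no_vals), max = max(no_vals), mean = mean(no_vals))
)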
# }