R6 Class defining statistical tables that can be perturbed

This class allows defining statistical tables and perturbing both count and numerical variables.

ck_setup(x, rkey, dims, w = NULL, countvars = NULL, numvars = NULL)

Arguments

x: an object coercible to a data.frame
rkey: either a column name within x referring to a variable containing record keys or a single integer(ish) number > 5 that refers to the number of digits for record keys that will be generated internally.
dims: a list containing slots for each variable that should be tabulated. Each slot should be created/modified using sdcHierarchies::hier_create(), sdcHierarchies::hier_add() and other functionality from package sdcHierarchies.
w: (character) a scalar character referring to a variable in x holding sampling weights. If w is NULL (the default), all weights are assumed to be 1.
countvars: (character) an optional vector containing names of binary (0/1 coded) variables within x that should be included in the problem instance. These variables can later be perturbed.
numvars: (character) an optional vector of numerical variables that can later be tabulated.

Value

A new cellkey_obj object. Such objects (internally) contain the fully computed statistical tables given the input microdata (x), the hierarchical definitions (dims) as well as the remaining inputs. Intermediate results are stored internally and can only be modified or accessed via the exported public methods described below.

Details

Such objects are typically generated using ck_setup().

Methods

Method `new()`

Create a new table instance

Usage

ck_class$new(x, rkey, dims, w = NULL, countvars = NULL, numvars = NULL)

Arguments

x: an object coercible to a data.frame
rkey: either a column name within x referring to a variable containing record keys or a single integer(ish) number > 5 that refers to the number of digits for record keys that will be generated internally.
dims: a list containing slots for each variable that should be tabulated. Each slot should be created/modified using sdcHierarchies::hier_create(), sdcHierarchies::hier_add() and other functionality from package sdcHierarchies.
w: (character) a scalar character referring to a variable in x holding sampling weights. If w is NULL (the default), all weights are assumed to be 1.
countvars: (character) an optional vector containing names of binary (0/1 coded) variables within x that should be included in the problem instance. These variables can later be perturbed.
numvars: (character) an optional vector of numerical variables that can later be tabulated.

Returns

A new cellkey_obj object.

Method `perturb()`

Perturb a count- or magnitude variable

Usage

ck_class$perturb(v)

Arguments

v: name(s) of count- or magnitude variables that should be perturbed.

Returns

A modified cellkey_obj object in which private slots were updated for side-effects. Updated data can be accessed using other exported methods like $freqtab() or $numtab().

Method `freqtab()`

Extract results from already perturbed count variables as a data.table

Usage

ck_class$freqtab(v = NULL, path = NULL)

Arguments

v: a vector of variable names for count variables. If NULL (the default), the results are returned for all available count variables. For variables that have not yet been perturbed, columns puwc and pwc are filled with NA.
path: if not NULL, a scalar character defining a (relative or absolute) path to which the result table should be written. A csv file will be generated and, if specified, path must have ".csv" as file ending.

Returns

This method returns a data.table containing all combinations of the dimensional variables in the first n columns. Additionally, the following columns are shown:

vname: name of the perturbed variable
uwc: unweighted counts
wc: weighted counts
puwc: perturbed unweighted counts or NA if vname was not yet perturbed
pwc: perturbed weighted counts or NA if vname was not yet perturbed

Method `numtab()`

Extract results from already perturbed continuous variables as a data.table.

Usage

ck_class$numtab(v = NULL, mean_before_sum = FALSE, path = NULL)

Arguments

v

a vector of variable names of continuous variables. If NULL (the default), the results are returned for all available numeric variables.

mean_before_sum

(logical); if TRUE, the perturbed values are adjusted by a factor (n + p) / n with

n: the original weighted cell value
p: the perturbed cell value

This makes sense if the accuracy of the variable mean is considered more important than the accuracy of sums of the variable. The default value is FALSE (no adjustment is done).

path

if not NULL, a scalar character defining a (relative or absolute) path to which the result table should be written. A csv file will be generated and, if specified, path must have ".csv" as file ending.

Returns

This method returns a data.table containing all combinations of the dimensional variables in the first n columns. Additionally, the following columns are shown:

vname: name of the perturbed variable
uws: unweighted sum of the given variable
ws: weighted cellsum
pws: perturbed weighted sum of the given cell or NA if vname has not been perturbed

Method `measures_cnts()`

Utility measures for perturbed count variables

Usage

ck_class$measures_cnts(v, exclude_zeros = TRUE)

Arguments

v: name of a count variable for which utility measures should be computed.
exclude_zeros: (logical) if TRUE (the default), empty (zero) cells in the original values are excluded when computing distance measures

Returns

This method returns a list containing a set of utility measures based on some distance functions. For a detailed description of the computed measures, see ck_cnt_measures()

Method `measures_nums()`

Utility measures for continuous variables (not yet implemented)

Usage

ck_class$measures_nums(v)

Arguments

v: name of a continuous variable for which utility measures should be computed.

Returns

For now, an empty list; in future versions of the package, the method will return utility measures for perturbed magnitude tables.

Method `allvars()`

Names of variables that can be perturbed / tabulated

Usage

ck_class$allvars()

Returns

returns a list with the following two elements:

cntvars: character vector with names of available count variables for perturbation
numvars: character vector with names of available numerical variables for perturbation

Method `cntvars()`

Names of count variables that can be perturbed

Usage

ck_class$cntvars()

Returns

a character vector containing variable names

Method `numvars()`

Names of continuous variables that can be perturbed

Usage

ck_class$numvars()

Returns

a character vector containing variable names

Method `hierarchy_info()`

Information about hierarchies

Usage

ck_class$hierarchy_info()

Returns

a list (for each dimensional variable) with information on the hierarchies. This may be used to restrict output tables to specific levels or codes. Each list element is a data.table containing the following variables:

code: the name of a code within the hierarchy
level: number defining the level of the code; the higher the number, the lower the code sits in the hierarchy, with 1 being the overall total
is_leaf: if TRUE, this code is a leaf node which means no other codes contribute to it
parent: name of the parent code
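
The following base-R sketch illustrates how such a table might be used to restrict output to specific levels. It rebuilds the age hierarchy printed in the Examples section as a plain data.frame (the method itself returns a data.table) and keeps only codes up to level 2; the object names age_info and keep_codes are hypothetical.

```r
# Hierarchy information for 'age', copied from the example output
# (a plain data.frame stands in for the data.table returned by the method)
age_info <- data.frame(
  code = c("Total", "age_group1", "ag1a", "age_group2", "ag2a",
           "age_group3", "age_group4", "age_group5", "age_group6"),
  level = c(1, 2, 3, 2, 3, 2, 2, 2, 2),
  is_leaf = c(FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, TRUE),
  parent = c("Total", "Total", "age_group1", "Total", "age_group2",
             "Total", "Total", "Total", "Total"),
  stringsAsFactors = FALSE
)

# keep the overall total and the first breakdown level only,
# which drops the bogus sub-codes 'ag1a' and 'ag2a'
keep_codes <- age_info$code[age_info$level <= 2]
print(keep_codes)
```

A result table could then be filtered with something like res[age %in% keep_codes], assuming res is a table returned by $freqtab() or $numtab().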

Method `mod_cnts()`

Modifications applied to count variables

Usage

ck_class$mod_cnts()

Returns

a data.table containing modifications applied to count variables

Method `mod_nums()`

Modifications applied to numerical variables

Usage

ck_class$mod_nums()

Returns

a data.table containing modifications applied to numerical variables

Method `supp_freq()`

Identify sensitive cells based on minimum frequency rule

Usage

ck_class$supp_freq(v, n, weighted = TRUE)

Arguments

v: a single variable name of a continuous variable (see method numvars())
n: a number defining the threshold. All cells <= n are considered as unsafe.
weighted: if TRUE, the weighted number of contributors to a cell are compared to the threshold specified in n (default); else the unweighted number of contributors is used.

Returns

A modified cellkey_obj object in which private slots were updated for side-effects. These updated values are used by other methods (e.g $perturb()).
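
As a rough illustration of the minimum frequency rule, the base-R sketch below flags cells whose number of contributors is at or below a threshold. The cell labels, counts and the helper flag_min_freq() are hypothetical and do not reflect the internal implementation of the method.

```r
# Hypothetical contributor counts per cell (not taken from the package)
cells <- data.frame(
  cell = c("A", "B", "C", "D"),
  uwc = c(1, 3, 12, 40),      # unweighted number of contributors
  wc = c(25, 90, 310, 1200)   # weighted number of contributors
)

# a cell is flagged as unsafe if its count is <= n
flag_min_freq <- function(counts, n) {
  counts <= n
}

cells$unsafe_unweighted <- flag_min_freq(cells$uwc, n = 3)
cells$unsafe_weighted <- flag_min_freq(cells$wc, n = 100)
print(cells)
```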

Method `supp_val()`

Identify sensitive cells based on weighted or unweighted cell value

Usage

ck_class$supp_val(v, n, weighted = TRUE)

Arguments

v: a single variable name of a continuous variable (see method numvars())
n: a number defining the threshold. All cells <= n are considered as unsafe.
weighted: if TRUE, the weighted cell value of variable v is compared to the threshold specified in n (default); else the unweighted number is used.

Returns

A modified cellkey_obj object in which private slots were updated for side-effects. These updated values are used by other methods (e.g $perturb()).

Method `supp_cells()`

Identify sensitive cells based on their names

Usage

ck_class$supp_cells(v, inp)

Arguments

v: a single variable name of a continuous variable (see method numvars())
inp: a data.frame where each column represents a dimensional variable. Each row of this input is used to identify the cells to be marked as sensitive. NA values are allowed and match any code of the respective dimensional variable.

Returns

A modified cellkey_obj object in which private slots were updated for side-effects. These updated values are used by other methods (e.g $perturb()).
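
The base-R sketch below shows one way the matching described for inp could work, assuming that an NA in a column acts as a wildcard for that dimension. The helper match_cells() and the small cell grid are hypothetical; the method's actual matching logic is not shown on this page.

```r
# Hypothetical grid of table cells for two dimensional variables
tab_cells <- expand.grid(
  sex = c("Total", "male", "female"),
  age = c("Total", "age_group1", "age_group2"),
  stringsAsFactors = FALSE
)

# Row 1 targets exactly one cell; row 2 uses NA for 'sex',
# which is assumed to match every code of that variable
inp <- data.frame(
  sex = c("female", NA),
  age = c("age_group1", "age_group2"),
  stringsAsFactors = FALSE
)

match_cells <- function(cells, inp) {
  hit <- rep(FALSE, nrow(cells))
  for (i in seq_len(nrow(inp))) {
    row_hit <- rep(TRUE, nrow(cells))
    for (col in names(inp)) {
      val <- inp[[col]][i]
      # NA means "no restriction" on this dimension
      if (!is.na(val)) {
        row_hit <- row_hit & cells[[col]] == val
      }
    }
    hit <- hit | row_hit
  }
  hit
}

tab_cells$sensitive <- match_cells(tab_cells, inp)
print(tab_cells[tab_cells$sensitive, ])
```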

Method `supp_p()`

Identify sensitive cells based on the p%-rule. Please note that this rule can only be applied to positive-only variables.

Usage

ck_class$supp_p(v, p)

Arguments

v: a single variable name of a continuous variable (see method numvars())
p: a number defining a percentage between 1 and 99.

Returns

A modified cellkey_obj object in which private slots were updated for side-effects. These updated values are used by other methods (e.g $perturb()).
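
For orientation, the base-R sketch below applies the common textbook formulation of the p%-rule to the contributions of a single cell: the cell is treated as unsafe if the total minus the two largest contributions is smaller than p% of the largest contribution. Whether the method uses exactly this inequality (or weighted contributions) is not stated here, so the helper is_unsafe_p() is an assumption for illustration only.

```r
# Textbook p%-rule for one cell (hypothetical helper, not the package's code)
is_unsafe_p <- function(contrib, p) {
  stopifnot(length(contrib) >= 1, all(contrib >= 0), p >= 1, p <= 99)
  s <- sort(contrib, decreasing = TRUE)
  x1 <- s[1]                                  # largest contributor
  x2 <- if (length(s) >= 2) s[2] else 0       # second largest contributor
  remainder <- sum(s) - x1 - x2               # all other contributors
  remainder < (p / 100) * x1
}

# dominated cell: remainder 5 is below 10% of 100
print(is_unsafe_p(c(100, 50, 5), p = 10))
# well-spread cell: remainder 50 is not below 10% of 100
print(is_unsafe_p(c(100, 50, 30, 20), p = 10))
```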

Method `supp_pq()`

Identify sensitive cells based on the pq-rule. Please note that this rule can only be applied to positive-only variables.

Usage

ck_class$supp_pq(v, p, q)

Arguments

v: a single variable name of a continuous variable (see method numvars())
p: a number defining a percentage between 1 and 99.
q: a number defining a percentage between 1 and 99. This value must be larger than p.

Returns

A modified cellkey_obj object in which private slots were updated for side-effects. These updated values are used by other methods (e.g $perturb()).

Method `supp_nk()`

Identify sensitive cells based on the nk-dominance rule. Please note that this rule can only be applied to positive-only variables.

Usage

ck_class$supp_nk(v, n, k)

Arguments

v: a single variable name of a continuous variable (see method numvars())
n: an integerish number >= 2
k: a number defining a percentage between 1 and 99. Any cell to which the top n contributors contribute more than k% is considered unsafe.

Returns

A modified cellkey_obj object in which private slots were updated for side-effects. These updated values are used by other methods (e.g $perturb()).
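
The dominance check described for k can be sketched in base R as follows: the sum of the n largest contributions is compared against k% of the cell total. The helper is_unsafe_nk() and the example contributions are hypothetical, and the sketch ignores sampling weights.

```r
# (n, k)-dominance rule for one cell (hypothetical helper)
is_unsafe_nk <- function(contrib, n, k) {
  stopifnot(length(contrib) >= 1, all(contrib >= 0),
            n >= 2, k >= 1, k <= 99)
  total <- sum(contrib)
  # sum of the n largest contributions (or all, if fewer than n exist)
  top_n <- sum(head(sort(contrib, decreasing = TRUE), n))
  top_n > (k / 100) * total
}

# top 2 contribute 150 of 155, i.e. more than 85%
print(is_unsafe_nk(c(100, 50, 5), n = 2, k = 85))
# top 2 contribute 70 of 100, i.e. not more than 85%
print(is_unsafe_nk(c(40, 30, 20, 10), n = 2, k = 85))
```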

Method `params_cnts_get()`

Return perturbation parameters of count variables

Usage

ck_class$params_cnts_get()

Returns

a named list in which each list-element contains the active perturbation parameters for the specific count variable defined by the list-name.

Method `params_cnts_set()`

Set perturbation parameters for count variables

Usage

ck_class$params_cnts_set(val, v = NULL)

Arguments

val: a perturbation object created with ck_params_cnts()
v: a character vector (or NULL). If NULL (the default), the perturbation parameters provided in val are set for all count variables; otherwise one may specify the names of the count variables for which the parameters should be set.

Returns

A modified cellkey_obj object in which private slots were updated for side-effects. These updated values are used by other methods (e.g $perturb()).

Method `reset_cntvars()`

Reset results and parameters for already perturbed count variables

Usage

ck_class$reset_cntvars(v = NULL)

Arguments

v: if v equals NULL (the default), the results are reset for all perturbed count variables; otherwise it is possible to specify the names of already perturbed count variables.

Returns

A modified cellkey_obj object in which private slots were updated for side-effects. These updated values are used by other methods (e.g $perturb() or $freqtab()).

Method `reset_numvars()`

Reset results and parameters for already perturbed numerical variables

Usage

ck_class$reset_numvars(v = NULL)

Arguments

v: if v equals NULL (the default), the results are reset for all perturbed numerical variables; otherwise it is possible to specify the names of already perturbed continuous variables.

Returns

A modified cellkey_obj object in which private slots were updated for side-effects. These updated values are used by other methods (e.g $perturb() or $numtab()).

Method `reset_allvars()`

Reset results and parameters for all already perturbed variables.

Usage

ck_class$reset_allvars()

Returns

A modified cellkey_obj object in which private slots were updated for side-effects. These updated values are used by other methods (e.g $perturb(), $freqtab() or $numtab()).

Method `params_nums_get()`

Return perturbation parameters of continuous variables

Usage

ck_class$params_nums_get()

Returns

a named list in which each list-element contains the active perturbation parameters for the specific continuous variable defined by the list-name.

Method `params_nums_set()`

Set perturbation parameters for continuous variables

Usage

ck_class$params_nums_set(val, v = NULL)

Arguments

val: a perturbation object created with ck_params_nums()
v: a character vector (or NULL); if NULL (the default), the perturbation parameters provided in val are set for all continuous variables; otherwise one may specify the names of the numeric variables for which the parameters should be set.

Returns

A modified cellkey_obj object in which private slots were updated for side-effects. These updated values are used by other methods (e.g $perturb()).

Method `summary()`

Some aggregated summary statistics about perturbed variables

Usage

ck_class$summary()

Returns

invisible NULL

Method `print()`

Prints information about the current table

Usage

ck_class$print()

Returns

invisible NULL

Examples

# \donttest{
x <- ck_create_testdata()

# create some 0/1 variables that should be perturbed later
x[, cnt_females := ifelse(sex == "male", 0, 1)]
#>       urbrur  roof walls water electcon relat    sex        age hhcivil expend
#>        <int> <int> <int> <int>    <int> <int> <fctr>     <fctr>   <int>  <num>
#>    1:      2     4     3     3        1     1   male age_group3       2   9093
#>    2:      2     4     3     3        1     2 female age_group3       2   2734
#>    3:      2     4     3     3        1     3   male age_group1       1   2652
#>    4:      2     4     3     3        1     3   male age_group1       1   1807
#>    5:      2     4     2     3        1     1   male age_group4       2    671
#>   ---                                                                         
#> 4576:      2     4     3     4        1     2 female age_group3       2   3696
#> 4577:      2     4     3     4        1     3   male age_group1       1    282
#> 4578:      2     4     3     4        1     3   male age_group1       1    840
#> 4579:      2     4     3     4        1     3 female age_group1       1   6258
#> 4580:      2     4     3     4        1     3   male age_group1       1   7019
#>       income savings ori_hid sampling_weight household_weights cnt_females
#>        <num>   <num>   <int>           <int>             <num>       <num>
#>    1:   5780      12       1              64          25.00000           0
#>    2:   2530      28       1              42          25.00000           1
#>    3:   6920     550       1              95          25.00000           0
#>    4:   7960     870       1              82          25.00000           0
#>    5:   9030      20       2              66          16.66667           0
#>   ---                                                                     
#> 4576:   7900     278    1000              67          16.66667           1
#> 4577:   1420     987    1000              51          16.66667           0
#> 4578:   8900     684    1000              92          16.66667           0
#> 4579:   3880     294    1000              40          16.66667           1
#> 4580:   4830     911    1000              39          16.66667           0
x[, cnt_males := ifelse(sex == "male", 1, 0)]
#>       urbrur  roof walls water electcon relat    sex        age hhcivil expend
#>        <int> <int> <int> <int>    <int> <int> <fctr>     <fctr>   <int>  <num>
#>    1:      2     4     3     3        1     1   male age_group3       2   9093
#>    2:      2     4     3     3        1     2 female age_group3       2   2734
#>    3:      2     4     3     3        1     3   male age_group1       1   2652
#>    4:      2     4     3     3        1     3   male age_group1       1   1807
#>    5:      2     4     2     3        1     1   male age_group4       2    671
#>   ---                                                                         
#> 4576:      2     4     3     4        1     2 female age_group3       2   3696
#> 4577:      2     4     3     4        1     3   male age_group1       1    282
#> 4578:      2     4     3     4        1     3   male age_group1       1    840
#> 4579:      2     4     3     4        1     3 female age_group1       1   6258
#> 4580:      2     4     3     4        1     3   male age_group1       1   7019
#>       income savings ori_hid sampling_weight household_weights cnt_females
#>        <num>   <num>   <int>           <int>             <num>       <num>
#>    1:   5780      12       1              64          25.00000           0
#>    2:   2530      28       1              42          25.00000           1
#>    3:   6920     550       1              95          25.00000           0
#>    4:   7960     870       1              82          25.00000           0
#>    5:   9030      20       2              66          16.66667           0
#>   ---                                                                     
#> 4576:   7900     278    1000              67          16.66667           1
#> 4577:   1420     987    1000              51          16.66667           0
#> 4578:   8900     684    1000              92          16.66667           0
#> 4579:   3880     294    1000              40          16.66667           1
#> 4580:   4830     911    1000              39          16.66667           0
#>       cnt_males
#>           <num>
#>    1:         1
#>    2:         0
#>    3:         1
#>    4:         1
#>    5:         1
#>   ---          
#> 4576:         0
#> 4577:         1
#> 4578:         1
#> 4579:         0
#> 4580:         1
x[, cnt_highincome := ifelse(income >= 9000, 1, 0)]
#>       urbrur  roof walls water electcon relat    sex        age hhcivil expend
#>        <int> <int> <int> <int>    <int> <int> <fctr>     <fctr>   <int>  <num>
#>    1:      2     4     3     3        1     1   male age_group3       2   9093
#>    2:      2     4     3     3        1     2 female age_group3       2   2734
#>    3:      2     4     3     3        1     3   male age_group1       1   2652
#>    4:      2     4     3     3        1     3   male age_group1       1   1807
#>    5:      2     4     2     3        1     1   male age_group4       2    671
#>   ---                                                                         
#> 4576:      2     4     3     4        1     2 female age_group3       2   3696
#> 4577:      2     4     3     4        1     3   male age_group1       1    282
#> 4578:      2     4     3     4        1     3   male age_group1       1    840
#> 4579:      2     4     3     4        1     3 female age_group1       1   6258
#> 4580:      2     4     3     4        1     3   male age_group1       1   7019
#>       income savings ori_hid sampling_weight household_weights cnt_females
#>        <num>   <num>   <int>           <int>             <num>       <num>
#>    1:   5780      12       1              64          25.00000           0
#>    2:   2530      28       1              42          25.00000           1
#>    3:   6920     550       1              95          25.00000           0
#>    4:   7960     870       1              82          25.00000           0
#>    5:   9030      20       2              66          16.66667           0
#>   ---                                                                     
#> 4576:   7900     278    1000              67          16.66667           1
#> 4577:   1420     987    1000              51          16.66667           0
#> 4578:   8900     684    1000              92          16.66667           0
#> 4579:   3880     294    1000              40          16.66667           1
#> 4580:   4830     911    1000              39          16.66667           0
#>       cnt_males cnt_highincome
#>           <num>          <num>
#>    1:         1              0
#>    2:         0              0
#>    3:         1              0
#>    4:         1              0
#>    5:         1              1
#>   ---                         
#> 4576:         0              0
#> 4577:         1              0
#> 4578:         1              0
#> 4579:         0              0
#> 4580:         1              0
# a variable with positive and negative contributions
x[, mixed := sample(-10:10, nrow(x), replace = TRUE)]
#>       urbrur  roof walls water electcon relat    sex        age hhcivil expend
#>        <int> <int> <int> <int>    <int> <int> <fctr>     <fctr>   <int>  <num>
#>    1:      2     4     3     3        1     1   male age_group3       2   9093
#>    2:      2     4     3     3        1     2 female age_group3       2   2734
#>    3:      2     4     3     3        1     3   male age_group1       1   2652
#>    4:      2     4     3     3        1     3   male age_group1       1   1807
#>    5:      2     4     2     3        1     1   male age_group4       2    671
#>   ---                                                                         
#> 4576:      2     4     3     4        1     2 female age_group3       2   3696
#> 4577:      2     4     3     4        1     3   male age_group1       1    282
#> 4578:      2     4     3     4        1     3   male age_group1       1    840
#> 4579:      2     4     3     4        1     3 female age_group1       1   6258
#> 4580:      2     4     3     4        1     3   male age_group1       1   7019
#>       income savings ori_hid sampling_weight household_weights cnt_females
#>        <num>   <num>   <int>           <int>             <num>       <num>
#>    1:   5780      12       1              64          25.00000           0
#>    2:   2530      28       1              42          25.00000           1
#>    3:   6920     550       1              95          25.00000           0
#>    4:   7960     870       1              82          25.00000           0
#>    5:   9030      20       2              66          16.66667           0
#>   ---                                                                     
#> 4576:   7900     278    1000              67          16.66667           1
#> 4577:   1420     987    1000              51          16.66667           0
#> 4578:   8900     684    1000              92          16.66667           0
#> 4579:   3880     294    1000              40          16.66667           1
#> 4580:   4830     911    1000              39          16.66667           0
#>       cnt_males cnt_highincome mixed
#>           <num>          <num> <int>
#>    1:         1              0    -6
#>    2:         0              0     2
#>    3:         1              0    -2
#>    4:         1              0     0
#>    5:         1              1     8
#>   ---                               
#> 4576:         0              0    -2
#> 4577:         1              0     5
#> 4578:         1              0    -4
#> 4579:         0              0     7
#> 4580:         1              0    -9

# create record keys
x$rkey <- ck_generate_rkeys(dat = x)

# define required inputs

# hierarchy with some bogus codes
d_sex <- hier_create(root = "Total", nodes = c("male", "female"))
d_sex <- hier_add(d_sex, root = "female", "f")
d_sex <- hier_add(d_sex, root = "male", "m")

d_age <- hier_create(root = "Total", nodes = paste0("age_group", 1:6))
d_age <- hier_add(d_age, root = "age_group1", "ag1a")
d_age <- hier_add(d_age, root = "age_group2", "ag2a")

# define the cell key object
countvars <- c("cnt_females", "cnt_males", "cnt_highincome")
numvars <- c("expend", "income", "savings", "mixed")
tab <- ck_setup(
  x = x,
  rkey = "rkey",
  dims = list(sex = d_sex, age = d_age),
  w = "sampling_weight",
  countvars = countvars,
  numvars = numvars)
#> computing contributing indices | rawdata <--> table; this might take a while

# show some information about this table instance
tab$print() # identical with print(tab)
#> ── Table Information ───────────────────────────────────────────────────────────
#> ✔ 45 cells in 2 dimensions ('sex', 'age')
#> ✔ weights: yes
#> ── Tabulated / Perturbed countvars ─────────────────────────────────────────────
#> ☐ 'total'
#> ☐ 'cnt_females'
#> ☐ 'cnt_males'
#> ☐ 'cnt_highincome'
#> ── Tabulated / Perturbed numvars ───────────────────────────────────────────────
#> ☐ 'expend'
#> ☐ 'income'
#> ☐ 'savings'
#> ☐ 'mixed'

# information about the hierarchies
tab$hierarchy_info()
#> $sex
#>      code level is_leaf parent
#>    <char> <int>  <lgcl> <char>
#> 1:  Total     1   FALSE  Total
#> 2:   male     2   FALSE  Total
#> 3:      m     3    TRUE   male
#> 4: female     2   FALSE  Total
#> 5:      f     3    TRUE female
#> 
#> $age
#>          code level is_leaf     parent
#>        <char> <int>  <lgcl>     <char>
#> 1:      Total     1   FALSE      Total
#> 2: age_group1     2   FALSE      Total
#> 3:       ag1a     3    TRUE age_group1
#> 4: age_group2     2   FALSE      Total
#> 5:       ag2a     3    TRUE age_group2
#> 6: age_group3     2    TRUE      Total
#> 7: age_group4     2    TRUE      Total
#> 8: age_group5     2    TRUE      Total
#> 9: age_group6     2    TRUE      Total
#> 

# which variables have been defined?
tab$allvars()
#> $cntvars
#> [1] "total"          "cnt_females"    "cnt_males"      "cnt_highincome"
#> 
#> $numvars
#> [1] "expend"  "income"  "savings" "mixed"  
#> 

# count variables
tab$cntvars()
#> [1] "total"          "cnt_females"    "cnt_males"      "cnt_highincome"

# continuous variables
tab$numvars()
#> [1] "expend"  "income"  "savings" "mixed"  

# create perturbation parameters for "total" variable and
# write to yaml-file

# create a ptable using functionality from the ptable-pkg
f_yaml <- tempfile(fileext = ".yaml")
p_cnts1 <- ck_params_cnts(
  ptab = ptable::pt_ex_cnts(),
  path = f_yaml)
#> yaml configuration '/tmp/Rtmpu511Tl/file1fa549201ce2.yaml' successfully written.

# read parameters from yaml-file and set them for variable `"total"`
p_cnts1 <- ck_read_yaml(path = f_yaml)

tab$params_cnts_set(val = p_cnts1, v = "total")
#> --> setting perturbation parameters for variable 'total'

# create alternative perturbation parameters by specifying parameters
para2 <- ptable::create_cnt_ptable(
  D = 8, V = 3, js = 2, create = FALSE)

p_cnts2 <- ck_params_cnts(ptab = para2)

# use this ptable for the remaining variables
tab$params_cnts_set(val = p_cnts2, v = countvars)
#> --> setting perturbation parameters for variable 'cnt_females'
#> --> setting perturbation parameters for variable 'cnt_males'
#> --> setting perturbation parameters for variable 'cnt_highincome'

# perturb a variable
tab$perturb(v = "total")
#> Count variable 'total' was perturbed.

# multiple variables can be perturbed as well
tab$perturb(v = c("cnt_males", "cnt_highincome"))
#> Count variable 'cnt_males' was perturbed.
#> Count variable 'cnt_highincome' was perturbed.

# return weighted and unweighted results
tab$freqtab(v = c("total", "cnt_males"))
#>        sex        age     vname   uwc     wc  puwc        pwc
#>     <char>     <char>    <char> <num>  <num> <num>      <num>
#>  1:  Total      Total     total  4580 274707  4582 274826.959
#>  2:  Total age_group1     total  1969 118802  1969 118802.000
#>  3:  Total       ag1a     total  1969 118802  1969 118802.000
#>  4:  Total age_group2     total  1143  68583  1141  68462.995
#>  5:  Total       ag2a     total  1143  68583  1141  68462.995
#>  6:  Total age_group3     total   864  51473   864  51473.000
#>  7:  Total age_group4     total   423  25121   423  25121.000
#>  8:  Total age_group5     total   168   9970   167   9910.655
#>  9:  Total age_group6     total    13    758    13    758.000
#> 10:   male      Total     total  2296 137345  2296 137345.000
#> 11:      m      Total     total  2296 137345  2296 137345.000
#> 12:   male age_group1     total  1015  61334  1014  61273.572
#> 13:      m age_group1     total  1015  61334  1014  61273.572
#> 14:   male       ag1a     total  1015  61334  1014  61273.572
#> 15:      m       ag1a     total  1015  61334  1014  61273.572
#> 16:   male age_group2     total   571  34108   572  34167.734
#> 17:      m age_group2     total   571  34108   572  34167.734
#> 18:   male       ag2a     total   571  34108   572  34167.734
#> 19:      m       ag2a     total   571  34108   572  34167.734
#> 20:   male age_group3     total   424  24983   424  24983.000
#> 21:      m age_group3     total   424  24983   424  24983.000
#> 22:   male age_group4     total   195  11748   196  11808.246
#> 23:      m age_group4     total   195  11748   196  11808.246
#> 24:   male age_group5     total    84   4696    84   4696.000
#> 25:      m age_group5     total    84   4696    84   4696.000
#> 26:   male age_group6     total     7    476     7    476.000
#> 27:      m age_group6     total     7    476     7    476.000
#> 28: female      Total     total  2284 137362  2284 137362.000
#> 29:      f      Total     total  2284 137362  2284 137362.000
#> 30: female age_group1     total   954  57468   953  57407.761
#> 31:      f age_group1     total   954  57468   953  57407.761
#> 32: female       ag1a     total   954  57468   953  57407.761
#> 33:      f       ag1a     total   954  57468   953  57407.761
#> 34: female age_group2     total   572  34475   571  34414.729
#> 35:      f age_group2     total   572  34475   571  34414.729
#> 36: female       ag2a     total   572  34475   571  34414.729
#> 37:      f       ag2a     total   572  34475   571  34414.729
#> 38: female age_group3     total   440  26490   439  26429.795
#> 39:      f age_group3     total   440  26490   439  26429.795
#> 40: female age_group4     total   228  13373   228  13373.000
#> 41:      f age_group4     total   228  13373   228  13373.000
#> 42: female age_group5     total    84   5274    84   5274.000
#> 43:      f age_group5     total    84   5274    84   5274.000
#> 44: female age_group6     total     6    282     7    329.000
#> 45:      f age_group6     total     6    282     7    329.000
#> 46:  Total      Total cnt_males  2296 137345  2296 137345.000
#> 47:  Total age_group1 cnt_males  1015  61334  1014  61273.572
#> 48:  Total       ag1a cnt_males  1015  61334  1014  61273.572
#> 49:  Total age_group2 cnt_males   571  34108   572  34167.734
#> 50:  Total       ag2a cnt_males   571  34108   572  34167.734
#> 51:  Total age_group3 cnt_males   424  24983   424  24983.000
#> 52:  Total age_group4 cnt_males   195  11748   196  11808.246
#> 53:  Total age_group5 cnt_males    84   4696    85   4751.905
#> 54:  Total age_group6 cnt_males     7    476     7    476.000
#> 55:   male      Total cnt_males  2296 137345  2296 137345.000
#> 56:      m      Total cnt_males  2296 137345  2296 137345.000
#> 57:   male age_group1 cnt_males  1015  61334  1014  61273.572
#> 58:      m age_group1 cnt_males  1015  61334  1014  61273.572
#> 59:   male       ag1a cnt_males  1015  61334  1014  61273.572
#> 60:      m       ag1a cnt_males  1015  61334  1014  61273.572
#> 61:   male age_group2 cnt_males   571  34108   572  34167.734
#> 62:      m age_group2 cnt_males   571  34108   572  34167.734
#> 63:   male       ag2a cnt_males   571  34108   572  34167.734
#> 64:      m       ag2a cnt_males   571  34108   572  34167.734
#> 65:   male age_group3 cnt_males   424  24983   424  24983.000
#> 66:      m age_group3 cnt_males   424  24983   424  24983.000
#> 67:   male age_group4 cnt_males   195  11748   196  11808.246
#> 68:      m age_group4 cnt_males   195  11748   196  11808.246
#> 69:   male age_group5 cnt_males    84   4696    85   4751.905
#> 70:      m age_group5 cnt_males    84   4696    85   4751.905
#> 71:   male age_group6 cnt_males     7    476     7    476.000
#> 72:      m age_group6 cnt_males     7    476     7    476.000
#> 73: female      Total cnt_males     0      0     0      0.000
#> 74:      f      Total cnt_males     0      0     0      0.000
#> 75: female age_group1 cnt_males     0      0     0      0.000
#> 76:      f age_group1 cnt_males     0      0     0      0.000
#> 77: female       ag1a cnt_males     0      0     0      0.000
#> 78:      f       ag1a cnt_males     0      0     0      0.000
#> 79: female age_group2 cnt_males     0      0     0      0.000
#> 80:      f age_group2 cnt_males     0      0     0      0.000
#> 81: female       ag2a cnt_males     0      0     0      0.000
#> 82:      f       ag2a cnt_males     0      0     0      0.000
#> 83: female age_group3 cnt_males     0      0     0      0.000
#> 84:      f age_group3 cnt_males     0      0     0      0.000
#> 85: female age_group4 cnt_males     0      0     0      0.000
#> 86:      f age_group4 cnt_males     0      0     0      0.000
#> 87: female age_group5 cnt_males     0      0     0      0.000
#> 88:      f age_group5 cnt_males     0      0     0      0.000
#> 89: female age_group6 cnt_males     0      0     0      0.000
#> 90:      f age_group6 cnt_males     0      0     0      0.000
#>        sex        age     vname   uwc     wc  puwc        pwc

# numerical variables (positive variables using flex-function)
# we also write the config to a yaml file
f_yaml <- tempfile(fileext = ".yaml")

# create a ptable using functionality from the ptable-pkg
# a single ptable for all cells
ptab1 <- ptable::pt_ex_nums(parity = TRUE, separation = FALSE)

# a single ptab for all cells except for very small ones
ptab2 <- ptable::pt_ex_nums(parity = TRUE, separation = TRUE)

# different ptables for cells with even/odd number of contributors
# and very small cells
ptab3 <- ptable::pt_ex_nums(parity = FALSE, separation = TRUE)

p_nums1 <- ck_params_nums(
  ptab = ptab1,
  type = "top_contr",
  top_k = 3,
  mult_params = ck_flexparams(
    fp = 1000,
    p = c(0.30, 0.03),
    epsilon = c(1, 0.5, 0.2),
    q = 3),
  mu_c = 2,
  same_key = FALSE,
  use_zero_rkeys = FALSE,
  path = f_yaml)
#> yaml configuration '/tmp/Rtmpu511Tl/file1fa562262870.yaml' successfully written.

# we read the parameters from the yaml-file
p_nums1 <- ck_read_yaml(path = f_yaml)

# for variables with positive and negative values
p_nums2 <- ck_params_nums(
  ptab = ptab2,
  type = "top_contr",
  top_k = 3,
  mult_params = ck_flexparams(
    fp = 1000,
    p = c(0.15, 0.02),
    epsilon = c(1, 0.4, 0.15),
    q = 3),
  mu_c = 2,
  same_key = FALSE)

# simple perturbation parameters (not using the flex-function approach)
p_nums3 <- ck_params_nums(
  ptab = ptab3,
  type = "mean",
  mult_params = ck_simpleparams(p = 0.25),
  mu_c = 2,
  same_key = FALSE)

# use `p_nums1` for the variables `savings`, `income` and `expend`
tab$params_nums_set(p_nums1, c("savings", "income", "expend"))
#> --> setting perturbation parameters for variable 'savings'
#> --> setting perturbation parameters for variable 'income'
#> --> setting perturbation parameters for variable 'expend'

# use different parameters for variable `mixed`
tab$params_nums_set(p_nums2, v = "mixed")
#> --> setting perturbation parameters for variable 'mixed'

# identify sensitive cells to which extra protection (`mu_c`) is added.
tab$supp_p(v = "income", p = 85)
#> computing contributing indices | rawdata <--> table; this might take a while
#> p%-rule: 0 new sensitive cells (incl. duplicates) found (total: 0)
tab$supp_pq(v = "income", p = 85, q = 90)
#> computing contributing indices | rawdata <--> table; this might take a while
#> pq-rule: 0 new sensitive cells (incl. duplicates) found (total: 0)
tab$supp_nk(v = "income", n = 2, k = 90)
#> computing contributing indices | rawdata <--> table; this might take a while
#> nk-rule: 0 new sensitive cells (incl. duplicates) found (total: 0)
tab$supp_freq(v = "income", n = 14, weighted = FALSE)
#> freq-rule: 5 new sensitive cells (incl. duplicates) found (total: 5)
tab$supp_val(v = "income", n = 10000, weighted = TRUE)
#> val-rule: 0 new sensitive cells (incl. duplicates) found (total: 5)
tab$supp_cells(
  v = "income",
  inp = data.frame(
    sex = c("female", "female"),
    age = c("age_group1", "age_group3")
  )
)
#> cell-rule: 2 new sensitive cells (incl. duplicates) found (total: 7)
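
# illustration only (not part of the original example and not cellKey's
# internal code): a minimal base-R sketch of the idea behind the p%-rule
# used by `supp_p()`, assuming the common textbook definition for
# non-negative, unweighted contributions: a cell is sensitive if the sum of
# all contributions except the two largest is smaller than p% of the largest
# one. The helper name `p_rule_sensitive()` is hypothetical.
p_rule_sensitive <- function(contrib, p) {
  x <- sort(contrib, decreasing = TRUE)
  # what remains after removing the two largest contributors
  remainder <- sum(x[-(1:2)])
  remainder < (p / 100) * x[1]
}
# one dominant contributor: the remainder (5) is below 85% of 100
p_rule_sensitive(c(100, 50, 5), p = 85)
# several similar contributors: the remainder (170) is not below 85% of 100
p_rule_sensitive(c(100, 95, 90, 80), p = 85)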

# perturb variables
tab$perturb(v = c("income", "savings"))
#> Numeric variable 'income' was perturbed.
#> Numeric variable 'savings' was perturbed.

# extract results
tab$numtab("income", mean_before_sum = TRUE)
#>        sex        age  vname      uws         ws        pws
#>     <char>     <char> <char>    <num>      <num>      <num>
#>  1:  Total      Total income 22952978 1377005797 1376945138
#>  2:  Total age_group1 income  9810547  590371940  590376154
#>  3:  Total       ag1a income  9810547  590371940  590376154
#>  4:  Total age_group2 income  5692119  340327551  340187928
#>  5:  Total       ag2a income  5692119  340327551  340187928
#>  6:  Total age_group3 income  4406946  263870748  263807267
#>  7:  Total age_group4 income  2133543  128635475  128623136
#>  8:  Total age_group5 income   848151   50513336   50403682
#>  9:  Total age_group6 income    61672    3286747    3051762
#> 10:   male      Total income 11262049  674250565  674175625
#> 11:      m      Total income 11262049  674250565  674175625
#> 12:   male age_group1 income  4877164  294221368  294321199
#> 13:      m age_group1 income  4877164  294221368  294321199
#> 14:   male       ag1a income  4877164  294221368  294321199
#> 15:      m       ag1a income  4877164  294221368  294321199
#> 16:   male age_group2 income  2811379  166958567  166998608
#> 17:      m age_group2 income  2811379  166958567  166998608
#> 18:   male       ag2a income  2811379  166958567  166998608
#> 19:      m       ag2a income  2811379  166958567  166998608
#> 20:   male age_group3 income  2168169  128987112  129015795
#> 21:      m age_group3 income  2168169  128987112  129015795
#> 22:   male age_group4 income   978510   60358734   60375054
#> 23:      m age_group4 income   978510   60358734   60375054
#> 24:   male age_group5 income   393134   21599577   21687227
#> 25:      m age_group5 income   393134   21599577   21687227
#> 26:   male age_group6 income    33693    2125207    2198004
#> 27:      m age_group6 income    33693    2125207    2198004
#> 28: female      Total income 11690929  702755232  703051791
#> 29:      f      Total income 11690929  702755232  703051791
#> 30: female age_group1 income  4933383  296150572  296142272
#> 31:      f age_group1 income  4933383  296150572  296142272
#> 32: female       ag1a income  4933383  296150572  296142272
#> 33:      f       ag1a income  4933383  296150572  296142272
#> 34: female age_group2 income  2880740  173368984  173383331
#> 35:      f age_group2 income  2880740  173368984  173383331
#> 36: female       ag2a income  2880740  173368984  173383331
#> 37:      f       ag2a income  2880740  173368984  173383331
#> 38: female age_group3 income  2238777  134883636  134914416
#> 39:      f age_group3 income  2238777  134883636  134914416
#> 40: female age_group4 income  1155033   68276741   68470582
#> 41:      f age_group4 income  1155033   68276741   68470582
#> 42: female age_group5 income   455017   28913759   28953004
#> 43:      f age_group5 income   455017   28913759   28953004
#> 44: female age_group6 income    27979    1161540    1111365
#> 45:      f age_group6 income    27979    1161540    1111365
#>        sex        age  vname      uws         ws        pws
tab$numtab("income", mean_before_sum = FALSE)
#>        sex        age  vname      uws         ws        pws
#>     <char>     <char> <char>    <num>      <num>      <num>
#>  1:  Total      Total income 22952978 1377005797 1376975467
#>  2:  Total age_group1 income  9810547  590371940  590374047
#>  3:  Total       ag1a income  9810547  590371940  590374047
#>  4:  Total age_group2 income  5692119  340327551  340257732
#>  5:  Total       ag2a income  5692119  340327551  340257732
#>  6:  Total age_group3 income  4406946  263870748  263839006
#>  7:  Total age_group4 income  2133543  128635475  128629305
#>  8:  Total age_group5 income   848151   50513336   50458479
#>  9:  Total age_group6 income    61672    3286747    3167076
#> 10:   male      Total income 11262049  674250565  674213094
#> 11:      m      Total income 11262049  674250565  674213094
#> 12:   male age_group1 income  4877164  294221368  294271279
#> 13:      m age_group1 income  4877164  294221368  294271279
#> 14:   male       ag1a income  4877164  294221368  294271279
#> 15:      m       ag1a income  4877164  294221368  294271279
#> 16:   male age_group2 income  2811379  166958567  166978586
#> 17:      m age_group2 income  2811379  166958567  166978586
#> 18:   male       ag2a income  2811379  166958567  166978586
#> 19:      m       ag2a income  2811379  166958567  166978586
#> 20:   male age_group3 income  2168169  128987112  129001453
#> 21:      m age_group3 income  2168169  128987112  129001453
#> 22:   male age_group4 income   978510   60358734   60366894
#> 23:      m age_group4 income   978510   60358734   60366894
#> 24:   male age_group5 income   393134   21599577   21643358
#> 25:      m age_group5 income   393134   21599577   21643358
#> 26:   male age_group6 income    33693    2125207    2161299
#> 27:      m age_group6 income    33693    2125207    2161299
#> 28: female      Total income 11690929  702755232  702903496
#> 29:      f      Total income 11690929  702755232  702903496
#> 30: female age_group1 income  4933383  296150572  296146422
#> 31:      f age_group1 income  4933383  296150572  296146422
#> 32: female       ag1a income  4933383  296150572  296146422
#> 33:      f       ag1a income  4933383  296150572  296146422
#> 34: female age_group2 income  2880740  173368984  173376158
#> 35:      f age_group2 income  2880740  173368984  173376158
#> 36: female       ag2a income  2880740  173368984  173376158
#> 37:      f       ag2a income  2880740  173368984  173376158
#> 38: female age_group3 income  2238777  134883636  134899025
#> 39:      f age_group3 income  2238777  134883636  134899025
#> 40: female age_group4 income  1155033   68276741   68373593
#> 41:      f age_group4 income  1155033   68276741   68373593
#> 42: female age_group5 income   455017   28913759   28933375
#> 43:      f age_group5 income   455017   28913759   28933375
#> 44: female age_group6 income    27979    1161540    1136176
#> 45:      f age_group6 income    27979    1161540    1136176
#>        sex        age  vname      uws         ws        pws
tab$numtab("savings")
#>        sex        age   vname     uws        ws         pws
#>     <char>     <char>  <char>   <num>     <num>       <num>
#>  1:  Total      Total savings 2273532 136873818 136870309.5
#>  2:  Total age_group1 savings  982386  59422265  59420448.2
#>  3:  Total       ag1a savings  982386  59422265  59420448.2
#>  4:  Total age_group2 savings  552336  33246815  33243636.4
#>  5:  Total       ag2a savings  552336  33246815  33243636.4
#>  6:  Total age_group3 savings  437101  26180709  26181760.6
#>  7:  Total age_group4 savings  214661  12776589  12774656.3
#>  8:  Total age_group5 savings   80451   4886105   4878388.5
#>  9:  Total age_group6 savings    6597    361335    353629.9
#> 10:   male      Total savings 1159816  69662037  69662037.0
#> 11:      m      Total savings 1159816  69662037  69662037.0
#> 12:   male age_group1 savings  517660  31430472  31435293.2
#> 13:      m age_group1 savings  517660  31430472  31435293.2
#> 14:   male       ag1a savings  517660  31430472  31435293.2
#> 15:      m       ag1a savings  517660  31430472  31435293.2
#> 16:   male age_group2 savings  280923  16836241  16836943.0
#> 17:      m age_group2 savings  280923  16836241  16836943.0
#> 18:   male       ag2a savings  280923  16836241  16836943.0
#> 19:      m       ag2a savings  280923  16836241  16836943.0
#> 20:   male age_group3 savings  214970  12682929  12680329.3
#> 21:      m age_group3 savings  214970  12682929  12680329.3
#> 22:   male age_group4 savings   99420   5966130   5964685.1
#> 23:      m age_group4 savings   99420   5966130   5964685.1
#> 24:   male age_group5 savings   43233   2508522   2514248.6
#> 25:      m age_group5 savings   43233   2508522   2514248.6
#> 26:   male age_group6 savings    3610    237743    236468.4
#> 27:      m age_group6 savings    3610    237743    236468.4
#> 28: female      Total savings 1113716  67211781  67214410.0
#> 29:      f      Total savings 1113716  67211781  67214410.0
#> 30: female age_group1 savings  464726  27991793  27991815.6
#> 31:      f age_group1 savings  464726  27991793  27991815.6
#> 32: female       ag1a savings  464726  27991793  27991815.6
#> 33:      f       ag1a savings  464726  27991793  27991815.6
#> 34: female age_group2 savings  271413  16410574  16411672.7
#> 35:      f age_group2 savings  271413  16410574  16411672.7
#> 36: female       ag2a savings  271413  16410574  16411672.7
#> 37:      f       ag2a savings  271413  16410574  16411672.7
#> 38: female age_group3 savings  222131  13497780  13501050.9
#> 39:      f age_group3 savings  222131  13497780  13501050.9
#> 40: female age_group4 savings  115241   6810459   6819421.9
#> 41:      f age_group4 savings  115241   6810459   6819421.9
#> 42: female age_group5 savings   37218   2377583   2380704.9
#> 43:      f age_group5 savings   37218   2377583   2380704.9
#> 44: female age_group6 savings    2987    123592    122549.0
#> 45:      f age_group6 savings    2987    123592    122549.0
#>        sex        age   vname     uws        ws         pws
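
# illustration only (not part of the original example): the impact of the
# perturbation is easier to judge as a ratio. The three rows below are copied
# from the `savings` output above (Total/Total, Total/age_group6 and
# female/age_group6); with a live object one would start from
# `tab$numtab("savings")` instead. The column name `rel_change` is made up
# for this sketch.
res <- data.frame(
  sex = c("Total", "Total", "female"),
  age = c("Total", "age_group6", "age_group6"),
  ws  = c(136873818, 361335, 123592),
  pws = c(136870309.5, 353629.9, 122549.0)
)
# relative change of the perturbed weighted sum compared to the original one
res$rel_change <- (res$pws - res$ws) / res$ws
# sorted by size of the relative change: the grand total moves far less
# than the two small cells
res[order(abs(res$rel_change)), ]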

# results can be reset, too
tab$reset_cntvars(v = "cnt_males")

# we can then set other parameters and perturb again
tab$params_cnts_set(val = p_cnts1, v = "cnt_males")
#> --> setting perturbation parameters for variable 'cnt_males'

tab$perturb(v = "cnt_males")
#> Count variable 'cnt_males' was perturbed.

# write results to a .csv file
tab$freqtab(
  v = c("total", "cnt_males"),
  path = file.path(tempdir(), "outtab.csv")
)
#> File '/tmp/Rtmpu511Tl/outtab.csv' successfully written to disk.
#> NULL

# show results containing weighted and unweighted counts
tab$freqtab(v = c("total", "cnt_males"))
#>        sex        age     vname   uwc     wc  puwc        pwc
#>     <char>     <char>    <char> <num>  <num> <num>      <num>
#>  1:  Total      Total     total  4580 274707  4582 274826.959
#>  2:  Total age_group1     total  1969 118802  1969 118802.000
#>  3:  Total       ag1a     total  1969 118802  1969 118802.000
#>  4:  Total age_group2     total  1143  68583  1141  68462.995
#>  5:  Total       ag2a     total  1143  68583  1141  68462.995
#>  6:  Total age_group3     total   864  51473   864  51473.000
#>  7:  Total age_group4     total   423  25121   423  25121.000
#>  8:  Total age_group5     total   168   9970   167   9910.655
#>  9:  Total age_group6     total    13    758    13    758.000
#> 10:   male      Total     total  2296 137345  2296 137345.000
#> 11:      m      Total     total  2296 137345  2296 137345.000
#> 12:   male age_group1     total  1015  61334  1014  61273.572
#> 13:      m age_group1     total  1015  61334  1014  61273.572
#> 14:   male       ag1a     total  1015  61334  1014  61273.572
#> 15:      m       ag1a     total  1015  61334  1014  61273.572
#> 16:   male age_group2     total   571  34108   572  34167.734
#> 17:      m age_group2     total   571  34108   572  34167.734
#> 18:   male       ag2a     total   571  34108   572  34167.734
#> 19:      m       ag2a     total   571  34108   572  34167.734
#> 20:   male age_group3     total   424  24983   424  24983.000
#> 21:      m age_group3     total   424  24983   424  24983.000
#> 22:   male age_group4     total   195  11748   196  11808.246
#> 23:      m age_group4     total   195  11748   196  11808.246
#> 24:   male age_group5     total    84   4696    84   4696.000
#> 25:      m age_group5     total    84   4696    84   4696.000
#> 26:   male age_group6     total     7    476     7    476.000
#> 27:      m age_group6     total     7    476     7    476.000
#> 28: female      Total     total  2284 137362  2284 137362.000
#> 29:      f      Total     total  2284 137362  2284 137362.000
#> 30: female age_group1     total   954  57468   953  57407.761
#> 31:      f age_group1     total   954  57468   953  57407.761
#> 32: female       ag1a     total   954  57468   953  57407.761
#> 33:      f       ag1a     total   954  57468   953  57407.761
#> 34: female age_group2     total   572  34475   571  34414.729
#> 35:      f age_group2     total   572  34475   571  34414.729
#> 36: female       ag2a     total   572  34475   571  34414.729
#> 37:      f       ag2a     total   572  34475   571  34414.729
#> 38: female age_group3     total   440  26490   439  26429.795
#> 39:      f age_group3     total   440  26490   439  26429.795
#> 40: female age_group4     total   228  13373   228  13373.000
#> 41:      f age_group4     total   228  13373   228  13373.000
#> 42: female age_group5     total    84   5274    84   5274.000
#> 43:      f age_group5     total    84   5274    84   5274.000
#> 44: female age_group6     total     6    282     7    329.000
#> 45:      f age_group6     total     6    282     7    329.000
#> 46:  Total      Total cnt_males  2296 137345  2296 137345.000
#> 47:  Total age_group1 cnt_males  1015  61334  1014  61273.572
#> 48:  Total       ag1a cnt_males  1015  61334  1014  61273.572
#> 49:  Total age_group2 cnt_males   571  34108   572  34167.734
#> 50:  Total       ag2a cnt_males   571  34108   572  34167.734
#> 51:  Total age_group3 cnt_males   424  24983   424  24983.000
#> 52:  Total age_group4 cnt_males   195  11748   196  11808.246
#> 53:  Total age_group5 cnt_males    84   4696    84   4696.000
#> 54:  Total age_group6 cnt_males     7    476     7    476.000
#> 55:   male      Total cnt_males  2296 137345  2296 137345.000
#> 56:      m      Total cnt_males  2296 137345  2296 137345.000
#> 57:   male age_group1 cnt_males  1015  61334  1014  61273.572
#> 58:      m age_group1 cnt_males  1015  61334  1014  61273.572
#> 59:   male       ag1a cnt_males  1015  61334  1014  61273.572
#> 60:      m       ag1a cnt_males  1015  61334  1014  61273.572
#> 61:   male age_group2 cnt_males   571  34108   572  34167.734
#> 62:      m age_group2 cnt_males   571  34108   572  34167.734
#> 63:   male       ag2a cnt_males   571  34108   572  34167.734
#> 64:      m       ag2a cnt_males   571  34108   572  34167.734
#> 65:   male age_group3 cnt_males   424  24983   424  24983.000
#> 66:      m age_group3 cnt_males   424  24983   424  24983.000
#> 67:   male age_group4 cnt_males   195  11748   196  11808.246
#> 68:      m age_group4 cnt_males   195  11748   196  11808.246
#> 69:   male age_group5 cnt_males    84   4696    84   4696.000
#> 70:      m age_group5 cnt_males    84   4696    84   4696.000
#> 71:   male age_group6 cnt_males     7    476     7    476.000
#> 72:      m age_group6 cnt_males     7    476     7    476.000
#> 73: female      Total cnt_males     0      0     0      0.000
#> 74:      f      Total cnt_males     0      0     0      0.000
#> 75: female age_group1 cnt_males     0      0     0      0.000
#> 76:      f age_group1 cnt_males     0      0     0      0.000
#> 77: female       ag1a cnt_males     0      0     0      0.000
#> 78:      f       ag1a cnt_males     0      0     0      0.000
#> 79: female age_group2 cnt_males     0      0     0      0.000
#> 80:      f age_group2 cnt_males     0      0     0      0.000
#> 81: female       ag2a cnt_males     0      0     0      0.000
#> 82:      f       ag2a cnt_males     0      0     0      0.000
#> 83: female age_group3 cnt_males     0      0     0      0.000
#> 84:      f age_group3 cnt_males     0      0     0      0.000
#> 85: female age_group4 cnt_males     0      0     0      0.000
#> 86:      f age_group4 cnt_males     0      0     0      0.000
#> 87: female age_group5 cnt_males     0      0     0      0.000
#> 88:      f age_group5 cnt_males     0      0     0      0.000
#> 89: female age_group6 cnt_males     0      0     0      0.000
#> 90:      f age_group6 cnt_males     0      0     0      0.000
#>        sex        age     vname   uwc     wc  puwc        pwc

# utility measures for a count variable
tab$measures_cnts(v = "total", exclude_zeros = TRUE)
#> $overview
#>     noise   cnt        pct
#>    <fctr> <int>      <num>
#> 1:     -2     1 0.02222222
#> 2:     -1     8 0.17777778
#> 3:      0    19 0.42222222
#> 4:      1    15 0.33333333
#> 5:      2     2 0.04444444
#> 
#> $measures
#>       what    d1    d2    d3
#>     <char> <num> <num> <num>
#>  1:    Min 0.000 0.000 0.000
#>  2:    Q10 0.000 0.000 0.000
#>  3:    Q20 0.000 0.000 0.000
#>  4:    Q30 0.000 0.000 0.000
#>  5:    Q40 0.000 0.000 0.000
#>  6:   Mean 0.644 0.008 0.020
#>  7: Median 1.000 0.001 0.016
#>  8:    Q60 1.000 0.001 0.016
#>  9:    Q70 1.000 0.002 0.021
#> 10:    Q80 1.000 0.002 0.022
#> 11:    Q90 1.000 0.004 0.033
#> 12:    Q95 1.800 0.006 0.038
#> 13:    Q99 2.000 0.167 0.196
#> 14:    Max 2.000 0.167 0.196
#> 
#> $cumdistr_d1
#>       cat   cnt       pct
#>    <char> <int>     <num>
#> 1:      0    19 0.4222222
#> 2:      1    42 0.9333333
#> 3:      2    45 1.0000000
#> 
#> $cumdistr_d2
#>            cat   cnt       pct
#>         <char> <int>     <num>
#> 1:    [0,0.02]    43 0.9555556
#> 2: (0.02,0.05]    43 0.9555556
#> 3:  (0.05,0.1]    43 0.9555556
#> 4:   (0.1,0.2]    45 1.0000000
#> 5:   (0.2,0.3]    45 1.0000000
#> 6:   (0.3,0.4]    45 1.0000000
#> 7:   (0.4,0.5]    45 1.0000000
#> 8:   (0.5,Inf]    45 1.0000000
#> 
#> $cumdistr_d3
#>            cat   cnt       pct
#>         <char> <int>     <num>
#> 1:    [0,0.02]    28 0.6222222
#> 2: (0.02,0.05]    43 0.9555556
#> 3:  (0.05,0.1]    43 0.9555556
#> 4:   (0.1,0.2]    45 1.0000000
#> 5:   (0.2,0.3]    45 1.0000000
#> 6:   (0.3,0.4]    45 1.0000000
#> 7:   (0.4,0.5]    45 1.0000000
#> 8:   (0.5,Inf]    45 1.0000000
#> 
#> $false_zero
#> [1] 0
#> 
#> $false_nonzero
#> [1] 0
#> 
#> $exclude_zeros
#> [1] TRUE
#> 

# modifications for perturbed count variables
tab$mod_cnts()
#>         sex        age row_nr  pert       ckey  countvar
#>      <char>     <char>  <num> <int>      <num>    <char>
#>   1:  Total      Total     17     2 0.95532247     total
#>   2:  Total age_group1     15     0 0.44707334     total
#>   3:  Total       ag1a     15     0 0.44707334     total
#>   4:  Total age_group2     13    -2 0.01327944     total
#>   5:  Total       ag2a     13    -2 0.01327944     total
#>  ---                                                    
#> 131:      f age_group4     -1     0 0.00000000 cnt_males
#> 132: female age_group5     -1     0 0.00000000 cnt_males
#> 133:      f age_group5     -1     0 0.00000000 cnt_males
#> 134: female age_group6     -1     0 0.00000000 cnt_males
#> 135:      f age_group6     -1     0 0.00000000 cnt_males

# display a summary of utility measures
tab$summary()
#> ┌──────────────────────────────────────────────┐
#> │Utility measures for perturbed count variables│
#> └──────────────────────────────────────────────┘
#> ── Distribution statistics of perturbations ────────────────────────────────────
#>          countvar   Min   Q10   Q20   Q30   Q40   Mean Median   Q60   Q70   Q80
#>            <char> <num> <num> <num> <num> <num>  <num>  <num> <num> <num> <num>
#> 1:          total    -2    -1    -1    -1   0.0 -0.200      0     0     0   0.2
#> 2: cnt_highincome    -4    -3    -2    -2  -1.4 -0.778     -1     0     0   1.0
#> 3:      cnt_males    -1    -1     0     0   0.0  0.067      0     0     0   0.2
#>      Q90   Q95   Q99   Max
#>    <num> <num> <num> <num>
#> 1:     1     1  1.56     2
#> 2:     2     2  2.00     2
#> 3:     1     1  1.00     1
#> 
#> ── Distance-based measures ─────────────────────────────────────────────────────
#> ✔ Variable: 'total'
#> 
#>       what    d1    d2    d3
#>     <char> <num> <num> <num>
#>  1:    Min 0.000 0.000 0.000
#>  2:    Q10 0.000 0.000 0.000
#>  3:    Q20 0.000 0.000 0.000
#>  4:    Q30 0.000 0.000 0.000
#>  5:    Q40 0.000 0.000 0.000
#>  6:   Mean 0.644 0.008 0.020
#>  7: Median 1.000 0.001 0.016
#>  8:    Q60 1.000 0.001 0.016
#>  9:    Q70 1.000 0.002 0.021
#> 10:    Q80 1.000 0.002 0.022
#> 11:    Q90 1.000 0.004 0.033
#> 12:    Q95 1.800 0.006 0.038
#> 13:    Q99 2.000 0.167 0.196
#> 14:    Max 2.000 0.167 0.196
#> 
#> ✔ Variable: 'cnt_males'
#> 
#>       what    d1    d2    d3
#>     <char> <num> <num> <num>
#>  1:    Min 0.000 0.000 0.000
#>  2:    Q10 0.000 0.000 0.000
#>  3:    Q20 0.000 0.000 0.000
#>  4:    Q30 0.000 0.000 0.000
#>  5:    Q40 0.000 0.000 0.000
#>  6:   Mean 0.556 0.001 0.012
#>  7: Median 1.000 0.001 0.016
#>  8:    Q60 1.000 0.001 0.016
#>  9:    Q70 1.000 0.002 0.021
#> 10:    Q80 1.000 0.002 0.021
#> 11:    Q90 1.000 0.003 0.027
#> 12:    Q95 1.000 0.005 0.036
#> 13:    Q99 1.000 0.005 0.036
#> 14:    Max 1.000 0.005 0.036
#> 
#> ✔ Variable: 'cnt_highincome'
#> 
#>       what    d1    d2    d3
#>     <char> <num> <num> <num>
#>  1:    Min 0.000 0.000 0.000
#>  2:    Q10 1.000 0.005 0.034
#>  3:    Q20 1.000 0.012 0.072
#>  4:    Q30 1.000 0.020 0.084
#>  5:    Q40 1.600 0.024 0.100
#>  6:   Mean 1.775 0.043 0.119
#>  7: Median 2.000 0.024 0.106
#>  8:    Q60 2.000 0.030 0.118
#>  9:    Q70 2.000 0.035 0.126
#> 10:    Q80 2.000 0.039 0.144
#> 11:    Q90 3.100 0.062 0.200
#> 12:    Q95 4.000 0.150 0.210
#> 13:    Q99 4.000 0.286 0.410
#> 14:    Max 4.000 0.286 0.410
#> 
#> ┌──────────────────────────────────────────────────┐
#> │Utility measures for perturbed numerical variables│
#> └──────────────────────────────────────────────────┘
#> ── Distribution statistics of perturbations ────────────────────────────────────
#> Warning: no non-missing arguments to min; returning Inf
#> Warning: no non-missing arguments to max; returning -Inf
#> Warning: no non-missing arguments to min; returning Inf
#> Warning: no non-missing arguments to max; returning -Inf
#>      vname         Min        Q10        Q20       Q30      Q40     Mean
#>     <char>       <num>      <num>      <num>     <num>    <num>    <num>
#> 1:  expend         Inf         NA         NA        NA       NA      NaN
#> 2:  income -119670.898 -37471.122 -25364.211 -4149.947 7173.574 12287.54
#> 3: savings   -7716.547  -2947.004  -1816.778 -1228.303   13.560   699.44
#> 4:   mixed         Inf         NA         NA        NA       NA      NaN
#>      Median      Q60       Q70       Q80       Q90       Q95        Q99
#>       <num>    <num>     <num>     <num>     <num>     <num>      <num>
#> 1:       NA       NA        NA        NA        NA        NA         NA
#> 2: 8159.516 17079.81 20019.388 43780.682 49911.087 96851.629 148263.749
#> 3:  702.049  1070.46  2322.902  3270.909  4821.238  5726.561   8962.924
#> 4:       NA       NA        NA        NA        NA        NA         NA
#>           Max
#>         <num>
#> 1:       -Inf
#> 2: 148263.749
#> 3:   8962.924
#> 4:       -Inf
# }
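
The distance columns d1, d2 and d3 reported by measures_cnts() and summary() can be checked by hand. The sketch below assumes that they denote the absolute distance, the relative absolute distance and the absolute distance between the square roots of original and perturbed unweighted counts. The maxima printed for variable total (2, 0.167 and 0.196) are consistent with that reading. The three cells are copied from the freqtab() output for total; the vector names are illustrative only.

```r
# unweighted original (uwc) and perturbed (puwc) counts for three cells of
# variable `total`: Total/Total, Total/age_group2 and female/age_group6
uwc  <- c(4580, 1143, 6)
puwc <- c(4582, 1141, 7)

d1 <- abs(puwc - uwc)              # absolute distance
d2 <- d1 / uwc                     # relative absolute distance
d3 <- abs(sqrt(puwc) - sqrt(uwc))  # absolute distance of square roots

round(data.frame(uwc, puwc, d1, d2, d3), 3)
```

For the smallest cell (6 perturbed to 7) this gives d2 of about 0.167 and d3 of about 0.196, which matches the Max row of the measures table, while the largest d1 of 2 comes from the bigger cells.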
