This function computes utility/information loss measures based on two numeric vectors (original and perturbed values).

ck_cnt_measures(orig, pert, exclude_zeros = TRUE)

Arguments

orig

a numeric vector holding original values

pert

a numeric vector holding perturbed values

exclude_zeros

a scalar logical value; if TRUE (the default), only cells with counts > 0 are used when computing the distances d1, d2 and d3. If FALSE, all cells are used.
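
To illustrate which cells enter d1, d2 and d3, here is a minimal base-R sketch. The helper select_cells() is hypothetical and not part of the package, and the rule it applies (drop cells whose perturbed value is 0) is an assumption inferred from the example output, where nine of the twelve cells are counted with exclude_zeros = TRUE and all twelve with exclude_zeros = FALSE.

```r
# Hypothetical helper, not part of cellKey: marks the cells used for d1, d2, d3.
# ASSUMPTION: with exclude_zeros = TRUE a cell is kept when its perturbed
# value is > 0 (consistent with the example output, not stated in the docs).
select_cells <- function(orig, pert, exclude_zeros = TRUE) {
  stopifnot(is.numeric(orig), is.numeric(pert), length(orig) == length(pert))
  if (exclude_zeros) {
    pert > 0
  } else {
    rep(TRUE, length(orig))
  }
}

orig <- c(1:10, 0, 0)
pert <- orig; pert[c(1, 5, 7)] <- c(0, 6, 9)
sum(select_cells(orig, pert, exclude_zeros = TRUE))   # cells used with TRUE
sum(select_cells(orig, pert, exclude_zeros = FALSE))  # cells used with FALSE
```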

Value

a list containing the following elements:

  • overview: a data.table with the following three columns:

    • noise: amount of noise computed as orig - pert

    • cnt: number of cells perturbed with the value given in column noise

    • pct: proportion of cells perturbed with the value given in column noise

  • measures: a data.table containing summary statistics of the distribution of three different distances between original and perturbed values of the unweighted counts. Column what specifies the computed statistic (minimum, selected quantiles, mean, median or maximum). The three distances considered are:

    • d1: absolute distance between original and perturbed values, |orig - pert|

    • d2: relative absolute distance between original and perturbed values, |orig - pert| / orig

    • d3: absolute distance between the square roots of original and perturbed values, |sqrt(orig) - sqrt(pert)|

  • cumdistr_d1, cumdistr_d2 and cumdistr_d3: for each distance d1, d2 and d3, a data.table with the following three columns:

    • cat: a specific value (for d1) or an interval (for d2 and d3)

    • cnt: number of cells whose distance is smaller than or equal to the value in column cat (for intervals, its upper bound)

    • pct: proportion of cells whose distance is smaller than or equal to the value in column cat (for intervals, its upper bound)

  • false_zero: number of cells with a non-zero original value that were perturbed to zero

  • false_nonzero: number of cells that were initially zero but were perturbed to a non-zero value

  • exclude_zeros: a logical value indicating whether empty cells were excluded from the computation of d1, d2 and d3
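
The elements overview, measures, false_zero and false_nonzero can be recomputed with a few lines of base R. The following is a minimal sketch, not the package's implementation: cnt_measures_sketch() is a hypothetical name, it returns plain data.frames instead of data.tables, and two details are assumptions chosen because they reproduce the example output: cells with a perturbed value of zero are the ones dropped when exclude_zeros = TRUE, and d2 is set to 0 for unchanged cells (which also covers 0/0 for empty cells).

```r
# Hypothetical sketch for illustration; not the cellKey implementation.
cnt_measures_sketch <- function(orig, pert, exclude_zeros = TRUE) {
  stopifnot(is.numeric(orig), is.numeric(pert), length(orig) == length(pert))

  # overview: distribution of the noise orig - pert over all cells
  noise <- orig - pert
  tab <- table(noise)
  overview <- data.frame(
    noise = as.numeric(names(tab)),
    cnt   = as.integer(tab),
    pct   = as.integer(tab) / length(noise)
  )

  # counted over all cells, independent of exclude_zeros
  false_zero    <- sum(orig != 0 & pert == 0)
  false_nonzero <- sum(orig == 0 & pert != 0)

  # ASSUMPTION: exclude_zeros = TRUE drops cells whose perturbed value is 0
  keep <- if (exclude_zeros) pert > 0 else rep(TRUE, length(orig))
  o <- orig[keep]
  p <- pert[keep]

  d1 <- abs(o - p)
  # d2 = |orig - pert| / orig; unchanged cells get 0 (covers 0/0).
  # A cell with orig == 0 and pert != 0 would give Inf in this sketch.
  d2 <- ifelse(d1 == 0, 0, d1 / o)
  d3 <- abs(sqrt(o) - sqrt(p))

  # statistics in the order of the `what` column; R's default (type 7)
  # quantiles reproduce the values printed in the examples
  what  <- c("Min", "Q10", "Q20", "Q30", "Q40", "Mean", "Median",
             "Q60", "Q70", "Q80", "Q90", "Q95", "Q99", "Max")
  probs <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99, 1)
  stats <- function(d) {
    # insert the mean between Q40 and the median
    append(quantile(d, probs, names = FALSE, type = 7), mean(d), after = 5)
  }
  measures <- data.frame(what = what, d1 = stats(d1), d2 = stats(d2),
                         d3 = stats(d3), stringsAsFactors = FALSE)

  list(overview = overview, measures = measures, false_zero = false_zero,
       false_nonzero = false_nonzero, exclude_zeros = exclude_zeros)
}

orig <- c(1:10, 0, 0)
pert <- orig; pert[c(1, 5, 7)] <- c(0, 6, 9)
res <- cnt_measures_sketch(orig, pert, exclude_zeros = TRUE)
res$overview
res$measures
```

Rounded to three decimals, res$measures should match the measures table of the first example; the rule for which cells are excluded remains the assumption stated in the code comment.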

Examples

orig <- c(1:10, 0, 0)
pert <- orig; pert[c(1, 5, 7)] <- c(0, 6, 9)

# ignore empty cells when computing measures `d1`, `d2`, `d3`
ck_cnt_measures(orig = orig, pert = pert, exclude_zeros = TRUE)
#> $overview
#>    noise cnt        pct
#> 1:    -2   1 0.08333333
#> 2:    -1   1 0.08333333
#> 3:     0   9 0.75000000
#> 4:     1   1 0.08333333
#> 
#> $measures
#>       what    d1    d2    d3
#>  1:    Min 0.000 0.000 0.000
#>  2:    Q10 0.000 0.000 0.000
#>  3:    Q20 0.000 0.000 0.000
#>  4:    Q30 0.000 0.000 0.000
#>  5:    Q40 0.000 0.000 0.000
#>  6:   Mean 0.333 0.054 0.063
#>  7: Median 0.000 0.000 0.000
#>  8:    Q60 0.000 0.000 0.000
#>  9:    Q70 0.000 0.000 0.000
#> 10:    Q80 0.400 0.080 0.085
#> 11:    Q90 1.200 0.217 0.242
#> 12:    Q95 1.600 0.251 0.298
#> 13:    Q99 1.920 0.279 0.343
#> 14:    Max 2.000 0.286 0.354
#> 
#> $cumdistr_d1
#>    cat cnt       pct
#> 1:   0   7 0.7777778
#> 2:   1   8 0.8888889
#> 3:   2   9 1.0000000
#> 
#> $cumdistr_d2
#>            cat cnt       pct
#> 1:    [0,0.02]   7 0.7777778
#> 2: (0.02,0.05]   7 0.7777778
#> 3:  (0.05,0.1]   7 0.7777778
#> 4:   (0.1,0.2]   8 0.8888889
#> 5:   (0.2,0.3]   9 1.0000000
#> 6:   (0.3,0.4]   9 1.0000000
#> 7:   (0.4,0.5]   9 1.0000000
#> 8:   (0.5,Inf]   9 1.0000000
#> 
#> $cumdistr_d3
#>            cat cnt       pct
#> 1:    [0,0.02]   7 0.7777778
#> 2: (0.02,0.05]   7 0.7777778
#> 3:  (0.05,0.1]   7 0.7777778
#> 4:   (0.1,0.2]   7 0.7777778
#> 5:   (0.2,0.3]   8 0.8888889
#> 6:   (0.3,0.4]   9 1.0000000
#> 7:   (0.4,0.5]   9 1.0000000
#> 8:   (0.5,Inf]   9 1.0000000
#> 
#> $false_zero
#> [1] 1
#> 
#> $false_nonzero
#> [1] 0
#> 
#> $exclude_zeros
#> [1] TRUE
#> 

# use all cells
ck_cnt_measures(orig = orig, pert = pert, exclude_zeros = FALSE)
#> $overview
#>    noise cnt        pct
#> 1:    -2   1 0.08333333
#> 2:    -1   1 0.08333333
#> 3:     0   9 0.75000000
#> 4:     1   1 0.08333333
#> 
#> $measures
#>       what    d1    d2    d3
#>  1:    Min 0.000 0.000 0.000
#>  2:    Q10 0.000 0.000 0.000
#>  3:    Q20 0.000 0.000 0.000
#>  4:    Q30 0.000 0.000 0.000
#>  5:    Q40 0.000 0.000 0.000
#>  6:   Mean 0.333 0.124 0.131
#>  7: Median 0.000 0.000 0.000
#>  8:    Q60 0.000 0.000 0.000
#>  9:    Q70 0.000 0.000 0.000
#> 10:    Q80 0.800 0.160 0.171
#> 11:    Q90 1.000 0.277 0.340
#> 12:    Q95 1.450 0.607 0.645
#> 13:    Q99 1.890 0.921 0.929
#> 14:    Max 2.000 1.000 1.000
#> 
#> $cumdistr_d1
#>    cat cnt       pct
#> 1:   0   9 0.7500000
#> 2:   1  11 0.9166667
#> 3:   2  12 1.0000000
#> 
#> $cumdistr_d2
#>            cat cnt       pct
#> 1:    [0,0.02]   9 0.7500000
#> 2: (0.02,0.05]   9 0.7500000
#> 3:  (0.05,0.1]   9 0.7500000
#> 4:   (0.1,0.2]  10 0.8333333
#> 5:   (0.2,0.3]  11 0.9166667
#> 6:   (0.3,0.4]  11 0.9166667
#> 7:   (0.4,0.5]  11 0.9166667
#> 8:   (0.5,Inf]  12 1.0000000
#> 
#> $cumdistr_d3
#>            cat cnt       pct
#> 1:    [0,0.02]   9 0.7500000
#> 2: (0.02,0.05]   9 0.7500000
#> 3:  (0.05,0.1]   9 0.7500000
#> 4:   (0.1,0.2]   9 0.7500000
#> 5:   (0.2,0.3]  10 0.8333333
#> 6:   (0.3,0.4]  11 0.9166667
#> 7:   (0.4,0.5]  11 0.9166667
#> 8:   (0.5,Inf]  12 1.0000000
#> 
#> $false_zero
#> [1] 1
#> 
#> $false_nonzero
#> [1] 0
#> 
#> $exclude_zeros
#> [1] FALSE
#> 

# for an application on a perturbed object, see ?cellkey_pkg
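
The cumulative distributions cumdistr_d1, cumdistr_d2 and cumdistr_d3 printed above can be approximated as follows. This is a sketch under assumptions: cumdistr_sketch() is hypothetical, the break points are read off the printed output, for d1 one row per distinct distance value is assumed, and cells with a perturbed value of zero are assumed to be the ones excluded. Intervals are right-closed with the lowest one including 0, which matches the labels produced by base R's cut().

```r
# Hypothetical helper, not part of cellKey.
cumdistr_sketch <- function(d, breaks = NULL) {
  if (is.null(breaks)) {
    # d1: one row per distinct distance value (assumption)
    cats <- sort(unique(d))
    cnt  <- vapply(cats, function(v) sum(d <= v), integer(1))
  } else {
    # d2, d3: right-closed intervals; the first one includes its lower bound
    bins <- cut(d, breaks = breaks, include.lowest = TRUE)
    cats <- levels(bins)
    cnt  <- as.integer(cumsum(table(bins)))
  }
  data.frame(cat = cats, cnt = cnt, pct = cnt / length(d),
             stringsAsFactors = FALSE)
}

orig <- c(1:10, 0, 0)
pert <- orig; pert[c(1, 5, 7)] <- c(0, 6, 9)
keep <- pert > 0                      # ASSUMPTION: exclude_zeros = TRUE
o <- orig[keep]; p <- pert[keep]
brks <- c(0, 0.02, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, Inf)  # from the output

cumdistr_sketch(abs(o - p))                        # compare with cumdistr_d1
cumdistr_sketch(abs(o - p) / o, brks)              # compare with cumdistr_d2
cumdistr_sketch(abs(sqrt(o) - sqrt(p)), brks)      # compare with cumdistr_d3
```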