This function computes utility/information loss measures based on two numeric vectors (original and perturbed).
ck_cnt_measures(orig, pert, exclude_zeros = TRUE)
orig: a numeric vector holding original values
pert: a numeric vector holding perturbed values
exclude_zeros: a scalar logical value; if TRUE (the default), only cells with counts > 0 are used when computing distances d1, d2 and d3. If this argument is FALSE, the complete vector is used.
a list containing the following elements:
overview: a data.table with the following three columns:
  noise: amount of noise computed as orig - pert
  cnt: number of cells perturbed with the value given in column noise
  pct: proportion of cells perturbed with the value given in column noise
measures: a data.table containing measures of the distribution of three different distances between original and perturbed values of the unweighted counts. Column what specifies the computed measure. The three distances considered are:
  d1: absolute distance between original and perturbed values
  d2: relative absolute distance between original and perturbed values
  d3: absolute distance between square roots of original and perturbed values
cumdistr_d1, cumdistr_d2 and cumdistr_d3: for each distance d1, d2 and d3, a data.table with the following three columns:
  cat: a specific value (for d1) or interval (for distances d2 and d3)
  cnt: number of records less than or equal to the value in column cat for the given distance
  pct: proportion of records less than or equal to the value in column cat for the given distance
false_zero: number of non-zero cells that were perturbed to zero
false_nonzero: number of cells that were initially zero but have been perturbed to a number different from zero
exclude_zeros: whether empty cells were excluded from the computation
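The three distances and the two zero-related counts described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper names `keep`, `o` and `p` are hypothetical, and the rule that drops every cell in which either count is zero is an assumption chosen because it reproduces the 9 cells reported in the example output on this page.

```r
# Vectors used in the examples on this page
orig <- c(1:10, 0, 0)
pert <- orig
pert[c(1, 5, 7)] <- c(0, 6, 9)

# Assumption: exclude_zeros = TRUE drops cells where either count is zero
keep <- orig > 0 & pert > 0
o <- orig[keep]
p <- pert[keep]

d1 <- abs(o - p)              # absolute distance
d2 <- d1 / o                  # relative absolute distance
d3 <- abs(sqrt(o) - sqrt(p))  # absolute distance between square roots

# Means of the three distances
round(c(d1 = mean(d1), d2 = mean(d2), d3 = mean(d3)), 3)
#>    d1    d2    d3
#> 0.333 0.054 0.063

# Zero-related counts are taken over the complete vectors
false_zero <- sum(orig > 0 & pert == 0)      # non-zero cells perturbed to zero
false_nonzero <- sum(orig == 0 & pert != 0)  # zero cells perturbed away from zero
```

The remaining rows of the measures table (Min, the quantiles, Max) are consistent with applying `min()`, `quantile()` with its default settings, and `max()` to the same three vectors.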
orig <- c(1:10, 0, 0)
pert <- orig; pert[c(1, 5, 7)] <- c(0, 6, 9)
# ignore empty cells when computing measures `d1`, `d2`, `d3`
ck_cnt_measures(orig = orig, pert = pert, exclude_zeros = TRUE)
#> $overview
#> noise cnt pct
#> 1: -2 1 0.08333333
#> 2: -1 1 0.08333333
#> 3: 0 9 0.75000000
#> 4: 1 1 0.08333333
#>
#> $measures
#> what d1 d2 d3
#> 1: Min 0.000 0.000 0.000
#> 2: Q10 0.000 0.000 0.000
#> 3: Q20 0.000 0.000 0.000
#> 4: Q30 0.000 0.000 0.000
#> 5: Q40 0.000 0.000 0.000
#> 6: Mean 0.333 0.054 0.063
#> 7: Median 0.000 0.000 0.000
#> 8: Q60 0.000 0.000 0.000
#> 9: Q70 0.000 0.000 0.000
#> 10: Q80 0.400 0.080 0.085
#> 11: Q90 1.200 0.217 0.242
#> 12: Q95 1.600 0.251 0.298
#> 13: Q99 1.920 0.279 0.343
#> 14: Max 2.000 0.286 0.354
#>
#> $cumdistr_d1
#> cat cnt pct
#> 1: 0 7 0.7777778
#> 2: 1 8 0.8888889
#> 3: 2 9 1.0000000
#>
#> $cumdistr_d2
#> cat cnt pct
#> 1: [0,0.02] 7 0.7777778
#> 2: (0.02,0.05] 7 0.7777778
#> 3: (0.05,0.1] 7 0.7777778
#> 4: (0.1,0.2] 8 0.8888889
#> 5: (0.2,0.3] 9 1.0000000
#> 6: (0.3,0.4] 9 1.0000000
#> 7: (0.4,0.5] 9 1.0000000
#> 8: (0.5,Inf] 9 1.0000000
#>
#> $cumdistr_d3
#> cat cnt pct
#> 1: [0,0.02] 7 0.7777778
#> 2: (0.02,0.05] 7 0.7777778
#> 3: (0.05,0.1] 7 0.7777778
#> 4: (0.1,0.2] 7 0.7777778
#> 5: (0.2,0.3] 8 0.8888889
#> 6: (0.3,0.4] 9 1.0000000
#> 7: (0.4,0.5] 9 1.0000000
#> 8: (0.5,Inf] 9 1.0000000
#>
#> $false_zero
#> [1] 1
#>
#> $false_nonzero
#> [1] 0
#>
#> $exclude_zeros
#> [1] TRUE
#>
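The interval table printed for `d2` above can be approximated with base R's `cut()`. The following is a rough sketch rather than the package's code: the breakpoints are read off the printed `cat` column, and dropping cells in which either count is zero is an assumption made to match the 9 cells shown.

```r
orig <- c(1:10, 0, 0)
pert <- orig
pert[c(1, 5, 7)] <- c(0, 6, 9)

# Assumption: keep only cells where both counts are positive
keep <- orig > 0 & pert > 0
d2 <- abs(orig[keep] - pert[keep]) / orig[keep]

# Breakpoints taken from the printed intervals
breaks <- c(0, 0.02, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5, Inf)
bins <- cut(d2, breaks = breaks, include.lowest = TRUE)

# Cumulative number and share of cells up to each interval
cnt <- cumsum(table(bins))
cumdistr_d2 <- data.frame(
  cat = names(cnt),
  cnt = as.integer(cnt),
  pct = as.numeric(cnt) / length(d2)
)
cumdistr_d2
```

The same approach applies to `d3` by replacing the distance vector.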
# use all cells
ck_cnt_measures(orig = orig, pert = pert, exclude_zeros = FALSE)
#> $overview
#> noise cnt pct
#> 1: -2 1 0.08333333
#> 2: -1 1 0.08333333
#> 3: 0 9 0.75000000
#> 4: 1 1 0.08333333
#>
#> $measures
#> what d1 d2 d3
#> 1: Min 0.000 0.000 0.000
#> 2: Q10 0.000 0.000 0.000
#> 3: Q20 0.000 0.000 0.000
#> 4: Q30 0.000 0.000 0.000
#> 5: Q40 0.000 0.000 0.000
#> 6: Mean 0.333 0.124 0.131
#> 7: Median 0.000 0.000 0.000
#> 8: Q60 0.000 0.000 0.000
#> 9: Q70 0.000 0.000 0.000
#> 10: Q80 0.800 0.160 0.171
#> 11: Q90 1.000 0.277 0.340
#> 12: Q95 1.450 0.607 0.645
#> 13: Q99 1.890 0.921 0.929
#> 14: Max 2.000 1.000 1.000
#>
#> $cumdistr_d1
#> cat cnt pct
#> 1: 0 9 0.7500000
#> 2: 1 11 0.9166667
#> 3: 2 12 1.0000000
#>
#> $cumdistr_d2
#> cat cnt pct
#> 1: [0,0.02] 9 0.7500000
#> 2: (0.02,0.05] 9 0.7500000
#> 3: (0.05,0.1] 9 0.7500000
#> 4: (0.1,0.2] 10 0.8333333
#> 5: (0.2,0.3] 11 0.9166667
#> 6: (0.3,0.4] 11 0.9166667
#> 7: (0.4,0.5] 11 0.9166667
#> 8: (0.5,Inf] 12 1.0000000
#>
#> $cumdistr_d3
#> cat cnt pct
#> 1: [0,0.02] 9 0.7500000
#> 2: (0.02,0.05] 9 0.7500000
#> 3: (0.05,0.1] 9 0.7500000
#> 4: (0.1,0.2] 9 0.7500000
#> 5: (0.2,0.3] 10 0.8333333
#> 6: (0.3,0.4] 11 0.9166667
#> 7: (0.4,0.5] 11 0.9166667
#> 8: (0.5,Inf] 12 1.0000000
#>
#> $false_zero
#> [1] 1
#>
#> $false_nonzero
#> [1] 0
#>
#> $exclude_zeros
#> [1] FALSE
#>
# for an application on a perturbed object, see ?cellkey_pkg
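For readers who want to see how the `overview` and `cumdistr_d1` tables relate to the inputs, here is a minimal base R sketch using all cells (as with `exclude_zeros = FALSE`). It uses plain data frames instead of `data.table` and is not taken from the package source; `cats` is a hypothetical helper name.

```r
orig <- c(1:10, 0, 0)
pert <- orig
pert[c(1, 5, 7)] <- c(0, 6, 9)

# overview: frequency of each noise value, with noise = orig - pert
noise <- orig - pert
tab <- table(noise)
overview <- data.frame(
  noise = as.numeric(names(tab)),
  cnt = as.integer(tab),
  pct = as.numeric(tab) / length(noise)
)

# cumdistr_d1: cells with d1 less than or equal to each observed value
d1 <- abs(orig - pert)
cats <- sort(unique(d1))
cumdistr_d1 <- data.frame(
  cat = cats,
  cnt = vapply(cats, function(v) sum(d1 <= v), integer(1)),
  pct = vapply(cats, function(v) mean(d1 <= v), numeric(1))
)

overview
cumdistr_d1
```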