The measures `IL_correl()` and `IL_variables()` were proposed by Andrzej Mlodak and are (theoretically) bounded between `0` and `1`.

- `x`: an object coercible to a `data.frame` representing the original dataset
- `xm`: an object coercible to a `data.frame` representing the perturbed, modified dataset
- `digits`: number of digits used for rounding when displaying results
- `...`: additional parameters for print methods; currently ignored

Both functions return the corresponding information-loss measure.

- `IL_correl()`: an information-loss measure that can be applied to the numerically scaled variables common to `x` and `xm`. It is based on the diagonal entries of the inverse correlation matrices of the original and perturbed data.
- `IL_variables()`: for the variables common to `x` and `xm`, the individual distance functions depend on the class of the variable; specifically, they differ for numeric variables, ordered factors, and character/factor variables. The individual distances are summed and scaled by `n * m`, with `n` being the number of records and `m` the number of (common) variables.
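The aggregation step described for `IL_variables()` can be sketched in base R as follows. This is only an illustration of summing per-cell distances and scaling by `n * m`: the two distance functions used here (a 0/1 mismatch for factors and a range-scaled absolute difference for numerics) are stand-in assumptions, not the distance functions the package actually uses, and the names `d`, `loss_per_variable` and `loss_overall` are hypothetical.

```r
# Small illustrative inputs: one factor and one numeric variable
x <- data.frame(
  v1 = factor(c("a", "b", "a", "c", "b", "a"), levels = c("a", "b", "c")),
  v2 = c(1, 2, 3, 4, 5, 6)
)
xm <- x
xm$v1[1:2] <- "c"        # perturb two categorical values
xm$v2[3] <- xm$v2[3] + 1 # perturb one numeric value

n <- nrow(x)

# ASSUMED stand-in distances (not the package's own definitions):
# 0/1 mismatch for the factor, range-scaled absolute difference for the numeric
d <- cbind(
  v1 = as.numeric(as.character(x$v1) != as.character(xm$v1)),
  v2 = abs(x$v2 - xm$v2) / diff(range(x$v2))
)
m <- ncol(d)

# Per-variable loss: distances summed over records, scaled by n
loss_per_variable <- colSums(d) / n

# Overall loss: all distances summed, scaled by n * m
loss_overall <- sum(d) / (n * m)

print(round(loss_per_variable, 3))
print(round(loss_overall, 3))
```

With this scaling, the overall value equals the mean of the per-variable values, so it stays within `[0, 1]` whenever every individual distance does.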

Details can be found in the references below.

The implementation of `IL_correl()` differs slightly from the original proposal in Mlodak, A. (2020): the constant multiplier was changed from `1/2` to `1 / sqrt(2)` for better efficiency and interpretability of the measure.
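The quantities that `IL_correl()` is described as comparing, namely the diagonal entries of the inverse correlation matrices of the original and perturbed data, can be obtained in base R as in the sketch below. The simulated data and the variable names (`diag_inv_x`, `diag_inv_xm`) are assumptions made for illustration; the sketch deliberately stops before combining the entries, since the exact aggregation and the constant multiplier are defined in the references and in the package itself.

```r
# Illustrative data only: three numeric variables, two of them correlated
set.seed(42)
n <- 200
z <- rnorm(n)
x <- data.frame(a = z + rnorm(n), b = z + rnorm(n), c = rnorm(n))

# A perturbed copy obtained by adding independent noise to each column
xm <- as.data.frame(lapply(x, function(col) col + rnorm(n, sd = 0.5)))

# Diagonal entries of the inverse correlation matrices
diag_inv_x  <- diag(solve(cor(x)))
diag_inv_xm <- diag(solve(cor(xm)))

print(round(diag_inv_x, 3))
print(round(diag_inv_xm, 3))
```

Each diagonal entry is at least `1` and grows as the variable becomes more strongly linearly related to the others, so the change in these entries after perturbation reflects how much of the correlation structure was altered.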

Mlodak, A. (2020). Information loss resulting from statistical disclosure control of output data, Wiadomosci Statystyczne. The Polish Statistician, 2020, 65(9), 7-27, DOI: 10.5604/01.3001.0014.4121

Mlodak, A. (2019). Using the Complex Measure in an Assessment of the Information Loss Due to the Microdata Disclosure Control, Przegląd Statystyczny, 2019, 66(1), 7-26, DOI: 10.5604/01.3001.0013.8285

```
data("Tarragona", package = "sdcMicro")
res1 <- addNoise(obj = Tarragona, variables = colnames(Tarragona), noise = 100)
IL_correl(x = as.data.frame(res1$x), xm = as.data.frame(res1$xm))
#> Number of records (x): 834 | Number of records (xm): 834
#> Number of common numeric variables: 13
#> Overall information loss: 0.473
res2 <- addNoise(obj = Tarragona, variables = colnames(Tarragona), noise = 25)
IL_correl(x = as.data.frame(res2$x), xm = as.data.frame(res2$xm))
#> Number of records (x): 834 | Number of records (xm): 834
#> Number of common numeric variables: 13
#> Overall information loss: 0.23
# creating test inputs
n <- 150
x <- xm <- data.frame(
  v1 = factor(sample(letters[1:5], n, replace = TRUE), levels = letters[1:5]),
  v2 = rnorm(n),
  v3 = runif(n), # one value per record instead of recycling three values
  v4 = ordered(sample(LETTERS[1:3], n, replace = TRUE), levels = c("A", "B", "C"))
)
xm$v1[1:5] <- "a"
xm$v2 <- rnorm(n, mean = 5)
xm$v4[1:5] <- "A"
IL_variables(x, xm)
#> Number of records: 150
#> Number of variables: 4
#> Overall information loss: 0.223
#> Individual information losses for variables:
#> variable loss
#> v1 0.020
#> v2 0.859
#> v3 0.000
#> v4 0.013
```