PRAM (post-randomization) is intended for categorical data stored as factors. The algorithm randomly changes the values of variables in selected records (usually the risky ones) according to an invariant probability transition matrix or a custom-defined transition matrix.
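A minimal base-R sketch of the core perturbation step may help: each record's value is replaced by a level drawn from the row of the transition matrix that belongs to its current level. The helper `pram_sketch()` is hypothetical and is not sdcMicro's implementation. For simplicity it perturbs every record rather than only selected ones, and it assumes the matrix rows are named after the factor levels.

```r
## Hypothetical helper, not sdcMicro's code: perturbs EVERY record of x
## by drawing a new level from the row of P that matches its current level.
pram_sketch <- function(x, P) {
  stopifnot(is.factor(x), identical(rownames(P), levels(x)))
  idx <- as.integer(x)                  # row of P used for each record
  new_idx <- vapply(idx, function(i) {
    if (is.na(i)) return(NA_integer_)   # missing values stay missing
    sample.int(ncol(P), size = 1, prob = P[i, ])
  }, integer(1))
  factor(levels(x)[new_idx], levels = levels(x))
}

set.seed(1)
x <- factor(sample(c("a", "b", "c"), 100, replace = TRUE),
            levels = c("a", "b", "c"))
# made-up transition matrix: stay with probability 0.8, move otherwise
P <- matrix(c(0.8, 0.1, 0.1,
              0.1, 0.8, 0.1,
              0.1, 0.1, 0.8), nrow = 3, byrow = TRUE,
            dimnames = list(levels(x), levels(x)))
x_pram <- pram_sketch(x, P)
sum(x != x_pram)  # number of changed records
```

With the identity matrix as P, every draw returns the current level, so nothing changes; this mirrors the `res3` example further down.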
pram(obj, variables = NULL, strata_variables = NULL, pd = 0.8, alpha = 0.5)
obj: Input data. Allowed input data are objects of class data.frame, factor or sdcMicroObj.
variables: Names of variables in obj on which post-randomization should be applied. If obj is a factor, this argument is ignored. Please note that pram can only be applied to factor variables.
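Since pram() only accepts factor variables, several columns can be converted in a single step before the call. The data frame below is made up for illustration; the same pattern could replace the three separate `factor()` calls in the examples.

```r
## Made-up data; only the listed columns are converted to factors.
df <- data.frame(roof = c(2, 4, 4, 9), walls = c(3, 2, 3, 9),
                 income = c(10, 20, 30, 40))
vars <- c("roof", "walls")
df[vars] <- lapply(df[vars], factor)
sapply(df, is.factor)  # roof and walls are factors, income is unchanged
```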
strata_variables: Names of variables for stratification (set automatically for an object of class sdcMicroObj). Alternatively, an integer vector or factor that specifies the desired groups can be supplied; this vector must have one entry per record of the input data set. For a possible use case, have a look at the examples.
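If a single grouping vector is preferred over several variable names, one way to build it is base R's `interaction()`, which creates one level per observed combination. This is only an assumption about how such a vector might be constructed; the data are made up and the pram() call itself is not shown.

```r
## Made-up data: combine two stratification variables into one factor.
df <- data.frame(urbrur = c(1, 1, 2, 2, 2), sex = c(1, 2, 1, 1, 2))
# one level per observed urbrur/sex combination; unused ones are dropped
strata <- interaction(df$urbrur, df$sex, drop = TRUE)
table(strata)
# as.integer(strata) would give the equivalent integer group codes
stopifnot(length(strata) == nrow(df))  # one entry per record
```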
pd: minimum diagonal entries for the generated transition matrix P. Either a vector of length 1 (which is recycled) or a vector of the same length as the number of variables that should be post-randomized. It is also possible to set pd to a numeric matrix, which is then used directly as the transition matrix. The matrix must be constructed as follows:
- the matrix must be a square matrix
- the rownames and colnames of the matrix must match the levels (in the same order) of the factor variable that should be post-randomized
- the rowSums of the matrix need to equal 1 (each row holds the transition probabilities for one original level)
It is also possible to combine the different ways; in this case pd must be supplied as a list (for example pd = list(mat, 0.5)). For details have a look at the examples.
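A custom matrix can be checked against these requirements before it is passed as pd. `check_transition_matrix()` is a hypothetical helper; besides the listed rules it also checks that all entries are non-negative, which any probability matrix needs. It deliberately tests row sums only: the modified matrix used in the sdcMicroObj example (first row `c(0.9, 0, 0, 0.05, 0.05)`) has column sums other than 1 and is still accepted there.

```r
## Hypothetical validation helper for a user-supplied transition matrix.
check_transition_matrix <- function(mat, x, tol = 1e-8) {
  lev <- levels(x)
  is.matrix(mat) && is.numeric(mat) &&
    nrow(mat) == ncol(mat) &&                      # square
    identical(rownames(mat), lev) &&               # names match levels,
    identical(colnames(mat), lev) &&               # in the same order
    all(mat >= 0) &&                               # valid probabilities
    all(abs(rowSums(mat) - 1) < tol)               # rows sum to 1
}

roof <- factor(c(2, 4, 4, 5, 6, 9))                # levels "2" "4" "5" "6" "9"
mat <- diag(nlevels(roof))
rownames(mat) <- colnames(mat) <- levels(roof)
mat[1, ] <- c(0.9, 0, 0, 0.05, 0.05)
check_transition_matrix(mat, roof)         # passes
check_transition_matrix(mat[, 5:1], roof)  # fails: columns in the wrong order
```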
alpha: amount of perturbation for the invariant PRAM method. This is a numeric vector of length 1 (recycled if necessary) or a vector of the same length as the number of variables. If a transition matrix is specified directly, alpha is ignored.
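The roles of pd and alpha can be illustrated with the usual invariant PRAM construction (Gouweleeuw et al.): from a transition matrix P and the observed frequencies, a backward matrix Q is formed so that R = PQ leaves the expected frequencies unchanged, and R is then mixed with the identity matrix using alpha. The sketch below assumes this general scheme and is not sdcMicro's code. `invariant_matrix()` is hypothetical, P is a deliberately simple matrix whose diagonal equals pd, and the frequencies are those of 'roof' in the examples.

```r
## Sketch of an invariant PRAM matrix (assumed scheme, not sdcMicro's code).
invariant_matrix <- function(P, freq, alpha) {
  L <- nrow(P)
  # S[k]: expected frequency of level k after applying P
  S <- as.vector(crossprod(freq, P))
  # backward matrix: Q[k, l] = P[l, k] * freq[l] / S[k]
  Q <- t(P * freq) / S
  # R = P Q satisfies freq %*% R == freq
  R <- P %*% Q
  # alpha = 0 gives the identity, alpha = 1 gives R
  alpha * R + (1 - alpha) * diag(L)
}

freq <- c(814, 3697, 19, 34, 16)          # 'roof' frequencies from the examples
pd <- 0.8
P <- matrix((1 - pd) / 4, nrow = 5, ncol = 5)  # spread 1 - pd over 4 other levels
diag(P) <- pd

Rs <- invariant_matrix(P, freq, alpha = 0.5)
round(rowSums(Rs), 10)            # every row still sums to 1
round(as.vector(freq %*% Rs), 6)  # equals freq: frequencies preserved in expectation
```

Because both R and the identity matrix preserve the frequencies, any mixture of them does too; alpha only controls how far the result moves away from "no change".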
a modified sdcMicroObj object or a new object containing original and post-randomized variables (with suffix "_pram").
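The "Number of changed observations" line printed for the result can be reproduced by comparing original and perturbed values. In the first example, 301 changes out of 814 + 3697 + 19 + 34 + 16 = 4580 records give the reported 6.57%. `changed_summary()` is a hypothetical helper; both factors must share the same levels for `!=` to work.

```r
## Hypothetical helper: count and percentage of changed records.
changed_summary <- function(orig, perturbed) {
  n_changed <- sum(orig != perturbed, na.rm = TRUE)
  c(n = n_changed, pct = round(100 * n_changed / length(orig), 2))
}

orig <- factor(c("a", "b", "a", "c"))
pert <- factor(c("a", "a", "a", "c"), levels = levels(orig))
changed_summary(orig, pert)
round(100 * 301 / 4580, 2)  # percentage in the first example
```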
The deprecated method 'pram_strata' is no longer available in sdcMicro > 4.5.0.
data(testdata)
# \donttest{
## donttest is necessary because these examples have a CPU time
## of more than 2.5 times the elapsed time, caused by using
## C++ code and/or data.table
## using a factor variable as input
res <- pram(as.factor(testdata$roof))
print(res)
#> Number of changed observations:
#> - - - - - - - - - - -
#> x != x_pram : 301 (6.57%)
summary(res)
#> Variable: x
#>
#> ----------------------
#>
#> Frequencies in original and perturbed data:
#> x 2 4 5 6 9 NA
#> <char> <char> <char> <char> <char> <char> <char>
#> 1: Original Frequencies 814 3697 19 34 16 0
#> 2: Frequencies after Perturbation 815 3696 23 33 13 0
#>
#> Transitions:
#> transition Frequency
#> <char> <int>
#> 1: 1 --> 1 685
#> 2: 1 --> 2 123
#> 3: 1 --> 3 1
#> 4: 1 --> 4 3
#> 5: 1 --> 5 2
#> 6: 2 --> 1 124
#> 7: 2 --> 2 3552
#> 8: 2 --> 3 7
#> 9: 2 --> 4 11
#> 10: 2 --> 5 3
#> 11: 3 --> 2 4
#> 12: 3 --> 3 15
#> 13: 4 --> 1 5
#> 14: 4 --> 2 10
#> 15: 4 --> 4 19
#> 16: 5 --> 1 1
#> 17: 5 --> 2 7
#> 18: 5 --> 5 8
#>
## using a data.frame as input
## pram can only be applied to factors,
## --> we have to recode the variables to factors beforehand
testdata$roof <- factor(testdata$roof)
testdata$walls <- factor(testdata$walls)
testdata$water <- factor(testdata$water)
## pram() is applied within subgroups defined by
## variables "urbrur" and "sex"
res <- pram(
obj = testdata,
variables = "roof",
strata_variables = c("urbrur", "sex"))
print(res)
#> Number of changed observations:
#> - - - - - - - - - - -
#> roof != roof_pram : 250 (5.46%)
summary(res)
#> Variable: roof
#>
#> ----------------------
#>
#> Frequencies in original and perturbed data:
#> roof 2 4 5 6 9 NA
#> <char> <char> <char> <char> <char> <char> <char>
#> 1: Original Frequencies 814 3697 19 34 16 0
#> 2: Frequencies after Perturbation 815 3707 21 20 17 0
#>
#> Transitions:
#> transition Frequency
#> <char> <int>
#> 1: 1 --> 1 712
#> 2: 1 --> 2 97
#> 3: 1 --> 3 1
#> 4: 1 --> 4 1
#> 5: 1 --> 5 3
#> 6: 2 --> 1 100
#> 7: 2 --> 2 3581
#> 8: 2 --> 3 7
#> 9: 2 --> 4 3
#> 10: 2 --> 5 6
#> 11: 3 --> 1 1
#> 12: 3 --> 2 5
#> 13: 3 --> 3 13
#> 14: 4 --> 1 2
#> 15: 4 --> 2 16
#> 16: 4 --> 4 16
#> 17: 5 --> 2 8
#> 18: 5 --> 5 8
#>
## default parameters (pd = 0.8 and alpha = 0.5) for the generation
## of the invariant transition matrix will be used for all variables
res1 <- pram(
obj = testdata,
variables = c("roof", "walls", "water"))
print(res1)
#> Number of changed observations:
#> - - - - - - - - - - -
#> roof != roof_pram : 129 (2.82%)
#> walls != walls_pram : 370 (8.08%)
#> water != water_pram : 196 (4.28%)
## specific parameter settings for each variable
res2 <- pram(
obj = testdata,
variables = c("roof", "walls", "water"),
pd = c(0.95, 0.8, 0.9),
alpha = 0.5)
print(res2)
#> Number of changed observations:
#> - - - - - - - - - - -
#> roof != roof_pram : 96 (2.1%)
#> walls != walls_pram : 207 (4.52%)
#> water != water_pram : 189 (4.13%)
## detailed information on the pram parameters (such as the transition matrix 'Rs')
## is stored in the output, e.g. for variable 'roof':
#attr(res2, "pram_params")$roof
## we can also specify a custom transition-matrix directly
mat <- diag(length(levels(testdata$roof)))
rownames(mat) <- colnames(mat) <- levels(testdata$roof)
res3 <- pram(
obj = testdata,
variables = "roof",
pd = mat)
print(res3) # of course, nothing has changed!
#> Number of changed observations:
#> - - - - - - - - - - -
#> roof != roof_pram : 0 (0%)
## it is possible to use a transition matrix for one variable and the 'traditional' way
## of specifying a number for the minimal diagonal entries of the transition matrix
## for other variables. In this case we must supply `pd` as a list.
res4 <- pram(
obj = testdata,
variables = c("roof", "walls"),
pd = list(mat, 0.5),
alpha = c(NA, 0.5))
print(res4)
#> Number of changed observations:
#> - - - - - - - - - - -
#> roof != roof_pram : 0 (0%)
#> walls != walls_pram : 537 (11.72%)
summary(res4)
#> Variable: roof
#>
#> ----------------------
#>
#> Frequencies in original and perturbed data:
#> roof 2 4 5 6 9 NA
#> <char> <char> <char> <char> <char> <char> <char>
#> 1: Original Frequencies 814 3697 19 34 16 0
#> 2: Frequencies after Perturbation 814 3697 19 34 16 0
#>
#> Transitions:
#> transition Frequency
#> <char> <int>
#> 1: 1 --> 1 814
#> 2: 2 --> 2 3697
#> 3: 3 --> 3 19
#> 4: 4 --> 4 34
#> 5: 5 --> 5 16
#>
#> Variable: walls
#>
#> ----------------------
#>
#> Frequencies in original and perturbed data:
#> walls 2 3 9 NA
#> <char> <char> <char> <char> <char>
#> 1: Original Frequencies 1203 3327 50 0
#> 2: Frequencies after Perturbation 1182 3343 55 0
#>
#> Transitions:
#> transition Frequency
#> <char> <int>
#> 1: 1 --> 1 947
#> 2: 1 --> 2 254
#> 3: 1 --> 3 2
#> 4: 2 --> 1 234
#> 5: 2 --> 2 3068
#> 6: 2 --> 3 25
#> 7: 3 --> 1 1
#> 8: 3 --> 2 21
#> 9: 3 --> 3 28
#>
attr(res4, "pram_params")
#> $roof
#> $roof$Rs
#> 2 4 5 6 9
#> 2 1 0 0 0 0
#> 4 0 1 0 0 0
#> 5 0 0 1 0 0
#> 6 0 0 0 1 0
#> 9 0 0 0 0 1
#>
#> $roof$pd
#> 2 4 5 6 9
#> 2 1 0 0 0 0
#> 4 0 1 0 0 0
#> 5 0 0 1 0 0
#> 6 0 0 0 1 0
#> 9 0 0 0 0 1
#>
#> $roof$alpha
#> [1] NA
#>
#>
#> $walls
#> $walls$Rs
#> 2 3 9
#> 2 0.78415786 0.2150333 0.0008088368
#> 3 0.07775325 0.9154115 0.0068352155
#> 9 0.01946061 0.4548152 0.5257241473
#>
#> $walls$pd
#> [1] 0.5
#>
#> $walls$alpha
#> [1] 0.5
#>
#>
## application to objects of class sdcMicroObj with default parameters
data(testdata2)
testdata2$urbrur <- factor(testdata2$urbrur)
sdc <- createSdcObj(
dat = testdata2,
keyVars = c("roof", "walls", "water", "electcon", "relat", "sex"),
numVars = c("expend", "income", "savings"),
w = "sampling_weight")
sdc <- pram(
obj = sdc,
variables = "urbrur")
print(sdc, type = "pram")
#> Post-Randomization (PRAM):
#> Variable:urbrur
#> --> final Transition-Matrix:
#> 1 2
#> 1 0.82312481 0.1768752
#> 2 0.07235803 0.9276420
#>
#> Changed observations:
#> variable nrChanges percChanges
#> 1 urbrur 8 8.6
#> ----------------------------------------------------------------------
#>
## this is equivalent to the previous application: if argument 'variables' is NULL,
## all variables from slot 'pramVars' will be used if possible.
sdc <- createSdcObj(
dat = testdata2,
keyVars = c("roof", "walls", "water", "electcon", "relat", "sex"),
numVars = c("expend", "income", "savings"),
w = "sampling_weight",
pramVars = "urbrur")
sdc <- pram(sdc)
print(sdc, type="pram")
#> Post-Randomization (PRAM):
#> Variable:urbrur
#> --> final Transition-Matrix:
#> 1 2
#> 1 0.81443714 0.1855629
#> 2 0.07591208 0.9240879
#>
#> Changed observations:
#> variable nrChanges percChanges
#> 1 urbrur 14 15.05
#> ----------------------------------------------------------------------
#>
## we can specify transition matrices for sdcMicroObj-objects too
testdata2$roof <- factor(testdata2$roof)
sdc <- createSdcObj(
dat = testdata2,
keyVars = c("roof", "walls", "water", "electcon", "relat", "sex"),
numVars = c("expend", "income", "savings"),
w = "sampling_weight")
mat <- diag(length(levels(testdata2$roof)))
rownames(mat) <- colnames(mat) <- levels(testdata2$roof)
mat[1,] <- c(0.9, 0, 0, 0.05, 0.05)
sdc <- pram(
obj = sdc,
variables = "roof",
pd = mat)
#> Warning: If pram is applied on key variables, the k-anonymity and risk assessment are not useful anymore.
print(sdc, type = "pram")
#> Post-Randomization (PRAM):
#> Variable:roof
#> --> final Transition-Matrix:
#> 2 4 5 6 9
#> 2 0.9 0 0 0.05 0.05
#> 4 0.0 1 0 0.00 0.00
#> 5 0.0 0 1 0.00 0.00
#> 6 0.0 0 0 1.00 0.00
#> 9 0.0 0 0 0.00 1.00
#>
#> Changed observations:
#> variable nrChanges percChanges
#> 1 roof 3 3.23
#> ----------------------------------------------------------------------
#>
## we can also have a look at the transitions
get.sdcMicroObj(sdc, "pram")$transitions
#> $roof
#> transition Frequency
#> <char> <int>
#> 1: 1 --> 1 24
#> 2: 1 --> 4 1
#> 3: 1 --> 5 2
#> 4: 2 --> 2 46
#> 5: 3 --> 3 6
#> 6: 4 --> 4 8
#> 7: 5 --> 5 6
#>
# }
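The "Transitions" tables printed by summary() label levels by their index (for 'roof', 1 to 5 stand for the levels 2, 4, 5, 6, 9). A comparable table can be sketched in base R by pasting index pairs and counting them. `transitions_table()` is hypothetical and not how sdcMicro builds its data.table output; because the labels sort as text, a variable with 10 or more levels would list "10 --> 1" before "2 --> 1".

```r
## Hypothetical helper: frequency of each "original --> perturbed" index pair.
transitions_table <- function(orig, perturbed) {
  tr <- paste(as.integer(orig), "-->", as.integer(perturbed))
  as.data.frame(table(transition = tr), responseName = "Frequency")
}

orig <- factor(c("2", "4", "4", "9", "9"))
pert <- factor(c("2", "4", "2", "9", "9"), levels = levels(orig))
transitions_table(orig, pert)
```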