Function primarySuppression() identifies and suppresses primary sensitive table cells in sdcProblem objects.
Argument type selects the rule used to identify primary sensitive cells. Currently, sensitive table cells can be identified and suppressed using the frequency rule, the nk-dominance rule, the p-percent rule and the pq-rule.
primarySuppression(object, type, ...)
an sdcProblem object
character vector of length 1 defining the primary suppression rule. Allowed types are:
freq: apply frequency rule with parameters maxN and allowZeros
nk: apply nk-dominance rule with parameters n and k
p: apply p-percent rule with parameter p
pq: apply pq-rule with parameters p and q
parameters used in the identification of primary sensitive cells. Parameters that can be modified are:
maxN: numeric vector of length 1 used when applying the frequency rule. All cells having counts <= maxN are set as primary suppressed. The default value of maxN is 3.
allowZeros: logical value defining if empty cells (with frequency = 0) should be considered sensitive when using the frequency rule. Empty cells are never considered sensitive when applying dominance rules. The default value of allowZeros is FALSE, so empty cells are not considered primary sensitive by default. Such cells (frequency 0) are then flagged as z, which indicates that the cell may be published but should (internally) not be used for (secondary) suppression in the heuristic algorithms.
p: numeric vector of length 1 specifying parameter p that is used when applying the p-percent rule. The default value is 80.
pq: numeric vector of length 2 specifying parameters p and q that are used when applying the pq-rule. The default is c(25, 50).
n: numeric vector of length 1 specifying parameter n that is used when applying the nk-dominance rule. Parameter n is set to 2 by default.
k: scalar numeric specifying parameter k that is used when applying the nk-dominance rule. Parameter k is set to 85 by default.
numVarName: character scalar specifying the name of the numerical variable that should be used to identify cells that are dominated by dominance rules (p-rule, pq-rule or nk-rule). This setting is mandatory in package versions >= 0.29. If type is either 'nk', 'p' or 'pq', it is mandatory to specify either numVarInd or numVarName.
numVarInd: same as numVarName, but a scalar numeric specifying the index of the variable is expected. If both numVarName and numVarInd are specified, numVarName is used. The index refers to the index of the specified numvars in makeProblem(). This argument is no longer respected in versions >= 0.29, where numVarName must be used.
an sdcProblem object
Since version 0.29, it is no longer possible to specify the underlying variables for dominance rules ("p", "pq" or "nk") by index; these variables must be set by name using argument numVarName.
The nk-dominance rule, the p-percent rule and the pq-rule can only be applied if micro data have been used as input data to function makeProblem().
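The rules selectable via type can be illustrated with a few lines of base R. The following is a minimal sketch based on the usual textbook definitions of these rules, not the code sdcTable runs internally. The helper names (is_sensitive_freq() and so on) are hypothetical, and the exact handling of ties (strict versus non-strict inequality) in the package is an assumption here. The vector x stands for the contributions of the individual units to a single cell.

```r
# Hypothetical helpers for illustration only; they are not part of sdcTable.

# frequency rule: a cell is sensitive if its count is <= maxN;
# empty cells are only flagged if allowZeros is TRUE
is_sensitive_freq <- function(count, maxN = 3, allowZeros = FALSE) {
  if (count == 0) {
    return(allowZeros)
  }
  count <= maxN
}

# nk-dominance rule (textbook form): the n largest contributions
# sum to more than k percent of the cell total
is_sensitive_nk <- function(x, n = 2, k = 85) {
  x <- sort(x, decreasing = TRUE)
  if (length(x) == 0 || sum(x) == 0) {
    return(FALSE)
  }
  sum(head(x, n)) > k / 100 * sum(x)
}

# p-percent rule (textbook form): the cell total minus the two largest
# contributions is less than p percent of the largest contribution
is_sensitive_p <- function(x, p = 80) {
  x <- sort(x, decreasing = TRUE)
  if (length(x) == 0 || sum(x) == 0) {
    return(FALSE)
  }
  remainder <- sum(x) - sum(head(x, 2))
  remainder < p / 100 * x[1]
}

# pq-rule (textbook form): as the p-percent rule, with ratio p / q
is_sensitive_pq <- function(x, pq = c(25, 50)) {
  x <- sort(x, decreasing = TRUE)
  if (length(x) == 0 || sum(x) == 0) {
    return(FALSE)
  }
  remainder <- sum(x) - sum(head(x, 2))
  remainder < pq[1] / pq[2] * x[1]
}

is_sensitive_freq(2, maxN = 2)          # count equals maxN
is_sensitive_nk(c(90, 5, 5))            # two units hold 95 percent
is_sensitive_p(c(100, 10, 5), p = 30)   # remainder 5 vs. 30 percent of 100
is_sensitive_pq(c(40, 30, 30))          # remainder 30 vs. 50 percent of 40
```

The first three calls return TRUE and the last returns FALSE. The dominance helpers need the individual contributions in x, which is why these rules require micro data as input.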
# load micro data
utils::data("microdata1", package = "sdcTable")
# load problem (as it was created in the example in ?makeProblem)
p <- sdc_testproblem(with_supps = FALSE)
# we have a look at the frequency table by gender and region
xtabs(rep(1, nrow(microdata1)) ~ gender + region, data = microdata1)
#> region
#> gender A B C D
#> female 2 19 10 14
#> male 18 14 12 11
# 2 units contribute to the cell with region=='A' and gender=='female'
# --> this cell is considered sensitive according to the
# freq-rule with 'maxN' equal to 2!
p1 <- primarySuppression(
object = p,
type = "freq",
maxN = 2
)
# we can also apply a p-percent rule with parameter "p" being 30 as below.
# This is only possible if we are dealing with micro data and we also
# have to specify the name of a numeric variable.
p2 <- primarySuppression(
object = p,
type = "p",
p = 30,
numVarName = "val"
)
#> computing contributing indices | rawdata <--> table; this might take a while
# looking at the anonymization states, we see that one cell is primary
# suppressed (sdcStatus == "u") given the frequency rule with
# parameter "maxN = 2"; the remaining cells are possible candidates
# for secondary cell suppression (sdcStatus == "s").
#
# Applying the p-percent rule with parameter 'p = 30' resulted in
# two primary suppressions.
data.frame(
p1_sdc = getInfo(p1, type = "sdcStatus"),
p2_sdc = getInfo(p2, type = "sdcStatus")
)
#> p1_sdc p2_sdc
#> 1 s s
#> 2 s s
#> 3 s s
#> 4 s s
#> 5 s s
#> 6 u u
#> 7 s s
#> 8 s s
#> 9 s s
#> 10 s s
#> 11 s s
#> 12 s s
#> 13 s s
#> 14 s u
#> 15 s s
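To compare the two results without reading the listing row by row, the status vectors can be tabulated. The sketch below is self-contained: it hard-codes the two columns exactly as printed above instead of calling the package. With live objects, the same vectors would come from getInfo(p1, type = "sdcStatus") and getInfo(p2, type = "sdcStatus"), as in the example.

```r
# status columns copied from the output shown above
p1_sdc <- c(rep("s", 5), "u", rep("s", 9))
p2_sdc <- c(rep("s", 5), "u", rep("s", 7), "u", "s")

# number of cells per anonymization state
table(p1_sdc)
table(p2_sdc)

# cells flagged by the p-percent rule but not by the frequency rule
only_p <- which(p2_sdc == "u" & p1_sdc != "u")
only_p
#> [1] 14
```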