primarySuppression() identifies and suppresses primary sensitive table cells
within sdcProblem objects based on specified rules.
Arguments
- object
an sdcProblem object.
- type
character string defining the primary suppression rule. Allowed values are:
"freq": apply frequency rule (usesmaxNandallowZeros)."nk": apply nk-dominance rule (usesnandk)."p": apply p-percent rule (usesp)."pq": apply pq-rule (usespandq).
- ...
parameters for the selected primary suppression rule:
maxN: scalar integerish; cells with counts <=maxNare suppressed. Defaults to3.allowZeros: scalar logical; ifTRUE, empty cells (frequency0) are considered sensitive. Defaults toFALSE. Empty cells not flagged as sensitive are marked asz(published, but not used for secondary suppression).p: scalar integerish; threshold parameter forp-percent orpq-rules. Defaults to80.q: scalar numeric; threshold parameter for pq-rule. Defaults to50.n: scalar integerish; parameternfor the nk-dominance rule. Defaults to2.k: scalar integerish; parameterkfor the nk-dominance rule. Defaults to85.numVarName: scalar character; the name of the numerical variable used for dominance-based rules. This is mandatory fornk,p, andpqrules.
Value
an sdcProblem object.
Details
Currently, the frequency rule, nk-dominance rule, p-percent rule
and pq-rule are supported.
Note: Since version 0.29, dominance-based rules ("p", "pq", or "nk")
require the identification of the numerical variable by name using the
numVarName argument. Identification by index is no longer supported.
Note
The “nk-dominance, p`-percent, and `pq`-rules require microdata as input to
makeProblem().
Cell Status Codes
When an object of class sdcProblem is created, every cell is assigned an
internal anonymization state. These codes track the lifecycle of a
cell throughout the suppression process:
"s"("Publishable"): The cell is safe and can be published (default)."z"("Forced Publishable"): The cell must not be suppressed under any circumstances."u"(*"Primary Suppressed"): The cell is identified as sensitive and requires protection (typically assigned viaprimarySuppression())."x"("Secondary Suppressed"): The cell is suppressed to protect primary sensitive cells (typically assigned viaprotectTable())."w"("Extension Cell"): The cell exists in the object but will never be published. It is treated as a suppressed cell by internal algorithms. This is useful if it is known in advance that specific codes of a hierarchy will be excluded from publication.
Initially in makeProblem(), cells are coded as "s" or "z". Their
status transitions to "u" or "x" as suppression algorithms are applied.
While setInfo() or change_cellstatus() allow for manual state
modifications, primarySuppression() automatically identifies primary
sensitive cells and assigns state "u". In protectTable(), additional
cells are identified and set to "x" to ensure all primary sensitive
cells are adequately protected.
Author
Bernhard Meindl bernhard.meindl@statistik.gv.at
Examples
# load micro data and create a problem instance
utils::data("microdata1", package = "sdcTable")
p <- sdc_testproblem(with_supps = FALSE)
# apply frequency rule
p1 <- primarySuppression(
object = p,
type = "freq",
maxN = 2
)
# apply p-percent rule (requires specifying a numeric variable)
p2 <- primarySuppression(
object = p,
type = "p",
p = 30,
numVarName = "val"
)
# compare results
data.frame(
p1_sdc = getInfo(p1, type = "sdcStatus"),
p2_sdc = getInfo(p2, type = "sdcStatus")
)
#> p1_sdc p2_sdc
#> 1 s s
#> 2 s s
#> 3 s s
#> 4 s s
#> 5 s s
#> 6 u u
#> 7 s s
#> 8 s s
#> 9 s s
#> 10 s s
#> 11 s s
#> 12 s s
#> 13 s s
#> 14 s u
#> 15 s s