SUDA risk measure for data from (stratified) simple random sampling.
suda2(obj, ...)
obj
: a data.frame or an sdcMicroObj object.
...
: see arguments below.
variables
: Categorical (key) variables to be used for risk measurement, given either as column names or as column indices.
missing
: Missing value coding in the given data set.
DisFraction
: The sampling fraction for simple random sampling, or the common sampling fraction for stratified sampling. Defaults to 0.01.
original_scores
: If TRUE (the default), the SUDA scores are computed as described in the paper "SUDA: A Program for Detecting Special Uniques" by Elliot et al. If FALSE, the scores are computed in the slightly different way used by the original IHSN implementation of the algorithm.
A modified sdcMicroObj object, or a list with the following elements:
ContributionPercent
: The contribution of each key variable to the SUDA
score, calculated for each row.
score
: The SUDA score.
disscore
: The DIS-SUDA score.
attribute_contributions
: a data.frame showing how much of the total risk is contributed by each variable. This information is stored in the following two columns:
variable
: the name of the variable
contribution
: how much risk the variable contributes to the total risk.
attribute_level_contributions
: the risk of each attribute level, returned as a data.frame with the following three columns:
variable
: the variable name
attribute
: the relevant level codes
contribution
: the risk of this level within the variable.
SUDA2 is a recursive algorithm for finding Minimal Sample Uniques (MSUs). An MSU is a combination of variable values that is unique in the sample and none of whose proper subsets is unique. The algorithm generates all possible subsets of the specified categorical key variables and scans them for value patterns that are unique in the sample. The fewer variables needed to make an observation unique, the higher the risk of that observation.
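To make the MSU idea concrete, here is a minimal brute-force sketch in base R. It is not the SUDA2 algorithm and not sdcMicro's implementation: the functions `is_unique_on()`, `find_msus()` and `min_msu_size()` and the toy data are hypothetical. It checks all 2^m - 1 subsets of m key variables in order of increasing size, so it is only practical for a few keys. It ignores the `missing` argument and does not compute SUDA or DIS-SUDA scores.

```r
# Hypothetical brute-force MSU finder (illustration only, NOT suda2()).
# Enumerates every subset of key variables, smallest first.

# TRUE for rows whose value combination on 'vars' occurs exactly once.
is_unique_on <- function(dat, vars) {
  key <- do.call(paste, c(dat[vars], sep = "\r"))
  !(duplicated(key) | duplicated(key, fromLast = TRUE))
}

# For every row, return a list of its MSUs (character vectors of names).
find_msus <- function(dat, keys) {
  msus <- rep(list(list()), nrow(dat))
  for (k in seq_along(keys)) {
    for (vars in combn(keys, k, simplify = FALSE)) {
      for (i in which(is_unique_on(dat, vars))) {
        # Minimal only if no smaller MSU of row i is contained in 'vars';
        # any unique proper subset would contain such an MSU.
        covered <- any(vapply(msus[[i]], function(m) all(m %in% vars),
                              logical(1)))
        if (!covered) msus[[i]][[length(msus[[i]]) + 1]] <- vars
      }
    }
  }
  msus
}

# Size of the smallest MSU per row (NA if the row has no MSU).
min_msu_size <- function(msus) {
  vapply(msus, function(m) if (length(m) == 0) NA_integer_ else min(lengths(m)),
         integer(1))
}

dat <- data.frame(
  sex    = c("m", "m", "f", "f", "f"),
  region = c("A", "A", "A", "B", "B"),
  age    = c("young", "old", "young", "young", "young")
)
msus <- find_msus(dat, c("sex", "region", "age"))
min_msu_size(msus)
```

In this toy data, row 2 is unique on `age` alone (smallest MSU of size 1), so it would be ranked as the riskiest observation. Rows 1 and 3 need two variables, and the identical rows 4 and 5 have no MSU at all.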
In versions later than 5.0.2, the computation of SUDA scores has changed: by default, the scores are now computed as described in the original paper by Elliot et al.
C. J. Skinner and M. J. Elliot (20xx) A Measure of Disclosure Risk for Microdata. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4):855–867.
M. J. Elliot, A. Manning, K. Mayes, J. Gurd and M. Bane (20xx) SUDA: A Program for Detecting Special Uniques, Using DIS to Modify the Classification of Special Uniques.
A. M. Manning, D. J. Haglin and J. A. Keane (2008) A recursive search algorithm for statistical disclosure assessment. Data Mining and Knowledge Discovery, 16:165–196.
M. Templ (2017) Statistical Disclosure Control for Microdata: Methods and Applications in R. Springer International Publishing, 287 pages. ISBN 978-3-319-50272-4. doi:10.1007/978-3-319-50272-4