Optimal Parameter Choice for Bloom Filter-based Privacy-preserving Record Linkage
Record Linkage is, for most scientific disciplines, an increasingly popular
set of methods to gather or enrich research data for analysis.
Since in most countries, perfectly unique personal identifier numbers
(PIDs) are not available, data linkage is restricted to attributes
like names and birth dates to discriminate between records and
their corresponding real-life entities. However, these identifiers are
often legally required to be encrypted. This gave way to the field
of Privacy-preserving Record Linkage (PPRL). Recently, Bloom filters
have gained much attention in PPRL research. Their widespread use
is hindered by the fact that choosing the right parameters for private
linkage operations currently requires in-depth expert knowledge
about the data, since the quality of Bloom filter-based PPRL
depends heavily on the choice of encryption parameters.
Since there is currently no literature
about the optimal choice of these parameters, this thesis aims
to develop an automated method for choosing the parameters that
yield the best linkage quality, using model estimates based on
simulations of the entire parameter space. After an in-depth
overview of the state of the art in PPRL, the approach is
described in detail. The resulting models are then tested using
simulated and real-world data sets. A naive approach based on
current recommendations is tested against the encryption
parameters resulting from the model estimates. The results are
compared in depth for each data set. It can be shown
that the optimal parameter choices consistently outperform current
best-practice parameter settings, sometimes drastically. The
thesis concludes with an outlook on open research questions and
closes with updated recommendations for Bloom filter (BF)-based
Privacy-preserving Record Linkage.
Rights
Use and reproduction:
This work may be used under a Creative Commons Attribution - NonCommercial - ShareAlike 4.0 License (CC BY-NC-SA 4.0).