Quantitative Analysis of Geomasking Methods

Spatial information at the most detailed level, i.e. geographic coordinates, has been recognized as important in various fields such as epidemiology, medicine, and social science. Geographic coordinates are used to identify relationships between behaviors and environmental factors, known as social-spatial linkage. However, releasing geographic coordinates makes it relatively easy to identify the respondents' addresses. Therefore, geographic masking methods (geomasking methods for short) have been proposed to protect the privacy of the respondents' true locations.

Geographic masking methods can be divided into three categories: aggregation, adjustment of coordinates, and replacement of coordinates. Since the second category contains most geomasking methods, it is further divided into three subcategories. The first subcategory contains methods that scale, rotate, displace, or flip coordinates. The second contains methods that move points in a random direction by a random distance. The third contains methods that also move locations in a random direction by a random distance but require additional information to be applied.
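
To make the second subcategory concrete, the following is a minimal sketch of random displacement, assuming planar (projected) coordinates. The function name `random_displacement` and the parameters `r_min` and `r_max` are hypothetical; with `r_min > 0` the result resembles donut masking, but this is an illustration, not the exact procedure evaluated in the thesis.

```python
import math
import random


def random_displacement(points, r_min, r_max, seed=None):
    """Move each (x, y) point in a random direction by a random distance.

    Hypothetical sketch: the direction is uniform on [0, 2*pi) and the
    distance is uniform on [r_min, r_max]. Assumes projected coordinates
    (e.g. metres), so Euclidean geometry applies.
    """
    rng = random.Random(seed)
    masked = []
    for x, y in points:
        angle = rng.uniform(0.0, 2.0 * math.pi)  # random direction
        dist = rng.uniform(r_min, r_max)         # random distance
        masked.append((x + dist * math.cos(angle), y + dist * math.sin(angle)))
    return masked


# Hypothetical true locations and their masked counterparts.
true_points = [(0.0, 0.0), (100.0, 50.0), (250.0, 300.0)]
masked_points = random_displacement(true_points, r_min=10.0, r_max=50.0, seed=1)
```

Drawing the distance uniformly places masked points more densely near the inner radius than an area-uniform annulus would; implementations differ on this choice, and the thesis does not specify which sampling scheme each method uses.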

Decreasing the risk of identifying an address is often accompanied by a decrease in the utility of the spatial information. This risk-utility trade-off can be visualized in a risk-utility map. Various risk and utility measures are used to perform a quantitative analysis of geomasking methods. The utility measures assess how well a method preserves descriptive statistics, distances, spatial autocorrelation, and clusters.
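
As an illustration of the distance-preservation criterion, the sketch below compares all pairwise distances before and after masking. The mean relative distance error used here is an assumed, simple measure chosen for illustration; it is not necessarily the exact utility measure used in the thesis.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distances for all unordered pairs, in a fixed order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def mean_relative_distance_error(true_points, masked_points):
    """Average |d_masked - d_true| / d_true over all pairs with d_true > 0.

    Hypothetical utility measure: 0 means pairwise distances are perfectly
    preserved; larger values mean lower utility for distance-based analyses.
    """
    errors = [
        abs(d_masked - d_true) / d_true
        for d_true, d_masked in zip(pairwise_distances(true_points),
                                    pairwise_distances(masked_points))
        if d_true > 0
    ]
    return sum(errors) / len(errors)


pts = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
shifted = [(x + 1.0, y + 1.0) for x, y in pts]  # translation keeps distances
scaled = [(2.0 * x, 2.0 * y) for x, y in pts]   # scaling doubles distances
```

A translation leaves every pairwise distance unchanged (error 0), while scaling by a factor of 2 doubles each distance (error 1), which is why scaling methods can hurt distance-based utility even though they preserve relative positions.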

Risk, which is commonly assessed using k-anonymity, is addressed in more detail in this thesis. As is common in the field of record linkage, it was assumed that an intruder holds a data set containing the true locations of some respondents. Several methods are used to identify the correct matches between the data set of masked locations and the data set of true locations: the minimum distance approach, the Hungarian algorithm, the graph theoretic linkage attack, and the graph matching attack on privacy-preserving record linkage. In addition, the mean of several masked versions of the same coordinates is used, and, for some methods, attempts are made to reverse the displacement that the masking method applied to the coordinates.
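
The following minimal sketch shows three of the risk ideas above under simplifying assumptions: a spatial k-anonymity count, the minimum distance approach (link each masked point to its nearest known true location), and the averaging of several masked versions of the same point. All function names are hypothetical, both data sets are assumed to list respondents in the same order so that correct links can be counted, and the k-anonymity definition shown (true locations at least as close to the masked point as the respondent's own) is one common variant; definitions differ across studies.

```python
import math


def nearest_index(point, candidates):
    """Index of the candidate closest to point (Euclidean distance)."""
    return min(range(len(candidates)), key=lambda j: math.dist(point, candidates[j]))


def minimum_distance_attack(masked_points, known_true_points):
    """Link each masked point to the nearest location in the intruder's data set."""
    return [nearest_index(m, known_true_points) for m in masked_points]


def reidentification_rate(links):
    """Share of masked points i linked to true point i (assumes a shared order)."""
    return sum(1 for i, j in enumerate(links) if i == j) / len(links)


def average_masked_versions(versions):
    """Averaging attack: mean of several independently masked copies of each point."""
    n = len(versions)
    return [(sum(v[i][0] for v in versions) / n, sum(v[i][1] for v in versions) / n)
            for i in range(len(versions[0]))]


def spatial_k_anonymity(true_point, masked_point, all_true_points):
    """Count true locations at least as close to the masked point as the true one.

    The respondent's own location is included, so k >= 1.
    """
    r = math.dist(true_point, masked_point)
    return sum(1 for p in all_true_points if math.dist(p, masked_point) <= r)


# Hypothetical example data (projected coordinates, e.g. metres).
intruder_known = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
released = [(10.0, 5.0), (90.0, -5.0), (5.0, 80.0)]
links = minimum_distance_attack(released, intruder_known)
rate = reidentification_rate(links)
```

The minimum distance approach may link several masked points to the same true location. The Hungarian algorithm instead enforces a one-to-one assignment that minimizes the total distance; in Python this could be done, for example, with `scipy.optimize.linear_sum_assignment` on the distance matrix. The averaging attack assumes the intruder obtains several independent maskings of the same points: if the displacements cancel out on average, the mean drifts back toward the true location.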

The risk-utility maps show that most masking methods succeed in preserving utility but also carry a high risk of re-identification. The maps also reveal that k-anonymity alone is not an appropriate risk measure, as coordinates masked with geomasking methods based on the k-anonymity approach were still re-identified. The only geomasking methods that succeed in hiding the true locations while preserving utility well belong to the coordinate-replacement category, in which only a distance matrix is published instead of the coordinates.
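
One way to see why publishing only a distance matrix can preserve distance-based utility while hiding absolute positions: Euclidean distances are unchanged by rotation, translation, and reflection, so the matrix alone fixes the point configuration only up to such a rigid motion. The sketch below checks this numerically; the function names and the example transform are hypothetical and assume planar coordinates.

```python
import math


def distance_matrix(points):
    """Full symmetric matrix of Euclidean distances between all points."""
    return [[math.dist(p, q) for q in points] for p in points]


def rigid_transform(points, angle, dx, dy):
    """Rotate points by angle (radians) about the origin, then translate by (dx, dy)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + dx, s * x + c * y + dy) for x, y in points]


true_points = [(0.0, 0.0), (3.0, 4.0), (6.0, 0.0)]
moved = rigid_transform(true_points, angle=1.0, dx=500.0, dy=-200.0)

d_true = distance_matrix(true_points)
d_moved = distance_matrix(moved)
# Largest entry-wise difference; zero up to floating-point rounding.
max_diff = max(abs(a - b) for row_a, row_b in zip(d_true, d_moved)
               for a, b in zip(row_a, row_b))
```

The two configurations lie in entirely different places yet produce the same distance matrix, so the matrix by itself does not reveal where the respondents live.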




Use and reproduction:
All rights reserved