Research Press Release

Technology: Interaction data may allow identification of anonymized individuals over time

Nature Communications

January 26, 2022

Records of people’s interactions could be used to identify individuals in anonymized datasets across long periods of time, suggests a study published in Nature Communications. The findings indicate that current practices for handling this type of data may not meet the anonymization standards set by the European Union’s General Data Protection Regulation.

Fine-grained interaction data is collected by messaging apps, mobile phone carriers, social media providers and other apps, either to operate their services or for research purposes. It has been used to study the interaction patterns of individuals, to forecast the spatial spread of epidemics and to examine the effects of friendships on political mobilisation. Under current data protection regulations, this data can be shared and sold without users’ consent, provided it is anonymized.

Yves-Alexandre de Montjoye, Ana-Maria Cretu and colleagues found that people’s interaction patterns remain stable over long periods of time and that this stability could be used to identify individuals in anonymized datasets. The authors developed a deep learning-based model, trained it to identify individuals from their interaction network, and applied it to a dataset of more than 40,000 individuals collected over different periods of time. The model identified 52% of individuals based on their 2-hop interaction network (their direct contacts together with their contacts’ contacts, and the interactions among them). Using only an individual’s direct contacts, the model identified people 15% of the time. Because interactions remain stable over time, the authors could still identify 24% of people after 20 weeks using their 2-hop interaction network. When the model was applied to a Bluetooth close-proximity dataset of 587 people, it identified individuals more than 26% of the time. However, the authors note that they do not believe their model would be applicable to contact tracing protocols such as Google and Apple’s Exposure Notification.
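To make the idea of a "2-hop interaction network" concrete, the sketch below extracts the k-hop network around one person from a list of pairwise interactions. It is only an illustration, assuming a hypothetical input of (person, person) pairs and taking the k-hop network to mean everyone within k hops of the target plus all interactions among them; the study's exact definition may differ, and the function name `k_hop_network` is hypothetical. It does not reproduce the authors' deep learning model, only the kind of input such a model works from.

```python
from collections import deque


def k_hop_network(edges, target, k):
    """Return the interactions in the k-hop network around `target`.

    Assumption: `edges` is a list of (person_a, person_b) interaction pairs
    (a hypothetical format), and the k-hop network is the set of
    interactions among everyone within k hops of the target.
    """
    # Build an undirected adjacency list from the interaction pairs.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Breadth-first search records each person's hop distance from the target.
    distance = {target: 0}
    queue = deque([target])
    while queue:
        person = queue.popleft()
        if distance[person] == k:
            continue  # stop expanding once k hops have been reached
        for contact in adjacency.get(person, ()):
            if contact not in distance:
                distance[contact] = distance[person] + 1
                queue.append(contact)

    # Keep each interaction whose two endpoints are both within k hops,
    # storing it as a sorted pair so (a, b) and (b, a) count once.
    return {
        tuple(sorted((a, b)))
        for a, b in edges
        if a in distance and b in distance
    }


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("alice", "erin")]
    # 1-hop: alice's direct contacts only.
    print(sorted(k_hop_network(edges, "alice", 1)))
    # 2-hop: also reaches carol through bob, but not dave.
    print(sorted(k_hop_network(edges, "alice", 2)))
```

In this toy example the 1-hop network around "alice" contains only her interactions with "bob" and "erin", while the 2-hop network also includes the "bob"–"carol" interaction. This illustrates why a 2-hop network carries more identifying structure than a list of direct contacts alone.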

The authors argue that their results demonstrate that anonymized and disconnected interaction data may remain identifiable over long periods of time, which has implications for compliance with privacy legislation. They suggest that security measures such as access controls and privacy-enhancing systems could help protect against this risk.

doi:10.1038/s41467-021-27714-6

