Mukund Srinath

Analysis and Simplification of Privacy Policies at Scale

September 27, 2023 17:00 UTC

Simplifying documents that no one ever reads.

Privacy policies are the only documents that internet users can use to understand what happens with their data online. Although users report that they care about their digital privacy, they often do not read privacy policies, since understanding them requires a significant investment of time and effort. Attempts to simplify privacy policies using natural language processing have achieved some success, but they face limitations of scalability and generalization. While this puts the onus on researchers and policy regulators to protect users against unfair data practices, they often lack the tools to address this problem at scale. To this end, we present a multifaceted approach to addressing problems in the online privacy landscape. Our methodology aims to make policies more comprehensible to users through natural language processing, empower privacy researchers to conduct impactful research, and enable regulators to combat privacy violations at scale. In this talk, I will focus on (1) the creation of a large-scale longitudinal corpus of privacy policies and a transformer-based privacy policy language model; (2) the creation of PrivaSeer, a novel privacy policy search engine; and (3) information extraction from privacy policies.

Mukund is a PhD student in the Human Language Technologies lab at Pennsylvania State University. His primary research focuses on building scalable information retrieval and natural language processing systems. He is also interested in making machine learning models trustworthy and fair.