Data Suppression Needs More Guidelines, UM Professor Says

Recent research shows weakness in some techniques to make data private

By Clara Turnage

New research from the University of Mississippi calls into question the methods that online shopping services use to keep users’ data private.

Online companies, including social media platforms, online shopping services and online portals for medical facilities, collect data from users, which they use to create personalized advertising or refine marketing strategies. Often, they sell this information to other parties after making the data anonymous.

Charles Walter, assistant professor of computer and information science, and recent Ole Miss graduate Thomas Cilloni found that efforts to make this data anonymous can sometimes be reversed. In a study published in IEEE Xplore, the digital library of the Institute of Electrical and Electronics Engineers, they used machine learning to reveal personal data from an anonymized data set with up to 80% accuracy.

“Your data identifies you,” Walter said. “It’s incredibly important to recognize this as a problem in order to fix it. That’s why we conduct studies like this—to identify problems we need to fix.”

Data privacy describes how well online entities handle, use and store personal data. Several states, including Colorado, California and Virginia, and the European Union have passed data privacy laws, but those laws do not always provide enough regulation to keep users safe, Walter said.

The Colorado, California and Virginia laws put the burden on users, who must opt out of data collection. The EU’s General Data Protection Regulation regulates data capture, storage, usage and sharing not just for EU-based companies, but for any online company that serves users in EU countries.

“These laws say data has to be anonymized, but they don’t describe how,” Walter said. “A lot of companies would just pull the names off of (data), but that’s not nearly enough.

“So, the question here is how do you anonymize data effectively? How do you define anonymization?”

In the study, the researchers were able to determine with high accuracy a person’s level of education, income level and whether they had children. Methods to keep data private should ensure that no individual user can be identified from the available data, said Cilloni, who graduated with a doctoral degree in computer science in 2023.

“You may think that a Social Security number is personally identifiable—and it is—but there are personally identifiable pieces of information that you may not expect,” he said. “Having just a birth date and birth location can very likely identify who you are.

“What we really want is for nobody to be uniquely identified by their data. There must always be a certain level of uncertainty—that’s how we ensure privacy.”
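
Cilloni’s point about uniqueness can be illustrated with a short sketch. The example below is not the researchers’ method; it uses a common privacy measure known as k-anonymity, and the records, field names and helper functions are all invented for illustration. It groups records by birth date and birth place and reports which ones stand alone, even though no names are present.

```python
from collections import Counter

# Hypothetical records with names already removed. The field names and
# values are invented for illustration; they are not from the study.
records = [
    {"birth_date": "1990-04-12", "birth_place": "Oxford, MS", "income": "high"},
    {"birth_date": "1990-04-12", "birth_place": "Oxford, MS", "income": "low"},
    {"birth_date": "1985-11-03", "birth_place": "Tupelo, MS", "income": "medium"},
    {"birth_date": "1990-04-12", "birth_place": "Jackson, MS", "income": "low"},
]

# Assumed quasi-identifiers: fields that are not names but can still
# single a person out when combined.
QUASI_IDENTIFIERS = ("birth_date", "birth_place")


def group_sizes(rows, keys):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def unique_rows(rows, keys):
    """Return the rows whose quasi-identifier combination appears exactly once."""
    sizes = group_sizes(rows, keys)
    return [row for row in rows if sizes[tuple(row[k] for k in keys)] == 1]


def smallest_group(rows, keys):
    """Return k, the size of the smallest group sharing the same values."""
    return min(group_sizes(rows, keys).values())


if __name__ == "__main__":
    print("k =", smallest_group(records, QUASI_IDENTIFIERS))
    for row in unique_rows(records, QUASI_IDENTIFIERS):
        print("uniquely identifiable:", row["birth_date"], row["birth_place"])
```

In this made-up data set, two of the four records have a birth date and birth place that no one else shares, so k is 1 and those two people have no “uncertainty” to hide behind. A data set would need every group to contain at least two records before k rises above 1.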

Users should also understand what keeps their data private, which can foster trust with online businesses, Cilloni said.

“Understanding how our personal information can be processed lets us make more conscious choices about what we want to share and how,” he said. “In turn, these conscious choices can increase the level of trust we have in the tools we use online.”

While nations and companies work to better define anonymization, it is important for users to take precautions with their data, too, the researchers said.

Choose which cookies you allow to track your data, read the terms of service and be careful with what information you share on social media, Walter said. Every time you put data on the internet, make sure it is information you would be comfortable sharing with a stranger.

“The key takeaway is a warning,” Cilloni said. “Your personal data that you share online—once it’s out, it’s out. Be careful about what you share with the internet.

“Even years after you share your personal information, that information could be around somewhere. Be mindful of what you share with the internet.”