IBM develops software to reduce personal data in AI training

By William Smith

January 26, 2021

Researchers at US technology giant IBM have developed ways of improving the protection of privacy during the training of artificial intelligence models.

The AI Privacy and Compliance Toolkit allows data scientists to create machine learning models that protect the privacy of training data while following the necessary data protection regulations.

Overcoming AI security issues

The issue is that, even if training data itself is not exposed, AI trained on real data might leak sensitive information if someone is determined enough.

The IBM software, which assesses privacy risk, has applications in industries ranging from fintech to health care to insurance, or anywhere else that relies on sensitive training data. The software involves a number of approaches, including differential privacy (DP).

In a blog post, Abigail Goldsteen, Researcher in Data Security & Privacy, IBM Research, said: “Applied during the training process, DP could limit the effect of anyone’s data on the model’s output. It gives robust, mathematical privacy guarantees against potential attacks on a user, while still delivering accurate population statistics. [...] However, DP excels only when there’s just one or a few models to train. That’s because it’s necessary to apply a different method for each specific model type and architecture, making this tool tricky to use in large organizations with a lot of different models.”
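As a rough illustration of the idea Goldsteen describes, the sketch below applies the Laplace mechanism, a standard differential privacy technique, to a simple population statistic (a mean). It is not taken from the IBM toolkit and does not show DP applied during model training; the function names, the clipping bounds and the `epsilon` values are all assumptions chosen for the example.

```python
import random


def clip(value, lower, upper):
    """Bound a value so no single record can move the result too far."""
    return max(lower, min(upper, value))


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, lower, upper, epsilon, rng):
    """Return a noisy mean; smaller epsilon means more noise and more privacy."""
    clipped = [clip(v, lower, upper) for v in values]
    true_mean = sum(clipped) / len(clipped)
    # Changing one record shifts the clipped mean by at most this amount.
    sensitivity = (upper - lower) / len(clipped)
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only so the demo is repeatable
    ages = [23, 35, 41, 29, 52, 38, 47, 31]  # made-up data
    print(private_mean(ages, lower=18, upper=90, epsilon=1.0, rng=rng))
```

Because the noise is scaled to the largest change any one person could cause, the output reveals little about an individual, yet on a large dataset the noisy mean stays close to the true one.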

The specifics

To counteract that, data can be anonymised before the model is trained. The process involves generalising data, by removing specific values and instead providing a blurred range. IBM’s innovation in its software is to tailor the extent of that process to the needs of the organisation.
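To make generalisation concrete, here is a minimal sketch that replaces an exact age with a range. The record layout, the function names and the `bin_width` parameter are hypothetical and are not part of IBM's software. In this sketch the width is picked by hand, whereas the article describes IBM's approach as tailoring the extent of generalisation.

```python
def generalise_age(age, bin_width):
    """Replace an exact age with the range (bin) that contains it."""
    lower = (age // bin_width) * bin_width
    return f"{lower}-{lower + bin_width - 1}"


def generalise_records(records, bin_width):
    """Return copies of the records with the 'age' field blurred to a range."""
    return [{**r, "age": generalise_age(r["age"], bin_width)} for r in records]


if __name__ == "__main__":
    people = [{"age": 37, "city": "Leeds"}, {"age": 42, "city": "York"}]  # made-up data
    print(generalise_records(people, bin_width=10))
```

A wider bin hides more about each individual but leaves the model less detail to learn from, which is the trade-off the generalisation process has to balance.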

“This technology anonymizes machine learning models while being guided by the model itself,” said Goldsteen. “We customize the data generalizations, optimizing them for the model’s specific analysis – resulting in an anonymized model with higher accuracy. The method is agnostic to the specific learning algorithm and can be easily applied to any machine learning model.”
