IBM develops software to reduce personal data in AI training
Researchers at US technology giant IBM have developed ways of improving the protection of privacy during the training of artificial intelligence models.
The AI Privacy and Compliance Toolkit allows data scientists to create machine learning models that protect the privacy of training data while following the necessary data protection regulations.
Overcoming AI security issues
The issue is that, even if training data itself is not exposed, AI trained on real data might leak sensitive information if someone is determined enough.
The IBM software, which assesses privacy risk, has applications in industries ranging from fintech to health care to insurance - anywhere that relies on sensitivity training data. The software involves a number of approaches, including differential privacy (DP).
In , Abigail Goldsteen, Researcher in Data Security & Privacy, IBM Research, said: “Applied during the training process, DP could limit the effect of anyone’s data on the model’s output. It gives robust, mathematical privacy guarantees against potential attacks on a user, while still delivering accurate population statistics. [...] However, DP excels only when there’s just one or a few models to train. That’s because it’s necessary to apply a different method for each specific model type and architecture, making this tool tricky to use in large organizations with a lot of different models.”
To counteract that, data can be anonymised before the model is trained. The process involves generalising data, by removing specific values and instead providing a blurred range. IBM’s innovation in its software is to tailor the extent of that process to the needs of the organisation.
“This technology anonymizes machine learning models while being guided by the model itself,” said Goldsteen. “We customize the data generalizations, optimizing them for the model’s specific analysis – resulting in an anonymized model with higher accuracy. The method is agnostic to the specific learning algorithm and can be easily applied to any machine learning model.