Oct 8, 2020

Data poisoning: a new front in the AI cyber war

Machine Learning
data poisoning
Paddy Smith
4 min
Data poisoning exploits training data to deliberately mislead machine learning algorithms, and it’s on the rise. Here’s what you need to know...

Machine learning is big business. It’s a core element in the design of ‘automagical’ tools, which intelligently parse data to give humans a critical edge in anything from strategy planning for business to identifying the plants in their flower beds. It’s also frequently (and somewhat mistakenly) conflated with artificial intelligence (AI). It’s so effective that few large enterprises are not at least considering its implications for improving data analysis and automating parts of their operational machinery; it’s a core pillar of digital transformation projects.

An attack on machine learning’s ability to correctly identify types of data could be catastrophic now, and has the potential to be apocalyptic in a digitally transformed future. That is why data poisoning – the deliberate corruption of the data that machine learning algorithms learn from – is such a critical threat.

What is data poisoning?

Machine learning algorithms are impressive at dealing with large volumes of data, but they must be trained properly using well-labelled, accurate training data. Corrupting the training data produces a flawed model, and its errors compound as that model goes on to process new data. Data poisoning exploits this weakness by deliberately polluting the training data to mislead the machine learning algorithm and render the output either misleading or harmful.

Data poisoning isn’t exactly new. Early examples, in which spam filters were targeted by cybercriminals, were seen as long ago as 2004.
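To make the spam-filter case concrete, here is a minimal sketch in Python. It is an illustration only, not a reconstruction of any real attack or filter: the single numeric feature (a hypothetical 'suspicious word count'), the sample values and the nearest-centroid classifier are all assumptions chosen for brevity.

```python
def centroid(values):
    return sum(values) / len(values)

def train(samples):
    """Nearest-centroid classifier over one numeric feature (toy model)."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: centroid(vals) for label, vals in by_label.items()}

def classify(model, value):
    # Pick the label whose centroid is closest to the value.
    return min(model, key=lambda label: abs(model[label] - value))

# Hypothetical feature: number of suspicious words in a message.
clean = [(1, "ham"), (2, "ham"), (3, "ham"),
         (7, "spam"), (8, "spam"), (9, "spam")]

# The attacker slips spam-like samples into the training set, labelled as ham.
poison = [(9, "ham")] * 6

clean_model = train(clean)
poisoned_model = train(clean + poison)

message_score = 7  # a fairly spammy message
before = classify(clean_model, message_score)
after = classify(poisoned_model, message_score)
print(before, after)
```

With the clean data the message is classed as spam. Once the mislabelled samples drag the ‘ham’ centroid upwards, the same message is waved through as ham.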

How does data poisoning work?

Data poisoning relies on the inherent weaknesses of machine learning. While human brains are adept at recognising what is important in a pattern and rejecting what is not, software can only work with the basics and is (currently) unable to tune out interference that may be incidental, rather than indicative. To theorise an example: a machine learning program is shown 500 pictures of black dogs labelled as ‘dog’ and 500 pictures of white cats labelled as ‘cat’. Now the algorithm is shown a picture of a white dog. Output: cat.
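The dog-and-cat thought experiment can be sketched in a few lines of Python. This is a toy under stated assumptions: the ‘ears’ feature, the 10 per cent of dogs with pointed ears and the one-feature learner (a decision stump) are hypothetical additions, used only to show how a model latches onto whichever feature happens to separate the training data best.

```python
from collections import Counter, defaultdict

def train_stump(examples):
    """Pick the single feature whose values best predict the training labels."""
    best = None
    for feature in examples[0][0]:
        votes = defaultdict(Counter)
        for features, label in examples:
            votes[features[feature]][label] += 1
        # For each feature value, predict the most common label seen with it.
        rule = {value: counts.most_common(1)[0][0]
                for value, counts in votes.items()}
        correct = sum(rule[f[feature]] == label for f, label in examples)
        if best is None or correct > best[0]:
            best = (correct, feature, rule)
    return best[1], best[2]

def predict(model, features):
    feature, rule = model
    return rule.get(features[feature], "unknown")

training = []
for i in range(500):
    # Hypothetical detail: one dog in ten has pointed ears.
    ears = "floppy" if i % 10 else "pointed"
    training.append(({"colour": "black", "ears": ears}, "dog"))
for i in range(500):
    training.append(({"colour": "white", "ears": "pointed"}, "cat"))

model = train_stump(training)
white_dog = {"colour": "white", "ears": "floppy"}
prediction = predict(model, white_dog)
print(model[0], prediction)
```

Colour separates the training set perfectly, so the learner keys on colour and ignores ear shape. Shown a white dog with floppy ears, it answers ‘cat’.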

The training data in the example is woefully inadequate for a real-world scenario. Yet machine learning software has been tricked by simple visual elements such as logos and watermarks precisely because it cannot – as a human would – identify this visual information as being incidental to the pertinent image information.

Similar tricks can be played with numerical and text data sources.

Who are the bad actors in data poisoning?

Just as machine learning can create competitive advantage, it can be used by unscrupulous competitors to frustrate business operations. Think of data poisoning as a new type of corporate espionage, yet instead of finding out your competitor’s secrets, you hide their own information from them, or deliberately lead them to poor interpretations of their own data.

A bad actor could also use data poisoning to obfuscate transactional data at a bank, preventing AI-led identification of money laundering operations, for example. Or it could be used as ransomware, or a tool for activists who want to frustrate a business operation. Financial markets could also be used to profit from data-led swings orchestrated by feeding poisoned data to quantitative analysis software. A data poisoning cyberattack at government or military level might also be possible. A terrorist faction could, theoretically, use data poisoning to subvert AI-led air traffic control at a major airport.

Data poisoning can also be used in software certification, allowing cybercriminals to circumvent cybersecurity by ‘teaching’ the algorithm to treat malicious code tagged in the correct way as clear for deployment.

How does data become poisoned?

Although machine learning is capable of tripping itself up without guidance, to achieve a specific result a human bad actor needs access to the training data. In the case of an organisation using its own data, this requires infiltration. However, a major concern is that ‘pre-packed’ training data could be an easier target, and such data is already in common usage by companies who are managing project costs. It’s also the case that training data could be poisoned on a platform level, where a company opts to use third-party services to manage its AI requirements.

How to eliminate the data poisoning threat?

The most direct defence against a data poisoning attack is to use your own training data and be vigilant about who labels it and how. A more holistic defence might be to train a secondary tier of AI to spot mistakes in your primary data analysis. Technology companies such as IBM are already white-hatting data poisoning attacks to find solutions.
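As a rough illustration of what an automated second check might look like, the Python sketch below flags training samples whose label disagrees with the majority of their nearest neighbours. It is an assumption-laden toy, not a description of IBM's work or of any production tool: the one-dimensional feature, the sample values and the function name `suspicious_labels` are all hypothetical.

```python
from collections import Counter

def suspicious_labels(samples, k=3):
    """Return samples whose label differs from the majority label
    of their k nearest neighbours (measured on one numeric feature)."""
    flagged = []
    for i, (value, label) in enumerate(samples):
        others = [s for j, s in enumerate(samples) if j != i]
        nearest = sorted(others, key=lambda s: abs(s[0] - value))[:k]
        majority = Counter(lbl for _, lbl in nearest).most_common(1)[0][0]
        if majority != label:
            flagged.append((value, label))
    return flagged

# Hypothetical training set: the last sample looks like spam but is labelled ham.
training = [(1, "ham"), (2, "ham"), (3, "ham"),
            (7, "spam"), (8, "spam"), (9, "spam"),
            (8.5, "ham")]

flagged = suspicious_labels(training)
print(flagged)
```

A check like this only catches isolated mislabelled samples. An attacker who injects a large, tightly clustered batch of poisoned points can outvote the genuine neighbours, which is one reason human review remains necessary.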

Until truly effective oversight or solutions emerge, it’s worth bearing in mind that, despite all its advances, machine learning is in its infancy. Companies should retain human oversight of data analysis to check for anomalies in algorithmic learning.

One of the best-known real-world data poisoning hacks was orchestrated by data scientists at New York University, who were able to train autonomous vehicle software to recognise a stop sign as a speed limit sign. The lesson, for drivers of semi-autonomous cars and for the business intelligence community alike, is: keep your eyes on the road and your hands on the wheel.


Jun 17, 2021

Chinese Firm Taigusys Launches Emotion-Recognition System

Elise Leise
3 min
Critics claim that new AI emotion-recognition platforms like Taigusys could infringe on Chinese citizens’ rights

According to a detailed investigative report in the Guardian, Chinese tech company Taigusys can now monitor facial expressions. The company claims that it can track fake smiles, chart genuine emotions, and help police curtail security threats. ‘Ordinary people here in China aren’t happy about this technology, but they have no choice. If the police say there have to be cameras in a community, people will just have to live with it’, said Chen Wei, company founder and chairman. ‘There’s always that demand, and we’re here to fulfil it’.


Who Will Use the Data? 

The emotion-recognition market is projected to be worth US$36bn by 2023, which hints at rapid global adoption. Taigusys counts Huawei, China Mobile, China Unicom, and PetroChina among its 36 clients, but none of them has yet revealed whether it has purchased the new AI. In addition, Taigusys will likely implement the technology in Chinese prisons, schools, and nursing homes.


It’s not likely that emotion-recognition AI will stay within the realm of private enterprise. President Xi Jinping has promoted ‘positive energy’ among citizens and intimated that negative expressions are no good for a healthy society. If the Chinese central government continues to gain control over private companies’ tech data, national officials could use emotional data for ideological purposes—and target ‘unhappy’ or ‘suspicious’ citizens. 


How Does It Work? 

Taigusys’s AI will track facial muscle movements, body motions, and other biometric data to infer how a person is feeling, collecting massive amounts of personal data for machine learning purposes. If an individual displays too much negative emotion, the platform can recommend him or her for what’s termed ‘emotional support’—and what may end up being much worse. 


Can We Really Detect Human Emotions? 

This is still up for debate, but many critics say no. Psychologists disagree over whether human emotions can be separated into basic categories such as fear, joy, and surprise that hold across cultures, or whether something more complex is at stake. Many claim that AI emotion-reading technology is not only unethical but inaccurate, since facial expressions don’t necessarily indicate someone’s true emotional state.


In addition, Taigusys’s facial tracking system could promote racial bias. One of the company’s systems classes faces as ‘yellow, white, or black’; another distinguishes between Uyghur and Han Chinese; and sometimes, the technology picks up certain ethnic features better than others. 


Is China the Only One? 

Not a chance. Other countries have also tried to decode and use emotions. In 2007, the U.S. Transportation Security Administration (TSA) launched a heavily contested training programme (SPOT) that taught airport personnel to monitor passengers for signs of stress, deception, and fear. But China as a nation rarely discusses bias, and as a result, its AI-based discrimination could be more dangerous. 


‘That Chinese conceptions of race are going to be built into technology and exported to other parts of the world is troubling, particularly since there isn’t the kind of critical discourse [about racism and ethnicity in China] that we’re having in the United States’, said Shazeda Ahmed, an AI researcher at New York University (NYU).


Taigusys’s founder points out, on the other hand, that its system can help prevent tragic violence, citing a 2020 stabbing of 41 people in Guangxi Province. Yet top academics remain unconvinced. As Sandra Wachter, associate professor and senior research fellow at the University of Oxford’s Internet Institute, said: ‘[If this continues], we will see a clash with fundamental human rights, such as free expression and the right to privacy’. 

