‘Cheap tricks’ and data mismanagement stymie AI progress
Researchers say “cheap tricks” and data mismanagement are hobbling the ability of AI to live up to its full potential and are promoting bias.
The comments are from a survey of dataset development and its use in ML research called Data and Its Discontents, authored by University of Washington linguists Amandalynne Paullada and Emily M Bender, Mozilla Foundation’s Inioluwa Deborah Raji, and Emily Denton and Alex Hanna from Google Research.
In the paper, they say, “Datasets have played a foundational role in the advancement of machine learning research… However, recent work from a breadth of perspectives has revealed the limitations of predominant practices in dataset collection and use.”
The team points to visual and natural language processing as areas where bias has been identified in ML datasets. Underrepresentation of dark skin tones and female pronouns has been found in western data catalogues, while image classification datasets have been found to include pornographic images, leading to their removal.
The report says: “While deep learning models have seemed to achieve remarkable performance on challenging tasks in artificial intelligence, recent work has illustrated how these performance gains may be due largely to ‘cheap tricks’ rather than human-like reasoning capabilities.”
'Much to learn'
It goes on to say that “the machine learning community still has much to learn from other disciplines with respect to how they handle the data of human subjects”.
In a conclusion bearing an epigraph from Toni Cade Bambara (“Not all speed is movement”), the researchers said, “We argue that fixes that focus narrowly on improving datasets by making them more representative or more challenging might miss the more general point raised by these critiques, and we’ll be trapped in a game of dataset whack-a-mole rather than making progress, so long as notions of ‘progress’ are largely defined by performance on datasets.
“At the same time, we wish to recognize and honor the liberatory potential of datasets, when carefully designed, to make visible patterns of injustice in the world such that they may be addressed (see, for example, the work of Data for Black Lives). Recent work by Register and Ko illustrates how educational interventions that guide students through the process of collecting their own personal data and running it through machine learning pipelines can equip them with skills and technical literacy toward self-advocacy – a promising lesson for the next generation of machine learning practitioners and for those impacted by machine learning systems.”
The team advocates a more careful approach to data collection in future, even at the expense of large catalogues, in order to protect individual liberties and to account for the effect of technology on people and solutions.
HPE Acquires Determined AI to Accelerate ML Training
Determined AI is a four-year-old company, which only brought its product to market in 2020. It specialises in machine learning (ML), with the aim of training AI models quickly and at any scale. HPE will combine Determined AI’s unique software solution with its AI and high-performance computing (HPC) offerings to enable ML engineers to easily implement and train ML models to provide faster and more accurate insights from their data in almost every industry.
“As we enter the Age of Insight, our customers recognise the need to add machine learning to deliver better and faster answers from their data,” said Justin Hotard, senior vice president and general manager, HPC and Mission Critical Solutions (MCS), HPE. “AI-powered technologies will play an increasingly critical role in turning data into readily available, actionable information to fuel this new era. Determined AI’s unique open source platform allows ML engineers to build models faster and deliver business value sooner without having to worry about the underlying infrastructure. I am pleased to welcome the world-class Determined AI team, who share our vision to make AI more accessible for our customers and users, into the HPE family.”
Delivering AI at scale
According to IDC, the accelerated AI server market, which plays an important role in providing targeted capabilities for image and data-intensive training, is expected to grow by 28% each year and reach $18bn by 2024.
The computing power of HPC is also increasingly being used to train and optimise AI models, in addition to combining with AI to augment workloads such as modelling and simulation. Intersect360 Research notes that the HPC market will grow by more than 40%, reaching almost $55bn in revenue by 2024.
“Over the last several years, building AI applications has become extremely compute, data, and communication intensive. By combining with HPE’s industry-leading HPC and AI solutions, we can accelerate our mission to build cutting edge AI applications and significantly expand our customer reach,” said Evan Sparks, CEO of Determined AI.