What is multimodal AI?

By Paddy Smith
It’s the future of deep learning, but what exactly is multimodal AI, and how is it used...

Multimodal AI isn’t new, but you’ll start hearing the phrase more outside core deep learning development groups. So what is multimodal AI, and why is it being called ‘the future of AI’?

Multimodal AI: the basics

Let’s start with modes. Think of a mode like a human sense. You might see and taste a carrot, for instance. You would be able to identify that you were eating a carrot faster than if you had to eat the carrot blindfolded. You could also identify the carrot if you could see but not taste it. If it was not carrot shaped (eg puree) you might still guess it was carrot from the colour. But if you could eat that puree as well, you could get confirmation from the flavour. That’s multimodal AI in a nutshell. It’s a combination of different inputs, allowing the learning intelligence to infer a more accurate result from multiple inputs.

Multimodal AI: how does it work?

In standard AI, a computer is trained in a specific task. Imaging, say, or language. It’s given a sample of training data, from which it can learn to identify other similar images or words. It’s simpler to train the AI if you are only dealing with one source of information, but the results can be skewed by lack of context or supporting information. In multimodal AI two or more streams of information can be processed, giving the software a better shot at deducing what it’s looking at.

Multimodal AI: what’s the benefit?

Put simply, more accurate results, and less opportunity for machine learning algorithms to accidentally train themselves badly by misinterpreting data inputs. The upshot is a 1+1=3 sort of sum, with greater perceptivity and accuracy allowing for speedier outcomes with a higher value.

Multimodal AI: how does it help businesses?

By recognising context, multimodal AI can give more intelligent insights into business planning. If machinery is being serviced according to predictive maintenance, it’s better if the AI can take the input from various sensors, it might infer that an older piece of equipment does not need servicing as often if the AI is flagging that it works just as well as a newer bit of kit once the temperature stabilises. Or it might understand that a new team is not underperforming when it is engaged in quite heavy training which takes time other teams might throughput as productivity.

Multimodal AI: can it prioritise one input over another?

Yes, and that is crucial to its successful use. Should it look at the carrot or taste the carrot first? Does that change depending on whether the carrot is whole or pureed? Balancing the inputs to be aggregated is the ML skill needed to make the most of multimodal AI.


Featured Articles

US to Form AI Task Force to Confront AI Threats to Safety

The United States aims to form an AI task force to explore AI safety and possible regulations, amid a global debate over developing responsible technology

Wipro to Advance Enterprise Gen AI Adoption with IBM watsonx

Wipro and IBM are expanding their partnership to launch a Gen AI platform, aiming to provide easy and secure AI adoption solutions for businesses

Dr Joy Buolamwini: Helping Tech Giants Recognise AI Biases

With a longstanding commitment to AI ethics, Dr Joy Buolamwini continues to hold AI developers accountable as they work to tackle machine learning biases

Big Tech Companies Agree to Tackle AI Election Fraud

AI Strategy

Microsoft to Offer AI Support to Europe Amid Economic Slump

AI Strategy

Microsoft: Digitally Transforming India via AI Upskilling

AI Strategy