Jan 25, 2021

What is multimodal AI?

Paddy Smith
It’s the future of deep learning, but what exactly is multimodal AI, and how is it used...

Multimodal AI isn’t new, but you’ll start hearing the phrase more outside core deep learning development groups. So what is multimodal AI, and why is it being called ‘the future of AI’?

Multimodal AI: the basics

Let’s start with modes. Think of a mode like a human sense. You might see and taste a carrot, for instance. With both senses you would identify the carrot faster than if you had to eat it blindfolded. You could also identify the carrot if you could see but not taste it. If it was not carrot-shaped (a puree, say) you might still guess it was carrot from the colour. But if you could eat that puree as well, you could get confirmation from the flavour. That’s multimodal AI in a nutshell: a combination of different inputs that lets the learning system infer a more accurate result than any single input would allow.

Multimodal AI: how does it work?

In standard AI, a computer is trained on a specific task: imaging, say, or language. It’s given a sample of training data, from which it learns to identify other similar images or words. Training is simpler when you are dealing with only one source of information, but the results can be skewed by a lack of context or supporting information. In multimodal AI, two or more streams of information are processed together, giving the software a better shot at deducing what it’s looking at.
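
As a rough illustration of processing two streams together, here is a minimal sketch of one simple approach, often called late fusion: each mode produces its own probabilities, and the two are multiplied and renormalised. The function name, labels and numbers are made up for the carrot example, and real multimodal systems are usually far more involved than this.

```python
def fuse(vision, taste):
    """Late fusion: multiply each mode's per-label probabilities, then renormalise.

    Assumes both modes score the same labels and that at least one
    label has a non-zero probability in both modes.
    """
    labels = vision.keys() & taste.keys()
    joint = {label: vision[label] * taste[label] for label in labels}
    total = sum(joint.values())
    return {label: p / total for label, p in joint.items()}


# Hypothetical scores: each sense alone is only fairly sure it is a carrot.
vision = {"carrot": 0.6, "sweet potato": 0.4}
taste = {"carrot": 0.7, "sweet potato": 0.3}

fused = fuse(vision, taste)
print(fused)
```

With these illustrative numbers the fused score for "carrot" comes out higher than either sense manages alone, which is the intuition behind combining modes.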

Multimodal AI: what’s the benefit?

Put simply, more accurate results, and less opportunity for machine learning algorithms to learn the wrong thing by misinterpreting their inputs. The upshot is a 1+1=3 sort of sum: the combined inputs tell the system more than each would separately, so it reaches useful answers faster.

Multimodal AI: how does it help businesses?

By recognising context, multimodal AI can give more intelligent insights for business planning. If machinery is serviced according to predictive maintenance, it helps if the AI can take input from several sensors. It might infer that an older piece of equipment does not need servicing as often if the data shows it works just as well as a newer bit of kit once the temperature stabilises. Or it might understand that a new team is not underperforming when it is engaged in heavy training, which takes up time that other teams can spend on productive work.

Multimodal AI: can it prioritise one input over another?

Yes, and that is crucial to its successful use. Should it look at the carrot or taste the carrot first? Does that change depending on whether the carrot is whole or pureed? Balancing the inputs to be aggregated is the ML skill needed to make the most of multimodal AI.
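
One way to picture that balancing act is a weighted average in which the weights depend on context. The sketch below is purely illustrative: the weights, scores and the names `weighted_fusion` and `WEIGHTS_BY_FORM` are assumptions for the carrot example, and in practice such weights would typically be learned from data rather than set by hand.

```python
def weighted_fusion(scores, weights):
    """Combine per-mode scores into one score per label using mode weights."""
    total_weight = sum(weights.values())
    labels = next(iter(scores.values())).keys()
    return {
        label: sum(weights[mode] * scores[mode][label] for mode in scores) / total_weight
        for label in labels
    }


# Hypothetical puree: sight cannot tell the vegetables apart, taste can.
scores = {
    "sight": {"carrot": 0.5, "sweet potato": 0.5},
    "taste": {"carrot": 0.9, "sweet potato": 0.1},
}

# Assumed weights: trust sight for a whole vegetable, taste for a puree.
WEIGHTS_BY_FORM = {
    "whole": {"sight": 0.8, "taste": 0.2},
    "puree": {"sight": 0.2, "taste": 0.8},
}

as_whole = weighted_fusion(scores, WEIGHTS_BY_FORM["whole"])
as_puree = weighted_fusion(scores, WEIGHTS_BY_FORM["puree"])
print(as_whole["carrot"], as_puree["carrot"])
```

Leaning on taste for the puree gives a much more confident "carrot" than leaning on sight, which is why choosing the weights well matters.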

Jun 10, 2021

Google is using AI to design faster and improved processors

Google scientists claim their new method of designing Google’s AI accelerators has the potential to save thousands of hours of human effort

Engineers at Google are now using artificial intelligence (AI) to design faster and more efficient processors, and then using those chip designs to develop the next generation of specialised computers that run the same type of AI algorithms.

Google designs its own computer chips rather than buying commercial products. This allows the company to optimise the chips to run its own software, but the process is time-consuming and expensive: a chip usually takes two to three years to develop.

Floorplanning, a stage of chip design, involves taking the finalised circuit diagram of a new chip and arranging the components into an efficient layout for manufacturing. Although the functional design of the chip is complete at this point, the layout can have a huge impact on speed and power consumption. 
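
To give a feel for why layout matters, the sketch below scores two candidate placements of the same blocks using half-perimeter wirelength, a widely used rough proxy for how much wiring a layout needs. This is an illustration only: the article does not say which metrics Google's system optimises, and the block names, coordinates and connections here are invented.

```python
def half_perimeter_wirelength(placement, nets):
    """Sum, over all nets, the half-perimeter of the box enclosing the net's blocks.

    placement maps a block name to its (x, y) position;
    nets is a list of tuples of block names that must be wired together.
    """
    total = 0
    for net in nets:
        xs = [placement[block][0] for block in net]
        ys = [placement[block][1] for block in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


# Hypothetical connections between four blocks.
nets = [("cpu", "cache"), ("cpu", "io"), ("cache", "memory_ctrl")]

# Two layouts of the same circuit: one compact, one scattered.
compact = {"cpu": (0, 0), "cache": (1, 0), "io": (0, 1), "memory_ctrl": (2, 0)}
scattered = {"cpu": (0, 0), "cache": (5, 5), "io": (9, 0), "memory_ctrl": (0, 9)}

print(half_perimeter_wirelength(compact, nets))
print(half_perimeter_wirelength(scattered, nets))
```

Both layouts implement the same circuit, but the scattered one needs far more wire by this measure, and longer wires generally cost speed and power.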

Until now, floorplanning has been a highly manual and time-consuming task, says Anna Goldie at Google. Teams would split larger chips into blocks and work on parts in parallel, fiddling around to find small refinements, she says.

Fast chip design

In a new paper, Googlers Azalia Mirhoseini and Anna Goldie, and their colleagues, describe a deep reinforcement-learning system that can create floorplans in under six hours. 

They have created a convolutional neural network system that places the macro blocks by itself within hours to produce an optimised layout; the standard cells are automatically placed in the gaps by other software. This ML system should be able to produce a high-quality floorplan far faster than humans at the controls. The neural network gradually improves its placement skills as it gains experience, according to the AI scientists.

In their paper, the Googlers said their neural network is "capable of generalising across chips — meaning that it can learn from experience to become both better and faster at placing new chips — allowing chip designers to be assisted by artificial agents with more experience than any human could ever gain."

Generating a floorplan can take less than a second using a pre-trained neural net, and with up to a few hours of fine-tuning the network, the software can match or beat a human at floorplan design, according to the paper, depending on which metric you use.

"Our method was used to design the next generation of Google’s artificial-intelligence accelerators, and has the potential to save thousands of hours of human effort for each new generation," the Googlers wrote. "Finally, we believe that more powerful AI-designed hardware will fuel advances in AI, creating a symbiotic relationship between the two fields."
