New machine learning models make AI artists even better

AI-powered image generators including DALL-E and Midjourney are redefining art, and new models could take these AI artists into the classroom.

Video game designer Jason Allen made headlines this year with Théâtre D’opéra Spatial, his submission to the Colorado State Fair’s digital arts competition. Judges awarded him first place and a $300 prize, but the artwork also received a sudden flurry of global attention when it was discovered that Allen had used the AI-powered image generator Midjourney to create it.

Midjourney, DALL-E and DALL-E 2 have brought a wealth of weird and wonderful images to the world as users type in natural language descriptions and share the dream-like results.

DALL-E 2 uses a “diffusion model”, which attempts to take the input text in its entirety and generate an image from it. But the output becomes less accurate as the text becomes more complex: the existing model appears to struggle with the composition of concepts, and confuses the attributes of different objects and the relations between them.

Scientists from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) say they looked at the problem from a different angle by adding models together so they could cooperate, which was seen to produce more creative combinations in the final images.

“DALL-E 2 is good at generating natural images but has difficulty understanding object relations sometimes,” says MIT CSAIL PhD student and co-lead author Shuang Li. “Beyond art and creativity, perhaps we could use our model for teaching. If you want to tell a child to put a cube on top of a sphere, and if we say this in language, it might be hard for them to understand. But our model can generate the image and show them.”

Machine learning models help to learn about language

The team’s model, Composable Diffusion, uses diffusion models together with compositional operators to combine text descriptions without further training, which lets it capture the details of the text more accurately. In one example, a prompt calling for “a pink sky”, “a blue mountain on the horizon” and “cherry blossoms in front of the mountain” produced an accurate image, while the original model returned a blue sky and gave everything in front of the mountains a pink colour.
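The article does not spell out how the descriptions are combined, so the following is only a minimal sketch of one way an "and" operator over diffusion models is often described: each description gets its own noise prediction, and the differences from an unconditional prediction are added together with per-description weights. The names `compose_and` and `toy_predictor`, the weights, and the toy offsets are all hypothetical illustrations, not the researchers' code or a trained model.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
# Hypothetical interface: a noise predictor takes the current noisy image
# (flattened to a list here) and an optional text prompt (None = unconditional).
NoisePredictor = Callable[[Vector, Optional[str]], Vector]


def compose_and(
    predict_noise: NoisePredictor,
    x: Vector,
    prompts: Sequence[str],
    weights: Sequence[float],
) -> Vector:
    """Combine per-prompt noise predictions for a single denoising step.

    Assumed rule (a sketch, not confirmed by the article):
        combined = uncond + sum_i w_i * (cond_i - uncond)
    """
    base = predict_noise(x, None)  # unconditional prediction
    combined = list(base)
    for prompt, weight in zip(prompts, weights):
        cond = predict_noise(x, prompt)  # prediction for one description
        for i in range(len(combined)):
            combined[i] += weight * (cond[i] - base[i])
    return combined


def toy_predictor(x: Vector, prompt: Optional[str]) -> Vector:
    """Toy stand-in for a trained model: each prompt adds a fixed offset."""
    offsets = {
        None: 0.0,
        "a pink sky": 1.0,
        "a blue mountain on the horizon": -0.5,
    }
    return [value + offsets[prompt] for value in x]


step = compose_and(
    toy_predictor,
    [0.0, 2.0],
    ["a pink sky", "a blue mountain on the horizon"],
    [1.0, 1.0],
)
print(step)
```

Because every description contributes its own term, a new description can be added at generation time without retraining anything, which is the property the researchers highlight. In a real system the toy predictor would be replaced by a trained diffusion model and the step repeated over the full denoising schedule.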

Images generated with the new method developed by MIT researchers for the request “a train on a bridge and a river under the bridge”

“The fact that our model is composable means that you can learn different portions of the model, one at a time,” says co-lead author and MIT CSAIL PhD student Yilun Du. “You can first learn an object on top of another, then learn an object to the right of another, and then learn something left of another. Since we can compose these together, you can imagine that our system enables us to incrementally learn language, relations, or knowledge, which we think is a pretty interesting direction for future work.”

The research, supported by Raytheon BBN Technologies Corp., Mitsubishi Electric Research Laboratory, and DEVCOM Army Research Laboratory, has received the approval of DALL-E 2’s co-creator Mark Chen.

“This is a nice idea that leverages the energy-based interpretation of diffusion models so that old ideas around compositionality using energy-based models can be applied,” says Chen, who is a research scientist at OpenAI, the company behind DALL-E.
