How Google’s AI Tools Are Transforming Language Learning

Google launches AI tools for learning languages through personalised lessons
Google’s experimental Little Language Lessons platform uses Gemini AI models for personalised, contextual and multimodal language learning experiences

Google has launched an experimental language learning platform built on its Gemini AI models, demonstrating potential applications for generative AI in educational contexts.

Little Language Lessons, a collection of three prototype applications, was developed by a small team of Google engineers to explore how the company's LLMs could create more contextual and personalised language learning experiences.

The platform marks a shift away from traditional language learning methods and toward AI-assisted, situation-specific instruction that adapts to individual users' needs and environments.

“Learning a new programming language typically begins by building something tangible, instantly putting theory into practice,” says Aaron Wade, a Creative Technologist at Google who worked on the project.


“Learning a new spoken language, on the other hand, often happens in a vacuum – through textbooks or exercises that feel strangely disconnected from the situations where language actually matters.”

The initiative serves as a practical demonstration of Google's Gemini API, a programming interface that allows developers to integrate the firm's AI models into third-party applications, creating potential new revenue streams for Google while expanding its ecosystem of AI-powered tools.

Experiment 1: Tiny Lesson

The first experiment in the Little Language Lessons suite, called Tiny Lesson, provides vocabulary and phrases tailored to specific situations that language learners might encounter, such as asking for directions or dealing with a lost passport.

This functionality relies on Gemini's ability to generate structured data outputs in JavaScript Object Notation (JSON) format – a standardised way of organising information that computers can easily process – containing relevant vocabulary, phrases and grammar tips for the specified scenario.
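As a rough illustration of what such structured output could look like, the Python sketch below parses a JSON lesson into typed objects. The field names (`vocabulary`, `phrases`, `tips`, `term`, `translation`) and the sample text are assumptions made for this example; the article does not describe the schema Google actually uses.

```python
import json
from dataclasses import dataclass


@dataclass
class Entry:
    term: str          # word or phrase in the target language
    translation: str   # meaning in the learner's own language


@dataclass
class Lesson:
    vocabulary: list[Entry]
    phrases: list[Entry]
    tips: list[str]


def parse_lesson(raw: str) -> Lesson:
    """Parse a JSON string (hypothetical schema) into a Lesson."""
    data = json.loads(raw)
    return Lesson(
        vocabulary=[Entry(**item) for item in data.get("vocabulary", [])],
        phrases=[Entry(**item) for item in data.get("phrases", [])],
        tips=list(data.get("tips", [])),
    )


# Hand-written sample standing in for a model response.
SAMPLE = """
{
  "vocabulary": [{"term": "el pasaporte", "translation": "passport"}],
  "phrases": [{"term": "He perdido mi pasaporte.",
               "translation": "I have lost my passport."}],
  "tips": ["The present perfect uses haber + past participle."]
}
"""

lesson = parse_lesson(SAMPLE)
print(lesson.vocabulary[0].term)
```

Because the response is structured rather than free text, an application can render vocabulary, phrases and tips in separate parts of the interface without having to parse prose.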


Each lesson involves two separate calls to the Gemini API: one to generate vocabulary and phrases, and another to provide related grammar topics, demonstrating how developers can chain together AI responses to create more comprehensive applications.
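The two-call pattern might be sketched as follows in Python. This is an illustration under stated assumptions, not Google's implementation: `generate` is a hypothetical text-in, text-out function standing in for a Gemini API call, the prompt wording is invented, and the article does not say whether the second call actually receives the output of the first (here it does).

```python
from typing import Callable

# A text-in, text-out function; in a real app this would wrap a model API call.
Generate = Callable[[str], str]


def build_tiny_lesson(situation: str, language: str, generate: Generate) -> dict[str, str]:
    """Assemble one lesson from two sequential model calls."""
    # Call 1: vocabulary and phrases for the situation.
    vocab_prompt = f"List useful {language} vocabulary and phrases for: {situation}"
    vocab = generate(vocab_prompt)

    # Call 2: grammar topics, conditioned on the output of call 1 (an assumption).
    grammar_prompt = f"Suggest {language} grammar topics related to this material:\n{vocab}"
    grammar = generate(grammar_prompt)

    return {"vocabulary": vocab, "grammar": grammar}


def fake_generate(prompt: str) -> str:
    """Stand-in for a model call so the sketch runs offline."""
    return f"[model output for a prompt of {len(prompt)} characters]"


lesson_parts = build_tiny_lesson("a lost passport", "Spanish", fake_generate)
print(lesson_parts["grammar"])
```

Passing the model call in as a parameter keeps the chaining logic separate from any particular client library, which also makes it easy to test without network access.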

Experiment 2: Slang Hang

The second prototype, Slang Hang, addresses the challenge language learners face in sounding more natural by generating realistic conversations between native speakers.

Users can view the dialogue one message at a time, with explanations for colloquial expressions.

The entire conversation and its explanations are generated in a single API call, and the developers acknowledge the system's limitations. Wade notes that it “occasionally misuses certain expressions and slang, or even makes them up.

LLMs still aren’t perfect, and for that reason it’s important to cross-reference with reliable sources.”
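A single response that carries both the dialogue and its explanations could then be stepped through on the client, one message at a time. The Python sketch below assumes a hypothetical response shape (`scene`, `messages`, `speaker`, `text`, `notes`) and uses a hand-written sample in place of real model output.

```python
import json
from typing import Iterator

# Hand-written sample standing in for one model response (hypothetical schema).
SAMPLE_RESPONSE = """
{
  "scene": "Two coworkers meet on a tram.",
  "messages": [
    {"speaker": "A", "text": "Long time no see!",
     "notes": {"Long time no see": "informal greeting after a long gap"}},
    {"speaker": "B", "text": "I know, work has been crazy.", "notes": {}}
  ]
}
"""


def reveal(raw: str) -> Iterator[str]:
    """Yield one message at a time, followed by notes on any colloquial expressions."""
    conversation = json.loads(raw)
    for message in conversation["messages"]:
        lines = [f'{message["speaker"]}: {message["text"]}']
        for expression, meaning in message["notes"].items():
            lines.append(f"  ({expression} = {meaning})")
        yield "\n".join(lines)


steps = reveal(SAMPLE_RESPONSE)
print(next(steps))  # show the first message; call next(steps) again for the following one
```

Since everything arrives in one response, revealing the next message needs no further model call, though any invented or misused slang in that response is shown to the learner as-is.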

The application generates unique scenarios for each conversation, creating what the developers describe as ‘emergent storytelling’ where each scene is different – from street vendors chatting with customers to coworkers meeting on public transport.


Translations between languages are handled by Google's Cloud Translation API, which converts text from one language to another, allowing users to understand conversations in their native language while learning target language expressions.

Experiment 3: Word Cam

The third experiment, Word Cam, demonstrates Gemini's multimodal capabilities by identifying objects in photos and labelling them in the user's target language.

When a user selects an object in a photograph, the application sends a cropped image to Gemini with a prompt to generate descriptors in the target language, showcasing how vision-language models can be applied to real-world educational challenges.
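The crop-then-prompt step can be illustrated with a toy Python sketch. It uses a small grid of integers in place of a real photograph, and the `Box` type, the request payload shape and the prompt wording are all assumptions for this example rather than details from the project.

```python
import base64
from dataclasses import dataclass


@dataclass
class Box:
    left: int
    top: int
    right: int
    bottom: int


def clamp_box(box: Box, width: int, height: int) -> Box:
    """Clip a user-selected box to the image bounds."""
    return Box(
        left=max(0, min(box.left, width)),
        top=max(0, min(box.top, height)),
        right=max(0, min(box.right, width)),
        bottom=max(0, min(box.bottom, height)),
    )


def crop(pixels: list[list[int]], box: Box) -> list[list[int]]:
    """Crop a row-major grid of pixel values (a toy stand-in for an image)."""
    return [row[box.left:box.right] for row in pixels[box.top:box.bottom]]


def build_request(cropped: list[list[int]], language: str) -> dict[str, str]:
    """Bundle the cropped pixels and a prompt; the payload shape is hypothetical."""
    raw = bytes(value for row in cropped for value in row)
    return {
        "image_base64": base64.b64encode(raw).decode("ascii"),
        "prompt": f"Describe this object in {language} with a few short descriptors.",
    }


image = [[x + 10 * y for x in range(4)] for y in range(3)]  # 4 wide, 3 tall
box = clamp_box(Box(left=1, top=1, right=9, bottom=9), width=4, height=3)
request = build_request(crop(image, box), "French")
print(request["prompt"])
```

Sending only the selected region keeps the model's attention on the object the learner chose rather than on everything else in the frame.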

This approach addresses a common frustration for language learners who may know basic vocabulary but struggle with more specific terms for objects in their surroundings.

To enhance the user experience, the team integrated Google's Cloud Text-to-Speech API, which converts written text to spoken audio, allowing users to hear pronunciations in their target language.

However, the developers note limitations in accent representation, particularly for less commonly spoken languages.




AI Magazine is a BizClik brand