Inside Googleās MedGemma Models for Healthcare AI

The healthcare technology market has experienced rapid growth as medical institutions seek to integrate automated systems into clinical workflows.
However, adoption has been constrained by regulatory requirements, privacy concerns and the need for software that can operate within institutional data governance frameworks.
This regulatory environment has created demand for open-source medical software that healthcare developers can modify, deploy locally and integrate with existing systems without external data sharing.
Major technology companies have responded by releasing specialised healthcare tools that prioritise privacy and customisation over convenience.
Now, Google has released two new models in its MedGemma collection, marking the latest expansion of the technology company’s open-source healthcare software offerings.
What do Google’s MedGemma models do?
Google’s new releases include MedGemma 27B Multimodal, which processes both text and images and MedSigLIP, a lightweight image and text encoder designed for medical applications.
Both models form part of Google’s Health AI Developer Foundations (HAI-DEF) programme, which provides developers with open-source starting points for healthcare software development.
Daniel Golden, Engineering Manager at Google Research and Rory Pilgrim, Product Manager at Google Research, announced the models as part of the company’s broader strategy to accelerate healthcare technology development through open-source tools.
The models address growing demand from healthcare providers for automated systems that can operate within institutional privacy frameworks whilst maintaining clinical accuracy.
MedGemma models demonstrating performance across medical benchmarks
The MedGemma collection now includes variants in 4B and 27B parameter sizes, both accepting image and text inputs whilst producing text outputs.
MedGemma 4B Multimodal achieved a 64.4% score on MedQA, a medical knowledge assessment benchmark.
The model ranks among the best open models under 8 billion parameters, according to Google’s internal evaluation.
In clinical testing, 81% of chest X-ray reports generated by MedGemma 4B received approval from a US board-certified radiologist for accuracy sufficient to support similar patient management decisions.
The 27B text variant scored 87.7% on MedQA, placing it within three points of DeepSeek R1, a leading open model, whilst operating at approximately one-tenth the inference cost.
Google developed these models by training a medically optimised image encoder, subsequently training corresponding 4B and 27B versions of the Gemma 3 model on medical data.
The company retained the general capabilities of Gemma throughout this process, enabling MedGemma to handle tasks combining medical and non-medical information whilst preserving instruction-following abilities in non-English languages.
MedSigLIP targeting classification and retrieval tasks
MedSigLIP is a 400-million parameter image encoder utilising the Sigmoid loss for Language Image Pre-training (SigLIP) architecture.
The model was adapted from SigLIP through training with diverse medical imaging data, including chest X-rays, histopathology patches, dermatology images and fundus images.
The encoder bridges medical images and medical text by encoding them into a common embedding space, enabling comparison between visual and textual medical information.
MedSigLIP maintains performance on natural images whilst adding medical imaging capabilities.
The model supports traditional image classification, zero-shot image classification and semantic image retrieval applications.
Zero-shot classification allows the system to categorise images without specific training examples by comparing image representations to textual class labels.
Developer adoption spanning multiple healthcare applications
Early adopters have deployed MedGemma models across various healthcare applications.
For example, developers at DeepHealth, a Massachusetts-based healthcare technology company, have explored MedSigLIP for chest X-ray triaging and nodule detection applications.
Additionally, researchers at Chang Gung Memorial Hospital in Taiwan found that MedGemma works with traditional Chinese-language medical literature and responds to medical staff questions.
The hospital’s experience demonstrates the models’ multilingual capabilities in clinical settings.
Developers at Tap Health, a healthcare technology company based in Gurgaon, India, have also implemented MedGemma for tasks requiring clinical context sensitivity.
The company’s developers noted the model’s reliability for summarising progress notes and suggesting guideline-aligned clinical recommendations.
Open-source approach addressing privacy and customisation requirements
Google’s open-source distribution strategy addresses specific healthcare industry requirements that distinguish medical software from consumer applications.
The models can be downloaded, modified and fine-tuned to support developers’ specific needs without relying on external APIs.
The open approach enables developers to run models on proprietary hardware within their preferred environments, including Google Cloud Platform or local infrastructure.
This capability addresses privacy concerns and institutional policies that restrict data sharing with external services.
Models are distributed as snapshots with frozen parameters, ensuring stability over time.
This consistency proves crucial for medical applications where reproducibility requirements exceed those of consumer software applications.
Google distributes MedSigLIP and MedGemma through Hugging Face, a popular ML model repository, in the safetensors format.
The company provides detailed implementation notebooks on GitHub for both inference and fine-tuning applications.
When developers require scaling capabilities, MedGemma and MedSigLIP can be deployed through Vertex AI as dedicated endpoints.
Google provides GitHub examples demonstrating inference operations on these endpoints.
Daniel and Rory note that performance benchmarks highlight baseline capabilities, but acknowledge that inaccurate model output remains possible even for domains representing substantial portions of training data.
“All model outputs should be considered preliminary and require independent verification, clinical correlation, and further investigation through established research and development methodologies,” they say.


