Google Lowers Barrier for AI Developers to Use Powerful GPUs
Google has announced a significant update to its Cloud Run service, introducing support for NVIDIA L4 GPUs, a change that could dramatically enhance users' AI offerings.
This enhancement is designed to make advanced AI capabilities more accessible for a variety of applications, potentially transforming the functionality of everyday software.
“With the addition of NVIDIA L4 Tensor GPU and NVIDIA NIM support, Cloud Run provides users a real-time, fast-scaling AI inference platform to help customers accelerate their AI projects and get their solutions to market faster — with minimal infrastructure management overhead,” says Anne Hecht, Senior Director of Product Marketing at NVIDIA.
The new feature enables developers to attach one NVIDIA L4 GPU, equipped with 24GB of VRAM, to their Cloud Run instances on an as-needed basis.
What’s on offer?
The feature gives developers access to substantial computational power without the burden of maintaining constantly running, costly GPU infrastructure, which is crucial for wider adoption of more capable AI systems.
Furthermore, the service's ability to scale down to zero during periods of inactivity ensures that users are not charged when the service is not in use, providing notable cost savings.
A key focus of this update, however, is speed: it enables real-time inference applications, meaning AI systems capable of processing and responding to input data with minimal delay, often within milliseconds or seconds.
Real-time inference is crucial for applications that require immediate responses, such as custom chatbots, on-the-fly document summarisation, instant image recognition, and real-time video processing.
The L4 GPU makes this possible by accelerating compute-intensive tasks such as on-demand image recognition, video transcoding, and 3D rendering.
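As a rough illustration of what "real-time" means in practice, the sketch below times a single round trip to a Cloud Run-hosted inference endpoint. The service URL, route, and request format are hypothetical placeholders, not part of Google's announcement; a real deployment would substitute its own.

```python
import time

import requests

# Hypothetical Cloud Run service URL and route; substitute your own deployment.
SERVICE_URL = "https://image-classifier-abc123-uc.a.run.app/classify"


def classify(image_path: str) -> dict:
    """Send one image to the inference endpoint and report round-trip latency."""
    with open(image_path, "rb") as f:
        payload = f.read()

    start = time.perf_counter()
    resp = requests.post(
        SERVICE_URL,
        data=payload,
        headers={"Content-Type": "application/octet-stream"},
        timeout=30,
    )
    resp.raise_for_status()
    elapsed_ms = (time.perf_counter() - start) * 1000

    print(f"Round-trip latency: {elapsed_ms:.1f} ms")
    return resp.json()


if __name__ == "__main__":
    print(classify("example.jpg"))
```

Checking latency this way is a quick sanity test that a GPU-backed service is responding within the budget an interactive feature needs.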
Building stronger AI services
By leveraging the power of NVIDIA L4 GPUs, developers can now build and deploy AI models that can handle complex tasks swiftly, enhancing user experience and enabling new types of interactive AI-powered services that were previously impractical due to performance limitations.
Developers can use the LLMs of their choice, such as open models with up to 9 billion parameters like Google's Gemma (2B/7B) or Meta's Llama 3 (8B), with fast token rates.
Additionally, businesses can serve custom fine-tuned Gen AI models, such as tailored image generation, while optimising costs by scaling down when demand decreases.
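One common way to serve an open model such as Gemma 2B on a GPU-backed container is behind an Ollama-compatible HTTP API. The hedged sketch below assumes such a server is already deployed on Cloud Run (the service URL is a placeholder, and using Ollama here is an assumption rather than part of Google's announcement); it streams a completion and prints a rough token-rate estimate.

```python
import json
import time

import requests

# Placeholder Cloud Run service URL fronting an Ollama-compatible server.
SERVICE_URL = "https://gemma-service-abc123-uc.a.run.app/api/generate"


def stream_completion(prompt: str, model: str = "gemma:2b") -> None:
    """Stream a completion and print an approximate tokens-per-second figure."""
    start = time.perf_counter()
    chunks = 0

    with requests.post(
        SERVICE_URL,
        json={"model": model, "prompt": prompt, "stream": True},
        stream=True,
        timeout=120,
    ) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines():
            if not line:
                continue
            piece = json.loads(line)
            print(piece.get("response", ""), end="", flush=True)
            chunks += 1
            if piece.get("done"):
                break

    elapsed = time.perf_counter() - start
    # Each streamed chunk is roughly one token, so this is only an estimate.
    print(f"\n~{chunks / elapsed:.1f} tokens/sec over {elapsed:.1f}s")


if __name__ == "__main__":
    stream_completion("Summarise the benefits of GPU-backed serverless inference.")
```

Because Cloud Run scales to zero, a service like this only incurs GPU charges while requests such as the one above are being handled.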
Instances with an attached L4 GPU can start in approximately five seconds, allowing processes within the container to begin utilising the GPU almost immediately.
Cold-start times for various models, such as Gemma 2B and Llama 3.1, range from 11 to 35 seconds, depending on the specific model and its size.
Impact on user experience
Already, early adopters of this technology have expressed enthusiasm regarding its impact on their AI operations.
“Cloud Run's GPU support has been a game-changer for our real-time inference applications,” says Thomas Menard, Head of AI - Global Beauty Tech at L'Oréal. “Overall, Cloud Run GPUs have significantly enhanced our ability to provide fast, accurate, and efficient results to our end users.”
Currently, Cloud Run GPUs are available in the us-central1 region, with plans for availability in Europe and Asia by the end of the year.
Google's Cloud Run update significantly lowers the barrier for developers to access advanced AI capabilities. Because developers can now scale resources on demand and pay only for what they use, enterprises can offer their customers better AI-enabled services across various sectors, potentially streamlining internal operations and enhancing user experiences.