Autoscaling Your AI Agent Under Load
This video shows how to autoscale an AI agent under heavy user load. We run a stress test against a decoupled architecture: a GPU-powered Gemma LLM service and a lightweight ADK agent service, both deployed on Google Cloud Run. See how Cloud Run provisions additional instances as demand rises, and how decoupling keeps scaling graceful and cost-efficient, because only the bottleneck component scales out.
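To make the "only the bottleneck scales" idea concrete, here is a rough back-of-the-envelope sketch in Python. It assumes instance count is driven purely by concurrent requests divided by per-instance concurrency, which simplifies how Cloud Run actually autoscales (it also weighs CPU utilization). The service names, the LLM concurrency of 4, and the instance cap of 20 are hypothetical values for illustration, not figures from the video.

```python
import math


def instances_needed(concurrent_requests: int,
                     per_instance_concurrency: int,
                     max_instances: int) -> int:
    """Estimate instances for a given load (simplified, concurrency-only model)."""
    if concurrent_requests <= 0:
        return 0  # no traffic: the service can scale to zero
    wanted = math.ceil(concurrent_requests / per_instance_concurrency)
    return min(wanted, max_instances)  # never exceed the configured cap


# Hypothetical settings: the agent is lightweight and handles many requests
# per instance; the GPU-backed LLM handles only a few at a time.
SERVICES = {
    "adk-agent": {"concurrency": 80, "max_instances": 20},
    "gemma-llm": {"concurrency": 4, "max_instances": 20},
}

if __name__ == "__main__":
    for users in (10, 40, 80):
        counts = {
            name: instances_needed(users, cfg["concurrency"], cfg["max_instances"])
            for name, cfg in SERVICES.items()
        }
        print(users, counts)
```

Under these assumed numbers, the agent stays at one instance across all three load levels while the LLM service grows from 3 to 20 instances, which is the pattern the decoupled design is meant to produce.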
Chapters:
0:00 - Introduction: The Challenge of Load
0:19 - Load Testing with Locust
1:31 - Observing Autoscaling in Cloud Run
2:02 - Key Learnings: Decoupling and Cost Efficiency
2:31 - Conclusion
Resources:
Codelab → http://goo.gle/475sUpV
GitHub Repository → http://goo.gle/3KJVc1Y
Google Cloud Run GPU → http://goo.gle/48sn3NV
ADK Documentation → http://goo.gle/3LauFL8
Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech
#GoogleCloud #LLM #Gemma #ADK #CloudRun
Speakers: Amit Maraj
Products Mentioned: Cloud Run, Gemma, AI Infrastructure, Cloud GPUs