Auto-scaling your AI agent under load


Author: Google Cloud Tech – Duration: 00:02:59
This video shows how to automatically scale your AI agent under high user load. We simulate a stress test on a decoupled architecture, combining a GPU-powered Gemma LLM with a lightweight ADK agent on Google Cloud Run. Learn how Cloud Run intelligently provisions resources to meet high demand, ensuring graceful scaling and cost-effectiveness by scaling only the bottleneck component.

Chapters:
0:00 – Introduction: The load challenge
0:19 – Load testing with Locust
1:31 – Observing autoscaling in Cloud Run
2:02 – Key learnings: Decoupling and cost effectiveness
2:31 – Conclusion

Resources:
Codelab → http://goo.gle/475sUpV
GitHub repository → http://goo.gle/3KJVc1Y
Google Cloud Run GPU → http://goo.gle/48sn3NV
ADK Documentation → http://goo.gle/3LauFL8
Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech
#GoogleCloud #LLM #Gemma #ADK #CloudRun
Speakers: Amit Maraj
Products mentioned: Cloud Run, Gemma, AI Infrastructure, Cloud GPU
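
The idea of scaling only the bottleneck component in a decoupled architecture can be illustrated with a small back-of-the-envelope model. The sketch below is not taken from the video or the codelab. It assumes a simplified rule in which each service needs roughly ceil(in-flight requests / per-instance concurrency) instances, capped at a maximum. The service names, concurrency values, and instance limits are hypothetical, and real Cloud Run autoscaling also weighs other signals such as CPU utilization.

```python
import math
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    concurrency: int    # hypothetical: max concurrent requests per instance
    max_instances: int  # hypothetical: upper bound on instance count


def instances_needed(service: Service, concurrent_requests: int) -> int:
    """Estimate the instances needed for a given number of in-flight requests.

    Simplified model: ceil(requests / concurrency), capped at max_instances,
    and zero instances when there is no traffic (assumes scale-to-zero).
    """
    if concurrent_requests <= 0:
        return 0
    wanted = math.ceil(concurrent_requests / service.concurrency)
    return min(wanted, service.max_instances)


# Hypothetical settings: the lightweight agent handles many requests per
# instance, while the GPU-backed model service handles only a few.
agent = Service("adk-agent", concurrency=80, max_instances=10)
gemma = Service("gemma-gpu", concurrency=4, max_instances=5)

for load in (1, 10, 20):
    print(
        f"{load:>3} in-flight requests -> "
        f"{agent.name}: {instances_needed(agent, load)}, "
        f"{gemma.name}: {instances_needed(gemma, load)}"
    )
```

Under these assumed numbers the agent stays at one instance across all three load levels while the GPU service grows to its cap, which is the pattern the description attributes to decoupling: only the expensive bottleneck component adds instances.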






