65,000-node Kubernetes AI platform: a reality


Author: Google Cloud Tech – Duration: 00:03:06
Generative AI models keep growing: current models reach hundreds of billions of parameters, and the most advanced approach 2 trillion. Training models of this size on modern accelerators requires clusters of more than 10,000 nodes. GKE currently supports the world’s largest managed Kubernetes clusters, at 15,000 nodes, and can handle these demanding training workloads. In anticipation of even larger models, we are introducing support for 65,000-node clusters. This expansion, combined with advances in accelerator computing power, will enable the training of models with 10 trillion or more parameters.
Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech






