Serving AI models at scale with vLLM


Author: Google Cloud Tech – Duration: 00:03:08
Unlock the full potential of your AI models by serving them at scale with vLLM. This video addresses common problems such as memory inefficiency, high latency under load, and large model sizes, and shows how vLLM maximizes the throughput of your existing hardware. Learn about vLLM features such as PagedAttention, Prefix Caching, Multi-Host Serving, and Disaggregated Serving, and see how vLLM integrates with Google Cloud GPUs and TPUs for flexible, high-performance AI inference.

Chapters:
0:00 – Introduction: The Challenge of Scaling AI
0:25 – 3 Common Problems
1:01 – Solution: vLLM for High-Performance Serving
1:13 – vLLM Feature: PagedAttention
1:30 – vLLM Feature: Prefix Caching
1:46 – vLLM Feature: Multi-Host and Disaggregated Serving
2:07 – Support for vLLM on Google Cloud (GPU and TPU)
2:29 – vLLM Tunable Settings
2:46 – Wrap-up

Resources:
Welcome to vLLM → https://goo.gle/49zlRZN
GitHub TPU Inference → https://goo.gle/3JUkBpn
Subscribe to Google Cloud Tech → https://goo.gle/GoogleCloudTech
#GoogleCloud #vLLM #AIInfrastructure

Speakers: Don McCasland

Products mentioned: AI infrastructure, tensor processing units, cloud GPUs
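
The features and tunable settings listed in the chapters correspond to options in vLLM's Python API. The sketch below is a minimal illustration, not code from the video: the model name, memory fraction, context length, and prompts are all assumptions you would replace with your own values.

# Minimal vLLM sketch: offline batched generation with prefix caching enabled.
# Model name, gpu_memory_utilization, max_model_len, and prompts are
# illustrative assumptions, not values taken from the video.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed model; swap in your own
    gpu_memory_utilization=0.90,               # tunable: fraction of accelerator memory vLLM may claim
    enable_prefix_caching=True,                # reuse KV-cache blocks for shared prompt prefixes
    max_model_len=4096,                        # tunable: cap context length to bound KV-cache size
)

# Prompts that share a long prefix benefit from prefix caching.
system = "You are a concise assistant for cloud infrastructure questions. "
prompts = [
    system + "What is PagedAttention?",
    system + "Why serve large models on TPUs?",
]

params = SamplingParams(temperature=0.7, max_tokens=128)
for output in llm.generate(prompts, params):
    print(output.outputs[0].text.strip())

For online, high-throughput serving, the same options are exposed as flags on vLLM's OpenAI-compatible server (started with the vllm serve command), which is the form you would typically run on Cloud GPUs or TPUs.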
