Build Hour: Prompt Caching


Author: OpenAI – Duration: 00:56:04
Build faster, cheaper, and with lower latency using prompt caching. This Build Hour explains how prompt caching works and how to design your prompts to maximize cache hits. Learn what is actually cached, when caching applies, and how small changes to your prompts can have a big impact on cost and performance.

Erika Kettleson (Solutions Engineer) covers:
• What prompt caching is and why it matters for real-world applications
• How cache hits work (prefixes, token thresholds, and continuity)
• Best practices such as using the Responses API and prompt_cache_key

👉 Prompt Caching Docs: https://platform.openai.com/docs/guides/prompt-caching
👉 Prompt Caching 101 Cookbook: https://developers.openai.com/cookbook/examples/prompt_caching101
👉 Prompt Caching 201 Cookbook: https://developers.openai.com/cookbook/examples/prompt_caching_201
👉 Follow along with the code repo: http://github.com/openai/build-hours
👉 Sign up for upcoming live Build Hours: https://webinar.openai.com/buildhours
00:00 Introduction
02:37 Foundations, Mechanics, API Overview
12:11 Demo: Batch Image Processing
16:55 Demo: Branching Chat
26:02 Demo: Long-Term Compaction
32:39 Overview of Cache Discount Pricing
36:03 Customer Spotlight: Warp
49:37 Q&A






