Context Window
Models & Architecture

The maximum amount of text an AI model can consider at once -- including both your input and its response.
Think of the context window like the size of a desk. A small desk means you can only spread out a few pages at a time -- you have to put some away to make room for new ones. A huge desk lets you lay out an entire book and see everything at once.
The context window is the total amount of text a language model can "see" and work with during a single conversation. It is measured in tokens and includes everything: your messages, the model's responses, any system instructions, and any documents you have pasted in. Once you exceed the context window, the model can no longer see the oldest parts of the conversation.
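The "forgetting the oldest parts" behavior can be sketched in a few lines. This is a rough illustration, not a real tokenizer: it assumes roughly 4 characters per token (an approximation for English text) and drops whole messages from the front of the history until the total fits the window.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text.
    Real tokenizers (e.g. tiktoken) give exact counts."""
    return max(1, len(text) // 4)

def trim_to_window(messages: list[str], max_tokens: int) -> list[str]:
    """Drop the oldest messages until the conversation fits the window."""
    kept = list(messages)
    while kept and sum(estimate_tokens(m) for m in kept) > max_tokens:
        kept.pop(0)  # the model "loses sight of" the oldest message first
    return kept

# A long early message, a medium one, and the latest turn:
history = ["intro " * 100, "question " * 50, "latest reply"]
fitted = trim_to_window(history, max_tokens=150)
```

Here the oldest message is evicted because the three together exceed the 150-token budget; the most recent turns always survive, which is why a model mid-conversation still remembers what you just said but not necessarily how the chat began.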
Context window sizes have grown rapidly. Early GPT-3 models had a 4,096 token context window -- about 3,000 words. GPT-4 expanded that to 128K tokens. Claude offers up to 200K tokens. Some newer models push even further. A larger context window means you can have longer conversations without the model forgetting what you said earlier, or you can paste in entire documents for the model to analyze.
In practice, context window size affects how you use AI tools. With a small context window, you might need to summarize previous conversation parts or break a long document into chunks. With a large context window, you can paste in a whole research paper or codebase and ask questions about it directly. This is a big deal for tasks like code review, legal document analysis, or studying long texts.
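The chunking workaround mentioned above can be sketched simply. This toy splitter uses the same assumed 4-characters-per-token ratio and cuts on raw character boundaries; practical implementations split on paragraphs or sentences and often overlap chunks so context is not lost at the seams.

```python
def chunk_document(text: str, chunk_tokens: int, chars_per_token: int = 4) -> list[str]:
    """Split a long document into pieces that each fit a small context window.
    Uses a crude characters-per-token ratio; real splitters use a tokenizer."""
    chunk_chars = chunk_tokens * chars_per_token
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]

# A 10,000-character "paper" split into 500-token (~2,000-character) chunks:
paper = "x" * 10_000
chunks = chunk_document(paper, chunk_tokens=500)
```

Each chunk would then be summarized or queried separately, with the per-chunk answers combined afterward -- exactly the extra work a large context window lets you skip.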
Keep in mind that a bigger context window does not always mean better performance. Some models struggle to pay equal attention to information in the middle of very long inputs -- a phenomenon researchers call "lost in the middle." Also, processing more tokens takes more time and costs more money. So even if a model supports 200K tokens, you should only use as much as you actually need.
Real-World Examples
- Claude's 200K-token window fitting approximately an entire novel
- ChatGPT's 128K-token window allowing analysis of long documents
- Gemini offering up to a 1-million-token context window for processing massive inputs