
AI Has a Memory Problem

Every conversation you have with AI has an invisible limit. The model can only hold so much information in its head at once: your current message, the conversation history, any documents you've pasted in, and its own instructions. This is the context window. Think of it like working memory: when you're having a conversation with someone, you can reference things said a few minutes ago without repeating them, but if the conversation goes on long enough, the earliest parts start to fade. AI works the same way, except the limit is measured in tokens (a token is about 3/4 of a word). When the window fills up, older information falls off. This is why long conversations with ChatGPT start to feel like the model "forgot" what you told it 30 messages ago. It did.
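If you want to see what that counting looks like, here's a minimal sketch using OpenAI's tiktoken tokenizer (my assumption; other models split text differently, and the sample sentence is made up):

```python
# Minimal sketch: how text turns into tokens, using OpenAI's tiktoken
# library (an assumption -- other models use different tokenizers).
import tiktoken

encoder = tiktoken.encoding_for_model("gpt-3.5-turbo")

text = "The duplex on Maple Street rents for $1,450 a month and needs a new roof."
tokens = encoder.encode(text)

print(len(text.split()), "words")
print(len(tokens), "tokens")  # usually a bit more than the word count
```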

If you've ever pasted a property listing, a lease, or a long market analysis into ChatGPT and noticed it missed something important from the middle, you've already bumped into this limit. The model didn't skip it on purpose. It ran out of room. Right now, every AI company is racing to announce the biggest context window. Claude just went from 9k to 100k tokens. GPT-3.5 got bumped to 16k. The headlines write themselves: "Claude can now read an entire novel!" "GPT can process 300 pages in one shot!" The implication is that bigger context window = smarter AI. More information in means better output out. But a recent Stanford/UC Berkeley paper tells a very different story.

The Race for Bigger Windows

Context-window failures are devious precisely because they're subtle. With ChatGPT's original 4,000-token limit (roughly 3,000 words), you'd hit the wall after a few exchanges. When that happened, the model didn't warn you. It silently dropped your oldest messages and kept responding as if nothing had changed. You'd get contradictions, forgotten instructions, answers that sounded confident but had no connection to what you told it ten messages ago. The model wasn't hallucinating; it had simply lost access to the context and filled the gaps with plausible guesses.
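To make "silently dropped" concrete, here's roughly what a chat client does behind the scenes. This is a sketch, not any particular product's code, and the token counts here are a word-count guess rather than a real tokenizer:

```python
# Rough sketch of how a chat client keeps a conversation under the token
# limit: drop the oldest messages until what's left fits the budget.
def estimate_tokens(message: dict) -> int:
    # ~3/4 of a word per token, per the rule of thumb above
    return int(len(message["content"].split()) * 4 / 3)

def trim_history(messages: list[dict], budget: int = 4000) -> list[dict]:
    kept = list(messages)
    # Real clients usually pin the system prompt first; skipped here for brevity.
    while kept and sum(estimate_tokens(m) for m in kept) > budget:
        kept.pop(0)  # the oldest message quietly disappears
    return kept

history = [
    {"role": "user", "content": "Here's the lease I mentioned..."},
    # ...every exchange appends more messages...
]
context = trim_history(history)  # the model only ever sees this slice
```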

GPT-3.5 expanding to 16,000 tokens means conversations last four times longer before that degradation kicks in. Claude's jump to 100,000 tokens is even more ambitious: on paper, you could feed it an entire research paper and ask questions about it. These are real improvements. The problem is what happens to information once it's inside the window.

The Stanford/UC Berkeley paper, called "Lost in the Middle," tested how well language models actually use long context. The results are sobering. Models pay strong attention to the beginning and end of their context, but performance degrades by over 30% for information placed in the middle. The recall curve is U-shaped: strong at the edges, weak in the center. This happens even in models explicitly designed for long context, and the paper found that extended-context models were no better at using their input than their shorter-context counterparts. In some cases, the model performed worse with context than with no context at all. The attention mechanism has a fundamental bias toward recent and initial tokens, and simply making the window bigger doesn't fix that.

It's like giving someone a 500-page book and discovering they only absorbed the first and last chapters. A bigger context window means a bigger blind spot. More content means more of it lands in the dead zone between the edges the model actually pays attention to. So why does AI context work at all? Because most real usage doesn't look like searching for a specific fact in a 500-page document. It looks more like a conversation.

From Keywords to Fuzzy Landmarks

In programming, a variable is a named reference to a value: credit_score = 720. Getting it back means typing the exact name. Search engines, databases, Ctrl+F: same constraint. Every traditional system demands the precise term. LLMs don't. "Remember what I said about credit scores?" works twenty messages later. I think of this as fuzzy variables: same concept, except the reference doesn't have to be exact. "That contractor issue," "the numbers from the rental analysis" - close enough, and the model fills in the rest.
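A toy illustration of the difference. The dictionary demands the exact key; the prompt just gestures at the idea and leans on the context window (the prompt wording is made up, and the point is the loose reference, not any particular API):

```python
# Exact retrieval vs. fuzzy reference (toy illustration).
facts = {"credit_score": 720}

facts["credit_score"]        # exact name: works
# facts["the credit thing"]  # KeyError -- traditional lookup needs the precise key

# With an LLM, the "key" is a loose description, because the earlier
# conversation is still sitting in the context window.
prompt = (
    "Remember what I said about credit scores? "
    "Does that change your read on the rental analysis numbers?"
)
```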

This changes more than prompting technique. It changes how we recall context. Traditional tools punished imprecise memory. Forget the exact keyword and the only option was going back to re-read. LLMs work the way normal conversation does: reference an idea loosely and the model reconstructs enough to keep going. Its context window becomes an extension of yours.

Using Context Effectively

Once you understand that context holds shapes rather than exact text, it changes how you use AI. Instead of pasting entire documents into a prompt and hoping the model absorbs everything, give it a rough sketch: the key points, the constraints, the tone you want. Use one session to boil down noisy context into clear signal for another session. Sessions are cheap; context windows are not.
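Here's one way that pattern can look in code, assuming the OpenAI Python client; the model name, filename, and prompts are placeholders, not a recipe. One call boils the raw notes down, a second call starts from the boiled-down version:

```python
# Sketch of the "boil down, then reuse" pattern, assuming the OpenAI
# Python client and an API key in the environment.
from openai import OpenAI

client = OpenAI()

def ask(instruction: str, material: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder model
        messages=[
            {"role": "system", "content": instruction},
            {"role": "user", "content": material},
        ],
    )
    return response.choices[0].message.content

raw_notes = open("rental_analysis_notes.txt").read()  # hypothetical file

# Session 1: boil noisy context down to a compact sketch of the key points.
sketch = ask("Distill these notes into key numbers, constraints, and risks.", raw_notes)

# Session 2: start clean with only the sketch, leaving the window mostly free.
memo = ask("Rewrite this as a short memo for my business partner.", sketch)
```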

The sketch is only half the equation. The other half is the transformation: make it formal, make it technical, rewrite it for a different audience. The sketch is the fuzzy input. The transformation is the precise instruction. I wrote about this in my 10 ChatGPT Productivity Patterns post as the "transformation" pattern. Context windows are the reason it works so well.

The model doesn't need to recall every detail of your rough notes to transform them into a polished investor memo. It needs the shape: what the deal looks like, what the numbers are, what your concerns are. Then you tell it "rewrite this for my lender" and it produces something clean, structured, and professional. Same notes, different transformation: "summarize this for my partner" gives you a two-paragraph overview. "Turn this into a listing description" gives you marketing copy. Each transformation uses the same fuzzy context but applies a different lens, and because the model only needs the shape of the input (not perfect recall of every word), it works even with imperfect context.
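In code, the lens is just the instruction string. A quick sketch reusing the hypothetical ask() helper and raw_notes from the previous example; the lens wording is illustrative:

```python
# Same notes, different lens: only the instruction changes.
lenses = {
    "lender":  "Rewrite these notes as a formal memo for my lender.",
    "partner": "Summarize these notes in two paragraphs for my partner.",
    "listing": "Turn these notes into a listing description.",
}

outputs = {name: ask(instruction, raw_notes) for name, instruction in lenses.items()}
```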

Your Notes Are Your Context Window

This concept extends beyond AI. Your notebook is already an external context window. You offload ideas to text so you don't have to keep them in your head, then reference them fuzzily later. "That idea I had about the Section 8 screening process" or "the notes from the contractor call last Tuesday." You don't remember the exact words. You remember the shape, and your notes fill in the rest. This is exactly what AI context windows do: hold enough of the shape that you can reference it and build on it, without perfect recall of every detail.

The better your notes capture the shape of ideas (not just raw facts, but relationships between ideas and the reasoning behind decisions), the more effectively you can load that context into AI later. Your note-taking system becomes the bridge between your long-term memory and AI's working memory. You think, you capture the shape in notes, you load those notes into context, and AI transforms them into whatever you need. I've been building a system around this idea, and I'll dedicate a post to it once it's further along.
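Here's a rough sketch of that bridge, with everything about it assumed (plain-text notes in a folder, made-up filenames and prompt): pull the relevant notes, stitch them together, and hand them to the model as context.

```python
# Sketch: notes on disk become the context for a session. The folder layout,
# filenames, and prompt are all hypothetical; the point is that the notes,
# not the model's memory, are the source of truth.
from pathlib import Path

def load_notes(folder: str, keyword: str) -> str:
    """Naive recall: grab every note that mentions the keyword."""
    matches = []
    for path in sorted(Path(folder).glob("*.md")):
        text = path.read_text()
        if keyword.lower() in text.lower():
            matches.append(f"## {path.stem}\n{text}")
    return "\n\n".join(matches)

context = load_notes("notes/", "section 8 screening")
prompt = f"Using these notes as context, draft a tenant screening checklist:\n\n{context}"
```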

Keeping notes outside the AI systems I interact with lets me control context and prevent drift that happens as content passes through multiple transformation stages. I can also query the same material differently across sessions, improving recall without bloating any single context window. This is a stark contrast to the NotebookLM-inspired systems I see popping up, which treat LLM recall as the source of truth and pay the price when bullshit sneaks in unchecked. The real skill isn't filling the window. It's knowing what to put in it.
