Context Rot: How Increasing Input Tokens Impacts LLM Performance
Chroma. Kelly Hong walks through Chroma's research on 18 models: why needle-in-a-haystack scores are misleading, how performance degrades with ambiguity and distractors, and why even simple string-repetition tasks degrade past 500 tokens. Short, evidence-based, and exactly the case the article needs you to take seriously before getting to the engineering moves.
AI Expert note
Model names, pricing and capabilities change quickly. Use this for the decision pattern, then verify current model behavior before adopting it.
What you should get from this
Understand how long context can fail under ambiguity and distractors, then design tests around that risk.
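One way to turn that into a test is a small sweep over context length with distractors mixed in. The sketch below is not Chroma's benchmark, only an illustration under stated assumptions: the filler text, the needle, the distractor sentences, and the names `build_prompt`, `run_sweep` and `ask_model` are all hypothetical, and `ask_model` stands in for whatever client call you use to query a model.

```python
import random
from typing import Callable, Dict, List

# All strings below are made-up test data, not taken from the video or the report.
FILLER = "The committee reviewed the quarterly logistics report without comment."
NEEDLE = "The access code for the archive room is 7421."
DISTRACTORS = [
    "The access code for the supply closet is 3308.",
    "The old access code for the archive room was 9150.",
]
QUESTION = "What is the current access code for the archive room?"
EXPECTED = "7421"


def build_prompt(n_filler: int, n_distractors: int, seed: int = 0) -> str:
    """Scatter one needle and up to n_distractors near-misses among filler sentences."""
    rng = random.Random(seed)
    sentences: List[str] = [FILLER] * n_filler
    for extra in [NEEDLE] + DISTRACTORS[:n_distractors]:
        # randint is inclusive, so the sentence can land anywhere, including the end.
        sentences.insert(rng.randint(0, len(sentences)), extra)
    return " ".join(sentences) + "\n\nQuestion: " + QUESTION


def run_sweep(
    ask_model: Callable[[str], str],
    filler_counts: List[int],
    n_distractors: int,
    trials: int = 5,
) -> Dict[int, float]:
    """Return accuracy per filler count; each trial uses a different needle position."""
    results: Dict[int, float] = {}
    for n in filler_counts:
        hits = 0
        for trial in range(trials):
            answer = ask_model(build_prompt(n, n_distractors, seed=trial))
            hits += EXPECTED in answer
        results[n] = hits / trials
    return results


if __name__ == "__main__":
    # Placeholder model that always answers correctly; swap in a real client call.
    def placeholder_model(prompt: str) -> str:
        return "The code is 7421."

    print(run_sweep(placeholder_model, [10, 100, 1000], n_distractors=2))
```

Running the same sweep with `n_distractors=0` and then `n_distractors=2` separates the effect of length alone from the effect of distractors. Substring matching is a crude scorer, so ambiguous questions would likely need a stricter check.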