I discovered this problem when I noticed that the AI-generated long-form content I examined consistently exhibited the same degradation pattern around page 60, regardless of context window size. After detailed analysis, I identified four key failure modes (a minimal detection sketch follows the list):
1. Structural repetition: Systems fall into rigid pattern loops (identical paragraph structures repeated 30+ times)
2. Verbatim recycling: Same phrases appear across different contexts
3. Character voice homogenization: All characters speak/think identically
4. Plot stagnation: Same revelations repeat without progression
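To make failure modes 1 and 2 measurable, here is a minimal sketch of the kind of n-gram overlap check that flags verbatim recycling. The function name, the 5-gram size, and the scoring scheme are illustrative choices for this post, not the actual pattern analyzer in my system:

```python
from collections import Counter

def ngram_overlap(chunks: list[str], n: int = 5) -> float:
    """Fraction of distinct n-grams that appear in more than one chunk.

    High overlap between distant chunks signals verbatim recycling;
    running the same check over paragraph-shape sequences instead of
    raw tokens catches structural repetition.
    """
    seen = Counter()
    for chunk in chunks:
        tokens = chunk.split()
        # Deduplicate within a chunk so only cross-chunk reuse counts.
        grams = {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
        for gram in grams:
            seen[gram] += 1
    if not seen:
        return 0.0
    repeated = sum(1 for count in seen.values() if count > 1)
    return repeated / len(seen)

# Scores near 0 across distant chapters are healthy;
# a score that climbs as the draft grows flags recycling.
```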
These failures occur because LLMs don't truly "remember" what they've written: they operate on a sliding window of recent text, creating an illusion of memory that breaks down in longer works.
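To see why, consider a toy prompt builder with a fixed token budget (the 4,000 figure is arbitrary, and real systems do smarter truncation than this):

```python
def visible_context(manuscript_tokens: list[str], budget: int = 4000) -> list[str]:
    # The model only ever conditions on the last `budget` tokens.
    # A fact established on page 1 of a 300-page draft isn't
    # "forgotten" by the model; it was never in the prompt at all.
    return manuscript_tokens[-budget:]
```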
My approach uses a multi-model orchestration architecture rather than a single LLM (a rough sketch of the memory layer follows the list):
- Memory Management System: Structured database of narrative elements outside the generation process
- Character Consistency Layer: Specialized NER system with contextual validation
- Non-Autoregressive Pattern Analysis: Prevents structural repetition
- Dynamic Prompt Generation: Evolving instructions based on accumulated context
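As a sketch of the idea behind the Memory Management System, the schema and names below are simplified illustrations rather than my production code; the point is that narrative facts live in a persistent store outside the generation loop:

```python
import sqlite3

class NarrativeMemory:
    """Structured store of narrative facts that persists across
    generation calls, independent of any context window."""

    def __init__(self, path: str = "story.db"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS facts ("
            "  entity TEXT, attribute TEXT, value TEXT, chapter INTEGER)"
        )

    def record(self, entity: str, attribute: str, value: str, chapter: int):
        # Called after each generated passage is parsed for new facts.
        self.db.execute(
            "INSERT INTO facts VALUES (?, ?, ?, ?)",
            (entity, attribute, value, chapter),
        )
        self.db.commit()

    def facts_for(self, entity: str) -> list[tuple]:
        # Feeds the dynamic prompt generator: retrieval is keyed on
        # the entity, not on how recently the fact was written.
        return self.db.execute(
            "SELECT attribute, value, chapter FROM facts WHERE entity = ?",
            (entity,),
        ).fetchall()
```

Because retrieval keys on entities rather than recency, a fact established in chapter 1 is just as available in chapter 40, so consistency no longer depends on the context window.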
The most counterintuitive finding was that quality actually improves with length in my system (as measured against standardized publishing metrics), inverting the traditional degradation curve.
I'd be interested in hearing from others working on LLM coherence issues, particularly at extended context lengths.
What approaches have you tried?