
RAG Doesn't Solve Hallucination. What Actually Does.

Kenotic Labs · April 7, 2026 · 8 min read


Legal AI tools using RAG hallucinate 17-33% of the time. The problem isn't the model. It's what retrieval actually returns, and what it can't.

RAG reduces hallucination compared to base models, but it doesn't solve it. RAG retrieves similar text chunks. It doesn't know what's current vs. outdated, can't disambiguate overlapping contexts, and can't reconstruct the state of a situation. The actual fix is deterministic reconstruction: a write-path architecture that structures information at storage time so the right context can be rebuilt, not searched for.

RAG was supposed to fix hallucination. Give the model access to real documents. Ground its responses in retrieved facts. Problem solved.

Except it didn't solve it. Stanford's 2025 study found that LexisNexis and Westlaw, two of the most sophisticated RAG-based legal research tools on the market, hallucinate between 17% and 33% of the time. Westlaw's AI-Assisted Research is accurate on just 42% of queries.

These aren't toy demos. These are production tools used by lawyers making decisions that affect people's lives. And between one in six and one in three responses is wrong.

RAG helped. It didn't solve the problem. Understanding why requires looking at what retrieval actually does and what it doesn't.

What Does RAG Actually Do?

RAG works in three steps:

  1. Chunk: Split documents into pieces (typically 200-500 tokens each)
  2. Embed: Convert each chunk into a vector (a numeric representation of its meaning)
  3. Retrieve: When a query comes in, find the chunks whose vectors are closest to the query vector, and feed them to the model as context

The model then generates a response using those retrieved chunks as grounding.
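The three steps above can be sketched in a few lines. This is a toy illustration, not any real library's API: the bag-of-words "embedding" stands in for a trained encoder, and the helper names (`chunk`, `embed`, `retrieve`) are invented here.

```python
# Minimal sketch of the chunk/embed/retrieve pipeline. A real system
# would use a neural embedding model and a vector index; a word-count
# vector with cosine similarity stands in for both here.
import math
from collections import Counter

def chunk(document: str, size: int = 40) -> list[str]:
    # Step 1: split the document into fixed-size pieces
    # (by words here, rather than tokens, for simplicity).
    words = document.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    # Step 2: convert text into a vector representation.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Step 3: return the k chunks whose vectors are closest to the query.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]
```

Even in this stripped-down form, the shape of the operation is visible: everything happens on the read path, and similarity is the only criterion.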

This works well for a specific class of question: "Find me something relevant to this query." If the answer exists in a single chunk and the embedding correctly captures its relevance, RAG does its job.

But that's a narrow class of question.

Why Does RAG Still Hallucinate?

RAG fails in predictable ways. Recent research catalogs multiple distinct root causes. The most structural:

1. Chunk splitting destroys context. If a fact spans two chunks, neither chunk contains the complete answer. One bad chunk split can ruin relevance. Documents get sliced in ways that break semantic units, split related concepts, and create fragments too small to be meaningful.

2. Semantic similarity isn't semantic correctness. Vector search finds text that sounds similar to the query. It doesn't verify that the retrieved text actually answers it. "Terminating an employee" and "terminating a software process" are semantically similar. They are not the same thing.

3. Lost in the middle. Even when RAG retrieves the right chunks, models struggle to use them. Research shows a U-shaped performance curve. Models attend to the beginning and end of context but degrade significantly on information in the middle. More retrieved chunks can actually make accuracy worse.

4. No temporal awareness. RAG retrieves chunks regardless of when they were written. If a fact was updated three times, RAG might return the outdated version because its embedding is closer to the query. There's no concept of "this supersedes that."

5. No disambiguation. If your system serves multiple users or contexts, vector search returns the closest vectors regardless of who they belong to. Unless retrieval is explicitly filtered by metadata, two users with similar situations can get each other's data.

6. Retrieval is not reconstruction. RAG answers "what text is similar to this query?" It cannot answer "what is the current state of this situation?" Those are fundamentally different operations.
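Failure mode 4 is easy to demonstrate with invented data. In this sketch, a simple word-overlap score stands in for vector similarity; the policy chunks and dates are hypothetical.

```python
# Similarity search has no notion of supersession: an outdated chunk
# wins whenever its wording happens to match the query better than
# the chunk that replaced it.

def overlap(query: str, text: str) -> int:
    # Stand-in for vector similarity: count shared words.
    return len(set(query.lower().split()) & set(text.lower().split()))

chunks = [
    ("2023-01", "remote work allowed two days per week"),
    ("2024-06", "policy update: office attendance now required daily"),
]

query = "how many days of remote work are allowed per week"
best = max(chunks, key=lambda c: overlap(query, c[1]))
# The 2023 chunk shares six words with the query; the 2024 update
# shares none. Retrieval confidently returns the superseded policy.
```

Nothing in the store says "the second entry replaces the first," so no amount of ranking tuning can recover that relationship.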

What's the Difference Between Retrieval and Reconstruction?

This is the core distinction:

Retrieval searches a corpus and returns similar chunks. It's a read-path operation. The data is stored however it was stored, and search happens at query time.

Reconstruction rebuilds the current state of a situation from structured traces. It's a write-path-first operation. Data is decomposed and structured at storage time so that the right context can be deterministically assembled later.

| | RAG (Retrieval) | Deterministic Reconstruction |
| --- | --- | --- |
| When structuring happens | Query time (search) | Write time (decomposition) |
| What it returns | Similar text chunks | The current state of a situation |
| Update handling | Old and new chunks coexist | Old state is superseded; current state is authoritative |
| Disambiguation | Returns all similar vectors regardless of source | Traces are scoped to each user/context |
| Temporal ordering | No awareness of sequence | Tracks what happened when and what's still active |
| Hallucination source | Wrong chunk retrieved; model confabulates | Deterministic: either the trace exists or it doesn't |
| Fails when | Query doesn't match available chunk embeddings | Nothing was stored (explicit failure, not silent) |

The difference: when RAG fails, it returns something that sounds right but isn't. When reconstruction fails, it returns nothing, because the data either exists in structured form or it doesn't. Silent hallucination vs. explicit absence.
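The contrast in failure behavior can be shown directly. Both functions below are hypothetical stand-ins: word overlap substitutes for vector similarity, and a keyed dictionary substitutes for a structured trace store.

```python
def word_overlap(query: str, text: str) -> int:
    # Stand-in for vector similarity: count shared words.
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve_nearest(query: str, chunks: list[str]) -> str:
    # Read-path retrieval always returns *something*: the
    # least-dissimilar chunk, even when nothing in the corpus
    # actually answers the query.
    return max(chunks, key=lambda c: word_overlap(query, c))

def reconstruct(state: dict, key: str):
    # Write-path store: either the structured trace exists under
    # its key, or the caller sees an explicit absence.
    return state.get(key)

chunks = ["invoice 17 was paid in march"]
print(retrieve_nearest("refund status of order 99", chunks))
# -> "invoice 17 was paid in march" (sounds grounded, answers nothing)

state = {"invoice:17": {"status": "paid", "when": "march"}}
print(reconstruct(state, "order:99"))
# -> None (explicit absence, nothing to confabulate from)
```

The retrieval path hands the model plausible-but-irrelevant text to generate from; the reconstruction path hands it nothing, which the system can surface as "I don't know."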

Why Can't You Just Improve RAG?

The industry is trying. Semantic chunking. Re-ranking. Hybrid search. Agentic RAG. Better embeddings. Each iteration improves accuracy incrementally.

But these improvements are all on the read path. They're trying to get better at finding the right chunk at query time. The fundamental issue is that the data was stored as unstructured text, and no amount of search sophistication fully compensates for that.

Consider: if you store a user's situation as raw conversation logs and then try to retrieve the relevant pieces later, you're depending on embedding similarity to reconstruct meaning. That's probabilistic by nature. Sometimes it works. Sometimes it returns the wrong chunk. Sometimes it returns an outdated version. Sometimes it misses context that spans multiple chunks.

If instead you decompose the interaction at write time, extracting who was involved, what happened, when, what the emotional state was, and what's still active, then reconstruction at read time is deterministic. You're not searching for similar text. You're assembling structured traces that were explicitly stored for this purpose.
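A minimal sketch of that write-path shape, under stated assumptions: the trace fields mirror the who/what/when/emotional-state/active decomposition described above, but the storage layout and supersession rule here are invented for illustration, not DTCM's actual design.

```python
# Write-path-first store: structure is imposed when data is written,
# so the read path is a deterministic lookup, not a similarity search.
from dataclasses import dataclass

@dataclass
class Trace:
    who: str
    what: str
    when: str
    emotional_state: str
    active: bool = True

class TraceStore:
    def __init__(self) -> None:
        self.by_user: dict[str, list[Trace]] = {}

    def write(self, user: str, trace: Trace) -> None:
        # Decomposition happens here, at write time, scoped per user.
        self.by_user.setdefault(user, []).append(trace)

    def resolve(self, user: str, what: str) -> None:
        # Supersession: earlier traces on the same topic become inactive
        # instead of coexisting with their replacement.
        for t in self.by_user.get(user, []):
            if t.what == what:
                t.active = False

    def reconstruct(self, user: str) -> list[Trace]:
        # Read path: assemble the current state for this user only.
        # Either the traces exist or the result is explicitly empty.
        return [t for t in self.by_user.get(user, []) if t.active]
```

Usage: write a trace, resolve it when the situation changes, write the replacement; `reconstruct` then returns only what's still active, and an unknown user yields an empty list rather than someone else's data.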

Where Does Fine-Tuning Fit?

Neither RAG nor fine-tuning solves this problem, because they solve different problems:

  • Fine-tuning changes model behavior: how it writes, what style it uses, what domain it specializes in. It doesn't give the model access to external facts.
  • RAG gives the model access to external facts, but probabilistically, with all the failure modes above.
  • Longer context windows let the model hold more raw text, but performance degrades as context length increases (the same "lost in the middle" effect), and a longer window still resets every session.

None of these address the core issue: how do you maintain structured, updateable, living state across time?

That's not a retrieval problem. That's not a training problem. That's an infrastructure problem. It requires a dedicated layer.

What Is This Layer?

A continuity layer sits between the user and the model. It decomposes information into structured traces at write time (who, what, when, emotional state, active vs. resolved) and reconstructs the current situation from those traces at read time. Not "find similar chunks" but assemble the structured state.

This is the same architectural problem behind ChatGPT forgetting across sessions, Character AI losing your story, chatbots making you repeat yourself, and voice assistants that can't remember yesterday. RAG is deployed in all of them as the "memory" solution. It's insufficient in all of them for the same reasons.

What I Built

At Kenotic Labs, I built a write-path-first deterministic architecture called DTCM (Decomposed Trace Convergence Memory). Every interaction is decomposed into five structured traces at write time. At read time, the system reconstructs situational context from those traces. Deterministically, not probabilistically.

I tested it against ATANT, the first open evaluation framework for AI continuity. 250 narrative stories. 1,835 verification questions. 100% accuracy in isolated mode. 96% at 250-story cumulative scale, with 250 different contexts coexisting in one system, correctly disambiguated.

RAG finds similar chunks. DTCM reconstructs the current state. That's the architectural difference.

Follow the research at kenoticlabs.com

Samuel Tanguturi is the founder of Kenotic Labs, building the continuity layer for AI systems. ATANT v1.0, the first open evaluation framework for AI continuity, is available on GitHub.

The continuity layer is the missing layer between AI interaction and AI relationship.

Kenotic Labs builds this layer.

Get in touch