Should AI Memory Be Stored as Open Engrams or Baked Into Model Weights?

The short answer: AI agent memory should be stored as open, external engrams — not baked into model weights — whenever the memory must be inspectable, correctable, deletable, or portable across tools. Parametric memory (knowledge baked into model weights through fine-tuning or continual training) is faster at inference and can be more token-efficient, but it sacrifices auditability: you cannot read what the model knows, you cannot fix a single wrong fact without retraining, and you cannot prove that deleted knowledge is actually gone. For agent memory — corrections, preferences, conventions, procedures — the properties that matter (readability, reversibility, erasure, portability) are properties that weights cannot provide.

The problem: agents forget what they learn

Every AI agent starts each session with amnesia. You correct its coding style on Monday. On Tuesday, it makes the same mistake. You explain your architecture in Cursor. That night, Claude Code has no idea. The context window resets. The conversation is gone. The model weights have not changed.

There are two fundamentally different approaches to solving this:

Parametric memory — bake the knowledge into the model itself through fine-tuning or continual training. The model’s weights become the memory.
Non-parametric (external) memory — store knowledge outside the model in a structured format (engrams, vectors, knowledge graphs) and retrieve it at inference time. The model stays unchanged; the memory is a separate layer.
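
To make the second approach concrete, here is a minimal sketch in Python. It assumes a hypothetical in-memory store and naive word-overlap scoring; a real system would more likely use embeddings or an index. The names `MemoryStore`, `recall`, and `build_prompt` are illustrative and do not come from any specification.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryStore:
    """Hypothetical external memory: plain statements kept outside the model."""
    statements: list[str] = field(default_factory=list)

    def remember(self, statement: str) -> None:
        self.statements.append(statement)

    def recall(self, query: str, limit: int = 3) -> list[str]:
        """Rank statements by naive word overlap with the query (illustrative only)."""
        query_words = set(query.lower().split())
        scored = []
        for statement in self.statements:
            overlap = len(query_words & set(statement.lower().split()))
            if overlap > 0:
                scored.append((overlap, statement))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [statement for _, statement in scored[:limit]]


def build_prompt(store: MemoryStore, user_message: str) -> str:
    """Prepend recalled memories to the prompt; the model itself is unchanged."""
    recalled = store.recall(user_message)
    memory_block = "\n".join(f"- {m}" for m in recalled) or "- (none)"
    return f"Known context:\n{memory_block}\n\nUser: {user_message}"


store = MemoryStore()
store.remember("Use tabs, not spaces, in this repository.")
store.remember("The API rate limit is 100 req/min, not 1000.")
prompt = build_prompt(store, "What is the API rate limit?")
```

Because the memory reaches the model as ordinary prompt text, the same store could sit in front of any LLM, which is the sense in which the memory is a separate layer.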

This is not a new debate. The retrieval-augmented generation (RAG) literature has explored the tension between parametric knowledge (stored in weights) and non-parametric knowledge (stored in external databases) since 2020. A 2023 survey of RAG (Gao et al., “Retrieval-Augmented Generation for Large Language Models: A Survey,” arXiv:2312.10997) frames the distinction clearly: LLMs “showcase impressive capabilities but encounter challenges like hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes.” RAG addresses this by incorporating knowledge from external databases, allowing “continuous knowledge updates and integration of domain-specific information” without retraining.

Agent memory is the same tradeoff, applied to a harder problem: not just facts, but corrections, preferences, procedures, and conventions that accumulate over time and across sessions.

Parametric memory: fast but opaque

When you fine-tune a model on domain knowledge — or continually retrain it on user context (Notion, Slack, GitHub) — the knowledge becomes part of the model’s weights. At inference time, recall is fast: no retrieval step, no external database, no latency from searching. The model just “knows.”

This approach — sometimes called model-native memory — has real advantages. Retrieval adds latency and can fail (wrong document retrieved, irrelevant context injected). A 2024 paper on Corrective RAG (Yan et al., arXiv:2401.15884) noted that RAG “relies heavily on the relevance of retrieved documents, raising concerns about how the model behaves if retrieval goes wrong.” When memory is in the weights, there is no retrieval step to go wrong.

But parametric memory has structural problems that fine-tuning cannot solve:

You cannot inspect what the model knows. A fine-tuned model is a matrix of billions of numbers. There is no entry for “the deploy key is at ~/.config/deploy” — that fact is distributed across weights in a way no one can read, diff, or audit. You cannot open a file and check what the model remembers.
You cannot correct a single wrong fact. If the model learned something wrong during fine-tuning, there is no entry to edit. In practice you retrain, which is expensive, slow, and itself error-prone. Removing a fact from trained weights (machine unlearning) is an active research problem with no production-ready solution.
You cannot prove erasure. GDPR’s right to erasure (Article 17, the “right to be forgotten”) requires that personal data be deleted on request, and an operator needs to be able to demonstrate that it was. When knowledge is in weights, you cannot show it is gone. You can retrain from scratch (prohibitively expensive) or attempt machine unlearning (unproven). With external engrams, deletion is simple: remove the entry. The fact was never trained into the weights, so once the entry is removed there is nothing left for the model to recall.
Catastrophic forgetting. Continual training on new knowledge can degrade older knowledge, the well-documented catastrophic forgetting problem in neural networks. New learning overwrites weights that encoded earlier knowledge, so each update risks pushing out something the model knew before. External memory does not forget unless you tell it to (via decay functions), and even then the decay is gradual and reversible.
Vendor lock-in. Memory baked into a specific model’s weights is locked to that model. Switch from GPT-4 to Claude, and the memory is gone — the weights do not transfer. External memory is model-agnostic: the same engrams work with any LLM.
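
The decay point in the list above can be illustrated with a small sketch. It assumes an exponential half-life formula and a hypothetical `half_life_days` parameter. The source does not say how decay is computed, so treat this as one possible shape rather than the behavior of any particular system.

```python
from datetime import date


def recall_strength(last_used: date, today: date, half_life_days: float = 30.0) -> float:
    """Exponential decay: strength halves every `half_life_days` (assumed formula)."""
    age_days = max((today - last_used).days, 0)
    return 0.5 ** (age_days / half_life_days)


# The entry itself is never removed; only its ranking weight drops over time.
fresh = recall_strength(date(2026, 7, 2), date(2026, 7, 2))
month_old = recall_strength(date(2026, 7, 2), date(2026, 8, 1))

# Using the entry again resets `last_used`, which restores full strength.
reinforced = recall_strength(date(2026, 8, 1), date(2026, 8, 1))
```

The decay is gradual because the weight shrinks smoothly, and reversible because the underlying entry stays on disk and can be reinforced at any time.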

Non-parametric memory: open and inspectable

External memory stores knowledge outside the model in a structured format. The open engram format (defined in the Engram Specification, Apache-2.0) represents each learned fact as a human-readable YAML entry:

id: ENG-2026-0702-001
statement: "The API rate limit is 100 req/min, not 1000."
type: behavioral
scope: project:api-gateway
provenance:
  source: session
  observed_at: 2026-07-02

This format has five properties that parametric memory cannot match:

Inspectable — you can read, diff, and version every engram. It is a file, not a number. An operator can open the file and see exactly what the agent has learned.
Instantly correctable — fix a single fact mid-conversation by editing one entry. No retraining. The correction takes effect on the next recall.
Provably deletable — delete the entry and the memory is gone, demonstrably. This is the basis for real (not best-effort) erasure — the foundation of GDPR-grade compliance. You cannot prove erasure from model weights.
Portable — engrams move across agents, tools, and machines. A correction made in Claude Code is available to Cursor, Hermes, or OpenClaw the next time the agent starts. Memory follows the operator, not the vendor.
Auditable at scale — for enterprise and institutional buyers, external memory can carry a verifiable record of who wrote a fact and who used it. PLUR Enterprise implements this today as a tamper-evident, hash-chained audit log (each entry cryptographically linked to the one before it, so altering history breaks the chain), plus a per-engram view of both provenance and recall history — who read this fact, when, via which tool. It is a real foundation for institutional-grade accountability; we will go deeper on it in a future piece.
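
The hash-chaining idea in the last item can be sketched in a few lines. This is not PLUR Enterprise's implementation, which the article does not describe in detail. It assumes SHA-256, JSON-serialized events, and hypothetical field names (`event`, `prev_hash`, `hash`) purely to show why altering history breaks the chain.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # assumed placeholder for the first entry's predecessor


def entry_hash(prev_hash: str, event: dict) -> str:
    """Hash the previous entry's hash together with this event's canonical JSON."""
    payload = prev_hash + json.dumps(event, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_event(log: list[dict], event: dict) -> None:
    """Link the new record to the hash of the record before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    log.append({"event": event, "prev_hash": prev_hash, "hash": entry_hash(prev_hash, event)})


def verify_chain(log: list[dict]) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for record in log:
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != entry_hash(prev_hash, record["event"]):
            return False
        prev_hash = record["hash"]
    return True


log: list[dict] = []
append_event(log, {"action": "write", "engram": "ENG-2026-0702-001", "actor": "alice"})
append_event(log, {"action": "recall", "engram": "ENG-2026-0702-001", "actor": "agent"})
intact = verify_chain(log)

log[0]["event"]["actor"] = "mallory"  # tamper with history
tampered_ok = verify_chain(log)
```

Here `intact` is true and `tampered_ok` is false: changing the first event changes its recomputed hash, which no longer matches the stored value that later entries were linked to.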

MemGPT (Packer et al., 2023, arXiv:2310.08560) demonstrated a related idea: managing agent memory the way an operating system manages memory tiers, with fast (context window), main (working memory), and archival (long-term storage) levels. The key insight was that memory management is an infrastructure problem, not a model problem. But MemGPT’s format is Letta-specific. The open engram format makes the same architectural choice (external, tiered, managed), but in a format anyone can implement.
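
The tiering idea can be shown with a toy sketch: a bounded context that evicts its oldest items into unbounded archival storage, which is then searched on demand. This is not MemGPT's or Letta's API. The class `TieredMemory`, its methods, and the substring search are all assumptions chosen for brevity.

```python
from collections import deque


class TieredMemory:
    """Hypothetical two-tier memory: a bounded context plus unbounded archival storage."""

    def __init__(self, context_limit: int) -> None:
        self.context: deque[str] = deque()
        self.archive: list[str] = []
        self.context_limit = context_limit

    def add(self, message: str) -> None:
        """Append to the context; evict the oldest items to the archive when full."""
        self.context.append(message)
        while len(self.context) > self.context_limit:
            self.archive.append(self.context.popleft())

    def search_archive(self, term: str) -> list[str]:
        """Naive substring search standing in for a real retrieval step."""
        return [m for m in self.archive if term.lower() in m.lower()]


memory = TieredMemory(context_limit=2)
for message in ["Deploy runs on Fridays.", "Use tabs.", "Rate limit is 100 req/min."]:
    memory.add(message)
```

After the third message, the oldest one has moved to the archive but is still recoverable through `search_archive`, which is the infrastructure-level behavior the paragraph above describes.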

When to use which

The honest answer is that both approaches have a place — but they solve different problems.

Criterion	Open engrams (external)	Model weights (parametric)
Best for	Corrections, preferences, procedures, conventions	Domain knowledge, language patterns, reasoning skills
Inspect	Read the file	Cannot
Correct	Edit one entry	Retrain
Delete	Remove entry — provable	Cannot prove erasure
Portability	Works across models	Locked to model
Latency	Retrieval adds roughly 50-200 ms (varies by store)	No retrieval step (in-weights)
Token cost	Retrieved context uses tokens	No retrieval tokens
Update speed	Instant (write a file)	Slow (retrain)
GDPR compliance	Provably deletable	Not provably deletable

For agent memory — the things an agent learns through interaction that should persist across sessions and tools — external engrams are the right choice. The knowledge is personal, contextual, and needs to be correctable. For domain expertise — deep knowledge of a field that improves the model’s reasoning — fine-tuning or domain-specific models remain valuable. These are complementary, not competing.

The relationship runs deeper than “pick one.” A typed, labeled, provenance-tagged engram store is also a clean fine-tuning corpus — the data is already the kind of curated signal a training run wants. As retraining gets cheaper (LoRA, distillation, smaller base models), it becomes plausible to periodically fold a distilled snapshot of stable engrams into weights for speed, while the open engram store stays the correctable, auditable source of truth behind it. That is a direction the field is heading, not a shipped pipeline today — but it reframes the question in this piece’s title: not a permanent fork between two architectures, but engrams as the record of truth that a model can, sometimes, be periodically retrained from.
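
As the paragraph above says, this is a direction rather than a shipped pipeline, so the following is only a sketch of what the export step might look like. It assumes a hypothetical `recall_count` field as the stability signal and a generic prompt/completion JSONL shape; neither is defined by the article or the specification, and the engram IDs in the example are made up.

```python
import json


def to_training_records(engrams: list[dict], min_recalls: int = 3) -> list[str]:
    """Turn stable engrams into JSONL lines; the record shape is an assumption."""
    lines = []
    for engram in engrams:
        if engram.get("recall_count", 0) < min_recalls:
            continue  # skip entries that have not proven stable yet
        record = {
            "prompt": f"[{engram['scope']}] What should the agent remember?",
            "completion": engram["statement"],
            "source_id": engram["id"],  # keeps each example traceable to its engram
        }
        lines.append(json.dumps(record, sort_keys=True))
    return lines


example_engrams = [
    {"id": "ENG-A", "statement": "Use tabs.", "scope": "project:web", "recall_count": 5},
    {"id": "ENG-B", "statement": "Draft idea.", "scope": "project:web", "recall_count": 1},
]
jsonl_lines = to_training_records(example_engrams)
```

Keeping `source_id` on every record means a later correction or deletion in the engram store can be traced to the training examples derived from it, so the store stays the source of truth.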

The mistake is using parametric memory for things that should be external. When a user corrects an agent’s behavior, that correction is a fact — not a weight. When a preference is expressed, it is a configuration — not a parameter. When a procedure is learned, it is a recipe — not a gradient. Memory that must be readable, fixable, deletable, and portable should be stored in a format that is readable, fixable, deletable, and portable.

The emerging consensus

The research literature is converging on hybrid approaches. The 2024 survey of agent memory mechanisms (Zhang et al., arXiv:2404.13501) identified multiple memory architectures — parametric, non-parametric, and hybrid — and noted that “the key component to support agent-environment interactions is the memory of the agents,” with no single approach dominating. What is clear is that the memory layer is separating from the model layer: agents need infrastructure for memory, not just bigger context windows.

The practical implication: if you are building an agent that learns over time, store its memory as open, external engrams. If you are training a model for domain expertise, fine-tune. Do not confuse the two — and do not bake into weights what you might need to read, fix, or forget.

FAQ

Should AI memory be stored as engrams or model weights? For agent memory (corrections, preferences, procedures, conventions), store as open external engrams. For domain expertise and reasoning skills, model weights remain valuable. The two are complementary — do not bake into weights what you need to read, fix, or delete.

What is parametric memory in AI? Knowledge stored in a model’s weights through fine-tuning or continual training. It is fast at inference but cannot be inspected, individually corrected, or provably deleted.

What is non-parametric (external) memory? Knowledge stored outside the model in a structured format (engrams, vectors, knowledge graphs) and retrieved at inference time. It is inspectable, correctable, deletable, and portable across models.

Can you prove erasure from model weights? No. When knowledge is baked into weights, there is no reliable way to prove it has been removed. Machine unlearning is an active research problem. External engrams can be deleted by removing the entry — the erasure is provable because the knowledge was never in the weights.

What is catastrophic forgetting? The tendency of a neural network trained on new knowledge to lose performance on older knowledge. This is a fundamental risk of continual training and parametric memory. External memory does not suffer from catastrophic forgetting: old entries persist unless explicitly decayed or deleted.

Sources

Gao, Y. et al. “Retrieval-Augmented Generation for Large Language Models: A Survey.” arXiv:2312.10997, December 2023. https://arxiv.org/abs/2312.10997
Yan, S. et al. “Corrective Retrieval Augmented Generation.” arXiv:2401.15884, January 2024. https://arxiv.org/abs/2401.15884
Packer, C. et al. “MemGPT: Towards LLMs as Operating Systems.” arXiv:2310.08560, October 2023. https://arxiv.org/abs/2310.08560
Zhang, Z. et al. “A Survey on the Memory Mechanism of Large Language Model based Agents.” arXiv:2404.13501, April 2024. https://arxiv.org/abs/2404.13501
The Engram Specification, v2.1, March 2026. https://plur.ai/spec.html (Apache-2.0)
PLUR — Open source memory for AI agents. Apache-2.0. https://github.com/plur-ai/plur