Memory in RAG apps

Memory is any persisted context your app reuses across interactions to improve relevance and personalization. In this series, I use “memory” deliberately as a layered concept: from short‑term chat history up to durable organizational knowledge.

Types of memory I use

Session memory (short‑term)
- Recent turns of the chat that ground immediate follow‑ups.
- Often condensed into a short summary to keep prompts small, then embedded as a compact vector for retrieval.
Individual (user) memory
- Stable facts and evolving preferences about a person (projects, tools, style).
- Kept small, curated, and easy to explain (“why did the system know this?”).
Team memory
- Shared artifacts and terms for a workgroup (labels, norms, running initiatives).
- Queryable with team scope and access control.
Company memory
- Policies, product docs, handbooks — authoritative and broadly shared.
- Treated as the canonical knowledge layer, often indexed separately.

Why split? Each layer has different freshness, access, and blast radius. Mixing them blindly risks leaking private data or polluting global knowledge.

Developer considerations (don’t skip!)

Authorization and scope
- Always enforce “who can see what” at retrieval time (user, team, tenant).
- Consider document‑level ACLs and filter‑first retrieval: apply scope and ACL filters before the vector search runs.
Privacy and PII
- Only extract/store what you truly need.
- Mask or pseudonymize where possible; consider client‑side encryption for sensitive fields.
Retention and TTL
- Session memory decays quickly; user and team memories should have an explicit TTL or a review workflow.
Provenance and corrections
- Track who/what created a memory and when; enable user edits and deletions.
Quality and drift
- Summaries rot; periodically re‑summarize or prune.
Multitenancy and partitioning
- Partition by tenant/user/team to keep costs predictable and queries fast.
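
To make the privacy, retention, and provenance points concrete, here is a minimal sketch in Python of writing a memory record. It assumes a keyed hash (HMAC‑SHA256) is an acceptable way to pseudonymize a user identifier, and that expiry is stored as an absolute timestamp. The function names (pseudonymize, make_memory, is_expired) and the field names are hypothetical, chosen only for illustration.

```python
import hashlib
import hmac
from datetime import datetime, timedelta, timezone


def pseudonymize(value: str, secret: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256 hex digest)."""
    return hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()


def make_memory(user_id, value, source, ttl_seconds, secret, now=None) -> dict:
    """Build a memory record with provenance and an explicit expiry.

    Field names are illustrative assumptions, not a fixed schema.
    """
    now = now or datetime.now(timezone.utc)
    return {
        "userKey": pseudonymize(user_id, secret),  # raw id is never stored
        "value": value,
        "source": source,                          # who/what created it
        "createdAt": now,                          # when it was created
        "expiresAt": now + timedelta(seconds=ttl_seconds),
    }


def is_expired(memory: dict, now=None) -> bool:
    """True once the record has reached its expiry time."""
    now = now or datetime.now(timezone.utc)
    return now >= memory["expiresAt"]
```

The secret must stay stable for as long as records need to be looked up by user, because the same identifier and secret always produce the same key. Many stores can also expire items natively, in which case a check like is_expired only serves as a safety net.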

Minimal data shapes (contracts)

Session memory
- id, sessionId, userId, summary, contentVector[dim], createdAt, ttl
User memory
- id, userId, kind (preference|skill|entity), value, source, vector?, createdAt, updatedAt
Team memory
- id, teamId, title, value, tags, acl[groupIds], vector?, createdAt

These are logical shapes — you’ll adapt them per store.
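
As one possible adaptation, the shapes above can be written as Python dataclasses. This is a sketch, not a required schema: the field names are snake_case versions of the contract, and the value types, the TTL unit (seconds), and the defaults are my assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Literal, Optional


def _now() -> datetime:
    """Timezone-aware UTC timestamp used as the default for created/updated."""
    return datetime.now(timezone.utc)


@dataclass
class SessionMemory:
    id: str
    session_id: str
    user_id: str
    summary: str
    content_vector: list[float]  # length equals the embedding dimension
    ttl: int                     # assumed unit: seconds
    created_at: datetime = field(default_factory=_now)


@dataclass
class UserMemory:
    id: str
    user_id: str
    kind: Literal["preference", "skill", "entity"]
    value: str
    source: str                           # provenance: where the fact came from
    vector: Optional[list[float]] = None  # optional, as "vector?" in the contract
    created_at: datetime = field(default_factory=_now)
    updated_at: datetime = field(default_factory=_now)


@dataclass
class TeamMemory:
    id: str
    team_id: str
    title: str
    value: str
    tags: list[str] = field(default_factory=list)
    acl: list[str] = field(default_factory=list)  # group ids allowed to read
    vector: Optional[list[float]] = None
    created_at: datetime = field(default_factory=_now)
```

Note that the Literal annotation documents the allowed kinds for type checkers but is not enforced at runtime.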

Retrieval patterns that work

Filter‑first, then vector
- Use partition keys and ACL filters to shrink the candidate set.
- Order by VectorDistance (the Azure Cosmos DB vector distance function) with TOP N to bound request‑unit (RU) cost and latency.
Hybrid signals
- Blend recency, frequency, and vector similarity for better results.
Personalization
- Start with user memory, then widen to team, then company; prefer the narrowest scope that answers the query.
Safety
- Recheck permissions on the final set; avoid echoing sensitive values.
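
The patterns above can be sketched in a few lines of Python. This is an in‑memory illustration under stated assumptions, not a production retriever: a plain list stands in for the store, the tenantId, userId, acl, and hits fields are hypothetical, and the blend weights and 30‑day half‑life are placeholder values to tune. In a real store the filter step would be pushed into the query itself.

```python
import math
from datetime import datetime, timezone


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def can_read(memory, user_id, group_ids):
    """Readable if the user owns it or shares a group with its ACL."""
    owns = memory.get("userId") == user_id
    shares_group = bool(set(memory.get("acl", [])) & set(group_ids))
    return owns or shares_group


def hybrid_score(memory, query_vector, now, half_life_days=30.0,
                 w_sim=0.7, w_recency=0.2, w_freq=0.1):
    """Blend similarity, recency, and frequency (weights are assumptions)."""
    sim = cosine_similarity(memory["vector"], query_vector)
    age_days = (now - memory["createdAt"]).total_seconds() / 86400
    recency = 0.5 ** (age_days / half_life_days)   # exponential decay
    freq = min(memory.get("hits", 0), 10) / 10     # capped at 10 hits
    return w_sim * sim + w_recency * recency + w_freq * freq


def retrieve(memories, query_vector, tenant_id, user_id, group_ids,
             top_n=3, now=None):
    now = now or datetime.now(timezone.utc)
    # 1. Filter first: tenant partition and ACL shrink the candidate set.
    candidates = [m for m in memories
                  if m["tenantId"] == tenant_id
                  and can_read(m, user_id, group_ids)]
    # 2. Rank the survivors by the hybrid score and keep the top N.
    ranked = sorted(candidates,
                    key=lambda m: hybrid_score(m, query_vector, now),
                    reverse=True)[:top_n]
    # 3. Recheck permissions on the final set. Redundant in memory, but
    #    in a real system ACLs can change between query and response.
    return [m for m in ranked if can_read(m, user_id, group_ids)]
```

Keeping can_read as a separate function means the same rule is applied at the filter step and at the final recheck, so the two cannot drift apart.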

Memory architecture at a glance

In the next post, I’ll show this implemented with Azure Cosmos DB: vector indexes, ACL filtering, TTL, and C# snippets.