MemoryRouter

Overview

Memory for every user in your AI product.

MemoryRouter gives every user in your AI product their own private memory vault. Your companion, coach, tutor, support agent, sales assistant, or internal copilot can remember the user across sessions without building vector infrastructure, key management, or retrieval pipelines.

Your App
  ├─ User A → Memory Key A → Vault A
  ├─ User B → Memory Key B → Vault B
  └─ User C → Memory Key C → Vault C

One key identifies one vault. Every request made with that key retrieves and stores memory only for that user.
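
As a rough illustration of the mapping in the diagram above, your app can keep a small lookup from its own user IDs to Memory Keys. This is only a sketch: `MemoryKeyStore`, `assign`, and `key_for` are hypothetical names, the store is in-memory, and how Memory Keys are issued is not covered here, so the key is simply recorded after you obtain it.

```python
class MemoryKeyStore:
    """Hypothetical in-app lookup: internal user ID -> Memory Key.

    A real app would back this with its user database; a dict keeps
    the sketch self-contained.
    """

    def __init__(self) -> None:
        self._key_by_user: dict[str, str] = {}
        self._user_by_key: dict[str, str] = {}

    def assign(self, user_id: str, memory_key: str) -> None:
        """Record the Memory Key for a user, refusing to share it across users."""
        owner = self._user_by_key.get(memory_key)
        if owner is not None and owner != user_id:
            # One key identifies one vault, so two users must never share a key.
            raise ValueError("memory key already belongs to another user")
        previous = self._key_by_user.get(user_id)
        if previous is not None and previous != memory_key:
            # The user is being moved to a new key; forget the old reverse entry.
            del self._user_by_key[previous]
        self._key_by_user[user_id] = memory_key
        self._user_by_key[memory_key] = user_id

    def key_for(self, user_id: str) -> str:
        """Return the Memory Key to send with this user's requests."""
        return self._key_by_user[user_id]
```

Looking the key up per request, rather than caching one key in a shared client, is what keeps each request scoped to a single user's vault.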

Start here

Guide                                   What you get
Quickstart: Add memory to your AI app   The app-shaped user_id → memory_key integration path.
Architecture                            Where MemoryRouter sits between your app, user vaults, and model providers.
Core concepts                           Memory Keys, vaults, retrieval, storage, sessions, and provider pass-through.
Local inference mode                    Use /memory/prepare and /memory/ingest when your app calls the model provider directly.
Security & data isolation               User isolation, deletion, key rotation, provider flow, and compliance posture.
Product patterns                        Companion apps, coaches, tutors, support agents, sales assistants, and copilots.
OpenClaw integration                    The mr-memory plugin path for OpenClaw agents.

Two ways to integrate

There are two ways to give your users memory. Most teams start with proxy mode.

Proxy mode

Point your existing AI SDK at MemoryRouter and pass the user's Memory Key. We do everything in one round trip: retrieve the right memories, inject them into the prompt, call the model provider, return the response, and store the new context.

Your App  ──(Memory Key + messages)──▶  MemoryRouter ──▶  Model provider

                                     retrieve + store

Why default to proxy mode:

  • One call instead of three. You send the request, we handle retrieval, the provider call, and storage.
  • Fewer network round trips than local inference mode, which generally means lower total latency.
  • No retrieval, injection, or storage code in your app. Swap the base URL and you are done.
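
To make "swap the base URL" concrete, here is a minimal sketch of a proxy-mode request built with the Python standard library. It assumes the OpenAI-compatible chat route is `/chat/completions` under `https://api.memoryrouter.ai/v1`, and that the Memory Key is sent as a bearer token the same way the upload endpoint expects it. Both are assumptions to confirm against the Quickstart, and `build_proxy_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumption: the OpenAI-compatible routes live under this base URL.
MEMORYROUTER_BASE_URL = "https://api.memoryrouter.ai/v1"


def build_proxy_request(
    memory_key: str, model: str, messages: list[dict]
) -> urllib.request.Request:
    """Build (but do not send) a chat request routed through MemoryRouter."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        url=f"{MEMORYROUTER_BASE_URL}/chat/completions",  # assumed route
        data=body,
        headers={
            # Assumption: the user's Memory Key is the bearer credential.
            "Authorization": f"Bearer {memory_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The request carries no retrieval or storage logic of its own, which is the point of proxy mode.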

Local inference mode

You keep the model call. MemoryRouter is retrieval and storage only. Your app calls /memory/prepare to get memory, calls your own provider directly, then calls /memory/ingest to store the exchange.

Your App ──(messages)──▶ /memory/prepare ──▶ memory context
Your App ──▶ Model provider (your keys, your routing)
Your App ──(full exchange)──▶ /memory/ingest ──▶ stored

Why choose local inference mode:

  • You already own provider routing, streaming, retries, evals, or logging and want to keep model traffic inside your stack.
  • You run your own model gateway or self-hosted models.
  • You want MemoryRouter handling only user-scoped memory, not proxying inference.

The tradeoff is more round trips and a little glue code. Both modes use the same Memory Keys and the same per-user vaults, so you can start with proxy mode and move later without changing the user's memory boundary.
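
The glue code for local inference mode might look roughly like the sketch below. The request and response shapes are not specified on this page, so everything about them is an assumption: the `{"messages": ...}` payloads, the `"context"` response field, and the choice to inject memory as a system message. `post_json` and `call_provider` are hypothetical callables you would supply (an HTTP helper and your own provider client).

```python
from typing import Callable

# Hypothetical transport: (path, memory_key, payload) -> decoded JSON response.
PostJson = Callable[[str, str, dict], dict]
# Hypothetical provider call: messages -> assistant reply text.
CallProvider = Callable[[list[dict]], str]


def chat_with_local_inference(
    memory_key: str,
    messages: list[dict],
    post_json: PostJson,
    call_provider: CallProvider,
) -> str:
    """Prepare memory, call your own provider, then ingest the exchange."""
    # 1. Ask MemoryRouter for memory relevant to this request.
    prepared = post_json("/memory/prepare", memory_key, {"messages": messages})
    # Assumption: the memory comes back as text under a "context" field.
    context = prepared.get("context", "")

    # 2. Inject the memory and call the provider with your own keys and routing.
    augmented = list(messages)
    if context:
        augmented.insert(0, {"role": "system", "content": context})
    reply = call_provider(augmented)

    # 3. Send the full exchange back so it is stored in the user's vault.
    exchange = messages + [{"role": "assistant", "content": reply}]
    post_json("/memory/ingest", memory_key, {"messages": exchange})
    return reply
```

Taking the transport and provider as parameters keeps streaming, retries, and logging in your stack, and makes the three-step flow easy to exercise with fakes.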

See Local inference mode for the full prepare / ingest flow with request and response shapes.

The model

MemoryRouter is a user memory layer for AI products:

  1. Your app maps each internal user to a Memory Key.
  2. Proxy mode: your app sends the user's AI request through MemoryRouter, and we retrieve, inject, call the provider, and store in one trip. Local inference mode: your app calls /memory/prepare, calls its own provider, then calls /memory/ingest.
  3. Either way, memory is scoped to that user's private vault.

Your product gets continuity. Your users stop starting over. Your infrastructure stays simple.

Import existing memories

Your users are not starting from zero. If you already have chat history, profiles, notes, support tickets, or transcripts, load them into a user's vault so memory works from the first message.

POST /v1/memory/upload accepts JSONL, one memory per line, authenticated with that user's Memory Key.

curl -X POST https://api.memoryrouter.ai/v1/memory/upload \
  -H "Authorization: Bearer mk_user_123" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary '
{"content": "Prefers concise answers and trains at 6am", "role": "user"}
{"content": "Launching a new product next week", "role": "user"}
{"content": "Asked for help with onboarding flow", "role": "user", "timestamp": 1733000000000}
'

Each line needs a content string. role and timestamp are optional. Send the key for the user whose vault you are loading, and the memories land in that vault only. This is how teams migrate off an in-house vector store or backfill a brand-new user from existing data.
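
When backfilling from existing records, you will usually generate the JSONL body in code rather than by hand. The sketch below builds the body and the upload request using only the endpoint, headers, and fields shown above; `to_jsonl` and `build_upload_request` are hypothetical helper names, and the request is built without being sent.

```python
import json
import urllib.request

UPLOAD_URL = "https://api.memoryrouter.ai/v1/memory/upload"


def to_jsonl(memories: list[dict]) -> bytes:
    """Serialize memories as JSONL: one JSON object per line."""
    lines = []
    for index, memory in enumerate(memories):
        # Each line needs a content string; role and timestamp are optional.
        if not isinstance(memory.get("content"), str):
            raise ValueError(f"memory {index} is missing a content string")
        lines.append(json.dumps(memory, ensure_ascii=False) + "\n")
    return "".join(lines).encode("utf-8")


def build_upload_request(
    memory_key: str, memories: list[dict]
) -> urllib.request.Request:
    """Build (but do not send) an upload into the vault behind memory_key."""
    return urllib.request.Request(
        url=UPLOAD_URL,
        data=to_jsonl(memories),
        headers={
            "Authorization": f"Bearer {memory_key}",
            "Content-Type": "application/x-ndjson",
        },
        method="POST",
    )
```

Validating `content` before sending keeps a single malformed record from being discovered only after the upload call.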

See User lifecycle for when to upload (signup, migration, backfill) and API Reference for the full upload contract.

Memory follows the user, not the session

A Memory Key is durable identity. The same key retrieves the same vault no matter where the request comes from: web, mobile, a background job, or a different model entirely.

  • A user can start on web and continue on mobile with full continuity.
  • You can switch the user from GPT to Claude to Gemini and their memory carries over, because memory lives in the vault, not the model.
  • Sessions are optional. Use X-Session-ID only when you want to scope memory to a single conversation thread.

Keep one stable Memory Key per user for the life of their account and the memory just follows them.
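
A small sketch of how this plays out in request headers: the Memory Key is always present, and `X-Session-ID` is added only when you want thread-level scoping. `memory_headers` is a hypothetical helper, and sending the Memory Key as a bearer token on every route is an assumption based on the upload example.

```python
from typing import Optional


def memory_headers(memory_key: str, session_id: Optional[str] = None) -> dict[str, str]:
    """Headers for a MemoryRouter request from any client (web, mobile, job)."""
    # The same key selects the same vault regardless of where the call originates.
    headers = {"Authorization": f"Bearer {memory_key}"}
    if session_id is not None:
        # Optional: scope memory to a single conversation thread.
        headers["X-Session-ID"] = session_id
    return headers
```

Web and mobile clients for the same user would call this with the same `memory_key`, passing a `session_id` only for threads that should stay separate.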

Built for multi-user products

  • Private vault per user: Memory retrieval is scoped to the Memory Key on the request.
  • OpenAI-compatible API: Swap the base URL and keep your existing SDK flow.
  • Provider pass-through: Use stored provider keys or pass provider keys per request for BYOK.
  • Usage-based memory: Cost follows actual memory activity, not total registered accounts.
  • Integrations included: OpenClaw, Open WebUI, CLI upload, native provider endpoints, and direct memory search.

Common paths

I am building a companion app

Start with Product patterns, then implement Quickstart. The key concept is stable identity: keep the user's Memory Key stable for the lifetime of their account.

I am replacing an internal memory stack

Read Architecture, Cost model, and User lifecycle. You can import existing memories through the upload endpoint documented in API Reference.

I am using OpenClaw

Go straight to OpenClaw. The plugin handles local retrieval, storage, upload, and workspace sync for OpenClaw agents.

In short

  • Every user gets a private vault. One Memory Key, one vault, isolated by design.
  • Two ways to integrate. Proxy mode for the fastest path, local inference mode when you own the model call.
  • Bring your existing data. Upload history, profiles, and transcripts so memory works from message one.
  • Memory follows the user. Across sessions, devices, and model providers.
  • Pay for usage, not seats. Cost tracks real memory activity, not your registered account count.

Ready to build? Start with the Quickstart.
