Architecture
How MemoryRouter adds user-scoped memory between your app and model providers.
Architecture
MemoryRouter is a user memory layer for AI products. It can sit in the model path as a proxy, or stay out of the model path and serve retrieval plus storage endpoints.
User
↓
Your App
↓ Authorization: Bearer mk_user_a
MemoryRouter
├─ retrieve relevant memories from Vault A
├─ add memory context to the provider request
├─ call OpenAI, Anthropic, Google, OpenRouter, or another provider
└─ store useful new context back into Vault A
↓
Model responseUser-scoped vaults
User A → Memory Key A → Vault A
User B → Memory Key B → Vault B
User C → Memory Key C → Vault CThe Memory Key on the request determines which vault is searched and updated. User A's request does not search User B's vault because the request authenticates against Vault A.
Two integration modes
MemoryRouter supports both proxy mode and local inference mode.
Proxy mode
MemoryRouter sits in the model path.
Your App
↓ Memory Key
MemoryRouter
├─ retrieves from the user's vault
├─ calls the model provider
└─ stores the completed exchange
↓
ResponseThis is the fastest integration. Swap the base URL, pass the user's Memory Key, and MemoryRouter handles retrieval, provider call, and storage.
Local inference mode
Your app keeps the model call.
Your App
├─ POST /v1/memory/prepare -> retrieve user memory
├─ call OpenAI, Anthropic, Gemini, or your own model directly
└─ POST /v1/memory/ingest -> store the completed exchangeThis is the controlled integration. MemoryRouter acts as retrieval and storage only. Your provider keys, routing, streaming, retries, logs, and evals stay inside your app.
Use Local inference mode when you want memory without proxying provider traffic through MemoryRouter.
Provider pass-through
MemoryRouter adds memory, then forwards the request to your provider.
Your App
├─ Memory Key: identifies the user vault
└─ Provider Key: identifies the model provider account
MemoryRouter
├─ retrieves memory
├─ calls provider
└─ stores new memoryYou can store provider keys in the dashboard or pass provider keys on each request with X-Memory-Key for BYOK.
Native provider endpoints
MemoryRouter does not force every provider through one translation layer. It exposes native endpoints where needed:
| Provider | Endpoint |
|---|---|
| OpenAI-compatible providers | POST /v1/chat/completions |
| Anthropic | POST /v1/messages |
| Google Gemini | POST /v1/models/:model:generateContent |
Responses stay provider-native. Memory metadata is handled outside the response body where possible.
Retrieval and storage loop
In proxy mode:
- Retrieve: Search the user's vault for relevant memories.
- Inject: Add high-signal memory context to the provider request.
- Forward: Send the request to the selected provider.
- Return: Return the provider response to your app.
- Store: Store useful new context for future requests.
In local inference mode:
- Prepare: Your app calls
/v1/memory/prepareto retrieve memory. - Infer: Your app injects that memory and calls the provider directly.
- Ingest: Your app calls
/v1/memory/ingestwith the completed exchange.
Sessions
Use X-Session-ID when you want a request to target or search a session-specific namespace. For most user-product integrations, the Memory Key is the durable user identity and sessions are optional.
Integration shapes
- Direct API: Your app calls
https://api.memoryrouter.ai/v1with a Memory Key per user. - OpenClaw: The plugin retrieves and injects memory locally inside OpenClaw.
- Open WebUI: Configure MemoryRouter as an OpenAI-compatible provider.
- CLI: Upload existing docs, transcripts, or knowledge into a vault.