Architecture

MemoryRouter is a user memory layer for AI products. It can sit in the model path as a proxy, or stay out of the model path and serve retrieval plus storage endpoints.

User
 ↓
Your App
 ↓  Authorization: Bearer mk_user_a
MemoryRouter
 ├─ retrieve relevant memories from Vault A
 ├─ add memory context to the provider request
 ├─ call OpenAI, Anthropic, Google, OpenRouter, or another provider
 └─ store useful new context back into Vault A
 ↓
Model response

User-scoped vaults

User A → Memory Key A → Vault A
User B → Memory Key B → Vault B
User C → Memory Key C → Vault C

The Memory Key on the request determines which vault is searched and updated. User A's request does not search User B's vault because the request authenticates against Vault A.

Two integration modes

MemoryRouter supports both proxy mode and local inference mode.

Proxy mode

MemoryRouter sits in the model path.

Your App
 ↓ Memory Key
MemoryRouter
 ├─ retrieves from the user's vault
 ├─ calls the model provider
 └─ stores the completed exchange
 ↓
Response

This is the fastest integration. Swap the base URL, pass the user's Memory Key, and MemoryRouter handles retrieval, provider call, and storage.

Local inference mode

Your app keeps the model call.

Your App
 ├─ POST /v1/memory/prepare  -> retrieve user memory
 ├─ call OpenAI, Anthropic, Gemini, or your own model directly
 └─ POST /v1/memory/ingest   -> store the completed exchange

This is the controlled integration. MemoryRouter acts as retrieval and storage only. Your provider keys, routing, streaming, retries, logs, and evals stay inside your app.

Use Local inference mode when you want memory without proxying provider traffic through MemoryRouter.

Provider pass-through

MemoryRouter adds memory, then forwards the request to your provider.

Your App
  ├─ Memory Key: identifies the user vault
  └─ Provider Key: identifies the model provider account

MemoryRouter
  ├─ retrieves memory
  ├─ calls provider
  └─ stores new memory

You can store provider keys in the dashboard or pass provider keys on each request with X-Memory-Key for BYOK.

Native provider endpoints

MemoryRouter does not force every provider through one translation layer. It exposes native endpoints where needed:

Provider	Endpoint
OpenAI-compatible providers	`POST /v1/chat/completions`
Anthropic	`POST /v1/messages`
Google Gemini	`POST /v1/models/:model:generateContent`

Responses stay provider-native. Memory metadata is handled outside the response body where possible.

Retrieval and storage loop

In proxy mode:

Retrieve: Search the user's vault for relevant memories.
Inject: Add high-signal memory context to the provider request.
Forward: Send the request to the selected provider.
Return: Return the provider response to your app.
Store: Store useful new context for future requests.

In local inference mode:

Prepare: Your app calls /v1/memory/prepare to retrieve memory.
Infer: Your app injects that memory and calls the provider directly.
Ingest: Your app calls /v1/memory/ingest with the completed exchange.

Sessions

Use X-Session-ID when you want a request to target or search a session-specific namespace. For most user-product integrations, the Memory Key is the durable user identity and sessions are optional.

Integration shapes

Direct API: Your app calls https://api.memoryrouter.ai/v1 with a Memory Key per user.
OpenClaw: The plugin retrieves and injects memory locally inside OpenClaw.
Open WebUI: Configure MemoryRouter as an OpenAI-compatible provider.
CLI: Upload existing docs, transcripts, or knowledge into a vault.

Architecture

On this page