API Reference
Endpoints for provisioning user memory, retrieving and storing context, and calling model providers through MemoryRouter.
Base URL: https://api.memoryrouter.ai
MemoryRouter gives every user in your product a private memory vault. You authenticate each request with that user's Memory Key, and MemoryRouter scopes retrieval and storage to that user's vault.
There are two ways to integrate:
- Proxy mode: send your model request to MemoryRouter and it retrieves memory, calls the provider, and stores the result in one trip. See the provider endpoints below.
- Local inference mode: your app calls the provider directly and uses /v1/memory/prepare and /v1/memory/ingest for memory only. See Local inference mode.
A Memory Key looks like mk_xxxxxxxx. It both authenticates the request and identifies the vault.
Authentication
Pass the user's Memory Key on every request. Three header styles are accepted so existing SDKs work without changes:
| Header | Style | Use |
|---|---|---|
| Authorization: Bearer mk_xxx | Standard | OpenAI SDK, most clients |
| x-api-key: mk_xxx | Anthropic style | Anthropic SDK |
| X-Memory-Key: mk_xxx | Pass-through | Send the Memory Key here and put a provider key in Authorization for BYOK |
Memory mode suffixes (mk_xxx:read, mk_xxx:write, mk_xxx:off) are stripped before lookup, so mk_abc:read authenticates as mk_abc. See Memory control.
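The three header styles differ only in where the Memory Key goes, so one small helper can cover them. The sketch below is illustrative: `auth_headers` and its `style` names are hypothetical and not part of any MemoryRouter SDK; only the header names come from the table above.

```python
from typing import Dict, Optional


def auth_headers(memory_key: str, style: str = "bearer",
                 provider_key: Optional[str] = None) -> Dict[str, str]:
    """Build request headers for one of the three documented header styles.

    The style names ("bearer", "anthropic", "passthrough") are made up for
    this sketch; the header names are the ones listed in the table.
    """
    if style == "bearer":
        return {"Authorization": f"Bearer {memory_key}"}
    if style == "anthropic":
        return {"x-api-key": memory_key}
    if style == "passthrough":
        if not provider_key:
            raise ValueError("passthrough style needs a provider key for Authorization")
        # The Memory Key identifies the vault; the provider key goes in Authorization (BYOK).
        return {
            "X-Memory-Key": memory_key,
            "Authorization": f"Bearer {provider_key}",
        }
    raise ValueError(f"unknown style: {style}")
```

Merge the returned dict into whatever headers your HTTP client already sends.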
Keys
POST /v1/keys
Create a Memory Key. Authenticate with an existing Memory Key from your account. The new key belongs to your account and inherits the provider keys you have stored, so a backend can mint one key per end user.
curl -X POST https://api.memoryrouter.ai/v1/keys \
-H "Authorization: Bearer $MEMORYROUTER_ACCOUNT_KEY" \
-H "Content-Type: application/json" \
-d '{ "name": "user:42" }'
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | No | Label for the key. Defaults to New Key. Use it to map keys to your internal user id. |
Response (201):
{
"key": "mk_xxxxxxxxxxxxxxxx",
"name": "user:42",
"created_at": "2026-06-05T21:00:00.000Z"
}
Store only the returned key mapped to your internal user_id. Keep the account key you authenticate with server-side only. See the Quickstart for the full getOrCreateMemoryKey pattern and User lifecycle for rotation and deletion.
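A minimal sketch of the one-key-per-user pattern, assuming a plain dict stands in for your database and that key creation is injected as a callable. The function names are hypothetical; the Quickstart's getOrCreateMemoryKey may be structured differently.

```python
import json
import urllib.request
from typing import Callable, Dict


def create_key_via_api(name: str, account_key: str) -> str:
    """Call POST /v1/keys and return the "key" field of the 201 response."""
    request = urllib.request.Request(
        "https://api.memoryrouter.ai/v1/keys",
        data=json.dumps({"name": name}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {account_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["key"]


def get_or_create_memory_key(
    user_id: str,
    store: Dict[str, str],
    create_key: Callable[[str], str],
) -> str:
    """Return the Memory Key mapped to `user_id`, minting one on first use.

    `store` is a stand-in for a user_id -> key table in your own database.
    """
    existing = store.get(user_id)
    if existing is not None:
        return existing
    key = create_key(f"user:{user_id}")
    store[user_id] = key  # persist only the returned key, never the account key
    return key
```

In a backend you might pass `lambda name: create_key_via_api(name, account_key)` as `create_key`, with `account_key` read from server-side configuration.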
Memory endpoints
Use these for local inference mode: your app calls the model provider directly and MemoryRouter handles retrieval plus storage only. They do not call a model and are provider-agnostic.
| Endpoint | Purpose |
|---|---|
| POST /v1/memory/prepare | Retrieve relevant user memory before inference |
| POST /v1/memory/ingest | Store the completed exchange after inference |
| POST /v1/memory/search | Search a vault directly and get ranked memories |
| POST /v1/memory/upload | Bulk import existing memories from JSONL |
| GET /v1/memory/stats | Memory statistics for a key |
| POST /v1/memory/warmup | Pre-load a vault for a faster first request |
| DELETE /v1/memory | Clear memory for a key or session |
POST /v1/memory/prepare
Retrieve formatted memory context for the user's vault. Send the conversation messages. MemoryRouter builds the retrieval query from the recent non-system turns, so you do not write a query string. The response is a plain text block you inject into your own prompt.
curl -X POST https://api.memoryrouter.ai/v1/memory/prepare \
-H "Authorization: Bearer mk_user-123" \
-H "Content-Type: application/json" \
-H "X-Session-ID: conversation_abc" \
-d '{
"messages": [
{ "role": "user", "content": "What should I focus on today?" }
],
"density": "default"
}'
| Field | Type | Required | Description |
|---|---|---|---|
| messages | array | Yes | Conversation messages. The retrieval query is built from the recent non-system turns. Send the same messages you are about to send the model. |
| session_id | string | No | Session grouping. Also accepted as X-Session-ID. |
| density | string | No | Retrieval density: low, default, high, or xhigh. |
| context_limit | number | No | Explicit override for how many memory chunks to retrieve. |
| embeddings | string | No | Embedding model override. Also accepted as X-Embedding-Model. |
Response:
{
"context": "<memory_context>\n[MEMORY - 2 days ago (Wed, Jun 3, 9:14 AM)] User prefers concise coaching and trains at 6am.\n</memory_context>\n\nThe above are retrieved memories from past conversations. Use them as background context, do not respond to them directly.",
"memories_found": 1,
"memory_tokens": 24,
"retrieval_tokens": 24,
"tokens_billed": 120,
"metrics": { "total_ms": 42 }
}
The context field is a ready-to-inject text block. Drop it into your system prompt before calling your provider. If no relevant memory is found, context is null.
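One way to inject the block, sketched in Python. `inject_memory` is a hypothetical helper, and appending the context to an existing system message (rather than prepending it) is a choice made for this example, not something the API requires. It also covers the null case described above.

```python
from typing import Dict, List, Optional


def inject_memory(messages: List[Dict[str, str]],
                  context: Optional[str]) -> List[Dict[str, str]]:
    """Return a copy of `messages` with `context` merged into the system prompt.

    When `context` is None (no relevant memory found), the messages are
    returned unchanged. The input list and its dicts are not mutated.
    """
    if not context:
        return list(messages)
    if messages and messages[0].get("role") == "system":
        merged = dict(messages[0])
        merged["content"] = f"{merged['content']}\n\n{context}"
        return [merged] + list(messages[1:])
    # No system message yet: add one that carries only the memory block.
    return [{"role": "system", "content": context}] + list(messages)
```

Pass the result to your provider call in place of the original messages, and keep the original list for the later ingest step so the memory block itself is not stored again.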
POST /v1/memory/ingest
Store a completed interaction in the user's vault after your app has called the provider directly. Returns 202 immediately and stores in the background.
curl -X POST https://api.memoryrouter.ai/v1/memory/ingest \
-H "Authorization: Bearer mk_user-123" \
-H "Content-Type: application/json" \
-H "X-Session-ID: conversation_abc" \
-d '{
"model": "openai/gpt-5.5",
"messages": [
{ "role": "user", "content": "What should I focus on today?" },
{ "role": "assistant", "content": "Focus on the launch checklist and your 6am training block." }
]
}'
| Field | Type | Required | Description |
|---|---|---|---|
| messages | array | Yes | The user and assistant turns to store. |
| session_id | string | No | Session grouping. Also accepted as X-Session-ID. |
| model | string | No | Model name, stored as usage metadata. |
| embeddings | string | No | Embedding model override. Also accepted as X-Embedding-Model. |
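As a sketch, the same request could be assembled in Python with the standard library. `build_ingest_request` is a hypothetical helper; dropping system turns is a choice made for this example, based on the table's description of messages as user and assistant turns.

```python
import json
import urllib.request
from typing import Dict, List, Optional

INGEST_URL = "https://api.memoryrouter.ai/v1/memory/ingest"


def build_ingest_request(
    memory_key: str,
    messages: List[Dict[str, str]],
    model: Optional[str] = None,
    session_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/memory/ingest request."""
    # Keep only the turns the endpoint is described as storing.
    turns = [m for m in messages if m.get("role") in ("user", "assistant")]
    if not turns:
        raise ValueError("nothing to ingest: no user or assistant turns")
    body: Dict[str, object] = {"messages": turns}
    if model:
        body["model"] = model  # optional, stored as usage metadata
    headers = {
        "Authorization": f"Bearer {memory_key}",
        "Content-Type": "application/json",
    }
    if session_id:
        headers["X-Session-ID"] = session_id
    return urllib.request.Request(
        INGEST_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Send it with `urllib.request.urlopen(request)`. Since storage happens in the background, a successful response presumably means the exchange was queued rather than already stored.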
Response (202):
{
"accepted": true,
"queued": true,
"retrieval_tokens": 27,
"response_tokens": 14,
"message": "Ingest accepted for background processing"
}
POST /v1/memory/search
Search a vault directly. Returns matching memories ranked by semantic similarity across all time windows.
curl -X POST https://api.memoryrouter.ai/v1/memory/search \
-H "Authorization: Bearer mk_user-123" \
-H "Content-Type: application/json" \
-d '{ "query": "what does the user prefer", "limit": 20 }'
| Field | Type | Required | Description |
|---|---|---|---|
| query | string | Yes | Search query text. |
| limit | number | Yes | Number of results to return. |
Add X-Session-ID to search a session vault instead of core memory.
Response:
{
"query": "what does the user prefer",
"sessionId": null,
"memoryKey": "mk_user-123",
"totalMemories": 20,
"tokenCount": 4832,
"windowBreakdown": { "hot": 3, "working": 8, "longterm": 9 },
"memories": [
{
"id": "memory_abc123",
"role": "user",
"content": "I prefer dark mode and concise responses",
"score": 0.82,
"window": "longterm",
"timestamp": "2026-01-20T14:30:00Z",
"source": "core"
}
]
}
This is useful for data export: pull a user's memories with their key and include them in your own export package.
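For the export case, a small sketch that turns a search response into JSONL, one memory per line. `memories_to_jsonl` and the choice of exported fields are assumptions for illustration; the field names themselves are taken from the sample response above.

```python
import json
from typing import Any, Dict

# Fields copied into the export; the retrieval score is left out on purpose
# because it depends on the query, not on the memory itself.
EXPORT_FIELDS = ("id", "role", "content", "timestamp", "window", "source")


def memories_to_jsonl(search_response: Dict[str, Any]) -> str:
    """Serialize the `memories` array of a search response as JSONL text."""
    lines = []
    for memory in search_response.get("memories", []):
        record = {field: memory.get(field) for field in EXPORT_FIELDS}
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)
```

Write the returned string to a file in your export package. An empty result set yields an empty string.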
POST /v1/memory/upload
Bulk import existing memories from JSONL, one memory per line. This is how you backfill a new user or migrate off an in-house memory store.
curl -X POST https://api.memoryrouter.ai/v1/memory/upload \
-H "Authorization: Bearer mk_user-123" \
-H "Content-Type: application/x-ndjson" \
--data-binary @memories.jsonl
JSONL format (one object per line):
{"content": "User prefers dark mode", "role": "user", "timestamp": 1733000000000}
{"content": "The meeting is scheduled for Friday at 3pm"}
{"content": "Customer is interested in the enterprise plan", "role": "assistant"}
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| content | string | Yes | | The memory text to store. |
| role | string | No | user | user, assistant, or system. |
| timestamp | number | No | now | Unix timestamp in milliseconds. |
Response:
{
"status": "complete",
"memoryKey": "mk_user-123",
"vault": "core",
"stats": {
"inputItems": 150,
"memories": 150,
"stored": 150,
"failed": 0
},
"message": "Stored 150 memories from 150 items"
}
Limits: maximum 10,000 lines per upload. Split larger files into batches. Add X-Session-ID to load into a session vault.
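Splitting a large import into batches could look like the following sketch. `jsonl_batches` is a hypothetical helper that produces request bodies only; each one would then be sent as its own POST /v1/memory/upload request.

```python
import json
from typing import Dict, Iterable, Iterator, List

MAX_LINES_PER_UPLOAD = 10_000  # limit stated above


def jsonl_batches(memories: Iterable[Dict],
                  batch_size: int = MAX_LINES_PER_UPLOAD) -> Iterator[str]:
    """Yield JSONL request bodies with at most `batch_size` lines each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    for memory in memories:
        if not memory.get("content"):
            raise ValueError("each memory needs a non-empty 'content' field")
        # json.dumps escapes newlines inside strings, so one memory is one line.
        batch.append(json.dumps(memory, ensure_ascii=False))
        if len(batch) == batch_size:
            yield "\n".join(batch)
            batch = []
    if batch:
        yield "\n".join(batch)
```

Because the function is a generator, it never holds more than one batch in memory, which helps when backfilling from a large in-house store.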
GET /v1/memory/stats
Memory statistics for the authenticated key.
curl https://api.memoryrouter.ai/v1/memory/stats \
-H "Authorization: Bearer mk_user-123"
POST /v1/memory/warmup
Pre-load a vault into memory for a faster first request. Useful after a cold start. Add X-Session-ID to warm a session vault.
curl -X POST https://api.memoryrouter.ai/v1/memory/warmup \
-H "Authorization: Bearer mk_user-123"
DELETE /v1/memory
Clear memory for a key. Run this as part of your own account-deletion flow.
# Clear all memory for the key
curl -X DELETE https://api.memoryrouter.ai/v1/memory \
-H "Authorization: Bearer mk_user-123"
# Clear only a specific session
curl -X DELETE https://api.memoryrouter.ai/v1/memory \
-H "Authorization: Bearer mk_user-123" \
-H "X-Session-ID: session-123"
# Full reset (allows re-embedding with new dimensions)
curl -X DELETE "https://api.memoryrouter.ai/v1/memory?reset=true" \
-H "Authorization: Bearer mk_user-123"
Provider endpoints (proxy mode)
In proxy mode, point your provider's SDK at MemoryRouter and pass the user's Memory Key. MemoryRouter retrieves memory, calls the provider, and stores the result.
| Provider | Endpoint | SDK |
|---|---|---|
| OpenAI, xAI, DeepSeek, Mistral, Cerebras, OpenRouter | POST /v1/chat/completions | OpenAI SDK |
| Anthropic | POST /v1/messages | Anthropic SDK |
| Google Gemini | POST /v1/models/:model:generateContent | Google AI SDK |
Responses stay provider-native. MemoryRouter does not reshape the provider's response body.
POST /v1/chat/completions
Works with the OpenAI SDK and any OpenAI-compatible provider.
curl -X POST https://api.memoryrouter.ai/v1/chat/completions \
-H "Authorization: Bearer mk_user-123" \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-5.5",
"messages": [{ "role": "user", "content": "My name is Alice" }]
}'
from openai import OpenAI
client = OpenAI(
base_url="https://api.memoryrouter.ai/v1",
api_key="mk_user-123",
)
response = client.chat.completions.create(
model="openai/gpt-5.5",
messages=[{"role": "user", "content": "My name is Alice"}],
)
import OpenAI from 'openai'
const client = new OpenAI({
baseURL: 'https://api.memoryrouter.ai/v1',
apiKey: 'mk_user-123'
})
const response = await client.chat.completions.create({
model: 'openai/gpt-5.5',
messages: [{ role: 'user', content: 'My name is Alice' }]
})
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model id (e.g., openai/gpt-5.5, anthropic/claude-opus-4.8) |
| messages | array | Yes | Message objects |
| stream | boolean | No | Enable streaming |
| temperature | number | No | Sampling temperature (0-2) |
| max_tokens | number | No | Maximum tokens to generate |
| top_p | number | No | Nucleus sampling |
| frequency_penalty | number | No | Frequency penalty (-2 to 2) |
| presence_penalty | number | No | Presence penalty (-2 to 2) |
| stop | string/array | No | Stop sequences |
POST /v1/messages
Native Anthropic format. Use the Anthropic SDK directly. MemoryRouter accepts and returns Anthropic's request and response format unchanged.
curl -X POST https://api.memoryrouter.ai/v1/messages \
-H "x-api-key: mk_user-123" \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-opus-4.8",
"max_tokens": 1024,
"messages": [{ "role": "user", "content": "My name is Alice" }]
}'
import Anthropic from '@anthropic-ai/sdk'
const client = new Anthropic({
baseURL: 'https://api.memoryrouter.ai',
apiKey: 'mk_user-123'
})
const response = await client.messages.create({
model: 'claude-opus-4.8',
max_tokens: 1024,
messages: [{ role: 'user', content: 'My name is Alice' }]
})
Native Anthropic parameters apply: model, messages, max_tokens (required), system, temperature, top_p, top_k, stop_sequences, stream.
POST /v1/models/:model:generateContent
Native Google Gemini format. Use Google's AI SDK directly.
curl -X POST "https://api.memoryrouter.ai/v1/models/gemini-3.5-flash:generateContent" \
-H "Authorization: Bearer mk_user-123" \
-H "Content-Type: application/json" \
-d '{
"contents": [{ "role": "user", "parts": [{ "text": "My name is Alice" }] }],
"generationConfig": { "maxOutputTokens": 1024 }
}'
Streaming uses :streamGenerateContent. Native Google parameters apply under contents, systemInstruction, generationConfig, and safetySettings.
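The streaming and non-streaming URLs differ only in the action after the colon. A tiny sketch of building either one; `gemini_url` is a hypothetical helper, not part of Google's SDK or MemoryRouter.

```python
from urllib.parse import quote

BASE_URL = "https://api.memoryrouter.ai"


def gemini_url(model: str, stream: bool = False) -> str:
    """Return the proxy URL for a Gemini model, streaming or not."""
    action = "streamGenerateContent" if stream else "generateContent"
    # Percent-encode the model id so unexpected characters cannot alter the path.
    return f"{BASE_URL}/v1/models/{quote(model, safe='')}:{action}"
```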
Memory control
Control memory behavior per request with key suffixes, headers, query parameters, or body fields.
Key suffixes (easiest)
Append a mode to the key. No headers or code changes. Ideal for platforms where you can only set an API key (Open WebUI, LibreChat, coding tools).
| Key format | Retrieve | Store | Use case |
|---|---|---|---|
| mk_xxx | Yes | Yes | Normal (default) |
| mk_xxx:read | Yes | No | Use memory without storing new turns |
| mk_xxx:write | No | Yes | Store without retrieving (bulk import) |
| mk_xxx:off | No | No | Stateless, no memory |
The suffix is stripped before authentication, so mk_abc:read authenticates as mk_abc.
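If you build suffixed keys in code, a helper keeps the mode names in one place. The sketch below is a client-side illustration only; `split_mode` and `with_mode` are hypothetical names, and the server's actual parsing is not shown in this reference.

```python
from typing import Optional, Tuple

MODES = ("read", "write", "off")  # suffixes listed in the table above


def split_mode(key: str) -> Tuple[str, Optional[str]]:
    """Split "mk_abc:read" into ("mk_abc", "read"); plain keys get mode None."""
    base, sep, suffix = key.rpartition(":")
    if sep and base and suffix in MODES:
        return base, suffix
    return key, None


def with_mode(key: str, mode: Optional[str] = None) -> str:
    """Return `key` with its mode suffix replaced by `mode` (None removes it)."""
    if mode is not None and mode not in MODES:
        raise ValueError(f"unknown memory mode: {mode}")
    base, _ = split_mode(key)
    return base if mode is None else f"{base}:{mode}"
```

For example, `with_mode("mk_abc", "read")` gives the read-only form of a stored key without changing what you keep in your database.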
Especially useful for coding tools. In Claude Code, Cursor, or Windsurf, use mk_xxx:read so the AI recalls your past decisions without flooding the vault with generated code:
export ANTHROPIC_BASE_URL=https://api.memoryrouter.ai/v1
export ANTHROPIC_API_KEY=mk_xxx:read
Headers
| Header | Values | Default | Description |
|---|---|---|---|
| X-Memory-Mode | on, off, read, write | on | Memory operation mode |
| X-Memory-Store | true, false | true | Store user input |
| X-Memory-Store-Response | true, false | true | Store assistant response |
| X-Session-ID | string | | Group requests into a session vault |
| X-Memory-Key | mk_xxx | | Pass-through auth (use with a provider key in Authorization) |
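A sketch of producing the memory-control headers from Python values. `memory_control_headers` is a hypothetical helper; the header names, allowed values, and defaults are the ones in the table above.

```python
from typing import Dict, Optional

MODES = ("on", "off", "read", "write")


def memory_control_headers(
    mode: str = "on",
    store_input: bool = True,
    store_response: bool = True,
    session_id: Optional[str] = None,
) -> Dict[str, str]:
    """Translate per-request memory settings into header strings."""
    if mode not in MODES:
        raise ValueError(f"unknown memory mode: {mode}")
    headers = {
        "X-Memory-Mode": mode,
        # Header values are strings, so booleans are spelled out explicitly.
        "X-Memory-Store": "true" if store_input else "false",
        "X-Memory-Store-Response": "true" if store_response else "false",
    }
    if session_id:
        headers["X-Session-ID"] = session_id
    return headers
```

With the OpenAI Python SDK, a dict like this can be passed per request through the `extra_headers` argument of `client.chat.completions.create`.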
Query parameters
| Parameter | Example | Effect |
|---|---|---|
| ?memory=off | /v1/chat/completions?memory=off | Disable memory entirely |
| ?mode=read | /v1/chat/completions?mode=read | Read-only, do not store this exchange |
| ?store=false | /v1/chat/completions?store=false | Do not store user input |
Body fields
These are stripped before the request is forwarded to the provider:
{
"model": "openai/gpt-5.5",
"messages": [],
"memory": false,
"memory_mode": "read",
"memory_store": false,
"memory_store_response": false,
"session_id": "user-123-chat-456"
}
Per-message control
Exclude a single message from storage with "memory": false:
{
"messages": [
{ "role": "user", "content": "Remember this", "memory": true },
{ "role": "user", "content": "Do not store this", "memory": false }
]
}
Sessions
Sessions group related conversations. When you pass X-Session-ID (or session_id), memory is scoped to that session in addition to the user's core vault.
- Each session gets its own memory space.
- Core memory (no session) holds persistent, cross-session context.
- Session memory is recalled alongside core memory.
- Clear one session without touching core: DELETE /v1/memory with X-Session-ID.
For most user-product integrations the Memory Key is the durable user identity and sessions are optional.
Models
GET /v1/models
List models available from your configured providers.
curl https://api.memoryrouter.ai/v1/models \
-H "Authorization: Bearer mk_user-123"
Returns providers and models, the default model, and a catalog timestamp. Use the full provider-prefixed name (e.g., openai/gpt-5.5, anthropic/claude-opus-4.8, google/gemini-3.5-flash).
Usage
GET /v1/account/usage
Token usage for the authenticated key over a date range (defaults to the last 30 days).
curl "https://api.memoryrouter.ai/v1/account/usage?start=2026-01-01&end=2026-02-01" \
-H "Authorization: Bearer mk_user-123"
Returns the key, the period, and aggregated request and token counts. To break usage down per end user, query each user's key, since one key maps to one user vault.
Pass-through endpoints
These forward to the provider without memory processing:
| Endpoint | Provider | Description |
|---|---|---|
| POST /v1/audio/transcriptions | OpenAI | Whisper transcription |
| POST /v1/audio/speech | OpenAI | Text to speech |
| POST /v1/images/generations | OpenAI | Image generation |
| POST /v1/embeddings | OpenAI | Text embeddings |
curl -X POST https://api.memoryrouter.ai/v1/audio/transcriptions \
-H "Authorization: Bearer mk_user-123" \
-F "file=@audio.mp3" \
-F "model=whisper-1"
Provider keys (BYOK)
Pass a provider key directly instead of storing one in the dashboard. Identify the user vault with X-Memory-Key:
curl -X POST https://api.memoryrouter.ai/v1/chat/completions \
-H "X-Memory-Key: mk_user-123" \
-H "Authorization: Bearer sk-your-openai-key" \
-H "Content-Type: application/json" \
-d '{ "model": "openai/gpt-5.5", "messages": [] }'
Or keep the Memory Key in Authorization and pass the provider key in X-Provider-Key:
curl -X POST https://api.memoryrouter.ai/v1/chat/completions \
-H "Authorization: Bearer mk_user-123" \
-H "X-Provider-Key: sk-your-openai-key" \
-d '...'
Response headers
MemoryRouter adds timing headers to chat responses:
| Header | Description |
|---|---|
| X-MR-Processing-Ms | MemoryRouter processing time |
| X-Provider-Response-Ms | Time waiting for the provider |
| X-Total-Ms | End-to-end request time |
| X-Memory-Tokens-Retrieved | Tokens of memory retrieved |
| X-Memory-Tokens-Injected | Tokens of memory injected into the prompt |
| X-Session-ID | Echo of the session id, if provided |
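These headers can be collected into a small record for logging. The sketch assumes the timing and token headers carry plain numeric strings, which this reference does not state; `parse_memoryrouter_headers` and its output field names are hypothetical.

```python
from typing import Dict, Mapping, Optional, Union

# Response header (lowercased) -> field name used in the parsed record.
NUMERIC_HEADERS = {
    "x-mr-processing-ms": "processing_ms",
    "x-provider-response-ms": "provider_ms",
    "x-total-ms": "total_ms",
    "x-memory-tokens-retrieved": "tokens_retrieved",
    "x-memory-tokens-injected": "tokens_injected",
}


def parse_memoryrouter_headers(
    headers: Mapping[str, str],
) -> Dict[str, Union[float, str, None]]:
    """Extract MemoryRouter headers; missing or non-numeric values become None."""
    lowered = {name.lower(): value for name, value in headers.items()}
    parsed: Dict[str, Union[float, str, None]] = {}
    for header, field in NUMERIC_HEADERS.items():
        raw: Optional[str] = lowered.get(header)
        try:
            parsed[field] = float(raw) if raw is not None else None
        except ValueError:
            parsed[field] = None
    parsed["session_id"] = lowered.get("x-session-id")
    return parsed
```

Header names are matched case-insensitively because HTTP clients differ in how they normalize them.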
Health
GET /health
No authentication required.
curl https://api.memoryrouter.ai/health
Semantic-temporal memory
MemoryRouter indexes memory by meaning and by time. Recent context is weighted higher, while important facts persist long term. Retrieval balances immediate context, recent history, and long-term memory automatically. No configuration required.
Error codes
| Code | Meaning | Common causes |
|---|---|---|
| 400 | Bad request | Missing required fields, invalid JSON |
| 401 | Unauthorized | Invalid Memory Key, missing provider key |
| 402 | Payment required | No card on file (upload) or balance exhausted |
| 413 | Payload too large | Upload exceeds 10,000 lines |
| 429 | Rate limited | Too many requests |
| 500 | Internal error | Server-side issue |
| 502 | Provider error | Upstream provider failed |
Error response:
{
"error": "No API key configured for provider: anthropic",
"hint": "Add your anthropic API key in your account settings, or pass X-Provider-Key header"
}
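A client might handle these codes along the following lines. Treating 429, 500, and 502 as retryable is a common convention assumed here, not documented MemoryRouter behavior, and the helper names and backoff values are illustrative.

```python
from typing import Any, List, Mapping

STATUS_MEANINGS = {
    400: "Bad request",
    401: "Unauthorized",
    402: "Payment required",
    413: "Payload too large",
    429: "Rate limited",
    500: "Internal error",
    502: "Provider error",
}

# Assumption: only transient failures are worth retrying.
RETRYABLE_STATUSES = frozenset({429, 500, 502})


def is_retryable(status: int) -> bool:
    """Return True for statuses that may succeed on a later attempt."""
    return status in RETRYABLE_STATUSES


def backoff_delays(attempts: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Exponential backoff delays in seconds, doubling from `base` up to `cap`."""
    return [min(cap, base * (2 ** i)) for i in range(attempts)]


def describe_error(status: int, body: Mapping[str, Any]) -> str:
    """Format an error response body (error plus optional hint) for logs."""
    meaning = STATUS_MEANINGS.get(status, "Unknown status")
    message = f"{status} {meaning}: {body.get('error', 'no error message')}"
    hint = body.get("hint")
    if hint:
        message += f" (hint: {hint})"
    return message
```

Client errors such as 400, 401, 402, and 413 are not retried in this sketch because repeating the same request would fail the same way; the hint field usually says what to change.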