API Reference

Endpoints for provisioning user memory, retrieving and storing context, and calling model providers through MemoryRouter.

Base URL: https://api.memoryrouter.ai

MemoryRouter gives every user in your product a private memory vault. You authenticate each request with that user's Memory Key, and MemoryRouter scopes retrieval and storage to that user's vault.

There are two ways to integrate:

  • Proxy mode: send your model request to MemoryRouter and it retrieves memory, calls the provider, and stores the result in one trip. See the provider endpoints below.
  • Local inference mode: your app calls the provider directly and uses /v1/memory/prepare and /v1/memory/ingest for memory only. See Local inference mode.

A Memory Key looks like mk_xxxxxxxx. It both authenticates the request and identifies the vault.


Authentication

Pass the user's Memory Key on every request. Three header styles are accepted so existing SDKs work without changes:

| Header | Style | Use |
| --- | --- | --- |
| Authorization: Bearer mk_xxx | Standard | OpenAI SDK, most clients |
| x-api-key: mk_xxx | Anthropic style | Anthropic SDK |
| X-Memory-Key: mk_xxx | Pass-through | Send the Memory Key here and put a provider key in Authorization for BYOK |

Memory mode suffixes (mk_xxx:read, mk_xxx:write, mk_xxx:off) are stripped before lookup, so mk_abc:read authenticates as mk_abc. See Memory control.
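
The three header styles can be sketched as one small helper. This is only an illustration: the function name `auth_headers` and the style labels `bearer`, `anthropic`, and `passthrough` are hypothetical and not part of the API. The header names and the `Bearer` format come from the table above.

```python
from typing import Dict, Optional


def auth_headers(memory_key: str, style: str = "bearer",
                 provider_key: Optional[str] = None) -> Dict[str, str]:
    """Build request headers for one of the three documented styles.

    The style labels are made up for this sketch.
    """
    if style == "bearer":       # OpenAI SDK and most clients
        return {"Authorization": f"Bearer {memory_key}"}
    if style == "anthropic":    # Anthropic SDK
        return {"x-api-key": memory_key}
    if style == "passthrough":  # BYOK: the provider key travels in Authorization
        if not provider_key:
            raise ValueError("passthrough style needs a provider key")
        return {
            "X-Memory-Key": memory_key,
            "Authorization": f"Bearer {provider_key}",
        }
    raise ValueError(f"unknown style: {style}")
```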


Keys

POST /v1/keys

Create a Memory Key. Authenticate with an existing Memory Key from your account. The new key belongs to your account and inherits the provider keys you have stored, so a backend can mint one key per end user.

curl -X POST https://api.memoryrouter.ai/v1/keys \
  -H "Authorization: Bearer $MEMORYROUTER_ACCOUNT_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "name": "user:42" }'

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | No | Label for the key. Defaults to New Key. Use it to map keys to your internal user id. |

Response (201):

{
  "key": "mk_xxxxxxxxxxxxxxxx",
  "name": "user:42",
  "created_at": "2026-06-05T21:00:00.000Z"
}

Store only the returned key, mapped to your internal user_id. Keep the account key you authenticate with on your server and never ship it to clients. See the Quickstart for the full getOrCreateMemoryKey pattern and User lifecycle for rotation and deletion.
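
A minimal sketch of minting one key per end user, assuming a dict-like `store` stands in for your database. It is not the Quickstart's getOrCreateMemoryKey implementation, which this page does not show. The names `create_memory_key`, `get_or_create_memory_key`, `store`, and `mint` are hypothetical. The endpoint, request body, and `key` response field are taken from the example above.

```python
import json
import urllib.request
from typing import Callable, MutableMapping

API_BASE = "https://api.memoryrouter.ai"


def create_memory_key(account_key: str, name: str) -> str:
    """POST /v1/keys and return the new Memory Key (real network call)."""
    request = urllib.request.Request(
        f"{API_BASE}/v1/keys",
        data=json.dumps({"name": name}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {account_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["key"]


def get_or_create_memory_key(user_id: str,
                             store: MutableMapping[str, str],
                             mint: Callable[[str], str]) -> str:
    """Return the stored key for user_id, minting one on first use.

    `store` is a stand-in for your own user table; `mint` receives the
    key label and returns the new key.
    """
    key = store.get(user_id)
    if key is None:
        key = mint(f"user:{user_id}")
        store[user_id] = key
    return key
```

In a backend, `mint` could be `lambda name: create_memory_key(ACCOUNT_KEY, name)`, with `ACCOUNT_KEY` read from server-side configuration.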


Memory endpoints

Use these for local inference mode: your app calls the model provider directly and MemoryRouter handles retrieval plus storage only. They do not call a model and are provider-agnostic.

| Endpoint | Purpose |
| --- | --- |
| POST /v1/memory/prepare | Retrieve relevant user memory before inference |
| POST /v1/memory/ingest | Store the completed exchange after inference |
| POST /v1/memory/search | Search a vault directly and get ranked memories |
| POST /v1/memory/upload | Bulk import existing memories from JSONL |
| GET /v1/memory/stats | Memory statistics for a key |
| POST /v1/memory/warmup | Pre-load a vault for a faster first request |
| DELETE /v1/memory | Clear memory for a key or session |

POST /v1/memory/prepare

Retrieve formatted memory context for the user's vault. Send the conversation messages. MemoryRouter builds the retrieval query from the recent non-system turns, so you do not write a query string. The response is a plain text block you inject into your own prompt.

curl -X POST https://api.memoryrouter.ai/v1/memory/prepare \
  -H "Authorization: Bearer mk_user-123" \
  -H "Content-Type: application/json" \
  -H "X-Session-ID: conversation_abc" \
  -d '{
    "messages": [
      { "role": "user", "content": "What should I focus on today?" }
    ],
    "density": "default"
  }'

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| messages | array | Yes | Conversation messages. The retrieval query is built from the recent non-system turns. Send the same messages you are about to send the model. |
| session_id | string | No | Session grouping. Also accepted as X-Session-ID. |
| density | string | No | Retrieval density: low, default, high, or xhigh. |
| context_limit | number | No | Explicit override for how many memory chunks to retrieve. |
| embeddings | string | No | Embedding model override. Also accepted as X-Embedding-Model. |

Response:

{
  "context": "<memory_context>\n[MEMORY - 2 days ago (Wed, Jun 3, 9:14 AM)] User prefers concise coaching and trains at 6am.\n</memory_context>\n\nThe above are retrieved memories from past conversations. Use them as background context, do not respond to them directly.",
  "memories_found": 1,
  "memory_tokens": 24,
  "retrieval_tokens": 24,
  "tokens_billed": 120,
  "metrics": { "total_ms": 42 }
}

The context field is a ready-to-inject text block. Drop it into your system prompt before calling your provider. If no relevant memory is found, context is null.
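
One way to inject the block, sketched under the assumption that your prompt is a list of role/content messages. The helper name `inject_memory` is hypothetical, and placing the memory block ahead of an existing system prompt is a choice made for this example, not something the API requires. A null `context` leaves the messages unchanged.

```python
from typing import Dict, List, Optional

Message = Dict[str, str]


def inject_memory(messages: List[Message], context: Optional[str]) -> List[Message]:
    """Return a copy of messages with the memory block in the system prompt."""
    copied = [dict(message) for message in messages]
    if not context:  # context is null when no relevant memory was found
        return copied
    if copied and copied[0].get("role") == "system":
        # Assumption: memory goes before the existing system instructions.
        copied[0]["content"] = f"{context}\n\n{copied[0]['content']}"
    else:
        copied.insert(0, {"role": "system", "content": context})
    return copied
```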

POST /v1/memory/ingest

Store a completed interaction in the user's vault after your app has called the provider directly. Returns 202 immediately and stores in the background.

curl -X POST https://api.memoryrouter.ai/v1/memory/ingest \
  -H "Authorization: Bearer mk_user-123" \
  -H "Content-Type: application/json" \
  -H "X-Session-ID: conversation_abc" \
  -d '{
    "model": "openai/gpt-5.5",
    "messages": [
      { "role": "user", "content": "What should I focus on today?" },
      { "role": "assistant", "content": "Focus on the launch checklist and your 6am training block." }
    ]
  }'

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| messages | array | Yes | The user and assistant turns to store. |
| session_id | string | No | Session grouping. Also accepted as X-Session-ID. |
| model | string | No | Model name, stored as usage metadata. |
| embeddings | string | No | Embedding model override. Also accepted as X-Embedding-Model. |

Response (202):

{
  "accepted": true,
  "queued": true,
  "retrieval_tokens": 27,
  "response_tokens": 14,
  "message": "Ingest accepted for background processing"
}
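
The prepare, infer, ingest order for local inference mode can be sketched as a single function. Everything named here is an assumption for illustration: `run_turn`, the `post` callable (which sends a JSON body to a MemoryRouter path and returns the parsed response), and `call_model` (your own provider call). Ingesting only the latest user turn plus the reply is also a choice of this sketch.

```python
from typing import Any, Callable, Dict, List, Optional

Message = Dict[str, str]


def run_turn(messages: List[Message],
             post: Callable[[str, Dict[str, Any]], Dict[str, Any]],
             call_model: Callable[[List[Message]], str],
             model: Optional[str] = None) -> str:
    """Retrieve memory, call the provider directly, then store the exchange."""
    # 1. Retrieve memory for the messages about to be sent.
    prepared = post("/v1/memory/prepare", {"messages": messages})
    context = prepared.get("context")

    # 2. Call the provider yourself, with memory as a system message if any.
    prompt = list(messages)
    if context:
        prompt.insert(0, {"role": "system", "content": context})
    reply = call_model(prompt)

    # 3. Store the completed exchange (latest user turn plus the reply).
    body: Dict[str, Any] = {
        "messages": [messages[-1], {"role": "assistant", "content": reply}],
    }
    if model:
        body["model"] = model  # stored as usage metadata
    post("/v1/memory/ingest", body)
    return reply
```

Because ingest returns 202 and stores in the background, a prepare call made right after may not yet see the new exchange.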

POST /v1/memory/search

Search a vault directly. Returns matching memories ranked by semantic similarity across all time windows.

curl -X POST https://api.memoryrouter.ai/v1/memory/search \
  -H "Authorization: Bearer mk_user-123" \
  -H "Content-Type: application/json" \
  -d '{ "query": "what does the user prefer", "limit": 20 }'

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| query | string | Yes | Search query text. |
| limit | number | Yes | Number of results to return. |
Add X-Session-ID to search a session vault instead of core memory.

Response:

{
  "query": "what does the user prefer",
  "sessionId": null,
  "memoryKey": "mk_user-123",
  "totalMemories": 20,
  "tokenCount": 4832,
  "windowBreakdown": { "hot": 3, "working": 8, "longterm": 9 },
  "memories": [
    {
      "id": "memory_abc123",
      "role": "user",
      "content": "I prefer dark mode and concise responses",
      "score": 0.82,
      "window": "longterm",
      "timestamp": "2026-01-20T14:30:00Z",
      "source": "core"
    }
  ]
}

This is useful for data export: pull a user's memories with their key and include them in your own export package.
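
For the export case, a sketch that turns a search response into JSONL lines, one memory per line. The output uses the same `content`, `role`, and `timestamp` (milliseconds) fields that /v1/memory/upload accepts, but treating search output as upload input is an assumption of this example, not a documented round trip. The function name `to_export_lines` is hypothetical.

```python
import json
from datetime import datetime
from typing import Any, Dict, Iterator


def to_export_lines(search_response: Dict[str, Any]) -> Iterator[str]:
    """Yield one JSON line per memory in a /v1/memory/search response."""
    for memory in search_response.get("memories", []):
        record: Dict[str, Any] = {
            "content": memory["content"],
            "role": memory.get("role", "user"),
        }
        stamp = memory.get("timestamp")
        if stamp:
            # fromisoformat() rejects a trailing "Z" before Python 3.11.
            moment = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
            record["timestamp"] = int(moment.timestamp() * 1000)
        yield json.dumps(record)
```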

POST /v1/memory/upload

Bulk import existing memories from JSONL, one memory per line. This is how you backfill a new user or migrate off an in-house memory store.

curl -X POST https://api.memoryrouter.ai/v1/memory/upload \
  -H "Authorization: Bearer mk_user-123" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary @memories.jsonl

JSONL format (one object per line):

{"content": "User prefers dark mode", "role": "user", "timestamp": 1733000000000}
{"content": "The meeting is scheduled for Friday at 3pm"}
{"content": "Customer is interested in the enterprise plan", "role": "assistant"}

| Field | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| content | string | Yes |  | The memory text to store. |
| role | string | No | user | user, assistant, or system. |
| timestamp | number | No | now | Unix timestamp in milliseconds. |

Response:

{
  "status": "complete",
  "memoryKey": "mk_user-123",
  "vault": "core",
  "stats": {
    "inputItems": 150,
    "memories": 150,
    "stored": 150,
    "failed": 0
  },
  "message": "Stored 150 memories from 150 items"
}

Limits: maximum 10,000 lines per upload. Split larger files into batches. Add X-Session-ID to load into a session vault.
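
Splitting a large file into batches can be sketched as a generator. The name `batch_jsonl` is hypothetical, and skipping blank lines is a choice of this example. Each yielded string is a request body for one upload call.

```python
from typing import Iterable, Iterator, List

MAX_LINES = 10_000  # documented per-upload limit


def batch_jsonl(lines: Iterable[str], max_lines: int = MAX_LINES) -> Iterator[str]:
    """Yield JSONL payloads holding at most max_lines memories each."""
    batch: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:  # ignore blank lines so they do not count toward the limit
            continue
        batch.append(line)
        if len(batch) == max_lines:
            yield "\n".join(batch) + "\n"
            batch = []
    if batch:  # final partial batch
        yield "\n".join(batch) + "\n"
```

With a file on disk, `for payload in batch_jsonl(open("memories.jsonl")):` would produce one payload per upload request.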

GET /v1/memory/stats

Memory statistics for the authenticated key.

curl https://api.memoryrouter.ai/v1/memory/stats \
  -H "Authorization: Bearer mk_user-123"

POST /v1/memory/warmup

Pre-load a vault into memory for a faster first request. Useful after a cold start. Add X-Session-ID to warm a session vault.

curl -X POST https://api.memoryrouter.ai/v1/memory/warmup \
  -H "Authorization: Bearer mk_user-123"

DELETE /v1/memory

Clear memory for a key. Run this as part of your own account-deletion flow.

# Clear all memory for the key
curl -X DELETE https://api.memoryrouter.ai/v1/memory \
  -H "Authorization: Bearer mk_user-123"

# Clear only a specific session
curl -X DELETE https://api.memoryrouter.ai/v1/memory \
  -H "Authorization: Bearer mk_user-123" \
  -H "X-Session-ID: session-123"

# Full reset (allows re-embedding with new dimensions)
curl -X DELETE "https://api.memoryrouter.ai/v1/memory?reset=true" \
  -H "Authorization: Bearer mk_user-123"

Provider endpoints (proxy mode)

In proxy mode, point your provider's SDK at MemoryRouter and pass the user's Memory Key. MemoryRouter retrieves memory, calls the provider, and stores the result.

| Provider | Endpoint | SDK |
| --- | --- | --- |
| OpenAI, xAI, DeepSeek, Mistral, Cerebras, OpenRouter | POST /v1/chat/completions | OpenAI SDK |
| Anthropic | POST /v1/messages | Anthropic SDK |
| Google Gemini | POST /v1/models/:model:generateContent | Google AI SDK |

Responses stay provider-native. MemoryRouter does not reshape the provider's response body.

POST /v1/chat/completions

Works with the OpenAI SDK and any OpenAI-compatible provider.

curl -X POST https://api.memoryrouter.ai/v1/chat/completions \
  -H "Authorization: Bearer mk_user-123" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.5",
    "messages": [{ "role": "user", "content": "My name is Alice" }]
  }'

Python:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.memoryrouter.ai/v1",
    api_key="mk_user-123",
)

response = client.chat.completions.create(
    model="openai/gpt-5.5",
    messages=[{"role": "user", "content": "My name is Alice"}],
)

JavaScript:

import OpenAI from 'openai'

const client = new OpenAI({
  baseURL: 'https://api.memoryrouter.ai/v1',
  apiKey: 'mk_user-123'
})

const response = await client.chat.completions.create({
  model: 'openai/gpt-5.5',
  messages: [{ role: 'user', content: 'My name is Alice' }]
})

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model id (e.g., openai/gpt-5.5, anthropic/claude-opus-4.8) |
| messages | array | Yes | Message objects |
| stream | boolean | No | Enable streaming |
| temperature | number | No | Sampling temperature (0-2) |
| max_tokens | number | No | Maximum tokens to generate |
| top_p | number | No | Nucleus sampling |
| frequency_penalty | number | No | Frequency penalty (-2 to 2) |
| presence_penalty | number | No | Presence penalty (-2 to 2) |
| stop | string/array | No | Stop sequences |

POST /v1/messages

Native Anthropic format. Use the Anthropic SDK directly. MemoryRouter accepts and returns Anthropic's request and response format unchanged.

curl -X POST https://api.memoryrouter.ai/v1/messages \
  -H "x-api-key: mk_user-123" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-opus-4.8",
    "max_tokens": 1024,
    "messages": [{ "role": "user", "content": "My name is Alice" }]
  }'

JavaScript:

import Anthropic from '@anthropic-ai/sdk'

const client = new Anthropic({
  baseURL: 'https://api.memoryrouter.ai',
  apiKey: 'mk_user-123'
})

const response = await client.messages.create({
  model: 'claude-opus-4.8',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'My name is Alice' }]
})

Native Anthropic parameters apply: model, messages, max_tokens (required), system, temperature, top_p, top_k, stop_sequences, stream.

POST /v1/models/:model:generateContent

Native Google Gemini format. Use Google's AI SDK directly.

curl -X POST "https://api.memoryrouter.ai/v1/models/gemini-3.5-flash:generateContent" \
  -H "Authorization: Bearer mk_user-123" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{ "role": "user", "parts": [{ "text": "My name is Alice" }] }],
    "generationConfig": { "maxOutputTokens": 1024 }
  }'

Streaming uses :streamGenerateContent. Native Google parameters apply under contents, systemInstruction, generationConfig, and safetySettings.


Memory control

Control memory behavior per request with key suffixes, headers, query parameters, or body fields.

Key suffixes (easiest)

Append a mode to the key. No headers or code changes. Ideal for platforms where you can only set an API key (Open WebUI, LibreChat, coding tools).

| Key format | Retrieve | Store | Use case |
| --- | --- | --- | --- |
| mk_xxx | Yes | Yes | Normal (default) |
| mk_xxx:read | Yes | No | Use memory without storing new turns |
| mk_xxx:write | No | Yes | Store without retrieving (bulk import) |
| mk_xxx:off | No | No | Stateless, no memory |

The suffix is stripped before authentication, so mk_abc:read authenticates as mk_abc.
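
The suffix table can be read as a small parser. MemoryRouter does this on the server, so the sketch below only illustrates the mapping; the name `parse_memory_key` and the tuple it returns are hypothetical.

```python
from typing import Dict, Tuple

# mode -> (retrieve, store), taken from the key suffix table
MODES: Dict[str, Tuple[bool, bool]] = {
    "read": (True, False),
    "write": (False, True),
    "off": (False, False),
}


def parse_memory_key(key: str) -> Tuple[str, bool, bool]:
    """Split a key into (base_key, retrieve, store)."""
    base, separator, suffix = key.rpartition(":")
    if separator and suffix in MODES:
        retrieve, store = MODES[suffix]
        return base, retrieve, store
    # No recognized suffix: normal mode, retrieve and store.
    return key, True, True
```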

Especially useful for coding tools. In Claude Code, Cursor, or Windsurf, use mk_xxx:read so the AI recalls your past decisions without flooding the vault with generated code:

export ANTHROPIC_BASE_URL=https://api.memoryrouter.ai
export ANTHROPIC_API_KEY=mk_xxx:read

Headers

| Header | Values | Default | Description |
| --- | --- | --- | --- |
| X-Memory-Mode | on, off, read, write | on | Memory operation mode |
| X-Memory-Store | true, false | true | Store user input |
| X-Memory-Store-Response | true, false | true | Store assistant response |
| X-Session-ID | string |  | Group requests into a session vault |
| X-Memory-Key | mk_xxx |  | Pass-through auth (use with a provider key in Authorization) |

Query parameters

| Parameter | Example | Effect |
| --- | --- | --- |
| ?memory=off | /v1/chat/completions?memory=off | Disable memory entirely |
| ?mode=read | /v1/chat/completions?mode=read | Read-only, do not store this exchange |
| ?store=false | /v1/chat/completions?store=false | Do not store user input |

Body fields

These are stripped before the request is forwarded to the provider:

{
  "model": "openai/gpt-5.5",
  "messages": [],
  "memory": false,
  "memory_mode": "read",
  "memory_store": false,
  "memory_store_response": false,
  "session_id": "user-123-chat-456"
}

Per-message control

Exclude a single message from storage with "memory": false:

{
  "messages": [
    { "role": "user", "content": "Remember this", "memory": true },
    { "role": "user", "content": "Do not store this", "memory": false }
  ]
}
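
A client-side sketch of tagging messages before sending them. The helper `mark_unstored` and the `is_private` predicate are hypothetical. Only the `"memory": false` field itself comes from the example above.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def mark_unstored(messages: List[Message],
                  is_private: Callable[[Message], bool]) -> List[Message]:
    """Return copies of messages, adding "memory": False where is_private matches."""
    tagged = []
    for message in messages:
        copy = dict(message)
        if is_private(message):
            copy["memory"] = False  # ask MemoryRouter not to store this message
        tagged.append(copy)
    return tagged
```

For example, `mark_unstored(messages, lambda m: "password" in m["content"].lower())` would tag any message that mentions a password.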

Sessions

Sessions group related conversations. When you pass X-Session-ID (or session_id), memory is scoped to that session in addition to the user's core vault.

  • Each session gets its own memory space.
  • Core memory (no session) holds persistent, cross-session context.
  • Session memory is recalled alongside core memory.
  • Clear one session without touching core: DELETE /v1/memory with X-Session-ID.

For most user-product integrations the Memory Key is the durable user identity and sessions are optional.


Models

GET /v1/models

List models available from your configured providers.

curl https://api.memoryrouter.ai/v1/models \
  -H "Authorization: Bearer mk_user-123"

Returns providers and models, the default model, and a catalog timestamp. Use the full provider-prefixed name (e.g., openai/gpt-5.5, anthropic/claude-opus-4.8, google/gemini-3.5-flash).


Usage

GET /v1/account/usage

Token usage for the authenticated key over a date range (defaults to the last 30 days).

curl "https://api.memoryrouter.ai/v1/account/usage?start=2026-01-01&end=2026-02-01" \
  -H "Authorization: Bearer mk_user-123"

Returns the key, the period, and aggregated request and token counts. To break usage down per end user, query each user's key, since one key maps to one user vault.


Pass-through endpoints

These forward to the provider without memory processing:

| Endpoint | Provider | Description |
| --- | --- | --- |
| POST /v1/audio/transcriptions | OpenAI | Whisper transcription |
| POST /v1/audio/speech | OpenAI | Text to speech |
| POST /v1/images/generations | OpenAI | Image generation |
| POST /v1/embeddings | OpenAI | Text embeddings |

curl -X POST https://api.memoryrouter.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer mk_user-123" \
  -F "file=@audio.mp3" \
  -F "model=whisper-1"

Provider keys (BYOK)

Pass a provider key directly instead of storing one in the dashboard. Identify the user vault with X-Memory-Key:

curl -X POST https://api.memoryrouter.ai/v1/chat/completions \
  -H "X-Memory-Key: mk_user-123" \
  -H "Authorization: Bearer sk-your-openai-key" \
  -H "Content-Type: application/json" \
  -d '{ "model": "openai/gpt-5.5", "messages": [] }'

Or keep the Memory Key in Authorization and pass the provider key in X-Provider-Key:

curl -X POST https://api.memoryrouter.ai/v1/chat/completions \
  -H "Authorization: Bearer mk_user-123" \
  -H "X-Provider-Key: sk-your-openai-key" \
  -d '...'

Response headers

MemoryRouter adds timing headers to chat responses:

| Header | Description |
| --- | --- |
| X-MR-Processing-Ms | MemoryRouter processing time |
| X-Provider-Response-Ms | Time waiting for the provider |
| X-Total-Ms | End-to-end request time |
| X-Memory-Tokens-Retrieved | Tokens of memory retrieved |
| X-Memory-Tokens-Injected | Tokens of memory injected into the prompt |
| X-Session-ID | Echo of the session id, if provided |
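
A sketch of collecting these headers for logging. It assumes the values arrive as plain numeric strings, which this page does not state, and it returns None for anything missing or unparseable. The name `read_metrics` is hypothetical.

```python
from typing import Dict, Mapping, Optional

METRIC_HEADERS = (
    "X-MR-Processing-Ms",
    "X-Provider-Response-Ms",
    "X-Total-Ms",
    "X-Memory-Tokens-Retrieved",
    "X-Memory-Tokens-Injected",
)


def read_metrics(headers: Mapping[str, str]) -> Dict[str, Optional[float]]:
    """Extract MemoryRouter metric headers, matching names case-insensitively."""
    lowered = {name.lower(): value for name, value in headers.items()}
    metrics: Dict[str, Optional[float]] = {}
    for name in METRIC_HEADERS:
        raw = lowered.get(name.lower())
        try:
            # Assumption: header values are numeric strings such as "42".
            metrics[name] = float(raw) if raw is not None else None
        except ValueError:
            metrics[name] = None
    return metrics
```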

Health

GET /health

No authentication required.

curl https://api.memoryrouter.ai/health

Semantic-temporal memory

MemoryRouter indexes memory by meaning and by time. Recent context is weighted higher, while important facts persist long term. Retrieval balances immediate context, recent history, and long-term memory automatically. No configuration required.


Error codes

| Code | Meaning | Common causes |
| --- | --- | --- |
| 400 | Bad request | Missing required fields, invalid JSON |
| 401 | Unauthorized | Invalid Memory Key, missing provider key |
| 402 | Payment required | No card on file (upload) or balance exhausted |
| 413 | Payload too large | Upload exceeds 10,000 lines |
| 429 | Rate limited | Too many requests |
| 500 | Internal error | Server-side issue |
| 502 | Provider error | Upstream provider failed |

Error response:

{
  "error": "No API key configured for provider: anthropic",
  "hint": "Add your anthropic API key in your account settings, or pass X-Provider-Key header"
}
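
A sketch of turning an error response into something a caller can act on. Treating 429, 500, and 502 as retryable is an assumption of this example, not guidance from MemoryRouter, and `classify_error` is a hypothetical name. The `error` and `hint` fields come from the response shape above.

```python
import json
from typing import Any, Dict

# Assumption: only rate limits and server or provider failures merit a retry.
RETRYABLE_STATUSES = {429, 500, 502}


def classify_error(status: int, body: str) -> Dict[str, Any]:
    """Summarize an error response as status, retryable flag, error, and hint."""
    try:
        payload = json.loads(body)
    except ValueError:  # body was not valid JSON
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    return {
        "status": status,
        "retryable": status in RETRYABLE_STATUSES,
        "error": payload.get("error"),
        "hint": payload.get("hint"),
    }
```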