MemoryRouter

API Reference

Complete endpoint documentation for the MemoryRouter API with native support for OpenAI, Anthropic, and Google.

Base URL: https://api.memoryrouter.ai

MemoryRouter provides native endpoints for each major AI provider. Use your provider's SDK with zero code changes—just swap the base URL.


Provider Endpoints

Provider | Endpoint | SDK Compatibility
OpenAI, xAI, DeepSeek, Mistral, Cerebras, OpenRouter | POST /v1/chat/completions | OpenAI SDK
Anthropic | POST /v1/messages | Anthropic SDK
Google Gemini | POST /v1/models/:model:generateContent | Google AI SDK

OpenAI-Compatible Endpoint

POST /v1/chat/completions

Works with OpenAI SDK and any OpenAI-compatible provider (xAI, DeepSeek, Mistral, Cerebras, OpenRouter).

curl:

curl -X POST https://api.memoryrouter.ai/v1/chat/completions \
  -H "Authorization: Bearer mk_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.1",
    "messages": [{"role": "user", "content": "My name is Alice"}]
  }'

Python (OpenAI SDK):

from openai import OpenAI

client = OpenAI(
    base_url="https://api.memoryrouter.ai/v1",
    api_key="mk_xxxxxxxxxxxxxxxx"
)

response = client.chat.completions.create(
    model="openai/gpt-5.1",
    messages=[{"role": "user", "content": "My name is Alice"}]
)
print(response.choices[0].message.content)

TypeScript:

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'https://api.memoryrouter.ai/v1',
  apiKey: 'mk_xxxxxxxxxxxxxxxx'
});

const response = await client.chat.completions.create({
  model: 'openai/gpt-5.1',
  messages: [{ role: 'user', content: 'My name is Alice' }]
});

Provider-Native Parameters:

Parameter | Type | Required | Description
model | string | Yes | Model identifier (e.g., openai/gpt-5.1, anthropic/claude-sonnet-5)
messages | array | Yes | Array of message objects
stream | boolean | No | Enable streaming
temperature | number | No | Sampling temperature (0-2)
max_tokens | number | No | Maximum tokens to generate
top_p | number | No | Nucleus sampling
frequency_penalty | number | No | Frequency penalty (-2 to 2)
presence_penalty | number | No | Presence penalty (-2 to 2)
stop | string/array | No | Stop sequences

Response:

{
  "id": "chatcmpl-abc123",
  "choices": [{
    "message": {"role": "assistant", "content": "Nice to meet you, Alice!"},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 12, "completion_tokens": 15, "total_tokens": 27}
}
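
With stream set to true, the endpoint emits incremental chunks instead of a single response. A minimal streaming sketch with the OpenAI Python SDK, reusing the client configured above:

stream = client.chat.completions.create(
    model="openai/gpt-5.1",
    messages=[{"role": "user", "content": "Tell me a story"}],
    stream=True
)
for chunk in stream:
    # Guard: some chunks carry no content delta (e.g., the final stop chunk)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)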

Anthropic Native Endpoint

POST /v1/messages

Native Anthropic format. Use the Anthropic SDK directly—MemoryRouter accepts Anthropic's request format and returns Anthropic's response format unchanged.

curl:

curl -X POST https://api.memoryrouter.ai/v1/messages \
  -H "x-api-key: mk_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "claude-sonnet-5",
    "max_tokens": 1024,
    "messages": [{"role": "user", "content": "My name is Alice"}]
  }'

Python (Anthropic SDK):

from anthropic import Anthropic

client = Anthropic(
    base_url="https://api.memoryrouter.ai",
    api_key="mk_xxxxxxxxxxxxxxxx"
)

response = client.messages.create(
    model="claude-sonnet-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "My name is Alice"}]
)
print(response.content[0].text)

TypeScript:

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic({
  baseURL: 'https://api.memoryrouter.ai',
  apiKey: 'mk_xxxxxxxxxxxxxxxx'
});

const response = await client.messages.create({
  model: 'claude-sonnet-5',
  max_tokens: 1024,
  messages: [{ role: 'user', content: 'My name is Alice' }]
});

Provider-Native Parameters:

Parameter | Type | Required | Description
model | string | Yes | Anthropic model (e.g., claude-sonnet-5)
messages | array | Yes | Array of message objects
max_tokens | number | Yes | Maximum tokens to generate
system | string | No | System prompt
temperature | number | No | Sampling temperature (0-1)
top_p | number | No | Nucleus sampling
top_k | number | No | Top-k sampling
stop_sequences | array | No | Stop sequences
stream | boolean | No | Enable streaming

Supported Models:

  • claude-opus-4.5
  • claude-sonnet-5
  • claude-haiku-4.5

Response (Native Anthropic format):

{
  "id": "msg_abc123",
  "type": "message",
  "role": "assistant",
  "content": [{"type": "text", "text": "Nice to meet you, Alice!"}],
  "model": "claude-sonnet-5",
  "stop_reason": "end_turn",
  "usage": {"input_tokens": 12, "output_tokens": 15}
}
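
For streaming, the Anthropic Python SDK provides a streaming helper. A minimal sketch, reusing the client configured above:

with client.messages.stream(
    model="claude-sonnet-5",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Tell me a story"}]
) as stream:
    # text_stream yields text deltas as they arrive
    for text in stream.text_stream:
        print(text, end="", flush=True)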

Google Gemini Native Endpoint

POST /v1/models/:model:generateContent

Native Google format. Use Google's AI SDK directly.

curl:

curl -X POST "https://api.memoryrouter.ai/v1/models/gemini-2.5-pro:generateContent" \
  -H "Authorization: Bearer mk_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"role": "user", "parts": [{"text": "My name is Alice"}]}],
    "generationConfig": {"maxOutputTokens": 1024}
  }'

Python (Google AI SDK):

import google.generativeai as genai

# Configure with MemoryRouter endpoint
genai.configure(
    api_key="mk_xxxxxxxxxxxxxxxx",
    transport="rest",
    client_options={"api_endpoint": "api.memoryrouter.ai"}
)

model = genai.GenerativeModel("gemini-2.5-pro")
response = model.generate_content("My name is Alice")
print(response.text)

Streaming:

curl -X POST "https://api.memoryrouter.ai/v1/models/gemini-2.5-pro:streamGenerateContent" \
  -H "Authorization: Bearer mk_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{"role": "user", "parts": [{"text": "Tell me a story"}]}]
  }'

Provider-Native Parameters:

Parameter | Type | Required | Description
contents | array | Yes | Array of content objects
systemInstruction | object | No | System instruction
generationConfig.temperature | number | No | Sampling temperature
generationConfig.topP | number | No | Nucleus sampling
generationConfig.topK | number | No | Top-k sampling
generationConfig.maxOutputTokens | number | No | Max tokens
generationConfig.stopSequences | array | No | Stop sequences
safetySettings | array | No | Safety settings

Response (Native Google format):

{
  "candidates": [{
    "content": {
      "parts": [{"text": "Nice to meet you, Alice!"}],
      "role": "model"
    },
    "finishReason": "STOP"
  }],
  "usageMetadata": {"promptTokenCount": 12, "candidatesTokenCount": 15}
}

Memory Control

Control memory behavior per-request using headers, query parameters, or body parameters. All are stripped before forwarding to the provider.

Headers

Header | Values | Default | Description
X-Memory-Mode | on, off, read, write | on | Memory operation mode
X-Memory-Store | true, false | true | Store user input
X-Memory-Store-Response | true, false | true | Store assistant response
X-Session-ID | string | - | Group conversations into sessions
X-Memory-Key | mk_xxx | - | Alternative auth (use with Authorization: Bearer <provider-key>)
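
Most SDKs let you attach these headers per request. A sketch using the OpenAI Python SDK's extra_headers option, with the header names from the table above and the client assumed configured as in the earlier examples:

response = client.chat.completions.create(
    model="openai/gpt-5.1",
    messages=[{"role": "user", "content": "What's my name?"}],
    # Read-only: recall memories, but don't store this exchange
    extra_headers={"X-Memory-Mode": "read", "X-Session-ID": "user-123-chat-456"}
)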

Query Parameters (shortcuts)

Parameter | Example | Effect
?memory=off | /v1/chat/completions?memory=off | Disable memory entirely
?mode=read | /v1/chat/completions?mode=read | Read-only (don't store this exchange)
?store=false | /v1/chat/completions?store=false | Don't store user input

Body Parameters

Include in request body—they're stripped before forwarding:

{
  "model": "openai/gpt-5.1",
  "messages": [...],
  "memory": false,
  "memory_mode": "read",
  "memory_store": false,
  "memory_store_response": false,
  "session_id": "user-123-chat-456"
}

Parameter | Type | Description
memory | boolean | false disables memory
memory_mode | string | on, off, read, write
memory_store | boolean | Store user input
memory_store_response | boolean | Store assistant response
session_id | string | Session identifier

Memory Modes Explained

Mode | Retrieve | Store | Use Case
on | Yes | Yes | Normal operation (default)
read | Yes | No | Use memory but don't add to it (testing, one-off queries)
write | No | Yes | Store without retrieval (bulk import, backfill)
off | No | No | Stateless request (no memory at all)

Per-Message Memory Control

Exclude specific messages from storage:

{
  "messages": [
    {"role": "user", "content": "Remember this", "memory": true},
    {"role": "user", "content": "Don't remember this", "memory": false}
  ]
}
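
Because the per-message memory flag is not part of the standard OpenAI message schema, a raw HTTP call is the safest way to send it. A sketch with Python's requests library:

import requests

resp = requests.post(
    "https://api.memoryrouter.ai/v1/chat/completions",
    headers={"Authorization": "Bearer mk_xxxxxxxxxxxxxxxx"},
    json={
        "model": "openai/gpt-5.1",
        "messages": [
            {"role": "user", "content": "Remember this", "memory": True},
            {"role": "user", "content": "Don't remember this", "memory": False}
        ]
    }
)
print(resp.json()["choices"][0]["message"]["content"])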

Session Management

Sessions group related conversations. Memory is scoped to sessions when X-Session-ID is provided.

With header:

curl -X POST https://api.memoryrouter.ai/v1/chat/completions \
  -H "Authorization: Bearer mk_xxx" \
  -H "X-Session-ID: user-123-project-456" \
  -d '{"model": "openai/gpt-4o", "messages": [...]}'

With body parameter:

{
  "model": "openai/gpt-5.1",
  "messages": [...],
  "session_id": "user-123-project-456"
}

How it works:

  • Each session gets its own memory space
  • Core memory (no session) stores persistent, cross-session context
  • Session memory is recalled alongside core memory
  • Clear a session without affecting core: DELETE /v1/memory with X-Session-ID

Response Headers

MemoryRouter adds timing headers to every response:

Header | Description
X-MR-Processing-Ms | Total MemoryRouter processing time
X-Provider-Response-Ms | Time waiting for AI provider
X-Total-Ms | End-to-end request time
X-Session-ID | Echo of session ID (if provided)
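
To read these from an SDK call, the OpenAI Python SDK exposes raw responses. A sketch reusing the client configured earlier:

raw = client.chat.completions.with_raw_response.create(
    model="openai/gpt-5.1",
    messages=[{"role": "user", "content": "Hi"}]
)
print(raw.headers.get("X-MR-Processing-Ms"))
print(raw.headers.get("X-Total-Ms"))
completion = raw.parse()  # the usual ChatCompletion object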

Memory Management

GET /v1/memory/stats

Get memory statistics for your key.

curl https://api.memoryrouter.ai/v1/memory/stats \
  -H "Authorization: Bearer mk_xxxxxxxxxxxxxxxx"

Response:

{
  "key": "mk_xxxxxxxxxxxxxxxx",
  "memories": 1247,
  "total_tokens": 89432,
  "oldest": "2024-01-15T10:30:00Z",
  "newest": "2024-02-04T15:02:00Z"
}

DELETE /v1/memory

Clear all memory for your key.

# Clear all memory
curl -X DELETE https://api.memoryrouter.ai/v1/memory \
  -H "Authorization: Bearer mk_xxxxxxxxxxxxxxxx"

# Clear only a specific session
curl -X DELETE https://api.memoryrouter.ai/v1/memory \
  -H "Authorization: Bearer mk_xxxxxxxxxxxxxxxx" \
  -H "X-Session-ID: session-123"

# Full reset (complete memory wipe)
curl -X DELETE "https://api.memoryrouter.ai/v1/memory?reset=true" \
  -H "Authorization: Bearer mk_xxxxxxxxxxxxxxxx"

POST /v1/memory/warmup

Pre-load memory for a faster first request. Useful after cold starts.

curl -X POST https://api.memoryrouter.ai/v1/memory/warmup \
  -H "Authorization: Bearer mk_xxxxxxxxxxxxxxxx"

With session:

curl -X POST https://api.memoryrouter.ai/v1/memory/warmup \
  -H "Authorization: Bearer mk_xxxxxxxxxxxxxxxx" \
  -H "X-Session-ID: session-123"

Response:

{
  "status": "warm",
  "key": "mk_xxxxxxxxxxxxxxxx",
  "memories_loaded": 1336,
  "warmup_ms": 45
}

POST /v1/memory/upload

Bulk import memories from JSONL format.

Requirements: Payment method on file

curl -X POST https://api.memoryrouter.ai/v1/memory/upload \
  -H "Authorization: Bearer mk_xxxxxxxxxxxxxxxx" \
  -H "Content-Type: text/plain" \
  --data-binary @memories.jsonl

JSONL Format (one JSON object per line):

{"content": "User prefers dark mode", "role": "user", "timestamp": 1706900000000}
{"content": "The meeting is scheduled for Friday at 3pm"}
{"content": "Customer is interested in enterprise plan", "role": "assistant"}

Field | Type | Required | Default | Description
content | string | Yes | - | The memory text to store
role | string | No | user | user, assistant, or system
timestamp | number | No | now | Unix timestamp in milliseconds

Response:

{
  "status": "complete",
  "memoryKey": "mk_xxxxxxxxxxxxxxxx",
  "vault": "core",
  "stats": {
    "total": 150,
    "processed": 150,
    "failed": 0
  },
  "message": "Successfully stored 150 memories"
}

Limits:

  • Maximum 10,000 lines per upload
  • Split larger files into batches (see the sketch below)
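
A batching sketch in Python (requests), assuming a local memories.jsonl file and the documented 10,000-line limit:

import requests

BATCH_SIZE = 10_000  # documented per-upload maximum

with open("memories.jsonl") as f:
    lines = f.readlines()

for i in range(0, len(lines), BATCH_SIZE):
    resp = requests.post(
        "https://api.memoryrouter.ai/v1/memory/upload",
        headers={
            "Authorization": "Bearer mk_xxxxxxxxxxxxxxxx",
            "Content-Type": "text/plain"
        },
        data="".join(lines[i:i + BATCH_SIZE])
    )
    resp.raise_for_status()
    print(resp.json()["message"])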

Pass-Through Endpoints

These forward directly to providers without memory processing:

Endpoint | Provider | Description
POST /v1/audio/transcriptions | OpenAI | Whisper transcription
POST /v1/audio/speech | OpenAI | Text-to-speech
POST /v1/images/generations | OpenAI | DALL-E image generation
POST /v1/embeddings | OpenAI | Text embeddings

curl:

curl -X POST https://api.memoryrouter.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer mk_xxxxxxxxxxxxxxxx" \
  -F "file=@audio.mp3" \
  -F "model=whisper-1"

Note: The /v1/embeddings endpoint is a pure pass-through to OpenAI; MemoryRouter performs its own memory processing internally.


GET /v1/models

List available models based on your configured provider API keys. Returns 90+ models across six providers.

curl https://api.memoryrouter.ai/v1/models \
  -H "Authorization: Bearer mk_xxxxxxxxxxxxxxxx"

Response:

{
  "providers": [
    {
      "provider": "OpenAI",
      "models": ["openai/gpt-5.2", "openai/gpt-5.1", "openai/gpt-4.5", "openai/o3", "openai/o1-pro", "openai/o1", "..."]
    },
    {
      "provider": "Anthropic",
      "models": ["anthropic/claude-opus-4.5", "anthropic/claude-sonnet-5", "anthropic/claude-haiku-4.5", "..."]
    },
    {
      "provider": "Google",
      "models": ["google/gemini-3-pro", "google/gemini-3-flash", "google/gemini-2.5-pro", "google/gemini-2.5-flash", "..."]
    },
    {
      "provider": "Meta",
      "models": ["meta/llama-4-maverick", "meta/llama-4-scout", "meta/llama-3.3-70b", "..."]
    },
    {
      "provider": "Mistral",
      "models": ["mistral/mistral-large-3", "mistral/ministral-3-14b", "mistral/mistral-small-3.2", "..."]
    },
    {
      "provider": "xAI",
      "models": ["x-ai/grok-4", "x-ai/grok-4-fast", "x-ai/grok-3", "x-ai/grok-3-mini", "..."]
    }
  ],
  "models": ["openai/gpt-5.1", "anthropic/claude-sonnet-5", "google/gemini-3-flash", "meta/llama-4-maverick", "..."],
  "default": "openai/gpt-5.1",
  "catalog_updated": "2026-02-04T19:28:00Z"
}

Supported Providers & Popular Models:

Provider | Popular Models
OpenAI | gpt-5.2, gpt-5.1, gpt-4.5, o3, o1-pro, o1
Anthropic | claude-opus-4.5, claude-sonnet-5, claude-haiku-4.5
Google | gemini-3-pro, gemini-3-flash, gemini-2.5-pro, gemini-2.5-flash
Meta | llama-4-maverick, llama-4-scout, llama-3.3-70b
Mistral | mistral-large-3, ministral-3-14b, mistral-small-3.2
xAI | grok-4, grok-4-fast, grok-3, grok-3-mini

Use the full model name with provider prefix (e.g., openai/gpt-5.1, anthropic/claude-sonnet-5).
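
A quick way to inspect the catalog from Python (requests), using the response shape shown above:

import requests

resp = requests.get(
    "https://api.memoryrouter.ai/v1/models",
    headers={"Authorization": "Bearer mk_xxxxxxxxxxxxxxxx"}
)
catalog = resp.json()
print(catalog["default"])  # e.g. "openai/gpt-5.1"
for entry in catalog["providers"]:
    print(entry["provider"], "-", len(entry["models"]), "models")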


Account Usage

GET /v1/account/usage

Get token usage for your memory key.

curl "https://api.memoryrouter.ai/v1/account/usage?start=2024-01-01&end=2024-02-01" \
  -H "Authorization: Bearer mk_xxxxxxxxxxxxxxxx"

Response:

{
  "key": "mk_xxxxxxxxxxxxxxxx",
  "period": {"start": "2024-01-01", "end": "2024-02-01"},
  "total_requests": 1547,
  "total_input_tokens": 234567,
  "total_output_tokens": 189234,
  "total_memory_tokens": 89432,
  "by_model": {
    "openai/gpt-5.1": {"requests": 1200, "input_tokens": 180000},
    "anthropic/claude-sonnet-5": {"requests": 347, "input_tokens": 54567}
  }
}

Semantic-Temporal Memory

MemoryRouter uses a semantic-temporal architecture — memories are indexed by meaning and time. Recent context is weighted higher, but important facts persist.

The system automatically balances:

  • Immediate context — Current conversation
  • Recent history — Last few days of interactions
  • Long-term memory — Persistent facts and preferences

No configuration required. The right memories surface at the right time.


Health Check

GET /health

No authentication required.

curl https://api.memoryrouter.ai/health

Response:

{
  "status": "healthy",
  "timestamp": "2024-02-03T14:22:00Z"
}

Error Codes

Code | Meaning | Common Causes
400 | Bad Request | Missing required fields, invalid JSON
401 | Unauthorized | Invalid memory key, missing provider API key
402 | Payment Required | No card on file (for upload endpoint)
413 | Payload Too Large | Upload exceeds 10,000 lines
429 | Rate Limited | Too many requests
500 | Internal Error | Server-side issue
502 | Provider Error | Upstream AI provider failed

Error Response Format:

{
  "error": "No API key configured for provider: anthropic",
  "hint": "Add your anthropic API key in your account settings, or pass X-Provider-Key header"
}
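
A minimal error-handling sketch (Python requests) that surfaces both fields:

import requests

resp = requests.post(
    "https://api.memoryrouter.ai/v1/chat/completions",
    headers={"Authorization": "Bearer mk_xxxxxxxxxxxxxxxx"},
    json={"model": "openai/gpt-5.1",
          "messages": [{"role": "user", "content": "Hi"}]}
)
if not resp.ok:
    body = resp.json()
    print(f"{resp.status_code}: {body.get('error')}")
    if body.get("hint"):
        print("hint:", body["hint"])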

Pass-Through Authentication

For advanced integrations, you can pass your provider API key directly instead of storing it:

curl -X POST https://api.memoryrouter.ai/v1/chat/completions \
  -H "X-Memory-Key: mk_xxxxxxxxxxxxxxxx" \
  -H "Authorization: Bearer sk-your-openai-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-5.1", "messages": [...]}'

Or use X-Provider-Key:

curl -X POST https://api.memoryrouter.ai/v1/chat/completions \
  -H "Authorization: Bearer mk_xxxxxxxxxxxxxxxx" \
  -H "X-Provider-Key: sk-your-openai-key" \
  -d '...'

This enables zero-configuration integrations where your existing code already has provider keys.
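
A sketch of the same pattern through the OpenAI Python SDK, using its default_headers option to carry the memory key while the provider key rides in Authorization:

from openai import OpenAI

client = OpenAI(
    base_url="https://api.memoryrouter.ai/v1",
    api_key="sk-your-openai-key",  # provider key -> Authorization: Bearer
    default_headers={"X-Memory-Key": "mk_xxxxxxxxxxxxxxxx"}
)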
