toolkami
Effective long context at lower cost.
Research-backed context folding for LLM agents with an easy-to-use, drop-in API.
Measured savings
Toolkami reduces context bloat and improves recall across long-running agent workflows.
Effective context length: up to 100x
Recall quality: ~28-33% improvement
Cost reduction: up to 2x
How it works
Context is treated as part of the environment, which the LLM can interact with symbolically.
[Diagram: a query and its context enter Toolkami; the LLM interacts with the context through a REPL and returns the response.]
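As a rough illustration of what that interaction could look like, here is a minimal sketch. The names (ContextRepl, search, fold) are hypothetical, not Toolkami's actual API:

# Hypothetical sketch: the transcript lives outside the prompt, and the model
# operates on it through REPL-style calls instead of reading it verbatim.
class ContextRepl:
    def __init__(self, history):
        self.history = list(history)

    def search(self, term):
        # Return only the lines the model asked for.
        return [line for line in self.history if term in line]

    def fold(self, start, end, summary):
        # Replace a span of the transcript with a one-line summary,
        # cutting tokens while keeping a pointer to what happened.
        self.history[start:end] = [f"[folded: {summary}]"]

repl = ContextRepl([
    "user: set up the project",
    "tool: pip install completed",
    "user: run the test suite",
    "tool: 3 failures, stack trace in test_io.py",
])
print(repl.search("stack trace"))   # pull in just the relevant lines
repl.fold(0, 2, "project setup")    # compress the part that no longer matters
print(repl.history)

The point of the sketch: the full history stays addressable, but only the slices the model requests ever enter the prompt.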
Why Toolkami?
Built for teams that need reliable long context with predictable token costs.
Context folding cuts token bloat while preserving the details that matter to the task.
Drop-in endpoints for retrieval, folding, and summaries that keep agents focused.
Secure implementation
Locked-down REPL with zero data retention.
Integration
Drop-in replacement for existing LLM calls with the same interface.
Examples
Drop-in:
import llm
response = llm.completion(prompt, model)
Codex CLI:
export OPENAI_BASE_URL="https://toolkami.com/${endpoint_id}/openai/v1"
codex
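Because the endpoint speaks the OpenAI wire format, any OpenAI-compatible client can point at it. A minimal sketch using the official openai Python SDK (the model name and API-key handling below are assumptions; substitute whatever your deployment uses):

import os
from openai import OpenAI

# Point the standard OpenAI client at a Toolkami endpoint.
# Credential handling is assumed; use whatever your deployment expects.
client = OpenAI(
    base_url="https://toolkami.com/YOUR_ENDPOINT_ID/openai/v1",
    api_key=os.environ["OPENAI_API_KEY"],
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name; use any model your endpoint serves
    messages=[{"role": "user", "content": "Summarize the last test run."}],
)
print(response.choices[0].message.content)

The rest of the calling code is unchanged; only the base URL moves.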