# How to Cut Your OpenClaw API Costs by 90% (6 Config Changes)
Running OpenClaw is great until you check your API bill.
The default configuration is designed for maximum capability — which means it uses the most expensive model for everything, loads your entire conversation history on every message, and sends heartbeat checks through your paid API. That's fine for testing. It's terrible for your wallet.
The good news? You can fix this in about 15 minutes with a few config changes and some system prompt rules. No complex tooling, no scripts, no infrastructure changes. Just smarter defaults.
Here's what we'll cover — and what each optimization actually saves you:
| Optimization | Monthly Savings |
|---|---|
| Lean session initialization | $80-100 |
| Smart model routing | $40-60 |
| Local heartbeat checks | $5-15 |
| Rate limits and budget caps | Prevents runaway costs |
| Workspace organization | Indirect (fewer wasted tokens) |
| Prompt caching | $50-70 |
Combined, these take a typical setup from $300-500/month down to $30-50/month.
Let's get into it.
## 1. Stop Loading Your Entire History on Every Message
This is the single biggest cost driver most people don't realize they have.
By default, OpenClaw loads around 50KB of context on every single message — past conversations, tool outputs, memory files, everything. That's roughly 12-15K tokens per message, which compounds to 2-3 million tokens over a long session before you've even said anything useful. At Sonnet pricing, that's roughly $4/day just on context loading.
The fix: Add a session initialization rule to your agent's system prompt that tells it exactly what to load and what to skip.
```
SESSION INITIALIZATION RULE:

On every session start:

1. Load ONLY these files:
   - SOUL.md
   - USER.md
   - IDENTITY.md
   - memory/YYYY-MM-DD.md (today's notes, if they exist)

2. DO NOT auto-load:
   - MEMORY.md
   - Session history
   - Prior messages
   - Previous tool outputs

3. When user asks about prior context:
   - Use memory_search() on demand
   - Pull only the relevant snippet
   - Don't load the whole file

4. Update memory/YYYY-MM-DD.md at end of session with:
   - What you worked on
   - Decisions made
   - Next steps
```
This saves 80% on context overhead.
What changes: Your session starts with ~8KB of context instead of 50KB. History loads only when the agent actually needs it — on-demand, not by default.
Impact: Session cost drops from ~$0.40 to ~$0.05.
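The whitelist logic above can be mirrored in code if you script your own session startup. Here's a minimal Python sketch — `build_startup_context` and its behavior are illustrative, not part of OpenClaw's API:

```python
from pathlib import Path

# Files the agent always loads; everything else stays on disk until
# explicitly requested. Mirrors the SESSION INITIALIZATION RULE above.
ALWAYS_LOAD = ["SOUL.md", "USER.md", "IDENTITY.md"]

def build_startup_context(workspace: str, today: str) -> str:
    """Concatenate only the whitelisted files into the startup context.

    Skips MEMORY.md, session history, and prior tool outputs entirely —
    those are fetched on demand, not loaded up front.
    """
    root = Path(workspace)
    candidates = [root / name for name in ALWAYS_LOAD]
    candidates.append(root / "memory" / f"{today}.md")  # today's notes, if any
    parts = []
    for path in candidates:
        if path.exists():
            parts.append(f"## {path.name}\n{path.read_text()}")
    return "\n\n".join(parts)
```

Anything outside the whitelist — including a multi-megabyte `MEMORY.md` — simply never enters the context unless a search pulls a snippet from it.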
## 2. Use the Right Model for the Right Job
If you're running Sonnet for everything, you're paying 12x more than you need to for most tasks.
Think about what your agent actually does in a typical day: checking file status, running commands, formatting text, answering quick questions. Haiku handles all of this perfectly — and it costs $0.00025 per 1K tokens compared to Sonnet's $0.003 per 1K tokens.
Reserve Sonnet for the tasks that genuinely need it: architecture decisions, complex debugging, security analysis, production code review.
Step 1: Set Haiku as default in your config.
Your OpenClaw config lives at `~/.openclaw/openclaw.json`. Update it:

```json
{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-haiku-4-5"
      },
      "models": {
        "anthropic/claude-sonnet-4-5": {
          "alias": "sonnet"
        },
        "anthropic/claude-haiku-4-5": {
          "alias": "haiku"
        }
      }
    }
  }
}
```
Step 2: Add a routing rule to your system prompt.
```
MODEL SELECTION RULE:

Default: Always use Haiku

Switch to Sonnet ONLY when:
- Architecture decisions
- Production code review
- Security analysis
- Complex debugging or multi-step reasoning
- Strategic multi-project decisions

When in doubt: Try Haiku first.
```
Impact: Model costs drop from $50-70/month to $5-10/month.
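If you drive the API from your own scripts, the same routing rule is easy to enforce in code. A minimal sketch — the keyword trigger list is an illustrative guess, not OpenClaw's actual routing logic:

```python
# Keywords that escalate a task to Sonnet, mirroring the
# MODEL SELECTION RULE above. Tune this list to your own workload.
SONNET_TRIGGERS = (
    "architecture", "production review", "security",
    "debug", "multi-step", "strategic",
)

def pick_model(task_description: str) -> str:
    """Default to Haiku; escalate to Sonnet only for the listed task types."""
    text = task_description.lower()
    if any(trigger in text for trigger in SONNET_TRIGGERS):
        return "anthropic/claude-sonnet-4-5"
    return "anthropic/claude-haiku-4-5"
```

Everyday tasks fall through to Haiku by default, which is exactly the "when in doubt, try Haiku first" policy expressed as code.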
## 3. Move Heartbeat Checks Off Your Paid API
OpenClaw sends periodic heartbeat checks to verify your agent is alive and responsive. By default, these go through your paid API — which means you're burning tokens just to check if your agent is running.
If you're running heartbeats every minute, that's 1,440 paid API calls per day doing nothing useful.
The fix is simple: route heartbeats to a free local model using Ollama.
Step 1: Install Ollama and pull a lightweight model.
```bash
# macOS / Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Pull a small, fast model for heartbeats
ollama pull llama3.2:3b
```
Step 2: Update your OpenClaw config to route heartbeats locally.
Add a heartbeat section to your `~/.openclaw/openclaw.json`:

```json
{
  "heartbeat": {
    "every": "1h",
    "model": "ollama/llama3.2:3b",
    "session": "main",
    "target": "slack",
    "prompt": "Check: Any blockers, opportunities, or progress updates needed?"
  }
}
```

Note that this example also dials the frequency back from every minute to hourly — tune `every` to whatever cadence you actually need.
Step 3: Make sure Ollama is running.
```bash
ollama serve

# In another terminal, verify:
ollama run llama3.2:3b "respond with OK"
```
Impact: Heartbeat costs drop from $5-15/month to $0/month. Zero API calls for monitoring.
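If you want to verify the local path yourself, Ollama exposes an HTTP API on `localhost:11434`. Here's a minimal Python sketch of a heartbeat check against its `/api/generate` endpoint — the function names and prompt are our own, not OpenClaw internals:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def heartbeat_payload(model: str, prompt: str) -> dict:
    """Build the request body for Ollama's /api/generate endpoint."""
    return {"model": model, "prompt": prompt, "stream": False}

def send_heartbeat(model: str = "llama3.2:3b") -> str:
    """Run one liveness check against the local model. Zero paid API calls."""
    body = json.dumps(
        heartbeat_payload(model, "Respond with OK if you are alive.")
    ).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["response"]
```

Because the request never leaves your machine, the heartbeat stays free no matter how often it fires.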
## 4. Set Rate Limits and Budget Caps
Even with all the optimizations above, an automated workflow gone wrong can still burn through your budget in minutes. Rate limits are your safety net.
Add this to your system prompt:
```
RATE LIMITS:
- 5 seconds minimum between API calls
- 10 seconds between web searches
- Max 5 searches per batch, then 2-minute break
- Batch similar work (one request for 10 items, not 10 separate requests)
- If you hit a 429 error: STOP, wait 5 minutes, retry

DAILY BUDGET: $5 (warning at 75%)
MONTHLY BUDGET: $200 (warning at 75%)
```
This doesn't directly save money — it prevents catastrophic cost spikes. The kind where you wake up to a $200 bill because your agent got into a retry loop overnight.
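Prompt rules are only as reliable as the model following them, so if you drive the API from your own scripts, it's worth enforcing the same caps in code as a hard backstop. A minimal sketch, assuming you can estimate per-call cost:

```python
import time

class BudgetGuard:
    """Hard-enforce the rate limits and budget caps from the prompt above."""

    def __init__(self, min_interval=5.0, daily_budget=5.0, warn_ratio=0.75):
        self.min_interval = min_interval    # seconds between API calls
        self.daily_budget = daily_budget    # dollars per day
        self.warn_ratio = warn_ratio        # warn at 75% by default
        self.spent_today = 0.0
        self._last_call = 0.0

    def before_call(self):
        """Sleep so calls are at least min_interval apart; halt if over budget."""
        if self.spent_today >= self.daily_budget:
            raise RuntimeError("Daily budget exhausted; stopping.")
        wait = self.min_interval - (time.monotonic() - self._last_call)
        if wait > 0:
            time.sleep(wait)
        self._last_call = time.monotonic()

    def record_cost(self, dollars: float) -> bool:
        """Track spend; return True once the warning threshold is crossed."""
        self.spent_today += dollars
        return self.spent_today >= self.warn_ratio * self.daily_budget
```

A retry loop that would otherwise run all night now hits the `RuntimeError` after $5 and stops.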
## 5. Organize Your Workspace to Minimize Token Waste
Your workspace file structure directly affects how many tokens your agent consumes. If everything is dumped into root-level files, the agent loads more context than it needs.
A clean structure looks like this:
```
~/.openclaw/
├── SOUL.md             ← Core personality and principles (always loaded)
├── USER.md             ← Your info and preferences (always loaded)
├── IDENTITY.md         ← Agent identity (always loaded)
├── TOOLS.md            ← Available tools reference (loaded on demand)
├── memory/
│   ├── 2026-02-28.md   ← Yesterday's notes (loaded on demand)
│   └── 2026-03-01.md   ← Today's notes (always loaded)
└── projects/
    └── [PROJECT]/
        └── REFERENCE.md ← Project docs (loaded on demand, good cache candidate)
```
The principle: Separate stable content (personality, identity, preferences) from dynamic content (daily notes, project work). Stable content gets cached. Dynamic content doesn't.
This structure naturally reduces token usage because the agent only loads what's relevant to the current task, and cached content costs a fraction of fresh content.
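To catch context bloat before it shows up on your bill, you can audit the always-loaded set directly. A small sketch — the file list mirrors the structure above, and the helper itself is hypothetical:

```python
from pathlib import Path

# The files that load on every session start, per the structure above.
ALWAYS_LOADED = ["SOUL.md", "USER.md", "IDENTITY.md"]

def startup_footprint_kb(workspace: str, today_note: str = "") -> float:
    """Sum the size (KB) of the always-loaded files; missing files are skipped.

    Keep this near the ~8KB target — if it creeps up, something dynamic
    has leaked into the stable, always-loaded set.
    """
    root = Path(workspace)
    paths = [root / name for name in ALWAYS_LOADED]
    if today_note:
        paths.append(root / "memory" / today_note)
    total = sum(p.stat().st_size for p in paths if p.exists())
    return total / 1024
```

Run it after editing your workspace files; a jump in the number usually means stable and dynamic content got mixed again.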
## 6. Enable Prompt Caching for Repetitive Content
Every time your agent starts a session, it sends the same system prompt — your SOUL.md, USER.md, IDENTITY.md. That's the same 5-8KB of tokens, paid in full, every single time.
Anthropic's prompt caching gives you a 90% discount on cached content. If you're sending the same system prompt 100 times a day, you pay full price once and 10% for the other 99 times.
Enable caching in your config:
```json
{
  "agents": {
    "defaults": {
      "cache": {
        "enabled": true,
        "ttl": "5m",
        "priority": "high"
      },
      "models": {
        "anthropic/claude-sonnet-4-5": {
          "alias": "sonnet",
          "cache": true
        },
        "anthropic/claude-haiku-4-5": {
          "alias": "haiku",
          "cache": false
        }
      }
    }
  }
}
```
Why cache Sonnet but not Haiku? Haiku is already so cheap that the caching overhead isn't worth the savings. Sonnet has larger prompts and higher per-token costs — caching makes a real difference there.
To maximize cache hits:
- Batch your requests within 5-minute windows — make multiple API calls in quick succession so they share cached context
- Don't edit SOUL.md or USER.md mid-session — any change invalidates the cache. Save your prompt updates for maintenance windows
- Keep reference docs in separate files — stable project docs in their own cached files, daily notes uncached and separate
Real-world impact: If you're running 50 API calls per day with a 5KB system prompt, caching takes the system prompt cost from ~$0.75/week down to ~$0.02/week.
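The savings are straightforward to estimate yourself. A back-of-envelope calculator — the flat `cache_discount` and `cache_hit_rate` parameters are simplifications of Anthropic's actual cache-read pricing, not a billing formula:

```python
def weekly_prompt_cost(calls_per_day: int, prompt_tokens: int,
                       price_per_mtok: float,
                       cache_discount: float = 0.0,
                       cache_hit_rate: float = 0.0) -> float:
    """Estimate weekly system-prompt spend, optionally with prompt caching.

    cache_discount=0.9 models a 90% discount on cache reads;
    cache_hit_rate is the fraction of calls landing inside the cache TTL.
    """
    calls_per_week = calls_per_day * 7
    full_cost = calls_per_week * prompt_tokens * price_per_mtok / 1_000_000
    cached_fraction = cache_hit_rate * cache_discount
    return full_cost * (1 - cached_fraction)
```

Plug in your own call volume, prompt size, and model price; the gap between `cache_hit_rate=0` and a high hit rate is the payoff of batching requests inside the 5-minute TTL window.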
## How to Verify It's Working
After making these changes, start a session and check your status:
```bash
openclaw shell
session_status
```
You should see:
- Context size: 2-8KB (not 50KB+)
- Model: Haiku (not Sonnet)
- Heartbeat: Ollama/local (not API)
If your daily costs are in the $0.10-0.50 range, everything is working.
Troubleshooting:
- Context still large → Make sure the session initialization rule is in your system prompt
- Still using Sonnet for everything → Check your `openclaw.json` syntax and file path
- Heartbeat errors → Verify Ollama is running (`ollama serve`)
- Costs haven't dropped → Confirm your system prompt is actually being loaded
## The Bottom Line
| What | Before | After |
|---|---|---|
| Session initialization | $0.40/session | $0.05/session |
| Model routing | $50-70/month | $5-10/month |
| Heartbeat checks | $5-15/month | $0/month |
| Prompt caching | Full price every call | 90% discount on repeated content |
| Combined monthly cost | $300-500 | $30-50 |
Six config changes. ~15 minutes of work. 90%+ cost reduction.
The intelligence is in the prompt, not the infrastructure. You don't need complex scripts or monitoring dashboards. You need smarter defaults — and now you have them.
Running OpenClaw and want a dedicated server without the DevOps headache? ActivateClaw provisions a fully configured Hetzner VPS with OpenClaw ready to go in minutes — one flat monthly fee, SSL included, your server, your data.