Context Window and Token Budget

BioCortex is context-window aware: it knows each model’s limits and enforces token budgets on every LLM call so long conversations and large plans do not cause overflow or undefined behavior. Working memory size can be auto-calibrated from the active model.

Why This Matters

  • Models have different context sizes (e.g. Qwen3-max 262K, Claude 200K, smaller models 8K–32K).
  • Prompts include: system prompt, working memory, episodic context, task, and (in synthesis) per-step results. Without budgets, switching models or running long pipelines can exceed the context window.
  • Guards ensure that the combined input is truncated to a safe budget before calling the LLM, and that the Synthesizer does not pack too much result text into a single request.

Model Context Window Table

A table MODEL_CONTEXT_WINDOWS (in biocortex.config) maps model identifiers to:
  • max_input_tokens
  • max_output_tokens
Examples: qwen3-max, qwen-max, claude-sonnet-4-20250514, gpt-4o, etc. Resolution tries an exact match first, then a prefix match, then a substring match; unknown models fall back to a conservative default (e.g. 128K input / 4K output).
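The three-stage resolution order can be sketched as follows. The table entries, values, and function name here are illustrative stand-ins, not the actual contents of biocortex.config:

```python
# Illustrative context-window table; the real one lives in
# biocortex.config.MODEL_CONTEXT_WINDOWS and covers more models.
MODEL_CONTEXT_WINDOWS = {
    "qwen3-max": {"max_input_tokens": 262_144, "max_output_tokens": 32_768},
    "claude-sonnet-4-20250514": {"max_input_tokens": 200_000, "max_output_tokens": 64_000},
    "gpt-4o": {"max_input_tokens": 128_000, "max_output_tokens": 16_384},
}
DEFAULT_WINDOW = {"max_input_tokens": 128_000, "max_output_tokens": 4_096}

def resolve_window(model: str) -> dict:
    # 1. Exact match
    if model in MODEL_CONTEXT_WINDOWS:
        return MODEL_CONTEXT_WINDOWS[model]
    # 2. Prefix match (e.g. "gpt-4o-2024-08-06" matches "gpt-4o")
    for name, window in MODEL_CONTEXT_WINDOWS.items():
        if model.startswith(name):
            return window
    # 3. Substring match
    for name, window in MODEL_CONTEXT_WINDOWS.items():
        if name in model:
            return window
    # 4. Conservative fallback for unknown models
    return DEFAULT_WINDOW
```

Prefix matching before substring matching means dated model variants resolve to their base entry without accidentally matching a shorter, unrelated name.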

Token Estimation

  • estimate_tokens(text) uses a hybrid heuristic:
    • Non-CJK: ~1 token per 4 characters.
    • CJK: ~2 tokens per character.
  • Slight overestimate for safety. Used for working memory compression, budget checks, and episodic truncation.
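A minimal version of such a hybrid heuristic might look like this. The CJK character ranges below are an assumption; the real estimate_tokens may cover more scripts or use different boundaries:

```python
import re

# Rough CJK coverage: Han, Hiragana/Katakana, Hangul (an assumption;
# the real heuristic may use wider ranges).
_CJK = re.compile(r"[\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af]")

def estimate_tokens(text: str) -> int:
    cjk = len(_CJK.findall(text))      # ~2 tokens per CJK character
    other = len(text) - cjk            # ~1 token per 4 other characters
    return cjk * 2 + (other + 3) // 4  # round up, erring on the high side
```

Rounding up and weighting CJK at 2 tokens per character keeps the estimate slightly pessimistic, which is the safe direction for budget checks.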

Auto-Calibration of Working Memory

  • memory.working_memory_max_tokens can be set to -1 (sentinel).
  • At config build time, _calibrate_memory_budget() sets it to 60% of the reasoning model’s max input, clamped between 16K and 600K.
  • So when you switch the reasoning model (e.g. to Qwen3-max), working memory size adapts automatically.
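The calibration step can be sketched as below; the 60% ratio and the [16K, 600K] clamp come from the description above, while the function signature is an illustration:

```python
AUTO = -1  # sentinel: "calibrate from the active reasoning model"

def calibrate_memory_budget(configured: int, reasoning_max_input: int) -> int:
    if configured != AUTO:
        return configured                      # explicit setting wins
    budget = int(reasoning_max_input * 0.6)    # 60% of the model's max input
    return max(16_000, min(budget, 600_000))   # clamp to [16K, 600K]
```

With a 262K-input model this yields roughly 157K tokens of working memory; with a small 8K model the lower clamp keeps the budget from collapsing below a usable floor.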

Per-Call Budget (BaseAgent)

Every agent (Planner, Executor, Critic, Synthesizer) inherits from BaseAgent, which provides:
  1. _get_input_budget(role) — for the LLM role (reasoning/coder/fast), returns:
    • max_input - output_reserve - 512
    • output_reserve = min(max_output, config.max_tokens) * 0.15
  2. _invoke_llm — Before calling the LLM:
    • Estimates total tokens for: system prompt, extra context (e.g. working memory, episodic context), user message.
    • If total exceeds the budget: truncates extra context and user message (optionally keeping the tail); only truncates the system prompt if still over.
    • Priority: keep system prompt as intact as possible, then balance context and message.
So every LLM call is guarded against overflow.
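The two steps above can be sketched together. This uses a simplified non-CJK token estimate, and guard_prompt is a hypothetical stand-in for the truncation logic inside _invoke_llm, which is richer in practice:

```python
def get_input_budget(max_input: int, max_output: int, config_max_tokens: int) -> int:
    output_reserve = int(min(max_output, config_max_tokens) * 0.15)
    return max_input - output_reserve - 512

def estimate_tokens(text: str) -> int:
    return (len(text) + 3) // 4  # simplified non-CJK estimate

def guard_prompt(system: str, context: str, message: str, budget: int):
    def fit(text: str, tok_budget: int, keep_tail: bool = True) -> str:
        limit = max(tok_budget, 0) * 4             # tokens -> characters
        if len(text) <= limit:
            return text
        if limit == 0:
            return ""
        return text[-limit:] if keep_tail else text[:limit]

    if estimate_tokens(system) + estimate_tokens(context) + estimate_tokens(message) <= budget:
        return system, context, message
    # Priority: keep the system prompt intact; split what remains between
    # the extra context (keeping its tail: recent info matters most) and
    # the user message.
    remaining = budget - estimate_tokens(system)
    context = fit(context, remaining // 2)
    message = fit(message, remaining - estimate_tokens(context))
    if estimate_tokens(system) + estimate_tokens(context) + estimate_tokens(message) > budget:
        # Last resort: trim the system prompt itself.
        system = fit(system, budget - estimate_tokens(context) - estimate_tokens(message),
                     keep_tail=False)
    return system, context, message
```

The extra 512-token margin absorbs estimation error, so a slightly optimistic token count still cannot push the real prompt past the model's limit.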

Synthesizer Per-Step Budget

The Synthesizer receives all step results to produce the final report. For long DAGs, concatenating every step's result can exceed the coder model's context window.
  • _compute_per_step_char_budget(num_steps):
    • Takes the coder model’s max input.
    • Subtracts fixed reserves for system prompt and task description.
    • Divides the remainder by num_steps.
    • Converts to characters (×4) and clamps to e.g. [500, 8000] per step.
  • Each step’s result string is truncated to that length while preserving head and tail (so the beginning and end of each result remain visible). This keeps the total context within the model’s window.
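A sketch of this computation follows. Only the ×4 character conversion and the [500, 8000] clamp come from the description above; the reserve sizes and the truncation marker are assumptions:

```python
SYSTEM_RESERVE_TOKENS = 2_000  # assumed reserve for the system prompt
TASK_RESERVE_TOKENS = 1_000    # assumed reserve for the task description

def compute_per_step_char_budget(num_steps: int, coder_max_input: int) -> int:
    usable = coder_max_input - SYSTEM_RESERVE_TOKENS - TASK_RESERVE_TOKENS
    per_step_tokens = usable // max(num_steps, 1)
    return max(500, min(per_step_tokens * 4, 8_000))  # tokens -> chars, clamped

def truncate_keep_head_tail(result: str, char_budget: int) -> str:
    if len(result) <= char_budget:
        return result
    marker = "\n...[truncated]...\n"
    half = max((char_budget - len(marker)) // 2, 0)
    return result[:half] + marker + result[len(result) - half:]
```

Short pipelines get the full 8,000-character ceiling per step; very long pipelines degrade gracefully toward the 500-character floor instead of overflowing the window, and the head/tail split keeps both the opening and the conclusion of each result visible.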

Summary

  • Model table + token estimation + per-role input budget + truncation in _invoke_llm prevent context overflow on all agent calls.
  • Working memory can auto-calibrate to the reasoning model.
  • Synthesizer uses a dynamic per-step character budget so long pipelines still fit in the coder’s context.

Next Steps