Multi-Agent Pipeline

The DAG-parallel execution strategy uses four specialized agents in a fixed order: Planner → Executor → Critic → Synthesizer. This page describes each agent and how they collaborate.

Pipeline Overview

Task (natural language)
          ↓
┌───────────────────┐
│     Planner       │  → TaskDAG (nodes + dependencies)
└───────────────────┘
          ↓
┌───────────────────┐
│    Executor       │  → Code per node, run in sandbox, collect artifacts
└───────────────────┘
          ↓
┌───────────────────┐
│     Critic        │  → Validate each result; retry or mark failed
└───────────────────┘
          ↓
┌───────────────────┐
│   Synthesizer     │  → Final report with provenance
└───────────────────┘

Execution is level-by-level: all nodes in the same topological level run in parallel. Failed nodes can be retried (reflection-guided, deep retry, or LATM); downstream dependents of a failed node are skipped.
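The level-by-level scheduling described above can be sketched with networkx's topological generations. This is an illustrative sketch only: the real executor runs sandboxed code per node, while `run_node` here is a hypothetical callable returning success or failure; skipped dependents are recorded in the same failed set.

```python
# Sketch of level-by-level DAG execution. TaskDAG internals and
# run_node are hypothetical stand-ins, not the project's actual API.
import networkx as nx
from concurrent.futures import ThreadPoolExecutor

def run_level_by_level(dag: nx.DiGraph, run_node):
    failed = set()
    # Each generation contains nodes whose dependencies all lie in
    # earlier generations, so a whole generation can run in parallel.
    for level in nx.topological_generations(dag):
        # Skip (and record) any node downstream of a failed node.
        runnable = [n for n in level
                    if not any(a in failed for a in nx.ancestors(dag, n))]
        failed.update(n for n in level if n not in runnable)
        with ThreadPoolExecutor() as pool:
            results = dict(zip(runnable, pool.map(run_node, runnable)))
        failed.update(n for n, ok in results.items() if not ok)
    return failed
```

Note that skipping is transitive: once `c` fails, every descendant of `c` is excluded from later levels without being executed.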

Planner Agent

Input: Natural language task, optional episodic context (past similar analyses), tool hints from retrieval. Output: A TaskDAG — a directed acyclic graph where each node has:
  • name, description
  • task_type (e.g. analysis, retrieval, reasoning, validation, synthesis, visualization, preprocessing, annotation)
  • domain (biological domain)
  • tools_needed, depends_on, language, priority
Process:
  1. The reasoning LLM is invoked with a structured prompt that specifies the JSON output format.
  2. Nodes are created with temporary dependency names.
  3. Dependencies are resolved to node IDs; edges are added.
  4. The graph is validated (e.g. with networkx.is_directed_acyclic_graph()) to confirm it contains no cycles.
Episodic context (from build_episodic_context()) is injected into the Planner prompt so it can reuse successful pipeline patterns and avoid past mistakes.
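Steps 2–4 above (temporary names, ID resolution, acyclicity check) might look like the following sketch. The `TaskNode` dataclass and `build_dag` helper are hypothetical; only the listed node fields and the networkx check come from this page.

```python
# Illustrative sketch of the Planner's output structure; class and
# helper names are assumptions, the fields mirror the list above.
from dataclasses import dataclass, field
import networkx as nx

@dataclass
class TaskNode:
    name: str
    description: str
    task_type: str            # e.g. "analysis", "retrieval", "synthesis"
    domain: str               # biological domain
    tools_needed: list = field(default_factory=list)
    depends_on: list = field(default_factory=list)  # temporary names
    language: str = "python"
    priority: int = 0

def build_dag(nodes: list) -> nx.DiGraph:
    dag = nx.DiGraph()
    ids = {n.name: i for i, n in enumerate(nodes)}  # names -> node IDs
    for n in nodes:
        dag.add_node(ids[n.name], spec=n)
        for dep in n.depends_on:
            dag.add_edge(ids[dep], ids[n.name])     # dependency edge
    if not nx.is_directed_acyclic_graph(dag):
        raise ValueError("Planner produced a cyclic graph")
    return dag
```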

Executor Agent

Input: A single DAG node (subtask), full task description, tool registry (or retrieved tools). Output: Executable code, execution result, and collected artifacts. Process:
  1. Tool selection — Hybrid retriever (or LLM) selects tools relevant to this node.
  2. Code generation — The coder LLM generates code that calls the selected tools and uses outputs from dependency nodes.
  3. Execution — Code runs in the SandboxExecutor (subprocess or Docker), with timeout and resource limits.
  4. Artifact collection — Output files (plots, tables, etc.) are detected by extension and copied to the run directory.
When no suitable tool exists, the ToolCreatorAgent (LATM) can design, validate, and register a new tool for this and future runs.
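Step 4 (artifact collection by extension) is the most mechanical part of the Executor and can be sketched directly. The extension set and directory names below are illustrative assumptions, not the project's actual configuration.

```python
# Sketch of extension-based artifact collection (Executor step 4);
# ARTIFACT_EXTS and the directory layout are assumed, not verified.
import shutil
from pathlib import Path

ARTIFACT_EXTS = {".png", ".svg", ".csv", ".tsv", ".json", ".html"}

def collect_artifacts(workdir: Path, run_dir: Path) -> list:
    """Copy recognized output files from the sandbox workdir
    into the run directory and return the copied paths."""
    run_dir.mkdir(parents=True, exist_ok=True)
    collected = []
    for path in sorted(workdir.rglob("*")):
        if path.is_file() and path.suffix.lower() in ARTIFACT_EXTS:
            dest = run_dir / path.name
            shutil.copy2(path, dest)
            collected.append(dest)
    return collected
```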

Critic Agent

Input: Execution result (stdout, stderr, artifacts, code). Output: A ValidationResult with:
  • passed (boolean)
  • quality_score (0–1)
  • issues (list of specific problems)
  • retry_guidance (for the Executor or self-refinement engine)
Validation criteria:
  1. Scientific correctness
  2. Completeness
  3. Consistency with upstream results
  4. Statistical validity (e.g. value ranges)
  5. Error detection (exceptions, empty outputs)
The Critic also performs a holistic final validation of all results against the original task after all nodes complete.
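The ValidationResult shape and the cheapest criterion (5, error detection) can be sketched as follows. This is a minimal illustration: the real Critic's scoring of correctness, completeness, and consistency is LLM-driven, and the heuristics below are assumptions.

```python
# Hypothetical sketch of the Critic's output structure plus a minimal
# error-detection pass (criterion 5). Real scoring is LLM-driven.
from dataclasses import dataclass, field

@dataclass
class ValidationResult:
    passed: bool
    quality_score: float          # 0-1
    issues: list = field(default_factory=list)
    retry_guidance: str = ""

def detect_errors(stdout: str, stderr: str, artifacts: list) -> ValidationResult:
    issues = []
    if "Traceback" in stderr:
        issues.append("uncaught exception in stderr")
    if not stdout.strip() and not artifacts:
        issues.append("empty output: no stdout and no artifacts")
    passed = not issues
    return ValidationResult(
        passed=passed,
        quality_score=1.0 if passed else 0.0,
        issues=issues,
        retry_guidance="; ".join(issues),  # fed back to the Executor
    )
```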

Synthesizer Agent

Input: TaskDAG with results per node, original task description. Output: A structured scientific report (Markdown) with:
  • Title and executive summary
  • Methodology
  • Key findings
  • Generated data/artifacts (with paths/links)
  • Limitations and recommended next steps
Each finding is linked to its originating DAG node (provenance). The Synthesizer uses a per-step character budget so long pipelines fit within the coder model’s context window: it dynamically computes a budget from the number of steps and truncates each step’s result text while preserving head and tail.
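The head-and-tail truncation under a per-step character budget might be sketched as below. Function names, the truncation marker, and the minimum-budget floor are illustrative assumptions.

```python
# Sketch of per-step budgeting for the Synthesizer: divide the overall
# character budget by the number of steps, then truncate each step's
# result text while keeping its head and tail. Names are assumptions.
MARKER = " ...[truncated]... "

def per_step_budget(total_chars: int, n_steps: int) -> int:
    # Even split across steps, with an assumed floor so tiny budgets
    # still carry some signal per step.
    return max(200, total_chars // max(1, n_steps))

def truncate_step(text: str, budget: int) -> str:
    if len(text) <= budget:
        return text
    half = (budget - len(MARKER)) // 2
    return text[:half] + MARKER + text[-half:]
```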

Self-Correction Integration

When the Critic marks a result as failed or low-quality:
  1. Level 1 — Reflection-guided repair: LLM reflects on the failure (root cause, fix strategy), then generates a targeted fix. A progressive error chain prevents repeating the same failed approach. Up to 3 standard retries.
  2. Level 2 — Deep retry: Reasoning model does a deeper analysis; may use successful peer code as reference; generates an alternative approach. Up to 2 deep retries.
  3. Level 3 — Plan revision: If multiple branches fail, the SelfRefineEngine can suggest a revised plan that keeps successful steps and replaces failed ones.
  4. Level 4 — Report refinement: After synthesis, the report is iteratively self-reviewed and improved until a quality threshold is met.
See Self-Correction for details.
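Levels 1 and 2 of this ladder amount to an escalating retry loop over a shared error chain. The sketch below is a hypothetical control-flow illustration: `reflect_fix` and `deep_fix` stand in for the reflection and deep-retry engines, and `Result` is an assumed wrapper.

```python
# Hypothetical sketch of the retry ladder (levels 1-2 above). The
# progressive error chain is passed to each repair attempt so the
# same failed approach is not repeated.
from dataclasses import dataclass

@dataclass
class Result:
    ok: bool
    error: str = ""

def retry_with_escalation(run, reflect_fix, deep_fix,
                          max_standard=3, max_deep=2):
    error_chain = []                        # history of failed attempts
    result = run()
    for _ in range(max_standard):           # Level 1: reflection-guided
        if result.ok:
            return result
        error_chain.append(result.error)
        result = reflect_fix(error_chain)
    for _ in range(max_deep):               # Level 2: deep retry
        if result.ok:
            return result
        error_chain.append(result.error)
        result = deep_fix(error_chain)
    return result                           # may escalate to plan revision
```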

Token and Context Guards

  • BaseAgent (used by all agents) truncates system prompt, extra context, and user message to stay within the input budget for the active LLM role (see Context Window and Budget).
  • Synthesizer uses a per-step character budget so the combined context of all step results does not exceed the coder model’s context window.

Next Steps