Multi-Agent Pipeline

The DAG-parallel execution strategy uses four specialized agents in a fixed order: Planner → Executor → Critic → Synthesizer. This page describes each agent and how they collaborate.

Pipeline Overview

Task (natural language)
          ↓
┌───────────────────┐
│     Planner       │  → TaskDAG (nodes + dependencies)
└───────────────────┘
          ↓
┌───────────────────┐
│    Executor       │  → Code per node, run in sandbox, collect artifacts
└───────────────────┘
          ↓
┌───────────────────┐
│     Critic        │  → Validate each result; retry or mark failed
└───────────────────┘
          ↓
┌───────────────────┐
│   Synthesizer     │  → Final report with provenance
└───────────────────┘

Execution is level-by-level: all nodes in the same topological level run in parallel. Failed nodes can be retried (reflection-guided, deep retry, or LATM); downstream dependents of a failed node are skipped.
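The level-by-level scheduling described above can be sketched with networkx's topological generations. This is an illustrative sketch only: the real executor runs sandboxed code per node, while `run_node` here is a hypothetical callable returning success or failure; skipped dependents are recorded in the same failed set.

```python
# Sketch of level-by-level DAG execution. TaskDAG internals and
# run_node are hypothetical stand-ins, not the project's actual API.
import networkx as nx
from concurrent.futures import ThreadPoolExecutor

def run_level_by_level(dag: nx.DiGraph, run_node):
    failed = set()
    # Each generation contains nodes whose dependencies all lie in
    # earlier generations, so a whole generation can run in parallel.
    for level in nx.topological_generations(dag):
        # Skip (and record) any node downstream of a failed node.
        runnable = [n for n in level
                    if not any(a in failed for a in nx.ancestors(dag, n))]
        failed.update(n for n in level if n not in runnable)
        with ThreadPoolExecutor() as pool:
            results = dict(zip(runnable, pool.map(run_node, runnable)))
        failed.update(n for n, ok in results.items() if not ok)
    return failed
```

Note that skipping is transitive: once `c` fails, every descendant of `c` is excluded from later levels without being executed.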

Planner Agent

Input: Natural language task, optional episodic context (past similar analyses), tool hints from retrieval. Output: A TaskDAG — a directed acyclic graph where each node has:
  • name, description
  • task_type (e.g. analysis, retrieval, reasoning, validation, synthesis, visualization, preprocessing, annotation)
  • domain (biological domain)
  • tools_needed, depends_on, language, priority
Process:
  1. The reasoning LLM is invoked with a structured prompt that specifies the JSON output format.
  2. Nodes are created with temporary dependency names.
  3. Dependencies are resolved to node IDs; edges are added.
  4. The graph is validated (e.g. with networkx.is_directed_acyclic_graph()) to confirm it contains no cycles.
Episodic context (from build_episodic_context()) is injected into the Planner prompt so it can reuse successful pipeline patterns and avoid past mistakes.
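Steps 2–4 above (temporary names, ID resolution, acyclicity check) might look like the following sketch. The `TaskNode` dataclass and `build_dag` helper are hypothetical; only the listed node fields and the networkx check come from this page.

```python
# Illustrative sketch of the Planner's output structure; class and
# helper names are assumptions, the fields mirror the list above.
from dataclasses import dataclass, field
import networkx as nx

@dataclass
class TaskNode:
    name: str
    description: str
    task_type: str            # e.g. "analysis", "retrieval", "synthesis"
    domain: str               # biological domain
    tools_needed: list = field(default_factory=list)
    depends_on: list = field(default_factory=list)  # temporary names
    language: str = "python"
    priority: int = 0

def build_dag(nodes: list) -> nx.DiGraph:
    dag = nx.DiGraph()
    ids = {n.name: i for i, n in enumerate(nodes)}  # names -> node IDs
    for n in nodes:
        dag.add_node(ids[n.name], spec=n)
        for dep in n.depends_on:
            dag.add_edge(ids[dep], ids[n.name])     # dependency edge
    if not nx.is_directed_acyclic_graph(dag):
        raise ValueError("Planner produced a cyclic graph")
    return dag
```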

Executor Agent

Input: A single DAG node (subtask), full task description, tool registry (or retrieved tools). Output: Executable code, execution result, and collected artifacts. Process:
  1. Tool selection — Hybrid retriever (or LLM) selects tools relevant to this node.
  2. Code generation — The coder LLM generates code that calls the selected tools and uses outputs from dependency nodes.
  3. Execution — Code runs in the SandboxExecutor (subprocess or Docker), with timeout and resource limits.
  4. Artifact collection — Output files (plots, tables, etc.) are detected by extension and copied to the run directory.
When no suitable tool exists, the ToolCreatorAgent (LATM) can design, validate, and register a new tool for this and future runs.
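Step 4 (artifact collection by extension) is the most mechanical part of the Executor and can be sketched directly. The extension set and directory names below are illustrative assumptions, not the project's actual configuration.

```python
# Sketch of extension-based artifact collection (Executor step 4);
# ARTIFACT_EXTS and the directory layout are assumed, not verified.
import shutil
from pathlib import Path

ARTIFACT_EXTS = {".png", ".svg", ".csv", ".tsv", ".json", ".html"}

def collect_artifacts(workdir: Path, run_dir: Path) -> list:
    """Copy recognized output files from the sandbox workdir
    into the run directory and return the copied paths."""
    run_dir.mkdir(parents=True, exist_ok=True)
    collected = []
    for path in sorted(workdir.rglob("*")):
        if path.is_file() and path.suffix.lower() in ARTIFACT_EXTS:
            dest = run_dir / path.name
            shutil.copy2(path, dest)
            collected.append(dest)
    return collected
```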

Critic Agent

Input: Execution result (stdout, stderr, artifacts, code). Output: A ValidationResult with:
  • passed (boolean)
  • quality_score (0–1)
  • issues (list of specific problems)
  • retry_guidance (for the Executor or self-refinement engine)
Validation criteria:
  1. Scientific correctness
  2. Completeness
  3. Consistency with upstream results
  4. Statistical validity (e.g. value ranges)
  5. Error detection (exceptions, empty outputs)
The Critic also performs a holistic final validation of all results against the original task after all nodes complete.
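The ValidationResult shape and the cheapest criterion (5, error detection) can be sketched as follows. This is a minimal illustration: the real Critic's scoring of correctness, completeness, and consistency is LLM-driven, and the heuristics below are assumptions.

```python
# Hypothetical sketch of the Critic's output structure plus a minimal
# error-detection pass (criterion 5). Real scoring is LLM-driven.
from dataclasses import dataclass, field

@dataclass
class ValidationResult:
    passed: bool
    quality_score: float          # 0-1
    issues: list = field(default_factory=list)
    retry_guidance: str = ""

def detect_errors(stdout: str, stderr: str, artifacts: list) -> ValidationResult:
    issues = []
    if "Traceback" in stderr:
        issues.append("uncaught exception in stderr")
    if not stdout.strip() and not artifacts:
        issues.append("empty output: no stdout and no artifacts")
    passed = not issues
    return ValidationResult(
        passed=passed,
        quality_score=1.0 if passed else 0.0,
        issues=issues,
        retry_guidance="; ".join(issues),  # fed back to the Executor
    )
```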

Synthesizer Agent

Input: TaskDAG with results per node, original task description. Output: A structured scientific report (Markdown) with:
  • Title and executive summary
  • Methodology
  • Key findings
  • Generated data/artifacts (with paths/links)
  • Limitations and recommended next steps
Each finding is linked to its originating DAG node (provenance). The Synthesizer uses a per-step character budget so long pipelines fit within the coder model’s context window: it dynamically computes a budget from the number of steps and truncates each step’s result text while preserving head and tail.
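The head-and-tail truncation under a per-step character budget might be sketched as below. Function names, the truncation marker, and the minimum-budget floor are illustrative assumptions.

```python
# Sketch of per-step budgeting for the Synthesizer: divide the overall
# character budget by the number of steps, then truncate each step's
# result text while keeping its head and tail. Names are assumptions.
MARKER = " ...[truncated]... "

def per_step_budget(total_chars: int, n_steps: int) -> int:
    # Even split across steps, with an assumed floor so tiny budgets
    # still carry some signal per step.
    return max(200, total_chars // max(1, n_steps))

def truncate_step(text: str, budget: int) -> str:
    if len(text) <= budget:
        return text
    half = (budget - len(MARKER)) // 2
    return text[:half] + MARKER + text[-half:]
```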

Self-Correction Integration

When the Critic marks a result as failed or low-quality:
  1. Level 1 — Reflection-guided repair: LLM reflects on the failure (root cause, fix strategy), then generates a targeted fix. A progressive error chain prevents repeating the same failed approach. Up to 3 standard retries.
  2. Level 2 — Deep retry: Reasoning model does a deeper analysis; may use successful peer code as reference; generates an alternative approach. Up to 2 deep retries.
  3. Level 3 — Plan revision: If multiple branches fail, the SelfRefineEngine can suggest a revised plan that keeps successful steps and replaces failed ones.
  4. Level 4 — Report refinement: After synthesis, the report is iteratively self-reviewed and improved until a quality threshold is met.
See Self-Correction for details.
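Levels 1 and 2 of this ladder amount to an escalating retry loop over a shared error chain. The sketch below is a hypothetical control-flow illustration: `reflect_fix` and `deep_fix` stand in for the reflection and deep-retry engines, and `Result` is an assumed wrapper.

```python
# Hypothetical sketch of the retry ladder (levels 1-2 above). The
# progressive error chain is passed to each repair attempt so the
# same failed approach is not repeated.
from dataclasses import dataclass

@dataclass
class Result:
    ok: bool
    error: str = ""

def retry_with_escalation(run, reflect_fix, deep_fix,
                          max_standard=3, max_deep=2):
    error_chain = []                        # history of failed attempts
    result = run()
    for _ in range(max_standard):           # Level 1: reflection-guided
        if result.ok:
            return result
        error_chain.append(result.error)
        result = reflect_fix(error_chain)
    for _ in range(max_deep):               # Level 2: deep retry
        if result.ok:
            return result
        error_chain.append(result.error)
        result = deep_fix(error_chain)
    return result                           # may escalate to plan revision
```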

Token and Context Guards

  • BaseAgent (used by all agents) truncates system prompt, extra context, and user message to stay within the input budget for the active LLM role (see Context Window and Budget).
  • Synthesizer uses a per-step character budget so the combined context of all step results does not exceed the coder model’s context window.

Next Steps