Multi-Agent Pipeline
The DAG-parallel execution strategy uses four specialized agents in a fixed order: Planner → Executor → Critic → Synthesizer. This page describes each agent and how they collaborate.
Pipeline Overview
Planner Agent
Input: Natural language task, optional episodic context (past similar analyses), and tool hints from retrieval.
Output: A TaskDAG — a directed acyclic graph where each node has:
- name, description
- task_type (e.g. analysis, retrieval, reasoning, validation, synthesis, visualization, preprocessing, annotation)
- domain (biological domain)
- tools_needed, depends_on, language, priority
Process:
- The reasoning LLM is invoked with a structured prompt that specifies the JSON output format.
- Nodes are created with temporary dependency names.
- Dependencies are resolved to node IDs; edges are added.
- The graph is validated, e.g. with networkx.is_directed_acyclic_graph().
Episodic context (from build_episodic_context()) is injected into the Planner prompt so it can reuse successful pipeline patterns and avoid past mistakes.
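The node schema and validation step above can be sketched as follows. This is an illustrative reconstruction, not the actual implementation: the page uses networkx.is_directed_acyclic_graph(), but the stdlib graphlib stands in here so the example runs without dependencies, and the class and function names are assumptions.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter, CycleError


@dataclass
class DAGNode:
    """One TaskDAG node; field names follow the schema described above."""
    name: str
    description: str = ""
    task_type: str = "analysis"   # analysis, retrieval, reasoning, ...
    domain: str = ""              # biological domain
    tools_needed: list[str] = field(default_factory=list)
    depends_on: list[str] = field(default_factory=list)
    language: str = "python"
    priority: int = 0


def validate_dag(nodes: list[DAGNode]) -> list[str]:
    """Resolve dependencies and return a valid execution order,
    or raise if the Planner produced a cycle."""
    graph = {n.name: set(n.depends_on) for n in nodes}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"Planner produced a cyclic graph: {exc.args[1]}") from exc
```

The returned order (dependencies first) can also drive parallel scheduling: nodes whose dependencies are all complete may run concurrently.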
Executor Agent
Input: A single DAG node (subtask), the full task description, and the tool registry (or retrieved tools).
Output: Executable code, the execution result, and collected artifacts.
Process:
- Tool selection — the hybrid retriever (or the LLM) selects tools relevant to this node.
- Code generation — The coder LLM generates code that calls the selected tools and uses outputs from dependency nodes.
- Execution — Code runs in the SandboxExecutor (subprocess or Docker), with timeout and resource limits.
- Artifact collection — Output files (plots, tables, etc.) are detected by extension and copied to the run directory.
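The artifact-collection step (detect output files by extension, copy them to the run directory) can be sketched like this. The extension list and function name are assumptions for illustration, not the actual code.

```python
import shutil
from pathlib import Path

# Extensions treated as artifacts; the real set is an assumption here.
ARTIFACT_EXTENSIONS = {".png", ".svg", ".pdf", ".csv", ".tsv", ".json", ".html"}


def collect_artifacts(workdir: Path, run_dir: Path) -> list[Path]:
    """Copy files with known artifact extensions from the sandbox
    working directory into the run directory."""
    run_dir.mkdir(parents=True, exist_ok=True)
    collected = []
    for path in workdir.rglob("*"):
        if path.is_file() and path.suffix.lower() in ARTIFACT_EXTENSIONS:
            dest = run_dir / path.name
            shutil.copy2(path, dest)  # preserves timestamps/metadata
            collected.append(dest)
    return collected
```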
Critic Agent
Input: Execution result (stdout, stderr, artifacts, code).
Output: A ValidationResult with:
- passed (boolean)
- quality_score (0–1)
- issues (list of specific problems)
- retry_guidance (for the Executor or the self-refinement engine)
The Critic checks for:
- Scientific correctness
- Completeness
- Consistency with upstream results
- Statistical validity (e.g. value ranges)
- Error detection (exceptions, empty outputs)
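A minimal sketch of the Critic's output structure and its cheapest check (error detection). The field names follow the page; the heuristic and function name are illustrative assumptions, and the real Critic also applies the LLM-based checks listed above.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    passed: bool
    quality_score: float              # 0-1
    issues: list[str] = field(default_factory=list)
    retry_guidance: str = ""


def detect_errors(stdout: str, stderr: str, artifacts: list[str]) -> ValidationResult:
    """Cheap pre-LLM checks: uncaught exceptions and empty outputs."""
    issues = []
    if "Traceback" in stderr:
        issues.append("uncaught exception in execution")
    if not stdout.strip() and not artifacts:
        issues.append("empty output: no stdout and no artifacts")
    passed = not issues
    return ValidationResult(
        passed=passed,
        quality_score=1.0 if passed else 0.0,
        issues=issues,
        retry_guidance="" if passed else "; ".join(issues),
    )
```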
Synthesizer Agent
Input: TaskDAG with per-node results, and the original task description.
Output: A structured scientific report (Markdown) with:
- Title and executive summary
- Methodology
- Key findings
- Generated data/artifacts (with paths/links)
- Limitations and recommended next steps
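Assembling the report skeleton from those sections can be sketched as below. The section order follows the page; the function itself and its signature are illustrative assumptions (the real Synthesizer fills these sections via the LLM).

```python
def render_report(title: str, summary: str, methodology: str,
                  findings: list[str], artifacts: list[str],
                  limitations: str) -> str:
    """Assemble the Markdown report skeleton in the section order above."""
    lines = [f"# {title}", "", "## Executive Summary", summary,
             "", "## Methodology", methodology, "", "## Key Findings"]
    lines += [f"- {finding}" for finding in findings]
    lines += ["", "## Generated Artifacts"]
    lines += [f"- [{path}]({path})" for path in artifacts]  # link to run-dir files
    lines += ["", "## Limitations and Next Steps", limitations]
    return "\n".join(lines)
```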
Self-Correction Integration
When the Critic marks a result as failed or low-quality, correction escalates through up to four levels:
- Level 1 — Reflection-guided repair: the LLM reflects on the failure (root cause, fix strategy), then generates a targeted fix. A progressive error chain prevents repeating the same failed approach. Up to 3 standard retries.
- Level 2 — Deep retry: Reasoning model does a deeper analysis; may use successful peer code as reference; generates an alternative approach. Up to 2 deep retries.
- Level 3 — Plan revision: If multiple branches fail, the SelfRefineEngine can suggest a revised plan that keeps successful steps and replaces failed ones.
- Level 4 — Report refinement: After synthesis, the report is iteratively self-reviewed and improved until a quality threshold is met.
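The escalation ladder for a single node (Levels 1–2, with Level 3 as the fallback signal) can be sketched as below. The retry limits follow the page (3 standard, 2 deep); the result type and callback names are illustrative assumptions.

```python
from dataclasses import dataclass

MAX_STANDARD_RETRIES = 3   # Level 1 limit (per the page)
MAX_DEEP_RETRIES = 2       # Level 2 limit (per the page)


@dataclass
class StepResult:
    passed: bool
    detail: str = ""


def run_with_self_correction(attempt, reflect_and_fix, deep_retry) -> StepResult:
    """Try the step, escalate through reflection-guided repair, then
    deep retry; if still failing, signal that plan revision is needed."""
    result = attempt()
    for _ in range(MAX_STANDARD_RETRIES):       # Level 1
        if result.passed:
            return result
        result = reflect_and_fix(result)
    for _ in range(MAX_DEEP_RETRIES):           # Level 2
        if result.passed:
            return result
        result = deep_retry(result)
    if not result.passed:
        result.detail = "needs_plan_revision"   # escalate to Level 3
    return result
```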
Token and Context Guards
- BaseAgent (used by all agents) truncates system prompt, extra context, and user message to stay within the input budget for the active LLM role (see Context Window and Budget).
- Synthesizer uses a per-step character budget so the combined context of all step results does not exceed the coder model’s context window.
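The per-step budget can be sketched as a simple clip: divide the total character budget evenly across step results and truncate each one to its share. The even split and the truncation marker are illustrative assumptions, not the actual budgeting code.

```python
def clip_step_results(results: dict[str, str], total_budget_chars: int) -> dict[str, str]:
    """Clip each step result so the combined context stays within
    the total character budget (even per-step split, assumed)."""
    per_step = max(1, total_budget_chars // max(1, len(results)))
    clipped = {}
    for step, text in results.items():
        if len(text) <= per_step:
            clipped[step] = text
        else:
            clipped[step] = text[:per_step] + "\n…[truncated]"
    return clipped
```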
Next Steps
- Self-Correction — Reflection, deep retry, and report refinement.
- Hybrid Retrieval — How the Executor gets its tool set.
- Context Window and Budget — Token budgets and truncation.