Self-Correction

BioCortex implements a four-level self-correction hierarchy so that failures are handled by reflection and targeted fixes instead of blind retries. This reduces error propagation and improves success rates on complex pipelines.

Levels Overview

Level | When | What happens
1 | Code execution fails | Reflection-guided code repair (diagnose → fix → re-run).
2 | Output is low-quality | Critic-driven output refinement.
3 | Multiple DAG branches fail | Plan revision (keep successful steps, replace failed parts).
4 | After synthesis | Report self-improvement until the quality threshold is met.

Level 1: Reflection-Guided Code Repair

When code fails (exception or Critic failure):
  1. Reflect — The LLM is asked to produce a structured Reflection:
    • Error summary
    • Root cause analysis
    • Concrete fix strategy
    • Confidence
    • Reusable lesson
  2. Fix — A new code version is generated based on this diagnosis, not a generic retry.
  3. Progressive error chain — Previously tried fixes and their outcomes are passed into the next attempt so the LLM does not repeat the same failed approach.
  4. Retry limit — Standard retries (e.g. up to 3). If all fail, deep retry is triggered.
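The four steps above can be sketched as a small loop. This is a minimal illustration, not BioCortex's actual implementation: the `Reflection` and `Attempt` classes, the `repair_loop` function, and the `run`, `reflect`, and `fix` callables are all hypothetical names, with the callables standing in for code execution and LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Reflection:
    """Structured diagnosis; field names are assumptions mirroring the list above."""
    error_summary: str
    root_cause: str
    fix_strategy: str
    confidence: float
    lesson: str


@dataclass
class Attempt:
    """One entry in the progressive error chain: what was tried and how it failed."""
    code: str
    error: str
    reflection: Reflection


def repair_loop(
    code: str,
    run: Callable[[str], Optional[str]],  # returns an error message, or None on success
    reflect: Callable[[str, str, List[Attempt]], Reflection],
    fix: Callable[[str, Reflection, List[Attempt]], str],
    max_retries: int = 3,  # assumed default, taken from the "e.g. up to 3" above
) -> Tuple[bool, str, List[Attempt]]:
    chain: List[Attempt] = []
    error = run(code)
    while error is not None and len(chain) < max_retries:
        # Diagnose first, with the full history of earlier attempts available.
        reflection = reflect(code, error, chain)
        chain.append(Attempt(code, error, reflection))
        # Generate the next version from the diagnosis, not from a blind retry.
        code = fix(code, reflection, chain)
        error = run(code)
    # If this returns False, the caller would escalate to deep retry.
    return error is None, code, chain
```

Passing the whole `chain` into both `reflect` and `fix` is what lets the model see which approaches have already failed.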

Deep retry

  • Uses the reasoning model (not just coder) for deeper analysis.
  • Can use successful code from peer nodes (same DAG level) as reference.
  • Generates a different approach (e.g. different library or algorithm).
  • Typically up to 2 deep retries before marking the node failed.
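A rough sketch of this escalation follows, under stated assumptions: `deep_retry` and its parameters are hypothetical names, `generate` stands in for a call to the reasoning model, and `run` returns an error message or `None` on success.

```python
from typing import Callable, Dict, List, Optional


def deep_retry(
    node_id: str,
    failed_approaches: List[str],
    peer_code: Dict[str, str],  # successful code from nodes on the same DAG level
    generate: Callable[[str, List[str], List[str]], str],  # reasoning-model stand-in
    run: Callable[[str], Optional[str]],
    max_deep_retries: int = 2,  # assumed default, from "typically up to 2" above
) -> Optional[str]:
    # Working code from peer nodes serves as reference material for the model.
    references = [code for peer, code in peer_code.items() if peer != node_id]
    tried = list(failed_approaches)
    for _ in range(max_deep_retries):
        # The model sees everything already tried so it can pick a different approach.
        candidate = generate(node_id, tried, references)
        if run(candidate) is None:
            return candidate
        tried.append(candidate)
    return None  # the caller marks the node as failed
```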

Level 2: Output Quality Refinement

When code runs but the Critic marks the result as low-quality (empty, suspicious values, incomplete):
  • The Critic’s structured feedback (issues, retry_guidance) is passed to the Executor.
  • The LLM generates targeted improvements to the code or parameters.
  • Re-execution and re-validation follow.
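The feedback loop above might look roughly like the following. The `issues` and `retry_guidance` field names come from the text; the `passed` flag, the `refine_output` function, the `max_rounds` limit, and the three callables are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class CriticFeedback:
    passed: bool  # assumed flag; the source names only issues and retry_guidance
    issues: List[str] = field(default_factory=list)
    retry_guidance: str = ""


def refine_output(
    code: str,
    execute: Callable[[str], Any],
    critique: Callable[[Any], CriticFeedback],
    improve: Callable[[str, CriticFeedback], str],  # LLM stand-in
    max_rounds: int = 2,  # hypothetical cap on refinement rounds
) -> Tuple[str, Any, CriticFeedback]:
    result = execute(code)
    feedback = critique(result)
    rounds = 0
    while not feedback.passed and rounds < max_rounds:
        # Targeted change driven by the Critic's structured feedback.
        code = improve(code, feedback)
        # Re-execute and re-validate the new version.
        result = execute(code)
        feedback = critique(result)
        rounds += 1
    return code, result, feedback
```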

Level 3: Plan Revision

When multiple nodes (e.g. whole branches) fail:
  • The SelfRefineEngine can suggest a revised execution plan.
  • Successful steps are preserved.
  • Failed components are replaced with alternative approaches (e.g. different tools or steps).
  • The revised plan is then executed like a new DAG.
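As a sketch of the idea only, plan revision can be modelled as a pass over the DAG nodes that keeps successful ones and swaps the approach of failed ones. The plan layout, the status strings, the `reuse_result` flag, and the `revise_plan` and `alternative` names are all hypothetical; the actual SelfRefineEngine interface is not described in this section.

```python
from typing import Callable, Dict

Plan = Dict[str, dict]  # node id -> {"tool": str, "deps": [node ids]} (assumed layout)


def revise_plan(
    plan: Plan,
    status: Dict[str, str],  # node id -> "success" | "failed" | anything else
    alternative: Callable[[str, dict], str],  # proposes a different tool for a node
) -> Plan:
    revised: Plan = {}
    for node_id, spec in plan.items():
        state = status.get(node_id)
        if state == "success":
            # Preserve successful steps; their results need not be recomputed.
            revised[node_id] = dict(spec, reuse_result=True)
        elif state == "failed":
            # Replace the failed component with an alternative approach.
            revised[node_id] = dict(spec, tool=alternative(node_id, spec), reuse_result=False)
        else:
            # Nodes that never ran (e.g. blocked by a failed dependency) are kept as planned.
            revised[node_id] = dict(spec, reuse_result=False)
    return revised
```

The returned plan has the same shape as the input, so it can be handed back to whatever executes a DAG in the first place.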

Level 4: Report Self-Refinement

After the Synthesizer produces the first report:
  1. Self-review — The system (or LLM) scores the report on criteria: completeness, accuracy, clarity, actionability, scientific rigor.
  2. Targeted improvement — If below threshold (e.g. 0.8), generate an improved version.
  3. Quality scoring — Re-score; repeat until threshold is met or no further gain.
  4. Degradation detection — If a new version scores lower, revert to the previous one.
This yields a final report that meets a consistent quality bar.
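The loop above, including the revert on degradation, can be sketched as follows. Assumptions to note: the overall score is taken here as the unweighted mean of the five criteria on a 0 to 1 scale, and `refine_report`, `score`, `improve`, and `max_iterations` are hypothetical names, with `score` and `improve` standing in for LLM calls.

```python
from typing import Callable, Dict, Tuple

CRITERIA = ("completeness", "accuracy", "clarity", "actionability", "scientific_rigor")


def overall(scores: Dict[str, float]) -> float:
    # Assumption: unweighted mean; the real aggregation is not specified here.
    return sum(scores[c] for c in CRITERIA) / len(CRITERIA)


def refine_report(
    report: str,
    score: Callable[[str], Dict[str, float]],
    improve: Callable[[str], str],
    threshold: float = 0.8,  # from the "e.g. 0.8" above
    max_iterations: int = 3,  # hypothetical cap
) -> Tuple[str, float]:
    best, best_score = report, overall(score(report))
    for _ in range(max_iterations):
        if best_score >= threshold:
            break  # quality bar met
        candidate = improve(best)
        candidate_score = overall(score(candidate))
        if candidate_score <= best_score:
            break  # no gain or a lower score: keep the previous version
        best, best_score = candidate, candidate_score
    return best, best_score
```

Because `best` is only replaced when the candidate scores strictly higher, a degraded draft is never returned.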

Integration with the Pipeline

  • Executor triggers Level 1 (and deep retry) when execution or validation fails.
  • Critic triggers Level 2 when validation fails with actionable feedback.
  • Orchestrator / SelfRefineEngine triggers Level 3 when multiple nodes fail.
  • Synthesizer (or a post-step) triggers Level 4 after the first draft report.

Configuration

  • Maximum number of standard retries and of deep retries.
  • Quality threshold and maximum number of iterations for report refinement.
  • Whether to use the reasoning model for deep retry.
See the Configuration page and the biocortex.core.self_refine module (or its equivalent) for details.
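For orientation, the settings listed above could be grouped into one object as in the sketch below. The class name, field names, and defaults are assumptions (the defaults echo the "e.g." values used earlier on this page, and the threshold is assumed to lie on a 0 to 1 scale); consult the Configuration page for the real option names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelfCorrectionConfig:
    """Hypothetical container for the self-correction settings listed above."""
    max_standard_retries: int = 3
    max_deep_retries: int = 2
    quality_threshold: float = 0.8
    max_refine_iterations: int = 3
    use_reasoning_model_for_deep_retry: bool = True

    def __post_init__(self) -> None:
        # Reject values that would make the retry or refinement loops meaningless.
        if self.max_standard_retries < 0 or self.max_deep_retries < 0:
            raise ValueError("retry limits must be non-negative")
        if not 0.0 <= self.quality_threshold <= 1.0:
            raise ValueError("quality_threshold must be between 0 and 1")
        if self.max_refine_iterations < 1:
            raise ValueError("max_refine_iterations must be at least 1")
```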
