Self-Correction
BioCortex implements a four-level self-correction hierarchy so that failures are handled by reflection and targeted fixes instead of blind retries. This reduces error propagation and improves success rates on complex pipelines.
Levels Overview
| Level | When | What happens |
|---|---|---|
| 1 | Code execution fails | Reflection-guided code repair (diagnose → fix → re-run). |
| 2 | Output is low-quality | Critic-driven output refinement. |
| 3 | Multiple DAG branches fail | Plan revision (keep successful steps, replace failed parts). |
| 4 | After synthesis | Report self-refinement until the quality threshold is met. |
Level 1: Reflection-Guided Code Repair
When code fails (exception or Critic failure):
- Reflect: The LLM is asked to produce a structured Reflection:
  - Error summary
  - Root cause analysis
  - Concrete fix strategy
  - Confidence
  - Reusable lesson
- Fix: A new code version is generated based on this diagnosis, not a generic retry.
- Progressive error chain: Previously tried fixes and their outcomes are passed into the next attempt so the LLM does not repeat the same failed approach.
- Retry limit: Standard retries (e.g. up to 3). If all fail, deep retry is triggered.
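The reflect, fix, and re-run cycle above can be sketched as a small loop. This is a minimal illustration and not BioCortex's actual implementation: the `Reflection` fields mirror the list above, while `Attempt`, `repair_loop`, and the `run`, `reflect`, and `fix` callables are hypothetical names standing in for the sandbox and the LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Reflection:
    """Structured diagnosis; fields follow the list above."""
    error_summary: str
    root_cause: str
    fix_strategy: str
    confidence: float
    lesson: str


@dataclass
class Attempt:
    """One entry in the progressive error chain (hypothetical type)."""
    code: str
    error: str
    reflection: Reflection


def repair_loop(
    code: str,
    run: Callable[[str], Optional[str]],  # returns an error message, or None on success
    reflect: Callable[[str, str, List[Attempt]], Reflection],
    fix: Callable[[str, Reflection, List[Attempt]], str],
    max_retries: int = 3,
) -> Tuple[str, bool, List[Attempt]]:
    """Run the code; on failure, diagnose and regenerate up to max_retries times."""
    history: List[Attempt] = []
    error = run(code)
    while error is not None and len(history) < max_retries:
        # The full history is passed along so a failed fix is not repeated.
        reflection = reflect(code, error, history)
        history.append(Attempt(code, error, reflection))
        code = fix(code, reflection, history)
        error = run(code)
    return code, error is None, history
```

If the returned success flag is `False`, the caller would escalate to deep retry, passing `history` along as the record of what has already been tried.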
Deep retry
- Uses the reasoning model (not just the coder model) for deeper analysis.
- Can use successful code from peer nodes (same DAG level) as reference.
- Generates a different approach (e.g. different library or algorithm).
- Typically up to 2 deep retries before marking the node failed.
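A rough sketch of the deep retry escalation, assuming the reasoning model is wrapped in a `generate` callable that is told which approaches to avoid and is given peer code as reference. The function name, the callable signatures, and the idea of tracking approaches by name are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical signatures, used only for this sketch:
#   generate(task, excluded_approaches, peer_code) -> (approach_name, code)
#   run(code) -> error message, or None on success
Generate = Callable[[str, Sequence[str], Sequence[str]], Tuple[str, str]]
Run = Callable[[str], Optional[str]]


def deep_retry(
    task: str,
    failed_approaches: Sequence[str],
    peer_code: Sequence[str],
    generate: Generate,
    run: Run,
    max_deep_retries: int = 2,
) -> Optional[str]:
    """Ask for a different approach each time; return working code or None."""
    excluded: List[str] = list(failed_approaches)
    for _ in range(max_deep_retries):
        # peer_code holds successful code from nodes at the same DAG level.
        approach, code = generate(task, excluded, peer_code)
        if run(code) is None:
            return code
        excluded.append(approach)  # never propose this approach again
    return None  # the caller marks the node as failed
```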
Level 2: Output Quality Refinement
When code runs but the Critic marks the result as low-quality (empty, suspicious values, incomplete):
- The Critic’s structured feedback (issues, retry_guidance) is passed to the Executor.
- The LLM generates targeted improvements to the code or parameters.
- Re-execution and re-validation follow.
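The refinement cycle above might look like the following. Only the `issues` and `retry_guidance` field names come from the description; the `passed` flag, the `max_rounds` cap, and the `execute`, `critique`, and `improve` callables are assumptions introduced for this sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class CriticFeedback:
    passed: bool           # assumed flag; not named in the text above
    issues: List[str]
    retry_guidance: str


def refine_output(
    code: str,
    execute: Callable[[str], Any],
    critique: Callable[[Any], CriticFeedback],
    improve: Callable[[str, CriticFeedback], str],
    max_rounds: int = 2,   # hypothetical cap on refinement rounds
) -> Tuple[str, Any, CriticFeedback]:
    """Execute, validate, and apply targeted improvements until the Critic passes."""
    result = execute(code)
    feedback = critique(result)
    rounds = 0
    while not feedback.passed and rounds < max_rounds:
        # The structured feedback drives the change, not a blind re-run.
        code = improve(code, feedback)
        result = execute(code)
        feedback = critique(result)
        rounds += 1
    return code, result, feedback
```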
Level 3: Plan Revision
When multiple nodes (e.g. whole branches) fail:
- The SelfRefineEngine can suggest a revised execution plan.
- Successful steps are preserved.
- Failed components are replaced with alternative approaches (e.g. different tools or steps).
- The revised plan is then executed like a new DAG.
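As an illustration of "keep what worked, replace what failed", the sketch below revises a plan stored as a plain dictionary. The plan layout, the status strings, and the `revise_plan` helper are hypothetical; the real SelfRefineEngine interface is not shown in this document.

```python
from typing import Dict, List, Tuple

Plan = Dict[str, dict]  # node id -> step description (assumed layout)


def revise_plan(
    plan: Plan,
    status: Dict[str, str],        # node id -> "success", "failed", or "skipped"
    alternatives: Dict[str, str],  # node id -> replacement tool (e.g. LLM-suggested)
) -> Tuple[Plan, List[str]]:
    """Return a revised plan plus the nodes that still need to run."""
    revised: Plan = {}
    to_run: List[str] = []
    for node_id, step in plan.items():
        if status.get(node_id) == "success":
            revised[node_id] = step  # successful steps are preserved as-is
            continue
        alt = alternatives.get(node_id)
        # Swap in the alternative when one exists; the original plan is not mutated.
        revised[node_id] = {**step, "tool": alt} if alt else step
        to_run.append(node_id)
    return revised, to_run


# Usage with placeholder tool names:
plan = {
    "load": {"tool": "tool_a", "deps": []},
    "align": {"tool": "tool_b", "deps": ["load"]},
    "summarize": {"tool": "tool_c", "deps": ["align"]},
}
status = {"load": "success", "align": "failed", "summarize": "skipped"}
revised, to_run = revise_plan(plan, status, {"align": "tool_b_alt"})
```

Here `to_run` contains the failed node and the downstream node that was skipped because of it, so only that part of the DAG is executed again.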
Level 4: Report Self-Refinement
After the Synthesizer produces the first report:
- Self-review: The system (or LLM) scores the report for completeness, accuracy, clarity, actionability, and scientific rigor.
- Targeted improvement: If the score is below the threshold (e.g. 0.8), an improved version is generated.
- Quality scoring: The new version is re-scored, and the cycle repeats until the threshold is met or no further gain is made.
- Degradation detection: If a new version scores lower, the system reverts to the previous one.
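The scoring and revert logic can be sketched as a loop that only ever keeps the best-scoring version. The `score` and `improve` callables stand in for LLM calls, and `refine_report` and `max_iterations` are names assumed for this example; a single overall score per report is also an assumption.

```python
from typing import Callable, Tuple


def refine_report(
    report: str,
    score: Callable[[str], float],   # overall quality score, assumed to lie in [0, 1]
    improve: Callable[[str], str],   # returns a revised draft
    threshold: float = 0.8,
    max_iterations: int = 3,         # hypothetical cap
) -> Tuple[str, float]:
    """Improve the report until it meets the threshold or stops getting better."""
    best, best_score = report, score(report)
    for _ in range(max_iterations):
        if best_score >= threshold:
            break
        candidate = improve(best)
        candidate_score = score(candidate)
        if candidate_score <= best_score:
            # No gain or degradation: discard the candidate, keep the previous version.
            break
        best, best_score = candidate, candidate_score
    return best, best_score
```

Because the candidate is only adopted when it scores strictly higher, "reverting" is implicit: a worse draft never replaces `best`.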
Integration with the Pipeline
- Executor triggers Level 1 (and deep retry) when execution or validation fails.
- Critic triggers Level 2 when validation fails with actionable feedback.
- Orchestrator / SelfRefineEngine triggers Level 3 when multiple nodes fail.
- Synthesizer (or a post-step) triggers Level 4 after the first draft report.
Configuration
- Max standard retries, max deep retries.
- Quality threshold and max iterations for report refinement.
- Whether to use reasoning model for deep retry.
See biocortex.core.self_refine (or equivalent).
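To make the options above concrete, they could be grouped into one settings object like the sketch below. The class name, field names, and the report iteration default are assumptions and not the actual BioCortex configuration schema; the other defaults reuse the example values quoted earlier (3 standard retries, 2 deep retries, threshold 0.8).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SelfRefineConfig:
    """Hypothetical container for the self-correction settings."""
    max_standard_retries: int = 3
    max_deep_retries: int = 2
    quality_threshold: float = 0.8
    max_report_iterations: int = 3  # assumed default; no value is given in the text
    use_reasoning_model_for_deep_retry: bool = True

    def __post_init__(self) -> None:
        # Reject obviously invalid settings early.
        counts = (
            self.max_standard_retries,
            self.max_deep_retries,
            self.max_report_iterations,
        )
        if any(c < 0 for c in counts):
            raise ValueError("retry and iteration limits must be non-negative")
        # Assumes scores are normalized to the range [0, 1].
        if not 0.0 <= self.quality_threshold <= 1.0:
            raise ValueError("quality_threshold must be between 0 and 1")


# Usage: a stricter report threshold with deep retry on the coder model only.
config = SelfRefineConfig(quality_threshold=0.9, use_reasoning_model_for_deep_retry=False)
```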
Next Steps
- Multi-Agent Pipeline — Where Critic and Executor plug in.
- Context Window and Budget — Token limits during reflection and refinement.