Hypothesis strategy — scientific reasoning engine
TL;DR — For open-ended scientific questions (“What causes X?”, “Propose a mechanism for X”), BioCortex selects the Hypothesis strategy. It runs an iterative generate → test → evaluate → refine loop and produces a report with an explicit evidence chain, not a one-shot LLM guess.
Why a Hypothesis strategy?
Many biomedical questions are open scientific hypothesis problems. They are neither “run this fixed pipeline” nor “search for the best analysis path”; they need:
- Multiple candidate mechanisms / hypotheses
- Designed evidence collection
- Evaluation of supporting vs contradicting evidence
- Rejection of failed hypotheses and refinement of survivors
- A final conclusion with calibrated confidence
| Strategy | Best for | Core idea |
|---|---|---|
| SimpleReAct | Single-step Q&A, format conversion | ReAct loop, direct execution |
| DAG Parallel | Multi-step omics workflows | Decompose into a DAG and run in parallel |
| MCTS | Unknown optimal analysis path | Monte Carlo tree search over paths |
| Hypothesis | Open mechanistic science questions | Multi-round hypothesis generate → test → refine |
Design influences
The Hypothesis engine borrows ideas from three families of systems:
From Biomni A1
From PantheonOS Evolution
Rejected history (rejected_history): falsified directions are recorded so that later rounds avoid repeating failed ideas and wasting compute.
From CellType CLI EvidenceReasoner
End-to-end flow
Core data structures
Hypothesis — one hypothesis object
EvidenceItem — one piece of evidence
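The two objects can be pictured as plain dataclasses. This is a minimal sketch, not the contents of biocortex/core/hypothesis.py. Only confidence, evidence, status, refinement_history, and polarity are named on this page; the other field names (statement, source, summary), the polarity range, and the example wording are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceItem:
    """One piece of evidence produced by a single test (sketch)."""
    source: str            # assumed: which test type produced it
    summary: str           # assumed: short human-readable finding
    polarity: float = 0.0  # assumed range [-1, 1]: >0 supports, <0 contradicts


@dataclass
class Hypothesis:
    """One hypothesis tracked across rounds (sketch)."""
    statement: str                      # assumed name for the current wording
    status: str = "ACTIVE"              # ACTIVE and REFINED appear on this page
    confidence: float = 0.5             # neutral prior before any evidence
    evidence: list[EvidenceItem] = field(default_factory=list)
    refinement_history: list[str] = field(default_factory=list)


# Illustrative usage with placeholder content.
h = Hypothesis("Placeholder mechanism for temozolomide resistance")
h.evidence.append(EvidenceItem("literature", "placeholder finding", polarity=1.0))
```

Using default_factory gives each hypothesis its own evidence and history lists, so appending to one hypothesis never leaks into another.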
HypothesisStatus state machine
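An enum plus a transition table is one way to picture the state machine. ACTIVE and REFINED are the statuses named on this page; REJECTED and the exact set of allowed transitions are assumptions made for illustration, so treat this as a sketch rather than the engine's definition.

```python
from enum import Enum


class HypothesisStatus(Enum):
    ACTIVE = "active"
    REFINED = "refined"
    REJECTED = "rejected"  # assumed name for a falsified hypothesis


# Assumed transitions: refinement tightens an ACTIVE hypothesis,
# any live hypothesis can be rejected, and REJECTED is terminal.
ALLOWED = {
    HypothesisStatus.ACTIVE: {HypothesisStatus.REFINED, HypothesisStatus.REJECTED},
    HypothesisStatus.REFINED: {HypothesisStatus.REJECTED},
    HypothesisStatus.REJECTED: set(),
}


def transition(current: HypothesisStatus, new: HypothesisStatus) -> HypothesisStatus:
    """Return the new status, or raise if the move is not allowed."""
    if new not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current.name} -> {new.name}")
    return new
```

Keeping the transitions in one table makes illegal moves, such as reviving a rejected hypothesis, fail loudly instead of silently corrupting state.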
Confidence scoring
Formula
Rationale
- Start at 0.5 — No evidence ⇒ neutral prior.
- 1.5× penalty on opposition — Contradictions should weigh more than weak support (falsifiability).
- Normalize by total evidence — Prevents “stacking” many low-quality supporting snippets.
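The three points above can be combined into a single scoring function. The exact formula is not shown on this page, so the version below is an assumption that merely satisfies the stated properties: it starts at 0.5, weights opposing evidence by 1.5, and divides by the number of evidence items. The polarity range of [-1, 1] and the clamping to [0, 1] are also assumptions.

```python
def confidence(polarities: list[float], penalty: float = 1.5) -> float:
    """Assumed rule: neutral prior, penalized opposition, normalized by count."""
    if not polarities:
        return 0.5  # no evidence: neutral prior
    support = sum(p for p in polarities if p > 0)
    oppose = sum(-p for p in polarities if p < 0)
    # Dividing by the item count keeps many weak supports from stacking up.
    net = (support - penalty * oppose) / len(polarities)
    return max(0.0, min(1.0, 0.5 + 0.5 * net))


print(round(confidence([0.8, 0.6, -0.4]), 2))  # 0.63
print(confidence([]))                           # 0.5
```

Under this sketch, ten weak supporting items with polarity 0.2 each land near 0.6 rather than near 1.0, which is the behavior the normalization point describes.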
Example
Four test types
_design_test() picks a modality from hypothesis content and context:
| Type | When used | Implementation |
|---|---|---|
| code | Data files (.h5ad, .csv, …) and testable stats/plots | Sandbox Python execution |
| literature | Mechanism needs literature grounding | PubMed + Semantic Scholar |
| knowledge_graph | Gene–pathway–disease relations | BioKnowledgeGraph neighborhood |
| reasoning | No data/tools or quick sanity check | LLM reasoning over biology knowledge |
Each test yields an EvidenceItem; _evaluate_evidence() assigns its polarity.
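The selection in the table can be approximated with a few explicit rules. This is a rule-based stand-in, not the logic of _design_test(), which appears to be prompt-driven. The function name, its boolean inputs, and the priority order among the four types are all assumptions.

```python
from pathlib import Path

# The table names these two suffixes and then an ellipsis; others are not guessed here.
DATA_SUFFIXES = {".h5ad", ".csv"}


def pick_test_type(files: list[str], needs_literature: bool, mentions_relations: bool) -> str:
    """Hypothetical chooser mirroring the 'When used' column."""
    if any(Path(f).suffix.lower() in DATA_SUFFIXES for f in files):
        return "code"             # data is attached, so run a sandboxed analysis
    if mentions_relations:
        return "knowledge_graph"  # gene, pathway, or disease relations
    if needs_literature:
        return "literature"       # mechanism needs literature grounding
    return "reasoning"            # fallback when no data or tools apply


print(pick_test_type(["pbmc.h5ad"], False, False))  # code
print(pick_test_type([], False, False))             # reasoning
```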
Refinement
From round 2 onward, _refine_hypotheses() runs on hypotheses still ACTIVE:
- Prior wording moves to refinement_history
- Status becomes REFINED
- Later tests target the tighter claim
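The refinement steps listed above amount to a small state update. The sketch below assumes the tighter wording is supplied by the caller (in the engine it presumably comes from the LLM) and uses a simplified Hypothesis with a hypothetical statement field.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    statement: str  # assumed field name for the current wording
    status: str = "ACTIVE"
    refinement_history: list[str] = field(default_factory=list)


def refine(h: Hypothesis, tighter_statement: str) -> Hypothesis:
    """Apply the refinement steps to a hypothesis that is still ACTIVE."""
    if h.status != "ACTIVE":
        return h  # refinement only runs on ACTIVE hypotheses
    h.refinement_history.append(h.statement)  # prior wording is preserved
    h.statement = tighter_statement           # later tests target this claim
    h.status = "REFINED"
    return h


h = refine(Hypothesis("broad claim"), "narrower claim")
print(h.status, h.refinement_history)  # REFINED ['broad claim']
```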
Rejected history
The _design_test() prompt includes the rejected history, so newly designed tests avoid directions that were already falsified.
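A minimal sketch of how rejected directions might be rendered into a prompt section. The helper name and the prompt wording are hypothetical; the actual text used by _design_test() is not reproduced on this page.

```python
def rejected_history_block(rejected_history: list[str]) -> str:
    """Format falsified directions as a prompt section (hypothetical wording)."""
    if not rejected_history:
        return ""  # nothing rejected yet, so add nothing to the prompt
    lines = "\n".join(f"- {item}" for item in rejected_history)
    return "Previously rejected directions (do not re-test):\n" + lines


print(rejected_history_block(["direction A", "direction B"]))
```

Returning an empty string in the first round keeps the prompt free of an empty, potentially confusing section.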
Final report outline
When Hypothesis auto-triggers
Phase-1 heuristics in strategy routing look for patterns such as:
| Pattern / cue | Example phrasing |
|---|---|
| hypothesis | “Generate a hypothesis about X” |
| propose mechanism | “Mechanism by which FOXP3 suppresses T cells” |
| what causes | “What causes temozolomide resistance in glioblastoma?” |
| mechanism of/behind/for | “Mechanism of resistance to KRAS G12C inhibitors” |
| why does/do + disease/resistance | “Why do tumors evade immune checkpoint blockade?” |
| novel biomarker | “Novel biomarkers for early pancreatic cancer” |
| how does … resist/evade/escape | “How does GBM escape temozolomide?” |
| propose … target/candidate/pathway | “Candidate therapeutic targets for ALS” |
If hypothesis_score ≥ 2 and wins over the other strategy scores, Hypothesis is chosen; otherwise Phase-2 LLM classification can confirm.
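The Phase-1 decision can be sketched in two parts: matching cues and applying the threshold rule. The regexes below approximate a few cues from the table and are not the real _HYPOTHESIS_INDICATORS list. How matches translate into hypothesis_score is not specified on this page, so the decision function takes precomputed scores; reading "wins over" as strictly greater is also an assumption.

```python
import re
from typing import Optional

# Hypothetical regexes for a subset of the cues in the table.
CUES = {
    "hypothesis": r"\bhypothes[ie]s\b",
    "what causes": r"\bwhat causes\b",
    "mechanism of/behind/for": r"\bmechanism (of|behind|for)\b",
    "novel biomarker": r"\bnovel biomarkers?\b",
    "how does ... resist/evade/escape": r"\bhow does\b.*\b(resist|evade|escape)",
}


def matched_cues(question: str) -> list[str]:
    """Return the names of all cues found in the question."""
    return [name for name, pat in CUES.items() if re.search(pat, question, re.IGNORECASE)]


def phase1_choice(scores: dict[str, int], threshold: int = 2) -> Optional[str]:
    """Pick Hypothesis if it clears the threshold and beats every other score."""
    hyp = scores.get("hypothesis", 0)
    others = [v for k, v in scores.items() if k != "hypothesis"]
    if hyp >= threshold and all(hyp > v for v in others):
        return "hypothesis"
    return None  # defer to Phase-2 LLM classification


print(matched_cues("How does GBM escape temozolomide?"))
print(phase1_choice({"hypothesis": 3, "mcts": 1}))  # hypothesis
```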
Tunable parameters
| Parameter | Default | Meaning |
|---|---|---|
| max_rounds | 4 | Maximum iteration rounds |
| max_hypotheses | 5 | Cap on initial hypotheses |
| convergence_threshold | 0.80 | Early stop when net confidence reaches this |
| min_rounds | 2 | Minimum rounds even if convergence looks early |
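The interaction between max_rounds, min_rounds, and convergence_threshold can be sketched as a stop check. The defaults come from the table; the HypothesisConfig class name, the should_stop helper, and the choice to compare the best hypothesis's confidence against the threshold are assumptions.

```python
from dataclasses import dataclass


@dataclass
class HypothesisConfig:  # hypothetical container for the table's defaults
    max_rounds: int = 4
    max_hypotheses: int = 5
    convergence_threshold: float = 0.80
    min_rounds: int = 2


def should_stop(rounds_done: int, best_confidence: float, cfg: HypothesisConfig) -> bool:
    """Decide whether to stop after `rounds_done` completed rounds."""
    if rounds_done >= cfg.max_rounds:
        return True   # round budget exhausted
    if rounds_done < cfg.min_rounds:
        return False  # too early, even if confidence already looks high
    return best_confidence >= cfg.convergence_threshold


cfg = HypothesisConfig()
print(should_stop(1, 0.95, cfg))  # False
print(should_stop(2, 0.95, cfg))  # True
```

Checking min_rounds before the threshold is what prevents a single lucky round from ending the loop early.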
Force Hypothesis mode
Compared to MCTS
| Aspect | MCTS | Hypothesis |
|---|---|---|
| Search object | Analysis path (tools + order) | Scientific claim (which mechanism) |
| Feedback | Simulated path value from LLM | Polarity scores from real evidence |
| Output | Best path → report | Best-supported hypothesis + chain → report |
| State | Tree nodes (visits / reward) | Hypothesis objects (confidence / evidence / status) |
| Stop | Fixed budget | Converged confidence or max rounds |
| Fit | “How should I analyze this dataset?” | “Why does this biological phenomenon occur?” |
Code locations
| File | Role |
|---|---|
| biocortex/core/hypothesis.py | HypothesisReasoner, dataclasses, main loop |
| biocortex/core/orchestrator.py | _run_hypothesis(), context + packaging |
| biocortex/core/strategy.py | _HYPOTHESIS_INDICATORS, heuristics + LLM routing |
| biocortex/tools/search_tool.py | pubmed_search(), semantic_scholar_search() for literature tests |
Future improvements
- Data-aware generation: if .h5ad is attached, inject a data summary (cells, genes, clusters) into round-0 generation.
- Deeper KG: fully wire knowledge_graph tests to BioKnowledgeGraph.context() and 2-hop neighborhoods.
- Cross-hypothesis evidence: share evidence across H2/H3 when relevant.
- Calibration: tune convergence_threshold and the 1.5 factor using BixBench-style evals.
Related docs
- Strategy routing — How strategies are chosen
- Multi-agent pipeline — DAG-parallel workflows
- Hybrid retrieval — Tool + web/literature retrieval
- Knowledge graph — BioKnowledgeGraph structure and queries