Knowledge Graph

The BioKnowledgeGraph is a typed, directed graph that stores biological entities and their relationships. It supports retrieval expansion, memory integration, dependency ordering, multimodal entity linking, and KG-grounded verification (hallucination detection).

Role in BioCortex

Hybrid retrieval (Stage 2): 2-hop BFS from tool candidates discovers related tools via entity–tool and domain–tool links.
Memory: Semantic memory facts can be stored as graph nodes/edges; episodic content can reference entities.
Planner: Tool dependency ordering can use graph structure.
Multimodal fusion: Entity linking across modalities uses the same graph.
KGGroundingValidator: Each claim in the final report is checked against the graph → grounded / inferred / ungrounded.
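
The Stage 2 expansion described above can be sketched as a depth-limited breadth-first search. This is a minimal, dependency-free illustration and not the BioCortex implementation: the tool names, the toy links, and the helpers `build_neighbors` and `expand_tools` are hypothetical, and following links in both directions is an assumption.

```python
from collections import deque

# Hypothetical toy links: (tool, entity) and (tool, domain) pairs.
LINKS = [
    ("blast_search", "genomics"),
    ("hmmer_scan", "genomics"),
    ("blast_search", "TP53"),
    ("variant_caller", "TP53"),
    ("docking_tool", "structural_biology"),
]
TOOLS = {"blast_search", "hmmer_scan", "variant_caller", "docking_tool"}


def build_neighbors(links):
    """Build an undirected adjacency map so links are followed both ways."""
    neighbors = {}
    for left, right in links:
        neighbors.setdefault(left, set()).add(right)
        neighbors.setdefault(right, set()).add(left)
    return neighbors


def expand_tools(neighbors, tools, seeds, max_hops=2):
    """Return tools reachable within max_hops of any seed, excluding the seeds."""
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    found = set()
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors.get(node, ()):
            if nxt in seen:
                continue
            seen.add(nxt)
            if nxt in tools:
                found.add(nxt)
            queue.append((nxt, depth + 1))
    return found


NEIGHBORS = build_neighbors(LINKS)
related = expand_tools(NEIGHBORS, TOOLS, ["blast_search"])
```

Starting from `blast_search`, the search reaches `hmmer_scan` through the shared domain and `variant_caller` through the shared entity. `docking_tool` has no link within two hops and is not returned.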

Node and Edge Types

Node types (examples):
gene, protein, drug, disease, pathway, GO term, cell type, organism, tool, dataset, protocol, publication, domain.

Edge types (examples):

Biological: interacts_with, regulates, inhibits, activates, encodes, expressed_in, associated_with, targets, treats
Tool: requires_tool, produces_for, compatible_with, same_domain
Data: uses_data, produces_data, cited_in
Ontology: subclass_of, has_function, involved_in

Nodes and edges can carry attributes: name, type, properties, source, confidence, timestamp.
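
One way to picture these attributes is as a pair of small record types. The sketch below is illustrative only: the class names `Node` and `Edge`, the endpoint fields `src` and `dst`, and all default values are assumptions, not the BioCortex schema.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


def _now():
    """UTC timestamp in ISO 8601 form."""
    return datetime.now(timezone.utc).isoformat()


@dataclass
class Node:
    name: str
    type: str                      # e.g. "gene", "protein", "tool"
    properties: dict = field(default_factory=dict)
    source: str = "unknown"        # provenance of the record
    confidence: float = 1.0
    timestamp: str = field(default_factory=_now)


@dataclass
class Edge:
    src: str                       # name of the node the edge starts at (assumed field)
    dst: str                       # name of the node the edge points to (assumed field)
    type: str                      # e.g. "encodes", "requires_tool"
    properties: dict = field(default_factory=dict)
    source: str = "unknown"
    confidence: float = 1.0
    timestamp: str = field(default_factory=_now)


gene = Node(name="TP53", type="gene", properties={"organism": "Homo sapiens"})
protein = Node(name="p53", type="protein")
encodes = Edge(src=gene.name, dst=protein.name, type="encodes", confidence=0.9)

# asdict() yields plain dictionaries that are easy to serialize.
edge_record = asdict(encodes)
```

Endpoints are named `src` and `dst` here so that `source` stays free for provenance.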

Persistence and Ontology Integration

Persistence: The graph is serialized (e.g. as JSON containing a node list, an edge list, and metadata) so it can be saved and reloaded.
Gene Ontology: OBO parsing adds terms and is_a (subclass) relationships; obsolete terms can be filtered.
KEGG: Pathway data adds pathway–gene associations (e.g. involved_in).
Auto-learning: After each analysis, biological entities are extracted from the task description and findings using regex-based NER (genes, GO terms, UniProt, KEGG, and PDB identifiers, species) and added as nodes, with edges recording co-occurrence and tool linkages.
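
As a rough illustration of the Gene Ontology and persistence items above, the sketch below parses a tiny OBO excerpt into node and edge lists, drops obsolete terms, and round-trips the result through JSON. The function names, the `src`/`dst` keys, and the `metadata` layout are assumptions. The parser handles only the `id`, `name`, `is_a`, and `is_obsolete` tags of `[Term]` stanzas.

```python
import json

# Small OBO excerpt for illustration; a real ontology file is far larger.
OBO_TEXT = """\
[Term]
id: GO:0008150
name: biological_process

[Term]
id: GO:0009987
name: cellular process
is_a: GO:0008150 ! biological_process

[Term]
id: GO:0000005
name: obsolete ribosomal chaperone activity
is_obsolete: true
"""


def parse_obo_terms(text, skip_obsolete=True):
    """Turn [Term] stanzas into (nodes, edges) lists of plain dictionaries."""
    nodes, edges = [], []
    for stanza in text.split("\n\n"):
        lines = [line.strip() for line in stanza.strip().splitlines()]
        if not lines or lines[0] != "[Term]":
            continue
        term = {"is_a": []}
        for line in lines[1:]:
            key, _, value = line.partition(": ")
            if key == "is_a":
                # Drop the trailing "! label" comment, keep the parent ID.
                term["is_a"].append(value.split(" ! ")[0].strip())
            else:
                term[key] = value
        if skip_obsolete and term.get("is_obsolete") == "true":
            continue
        nodes.append({"name": term["id"], "type": "GO term",
                      "properties": {"label": term.get("name", "")},
                      "source": "Gene Ontology"})
        for parent in term["is_a"]:
            edges.append({"src": term["id"], "dst": parent, "type": "subclass_of"})
    return nodes, edges


def to_json(nodes, edges):
    """Serialize as node list + edge list + metadata (layout is an assumption)."""
    return json.dumps({"metadata": {"format_version": 1},
                       "nodes": nodes, "edges": edges}, indent=2)


def from_json(payload):
    """Load the node and edge lists back from the JSON payload."""
    data = json.loads(payload)
    return data["nodes"], data["edges"]


nodes, edges = parse_obo_terms(OBO_TEXT)
restored_nodes, restored_edges = from_json(to_json(nodes, edges))
```

Passing `skip_obsolete=False` keeps the obsolete term as a node, which can help when older annotations still reference it.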

KG-Grounded Verification (Hallucination Detection)

KGGroundingValidator runs after report generation:

Claim extraction — LLM (or heuristic) extracts discrete factual claims from the report.
Entity extraction — Regex NER finds biological entities in each claim.
KG path verification — For each pair of entities, compute shortest path in the graph. Classify:
- Grounded (✅): Short path (1–2 hops).
- Inferred (⚠️): Longer path (3+ hops).
- Ungrounded (❌): No path.
- Trivial (ℹ️): No biological entities.
Confidence — Per-claim and overall grounding confidence; report is annotated with evidence chains and triple references.

This gives users explicit confidence levels for interpreting AI-generated analyses.
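
The path-verification step can be sketched as follows. This is a minimal illustration under several assumptions: edges are traversed in both directions, a claim with fewer than two entities is treated as trivial, and a claim with several entity pairs takes the label of its weakest pair. The toy triples and all function names are hypothetical.

```python
from collections import deque
from itertools import combinations

# Toy triples for illustration only; a real graph would be far larger.
TRIPLES = [
    ("nutlin-3", "targets", "MDM2"),
    ("MDM2", "inhibits", "p53"),
    ("TP53", "encodes", "p53"),
]

# Higher number = weaker support; used to pick the weakest pair in a claim.
SEVERITY = {"grounded": 0, "inferred": 1, "ungrounded": 2}


def build_neighbors(triples):
    """Undirected adjacency map built from (src, edge_type, dst) triples."""
    neighbors = {}
    for src, _edge_type, dst in triples:
        neighbors.setdefault(src, set()).add(dst)
        neighbors.setdefault(dst, set()).add(src)
    return neighbors


def shortest_hops(neighbors, start, goal):
    """Breadth-first search; returns the hop count or None if unreachable."""
    if start not in neighbors or goal not in neighbors:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


def classify_pair(neighbors, a, b):
    """Map the shortest-path length between two entities to a label."""
    hops = shortest_hops(neighbors, a, b)
    if hops is None:
        return "ungrounded"
    return "grounded" if hops <= 2 else "inferred"


def classify_claim(neighbors, entities):
    """Label a claim from the entities found in it (weakest pair wins)."""
    unique = sorted(set(entities))
    if len(unique) < 2:
        return "trivial"
    labels = [classify_pair(neighbors, a, b) for a, b in combinations(unique, 2)]
    return max(labels, key=SEVERITY.get)


NEIGHBORS = build_neighbors(TRIPLES)
```

In this toy graph, a claim mentioning `MDM2` and `TP53` is two hops apart and comes out as grounded, while `nutlin-3` and `TP53` are three hops apart and come out as inferred.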

Implementation

Backend: e.g. NetworkX DiGraph with node/edge attributes.
Subgraph extraction for LLM context: bounded BFS (depth, max nodes), output as natural-language triples: entity_A —[edge_type]→ entity_B.
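
A bounded extraction of this kind might look like the sketch below. It uses plain Python rather than NetworkX so that it runs without dependencies. The function name `extract_triples`, its parameters, and the choice to follow only outgoing edges are assumptions.

```python
from collections import deque

# Toy directed edges as (src, edge_type, dst); for illustration only.
EDGES = [
    ("TP53", "encodes", "p53"),
    ("MDM2", "inhibits", "p53"),
    ("p53", "regulates", "CDKN1A"),
    ("CDKN1A", "involved_in", "cell cycle arrest"),
]


def extract_triples(edges, seed, max_depth=2, max_nodes=10):
    """Collect nodes by bounded BFS from seed, then render edges as triples."""
    out_edges = {}
    for src, edge_type, dst in edges:
        out_edges.setdefault(src, []).append((edge_type, dst))

    visited = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth bound reached, do not expand further
        for _edge_type, dst in out_edges.get(node, []):
            if dst in visited or len(visited) >= max_nodes:
                continue  # already kept, or node budget exhausted
            visited.add(dst)
            queue.append((dst, depth + 1))

    # Keep every edge whose endpoints both survived the bounds.
    return [f"{src} —[{edge_type}]→ {dst}"
            for src, edge_type, dst in edges
            if src in visited and dst in visited]


context_lines = extract_triples(EDGES, "TP53", max_depth=2)
```

With `max_depth=2` the result covers `TP53`, `p53`, and `CDKN1A`. Lowering `max_nodes` trims the context further when the prompt budget is tight.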

Next Steps

Hybrid Retrieval — Stage 2 expansion using the graph.
Memory System — Semantic memory and graph integration.
