Provenance and Reproducibility
BioCortex records full provenance for every analysis and can export it as a reproducible Jupyter notebook or JSON.
What Is Recorded
The ProvenanceTracker captures:
- Per step: step ID, node name, timestamps, executed code, language, input/output file paths (with SHA256 hashes), parameters, tools used, package versions, stdout/stderr, success/failure.
- Environment: Python version, platform, hostname, key package versions.
- DAG structure: nodes and edges so the execution order and dependencies are preserved.
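As a rough illustration of the three categories above, the following sketch shows one way such a record could be assembled with the Python standard library. The names StepRecord, SketchTracker, and sha256_of_file are hypothetical and are not taken from BioCortex. The field layout is an assumption, and package versions are left out for brevity.

```python
import hashlib
import json
import platform
import socket
import sys
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


@dataclass
class StepRecord:
    # Hypothetical field names mirroring the per-step items listed above.
    step_id: str
    node_name: str
    code: str
    language: str
    started_at: str
    finished_at: str
    inputs: dict = field(default_factory=dict)   # file path -> SHA256
    outputs: dict = field(default_factory=dict)  # file path -> SHA256
    parameters: dict = field(default_factory=dict)
    tools: list = field(default_factory=list)
    stdout: str = ""
    stderr: str = ""
    success: bool = True


class SketchTracker:
    """Illustrative stand-in; not the real ProvenanceTracker."""

    def __init__(self):
        # Environment block: captured once per run.
        self.environment = {
            "python": sys.version.split()[0],
            "platform": platform.platform(),
            "hostname": socket.gethostname(),
        }
        self.steps = []
        self.edges = []  # (upstream step_id, downstream step_id)

    def record(self, step, depends_on=()):
        """Store a step and the DAG edges from each of its parents."""
        self.steps.append(step)
        self.edges.extend((parent, step.step_id) for parent in depends_on)

    def to_json(self):
        """Serialize environment, steps, and DAG structure."""
        return json.dumps(
            {
                "environment": self.environment,
                "steps": [asdict(step) for step in self.steps],
                "dag": {
                    "nodes": [step.step_id for step in self.steps],
                    "edges": [list(edge) for edge in self.edges],
                },
            },
            indent=2,
        )


if __name__ == "__main__":
    now = datetime.now(timezone.utc).isoformat()
    tracker = SketchTracker()
    tracker.record(StepRecord("s1", "load_data", "x = 1", "python", now, now))
    tracker.record(
        StepRecord("s2", "fit_model", "y = x + 1", "python", now, now),
        depends_on=["s1"],
    )
    print(tracker.to_json())
```

Hashing in chunks keeps memory use flat for large input files, and storing edges separately from steps lets the execution order be rebuilt without re-parsing the code.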
Export Formats
JSON
Machine-readable full audit trail: all steps, hashes, and metadata. Use for compliance, debugging, or custom tooling.
Jupyter Notebook
- Each provenance step becomes a documented code cell (with step name, status, tools, inputs/outputs, errors).
- The notebook can be run cell-by-cell to reproduce the analysis.
- The entire pipeline can be re-executed in one click in a standard environment.
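The mapping from provenance steps to notebook cells can be sketched as follows. This is a minimal example that writes the notebook JSON structure (nbformat 4) directly with the standard library. The function name steps_to_notebook and the step dictionary keys are assumptions made for illustration, not the BioCortex exporter.

```python
import json


def steps_to_notebook(steps):
    """Build an nbformat-4 notebook dict with one documented code cell per step.

    Each step is a dict with hypothetical keys: step_id, node_name, code,
    and optionally tools, inputs, outputs, success, error.
    """
    cells = []
    for step in steps:
        status = "succeeded" if step.get("success", True) else "failed"
        # Documentation is written as comments so the cell stays runnable.
        header = [
            f"# Step: {step['node_name']} ({step['step_id']})",
            f"# Status: {status}",
            f"# Tools: {', '.join(step.get('tools', [])) or 'none'}",
            f"# Inputs: {', '.join(step.get('inputs', [])) or 'none'}",
            f"# Outputs: {', '.join(step.get('outputs', [])) or 'none'}",
        ]
        if step.get("error"):
            header.append(f"# Error: {step['error']}")
        cells.append(
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": "\n".join(header) + "\n" + step["code"],
            }
        )
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


if __name__ == "__main__":
    example_steps = [
        {
            "step_id": "s1",
            "node_name": "load_data",
            "code": "values = [1, 2, 3]",
            "outputs": ["values.csv"],
        },
        {
            "step_id": "s2",
            "node_name": "summarize",
            "code": "total = sum(values)",
            "inputs": ["values.csv"],
        },
    ]
    with open("provenance_sketch.ipynb", "w", encoding="utf-8") as handle:
        json.dump(steps_to_notebook(example_steps), handle, indent=1)
```

Cells are emitted in the order the steps were given, so passing steps in execution order keeps a cell-by-cell run consistent with the original pipeline.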
Where It Fits in the Pipeline
The Executor (or sandbox layer) reports each execution to the ProvenanceTracker. The Orchestrator/Synthesizer can attach the provenance to the final report or make it available via the Web UI (e.g. “Download notebook” or “View provenance”).
Configuration
- Enable/disable provenance.
- Output directory for JSON and notebook exports.
- Which metadata to include (hashes, env, package versions).
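The three options above could be modelled as a small settings object. The sketch below is only an assumption about how they might fit together. ProvenanceSettings, its field names, its defaults, and the record keys it filters are all hypothetical and do not reflect the actual BioCortex configuration schema.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ProvenanceSettings:
    # All names and defaults below are hypothetical.
    enabled: bool = True
    output_dir: Path = Path("provenance")
    include_hashes: bool = True
    include_environment: bool = True
    include_package_versions: bool = True

    def export_paths(self, run_id):
        """Return the (JSON, notebook) export paths for a given run."""
        return (
            self.output_dir / f"{run_id}.json",
            self.output_dir / f"{run_id}.ipynb",
        )

    def filter_metadata(self, record):
        """Return a copy of a record dict without the disabled metadata."""
        dropped = set()
        if not self.include_hashes:
            dropped |= {"input_hashes", "output_hashes"}
        if not self.include_environment:
            dropped.add("environment")
        if not self.include_package_versions:
            dropped.add("package_versions")
        return {key: value for key, value in record.items() if key not in dropped}


if __name__ == "__main__":
    settings = ProvenanceSettings(output_dir=Path("out"), include_hashes=False)
    print(settings.export_paths("run1"))
    print(settings.filter_metadata({"step_id": "s1", "input_hashes": {"a.txt": "..."}}))
```

Keeping the metadata switches in one place means an exporter can apply the same filter to both the JSON and notebook outputs.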
See also: biocortex.core.provenance (or equivalent) and Configuration.