Provenance and Reproducibility

BioCortex records full provenance for every analysis and can export it either as a JSON audit trail or as a re-runnable Jupyter notebook.

What Is Recorded

The ProvenanceTracker captures:
  • Per step: step ID, node name, timestamps, executed code, language, input/output file paths (with SHA-256 hashes), parameters, tools used, package versions, stdout/stderr, and success or failure status.
  • Environment: Python version, platform, hostname, key package versions.
  • DAG structure: nodes and edges so the execution order and dependencies are preserved.
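
The three kinds of recorded data above can be sketched as follows. This is a minimal illustration under stated assumptions, not BioCortex's actual implementation: `StepRecord`, `sha256_of_file`, and `capture_environment` are hypothetical names, only a subset of the listed fields is shown, and the DAG is a toy three-node chain.

```python
import hashlib
import platform
import socket
import tempfile
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


@dataclass
class StepRecord:
    """Hypothetical per-step record; field names are illustrative."""
    step_id: str
    node_name: str
    code: str
    language: str
    inputs: dict = field(default_factory=dict)   # file name -> SHA-256 hex digest
    outputs: dict = field(default_factory=dict)  # file name -> SHA-256 hex digest
    success: bool = True


def capture_environment():
    """Collect basic environment fields using only the standard library."""
    return {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "hostname": socket.gethostname(),
    }


# Hash a small temporary input file and attach the digest to a step record.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "counts.tsv"
    data_file.write_bytes(b"gene\tcount\n")
    record = StepRecord(
        step_id="step-1",
        node_name="load_counts",
        code="table = open('counts.tsv').read()",
        language="python",
        inputs={data_file.name: sha256_of_file(data_file)},
    )

environment = capture_environment()

# DAG as a mapping of node -> set of nodes it depends on (its predecessors).
dag = {"normalize": {"load_counts"}, "plot": {"normalize"}}
execution_order = list(TopologicalSorter(dag).static_order())
```

Storing a digest next to each path lets a later run detect when an input file has changed even though its name has not.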

Export Formats

JSON

Machine-readable full audit trail: all steps, hashes, and metadata. Use for compliance, debugging, or custom tooling.
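
As a rough sketch of what such an export might look like, the snippet below serializes steps, environment, and DAG edges into one JSON document. The function name `export_provenance_json` and the key layout are assumptions made for illustration; the actual BioCortex schema may differ.

```python
import json


def export_provenance_json(steps, environment, edges):
    """Serialize provenance into a single JSON string (hypothetical layout)."""
    document = {
        "environment": environment,
        "steps": steps,
        "dag": {
            "nodes": [step["step_id"] for step in steps],
            "edges": edges,  # list of [upstream_id, downstream_id] pairs
        },
    }
    # sort_keys gives stable output, which makes exported files easy to diff.
    return json.dumps(document, indent=2, sort_keys=True)


steps = [
    {"step_id": "step-1", "node_name": "load_counts", "success": True},
    {"step_id": "step-2", "node_name": "normalize", "success": True},
]
payload = export_provenance_json(
    steps,
    environment={"python_version": "3.11"},  # placeholder value
    edges=[["step-1", "step-2"]],
)
restored = json.loads(payload)
```

Because the export is plain JSON, custom tooling can load it with any JSON parser and inspect steps or hashes without importing BioCortex.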

Jupyter Notebook

  • Each provenance step becomes a documented code cell (with step name, status, tools, inputs/outputs, errors).
  • The notebook can be run cell-by-cell to reproduce the analysis.
  • The entire pipeline can also be re-executed in one click in a standard environment.
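
The mapping from provenance steps to notebook cells could look roughly like the sketch below. It builds the notebook as a plain dictionary following the Jupyter notebook format version 4 and uses only the standard library. The helper names and the step dictionary keys are hypothetical, and placing the documentation in header comments is one possible choice, not necessarily what BioCortex does.

```python
import json


def step_to_cell(step):
    """Turn one provenance step into a code cell with a comment header."""
    header = [
        f"# Step: {step['node_name']} ({step['step_id']})",
        f"# Status: {'success' if step['success'] else 'failed'}",
        f"# Tools: {', '.join(step.get('tools', [])) or 'none'}",
    ]
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": "\n".join(header + [step["code"]]),
    }


def steps_to_notebook(steps):
    """Assemble a minimal nbformat 4 notebook from ordered steps."""
    return {
        "cells": [step_to_cell(step) for step in steps],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


steps = [
    {"step_id": "step-1", "node_name": "load_counts", "success": True,
     "tools": ["pandas"], "code": "counts = [1, 2, 3]"},
    {"step_id": "step-2", "node_name": "summarize", "success": True,
     "code": "total = sum(counts)"},
]
notebook = steps_to_notebook(steps)
notebook_text = json.dumps(notebook, indent=1)  # contents of an .ipynb file
```

Writing `notebook_text` to a file with the `.ipynb` extension should yield a notebook whose cells run in the recorded order.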

Where It Fits in the Pipeline

The Executor (or sandbox layer) reports each execution to the ProvenanceTracker. The Orchestrator/Synthesizer can attach the provenance to the final report or make it available via the Web UI (e.g. “Download notebook” or “View provenance”).
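
The reporting flow can be illustrated with a small sketch. `SketchTracker` and `run_and_report` are hypothetical stand-ins, not the real ProvenanceTracker or Executor API, and the bare `exec` call is for illustration only; an actual sandbox layer would isolate execution.

```python
import contextlib
import io
import traceback
from datetime import datetime, timezone


class SketchTracker:
    """Hypothetical stand-in for the ProvenanceTracker."""

    def __init__(self):
        self.steps = []

    def record(self, entry):
        self.steps.append(entry)


def run_and_report(tracker, step_id, node_name, code):
    """Execute a Python snippet and report the outcome to the tracker."""
    started = datetime.now(timezone.utc).isoformat()
    captured = io.StringIO()
    error_text = ""
    try:
        with contextlib.redirect_stdout(captured):
            exec(code, {})  # illustration only; not sandboxed
        success = True
    except Exception:
        success = False
        error_text = traceback.format_exc()
    tracker.record({
        "step_id": step_id,
        "node_name": node_name,
        "code": code,
        "language": "python",
        "started_at": started,
        "finished_at": datetime.now(timezone.utc).isoformat(),
        "stdout": captured.getvalue(),
        "stderr": error_text,
        "success": success,
    })
    return success


tracker = SketchTracker()
run_and_report(tracker, "step-1", "greet", "print('hello')")
run_and_report(tracker, "step-2", "broken", "1 / 0")
```

Recording failed steps alongside successful ones keeps the audit trail complete, so a later export can show where a run stopped and why.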

Configuration

The following provenance options can be configured:

  • Whether provenance tracking is enabled or disabled.
  • The output directory for JSON and notebook exports.
  • Which metadata to include (hashes, environment, package versions).

See biocortex.core.provenance (or equivalent) and Configuration for details.
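
To make these options concrete, here is a sketch of how such settings might be modeled and applied. Every name in it (`ProvenanceSettings`, its fields, `apply_settings`, and the step dictionary keys) is an assumption made for illustration; consult the actual configuration reference for the real option names.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ProvenanceSettings:
    """Hypothetical settings object mirroring the options listed above."""
    enabled: bool = True
    output_dir: Path = Path("provenance_out")
    include_hashes: bool = True
    include_environment: bool = True  # run-level, so not used per step below
    include_package_versions: bool = True


def apply_settings(step, settings):
    """Return a copy of a step dict with excluded metadata removed."""
    filtered = dict(step)
    if not settings.include_hashes:
        filtered.pop("input_hashes", None)
        filtered.pop("output_hashes", None)
    if not settings.include_package_versions:
        filtered.pop("package_versions", None)
    return filtered


settings = ProvenanceSettings(include_hashes=False)
step = {
    "step_id": "step-1",
    "input_hashes": {"counts.tsv": "abc123"},    # placeholder digest
    "package_versions": {"numpy": "1.26.4"},     # placeholder version
}
trimmed = apply_settings(step, settings)
```

Filtering on a copy leaves the in-memory record untouched, so the same run could still be exported later with different settings.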