Getting Started with Workflow Canvas
What is Workflow Canvas?
Workflow Canvas is a system for reproducible computational pipelines, built around its command-line tool, wfc. It is designed for multi-step analysis pipelines where you need:
Contracts: Methods declare their inputs, outputs, and parameters via method.yaml. Modules enforce required outputs. The system catches wiring errors and missing outputs before you waste time on a long run.
Caching: Each step is fingerprinted by its source code, parameters, and upstream inputs. Unchanged steps are skipped automatically. The system refuses to run on uncommitted code (DirtyRepositoryError) to ensure reproducibility.
Lineage: Every run is recorded in a SQLite database. You can trace any output back through its full DAG ancestry, including cache hits which appear as audit rows in the lineage.
Snakemake execution: Pipelines are defined as JSON and compiled to Snakefiles. Snakemake handles parallelism and dependency resolution.
Canvas UI: A visual interface for building and inspecting pipelines (separate feature).
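The caching idea can be illustrated with a minimal sketch. This is not wfc's actual implementation: the function name step_fingerprint, the choice of SHA-256, and the JSON canonicalization are all assumptions made for illustration. The only point is that a change to the source, the parameters, or any upstream fingerprint produces a different key, while unchanged inputs produce the same one.

```python
import hashlib
import json


def step_fingerprint(source_code: str, params: dict, upstream: dict) -> str:
    """Hypothetical cache key built from source, parameters, and upstream fingerprints.

    `upstream` maps an input slot name to the fingerprint of the step feeding it.
    """
    payload = json.dumps(
        {
            "source": hashlib.sha256(source_code.encode("utf-8")).hexdigest(),
            "params": params,
            "upstream": upstream,
        },
        sort_keys=True,  # key order must not affect the fingerprint
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


base = step_fingerprint("def run(): ...", {"column": "condition"}, {"data": "abc123"})
same = step_fingerprint("def run(): ...", {"column": "condition"}, {"data": "abc123"})
edited = step_fingerprint("def run(): pass", {"column": "condition"}, {"data": "abc123"})

print(base == same)    # True: nothing changed, so a cached result could be reused
print(base == edited)  # False: the source changed, so the step would run again
```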
wfc manages the full lifecycle: register your modules, methods, and data samples, define a pipeline, run it, and query lineage — all from the command line.
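To make the idea of DAG ancestry concrete, here is a small sketch of a lineage query over SQLite. The schema is entirely hypothetical (the runs and edges tables, their columns, and the ancestors helper are invented for this example); the real wfc database layout is not described in this guide. It only shows that a recursive query is enough to walk from one run back to everything it depends on.

```python
import sqlite3

# Hypothetical schema: the real wfc tables are not documented in this guide.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE runs (id TEXT PRIMARY KEY, method TEXT);
    CREATE TABLE edges (parent TEXT, child TEXT);
    INSERT INTO runs VALUES ('r1', 'load'), ('r2', 'filter'), ('r3', 'analyze');
    INSERT INTO edges VALUES ('r1', 'r2'), ('r2', 'r3');
    """
)


def ancestors(conn: sqlite3.Connection, run_id: str) -> list[str]:
    """Return every run the given run depends on, directly or indirectly."""
    rows = conn.execute(
        """
        WITH RECURSIVE up(id) AS (
            SELECT parent FROM edges WHERE child = ?
            UNION
            SELECT e.parent FROM edges e JOIN up ON e.child = up.id
        )
        SELECT id FROM up ORDER BY id
        """,
        (run_id,),
    ).fetchall()
    return [row[0] for row in rows]


print(ancestors(conn, "r3"))  # ['r1', 'r2']
```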
Prerequisites
Before you begin, make sure you have:
Python 3.11+: Required. The config parser uses tomllib from the standard library (added in 3.11). Python also gives you pip, which ships with it.
Docker: Required. Methods always run inside a container, so a working Docker installation is a hard requirement: nothing runs without it. On Windows and macOS this means Docker Desktop; on Linux, the Docker Engine and its daemon. wfc init pre-flights Docker for you and wfc doctor checks it any time, but neither can install it; that part is on you.
Git: Required, but only locally. wfc records a commit for every run so results are reproducible, and it refuses to run on uncommitted code. All you need is the git command and a local identity (a name and email); wfc init even sets a repo-local fallback identity if you have none configured. There is no GitHub account, no login, and no network involved; wfc never pushes anywhere. The git requirement is purely about a clean local history.
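If you want a quick look at these prerequisites before installing anything, a few lines of standard-library Python are enough. This is a rough sketch, not a substitute for wfc doctor: the check_prerequisites function is hypothetical, and finding an executable on PATH does not prove that the Docker daemon is running or that git has an identity configured.

```python
import shutil
import sys


def check_prerequisites() -> dict[str, bool]:
    """Report which prerequisites appear to be present on this machine."""
    return {
        # tomllib entered the standard library in Python 3.11
        "python>=3.11": sys.version_info >= (3, 11),
        # shutil.which only confirms the executable is on PATH,
        # not that the Docker daemon is actually running
        "docker": shutil.which("docker") is not None,
        "git": shutil.which("git") is not None,
    }


for name, ok in check_prerequisites().items():
    print(f"{name}: {'ok' if ok else 'missing'}")
```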
A few things you do not need to install separately:
DVC ships with wfc. It is a dependency, installed automatically when you pip install workflow-canvas. wfc uses it as the content-addressed store for your run outputs and registered samples. wfc init configures a local archive for you; you never have to set DVC up by hand.
Snakemake also ships with wfc and runs your pipelines under the hood: wfc generates the Snakefile and invokes it for you.
Container environments are built ahead of time from a backend such as a base image, pixi, or conda, but those tools belong to the environment-build step, not to running wfc itself. See [[registering-an-environment]] for how environments are built and named.
Installation
# Install Workflow Canvas (includes the wfc CLI, plus DVC and Snakemake)
pip install workflow-canvas
# Verify the installation
wfc --help
You should see the wfc CLI help output listing available commands, including init, doctor, register-module, register-method, register-sample, register-env, and run-pipeline.
Your First Pipeline
This walkthrough covers the happy path from project initialization to pipeline execution.
1. Initialize a project
wfc init --dir ./my_project
cd my_project
wfc init is a guided setup wizard that leaves you with a project that can actually run. Run it with no extra flags and it walks you through setup interactively; the goal is that when it finishes you can register a method and run a pipeline without any further hand-configuration. It does four things:
Scaffolds the project structure: .wfc/ (config + database), modules/, methods/, data/samples/, .runs/, and a .gitignore. The modules/ and methods/ directories start empty; you register your own in the next steps.
Configures a backup archive for your outputs. Every project gets one; the wizard only asks where it should live, with a sensible default you can accept by pressing Enter (~/.wfc/archives/<project>, kept outside the repo so it survives). This is wired up as a live DVC archive; you do not configure DVC yourself.
Sets up git. If the directory is not already a git repository, the wizard runs git init and makes a clean initial commit of the scaffold. That gives the run-gate a real starting commit and a clean tree, so your first run is not blocked by a missing HEAD or a “dirty repository” error. If you have no git identity configured, it sets a repo-local one for you so the commit always lands; you can change it later with git config.
Pre-flights Docker and prints a health summary at the end, so you immediately know whether your project is ready to run or what is still missing.
The wizard is idempotent: it is always safe to re-run. It never re-asks questions you have already answered and never clobbers existing config — each step checks “does this already exist?” first. That makes recovery simple: if a tool was missing, install it, re-run wfc init to finish only what is left, and run wfc doctor to confirm.
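The "does this already exist?" pattern behind an idempotent setup step can be sketched as follows. The directory names come from the scaffold described above, but the ensure_scaffold function is hypothetical and says nothing about how wfc init is actually written; it only demonstrates why a second run has nothing left to do.

```python
import tempfile
from pathlib import Path

# Directory names taken from the scaffold description; the function is hypothetical.
SCAFFOLD_DIRS = [".wfc", "modules", "methods", "data/samples", ".runs"]


def ensure_scaffold(root: Path) -> list[str]:
    """Create only the directories that are missing and return what was created."""
    created = []
    for rel in SCAFFOLD_DIRS:
        path = root / rel
        if not path.exists():  # the "does this already exist?" check
            path.mkdir(parents=True)
            created.append(rel)
    return created


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    first = ensure_scaffold(root)   # creates all five directories
    second = ensure_scaffold(root)  # nothing left to do
    print(len(first), len(second))  # 5 0
```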
For scripts and CI, run it non-interactively:
wfc init --dir ./my_project --yes # accept all defaults, no prompts
wfc init --dir ./my_project --archive /data/archives/my_project --yes
--yes accepts every default (including the git-identity fallback), and --archive PATH sets the output archive location without prompting.
One door for “why won’t this run?” If a run ever refuses to start, run wfc doctor. It checks git, the output archive, and Docker, prints a health table, and exits non-zero if anything is broken, which is handy both at your terminal and as a CI gate. When a run is blocked, wfc points you at wfc doctor rather than dumping a raw error.
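Because wfc doctor signals problems through its exit status, a CI script can gate on it without parsing any output. The sketch below assumes only that behavior; the helper names passes and gate_on_doctor are made up for this example, and subprocess.run raises FileNotFoundError if wfc is not on PATH.

```python
import subprocess
import sys


def passes(command: list[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    return subprocess.run(command).returncode == 0


def gate_on_doctor() -> None:
    """Abort the calling script if `wfc doctor` reports a broken project."""
    if not passes(["wfc", "doctor"]):
        sys.exit("wfc doctor reported problems; fix them before running the pipeline")
```

Calling gate_on_doctor() at the top of a CI script stops the job early, before any pipeline step is attempted.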
About the archive. The archive stores your outputs as content-addressed blobs, indexed by the database in .wfc/ (which is deliberately not tracked in git). To keep archived outputs recoverable, back up your .wfc/ directory along with the archive folder.
Tip: Run wfc seed to populate the project with demo modules, methods, and sample data for experimentation.
2. Register a module
Modules group related methods under a domain name with output contracts. You can define contracts in a module.yaml file or pass them via CLI:
# From module.yaml (recommended):
wfc register-module --name cell_analysis --module-dir modules/cell_analysis
# Or from CLI JSON:
wfc register-module --name cell_analysis \
--contracts '[{"type": "output", "name": "result", "value_type": ".csv", "required": true}]'
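To show what enforcing a required output means, here is a small sketch that checks a set of produced output names against contracts in the same JSON shape as the --contracts argument above. The missing_required_outputs function is hypothetical, and how wfc performs this check internally is not described here.

```python
import json

# Same contract shape as the --contracts JSON shown above.
contracts = json.loads(
    '[{"type": "output", "name": "result", "value_type": ".csv", "required": true}]'
)


def missing_required_outputs(contracts: list[dict], produced: set[str]) -> list[str]:
    """Names of required output contracts that a method does not produce."""
    return [
        c["name"]
        for c in contracts
        if c["type"] == "output" and c.get("required", False) and c["name"] not in produced
    ]


print(missing_required_outputs(contracts, {"result"}))   # []
print(missing_required_outputs(contracts, {"summary"}))  # ['result']
```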
3. Register a method
Methods are individual analysis scripts. Registration AST-scans the script for public functions, parses method.yaml for contracts, validates environment resolution, checks outputs against module contracts, and git-commits the source:
# Nested method under a module:
wfc register-method modules/cell_analysis/preprocess --module cell_analysis
# Flat standalone method:
wfc register-method methods/aggregate --module my_module
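An AST scan inspects a script without importing or executing it. The sketch below uses Python's ast module to list top-level functions; treating names without a leading underscore as "public" is an assumption made for this example, and public_functions is not a wfc API.

```python
import ast


def public_functions(source: str) -> list[str]:
    """Top-level function names that do not start with an underscore."""
    tree = ast.parse(source)  # parses only; the script is never executed
    return [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and not node.name.startswith("_")
    ]


example = '''
def preprocess(data):
    return data

def _helper():
    pass
'''

print(public_functions(example))  # ['preprocess']
```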
A method declares the container environment it runs in (via env: in its method.yaml), and that environment must be built and registered with wfc register-env before the method can run. See [[registering-an-environment]] for the details.
Two Pythons, by design. The wfc engine runs in its own environment on the host machine; your method runs in its own container environment, which contains only your declared dependencies (and, optionally, the pure-stdlib wfc-client package). wfc never imports your method’s code and your method never imports the wfc engine; they communicate only through WFC_* environment variables and files in the run directory. Because that contract is plain env vars and files, any recorded run can be reproduced later regardless of which client version (or none) the method was authored with.
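Since the contract is plain environment variables, a method can inspect it with nothing but the standard library. The sketch below simply collects every variable carrying the WFC_ prefix. The wfc_environment helper and the WFC_EXAMPLE name are invented for this example; the actual variable names are not listed in this guide.

```python
import os
from collections.abc import Mapping


def wfc_environment(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect the WFC_* variables visible to the current process."""
    return {key: value for key, value in environ.items() if key.startswith("WFC_")}


# WFC_EXAMPLE is a made-up name used only to exercise the function.
sample_env = {"WFC_EXAMPLE": "1", "PATH": "/usr/bin"}
print(wfc_environment(sample_env))  # {'WFC_EXAMPLE': '1'}
```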
4. Register a sample
Samples are your input data. Registration content-hashes the file and stores it in the DVC cache. The data/samples/ directory is an ephemeral workspace — files are restored lazily by Snakemake at execution time, not copied at registration:
wfc register-sample --name CFPAC_ERKi --source /data/raw/cfpac_erki.csv
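Content hashing means a sample is identified by its bytes rather than its file name. A minimal sketch follows, assuming SHA-256 purely for illustration; the algorithm the underlying store really uses may differ, and content_hash is a hypothetical helper.

```python
import hashlib
import tempfile
from pathlib import Path


def content_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large samples never have to fit in memory."""
    digest = hashlib.sha256()  # assumption: the real store may use another algorithm
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "a.csv"
    b = Path(tmp) / "b.csv"
    a.write_bytes(b"cell,value\n1,0.5\n")
    b.write_bytes(b"cell,value\n1,0.5\n")
    same_content = content_hash(a) == content_hash(b)
    print(same_content)  # True: identical bytes hash the same, whatever the file name
```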
5. Define and run a pipeline
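Pipelines are JSON files with nodes, links, and samples. Each node references a registered method; links wire outputs to named input slots. The excerpt below shows only the filter_ctrl node; the analyze node that the link targets would need its own entry in nodes: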
{
  "nodes": [
    {"id": "filter_ctrl", "method": "csv_filter", "module": "csv_tools",
     "params": {"column": "condition", "values": ["control"]}}
  ],
  "links": [
    {"source": "filter_ctrl", "target": "analyze", "target_slot": "data"}
  ],
  "samples": ["CFPAC_ERKi"]
}
Run it:
wfc run-pipeline --pipeline pipeline.json --cores 4
This parses and validates the pipeline (cycle detection, slot wiring), generates a Snakefile, and executes via Snakemake. Each step checks git state, checks the cache (skipping the step on a hit), runs in its container if needed, archives output, and records the run in the database. If a run won’t start, wfc doctor will tell you why.
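The validation step can be pictured with a short sketch using Python's graphlib. It checks two things on a pipeline dictionary with the nodes and links keys shown earlier: that every link endpoint is a known node, and that the links contain no cycle. The validate_pipeline function is hypothetical and covers only a small part of what a real validator would check (slot names, for instance, are ignored).

```python
from graphlib import CycleError, TopologicalSorter


def validate_pipeline(pipeline: dict) -> list[str]:
    """Return a list of wiring problems; an empty list means the graph is sound."""
    node_ids = {node["id"] for node in pipeline["nodes"]}
    problems = []
    graph = {node_id: set() for node_id in node_ids}
    for link in pipeline["links"]:
        missing = [end for end in (link["source"], link["target"]) if end not in node_ids]
        if missing:
            problems.append(f"link references unknown node(s): {', '.join(missing)}")
            continue
        graph[link["target"]].add(link["source"])  # the target depends on the source
    try:
        tuple(TopologicalSorter(graph).static_order())
    except CycleError as err:
        problems.append(f"cycle detected: {err.args[1]}")
    return problems


ok = {
    "nodes": [{"id": "a"}, {"id": "b"}],
    "links": [{"source": "a", "target": "b", "target_slot": "data"}],
}
cyclic = {
    "nodes": [{"id": "a"}, {"id": "b"}],
    "links": [
        {"source": "a", "target": "b", "target_slot": "data"},
        {"source": "b", "target": "a", "target_slot": "data"},
    ],
}

print(validate_pipeline(ok))           # []
print(len(validate_pipeline(cyclic)))  # 1
```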
Next Steps
Now that you have a working pipeline, explore further:
[[registering-an-environment]]: Build and register the container environment your methods run in. Every method needs one, and wfc doctor checks that Docker is ready for it.
[[authoring-a-method-script]]: Write methods with the wfc-client decorator (recommended) or the canonical WFC_* env-var + file contract, and declare contracts in method.yaml.
[[writing-contracts]]: Declare and enforce the inputs and outputs that wire your pipeline together correctly.
[[project-anatomy]]: Understand the directory structure, the database, the config file, and how modules, methods, and runs are organized.
[[canvas]]: Build and inspect pipelines visually instead of by hand-editing JSON.
[[run-and-inspect-results]]: Find a run’s outputs and trace its lineage after it completes.