Project Anatomy

Directory Structure

After wfc init, your project has the following layout:

my_project/
  .wfc/
    wf-canvas.toml        # Project config (DB path, pixi root, DVC archive URL)
    wfc.db                # SQLite database — single source of truth
    envs.json             # Registered container environments (git-tracked env manifest)
  modules/                # Empty after init; populated by register-module / register-method
  methods/                # Empty after init; populated by register-method (flat standalone)
  data/
    samples/              # Ephemeral workspace for restored sample files
                          #   DVC-managed; lazily restored at execution time
  .runs/                  # Sentinels + transient staging (bytes live in the DVC archive)
    sentinels/            #   zero-byte Snakemake DAG-wiring sentinels
    {run_id}/             #   transient staging; archive pass moves bytes to DVC archive
  .dvc/                   # DVC working directory (always present after init)
    cache/files/md5/      #   local content-addressed cache
  .gitignore              # Pre-configured for wfc artifacts

After registering modules and methods, the tree fills in:

modules/
  cell_analysis/          # Created by register-module
    module.yaml           #   module-level output contracts
    preprocess/           # Created by register-method
      preprocess.py
      method.yaml
methods/
  aggregate/              # Created by register-method (flat standalone)
    aggregate.py
    method.yaml

Key points:

  • modules/ holds nested module/method hierarchies. Each module directory can contain a module.yaml for output contracts and subdirectories for each method.

  • methods/ holds flat standalone methods that belong to a module but live outside the module directory.

  • data/samples/ is NOT a permanent storage location. Files are restored here lazily by the Snakemake restore_sample rule at pipeline execution time. The actual data lives in the DVC archive.

  • .runs/sentinels/ holds zero-byte Snakemake DAG-wiring sentinels (no output bytes). .runs/{run_id}/ is a transient staging directory — methods write outputs there, then the archive pass moves them to the DVC archive, which is the sole permanent store. There is no .runs/workspace/ tree.

  • .dvc/ is always created by wfc init. It holds the local DVC cache. The long-term archive lives at the URL configured in wf-canvas.toml (default: ~/.wfc/archives/<project>), which is outside the repo so that outputs survive re-creation of the repo.

  • .wfc/envs.json IS tracked in git. It is the manifest of your registered container environments — each env’s backend, build spec, and resolved image content digest. Committing it makes your environments part of the project’s reproducible record (a lockfile for environments): a collaborator who checks out the repo gets the exact image digests your methods were validated against. See [[registering-an-environment]].

  • .wfc/wfc.db is NOT tracked in git (listed in .gitignore). It is the index that maps every output to its content hash. Back up the .wfc/ directory along with your archive folder to keep results recoverable.
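The layout described above can be checked with a short script. This is a minimal sketch and not part of wfc: the name missing_layout_entries is hypothetical, and the list of expected paths is copied from the tree shown earlier on this page. Whether every entry exists immediately after wfc init is an assumption based on that tree.

```python
from pathlib import Path

# Paths taken from the directory tree on this page.
# .runs/{run_id}/ is created per run, so it is not checked here.
EXPECTED_AFTER_INIT = [
    ".wfc/wf-canvas.toml",
    ".wfc/wfc.db",
    ".wfc/envs.json",
    "modules",
    "methods",
    "data/samples",
    ".runs/sentinels",
    ".dvc",
    ".gitignore",
]


def missing_layout_entries(project_root):
    """Return the expected entries that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in EXPECTED_AFTER_INIT if not (root / rel).exists()]
```

Calling missing_layout_entries("my_project") returns an empty list when every listed path is present, and otherwise the relative paths that are missing.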

The Database

.wfc/wfc.db is a SQLite database and the single source of truth for all pipeline state. It contains 11 tables, defined with SQLModel, in three groups:

  • Registration: modules, methods, method_contracts, module_contracts, tracked_functions, param_defs, samples

  • Execution: runs, run_inputs, run_outputs

  • Versioning: method_versions

You do not interact with the database directly. The wfc CLI manages all reads and writes. Any system that reads pipeline data (canvas UI, lineage queries, reports) consumes this schema.

Every run is recorded here with its method, parameters, sample, status, cache key, and metrics. Lineage is traced through run_inputs.source_run_id relationships. Cache hits create audit rows (cache_source_run_id set) that appear in lineage like normal runs.
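Lineage tracing through run_inputs.source_run_id can be pictured as a walk upstream from one run. The sketch below builds a throwaway in-memory SQLite database with a simplified, hypothetical schema. Only the table names runs and run_inputs and the column source_run_id come from this page; the other column names (id, method, run_id), the example method names, and the function upstream_runs are assumptions for illustration. It is not a supported way to read .wfc/wfc.db.

```python
import sqlite3


def upstream_runs(conn, run_id):
    """Return the ids of all runs that feed run_id, directly or indirectly."""
    seen = []
    frontier = [run_id]
    while frontier:
        current = frontier.pop()
        rows = conn.execute(
            "SELECT source_run_id FROM run_inputs "
            "WHERE run_id = ? AND source_run_id IS NOT NULL",
            (current,),
        ).fetchall()
        for (source,) in rows:
            if source not in seen:
                seen.append(source)
                frontier.append(source)
    return sorted(seen)


# Hypothetical, simplified schema (an assumption, not the real wfc.db layout).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE runs (id INTEGER PRIMARY KEY, method TEXT);
    CREATE TABLE run_inputs (run_id INTEGER, source_run_id INTEGER);
    INSERT INTO runs VALUES (1, 'preprocess'), (2, 'segment'), (3, 'aggregate');
    INSERT INTO run_inputs VALUES (1, NULL), (2, 1), (3, 2);
    """
)
print(upstream_runs(conn, 3))  # prints [1, 2]
```

Run 1 reads a sample rather than another run's output, so its source_run_id is NULL and the walk stops there.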

The database is the provenance record: it is the index that maps each output to the content hash holding its bytes. Because it is state rather than source, wfc init lists .wfc/wfc.db in .gitignore — git versions your method code, not your run history. That also means the database is part of what you must back up to keep results recoverable; see [[storage-and-provenance]].

Configuration

Project settings live in .wfc/wf-canvas.toml: the database connection, the environment root, a [dvc] block (always written by wfc init; only the archive URL is configurable), and an optional registry block. This file is committed to git as project source. For the full field-by-field reference, see [[wf-canvas-toml]].

Modules vs. Methods

Modules are organizational containers that group related methods under a domain name (e.g., cell_analysis, csv_tools). Modules define output contracts — required outputs and metrics that every method in the module must produce. Contracts come from module.yaml or the CLI --contracts flag.
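The contract rule can be read as a set comparison: every output the module requires must appear among those the method declares. The following is a minimal sketch with plain Python sets. The function name and the example output names are hypothetical, and the real module.yaml and method.yaml schemas are not reproduced here.

```python
def missing_contract_items(required, declared):
    """Return required names that the method does not declare, sorted."""
    return sorted(set(required) - set(declared))


# Hypothetical contract and method declarations.
module_required_outputs = {"cells_table", "qc_report"}
method_declared_outputs = {"cells_table", "debug_image"}

print(missing_contract_items(module_required_outputs, method_declared_outputs))
# prints ['qc_report']
```

An empty result means the method satisfies the contract; extra outputs such as debug_image do not count against it in this sketch.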

Methods are individual analysis scripts. Each method has:

  • A Python script ({method_name}.py) — the analysis implementation (plain script or using wfc-client sugar)

  • A method.yaml declaring input slots, output slots, parameters, and the container environment it runs in

  • A parent module that it belongs to

Nested vs. Flat

Methods can be organized two ways:

  • Nested under modules/ — the method directory lives inside its module directory: modules/cell_analysis/preprocess/preprocess.py. Register with wfc register-method modules/cell_analysis/preprocess --module cell_analysis.

  • Flat under methods/ — the method lives in a standalone directory: methods/aggregate/aggregate.py. Register with wfc register-method methods/aggregate --module csv_tools.

Both layouts are functionally equivalent. The module association is set by the --module flag, not by directory location.
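The two layouts differ only in where the directory sits, which a few lines of Python can make explicit. This is an illustrative sketch: method_layout and register_method_command are hypothetical helpers, and the argument list simply mirrors the register-method invocations shown above.

```python
from pathlib import PurePosixPath


def method_layout(method_dir):
    """Classify a method directory as 'nested' or 'flat' from its path."""
    parts = PurePosixPath(method_dir).parts
    if len(parts) == 3 and parts[0] == "modules":
        return "nested"
    if len(parts) == 2 and parts[0] == "methods":
        return "flat"
    raise ValueError(f"unrecognized method location: {method_dir}")


def register_method_command(method_dir, module):
    """Build the argument list for the register-method call shown above."""
    # The module comes from the flag, never from the directory location.
    return ["wfc", "register-method", method_dir, "--module", module]


print(method_layout("modules/cell_analysis/preprocess"))  # prints nested
print(method_layout("methods/aggregate"))                 # prints flat
print(register_method_command("methods/aggregate", "csv_tools"))
```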

Registration validations

When you register a method, wfc:

  1. AST-scans the script for @wfc.method-decorated functions and save_artifact calls; the script is parsed, never imported, so it has no side effects

  2. Parses method.yaml for contracts and the env name

  3. Validates that at least one input slot is declared

  4. Validates that the named container environment has already been built

  5. Validates that method outputs satisfy the module’s required output contracts

  6. Git-commits the method directory

For the full walkthrough of authoring a method, declaring its contracts, and building its environment, see [[authoring-a-method-script]] and [[registering-an-environment]].
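The AST scan in step 1 can be approximated with Python's standard ast module, which parses source text without executing it. The sketch below is an assumption about how such a scan might look, not wfc's actual implementation. The decorator name @wfc.method and the call name save_artifact come from this page; the example script, its function signature, and the helper names are hypothetical.

```python
import ast

# Hypothetical method script. The decorator and call names come from this
# page, but the function signature and arguments are assumptions.
EXAMPLE_SCRIPT = '''
import wfc

@wfc.method
def preprocess(ctx):
    ctx.save_artifact("cells_table", "cells.csv")

def helper():
    pass
'''


def _dotted_name(node):
    """Return 'a.b' for a Name/Attribute chain, else None."""
    if isinstance(node, ast.Name):
        return node.id
    if isinstance(node, ast.Attribute):
        base = _dotted_name(node.value)
        return f"{base}.{node.attr}" if base else None
    return None


def scan_method_script(source):
    """Find @wfc.method functions and save_artifact calls without importing."""
    tree = ast.parse(source)
    decorated, artifact_calls = [], 0
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for dec in node.decorator_list:
                # Handle both @wfc.method and @wfc.method(...)
                target = dec.func if isinstance(dec, ast.Call) else dec
                if _dotted_name(target) == "wfc.method":
                    decorated.append(node.name)
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Attribute) and func.attr == "save_artifact":
                artifact_calls += 1
    return decorated, artifact_calls


print(scan_method_script(EXAMPLE_SCRIPT))  # prints (['preprocess'], 1)
```

Because ast.parse only builds a syntax tree, the import wfc line in the example is never executed, which is what keeps the scan free of side effects.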

The Run Workspace (.runs/)

The .runs/ directory is where a pipeline does its transient work. It serves two purposes:

  • sentinels/ — Zero-byte sentinel files used only for Snakemake DAG wiring, organized by {pipeline_id}/{node_id}/{sample}/{variant}/.complete. They mark a step as finished; they hold no output bytes.

  • {run_id}/ — A transient staging directory. A method writes its outputs here, then the archive pass moves the bytes into the DVC content-addressed cache, which is the sole authoritative store. The former .runs/workspace/ output tree no longer exists; outputs are reached by content-hash, not by a workspace path.
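The sentinel layout described above maps directly onto a path template. A minimal sketch, with a hypothetical helper name and hypothetical identifier values:

```python
from pathlib import PurePosixPath


def sentinel_path(pipeline_id, node_id, sample, variant):
    """Build the sentinel path using the layout described above."""
    return (
        PurePosixPath(".runs/sentinels")
        / pipeline_id / node_id / sample / variant / ".complete"
    )


print(sentinel_path("pipe1", "preprocess", "sample_a", "default"))
# prints .runs/sentinels/pipe1/preprocess/sample_a/default/.complete
```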

Because .runs/ holds only staging files and sentinels rather than durable data, wfc init adds it to .gitignore. You never commit it, and you can safely delete it between runs.
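To make "reached by content hash" concrete, the sketch below derives a storage path from the bytes themselves, so identical content always lands at the same location. The cache/files/md5/ prefix comes from the tree on this page. The split into a two-character subdirectory follows the usual DVC cache convention and is stated here as an assumption about this project; cache_path_for is a hypothetical helper.

```python
import hashlib
from pathlib import PurePosixPath


def cache_path_for(data):
    """Return an illustrative content-addressed path for some bytes."""
    digest = hashlib.md5(data).hexdigest()
    # Assumed layout: the first two hex characters form a subdirectory.
    return PurePosixPath(".dvc/cache/files/md5") / digest[:2] / digest[2:]


print(cache_path_for(b"example output bytes"))
```

Two outputs with the same bytes produce the same path, which is why a content-addressed store never holds duplicate copies.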

What lives where

The key idea is that output bytes never live under .runs/ for long. They are hashed and moved into the content-addressed cache, and the database records which hash holds each output. Whether a step re-runs or reuses a prior result is decided by a cache key computed from the method’s code, its parameters, its inputs, and its environment. Both of these — the storage model and the cache-key model — have their own pages:

  • [[storage-and-provenance]] explains the content-addressed cache, what git does and does not track, how the cache is shared across machines, and what to back up.

  • [[caching-and-reproducibility]] explains how the cache key is computed and why a given step re-runs or hits the cache.
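As a rough picture of the cache-key idea, the sketch below hashes the four ingredients named above (code, parameters, inputs, environment) into a single digest, so changing any one of them yields a different key. This is only an illustration under stated assumptions: the function name, the use of SHA-256 over JSON, and the example values are hypothetical, and the actual computation is described in [[caching-and-reproducibility]].

```python
import hashlib
import json


def illustrative_cache_key(code_hash, params, input_hashes, env_digest):
    """Combine the four ingredients into one deterministic hex digest."""
    payload = json.dumps(
        {
            "code": code_hash,
            "params": params,
            "inputs": input_hashes,  # mapping of input slot name to hash
            "env": env_digest,
        },
        sort_keys=True,  # key order must not change the result
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical values for illustration.
key_a = illustrative_cache_key("c0de", {"sigma": 2}, {"image": "h1"}, "sha256:env1")
key_b = illustrative_cache_key("c0de", {"sigma": 3}, {"image": "h1"}, "sha256:env1")
print(key_a == key_b)  # prints False
```

A step whose key matches a previously recorded run can reuse that run's outputs; a changed parameter, as in key_b, forces a re-run.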

This page stays focused on the layout: .wfc/ for config and the database, modules/ and methods/ for your code, data/samples/ for restored inputs, and .runs/ plus the DVC cache for execution.

Next Steps

With an understanding of the project structure, continue to:

  • [[wf-canvas-toml]] — The full reference for every section of .wfc/wf-canvas.toml.

  • [[authoring-a-method-script]] — How to write method scripts using wfc-client (@wfc.method + ctx.save_artifact) or the plain environment-variable contract, and declare contracts in method.yaml.

  • [[registering-an-environment]] — How to build the container environment a method runs in.

  • [[storage-and-provenance]] and [[caching-and-reproducibility]] — The deeper story behind where outputs live and why steps re-run.