<!-- generated from pm_mvp::docs.consumer.tutorials.writing-contracts @ a9302c4c8244; do not edit -->

# Tutorial: Writing Contracts

## Writing contracts

A *contract* is the promise a method makes about its data: which inputs it accepts, which outputs it produces, and what those files must contain. Contracts are how Workflow Canvas catches a broken pipeline early -- when you wire two nodes together, or the moment a step starts -- instead of after a long run has already burned compute and produced garbage.

This tutorial covers the three things you need to write contracts well:

- **The two tiers** -- a *module* contract that every method in a module must honor, and a *method* contract that declares each method's own inputs and outputs.
- **Column validation** -- declaring which CSV/Parquet columns a slot requires, so a typo'd or missing column is caught instead of silently dropped.
- **The three enforcement points** -- registration, pipeline load, and runtime -- and the mental model for *why* a pipeline is rejected at one stage versus another.

This builds directly on the method you wrote in [[authoring-a-method-script]]. If you haven't authored a method yet, start there. For the exhaustive field-by-field reference (every key and its default), see [[method-yaml-schema]] -- this tutorial teaches the model, that page is the lookup table.

## Two tiers: module and method contracts

Contracts live at two levels, and they compose.

**Method contract (`method.yaml`)** -- declares the inputs, outputs, and parameters of *one* method. This file is the source of truth for that method's I/O shape. You write one per method, alongside its script:

```yaml
# methods/binary_labeling/method.yaml
inputs:
  data:
    type: .csv
    required: true
    description: "Labeled cell CSV"
outputs:
  predictions:
    type: .csv
    required: true
  model:
    type: .pkl
    required: true
params:
  threshold:
    type: float
    default: 0.5
```

**Module contract (`module.yaml`)** -- sits one level up and declares guarantees that *every* method in the module must satisfy. Use it for domain-wide promises: "every classifier in this module produces a `model` output and reports an `mcc` metric."

```yaml
# modules/cell_classifiers/module.yaml
description: Train and apply binary classifiers on labeled cell data
contracts:
  - type: output
    name: model
    value_type: model
    required: true
  - type: metric
    name: mcc
    value_type: float
    required: true
```

Each module-contract entry has four fields: `type` (`output` or `metric`), `name` (the required output/metric name), `value_type` (the expected type, e.g. `float`, `model`, `.csv`), and `required` (default `true`).

**How they compose:** the method contract defines the *shape* (which slots exist and their types); the module contract guarantees that certain *named* outputs and metrics are always present, no matter which method you run. A method is free to declare more outputs than the module requires -- it just can't declare fewer of the required ones.

## Declaring module contracts: file vs. CLI

You can declare a module's contracts in `modules/{module}/module.yaml` (shown above) or pass them inline when you register the module:

```bash
wfc register-module --name cell_classifiers \
  --contracts '[{"type":"output","name":"model","value_type":"model","required":true}]'
```

`--contracts` accepts either an inline JSON string or a path to a JSON file. The flag and `module.yaml` serve the same purpose, so what happens when both are present? **The CLI `--contracts` flag takes precedence, and `module.yaml` is the fallback.** In day-to-day work, prefer `module.yaml` -- it lives next to the code, is version-controlled, and needs no re-typing. Reach for `--contracts` only when you're scripting registration or want to override the file for a one-off.

Method contracts have no CLI equivalent: `method.yaml` is always the source of truth for a method's own inputs and outputs.

## Column validation: what's inside the file

Declaring a slot's `type: .csv` only promises the *file format*. Often you need to promise the file's *contents* -- that it actually has the columns the next step reads. For `csv` and `parquet` slots, add a `columns:` block:

```yaml
inputs:
  data:
    type: .csv
    columns:
      strict: ["cell_index", "condition", "proliferative"]
      patterns: ["*_intensity"]
```

There are three ways to declare required columns, and they combine freely:

| Key | Meaning |
|---|---|
| `strict` | Exact column names that must be present, verbatim. |
| `from_params` | Column names *derived* from this run's parameter values (see below). |
| `patterns` | fnmatch-style globs; at least one real column must match each pattern. |

**The hard/soft split is the key idea.** Where you put the `columns:` block changes how strictly it's enforced:

- On an **input** slot, column validation is a **hard gate**. If a required column is missing when the step starts, the step fails immediately with an error naming the missing column. Nothing downstream runs on bad data.
- On an **output** slot, column validation is a **soft check**. A missing column produces a *warning* in the run log but does not fail the step. The idea is that you control your own outputs, so drift there is a heads-up, not a wall -- whereas inputs arrive from upstream and must be trusted before you compute on them.

## Worked example: from_params

`strict` is fine when column names are fixed. But often a method's required columns depend on its *parameters* -- you don't know the exact names until the run is configured. That's what `from_params` is for: it builds required column names from this run's parameter values.

Say a method scores each cell against a list of marker genes, and it expects one score column per marker, named like `<marker>_score`. The marker list is a parameter:

```yaml
params:
  markers:
    type: list
    required: true

inputs:
  data:
    type: .csv
    columns:
      from_params:
        - params: ["markers"]
          pattern: "{}_score"
```

Each `from_params` entry names one or more `params` and a `pattern` (a Python `str.format` template; the default is `"{}"`, which just uses the value as-is). At validation time, every named param's value is collected (a list expands to all its items), the **cartesian product** across the named params is taken, and each combination is fed through the pattern.

So if a run sets `markers: ["cd3", "cd8"]`, the contract requires the input CSV to contain `cd3_score` **and** `cd8_score`. Change the param to `["cd3", "cd8", "foxp3"]` and the contract automatically requires three columns -- no edit to `method.yaml`.

You can cross multiple params for a grid. Two params, `markers: ["cd3", "cd8"]` and `stains: ["dapi"]`, with `pattern: "{}_{}"` require `cd3_dapi` and `cd8_dapi` -- one column per combination. If any named param is missing from the run, that entry is skipped rather than erroring, so optional params won't break validation.

A practical companion to `from_params` lives on the parameter side: in the canvas inspector, a string param can declare `column_of_input: <slot>` to turn its text box into a dropdown populated from the *upstream* node's declared columns, or `new_column: true` to mark a param that *names a column this method produces* (kept as free text). These are authoring conveniences for picking column names correctly; they reuse the same column vocabulary your contracts declare.

## Where contracts are enforced

Contracts are checked at three distinct moments. Knowing which check fires when is the difference between a confusing failure and an obvious one.

**1. At registration (`wfc register-method`).** When you register a method, Workflow Canvas parses its `method.yaml` and validates two things against the module: every *required module output* must appear among the method's output slots, and the method must declare *at least one input slot* (a root-like method takes its input from a system node such as the Input Selector, never from nothing). A method that omits a required module output, or declares no inputs, fails registration with a message listing what's missing. Additionally, each output slot's `type` is validated at registration — a bare un-dotted name (`csv`), stale semantic name (`anndata`, `model`, `figure`), or missing/empty value raises a `ValueError` with a guidance message explaining the dotted-extension convention (e.g. `.csv`, `.pkl`, `.h5ad`). This is the earliest, cheapest check -- it runs before the method is ever wired into a pipeline.

**2. At pipeline load (`load_pipeline`).** When a pipeline is loaded -- whether you run it from the CLI or validate the graph in the canvas -- the wiring is checked against the declared slots: every required input slot must be connected, and connections must respect declared types. On top of that, a *static* column cross-check runs: for any two connected steps, the `strict` columns the downstream input requires are compared against the `strict` columns the upstream output declares. If the upstream provably can't supply a column the downstream demands, the pipeline is rejected here -- before a single step executes. (`from_params` and `patterns` columns are deferred to runtime, since their exact names depend on parameter values and real data.)

**3. At runtime (inside the running step).** When a step actually runs, the input column hard gate fires first: the input files are read and checked for all required columns (`strict`, expanded `from_params`, and `patterns`), and a missing column raises an error before your code runs. After your script finishes, two more checks run: the module contract is verified by looking for the required outputs and metrics in `WFC_RUN_DIR` (a missing required output, or a non-zero exit, fails the step), and the output column soft check warns on any output drift.

**The mental model -- load vs. runtime.** A pipeline is rejected *at load* when the failure is knowable from the contracts alone: a missing wire, a type mismatch, or a `strict` column the topology can never supply. It's rejected *at runtime* when the failure depends on the actual data or the resolved parameter values: a real input file that's missing a column, a `from_params` column that wasn't produced, or a declared output your script never wrote. Load-time checks are about *can this pipeline possibly be valid*; runtime checks are about *did this specific run honor the contract*.

## Next steps

You can now declare contracts at both tiers, validate file contents by column, and reason about which enforcement stage will catch which mistake.

- For the complete `method.yaml` field reference -- every input, output, and param key, all column-spec options, and the `env`/`executor` fields -- see [[method-yaml-schema]].
- To author the script those contracts wrap, revisit [[authoring-a-method-script]].
- Before a contracted method can run, its environment must be built and registered; see [[registering-an-environment]].
- To wire contracted methods into a pipeline and watch type checks block bad connections live, see [[use-the-canvas]].