Registering Modules, Methods, and Samples
Registering a Module
A module is a logical grouping of related methods that share a common contract (the required outputs and metrics every method in the module must provide).
Command
wfc register-module --name my_analysis \
[--module-dir modules/my_analysis] \
[--description "My analysis module"] \
[--contracts '[{"type": "output", "name": "normalized", "value_type": "csv", "required": true}]']
Two ways to supply contracts
Source |
When used |
|---|---|
|
File is parsed automatically via |
|
Required when no |
What happens in the database
A Module row is upserted (matched by name).
One ModuleContract row is created for each contract entry. Module contracts declare the module’s required outputs and metrics (each row has a
typeofoutputormetric, aname, an optionalvalue_type, and arequiredflag). At method registration, the framework checks that each method’s declared outputs include every output the module marksrequired— module contracts are an output/metric guarantee, not an input-slot spec.
Modules are idempotent to re-register – calling the command again with the same name updates the existing row.
Registering a Method
A method is a single analysis step – a public Python function that lives in a directory under a module. (The function may optionally use the @wfc.method decorator from wfc-client, but plain public functions are discovered and tracked all the same.)
Command
wfc register-method modules/my_analysis/preprocess --module my_analysis
What the command does (step by step)
Locate and scan script – finds
{method_name}.pyin the given directory and AST-parses it to extract function signatures.Resolve module – looks up the parent module in the database (raises
ValueErrorif not found).Upsert Method row – creates or updates the method record, storing the script path and resolved environment name (from
method.yaml; a method must name a built container env viaenv:).Sync tracked functions and parameters – clears existing
TrackedFunctionandParamDefrows for this method, then inserts fresh rows from the AST scan results. Each tracked function gets an ordinal and its parameters are stored with name, type annotation, and default value.Store method contract – records
input_slots,output_slots,params_schema, andexecutorfrommethod.yaml. At least one input slot must be declared (raisesValueErrorotherwise).Validate against module contract – checks that the method’s declared outputs satisfy the parent module’s required outputs.
Git commit – commits the method directory to version control so the code version is captured in the cache key.
Copy source snapshot – copies source files to
methods/{method_name}/so the code fingerprint is always computed from the registered copy.
save_artifact static validation (Tier-1 only)
When a method uses the @wfc.method decorator, registration additionally AST-scans the decorated function body for ctx.save_artifact("<name>", ...) calls and cross-checks the literal output names against method.yaml outputs:. A required output with no matching save_artifact call, or a save_artifact name not declared in method.yaml, fails registration with a clear error – so output-name typos are caught at registration, not at run time. Dynamic (non-literal) names produce a warning rather than an error, and the scan is function-body-only (saves inside helper functions are not followed). This check applies only to decorated Tier-1 methods; a plain env-var + file method declares its outputs purely through method.yaml.
Key constraints
Every method must declare at least one input slot.
Method outputs must satisfy the parent module’s contract (required outputs must be present).
The
@wfc.methoddecorator is optional. AST discovery picks up every public (non-underscore) function in the script automatically;@wfc.method(from thewfc-clientpackage) is ergonomic Tier-1 sugar, not a requirement for a function to be discovered or tracked.
Registering a Sample
A sample is a data file (typically CSV) that serves as the root input to a pipeline.
Command
wfc register-sample --name CFPAC_ERKi --source /data/raw/cfpac_erki.csv
Prerequisites
DVC must be configured. Your .wfc/wf-canvas.toml must include a [dvc] section with a url field (e.g. url = "file:///path/to/storage", s3://..., ssh://...); wfc init creates and mirrors it automatically when auto_init = true. Registration raises DvcNotConfiguredError if the section is missing, if url is unset, or if .dvc/config declares no remotes.
What the command does
Content-hash – computes an MD5 hash of the source file via
hash_path().DVC cache store – copies the file into the DVC content-addressed cache (
cache_file()).DB row – creates a
Samplerow storingfile_size,file_mtime,registration_mode, andcontent_hash.Remote push – if a DVC remote is configured, the sample’s cache object is pushed to it. In standalone CLI mode the push is synchronous (failures are logged as a warning, leaving the sample in the local cache); inside a running pipeline the push is enqueued onto the background push worker instead.
Important: data/samples/ is ephemeral
The data/samples/ directory is an ephemeral workspace, not a permanent copy destination. The DVC cache is the sole authoritative store. Before pipeline execution, wfc restore-sample materializes the file from the cache into the workspace. The registered_path on the Sample row records where the file will be restored, not where it currently lives.
Only "copy" mode is currently implemented; "link" mode raises NotImplementedError.
Restoring samples
wfc restore-sample --name CFPAC_ERKi
Looks up Sample.content_hash, calls restore_from_cache() (with integrity verification), and falls back to pull_cache() if the local cache is empty. The restore_sample Snakemake rule integrates this into the DAG for lazy, on-demand restore during pipeline execution.
Registering from the Canvas
Everything above can also be done in the browser, without the CLI. In the Canvas Registry view, click + Register and pick what you’re registering – a module, a method, or a sample.
Browse for the path. Use the Browse… button to select the directory or file from your project root instead of typing the path by hand.
Dry Run first. Click Dry Run to run the same pre-checks registration performs – without saving anything. The modal shows each check as pass, warning, or fail (contract parsing, environment presence, the AST scan), so you can see exactly what will be validated before you commit.
Register. When the pre-checks look right, click Register to commit. If the server reports an error or a failing pre-check, it appears in a banner at the top of the modal and nothing is persisted.
This registers exactly what the CLI commands above do – use whichever fits your workflow.
Next Steps
[[run-and-inspect-results]] – learn how to browse runs, trace lineage, and view outputs after pipeline execution.
[[canvas]] – use the visual Pipeline Builder and Run History interface.