2 Getting Started
2.1 Quick orientation
Use this cookbook to keep the human-facing contract straight while the species prep repos do the heavy lifting.
This repo tells you:
- what you are producing,
- which intermediate artifacts must be reviewed,
- what belongs in the final CU/WSP bundle, and
- what can be treated as a derived handoff for another system.
2.2 The deliverable in one sentence
For most runs, the core analytical bundle is:
- cu_timeseries
- wsp_metric_specs
- optional wsp_cyclic_benchmarks
- recommended cu_metadata
Keep these human-readable sidecars with the bundle:
- run-log.md
- decision-log.md
- qc-summary.md
- metadata_notes.md
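A bundle listing like this is easy to check mechanically before handoff. The following is a minimal sketch in Python, not part of the cookbook or the prep repos (which are R-based). The function name `check_bundle`, the `.csv` table extension, and the single-folder layout are all assumptions made for illustration.

```python
from pathlib import Path

# Table and sidecar names come from the bundle description above.
# The ".csv" extension and the one-folder layout are assumptions.
REQUIRED_TABLES = ["cu_timeseries", "wsp_metric_specs"]
OPTIONAL_OR_RECOMMENDED_TABLES = ["wsp_cyclic_benchmarks", "cu_metadata"]
SIDECARS = ["run-log.md", "decision-log.md", "qc-summary.md", "metadata_notes.md"]


def check_bundle(bundle_dir, table_ext=".csv"):
    """Report which expected bundle files are absent from bundle_dir."""
    root = Path(bundle_dir)

    def missing(names):
        # Preserve the input order so the report is stable.
        return [name for name in names if not (root / name).is_file()]

    return {
        "missing_required": missing([t + table_ext for t in REQUIRED_TABLES]),
        "missing_optional_or_recommended": missing(
            [t + table_ext for t in OPTIONAL_OR_RECOMMENDED_TABLES]
        ),
        "missing_sidecars": missing(SIDECARS),
    }
```

Calling `check_bundle("path/to/bundle")` returns three lists. An empty `missing_required` and an empty `missing_sidecars` would be a reasonable gate before release, while the optional and recommended tables are reported separately so they can be judged case by case.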
2.3 One bundle, two layers
2.4 Keep these data layers separate
| Layer | Typical files | Why it exists |
|---|---|---|
| Source inputs | annual extracts, lookups, static crosswalks | raw material for the run |
| Selection + processing intermediates | matching checks, decomposition tables, CU-specific prep tables | prove the method behaved as expected |
| Release tables | cu_timeseries, optional pop_timeseries | main analytical outputs |
| Status bundle tables | wsp_metric_specs, optional wsp_cyclic_benchmarks, optional published_integrated_statuses | supports WSP metric and rapid-status workflows |
| Exchange / consumer bridges | Salmon Data Package, derived cuyear.csv, derived cu_metadata.csv | packaging and system-specific handoff |
The main trap to avoid is pretending there is one flat “selected dataset” that feeds every output. In practice, species workflows often use different data layers for different purposes.
2.5 Repository map (what each repo is for)
2.5.1 Core cookbook
- dfo-pacific-science/cu-escapement-data-cookbook: contract, templates, examples, QA
2.5.2 Species prep repos (primary execution)
- BronwynMacDonald/FRSK-WSPDataPrep: Fraser Sockeye CU + pop prep
- BronwynMacDonald/FRCo-WSPDataPrep: Interior Fraser Coho CU + pop prep
- BronwynMacDonald/FRCm-WSPDataPrep: Lower Fraser Chum prep
- BronwynMacDonald/FRPink-WSPDataPrep: Fraser Pink prep
2.6 Standard prep-repo folders
Most species repos use:
- DATA_IN/: annual inputs and staged source extracts
- DATA_LOOKUP_FILES/: static lookups and crosswalks
- DATA_PROCESSING/: intermediate processing tables
- DATA_TRACKING/: matching checks and review artifacts
- DATA_OUT/: final species outputs
- CODE/: executable scripts
Treat DATA_PROCESSING/ and DATA_TRACKING/ as part of the deliverable, not as
throwaway scratch.
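Because the intermediate folders count as deliverables, it can help to confirm the layout before packaging. This is a minimal sketch in Python under stated assumptions: the function name `check_prep_repo` is hypothetical, and treating an empty DATA_PROCESSING/ or DATA_TRACKING/ folder as worth flagging is an illustrative choice, not a rule stated by the cookbook.

```python
from pathlib import Path

# Folder names are taken from the list above.
STANDARD_FOLDERS = [
    "DATA_IN",
    "DATA_LOOKUP_FILES",
    "DATA_PROCESSING",
    "DATA_TRACKING",
    "DATA_OUT",
    "CODE",
]
# Folders the section says to treat as part of the deliverable.
REVIEW_FOLDERS = ["DATA_PROCESSING", "DATA_TRACKING"]


def check_prep_repo(repo_root):
    """Report standard folders that are absent and review folders that are empty."""
    root = Path(repo_root)
    missing = [name for name in STANDARD_FOLDERS if not (root / name).is_dir()]
    # Assumption: an existing but empty review folder suggests the
    # intermediates were not kept, so it is reported separately.
    empty = [
        name
        for name in REVIEW_FOLDERS
        if (root / name).is_dir() and not any((root / name).iterdir())
    ]
    return {"missing_folders": missing, "empty_review_folders": empty}
```

Since only most species repos follow this layout, a non-empty `missing_folders` list is a prompt to look at the repo, not necessarily an error.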
2.7 Before first run
2.7.1 Install baseline R packages
Some pipelines can run with less, but Sockeye and Coho paths often assume a broader tidyverse-style setup.
2.8 Suggested run graph
- verify source files and identifiers,
- declare the output layers you need,
- select sites/surveys for each layer,
- select records and apply approved fixes,
- process values and review intermediate QC artifacts,
- estimate cu_timeseries and any supporting outputs,
- assemble metadata + metric specs,
- optionally wrap the release in a package or consumer bridge.
2.9 Minimum outputs to plan for
Even a straightforward annual update should usually leave you with:
- one canonical cu_timeseries table,
- any needed supporting tables (pop_timeseries, historical context layer),
- a populated wsp_metric_specs file,
- intermediate QC artifacts reviewed and logged,
- an exception register,
- short run/decision/QC notes.