4 Step 2: Site and Survey Selection
4.1 Objective
Build an explicit site/survey selection specification for each output layer you plan to produce.
4.2 Most workflows do not have one final selected dataset
This is the key point to keep straight: a workflow rarely ends in a single selected dataset.
In live species pipelines, different outputs may use different site sets. For example, one layer may support CU trend estimation while another supports population (pop) representation or historical context.
4.3 Start by naming the output layers
At minimum, decide which of these you need:
| Output layer | Typical purpose | Typical selection basis |
|---|---|---|
| cu_trend | trend metrics and status interpretation | indicator/WSP systems or species-specific trend stream set |
| cu_abundance | abundance, benchmarks, or benchmark-compatible roll-up | abundance/SR stream set or equivalent |
| pop_representation | pop-level context or reporting | broader mapped stream set |
| historical_context | continuity or legacy context | historical source layer with explicit caveats |
Not every run needs all four. The point is to declare the layer first, then select sites for that layer.
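A minimal sketch of declaring layers before selecting sites, using only the Python standard library. The layer names, purposes, and selection bases come from the table above; the column names in `LAYER_COLUMNS` and the helper names `declare_layers` and `layers_to_csv` are assumptions for illustration, not an established schema.

```python
import csv
import io

# Layer names, purposes, and selection bases are taken from the table above.
KNOWN_LAYERS = {
    "cu_trend": (
        "trend metrics and status interpretation",
        "indicator/WSP systems or species-specific trend stream set",
    ),
    "cu_abundance": (
        "abundance, benchmarks, or benchmark-compatible roll-up",
        "abundance/SR stream set or equivalent",
    ),
    "pop_representation": (
        "pop-level context or reporting",
        "broader mapped stream set",
    ),
    "historical_context": (
        "continuity or legacy context",
        "historical source layer with explicit caveats",
    ),
}

# Hypothetical column names for site_selection_layers.csv.
LAYER_COLUMNS = ["output_layer", "purpose", "selection_basis"]


def declare_layers(requested):
    """Return one row per requested layer; reject names not in the table."""
    unknown = [name for name in requested if name not in KNOWN_LAYERS]
    if unknown:
        raise ValueError(f"Unknown output layer(s): {unknown}")
    return [
        {
            "output_layer": name,
            "purpose": KNOWN_LAYERS[name][0],
            "selection_basis": KNOWN_LAYERS[name][1],
        }
        for name in requested
    ]


def layers_to_csv(rows):
    """Serialize the declared layers as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=LAYER_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Example: a run that needs only two of the four layers.
layers = declare_layers(["cu_trend", "pop_representation"])
layers_csv = layers_to_csv(layers)
```

Rejecting undeclared names early keeps a typo in a layer name from silently producing a fifth, undocumented layer.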
4.4 Required artifact set
- site_selection_layers.csv: one row per output layer
- site_selection_spec.csv: row-level include/exclude decisions
- site_selection_excluded_projects.csv: reviewed-but-excluded systems
- site_selection_summary.md: plain-language rationale
If you generate narrative text from the spec table, keep that output with the bundle as supporting metadata.
4.5 Minimum columns for site_selection_spec.csv
- output_layer
- site_or_project_id
- site_or_project_name
- CU_ID or target grouping field
- classification
- include_flag
- enhancement_handling
- rationale
- source_reference
- review_date
- reviewer
Add species-specific fields when needed, but keep the core columns stable.
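One way to keep the core columns stable is to check the header of site_selection_spec.csv before using it. The sketch below assumes the column names listed above appear verbatim in the header; the helper name `missing_core_columns`, the `grouping_field` parameter, and the extra `timing_label` column in the example are hypothetical.

```python
import csv
import io

# Core columns from the list above; CU_ID may be swapped for another
# target grouping field via the grouping_field argument.
CORE_COLUMNS = [
    "output_layer",
    "site_or_project_id",
    "site_or_project_name",
    "CU_ID",
    "classification",
    "include_flag",
    "enhancement_handling",
    "rationale",
    "source_reference",
    "review_date",
    "reviewer",
]


def missing_core_columns(csv_text, grouping_field="CU_ID"):
    """Return the core columns absent from the CSV header, in list order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    required = [grouping_field if col == "CU_ID" else col for col in CORE_COLUMNS]
    return [col for col in required if col not in header]


# Example header with one species-specific extra column (timing_label,
# hypothetical) and without the reviewer column.
example_header = (
    "output_layer,site_or_project_id,site_or_project_name,CU_ID,"
    "classification,include_flag,enhancement_handling,rationale,"
    "source_reference,review_date,timing_label\n"
)
missing = missing_core_columns(example_header)
```

Extra columns pass the check untouched, so species-specific fields can be added without weakening the core contract.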
4.6 Do this exactly
- Start from verified lookup and crosswalk tables.
- Declare the output layers required for this run.
- Apply inclusion/exclusion rules separately for each layer.
- Record the reason for every exclusion and special case.
- Export both machine-readable and plain-language summaries.
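The steps above can be sketched as follows. This is an illustration under stated assumptions, not the rule logic of any existing pipeline: the site records, the classification values, and both rule functions are hypothetical, and the sites are assumed to come from lookup and crosswalk tables that have already been verified.

```python
# Hypothetical site records; IDs, names, and classification values are
# illustrative only and do not come from any real lookup table.
sites = [
    {"site_or_project_id": "S001", "site_or_project_name": "Example Creek",
     "classification": "indicator"},
    {"site_or_project_id": "S002", "site_or_project_name": "Sample River",
     "classification": "non_indicator"},
]


def cu_trend_rule(site):
    """Assumed rule: only indicator systems enter the trend layer."""
    if site["classification"] == "indicator":
        return True, "indicator system"
    return False, "not an indicator system"


def pop_representation_rule(site):
    """Assumed rule: every mapped stream enters the broader layer."""
    return True, "mapped stream in broader set"


# One rule per declared layer, applied separately.
LAYER_RULES = {
    "cu_trend": cu_trend_rule,
    "pop_representation": pop_representation_rule,
}


def build_spec(sites, layer_rules):
    """Apply each layer's rule to every site and record the rationale."""
    spec = []
    for layer, rule in layer_rules.items():
        for site in sites:
            include, rationale = rule(site)
            if not rationale:
                raise ValueError(
                    f"No rationale for {site['site_or_project_id']} in {layer}"
                )
            spec.append({
                "output_layer": layer,
                "site_or_project_id": site["site_or_project_id"],
                "include_flag": include,
                "rationale": rationale,
            })
    return spec


def summarize(spec):
    """Build a plain-language summary: counts per layer plus exclusions."""
    lines = []
    for layer in sorted({row["output_layer"] for row in spec}):
        rows = [row for row in spec if row["output_layer"] == layer]
        kept = [row for row in rows if row["include_flag"]]
        lines.append(f"{layer}: {len(kept)} of {len(rows)} sites included")
        for row in rows:
            if not row["include_flag"]:
                lines.append(
                    f"  excluded {row['site_or_project_id']}: {row['rationale']}"
                )
    return "\n".join(lines)


spec = build_spec(sites, LAYER_RULES)   # machine-readable rows
summary = summarize(spec)               # plain-language text
```

Because every rule returns a rationale alongside the decision, the same rows can feed both the spec table and the narrative summary without a second pass over the source data.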
4.7 Minimum rule set to document
- quality threshold used,
- enhancement handling rule,
- aggregation / double-counting rule,
- start-year or historical-context rule,
- special-case handling rule.
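A short completeness check can flag a rule set that leaves one of these items undocumented. The key names in `REQUIRED_RULES` are hypothetical labels for the five rules listed above, and the example values are placeholders rather than recommended settings.

```python
# Hypothetical keys, one per rule in the list above.
REQUIRED_RULES = [
    "quality_threshold",
    "enhancement_handling",
    "aggregation_double_counting",
    "start_year_or_historical_context",
    "special_case_handling",
]


def undocumented_rules(rule_set):
    """Return the required rules that are missing or left blank."""
    gaps = []
    for name in REQUIRED_RULES:
        value = rule_set.get(name)
        if value is None or not str(value).strip():
            gaps.append(name)
    return gaps


# Placeholder rule set with one blank entry and one missing entry.
example_rules = {
    "quality_threshold": "placeholder: describe the threshold used",
    "enhancement_handling": "placeholder: describe the handling rule",
    "aggregation_double_counting": "",
    "start_year_or_historical_context": "placeholder: describe the rule",
}
gaps = undocumented_rules(example_rules)
```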
4.8 Practical implementation notes from current repos
- Sockeye: keep decoder, timing labels, and stream lists synchronized; the trend layer and abundance layer are not always the same thing.
- Coho: document whether the run uses WSP-only systems for CU outputs and a broader all-stream layer for pop outputs.
- Chum: selection is simpler, but aggregation assumptions still need to be declared explicitly.
- Pink: keep the official CU series separate from the historical context layer.