4 Step 2: Site and Survey Selection

4.1 Objective

Build an explicit site/survey selection specification for each output layer you plan to produce.

4.2 Most workflows do not have one final selected dataset

Keep this distinction explicit: sites are selected per output layer, not once for the whole workflow.

In live species pipelines, different outputs may use different site sets. For example, one layer may support CU trend estimation while another supports population (pop) representation or historical context.

4.3 Start by naming the output layers

At minimum, decide which of these you need:

Output layer	Typical purpose	Typical selection basis
`cu_trend`	trend metrics and status interpretation	indicator/WSP systems or species-specific trend stream set
`cu_abundance`	abundance, benchmarks, or benchmark-compatible roll-up	abundance/SR stream set or equivalent
`pop_representation`	pop-level context or reporting	broader mapped stream set
`historical_context`	continuity or legacy context	historical source layer with explicit caveats

Not every run needs all four. The point is to declare the layer first, then select sites for that layer.
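As a minimal sketch of "declare the layer first", the snippet below builds the rows for `site_selection_layers.csv` from a list of requested layer names. The layer names and purposes are taken from the table above; the column names `output_layer` and `purpose`, and the helper names `declare_layers` and `layers_to_csv`, are assumptions made for illustration, not an established schema.

```python
import csv
import io

# Layer names and purposes copied from the table above.
KNOWN_LAYERS = {
    "cu_trend": "trend metrics and status interpretation",
    "cu_abundance": "abundance, benchmarks, or benchmark-compatible roll-up",
    "pop_representation": "pop-level context or reporting",
    "historical_context": "continuity or legacy context",
}


def declare_layers(requested):
    """Return one row per requested layer, rejecting undeclared names."""
    unknown = [name for name in requested if name not in KNOWN_LAYERS]
    if unknown:
        raise ValueError(f"Unknown output layer(s): {unknown}")
    return [{"output_layer": n, "purpose": KNOWN_LAYERS[n]} for n in requested]


def layers_to_csv(rows):
    """Serialize layer rows as CSV text (hypothetical two-column layout)."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["output_layer", "purpose"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Rejecting unknown names at this point means a typo in a layer name fails at declaration rather than surfacing later as an empty site set.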

4.4 Required artifact set

`site_selection_layers.csv`: one row per output layer
`site_selection_spec.csv`: row-level include/exclude decisions
`site_selection_excluded_projects.csv`: reviewed-but-excluded systems
`site_selection_summary.md`: plain-language rationale

If you generate narrative text from the spec table, keep that output with the bundle as supporting metadata.

4.5 Minimum columns for `site_selection_spec.csv`

output_layer
site_or_project_id
site_or_project_name
CU_ID or target grouping field
classification
include_flag
enhancement_handling
rationale
source_reference
review_date
reviewer

Add species-specific fields when needed, but keep the core columns stable.
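One way to keep the core columns stable is to check the header of `site_selection_spec.csv` before using it. The sketch below assumes the file is read as CSV text with a header row; the function name `missing_core_columns` and the `grouping_field` parameter (for runs that use a target grouping field instead of `CU_ID`) are hypothetical.

```python
import csv
import io

# Core columns as listed above; CU_ID may be swapped for another grouping field.
CORE_COLUMNS = [
    "output_layer", "site_or_project_id", "site_or_project_name", "CU_ID",
    "classification", "include_flag", "enhancement_handling", "rationale",
    "source_reference", "review_date", "reviewer",
]


def missing_core_columns(csv_text, grouping_field="CU_ID"):
    """Return the core columns absent from the header, in the listed order."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    required = [grouping_field if c == "CU_ID" else c for c in CORE_COLUMNS]
    return [c for c in required if c not in header]
```

Extra species-specific columns pass this check untouched; only a missing core column is reported.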

4.6 Do this exactly

Start from verified lookup and crosswalk tables.
Declare the output layers required for this run.
Apply inclusion/exclusion rules separately for each layer.
Record the reason for every exclusion and special case.
Export both machine-readable and plain-language summaries.
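The steps above can be sketched as a loop that evaluates every site against every declared layer and records a rationale for each decision. The two rules shown (`trend_rule`, `representation_rule`) and the classification value `"indicator"` are placeholders invented for this example; real rules depend on the species and the verified lookup tables.

```python
# Hypothetical per-layer rules: each returns (include_flag, rationale).
def trend_rule(site):
    if site["classification"] == "indicator":
        return True, "indicator system"
    return False, "not an indicator system"


def representation_rule(site):
    if site["CU_ID"]:
        return True, "mapped to a CU"
    return False, "no CU mapping in crosswalk"


LAYER_RULES = {
    "cu_trend": trend_rule,
    "pop_representation": representation_rule,
}


def build_spec(sites, layers):
    """Apply each layer's rule to every site; emit one row per (layer, site)."""
    rows = []
    for layer in layers:
        rule = LAYER_RULES[layer]
        for site in sites:
            include, rationale = rule(site)
            rows.append({
                "output_layer": layer,
                "site_or_project_id": site["site_or_project_id"],
                "CU_ID": site["CU_ID"],
                "include_flag": include,
                "rationale": rationale,
            })
    return rows
```

Because every (layer, site) pair gets a row, exclusions are recorded with a reason instead of disappearing from the table.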

4.7 Minimum rule set to document

quality threshold used,
enhancement handling rule,
aggregation / double-counting rule,
start-year or historical-context rule,
special-case handling rule.
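A small record type can make it harder to leave one of these rules undocumented. This is only a sketch: the class name `SelectionRules`, its field names, and the helper `undocumented_rules` are assumptions, and each field holds free text describing the rule rather than an executable rule.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class SelectionRules:
    """Free-text description of each rule in the minimum rule set."""
    quality_threshold: str
    enhancement_handling: str
    aggregation_rule: str
    start_year_rule: str
    special_case_rule: str


def undocumented_rules(rules):
    """Return the names of rules whose text is empty or whitespace only."""
    return [f.name for f in fields(rules) if not getattr(rules, f.name).strip()]
```

A run could call `undocumented_rules` before export and refuse to continue while the returned list is non-empty.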

4.8 Practical implementation notes from current repos

Sockeye: keep decoder, timing labels, and stream lists synchronized; the trend layer and abundance layer are not always the same thing.
Coho: document whether the run uses WSP-only systems for CU outputs and a broader all-stream layer for pop outputs.
Chum: selection is simpler, but aggregation assumptions still need to be declared explicitly.
Pink: keep the official CU series separate from the historical context layer.

4.9 Generated narrative text is useful, but not a substitute for the table

The Data-Standards repo includes patterns for generating plain-language site selection text. Use that if helpful, but keep the machine-readable spec table as the primary source of truth.
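To illustrate the direction of dependency (table first, text derived from it), here is a minimal sketch that renders a short summary for one layer from spec rows. It is not the Data-Standards pattern itself; the function name `summarize_layer` and the output wording are invented for this example, and rows are assumed to be dictionaries with a boolean `include_flag`.

```python
def summarize_layer(spec_rows, layer):
    """Build a short plain-language summary for one output layer."""
    rows = [r for r in spec_rows if r["output_layer"] == layer]
    included = [r for r in rows if r["include_flag"]]
    excluded = [r for r in rows if not r["include_flag"]]
    lines = [f"Layer {layer}: {len(included)} included, {len(excluded)} excluded."]
    # List each exclusion with the rationale recorded in the spec table.
    for r in excluded:
        lines.append(f"- Excluded {r['site_or_project_id']}: {r['rationale']}")
    return "\n".join(lines)
```

Since the text is regenerated from the table, any correction is made in the spec and the narrative follows, never the reverse.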

4.10 Stop conditions

Do not proceed quietly if:

a required CU ends up with zero included sites for a required layer,
site counts shift sharply from the prior release without explanation,
key systems disappear because of naming drift, or
two source datasets imply conflicting inclusion rules.
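The first two stop conditions lend themselves to automated checks. The sketch below assumes spec rows are dictionaries with `output_layer`, `CU_ID`, and a boolean `include_flag`; the function names and the 20% default `tolerance` are illustrative assumptions, since the text does not define what counts as a sharp shift. Naming drift and conflicting source rules still need review against the lookup tables.

```python
from collections import Counter


def zero_site_groups(spec_rows, required):
    """Return required (layer, CU_ID) pairs that have no included site."""
    included = {
        (r["output_layer"], r["CU_ID"]) for r in spec_rows if r["include_flag"]
    }
    return sorted(pair for pair in required if pair not in included)


def sharp_count_shifts(spec_rows, prior_counts, tolerance=0.2):
    """Return layers whose included-site count changed by more than
    `tolerance` (as a fraction of the prior count) since the prior release."""
    current = Counter(r["output_layer"] for r in spec_rows if r["include_flag"])
    flagged = {}
    for layer, prior in prior_counts.items():
        now = current.get(layer, 0)
        if prior == 0:
            changed = now > 0
        else:
            changed = abs(now - prior) / prior > tolerance
        if changed:
            flagged[layer] = (prior, now)
    return flagged
```

A non-empty result from either function would be a reason to halt the run and record an explanation before continuing.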