5 Step 3: Record Selection

5.1 Objective

Select the best usable record for each analysis key after basic normalization and before final processing.

5.2 Important distinction

A “selected record” is not always a raw source row.

Sometimes the selected record is:

a cleaned source row,
a recoded value,
a merged record assembled from multiple inputs, or
a record retained specifically for a population (pop) or historical-context layer.

This is acceptable as long as the rule that produced the record is stated explicitly.

5.3 Do this exactly

Normalize obvious naming, timing, and estimate-class issues.
Define a deterministic selection hierarchy for the layer you are building.
Apply the hierarchy consistently.
Capture all overrides in the exception register and decision log.

5.4 Recommended hierarchy template

preferred estimate class or estimate type,
preferred survey method or platform,
preferred verified source,
approved manual override,
escalate if a true tie still remains.
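
The template above can be sketched as a sort key, where each hierarchy level becomes one position in a tuple and the smallest tuple wins. This is a minimal illustration, not a prescribed implementation: the field names (record_id, estimate_class, method, verified), the class and method labels, and the preference orders are all hypothetical placeholders to be replaced with the values for the layer you are building.

```python
from collections import defaultdict

# Hypothetical preference orders; earlier in the list = more preferred.
ESTIMATE_CLASS_ORDER = ["class_1", "class_2", "class_3"]
METHOD_ORDER = ["fence_count", "aerial_survey", "foot_survey"]


def rank(value, order):
    """Position of value in a preference list; unknown values sort last."""
    return order.index(value) if value in order else len(order)


def sort_key(rec, overrides):
    """Encode the hierarchy as a tuple; smaller tuples are preferred."""
    return (
        rank(rec["estimate_class"], ESTIMATE_CLASS_ORDER),  # estimate class
        rank(rec["method"], METHOD_ORDER),                  # survey method
        0 if rec["verified"] else 1,                        # verified source
        0 if rec["record_id"] in overrides else 1,          # approved override
    )


def select_records(records, key_fields, overrides=frozenset()):
    """Return (selected, ties): one record per key, plus keys to escalate."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[f] for f in key_fields)].append(rec)

    selected, ties = {}, {}
    for key, candidates in groups.items():
        best = min(sort_key(r, overrides) for r in candidates)
        winners = [r for r in candidates if sort_key(r, overrides) == best]
        if len(winners) == 1:
            selected[key] = winners[0]
        else:
            ties[key] = winners  # true tie: escalate, never pick arbitrarily
    return selected, ties
```

Because the outcome depends only on the record fields and the declared orders, rerunning the selection on the same inputs gives the same result. Keys returned in ties are the ones to escalate rather than resolve silently.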

5.5 Mandatory checks

no duplicate selected rows by analysis key,
biological zeros (true zero counts) and missing values are kept distinct and handled consistently,
year assignment rule is explicit,
output-layer intent is preserved.
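
Two of the checks above lend themselves to a small automated test. The sketch below assumes records are plain dictionaries and that missing values arrive as empty strings, None, or the tokens NA or NULL; those tokens are an assumption and should be matched to what the source files actually contain.

```python
from collections import Counter

# Assumed spellings of "missing" in the source data; adjust as needed.
MISSING_TOKENS = {"", "NA", "NULL"}


def find_duplicate_keys(rows, key_fields):
    """Return the analysis keys that appear on more than one selected row."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def parse_abundance(raw):
    """Keep zero and missing distinct: missing -> None, '0' -> 0.0."""
    if raw is None:
        return None
    text = str(raw).strip()
    if text.upper() in MISSING_TOKENS:
        return None
    return float(text)
```

An empty list from find_duplicate_keys means the first check passes. Returning None for missing values, rather than 0, keeps a true zero count from being confused with an unsurveyed year further down the pipeline.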

5.6 Hidden patches should be promoted into explicit exceptions

If you rename a system, override a timing group, suppress a year, or force a value, put it in the exception register. Do not leave it buried in a script with no run-level note.
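
One way to keep patches out of scripts is to read them from the exception register and apply them in a single place, logging each one as it is used. The sketch below is illustrative only: the register columns (exception_id, record_id, action, field, new_value, reason) and the two actions (set_field, suppress) are hypothetical and not a required schema.

```python
import csv
import io


def load_exceptions(csv_text):
    """Parse exception-register rows from CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def apply_exceptions(records, exceptions):
    """Return (patched_records, applied_log); input records are not mutated."""
    patched, log = [], []
    for rec in records:
        rec = dict(rec)  # work on a copy
        suppressed = False
        for exc in exceptions:
            if exc["record_id"] != rec["record_id"]:
                continue
            if exc["action"] == "suppress":
                suppressed = True
            elif exc["action"] == "set_field":
                rec[exc["field"]] = exc["new_value"]
            else:
                raise ValueError(f"unknown action: {exc['action']}")
            # Every applied exception leaves a trace for the run-level note.
            log.append((exc["exception_id"], rec["record_id"],
                        exc["action"], exc["reason"]))
        if not suppressed:
            patched.append(rec)
    return patched, log
```

The returned log can be written into the run's decision notes, so every rename, suppression, or forced value is traceable to a register entry with a stated reason.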

5.7 Typical analysis keys

Use the key that matches the layer you are building. For example:

CU_ID + Year for CU time series,
Pop_ID + Year or CU_ID + Pop_Name + Year for pop outputs,
a species-specific stream/timing key for matching or pre-processing steps.
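
The mapping from layer to key can be declared once and reused, so the same key definition drives selection and the duplicate check. In the sketch below the field names come from the examples above, while the layer labels (cu_time_series, pop_by_id, pop_by_name) are hypothetical names chosen for illustration.

```python
# Key fields per output layer; layer labels are illustrative placeholders.
KEY_FIELDS = {
    "cu_time_series": ("CU_ID", "Year"),
    "pop_by_id": ("Pop_ID", "Year"),
    "pop_by_name": ("CU_ID", "Pop_Name", "Year"),
}


def analysis_key(record, layer):
    """Build the analysis key for a record, failing loudly on blank fields."""
    fields = KEY_FIELDS[layer]
    missing = [f for f in fields if record.get(f) in (None, "")]
    if missing:
        raise ValueError(f"{layer}: missing key field(s) {missing}")
    return tuple(record[f] for f in fields)
```

Raising on a blank key field, instead of building a partial key, prevents unrelated records from being grouped together under the same incomplete key.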

5.8 Required outputs from Step 3

selected_records.csv
selection_exclusions.csv
selection_decisions.md
updated exception-register.csv

5.9 Escalate when

multiple records still tie after the full hierarchy,
estimate-class changes cause large retrospective shifts,
zero/missing ambiguity cannot be resolved from source documentation, or
a manual fix would materially change interpretation without a clear rationale.