5 Step 3: Record Selection

5.1 Objective

Select the best usable record for each analysis key after basic normalization and before final processing.

5.2 Important distinction

A “selected record” is not always a raw source row.

Sometimes the selected record is:

  • a cleaned source row,
  • a recoded value,
  • a merged record assembled from multiple inputs, or
  • a record retained specifically for a pop or historical-context layer.

That is acceptable, provided the rule that produced the selected record is explicit and documented.
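One way to keep that rule explicit is to carry provenance on every selected record. The sketch below is illustrative only. It assumes a hypothetical RecordOrigin label for the four cases above, a rule_id naming the rule that produced the record, and a list of contributing source rows. None of these names come from an existing schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class RecordOrigin(Enum):
    # Hypothetical labels for the four kinds of selected record listed above.
    CLEANED_SOURCE = "cleaned_source"
    RECODED = "recoded"
    MERGED = "merged"
    CONTEXT_LAYER = "context_layer"


@dataclass(frozen=True)
class SelectedRecord:
    key: Tuple                         # analysis key, e.g. ("CU-01", 2015)
    value: Optional[float]             # None = missing; 0.0 = observed biological zero
    origin: RecordOrigin
    rule_id: str                       # the explicit rule that produced this record
    source_rows: Tuple[str, ...] = ()  # raw rows that fed into this record


def provenance_problems(records):
    """Return (key, reason) pairs for records whose origin is not explained."""
    problems = []
    for rec in records:
        if not rec.rule_id.strip():
            problems.append((rec.key, "no explicit selection rule"))
        if rec.origin is RecordOrigin.MERGED and len(rec.source_rows) < 2:
            problems.append((rec.key, "merged record lists fewer than two inputs"))
    return problems
```

An empty result means every selected record can be traced to a named rule.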

5.3 Do this exactly

  1. Normalize obvious naming, timing, and estimate-class issues.
  2. Define a deterministic selection hierarchy for the layer you are building.
  3. Apply the same hierarchy to every analysis key in the layer.
  4. Capture all overrides in the exception register and decision log.
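A deterministic hierarchy can be expressed as a sort key applied level by level. The following is a minimal sketch, assuming records are dicts with hypothetical estimate_class, source and revision fields; the class names and their ranks are placeholders, not a recommended ordering. Keys that still tie after every level are returned separately instead of being resolved arbitrarily.

```python
from collections import defaultdict

# Hypothetical hierarchy for one layer. Lower rank wins; levels apply in order.
ESTIMATE_CLASS_RANK = {"census": 0, "mark-recapture": 1, "expanded": 2, "index": 3}
SOURCE_RANK = {"primary": 0, "secondary": 1}
UNRANKED = 99  # anything not named in the hierarchy sorts last


def hierarchy_key(rec):
    """Sort key encoding the selection hierarchy."""
    return (
        ESTIMATE_CLASS_RANK.get(rec["estimate_class"], UNRANKED),
        SOURCE_RANK.get(rec["source"], UNRANKED),
        -rec["revision"],  # newer revision wins
    )


def select_records(records, key_fields):
    """Return (selected, excluded, ties) after applying the hierarchy per key."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[f] for f in key_fields)].append(rec)

    selected, excluded, ties = [], [], []
    for key in sorted(groups):  # fixed iteration order keeps output deterministic
        ranked = sorted(groups[key], key=hierarchy_key)
        best = ranked[0]
        if len(ranked) > 1 and hierarchy_key(ranked[1]) == hierarchy_key(best):
            ties.append(key)  # still tied after the full hierarchy: escalate
            continue
        selected.append(best)
        excluded.extend(ranked[1:])
    return selected, excluded, ties
```

Because the whole hierarchy lives in one sort key, any change to it is a single visible edit that can be recorded in the decision log.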

5.5 Mandatory checks

  • no duplicate selected rows by analysis key,
  • biological zero vs missing is handled consistently,
  • year assignment rule is explicit,
  • output-layer intent is preserved.
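The four mandatory checks can run as one validation pass over the selected rows. This sketch assumes hypothetical value, year_rule and layer columns and a hypothetical set of raw missing-value tokens; it only reports problems and leaves the fixes to the analyst.

```python
from collections import Counter

# Hypothetical raw tokens that should never survive into selected values.
MISSING_TOKENS = {"", "NA", "N/A", "-"}


def check_selected(rows, key_fields, layer):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []

    # 1. No duplicate selected rows by analysis key.
    counts = Counter(tuple(r[f] for f in key_fields) for r in rows)
    problems += [f"duplicate key {k}" for k, n in counts.items() if n > 1]

    for r in rows:
        k = tuple(r[f] for f in key_fields)
        v = r["value"]
        # 2. Zero vs missing: flag raw missing tokens; store missing as None
        #    and a biological zero as 0.
        if isinstance(v, str) and v.strip() in MISSING_TOKENS:
            problems.append(f"{k}: raw missing token {v!r}; use None or 0")
        # 3. The year assignment rule must be recorded.
        if not r.get("year_rule"):
            problems.append(f"{k}: no year assignment rule")
        # 4. Output-layer intent: the row must belong to the layer being built.
        if r.get("layer") != layer:
            problems.append(f"{k}: tagged for layer {r.get('layer')!r}, expected {layer!r}")
    return problems
```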

5.6 Hidden patches should be promoted into explicit exceptions

If you rename a system, override a timing group, suppress a year, or force a value, put it in the exception register. Do not leave it buried in a script with no run-level note.
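Logging an override can be a single function call made at the point where the script applies it, so no patch stays hidden. The sketch below assumes a hypothetical column layout for exception-register.csv and restricts actions to the four override types named above; adjust both to the register actually in use.

```python
import csv
import os
from datetime import date

# Hypothetical column layout for exception-register.csv.
FIELDS = ["run_id", "date", "analysis_key", "action", "old_value", "new_value", "rationale"]
# The override types named above; extend as needed.
ALLOWED_ACTIONS = {"rename_system", "override_timing_group", "suppress_year", "force_value"}


def log_exception(path, run_id, analysis_key, action, old_value, new_value, rationale):
    """Append one override to the exception register, writing the header if needed."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action {action!r}")
    if not rationale.strip():
        raise ValueError("every override needs a rationale")
    new_file = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({
            "run_id": run_id,
            "date": date.today().isoformat(),
            "analysis_key": "|".join(str(p) for p in analysis_key),
            "action": action,
            "old_value": old_value,
            "new_value": new_value,
            "rationale": rationale,
        })
```

Refusing a blank rationale makes it harder to record an override without a reason.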

5.7 Typical analysis keys

Use the key that matches the layer you are building. For example:

  • CU_ID + Year for CU time series,
  • Pop_ID + Year or CU_ID + Pop_Name + Year for pop outputs,
  • a species-specific stream/timing key for matching or pre-processing steps.
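Where a layer allows more than one key (as the pop outputs do), it helps to choose one key for the whole layer rather than per record, so the duplicate check compares like with like. A minimal sketch: CU_ID, Pop_ID, Pop_Name and Year follow the text, while the layer names and the Species/Stream/Timing fields are hypothetical placeholders.

```python
# Candidate key fields per layer, in order of preference.
# Layer names and the stream/timing fields are hypothetical.
LAYER_KEYS = {
    "cu_timeseries": [("CU_ID", "Year")],
    "pop": [("Pop_ID", "Year"), ("CU_ID", "Pop_Name", "Year")],
    "stream_timing": [("Species", "Stream", "Timing")],
}


def choose_key_fields(records, layer):
    """Pick the first candidate key that is complete for every record in the layer."""
    for fields in LAYER_KEYS[layer]:
        if all(r.get(f) not in (None, "") for r in records for f in fields):
            return fields
    raise ValueError(f"no candidate key is complete for every record in {layer!r}")


def build_key(record, fields):
    """Build the analysis key tuple for one record."""
    return tuple(record[f] for f in fields)
```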

5.8 Required outputs from Step 3

  • selected_records.csv
  • selection_exclusions.csv
  • selection_decisions.md
  • updated exception-register.csv
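The first three outputs can be written together at the end of the step, with the exception register updated as overrides happen. This is a sketch under assumptions: rows are dicts, each excluded row carries a hypothetical exclusion_reason column, and decisions arrive as plain strings that become a bullet list in selection_decisions.md.

```python
import csv
from pathlib import Path


def write_csv(path, rows):
    """Write dict rows to CSV using the union of their keys as columns."""
    rows = list(rows)
    fieldnames = sorted({k for r in rows for k in r})
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def write_step3_outputs(outdir, selected, exclusions, decisions):
    """Write selected_records.csv, selection_exclusions.csv and selection_decisions.md."""
    # Validate before writing anything, so a bad run leaves no partial outputs.
    if any(not e.get("exclusion_reason") for e in exclusions):
        raise ValueError("every excluded row needs an exclusion_reason")
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    write_csv(out / "selected_records.csv", selected)
    write_csv(out / "selection_exclusions.csv", exclusions)
    lines = ["# Step 3 selection decisions", ""] + [f"- {d}" for d in decisions]
    (out / "selection_decisions.md").write_text("\n".join(lines) + "\n", encoding="utf-8")
```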

5.9 Escalate when

  • multiple records still tie after the full hierarchy,
  • estimate-class changes cause large retrospective shifts,
  • zero/missing ambiguity cannot be resolved from source documentation, or
  • a manual fix would materially change interpretation without a clear rationale.
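Of these triggers, the retrospective-shift one is the easiest to detect mechanically: compare the previous run's selected values with the current run's for keys whose estimate class changed. The sketch below assumes both runs are available as dicts mapping analysis key to (estimate_class, value), and uses a hypothetical 25% relative-change threshold; the right threshold is a judgment for the team, not something this section specifies.

```python
def retrospective_shifts(previous, current, threshold=0.25):
    """Return (key, old_class, new_class, relative_shift) for keys to escalate.

    previous/current: dict mapping analysis key -> (estimate_class, value).
    threshold: hypothetical relative-change cutoff.
    """
    flagged = []
    for key in sorted(previous.keys() & current.keys()):
        old_class, old_val = previous[key]
        new_class, new_val = current[key]
        # Only estimate-class changes are in scope; missing values are left
        # to the zero/missing review instead.
        if old_class == new_class or old_val is None or new_val is None:
            continue
        if old_val == 0:
            shift = float("inf") if new_val != 0 else 0.0
        else:
            shift = abs(new_val - old_val) / abs(old_val)
        if shift > threshold:
            flagged.append((key, old_class, new_class, round(shift, 3)))
    return flagged
```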