8 Step 6: Metadata and Metric Specifications

8.1 Objective

Produce the metadata and metric-spec layer that makes the CU series understandable, reviewable, and reusable.

8.2 Key rule: keep one source of truth

Use one canonical worksheet or flat-file source for metadata authoring, then derive the export files from it. Do not hand-edit multiple near-duplicate files and hope they stay aligned.

8.3 Core files in this layer

File	Purpose
`cu_metadata.csv`	CU context, method notes, provenance, interpretation
`wsp_metric_specs.csv`	metric settings, benchmark fields, CU-level status config
`codes.csv`	controlled vocabularies and permitted values
`metadata_notes.md`	unresolved items, ownership, caveats
`metadata_field_definitions.csv`	recommended field dictionary for maintainers
`published_integrated_statuses.csv`	optional comparison input

8.4 What `cu_metadata.csv` should carry

At minimum, keep fields that answer these questions:

What CU is this?
What data sources and method produced the series?
What interventions or caveats affect interpretation?
Who reviewed it, when, and with which versioned inputs?

Useful field groups include:

CU identity and aliases,
source systems and year coverage,
estimation mode,
benchmark-compatibility notes,
intervention and enhancement notes,
review status and provenance.

8.5 What `wsp_metric_specs.csv` should carry

This file is the machine-facing spec for downstream metric/status tools.

At minimum, make the metric choices explicit for each CU:

identity/context fields (CU_ID, CU_Name, Group),
abundance / trend metric selection,
generation and smoothing settings,
cyclic benchmark flags,
lower and upper benchmarks,
notes on disabled or non-applicable metrics.

If a metric is enabled, the supporting benchmark fields should be present and interpretable.

8.6 Controlled vocabularies belong in `codes.csv`

Do not bury permitted values in prose. Record them.

A minimal code-list structure is:

field_name
code
label
definition
status (current, proposed, deprecated)
notes

This is especially helpful for metric types, verification states, intervention flags, and any controlled status labels.

8.7 Spreadsheet / Google Sheet normalization rules

If metadata is maintained in a sheet, document the export rules explicitly:

retain a raw export,
exclude README or instruction tabs from normalized exports,
use one approved delimiter for repeatable values,
validate that multi-value fields do not mix delimiters,
flag placeholder text such as TBD or TEXT before release.

The point is not bureaucracy. It is avoiding silent cell-format nonsense.

8.8 Minimum validation script

m <- read.csv("cu_metadata.csv")
s <- read.csv("wsp_metric_specs.csv")

stopifnot(!any(names(m) == ""))
stopifnot(!any(names(s) == ""))
stopifnot(all(c("CU_ID", "CU_Name", "Species") %in% names(m)))
stopifnot(all(c("CU_ID", "CU_Name") %in% names(s)))
stopifnot(!any(is.na(m$CU_ID) | m$CU_ID == ""))
stopifnot(!any(duplicated(m["CU_ID"])))
stopifnot(setequal(unique(m$CU_ID), unique(s$CU_ID)))

Add stricter field checks for your release, but at minimum catch blank columns, missing CU IDs, and CU/spec mismatches between files.

8.9 Metadata QC bundle

Before sign-off, produce a short review pack that covers:

completeness summary,
unresolved placeholder list,
verification-pending list,
stale-status or stale-method flags,
any unexpected new code values.

8.10 Required outputs from Step 6

completed cu_metadata.csv
completed wsp_metric_specs.csv
codes.csv
metadata_notes.md
optional field-definition or summary outputs used by reviewers