8 Step 6: Metadata and Metric Specifications
8.1 Objective
Produce the metadata and metric-spec layer that makes the CU series understandable, reviewable, and reusable.
8.2 Key rule: keep one source of truth
Use one canonical worksheet or flat-file source for metadata authoring, then derive the export files from it. Do not hand-edit multiple near-duplicate files and hope they stay aligned.
8.3 Core files in this layer
| File | Purpose |
|---|---|
cu_metadata.csv |
CU context, method notes, provenance, interpretation |
wsp_metric_specs.csv |
metric settings, benchmark fields, CU-level status config |
codes.csv |
controlled vocabularies and permitted values |
metadata_notes.md |
unresolved items, ownership, caveats |
metadata_field_definitions.csv |
recommended field dictionary for maintainers |
published_integrated_statuses.csv |
optional comparison input |
8.4 What cu_metadata.csv should carry
At minimum, keep fields that answer these questions:
- What CU is this?
- What data sources and method produced the series?
- What interventions or caveats affect interpretation?
- Who reviewed it, when, and with which versioned inputs?
Useful field groups include:
- CU identity and aliases,
- source systems and year coverage,
- estimation mode,
- benchmark-compatibility notes,
- intervention and enhancement notes,
- review status and provenance.
8.5 What wsp_metric_specs.csv should carry
This file is the machine-facing spec for downstream metric/status tools.
At minimum, make the metric choices explicit for each CU:
- identity/context fields (
CU_ID,CU_Name,Group), - abundance / trend metric selection,
- generation and smoothing settings,
- cyclic benchmark flags,
- lower and upper benchmarks,
- notes on disabled or non-applicable metrics.
If a metric is enabled, the supporting benchmark fields should be present and interpretable.
8.6 Controlled vocabularies belong in codes.csv
Do not bury permitted values in prose. Record them.
A minimal code-list structure is:
field_namecodelabeldefinitionstatus(current,proposed,deprecated)notes
This is especially helpful for metric types, verification states, intervention flags, and any controlled status labels.
8.7 Spreadsheet / Google Sheet normalization rules
If metadata is maintained in a sheet, document the export rules explicitly:
- retain a raw export,
- exclude
READMEor instruction tabs from normalized exports, - use one approved delimiter for repeatable values,
- validate that multi-value fields do not mix delimiters,
- flag placeholder text such as
TBDorTEXTbefore release.
The point is not bureaucracy. It is avoiding silent cell-format nonsense.
8.8 Minimum validation script
m <- read.csv("cu_metadata.csv")
s <- read.csv("wsp_metric_specs.csv")
stopifnot(!any(names(m) == ""))
stopifnot(!any(names(s) == ""))
stopifnot(all(c("CU_ID", "CU_Name", "Species") %in% names(m)))
stopifnot(all(c("CU_ID", "CU_Name") %in% names(s)))
stopifnot(!any(is.na(m$CU_ID) | m$CU_ID == ""))
stopifnot(!any(duplicated(m["CU_ID"])))
stopifnot(setequal(unique(m$CU_ID), unique(s$CU_ID)))Add stricter field checks for your release, but at minimum catch blank columns, missing CU IDs, and CU/spec mismatches between files.