Standardize Your Data

Active Workflow | Canonical Terms Required | FSAR + SPSR Ready

Semantic Salmon System — entry path (start here)

Use this 5-part system in order:

  1. DFO Salmon Ontology for canonical term IRIs
  2. Salmon data package for package metadata structure
  3. metasalmon for validation checks
  4. Salmon Data GPT as an optional drafting assistant
  5. SPSR for operational FSAR intake

Cookbook Guide

Step 1: Assess Your Current Data

What you need:

  • Your source dataset
  • A list of column names and meanings

How to do it:

  1. List all columns and describe each in plain language.
  2. Mark candidate fields for semantic mapping:
    • identifiers (CU/SMU/population)
    • time fields (brood year, return year, event date)
    • measurements (escapement, abundance, catch)
    • controlled codes (run type, method, status)
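
The marking step above can be given a rough first pass in a script before you review each column by hand. The sketch below is only an illustration: the keyword hints and the `suggest_category` helper are hypothetical and not part of metasalmon or any other tool in this system, and plain substring matching will misclassify some names, so treat the result as a list of candidates to confirm.

```python
# Hypothetical keyword hints per category; adjust to your own naming conventions.
CATEGORY_HINTS = {
    "identifier": ("cu", "smu", "pop"),
    "time": ("year", "date", "by"),
    "measurement": ("esc", "abund", "catch"),
    "controlled_code": ("type", "method", "status"),
}


def suggest_category(column_name):
    """Return the first category whose hint occurs in the column name, else None.

    This is crude substring matching, so every suggestion needs a human check.
    """
    name = column_name.lower()
    for category, hints in CATEGORY_HINTS.items():
        if any(hint in name for hint in hints):
            return category
    return None


# Example column names; "notes" stands in for a free-text column with no mapping.
columns = ["CU_code", "BY", "Esc", "run_type", "notes"]
inventory = {col: suggest_category(col) for col in columns}

for col, category in inventory.items():
    print(f"{col}: {category or 'no candidate category'}")
```

Columns that come back without a category are not necessarily out of scope; they simply need a plain-language description before you decide whether a canonical term applies.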

Step 2: Map to Canonical Terms

What you need:

  • Column assessment from Step 1
  • GC DFO Salmon Ontology documentation

How to do it:

  1. Search for matching canonical terms.
  2. Confirm definitions match your intended meaning.
  3. Record the full canonical IRI for each mapping.

Example mapping (illustrative; confirm exact term IRIs in the WIDOCO-generated ontology documentation):

current_column,standard_term_label,standard_term_iri
CU_code,Conservation Unit,https://w3id.org/gcdfo/salmon#ConservationUnit
BY,Brood Year,https://w3id.org/gcdfo/salmon#BroodYear
Esc,Escapement,https://w3id.org/gcdfo/salmon#Escapement

Rule: always record the full IRI; do not use partial or shortened forms.
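
One way to enforce this rule is a small check that rejects anything without a scheme and host. This is a minimal sketch using only the Python standard library; the `is_full_iri` helper is hypothetical, and the two rejected values are deliberately wrong examples made up for illustration. It checks the form of the IRI only, not whether the term exists in the ontology.

```python
from urllib.parse import urlsplit


def is_full_iri(value):
    """True when the value has an http(s) scheme and a host.

    Prefixed names ("salmon:BroodYear") and bare fragments ("#Escapement")
    fail this check because they have no network location.
    """
    parts = urlsplit(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


mappings = {
    "CU_code": "https://w3id.org/gcdfo/salmon#ConservationUnit",
    "BY": "salmon:BroodYear",  # shortened form, made up to show a rejection
    "Esc": "#Escapement",      # partial form, made up to show a rejection
}

rejected = [column for column, iri in mappings.items() if not is_full_iri(iri)]
print("Mappings that need a full IRI:", rejected)
```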

Step 3: Build or Update Your Data Dictionary

Use one row per source column.

Minimum recommended fields:

  • variable_name
  • label
  • data_type
  • definition
  • standard_term_iri
  • unit (where applicable)
  • accepted_values (for categorical fields)

Example:

variable_name,label,data_type,definition,standard_term_iri,unit
CU_code,Conservation Unit,string,The conservation unit identifier,https://w3id.org/gcdfo/salmon#ConservationUnit,
BY,Brood Year,integer,The year in which spawning occurred,https://w3id.org/gcdfo/salmon#BroodYear,year
Esc,Escapement,float,Number of fish returning to spawn,https://w3id.org/gcdfo/salmon#Escapement,number of fish
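
A dictionary like the example above can be checked for the "one row per source column" rule and for empty required fields. The sketch below assumes the dictionary is stored as CSV and treats `unit` and `accepted_values` as optional, since they apply only to some columns; `check_dictionary` is a hypothetical helper, not a metasalmon function, and `run_type` is included in the source column list to show what a missing row looks like.

```python
import csv
import io

# Fields treated as mandatory here; unit and accepted_values apply only where relevant.
REQUIRED_FIELDS = ["variable_name", "label", "data_type", "definition", "standard_term_iri"]

DICTIONARY_CSV = """variable_name,label,data_type,definition,standard_term_iri,unit
CU_code,Conservation Unit,string,The conservation unit identifier,https://w3id.org/gcdfo/salmon#ConservationUnit,
BY,Brood Year,integer,The year in which spawning occurred,https://w3id.org/gcdfo/salmon#BroodYear,year
Esc,Escapement,float,Number of fish returning to spawn,https://w3id.org/gcdfo/salmon#Escapement,number of fish
"""


def check_dictionary(csv_text, source_columns):
    """Return a list of problems: missing or duplicate rows, and empty required fields."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    documented = [row["variable_name"] for row in rows]
    problems = []
    for column in source_columns:
        count = documented.count(column)
        if count != 1:
            problems.append(f"{column}: expected 1 row, found {count}")
    for row in rows:
        for field in REQUIRED_FIELDS:
            if not (row.get(field) or "").strip():
                problems.append(f"{row['variable_name']}: missing {field}")
    return problems


# run_type exists in the source data but has no dictionary row yet.
problems = check_dictionary(DICTIONARY_CSV, ["CU_code", "BY", "Esc", "run_type"])
for problem in problems:
    print(problem)
```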

Step 4: Standardize Controlled Values

  1. Identify categorical columns.
  2. List observed values.
  3. Map to controlled concepts and keep a mapping table.

Example:

variable_name,current_value,standard_value,concept_iri
run_type,Spring,Spring Run,https://w3id.org/gcdfo/salmon#SpringRun
run_type,Summer,Summer Run,https://w3id.org/gcdfo/salmon#SummerRun
run_type,Fall,Fall Run,https://w3id.org/gcdfo/salmon#FallRun
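
The mapping table can be applied in code so that unmapped values are reported rather than silently passed through. This is a minimal sketch under the assumption that values are matched after trimming whitespace; the `standardize` helper and the observed value "Winter" are hypothetical and exist only to show how an unmapped value surfaces.

```python
# Mapping table from the example above: current value -> (standard value, concept IRI).
RUN_TYPE_MAP = {
    "Spring": ("Spring Run", "https://w3id.org/gcdfo/salmon#SpringRun"),
    "Summer": ("Summer Run", "https://w3id.org/gcdfo/salmon#SummerRun"),
    "Fall": ("Fall Run", "https://w3id.org/gcdfo/salmon#FallRun"),
}


def standardize(values, mapping):
    """Map each value to its standard label; collect values with no mapping."""
    standardized = []
    unmapped = set()
    for value in values:
        key = value.strip()
        if key in mapping:
            standardized.append(mapping[key][0])
        else:
            standardized.append(None)  # leave a gap instead of guessing
            unmapped.add(key)
    return standardized, sorted(unmapped)


# "Fall " has trailing whitespace; "Winter" is a made-up value with no mapping.
observed = ["Spring", "Fall ", "Winter", "Summer"]
standardized, unmapped = standardize(observed, RUN_TYPE_MAP)

print(standardized)
print("Unmapped values to review:", unmapped)
```

Returning the unmapped values separately keeps the decision with a person: each one either gets a new row in the mapping table or is flagged as a data error.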

Step 5: Apply Transformations Reproducibly

  • apply mappings in a script (R/Python), not manually in ad-hoc spreadsheets
  • preserve source column traceability
  • validate data types and units after transformation
  • version your transformed output
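
The points above can be sketched as a short script. Everything specific here is an assumption made for illustration: the output column names, the sample rows, and the `transform` helper are hypothetical. The idea is that renaming and type casting happen in one declared plan, and the plan also yields a lineage record linking each output column back to its source column.

```python
import csv
import io

# Hypothetical column plan: source column -> (output column, type caster).
COLUMN_PLAN = {
    "CU_code": ("conservation_unit", str),
    "BY": ("brood_year", int),
    "Esc": ("escapement", float),
}

# Made-up sample rows standing in for a source file.
SOURCE_CSV = """CU_code,BY,Esc
CU-01,2018,1520
CU-02,2019,987.5
"""


def transform(csv_text, plan):
    """Rename and cast columns per the plan; return rows plus a lineage record.

    A value that cannot be cast raises ValueError, which acts as a basic
    data type check after transformation.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        rows.append({out: cast(record[src]) for src, (out, cast) in plan.items()})
    # Lineage keeps source column traceability: output column -> source column.
    lineage = {out: src for src, (out, _cast) in plan.items()}
    return rows, lineage


rows, lineage = transform(SOURCE_CSV, COLUMN_PLAN)
print(rows[0])
print(lineage)
```

Keeping the plan and the script under version control alongside the transformed output gives each released file a reproducible origin.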

Step 6: Validate and Prepare for Intake

Before moving to package/upload:

If your destination is SPSR for FSAR workflows, continue with the FSAR Data Standardization Workflow.

Next Steps