Standardize Your Data
Active Workflow
Canonical Terms Required
FSAR + SPSR Ready
Semantic Salmon System — entry path (start here)
Use this 5-part system in order:
- DFO Salmon Ontology for canonical term IRIs
- Salmon data package for package metadata structure
- metasalmon for validation checks
- Salmon Data GPT as an optional drafting assistant
- SPSR for operational FSAR intake
Cookbook Guide
Step 1: Assess Your Current Data
What you need:
- Your source dataset
- A list of column names and meanings
How to do it:
- List all columns and describe each in plain language.
- Mark candidate fields for semantic mapping:
- identifiers (CU/SMU/population)
- time fields (brood year, return year, event date)
- measurements (escapement, abundance, catch)
- controlled codes (run type, method, status)
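The assessment can start from a generated column inventory rather than a hand-typed list. The sketch below is one possible approach using only the Python standard library; the sample rows, the `inventory_columns` helper, and the `description` and `candidate_type` fields are hypothetical placeholders, not part of any Semantic Salmon System tool.

```python
import csv
import io

# Hypothetical sample standing in for your source dataset.
SAMPLE_CSV = """CU_code,BY,Esc,run_type
CK-01,2018,1520,Spring
CK-01,2019,1710,Fall
"""

def inventory_columns(csv_text, max_examples=3):
    """Return one record per column with a few distinct example values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    examples = {name: [] for name in columns}
    for row in reader:
        for name in columns:
            value = row[name]
            seen = examples[name]
            if value and value not in seen and len(seen) < max_examples:
                seen.append(value)
    # description and candidate_type are left blank for a person to fill in
    # (identifier, time field, measurement, or controlled code).
    return [
        {"column": name, "examples": examples[name],
         "description": "", "candidate_type": ""}
        for name in columns
    ]

inventory = inventory_columns(SAMPLE_CSV)
for entry in inventory:
    print(entry["column"], entry["examples"])
```

The example values are only there to help you write the plain-language description; classifying each column still needs someone who knows the data.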
Step 2: Map to Canonical Terms
What you need:
- Column assessment from Step 1
- GC DFO Salmon Ontology documentation
How to do it:
- Search for matching canonical terms.
- Confirm definitions match your intended meaning.
- Record the full canonical IRI for each mapping.
Example mapping (illustrative; confirm the exact term IRIs in the WIDOCO ontology documentation):
current_column,standard_term_label,standard_term_iri
CU_code,Conservation Unit,https://w3id.org/gcdfo/salmon#ConservationUnit
BY,Brood Year,https://w3id.org/gcdfo/salmon#BroodYear
Esc,Escapement,https://w3id.org/gcdfo/salmon#Escapement
Rule: always record the full IRI. Do not use partial or shortened URIs.
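The full-IRI rule can be checked mechanically before a mapping table is shared. The following is a minimal sketch, assuming that "full" means an absolute http(s) IRI with a host and a term part; the `is_full_iri` helper and the two bad entries are made up for illustration. It checks form only and does not confirm that a term exists in the ontology.

```python
from urllib.parse import urlparse

def is_full_iri(value):
    """True for an absolute http(s) IRI that has a host and a term part."""
    parsed = urlparse(value.strip())
    has_term = bool(parsed.fragment or parsed.path.strip("/"))
    return parsed.scheme in ("http", "https") and bool(parsed.netloc) and has_term

# Illustrative mapping; the second and third entries are deliberately wrong.
mappings = {
    "CU_code": "https://w3id.org/gcdfo/salmon#ConservationUnit",
    "BY": "gcdfo:BroodYear",     # prefixed (shortened) form
    "Esc": "salmon#Escapement",  # partial form without scheme or host
}

not_full = sorted(col for col, iri in mappings.items() if not is_full_iri(iri))
print(not_full)
```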
Step 3: Build or Update Your Data Dictionary
Use one row per source column.
Minimum recommended fields:
- variable_name
- label
- data_type
- definition
- standard_term_iri
- unit (where applicable)
- accepted_values (for categorical fields)
Example:
variable_name,label,data_type,definition,standard_term_iri,unit
CU_code,Conservation Unit,string,The conservation unit identifier,https://w3id.org/gcdfo/salmon#ConservationUnit,
BY,Brood Year,integer,The year in which spawning occurred,https://w3id.org/gcdfo/salmon#BroodYear,year
Esc,Escapement,float,Number of fish returning to spawn,https://w3id.org/gcdfo/salmon#Escapement,number of fish
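A small script can confirm that the dictionary has exactly one row per source column and that the core fields are filled in. This sketch treats five of the recommended fields as always required, which is an assumption made here for illustration; `check_dictionary` is a hypothetical helper, and the extra `run_type` column is included to show what a missing row looks like.

```python
import csv
import io

# Assumption: these five are treated as always required;
# unit and accepted_values apply only to some columns.
REQUIRED_FIELDS = ["variable_name", "label", "data_type",
                   "definition", "standard_term_iri"]

DICTIONARY_CSV = """variable_name,label,data_type,definition,standard_term_iri,unit
CU_code,Conservation Unit,string,The conservation unit identifier,https://w3id.org/gcdfo/salmon#ConservationUnit,
BY,Brood Year,integer,The year in which spawning occurred,https://w3id.org/gcdfo/salmon#BroodYear,year
Esc,Escapement,float,Number of fish returning to spawn,https://w3id.org/gcdfo/salmon#Escapement,number of fish
"""

def check_dictionary(dictionary_text, source_columns):
    """Return a list of human-readable problems; empty means no issues found."""
    rows = list(csv.DictReader(io.StringIO(dictionary_text)))
    names = [row["variable_name"] for row in rows]
    problems = []
    # One row per source column: no missing and no duplicate entries.
    for name in source_columns:
        count = names.count(name)
        if count != 1:
            problems.append(f"{name}: expected 1 dictionary row, found {count}")
    # Every row must have the required fields filled in.
    for row in rows:
        for field in REQUIRED_FIELDS:
            if not (row.get(field) or "").strip():
                problems.append(f"{row['variable_name']}: missing {field}")
    return problems

problems = check_dictionary(DICTIONARY_CSV, ["CU_code", "BY", "Esc", "run_type"])
print(problems)
```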
Step 4: Standardize Controlled Values
- Identify categorical columns.
- List observed values.
- Map to controlled concepts and keep a mapping table.
Example:
variable_name,current_value,standard_value,concept_iri
run_type,Spring,Spring Run,https://w3id.org/gcdfo/salmon#SpringRun
run_type,Summer,Summer Run,https://w3id.org/gcdfo/salmon#SummerRun
run_type,Fall,Fall Run,https://w3id.org/gcdfo/salmon#FallRun
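Once the mapping table exists, it can be applied in code so that unmapped values are reported instead of silently passed through. The sketch below reuses the illustrative run-type rows above; `standardize_values` is a hypothetical helper, and exact, case-sensitive matching after trimming whitespace is a design assumption you may want to relax.

```python
# Mapping table from the example above: current value -> (standard value, concept IRI).
RUN_TYPE_MAP = {
    "Spring": ("Spring Run", "https://w3id.org/gcdfo/salmon#SpringRun"),
    "Summer": ("Summer Run", "https://w3id.org/gcdfo/salmon#SummerRun"),
    "Fall": ("Fall Run", "https://w3id.org/gcdfo/salmon#FallRun"),
}

def standardize_values(values, mapping):
    """Map each observed value; collect anything the table does not cover."""
    standardized = []
    unmapped = set()
    for value in values:
        key = value.strip()  # tolerate stray whitespace only
        if key in mapping:
            standardized.append(mapping[key])
        else:
            standardized.append(None)
            unmapped.add(key)
    return standardized, sorted(unmapped)

observed = ["Spring", "Fall ", "summer", "Fall"]
standardized, unmapped = standardize_values(observed, RUN_TYPE_MAP)
print(unmapped)
```

Reporting `summer` as unmapped, rather than guessing that it means `Summer`, keeps the decision in the mapping table where it can be reviewed.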
Step 5: Apply Transformations Reproducibly
- Apply mappings in a script (R or Python), not manually in ad hoc spreadsheets.
- Preserve traceability back to the source columns.
- Validate data types and units after transformation.
- Version your transformed output.
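The points above can be sketched as a single scripted transformation. This is one possible shape, not a prescribed implementation: the target column names in `COLUMN_MAP`, the `transform` and `provenance` helpers, and the sample rows are all hypothetical, and recording a SHA-256 hash of the source is just one simple way to tie an output version to its input.

```python
import csv
import hashlib
import io

# Hypothetical source column -> standardized column names.
COLUMN_MAP = {"CU_code": "conservation_unit", "BY": "brood_year", "Esc": "escapement"}
# Expected Python types after transformation (columns not listed stay strings).
CASTS = {"brood_year": int, "escapement": float}

SOURCE_CSV = """CU_code,BY,Esc
CK-01,2018,1520
CK-01,2019,1710
"""

def transform(csv_text):
    """Rename columns and cast types, failing loudly with the source location."""
    rows = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        out = {}
        for src, dst in COLUMN_MAP.items():
            cast = CASTS.get(dst, str)
            try:
                out[dst] = cast(row[src])
            except ValueError as exc:
                raise ValueError(f"line {line_no}, column {src}: {exc}") from exc
        rows.append(out)
    return rows

def provenance(csv_text):
    """Record what the output was derived from, for traceability and versioning."""
    digest = hashlib.sha256(csv_text.encode("utf-8")).hexdigest()
    return {"source_sha256": digest, "column_map": dict(COLUMN_MAP)}

rows = transform(SOURCE_CSV)
record = provenance(SOURCE_CSV)
print(rows[0], record["source_sha256"][:12])
```

Saving the provenance record next to the transformed file lets a reviewer trace each output column back to its source and detect when the input has changed.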
Step 6: Validate and Prepare for Intake
Before moving to package/upload:
If your destination is SPSR for FSAR workflows, continue with FSAR Data Standardization Workflow.