The Semantic Salmon Data Ecosystem

Author

Data Stewardship Unit (DFO Pacific Region Science Branch)

1 Overview

The Semantic Salmon Data Ecosystem (Salmon Data Integration System) connects standards, ontology terms, package metadata, validation tooling, optional guided review, and operational upload pathways so salmon data can move from local analysis into reusable systems.


1.1 Entry path (what to use, in order)

  1. DFO Salmon Ontology — canonical semantics
  2. Salmon data package — canonical package structure
  3. metasalmon — package creation, review, and quality control
  4. Guided review (optional) — SMN-GPT or another constrained assistant for ambiguity triage
  5. SPSR — package-first operational intake path

Open the branded start page for this full path.

At a high level, the ecosystem has five working parts:

  1. DFO Salmon Ontology — canonical terms and semantics
  2. Salmon data package (SDP) — structured metadata packaging
  3. metasalmon and related checks — package creation, review, and validation
  4. SPSR (Salmon Population Summary Repository) — intake and operational publishing path for many FSAR workflows
  5. Guided review tools (optional) — acceleration layer for ambiguity triage after a package exists

2 DFO Salmon Ontology (canonical semantic layer)

The ontology provides shared definitions and persistent IRIs for key domain concepts.
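A term-to-IRI mapping can be kept as a small lookup table during early mapping work. This is a minimal sketch only; the IRIs below are illustrative placeholders, not real DFO Salmon Ontology identifiers, and `resolve_iri` is a hypothetical helper rather than part of any published tooling.

```python
# Hypothetical mapping from local dataset terms to persistent ontology IRIs.
# These IRIs are illustrative placeholders, not real DFO Salmon Ontology terms.
TERM_MAP = {
    "species": "https://w3id.org/example/dfo-salmon#Species",
    "conservation_unit": "https://w3id.org/example/dfo-salmon#ConservationUnit",
}

def resolve_iri(local_term):
    """Return the persistent IRI for a local term, or None if unmapped."""
    return TERM_MAP.get(local_term.strip().lower())
```

Keeping the mapping in one place makes it easy to review and to flag unmapped terms as candidate ontology gaps.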

3 Salmon data package (package layer)

The salmon data package standardizes package metadata without forcing one rigid dataset schema.

Canonical layout:

my-salmon-data-package/
├── datapackage.json          # optional
├── metadata/
│   ├── dataset.csv
│   ├── tables.csv
│   ├── column_dictionary.csv
│   └── codes.csv             # optional
└── data/
    └── *.csv

Spec reference: Salmon data package specification (canonical markdown: https://github.com/dfo-pacific-science/smn-data-pkg/blob/main/SPECIFICATION.md)
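As a quick sanity check before running the maintained tooling, the canonical layout above can be verified with a few lines of Python. This is a sketch only: the required/optional split mirrors the tree diagram, and metasalmon remains the authoritative validator.

```python
from pathlib import Path

# File paths taken from the canonical layout above.
REQUIRED = [
    "metadata/dataset.csv",
    "metadata/tables.csv",
    "metadata/column_dictionary.csv",
]
OPTIONAL = ["datapackage.json", "metadata/codes.csv"]

def missing_required(pkg_root):
    """Return required package files that are absent under pkg_root."""
    root = Path(pkg_root)
    return [p for p in REQUIRED if not (root / p).exists()]
```

A non-empty result means the package is structurally incomplete before any semantic checks are attempted.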

4 metasalmon + validation tooling (quality layer)

metasalmon is the maintained package workflow for creating, reviewing, and validating Salmon Data Packages.

Use the metasalmon package documentation when you need the full creation-and-review workflow.

Typical checks include:

  • required columns and files present
  • value types and requiredness
  • IRI formatting and ontology-link consistency
  • controlled vocabulary and code-list alignment
  • provenance completeness for transformed fields
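The first two checks in the list above can be sketched in a few lines. This is an illustrative stand-in, not metasalmon's implementation: the required-column set is a placeholder, and the real rules come from the specification and the package tooling.

```python
import csv

def check_required_columns(csv_path, required):
    """Return required column names missing from the CSV header, sorted."""
    with open(csv_path, newline="") as f:
        header = set(next(csv.reader(f)))
    return sorted(set(required) - header)
```

The same shape (read the header, compare against a controlled list) generalizes to code-list and vocabulary alignment checks.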

5 SPSR (operational intake layer)

For FSAR-oriented contributions, SPSR is a major operational destination.

Current working direction:

  • use a package-first route
  • derive wizard or bulk upload files from the same package
  • support route-scoped intake beginning with CU/composite, SMU, and Population
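Deriving a route-scoped upload from a single package can be as simple as filtering rows by route. This is a hedged sketch: the `route` column name and the route values are assumptions for illustration, not the actual SPSR intake format.

```python
# Sketch: select rows for one SPSR intake route from package table rows.
# The "route" key and its values ("CU", "SMU", "Population") are assumed
# here for illustration; the real upload format is defined by SPSR.
def rows_for_route(rows, route):
    """Return only the rows whose route field matches the requested route."""
    return [r for r in rows if r.get("route") == route]
```

Because every route-scoped file is derived from the same package, the wizard and bulk-upload paths stay consistent with the canonical metadata.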

6 Guided review (optional acceleration)

Assistant tools can help review a package, draft mappings, and triage possible ontology gaps, but they do not replace validation or governance. They are best used as constrained review aids after the package exists, not as a second source of truth.

7 End-to-end flow

  1. Map terms to ontology IRIs
  2. Build a salmon data package
  3. Review + validate the package with metasalmon
  4. Use guided review only for ambiguity triage or candidate term proposals when needed
  5. Generate the appropriate SPSR route-scoped upload from the package
  6. Iterate from validation or reviewer feedback
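The steps above form a pipeline where each stage takes the package state and passes it on. A minimal sketch, with placeholder step functions rather than real metasalmon or SPSR APIs:

```python
# Sketch of the end-to-end flow as a chain of callables.
# Each step is a placeholder; real steps would map IRIs, validate with
# metasalmon, and generate the SPSR upload.
def run_pipeline(package, steps):
    """Apply each step to the package in order, returning the final state."""
    for step in steps:
        package = step(package)
    return package
```

The point of the pipeline shape is that reviewer or validation feedback at any stage loops back to an earlier step without changing the downstream ones.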

8 Why this matters

  • lower onboarding friction for data producers
  • more consistent semantics across teams
  • cleaner FSAR-to-repository pathways
  • better machine-readability for integration and downstream analytics
  • fewer parallel spreadsheet-only workflows drifting away from canonical metadata

9 Suggested next reads