The Semantic Salmon Data Ecosystem

Author

Data Stewardship Unit (DFO Pacific Region Science Branch)

1 Overview

The Semantic Salmon Data Ecosystem (Semantic Salmon System) connects standards, ontology terms, package metadata, validation tooling, and operational upload pathways so salmon data can move from local analysis into reusable systems.


1.1 Entry path (what to use, in order)

  1. DFO Salmon Ontology — canonical semantics
  2. Salmon data package — package metadata structure
  3. metasalmon — validation and quality control
  4. Salmon Data GPT — optional drafting/acceleration assistant
  5. SPSR (Salmon Population Summary Repository) — operational intake path for many FSAR workflows

Open the branded start page for this full path.

At a high level, the ecosystem has five working parts:

  1. DFO Salmon Ontology — canonical terms and semantics
  2. Salmon data package (SDP) — structured metadata packaging
  3. Validation tooling (metasalmon, related checks) — quality control
  4. SPSR (Salmon Population Summary Repository) — intake and operational publishing path for many FSAR workflows
  5. Salmon Data GPT / guided assistants (optional) — acceleration layer for drafting mappings and metadata

2 1. DFO Salmon Ontology (canonical semantic layer)

The ontology provides shared definitions and persistent IRIs for key domain concepts.

3 2. Salmon data package (package layer)

The salmon data package standardizes package metadata without forcing one rigid dataset schema.

Core files:

  • dataset.csv
  • tables.csv
  • column_dictionary.csv
  • codes.csv (when controlled values apply)

Spec reference: Salmon data package specification (canonical markdown: https://github.com/dfo-pacific-science/smn-data-pkg/blob/main/SPECIFICATION.md)

4 3. Validation tooling (quality layer)

Validation tooling checks package structure and semantic consistency before submission.

Typical checks include:

  • required columns/files present
  • value types and required/optional constraints
  • IRI formatting and ontology-link consistency
  • controlled vocabulary/code-list alignment
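As one example of the IRI-formatting check above, a rough structural test can confirm that a value is an absolute http(s) IRI with a host and path. This is a simplified sketch, not the check metasalmon actually performs:

```python
from urllib.parse import urlparse

def looks_like_iri(value: str) -> bool:
    """Rough structural check: absolute http(s) IRI with a host and a path."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc) and bool(parts.path)
```

A production validator would go further, e.g. confirming that the IRI actually resolves to a known ontology term rather than merely being well-formed.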

5 4. SPSR (operational intake layer)

SPSR is the operational destination for many FSAR-oriented contributions, creating a practical bridge from standards work to real upload and review workflows.

6 5. Guided assistants (optional acceleration)

Assistant tools can help draft mappings and package metadata, but they do not replace validation or governance. They are best used as drafting aids.

7 6. End-to-end flow

  1. Map terms to ontology IRIs
  2. Build salmon data package metadata
  3. Validate package
  4. Upload via SPSR workflow (for applicable FSAR datasets)
  5. Iterate from validation/reviewer feedback
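The flow above can be sketched as a small pipeline. The step functions here are placeholders, not real metasalmon or SPSR APIs; the point is only the ordering: map terms, build metadata, then validate before upload.

```python
# Hypothetical sketch of the end-to-end flow.
# map_terms / build_metadata / validate are placeholder stubs,
# NOT real ecosystem tooling.
def map_terms(pkg: dict) -> dict:
    return pkg | {"terms_mapped": True}

def build_metadata(pkg: dict) -> dict:
    return pkg | {"metadata": ["dataset.csv", "tables.csv", "column_dictionary.csv"]}

def validate(pkg: dict) -> list[str]:
    return [] if pkg.get("terms_mapped") else ["unmapped terms"]

def run_flow(pkg: dict) -> tuple[dict, list[str]]:
    """Apply mapping and packaging, then return (package, validation issues)."""
    pkg = build_metadata(map_terms(pkg))
    return pkg, validate(pkg)
```

If validation reports issues, the contributor iterates (step 5) before the SPSR upload rather than after.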

8 7. Why this matters

  • lower onboarding friction for data producers
  • more consistent semantics across teams
  • cleaner FSAR-to-repository pathways
  • better machine-readability for integration and downstream analytics

9 8. Suggested next reads