Installation

install.packages("remotes")
remotes::install_github("dfo-pacific-science/metasalmon")

One-shot Workflow

Load the built-in NuSEDS Fraser Coho sample and create a review-ready Salmon Data Package in one call.

library(metasalmon)

sample_path <- system.file("extdata", "nuseds-fraser-coho-sample.csv", package = "metasalmon")
fraser_coho <- readr::read_csv(sample_path, show_col_types = FALSE)

pkg_path <- create_sdp(
  fraser_coho,
  dataset_id = "fraser-coho-2024",
  table_id = "escapement",
  overwrite = TRUE
)

pkg_path
list.files(pkg_path, recursive = TRUE)

If path is omitted, create_sdp() writes to your working directory using a default folder name like fraser-coho-2024-sdp. In interactive sessions it may also tell you when a newer metasalmon release is available; set check_updates = FALSE to skip that check.
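To control where the package is written and skip the release check, both options can be passed in the same call. This is a minimal sketch, not a definitive recipe: it assumes that path accepts a target folder and that create_sdp() returns the package path, as the call above suggests. The out_dir folder under tempdir() is a hypothetical location chosen for illustration.

```r
library(metasalmon)

sample_path <- system.file("extdata", "nuseds-fraser-coho-sample.csv", package = "metasalmon")
fraser_coho <- readr::read_csv(sample_path, show_col_types = FALSE)

# Hypothetical output folder; assumes `path` takes the target directory.
out_dir <- file.path(tempdir(), "fraser-coho-2024-sdp")

pkg_path <- create_sdp(
  fraser_coho,
  dataset_id = "fraser-coho-2024",
  table_id = "escapement",
  path = out_dir,          # write here instead of the working directory
  check_updates = FALSE,   # skip the "newer release available" check
  overwrite = TRUE
)

list.files(pkg_path, recursive = TRUE)
```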

Review In Excel

Open the output folder and review these files:

  • README-review.txt
  • semantic_suggestions.csv (when suggestions were found)
  • metadata/dataset.csv
  • metadata/tables.csv
  • metadata/column_dictionary.csv
  • metadata/codes.csv (when present)
  • data/*.csv resource files

create_sdp() seeds semantic suggestions by default and auto-fills the top-ranked column-level suggestions into missing dictionary fields (term_iri, property_iri, entity_iri, unit_iri, etc.). It does not overwrite existing non-empty IRI values. Table-level suggestions are still available when table metadata needs them, while code-level suggestions default to factor/categorical source columns only. Use semantic_code_scope = "all" if you want broader code-level seeding.

The inferred metadata includes "REVIEW REQUIRED:" placeholders in required fields, so the package is immediately reviewable in Excel. Replace every placeholder before publishing. The metadata/*.csv files are the canonical package metadata; datapackage.json is a derived export for interoperability.
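Before publishing, it can help to list any placeholders that are still in the metadata files. The sketch below uses base R only. find_review_placeholders() is a hypothetical helper, not part of metasalmon, and it assumes each placeholder cell contains the literal text REVIEW REQUIRED:. The demo folder and its title column are made up for illustration.

```r
# Hypothetical helper: report every metadata cell that still contains the marker.
find_review_placeholders <- function(pkg_path, marker = "REVIEW REQUIRED:") {
  files <- list.files(file.path(pkg_path, "metadata"),
                      pattern = "\\.csv$", full.names = TRUE)
  hits <- lapply(files, function(f) {
    # Read everything as text so no value is coerced or dropped.
    meta <- utils::read.csv(f, colClasses = "character", check.names = FALSE)
    cells <- as.matrix(meta)
    # Convert matching cell positions into (row, column) pairs.
    pos <- arrayInd(which(grepl(marker, cells, fixed = TRUE)), dim(cells))
    data.frame(
      file = rep(basename(f), nrow(pos)),
      row = pos[, 1],
      column = colnames(cells)[pos[, 2]],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, hits)
}

# Demo on a made-up package folder (illustrative column names).
demo_pkg <- file.path(tempdir(), "demo-sdp")
dir.create(file.path(demo_pkg, "metadata"), recursive = TRUE, showWarnings = FALSE)
write.csv(
  data.frame(dataset_id = "fraser-coho-2024",
             title = "REVIEW REQUIRED: add a title"),
  file.path(demo_pkg, "metadata", "dataset.csv"),
  row.names = FALSE
)

remaining <- find_review_placeholders(demo_pkg)
print(remaining)
```

A result with zero rows means no placeholder text was found. On a real package, pass the folder returned by create_sdp() instead of demo_pkg.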

How To Decide If term_iri Is Correct

Use plain-language checks for each measurement column:

  1. Does the suggested label describe exactly what the column measures?
  2. Does the definition match your intent (not just a similar word)?
  3. Is the scope right (for example species-level vs population-level)?
  4. Is the unit consistent with your values and unit_iri?

Keep the IRI only when all checks pass.

Replace it when the term is close but not exact.

Remove it (leave blank) when no candidate is reliable yet.

When the top auto-applied suggestion is wrong, use semantic_suggestions.csv to pick a better alternative and copy that IRI into metadata/column_dictionary.csv.
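The same swap can be scripted instead of done by hand in Excel. The sketch below runs on small in-memory tables so it is self-contained. The column names column_name, label, and iri, the example.org IRIs, and the set_term_iri() helper are all assumptions for illustration; check the actual headers in semantic_suggestions.csv and metadata/column_dictionary.csv before adapting it.

```r
# Toy stand-ins for metadata/column_dictionary.csv and semantic_suggestions.csv.
# All column names except term_iri, and all IRIs, are hypothetical.
dict <- data.frame(
  column_name = c("species", "spawner_count"),
  term_iri = c("", "https://example.org/terms/total-return"),
  stringsAsFactors = FALSE
)
suggestions <- data.frame(
  column_name = c("spawner_count", "spawner_count"),
  label = c("Total return", "Spawner abundance"),
  iri = c("https://example.org/terms/total-return",
          "https://example.org/terms/spawner-abundance"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: set term_iri for exactly one dictionary row.
set_term_iri <- function(dict, column, iri) {
  row <- which(dict$column_name == column)
  stopifnot(length(row) == 1)
  dict$term_iri[row] <- iri
  dict
}

# Pick the alternative whose label matches what the column measures.
chosen <- suggestions$iri[suggestions$column_name == "spawner_count" &
                            suggestions$label == "Spawner abundance"]
dict <- set_term_iri(dict, "spawner_count", chosen)

# Write the edited dictionary back as CSV (a temp file here, for the demo).
out_file <- tempfile(fileext = ".csv")
write.csv(dict, out_file, row.names = FALSE, na = "")
```

To clear a term that has no reliable candidate, pass an empty string as the IRI.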

Finalize

After editing in Excel, save the metadata files back to CSV. When you hand the package to someone else, share the whole folder (or a zip of it). Before publishing, run validation again:

pkg <- read_salmon_datapackage(pkg_path)
validate_dictionary(pkg$dictionary)
validate_semantics(pkg$dictionary)

For a staged, fully explicit workflow (manual artifact inference and controlled semantic merges), see the publication guide: