
5-Minute Quickstart
Before You Start
Complete the one-time Setup and Credentials guide first so that GitHub installs work cleanly and any optional LLM provider is configured before you begin.
Installation
install.packages("remotes")
remotes::install_github("dfo-pacific-science/metasalmon")

One-shot Workflow
Load the built-in Fraser Coho 2023-2024 example (173 rows) and create a review-ready Salmon Data Package in one call.
library(metasalmon)
data_path <- system.file("extdata", "nuseds-fraser-coho-2023-2024.csv", package = "metasalmon")
fraser_coho <- readr::read_csv(data_path, show_col_types = FALSE)
pkg_path <- create_sdp(
  fraser_coho,
  path = "fraser-coho-2023-2024-sdp",
  dataset_id = "fraser-coho-2023-2024",
  table_id = "escapement",
  check_updates = FALSE,
  overwrite = TRUE
)
pkg_path
list.files(pkg_path, recursive = TRUE)

If path is omitted, create_sdp() writes to your working directory using a default folder name like fraser-coho-2023-2024-sdp. In interactive use it can also mention when a newer metasalmon release is available; set check_updates = FALSE to skip that check.
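Relying on those defaults, the minimal call can be sketched as follows (assuming the fraser_coho data frame loaded above; the default folder name shown in the comment is inferred from the behaviour described in this guide):

```r
# Minimal call: path is omitted, so create_sdp() writes a default
# folder (e.g. "fraser-coho-2023-2024-sdp") in the working directory.
pkg_path <- create_sdp(
  fraser_coho,
  dataset_id = "fraser-coho-2023-2024",
  table_id = "escapement",
  check_updates = FALSE  # skip the interactive release check
)
```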
This quickstart uses the bundled
nuseds-fraser-coho-2023-2024.csv example, a 173-row Fraser
coho slice derived from the official Open Government Canada Fraser and
BC Interior workbook.
Open Government Canada record: https://open.canada.ca/data/en/dataset/c48669a3-045b-400d-b730-48aafe8c5ee6
See example-data-README.md in the bundled
extdata folder for provenance and licensing. The package
also ships a matching starter dictionary at
nuseds-fraser-coho-2023-2024-column_dictionary.csv if you
want a ready-made context file for the package-native LLM review
path.
If you also need the DFO Enterprise Data Hub / GeoNetwork XML, use the one-shot path:
pkg_path <- create_sdp(
  fraser_coho,
  path = "fraser-coho-2023-2024-sdp",
  dataset_id = "fraser-coho-2023-2024",
  table_id = "escapement",
  include_edh_xml = TRUE,
  check_updates = FALSE,
  overwrite = TRUE
)

That writes the HNAP-aware EDH XML to metadata/metadata-edh-hnap.xml. If you are working from an existing dataset.csv row instead of a one-shot package build, call edh_build_hnap_xml() directly.
Optional LLM Review Later
The base quickstart does not require an API key. If
you want llm_assess = TRUE, finish the one-time Setup
and Credentials guide first, then rerun the
create_sdp() call with your chosen provider.
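Once credentials are in place, the rerun can be sketched as below. Only llm_assess appears in this guide; how the provider is selected depends on what you configured in the Setup and Credentials guide:

```r
# Rerun with the LLM review enabled (requires configured credentials).
pkg_path <- create_sdp(
  fraser_coho,
  path = "fraser-coho-2023-2024-sdp",
  dataset_id = "fraser-coho-2023-2024",
  table_id = "escapement",
  llm_assess = TRUE,
  check_updates = FALSE,
  overwrite = TRUE
)
```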
When you want the LLM to use supporting README notes, CSV dictionaries, Excel workbooks, or PDF reports as context, continue with:
Review In Excel
Open README-review.txt, then review these files in this
order:
- metadata/column_dictionary.csv
- metadata/tables.csv
- metadata/dataset.csv
- metadata/codes.csv (when present)
- semantic_suggestions.csv (only if you want more context or a better match)
That metadata/column_dictionary.csv file also makes a sensible llm_context_files input when you want the LLM review step to reason from the package itself instead of a separate methods note.
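As a sketch, the bundled starter dictionary can be passed as context (assuming llm_context_files accepts a character vector of file paths, which is not spelled out in this guide):

```r
# Point the LLM review at the shipped starter dictionary (sketch).
ctx <- system.file(
  "extdata",
  "nuseds-fraser-coho-2023-2024-column_dictionary.csv",
  package = "metasalmon"
)
pkg_path <- create_sdp(
  fraser_coho,
  path = "fraser-coho-2023-2024-sdp",
  dataset_id = "fraser-coho-2023-2024",
  table_id = "escapement",
  llm_assess = TRUE,
  llm_context_files = ctx,
  check_updates = FALSE,
  overwrite = TRUE
)
```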
create_sdp() seeds semantic suggestions by default and
auto-fills top-ranked compatible drafts directly into blank semantic
fields in the metadata CSVs. That includes column-level IRIs in
metadata/column_dictionary.csv and strong table-level
observation-unit drafts in metadata/tables.csv, using
observation_unit/description when available
and otherwise falling back to
table_label/table_id. Any auto-applied
semantic IRI draft is written back as REVIEW: <iri>
so you still confirm it manually. It does not overwrite existing
non-empty semantic values. Code-level suggestions default to factor and
low-cardinality character source columns; use
semantic_code_scope = "all" if you want broader code-level
seeding.
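Broadening the code-level scope can be sketched like this (semantic_code_scope = "all" is the option named above; the rest of the call mirrors the earlier quickstart examples):

```r
# Seed code-level suggestions for all source columns, not just
# factors and low-cardinality character columns (sketch).
pkg_path <- create_sdp(
  fraser_coho,
  dataset_id = "fraser-coho-2023-2024",
  table_id = "escapement",
  semantic_code_scope = "all",
  check_updates = FALSE,
  overwrite = TRUE
)
```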
The inferred metadata includes MISSING DESCRIPTION: and
MISSING METADATA: placeholders for required fields so the
package is immediately reviewable in Excel. Replace those placeholders
before publishing. The metadata/*.csv files are the
canonical package metadata; datapackage.json is a derived
export for interoperability.
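Before publishing, a quick base-R scan can confirm that no placeholders remain; a minimal sketch, assuming pkg_path from the earlier create_sdp() call:

```r
# Flag any remaining MISSING DESCRIPTION:/MISSING METADATA: lines
# across the canonical metadata CSVs (base R only).
meta_files <- list.files(file.path(pkg_path, "metadata"),
                         pattern = "\\.csv$", full.names = TRUE)
for (f in meta_files) {
  hits <- grep("MISSING (DESCRIPTION|METADATA):", readLines(f))
  if (length(hits) > 0) {
    cat(basename(f), "still has", length(hits), "placeholder line(s)\n")
  }
}
```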
How To Decide If term_iri Is Correct
Use plain-language checks for each measurement column:
- Does the suggested label describe exactly what the column measures?
- Does the definition match your intent (not just a similar word)?
- Is the scope right (for example species-level vs population-level)?
- Is the unit consistent with your values and
unit_iri?
Keep the IRI only when all checks pass.
Replace it when the term is close but not exact.
Remove it (leave blank) when no candidate is reliable yet.
When the top auto-applied suggestion is wrong, use
semantic_suggestions.csv to pick a better alternative and
copy that IRI into metadata/column_dictionary.csv.
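Browsing the alternatives in R can be sketched as below; the column names used in the subset ("column_name") and the example source column ("escapement_count") are hypothetical, so inspect the file header for the real names:

```r
# Sketch: list alternative suggestions for one source column.
sugg <- readr::read_csv(file.path(pkg_path, "semantic_suggestions.csv"),
                        show_col_types = FALSE)
# Column names below are assumptions -- check names(sugg) first.
subset(sugg, column_name == "escapement_count")
```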
If no candidate fits, request a new term instead of forcing a bad match:
- shared cross-organization/domain terms -> https://github.com/salmon-data-mobilization/salmon-domain-ontology/issues/new/choose
- DFO-specific policy/operations terms -> https://github.com/dfo-pacific-science/dfo-salmon-ontology/issues/new/choose
Finalize
After Excel edits, save the metadata back to CSV and reload the package in R:
pkg <- read_salmon_datapackage(pkg_path)
validate_salmon_datapackage(pkg_path, require_iris = FALSE)

That review-state validation is the first follow-up check. For the full post-review workflow — reloading the package, detecting unresolved semantic gaps, deciding shared salmon-domain vs DFO-specific routing, drafting term requests, rebuilding EDH XML if needed, and only then running strict final validation — continue with:
For a staged, fully explicit workflow where you assemble metadata
tables manually instead of continuing from a reviewed
create_sdp() package, use: