
Publishing Data Packages
Overview
This guide follows the 5-Minute Quickstart and focuses on the steps that harden a package for publishing. If you have not yet generated a starter dictionary, start with the 5-Minute Quickstart first.
When all of the pieces are ready, metasalmon writes
files matching the Salmon Data Package specification, so you can upload
the folder or hand it to someone else with confidence.
create_sdp() is the main one-shot path; this article covers
the more explicit, manual workflow where you assemble the metadata
tables yourself and then call
write_salmon_datapackage().
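For comparison, the one-shot path can look roughly like the sketch below. The argument names here are illustrative assumptions, not the documented signature; check ?create_sdp for the real interface.

```r
library(metasalmon)
library(readr)

df <- read_csv("my-table.csv")

# Illustrative sketch only: these argument names are assumptions.
create_sdp(
  df,
  path = "my-data-package"
)
```

The rest of this article takes the manual route so each metadata table is visible and editable along the way.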
1) Start with your data
library(metasalmon)
library(readr)
# Replace with your own data path
df <- read_csv("my-table.csv")

If you already have a dictionary and metadata from the quickstart, skip directly to Describe the dataset and tables.
Keep a working copy of your data frame handy so you can re-run these steps whenever the source data changes.
2) Build a starter column dictionary
If you already ran the quickstart and have dict,
skip this section.
dict <- infer_dictionary(
df,
dataset_id = "my-dataset-2026",
table_id = "main-table"
)

The dictionary lists every column and assigns a
column_role (identifier, attribute, measurement, temporal,
or categorical). Work through the rows and fill in
column_label, column_description, and
value_type so reviewers understand what each field means,
and mark columns as required when they must appear in every
row.
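One way to fill in those fields is with ordinary dplyr verbs. This is a sketch, and it assumes the dictionary uses the column names shown here (column_name, column_label, column_description); adapt it to the columns your version of infer_dictionary() produces.

```r
library(dplyr)

# Fill in documentation fields for one column of the starter
# dictionary; repeat (or use case_when) for the remaining rows.
dict <- dict |>
  mutate(
    column_label = if_else(column_name == "RUN_TYPE",
                           "Run timing", column_label),
    column_description = if_else(column_name == "RUN_TYPE",
                                 "Seasonal run-timing group for the population",
                                 column_description)
  )
```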
3) Describe the dataset and tables
dataset_meta <- tibble::tibble(
dataset_id = "my-dataset-2026",
title = "My Project Data",
description = "Sample data describing salmon measurements",
creator = "Your Team",
contact_name = "Data Steward",
contact_email = "data@example.gov",
license = "Open Government License - Canada"
)
table_meta <- tibble::tibble(
dataset_id = "my-dataset-2026",
table_id = "main-table",
file_name = "data/main-table.csv",
table_label = "Main Salmon Table",
description = "Escapement and effort data by population"
)

Include extra columns such as spatial_extent or
temporal_start when they help
others understand the scope.
4) Add code lists when needed
Only create codes.csv when a column uses categorical
values (species, run_type, gear, etc.). Each row ties a
code_value to a short label and, ideally, the ontology term
that explains what the code means.
codes <- tibble::tibble(
dataset_id = "my-dataset-2026",
table_id = "main-table",
column_name = "RUN_TYPE",
code_value = "FALL",
code_label = "Fall run timing"
)

If the column reuses a published controlled vocabulary (like the DFO
Salmon Ontology), include the matching IRI in term_iri so
automated tools can link to the definition. In the one-shot
create_sdp() workflow, code-level semantic suggestions are
seeded automatically only for factor/categorical source columns unless
you opt into semantic_code_scope = "all".
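For example, the codes table above could carry a term_iri column like this. The IRI shown is a placeholder for illustration, not a real DFO Salmon Ontology term.

```r
codes <- tibble::tibble(
  dataset_id = "my-dataset-2026",
  table_id = "main-table",
  column_name = "RUN_TYPE",
  code_value = "FALL",
  code_label = "Fall run timing",
  # Placeholder IRI; substitute the published term's real IRI.
  term_iri = "https://example.org/salmon-ontology#FallRun"
)
```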
5) Create the package
resources <- list(main = df)
pkg_path <- write_salmon_datapackage(
resources = resources,
dataset_meta = dataset_meta,
table_meta = table_meta,
dict = dict,
codes = codes,
path = "my-data-package",
format = "csv",
overwrite = TRUE
)
list.files(pkg_path, recursive = TRUE)

This writes the canonical metadata CSV files under
metadata/, the data tables under data/, and a
derived datapackage.json at the package root. The
metadata/*.csv files are the source of truth; if they
disagree with datapackage.json, fix the CSV metadata and
rewrite the package. The folder is now ready for publication, archiving,
or sharing with colleagues. Share the whole folder (or a zip of the
whole folder), not just datapackage.json.
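One way to bundle the folder for sharing is base R's utils::zip(), which requires a zip executable on your system:

```r
# Zip the whole package folder so metadata/, data/, and
# datapackage.json travel together.
old_wd <- setwd(dirname(pkg_path))
utils::zip(zipfile = "my-data-package.zip", files = basename(pkg_path))
setwd(old_wd)
```

Zipping from the parent directory keeps the paths inside the archive relative to the package root.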
Optional: include DwC-DP export hints
You can attach optional Darwin Core Data Package (DwC-DP) mappings when you need an export view for biodiversity tools. The default is OFF to keep SDP canonical.
dict <- readr::read_csv("inst/extdata/column_dictionary.csv", show_col_types = FALSE)
sem <- suggest_semantics(dict, include_dwc = TRUE)
attr(sem, "dwc_mappings") |>
dplyr::filter(dwc_table %in% c("event", "occurrence")) |>
dplyr::select(column_name, dwc_table, dwc_field, term_iri)

Keep the SDP column names intact; use the DwC mappings only when exporting a DwC-DP view.
Optional: export EDH XML metadata
When your publication workflow includes DFO Enterprise Data Hub /
GeoNetwork, edh_build_iso19139_xml() now defaults to the
richer HNAP-aware EDH export and still offers the older compact ISO
19139 path as an explicit fallback.
edh_hnap_xml <- file.path(pkg_path, "metadata", "metadata-edh-hnap.xml")
edh_build_iso19139_xml(dataset_meta, output_path = edh_hnap_xml)
edh_iso_xml <- file.path(pkg_path, "metadata", "metadata-iso19139.xml")
edh_build_iso19139_xml(
dataset_meta,
output_path = edh_iso_xml,
profile = "iso19139"
)
file.exists(edh_hnap_xml)
file.exists(edh_iso_xml)

The default HNAP-aware path recognizes extra optional columns when
present, including created, modified,
status, distribution_url,
reference_system, bbox_*, and localized fields
like title_fr / description_fr.
Validate and enrich either XML output against your local EDH profile before production upload.
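A sketch of supplying some of those optional columns before export; the values below are placeholders, and only the column names listed above are assumed to be recognized by the HNAP-aware profile.

```r
dataset_meta_edh <- dataset_meta
dataset_meta_edh$created  <- "2026-01-15"   # placeholder date
dataset_meta_edh$modified <- "2026-06-01"   # placeholder date
dataset_meta_edh$title_fr <- "Donnees de mon projet"  # placeholder French title

edh_build_iso19139_xml(
  dataset_meta_edh,
  output_path = file.path(pkg_path, "metadata", "metadata-edh-hnap.xml")
)
```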
Using suggest_dwc_mappings() directly
For more control over DwC-DP mapping suggestions, use
suggest_dwc_mappings():
dict <- tibble::tibble(
column_name = c("event_date", "decimal_latitude", "scientific_name"),
column_label = c("Event Date", "Decimal Latitude", "Scientific Name"),
column_description = c("Date the event occurred", "Latitude in decimal degrees", "Species scientific name")
)
dict <- suggest_dwc_mappings(dict)
attr(dict, "dwc_mappings")
# Shows suggested DwC-DP table/field mappings with term IRIs

Semantic suggestions with role-aware sources
When using suggest_semantics(), the function
automatically queries role-appropriate sources:
# Default: ontology suggestions only (DwC mappings OFF)
sem <- suggest_semantics(dict)
# Include DwC-DP mappings alongside ontology suggestions
sem_with_dwc <- suggest_semantics(dict, include_dwc = TRUE)
# View ontology suggestions
suggestions <- attr(sem, "semantic_suggestions")
# View DwC mappings (only when include_dwc = TRUE)
dwc_maps <- attr(sem_with_dwc, "dwc_mappings")

The ontology suggestions use role-aware ranking (Phase 2) that prefers:
- QUDT for units
- GBIF/WoRMS for taxa/entities
- STATO/OBA for properties
- gcdfo patterns for methods
Terms from Wikidata are flagged with
alignment_only = TRUE and ranked lower.
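Assuming the suggestions table exposes an alignment_only column as described above, you can set those alignment-only terms aside when reviewing:

```r
# Keep only primary suggestions; assumes the semantic_suggestions
# attribute is a data frame with an alignment_only column.
suggestions <- attr(sem, "semantic_suggestions")
primary <- dplyr::filter(suggestions, !alignment_only)
```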
Validation before publication
- Run validate_dictionary(dict) to ensure the dictionary has required columns and valid column_role/value_type combinations.
- If you generated codes.csv, double-check that every code used in the data has an entry there.
- Re-open the package with read_salmon_datapackage(pkg_path) to confirm the metadata, dictionary, and data align.
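Those checks can be scripted. The code-coverage check below assumes a RUN_TYPE categorical column, as in the earlier examples; adapt it to each of your coded columns.

```r
# 1) Structural check on the dictionary
validate_dictionary(dict)

# 2) Every code used in the data should have a codes.csv entry
used <- unique(df$RUN_TYPE)
missing_codes <- setdiff(
  used,
  codes$code_value[codes$column_name == "RUN_TYPE"]
)
stopifnot(length(missing_codes) == 0)

# 3) Round-trip the package to confirm everything still aligns
pkg <- read_salmon_datapackage(pkg_path)
```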
Next steps
- See the “How It Fits Together” section in the README for a visual map of how the components interact.
- Read the Linking to Standard Vocabularies guide when you want to align your dictionary with published vocabularies.
- Try the Using AI to Document Your Data workflow for drafting descriptions and ontology-aligned metadata quickly.