
Create a Salmon Data Package directly from raw tables
create_sdp.RdPrimary one-shot wrapper: infer dictionary/table metadata/codes/dataset metadata from raw data tables and immediately write a review-ready Salmon Data Package.
Usage
create_sdp(
resources,
path = NULL,
dataset_id = "dataset-1",
table_id = "table_1",
guess_types = TRUE,
seed_semantics = TRUE,
semantic_sources = c("smn", "gcdfo", "ols", "nvs"),
semantic_max_per_role = 1,
seed_verbose = TRUE,
seed_codes = NULL,
seed_table_meta = TRUE,
seed_dataset_meta = TRUE,
semantic_code_scope = c("factor", "all", "none"),
llm_assess = FALSE,
llm_provider = c("openai", "openrouter", "openai_compatible", "chapi"),
llm_model = NULL,
llm_api_key = NULL,
llm_base_url = NULL,
llm_top_n = 5L,
llm_context_files = NULL,
llm_context_text = NULL,
llm_timeout_seconds = 60,
llm_request_fn = NULL,
check_updates = interactive(),
format = "csv",
overwrite = FALSE,
include_edh_xml = FALSE,
...
)Arguments
- resources
Either a named list of data frames (one per resource table) or a single data frame (converted internally to a one-table list).
- path
Character; directory path where package will be written. If omitted, defaults to
file.path(getwd(), paste0(<dataset_id>-sdp))using a filesystem-safe dataset id slug.- dataset_id
Dataset identifier applied to all inferred metadata rows.
- table_id
Fallback table identifier when
resourcesis a single data frame.- guess_types
Logical; if
TRUE(default), infervalue_typefor each dictionary column.- seed_semantics
Logical; if
TRUE(default), seed semantic suggestions during inference.- semantic_sources
Vector of vocabulary sources passed to
suggest_semantics().- semantic_max_per_role
Maximum number of suggestions retained per I-ADOPT role.
- seed_verbose
Logical; if TRUE, emit progress messages while seeding semantic suggestions.
- seed_codes
Optional
codes.csv-style seed metadata.- seed_table_meta
Optional
tables.csv-style seed metadata. UseTRUE(default) to infer starter table metadata fromresources.- seed_dataset_meta
Optional
dataset.csv-style seed metadata. UseTRUE(default) to infer starter dataset metadata fromresources.- semantic_code_scope
Character string controlling which
codes.csvrows are sent throughsuggest_semantics()during one-shot seeding."factor"(default) analyzes codes sourced from factor columns and low-cardinality character columns in the original data frame(s);"all"analyzes all inferred or supplied code rows;"none"skips code-level semantic suggestions.- llm_assess
Logical; if
TRUE, run the optional LLM shortlist assessment insidesuggest_semantics().- llm_provider
LLM provider preset forwarded to
suggest_semantics().- llm_model
Optional LLM model identifier forwarded to
suggest_semantics().- llm_api_key
Optional API key override forwarded to
suggest_semantics().- llm_base_url
Optional OpenAI-compatible base URL forwarded to
suggest_semantics().- llm_top_n
Maximum number of retrieved candidates sent to the LLM per target.
- llm_context_files
Optional local context files forwarded to
suggest_semantics(). See that function for supported file types, including HTML, DOCX,.R,.Rmd,.qmd, PDF, and Excel context files.- llm_context_text
Optional inline context snippets forwarded to
suggest_semantics().- llm_timeout_seconds
Timeout for each LLM request in seconds.
- llm_request_fn
Advanced/test hook overriding the low-level OpenAI-compatible request function.
- check_updates
Logical; if
TRUE, run a short, non-fatalcheck_for_updates()call after writing the package and mention newer releases only when one is available. Defaults tointeractive().- format
Character; resource format:
"csv"(default, only format supported)- overwrite
Logical; if
FALSE(default), errors if path exists. IfTRUE, replacement is only allowed for empty directories or directories previously written bymetasalmon.- include_edh_xml
Logical; when
TRUE, writes an HNAP-aware EDH XML metadata file tometadata/metadata-edh-hnap.xmlusingedh_build_hnap_xml(). The default isFALSE.- ...
Deprecated legacy EDH arguments accepted for backwards compatibility:
edh_profile,EDH_Profile, andEDH_profileall enable EDH XML export and must be"dfo_edh_hnap"when supplied.edh_xml_pathis ignored with a warning because XML now always writes to the default metadata path. Any other extra arguments error.
Details
This one-shot helper creates a review-ready package by default:
semantic suggestions are seeded and the top-ranked column-level suggestions
are auto-applied only into missing dictionary IRI fields. Table-level
observation-unit suggestions stay enabled, and create_sdp() can
auto-apply them into missing tables.csv$observation_unit_iri values when
the suggestion still looks lexically compatible with the available table
context (prefer observation_unit/description, otherwise fall back to
table_label/table_id); compatible suggestions can also backfill
tables.csv$observation_unit labels when missing. To reduce review
noise conservatively, code-level suggestions default to factor and
low-cardinality character source columns only; set
semantic_code_scope = "all" to broaden that or "none" to disable it.
The package root contains README-review.txt,
semantic_suggestions.csv (when available), datapackage.json,
metadata/, and data/. Review the prefilled values already written into
metadata/tables.csv and metadata/column_dictionary.csv first; use
semantic_suggestions.csv as a fallback shortlist when you want more
context or a better match. To keep that review file usable,
semantic_suggestions.csv trims code-level suggestions that do not have
enough human-readable context to review safely. When llm_assess = TRUE,
the same review file also carries llm_* columns so the shortlisted LLM
judgment stays explicit and reviewable. Any auto-applied column/table IRI
draft is written back into the metadata CSVs as a REVIEW:-prefixed value
for manual confirmation there. Required-field review placeholders are also
inserted into the inferred metadata files. In
interactive use, create_sdp() can also mention an available package update;
set check_updates = FALSE to skip that network check. The package bundles
two Fraser coho examples: nuseds-fraser-coho-sample.csv (30 rows across
1996-2024) for the quickest demo, and nuseds-fraser-coho-2023-2024.csv
(173 rows from the official Open Government Canada Fraser and BC Interior
workbook) for a fuller multi-year example. The bundled
system.file("extdata", "example-data-README.md", package = "metasalmon")
note points to the upstream record/resource URLs, licensing, and the
repository data-raw/ script used to derive the fuller example.
Examples
if (FALSE) { # \dontrun{
data_path <- system.file("extdata", "nuseds-fraser-coho-sample.csv", package = "metasalmon")
fraser_coho <- readr::read_csv(data_path, show_col_types = FALSE)
pkg <- create_sdp(
fraser_coho,
dataset_id = "fraser-coho-2024",
table_id = "escapement",
overwrite = FALSE
)
} # }