
Infer a starter dictionary from a data frame
Proposes a starter dictionary (column dictionary schema) from raw data by guessing column types, roles, and basic metadata.
Usage
infer_dictionary(
  df,
  guess_types = TRUE,
  dataset_id = "dataset-1",
  table_id = "table_1",
  seed_semantics = FALSE,
  semantic_sources = c("smn", "gcdfo", "ols", "nvs"),
  semantic_max_per_role = 1,
  seed_verbose = TRUE,
  seed_codes = NULL,
  seed_table_meta = NULL,
  seed_dataset_meta = NULL
)
Arguments
- df
A data frame or tibble to analyze. When a named list of data frames is supplied instead, infer_dictionary() infers each table and returns a combined dictionary.
- guess_types
Logical; if TRUE (default), infer value types from the data.
- dataset_id
Character; dataset identifier (default: "dataset-1").
- table_id
Character; table identifier (default: "table_1").
- seed_semantics
Logical; if TRUE, runs suggest_semantics() and attaches the resulting semantic_suggestions attribute to the returned dictionary. Default: FALSE.
- semantic_sources
Character vector of vocabulary sources passed to suggest_semantics() when seed_semantics = TRUE. Default: c("smn", "gcdfo", "ols", "nvs").
- semantic_max_per_role
Maximum number of suggestions retained per I-ADOPT role when seeding suggestions. Default: 1.
- seed_verbose
Logical; if TRUE (default), print a short progress message while seeding semantic suggestions.
- seed_codes
Optional codes.csv-style tibble forwarded to suggest_semantics() when seed_semantics = TRUE.
- seed_table_meta
Optional tables.csv-style tibble forwarded to suggest_semantics() when seed_semantics = TRUE.
- seed_dataset_meta
Optional dataset.csv-style tibble forwarded to suggest_semantics() when seed_semantics = TRUE.
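To make the guess_types and named-list behaviour more concrete, here is a minimal base-R sketch of the general idea. It is not the package's implementation: the helpers guess_value_type(), sketch_dictionary(), and sketch_dictionary_list() are hypothetical, the type labels ("string", "number", and so on) are assumed for illustration, and using the list names as table identifiers is also an assumption.

```r
# Hypothetical stand-in, NOT the package's implementation.
# Maps one column to a coarse value-type label (labels are assumed).
guess_value_type <- function(x) {
  if (inherits(x, "Date")) return("date")
  if (inherits(x, "POSIXt")) return("datetime")
  if (is.logical(x)) return("boolean")
  if (is.integer(x)) return("integer")
  if (is.numeric(x)) return("number")
  "string"
}

# Build one dictionary row per column of a single table.
sketch_dictionary <- function(df, dataset_id = "dataset-1",
                              table_id = "table_1") {
  data.frame(
    dataset_id = dataset_id,
    table_id = table_id,
    column_name = names(df),
    value_type = vapply(df, guess_value_type, character(1),
                        USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# For a named list, process each table and stack the rows.
# Assumption: list names serve as table identifiers.
sketch_dictionary_list <- function(tables, dataset_id = "dataset-1") {
  parts <- lapply(names(tables), function(nm) {
    sketch_dictionary(tables[[nm]], dataset_id = dataset_id, table_id = nm)
  })
  do.call(rbind, parts)
}

tables <- list(
  catch = data.frame(species = c("Coho", "Chinook"), count = c(100, 200)),
  trips = data.frame(date = as.Date(c("2024-01-01", "2024-01-02")))
)
combined <- sketch_dictionary_list(tables)
print(combined)
```

The combined result has one row per column per table, which is the shape a dictionary needs before labels, roles, and IRIs are filled in.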
Value
A tibble with dictionary schema columns in canonical Salmon Data
Package order: dataset_id, table_id, column_name, column_label,
column_description, term_iri, property_iri, entity_iri,
constraint_iri, method_iri, unit_label, unit_iri, term_type,
value_type, column_role, required.
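A quick way to confirm that a dictionary still follows this column order after manual editing is sketched below. The helper has_canonical_columns() is hypothetical (it is not part of the documented API), and the stub data frame only stands in for a real return value; the column names are copied from the list above.

```r
# Column names copied from the Value section.
canonical_cols <- c(
  "dataset_id", "table_id", "column_name", "column_label",
  "column_description", "term_iri", "property_iri", "entity_iri",
  "constraint_iri", "method_iri", "unit_label", "unit_iri", "term_type",
  "value_type", "column_role", "required"
)

# Hypothetical helper: TRUE only when `dict` has exactly the canonical
# columns, in the canonical order.
has_canonical_columns <- function(dict) {
  identical(names(dict), canonical_cols)
}

# Stand-in for a returned dictionary: zero rows, canonical columns.
stub <- as.data.frame(
  setNames(
    replicate(length(canonical_cols), character(0), simplify = FALSE),
    canonical_cols
  ),
  stringsAsFactors = FALSE
)

ok <- has_canonical_columns(stub)
reordered_ok <- has_canonical_columns(stub[rev(canonical_cols)])
print(c(ok = ok, reordered_ok = reordered_ok))
```

Because identical() compares both the names and their order, the check also catches a dictionary whose columns are all present but have been rearranged.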
Examples
if (FALSE) { # \dontrun{
  df <- data.frame(
    species = c("Coho", "Chinook"),
    count = c(100, 200),
    date = as.Date(c("2024-01-01", "2024-01-02"))
  )
  dict <- infer_dictionary(df)

  # Optional: seed semantic suggestions from vocabulary services
  # (SMN is queried first; GCDFO is a distinct DFO-specific source)
  dict <- infer_dictionary(
    df,
    seed_semantics = TRUE,
    semantic_sources = c("smn", "gcdfo", "ols", "nvs")
  )

  # Suggestions are attached as an attribute of the returned dictionary
  suggestions <- attr(dict, "semantic_suggestions")
} # }