
Suggest semantic annotations for a dictionary
Searches external vocabularies to suggest IRIs for measurement columns that
are missing semantic annotations. For each measurement column with missing
I-ADOPT component fields (term_iri, property_iri, entity_iri, unit_iri,
constraint_iri), this function queries vocabulary services and ranks the
results by relevance. SMN (the Salmon Domain Ontology) is queried first for
salmon-domain roles, and GCDFO is kept as a distinct, DFO-specific source.
Usage
suggest_semantics(
  df,
  dict,
  sources = c("smn", "gcdfo", "ols", "nvs"),
  include_dwc = FALSE,
  max_per_role = 3,
  search_fn = find_terms,
  codes = NULL,
  table_meta = NULL,
  dataset_meta = NULL
)
Arguments
- df
A data frame or tibble containing the data being documented.
- dict
A dictionary tibble created by
infer_dictionary()(may have incomplete semantic fields).- sources
Character vector of vocabulary sources to search. Options are "smn" (Salmon Domain Ontology via content negotiation), "gcdfo" (DFO-specific source), "ols" (Ontology Lookup Service), "nvs" (NERC Vocabulary Server), and "bioportal" (requires the BIOPORTAL_APIKEY environment variable). Default is c("smn", "gcdfo", "ols", "nvs").
- include_dwc
Logical; if TRUE, also attach DwC-DP export mappings (via suggest_dwc_mappings()) as a parallel attribute, dwc_mappings. Default is FALSE to keep the UI simple for non-DwC users.
- max_per_role
Maximum number of suggestions to keep per I-ADOPT role (variable, property, entity, unit, constraint) per column. Default is 3.
- search_fn
Function used to search terms. Defaults to find_terms(); it can be replaced with another function for testing or for custom search strategies.
- codes
Optional codes.csv-like tibble. When provided, suggestions are also generated for missing codes.csv$term_iri targets.
- table_meta
Optional tables.csv-like tibble. When provided, suggestions are generated for missing tables.csv$observation_unit_iri values.
- dataset_meta
Optional dataset.csv-like tibble. When provided, suggestions are generated for missing dataset.csv$keywords as candidate semantic keywords (IRIs intended for keyword curation).
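Because "bioportal" only works when BIOPORTAL_APIKEY is set, one option is to add it to sources conditionally. The helper pick_sources() below is hypothetical (it is not part of the package) and uses only base R: it starts from the documented default sources and appends "bioportal" only when a non-empty key is present.

```r
# Hypothetical helper (not part of the package): start from the documented
# default sources and append "bioportal" only when an API key is available.
pick_sources <- function(base = c("smn", "gcdfo", "ols", "nvs")) {
  key <- Sys.getenv("BIOPORTAL_APIKEY", unset = "")
  if (nzchar(key)) c(base, "bioportal") else base
}

srcs <- pick_sources()
print(srcs)
```

The resulting vector could then be passed as suggest_semantics(my_data, dict, sources = srcs).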
Value
The dictionary tibble (unchanged) with a semantic_suggestions
attribute containing a tibble of suggested IRIs. The suggestions tibble
starts with column_name, dictionary_role, table_id, and dataset_id
so the original dictionary term is visible before the candidate match.
It also includes target_scope, target_sdp_file, and
target_sdp_field so users can see exactly where each accepted suggestion
would land in the Salmon Data Package. Additional columns include
search_query, column_label, column_description, label, iri,
source, ontology, and definition. If the underlying search results
include a score column, it is preserved for downstream filtering.
For non-column targets, the tibble also includes explicit destination
context (target_row_key, target_label, target_description,
code_value, code_label, code_description) so table-, dataset-, and
code-level rows are inspectable without extra joins.
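Since a score column is preserved when the search results provide one, the suggestions can be narrowed before review. The sketch below uses a small, made-up data frame that only borrows column names from the description above (the labels, IRIs, and scores are placeholders), and a hypothetical helper top_suggestions() that keeps the best-scoring candidate per column and role.

```r
# Illustrative stand-in for attr(result, "semantic_suggestions"): the column
# names follow the Value section, but every label, IRI, and score is made up.
suggestions <- data.frame(
  column_name     = c("SPAWNER_COUNT", "SPAWNER_COUNT", "SPAWNER_COUNT", "WATER_TEMP"),
  dictionary_role = c("variable", "variable", "unit", "variable"),
  label           = c("Spawner count", "Spawner abundance", "count", "Water temperature"),
  iri             = sprintf("https://example.org/term/%d", 1:4),
  score           = c(0.92, 0.71, 0.40, 0.88),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop candidates below min_score, then keep the
# highest-scoring one per column and role. Without a score column the
# input is returned unchanged.
top_suggestions <- function(s, min_score = 0.5) {
  if (!"score" %in% names(s)) return(s)
  s <- s[!is.na(s$score) & s$score >= min_score, , drop = FALSE]
  s <- s[order(s$column_name, s$dictionary_role, -s$score), , drop = FALSE]
  s[!duplicated(s[c("column_name", "dictionary_role")]), , drop = FALSE]
}

best <- top_suggestions(suggestions)
print(best[c("column_name", "dictionary_role", "iri", "score")])
```

The min_score threshold of 0.5 is arbitrary; a suitable value depends on how the chosen sources scale their scores.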
Details
The function uses the column's label or description as the search query and returns suggestions as an attribute on the dictionary tibble. This allows you to review candidates before accepting them into your dictionary.
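Because each query is built from column text, the search_fn argument can be swapped for an offline stub so that tests do not hit live vocabulary services. This page does not document the search_fn calling convention, so the sketch assumes that the query text is passed as the first argument and that results with label, iri, source, ontology, definition, and score columns are acceptable. make_stub_search() and the catalogue are hypothetical.

```r
# Hypothetical offline replacement for find_terms(), useful for tests.
# ASSUMPTION: search_fn is called with the query text first; any further
# arguments (for example the sources) are accepted and ignored via `...`.
make_stub_search <- function(catalogue) {
  function(query, ...) {
    # Case-insensitive substring match of the query against catalogue labels
    hits <- grepl(tolower(query), tolower(catalogue$label), fixed = TRUE)
    out <- catalogue[hits, , drop = FALSE]
    # Highest score first, mimicking a relevance-ranked result
    out[order(-out$score), , drop = FALSE]
  }
}

# Placeholder catalogue: example.org IRIs, not real vocabulary terms
catalogue <- data.frame(
  label      = c("Spawner count", "Spawner abundance", "Water temperature"),
  iri        = c("https://example.org/term/1", "https://example.org/term/2",
                 "https://example.org/term/3"),
  source     = "stub",
  ontology   = "EXAMPLE",
  definition = c("Placeholder definition A", "Placeholder definition B",
                 "Placeholder definition C"),
  score      = c(0.9, 0.7, 0.8),
  stringsAsFactors = FALSE
)

stub_search <- make_stub_search(catalogue)
print(stub_search("spawner"))
```

Under that assumption, the stub would be supplied as suggest_semantics(my_data, dict, search_fn = stub_search).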
For column targets, only columns with column_role == "measurement" are
processed for missing I-ADOPT fields.
When codes, table_meta, or dataset_meta are supplied, additional
target rows are generated for codes.csv, tables.csv, and dataset.csv
respectively.
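When codes, table_meta, and dataset_meta are all supplied, it can help to review suggestions one destination file at a time. The sketch below uses hypothetical rows whose target_sdp_file and target_sdp_field values are taken from the Arguments section (codes.csv$term_iri, tables.csv$observation_unit_iri, dataset.csv$keywords); the IRIs are placeholders, and the exact values the function writes into these columns are not confirmed here.

```r
# Illustrative suggestion rows for non-column targets; the file and field
# names follow the Arguments section, the IRIs are placeholders.
non_column <- data.frame(
  target_sdp_file  = c("codes.csv", "codes.csv", "tables.csv", "dataset.csv"),
  target_sdp_field = c("term_iri", "term_iri", "observation_unit_iri", "keywords"),
  iri              = sprintf("https://example.org/term/%d", 1:4),
  stringsAsFactors = FALSE
)

# One list element per destination file, so each can be reviewed on its own
by_file <- split(non_column, non_column$target_sdp_file)
print(names(by_file))

# Count how many candidates would land in each file/field pair
print(table(non_column$target_sdp_file, non_column$target_sdp_field))
```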
A term can legitimately appear more than once with different
dictionary_role values (for example as both a variable and a property).
In that case, match_type still describes lexical match quality, while
target_sdp_field tells you where that suggestion would be written in the
package. The output adds role_collision and role_collision_note so
variable-vs-property collisions stay explicit and destination-aware.
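To make the idea of a collision concrete, the sketch below flags rows where the same IRI is suggested for one column under more than one dictionary_role. It is an independent illustration using made-up rows and a hypothetical shared_iri column, not the logic behind role_collision, which the function already reports.

```r
# Illustrative rows: the same (made-up) IRI suggested for SPAWNER_COUNT as
# both a variable and a property, plus an unrelated unit suggestion.
s <- data.frame(
  column_name     = rep("SPAWNER_COUNT", 3),
  dictionary_role = c("variable", "property", "unit"),
  iri             = c("https://example.org/term/1", "https://example.org/term/1",
                      "https://example.org/term/9"),
  stringsAsFactors = FALSE
)

# Count the distinct roles each (column, IRI) pair was suggested under
key <- paste(s$column_name, s$iri, sep = "|")
roles_per_key <- tapply(s$dictionary_role, key, function(r) length(unique(r)))

# Flag rows whose IRI shows up under more than one role for the same column
s$shared_iri <- as.vector(roles_per_key[key] > 1)
print(s)
```

A flagged pair is exactly the case where target_sdp_field matters: the same term would be written to different fields depending on which role you accept.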
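After calling this function, access the suggestions with attr(result, "semantic_suggestions"), where result is the dictionary returned by suggest_semantics().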
Suggestions stay separate by default. Review them first, then use
apply_semantic_suggestions() for an explicit opt-in merge, or copy values
manually when you need finer control.
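For the manual route, the sketch below copies reviewed IRIs into one dictionary field only where that field is empty, so existing values are never overwritten. fill_missing_iri(), dict_demo, and accepted are hypothetical names with placeholder IRIs; this is a base R illustration, not how apply_semantic_suggestions() is implemented.

```r
# Hypothetical starter dictionary: one empty term_iri, one already filled
dict_demo <- data.frame(
  column_name = c("SPAWNER_COUNT", "WATER_TEMP"),
  term_iri    = c(NA, "https://example.org/existing"),
  stringsAsFactors = FALSE
)

# Hypothetical suggestions you reviewed and chose to accept
accepted <- data.frame(
  column_name = c("SPAWNER_COUNT", "WATER_TEMP"),
  iri         = c("https://example.org/term/1", "https://example.org/term/4"),
  stringsAsFactors = FALSE
)

# Fill `field` from accepted IRIs only where it is NA or "", matching on column_name
fill_missing_iri <- function(dict, accepted, field) {
  idx   <- match(dict$column_name, accepted$column_name)
  empty <- is.na(dict[[field]]) | dict[[field]] == ""
  fill  <- empty & !is.na(idx)
  dict[[field]][fill] <- accepted$iri[idx[fill]]
  dict
}

updated <- fill_missing_iri(dict_demo, accepted, "term_iri")
print(updated)
```

WATER_TEMP keeps its existing IRI even though a candidate was accepted for it, which mirrors the no-overwrite intent described in the Examples.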
See also
find_terms() for direct vocabulary searches, infer_dictionary()
for creating starter dictionaries, apply_semantic_suggestions() for
explicitly filling selected IRI fields, validate_dictionary() for
checking dictionary completeness.
Examples
if (FALSE) { # \dontrun{
# Create a starter dictionary
dict <- infer_dictionary(my_data, dataset_id = "example", table_id = "main")
# Get semantic suggestions for measurement columns
dict_with_suggestions <- suggest_semantics(my_data, dict)
# View the suggestions
suggestions <- attr(dict_with_suggestions, "semantic_suggestions")
print(suggestions)
# Filter suggestions for a specific column
spawner_suggestions <- suggestions[suggestions$column_name == "SPAWNER_COUNT", ]
# Explicitly apply the top suggestion for one column without overwriting
# any existing IRIs in the dictionary
dict <- apply_semantic_suggestions(dict_with_suggestions, columns = "SPAWNER_COUNT")
} # }