Connecting R to Sharepoint enables secure, programmatic data access for reproducible analysis workflows. This is especially useful to data stewards and scientists who want to move away from email and personal hard-drive based data storage and sharing and instead use a cloud service as a central repository for data.
This is particularly useful for data management tasks in Sharepoint, such as:
Downloading data files for analysis
Uploading processed data files
Managing document libraries and folders
Automating data workflows
Warning
Keep in mind, to access the DFO Sharepoint from R using the methods below you need to be on the DFO network or using the VPN.
Using the Microsoft365 Package
It’s possible to access data in SharePoint, including DFO’s SharePoint and others, from R using the Microsoft365R package. This package allows you to authenticate and interact with SharePoint sites, document libraries, and files.
Example: Accessing SharePoint Data
# Install the Microsoft365R package if you haven't alreadyinstall.packages("Microsoft365R")# Load the Microsoft365R packagelibrary('Microsoft365R')library('dplyr')# Clear you cache of authentication tokens if you have issues# Comment this out if you don't want to clear your tokens#AzureAuth::clean_token_directory()# Get the SharePoint site you want to work with using either the name of the SharePoint site or id or URLsite <-get_sharepoint_site('Fishery & Assessment Data Section Wiki')# list the document libraries in the SharePoint sitesite$list_drives()
Running the above code will prompt an authentication process with Azure Active Directory. You may be prompted to sign in with your browser. After signing in successfully, you will see the message Authenticated with Azure Active Directory. Please close this page and return to R.
# get the default document library for the SharePoint sitedrv <- site$get_drive()# Download file from FADS Wiki SharePointdrv$download_file("General/FSAR Data/Fraser Pinks/data/generated/benchmarks.csv", overwrite =TRUE)# Alernatively, you can load data from SharePoint directly as an R dataframe without downloading it locally. This is useful for large files or when you want to avoid cluttering your local storage.# This will load the file as a dataframe directly into Rfr_pk_bms_not_downloaded <- drv$load_dataframe("General/FSAR Data/Fraser Pinks/data/generated/benchmarks.csv")fr_pk_bms <- readr::read_csv("benchmarks.csv")# Now do whatever data wrangling you need to do with the data! # For example, let's add a column just for demonstrationfr_pk_bms <- fr_pk_bms |>mutate(new_column ="This is a new column")# Save the modified data to a new CSV filewrite.csv(fr_pk_bms, "benchmarks_modified.csv", row.names =FALSE)
You will also want to upload data to SharePoint, whether it’s an updated file or a new output. This can be done using the save_dataframe function.
Lets upload the modified file we just created to the SharePoint site.
# Upload and update a file to the SharePoint sitedrv$save_dataframe(fr_pk_bms, "General/FSAR Data/Fraser Pinks/data/generated/benchmarks.csv")# Check if the file was uploaded successfullyuploaded_file <- drv$load_dataframe("General/FSAR Data/Fraser Pinks/data/generated/benchmarks.csv")
# Print the file detailsprint(uploaded_file)
Conclusion
This enables secure, programmatic data access for reproducible analysis workflows. Using the Microsoft365R package, you can easily access and manage data in SharePoint from R. This allows for a more streamlined and efficient workflow, especially for data stewards and scientists who work with large datasets and need to collaborate with others.
This approach can help you automate data management tasks, reduce reliance on email and personal storage, and improve collaboration within your team.