Access Data in SharePoint from R

Why Connect to Sharepoint?

Connecting R to Sharepoint enables secure, programmatic data access for reproducible analysis workflows. This is especially useful to data stewards and scientists who want to move away from email and personal hard-drive based data storage and sharing and instead use a cloud service as a central repository for data.

This is particularly useful for data management tasks in Sharepoint, such as:

  • Downloading data files for analysis
  • Uploading processed data files
  • Managing document libraries and folders
  • Automating data workflows
Warning

Keep in mind, to access the DFO Sharepoint from R using the methods below you need to be on the DFO network or using the VPN.

Using the Microsoft365 Package

It’s possible to access data in SharePoint, including DFO’s SharePoint and others, from R using the Microsoft365R package. This package allows you to authenticate and interact with SharePoint sites, document libraries, and files.

Example: Accessing SharePoint Data

# Install the Microsoft365R package if you haven't already
install.packages("Microsoft365R")

# Load the Microsoft365R package
library('Microsoft365R')
library('dplyr')

# Clear you cache of authentication tokens if you have issues
# Comment this out if you don't want to clear your tokens
#AzureAuth::clean_token_directory()

# Get the SharePoint site you want to work with using either the name of the SharePoint site or id or URL
site <- get_sharepoint_site('Fishery & Assessment Data Section Wiki')

# list the document libraries in the SharePoint site
site$list_drives()

Running the above code will prompt an authentication process with Azure Active Directory. You may be prompted to sign in with your browser. After signing in successfully, you will see the message Authenticated with Azure Active Directory. Please close this page and return to R.

# get the default document library for the SharePoint site
drv <- site$get_drive()

# Download file from FADS Wiki SharePoint
drv$download_file("General/FSAR Data/Fraser Pinks/data/generated/benchmarks.csv", overwrite = TRUE)

# Alernatively, you can load data from SharePoint directly as an R dataframe without downloading it locally. This is useful for large files or when you want to avoid cluttering your local storage.
# This will load the file as a dataframe directly into R
fr_pk_bms_not_downloaded <- drv$load_dataframe("General/FSAR Data/Fraser Pinks/data/generated/benchmarks.csv")

fr_pk_bms <- readr::read_csv("benchmarks.csv")

# Now do whatever data wrangling you need to do with the data! 
# For example, let's add a column just for demonstration
fr_pk_bms <- fr_pk_bms |>
  mutate(new_column = "This is a new column")


# Save the modified data to a new CSV file
write.csv(fr_pk_bms, "benchmarks_modified.csv", row.names = FALSE)

You will also want to upload data to SharePoint, whether it’s an updated file or a new output. This can be done using the save_dataframe function.

Lets upload the modified file we just created to the SharePoint site.

# Upload and update a file to the SharePoint site
drv$save_dataframe(fr_pk_bms, "General/FSAR Data/Fraser Pinks/data/generated/benchmarks.csv")

# Check if the file was uploaded successfully
uploaded_file <- drv$load_dataframe("General/FSAR Data/Fraser Pinks/data/generated/benchmarks.csv")
# Print the file details
print(uploaded_file)

Conclusion

This enables secure, programmatic data access for reproducible analysis workflows. Using the Microsoft365R package, you can easily access and manage data in SharePoint from R. This allows for a more streamlined and efficient workflow, especially for data stewards and scientists who work with large datasets and need to collaborate with others.

This approach can help you automate data management tasks, reduce reliance on email and personal storage, and improve collaboration within your team.

Additional Resources