
Read all CSV files from a GitHub directory
Lists all CSV files in a GitHub repository directory and reads them into a
named list of tibbles. Similar to using dir() with lapply() to read
multiple local CSV files.
Usage
read_github_csv_dir(
  path,
  ref = "main",
  repo = NULL,
  token = NULL,
  pattern = "\\.csv$",
  ...
)
Arguments
- path
Path to the directory inside the repository (e.g.,
"data/observations"), or a full GitHub URL pointing to a directory. Trailing slashes are optional.
- ref
Git reference: branch name, tag, or commit SHA. Defaults to
"main". For reproducible analyses, prefer tags or commit SHAs. Ignored when path is already a full URL with a ref embedded.
- repo
Repository slug in
"owner/name" form. Required when path is a relative path; optional when path is a full URL.
- token
Optional GitHub PAT override. If
NULL (default), uses the token from gh::gh_token(), which is typically set by ms_setup_github().
- pattern
Optional regular expression to filter CSV file names. Defaults to
"\\.csv$" (files ending in .csv). Set to NULL to match all files in the directory (not just CSVs).
- ...
Additional arguments passed to
readr::read_csv() for each file, such as col_types, skip, n_max, etc.
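The relationship between a full URL and the separate ref and repo arguments can be illustrated with a small sketch. It assumes directory URLs of the form https://github.com/<owner>/<name>/tree/<ref>/<path> and a ref without slashes; parse_github_dir_url() is a hypothetical helper written for this illustration, not a function of the package, and the package's own parsing may differ.

```r
# Hypothetical helper (not part of the package): split a GitHub directory URL
# of the assumed form https://github.com/<owner>/<name>/tree/<ref>/<path>.
# Assumes the ref has no "/" in it; branch names containing slashes cannot be
# told apart from path segments using the URL alone.
parse_github_dir_url <- function(url) {
  rx <- "^https://github\\.com/([^/]+)/([^/]+)/tree/([^/]+)/?(.*)$"
  m <- regmatches(url, regexec(rx, url))[[1]]
  if (length(m) == 0) {
    stop("Not a recognised GitHub directory URL: ", url)
  }
  list(
    repo = paste(m[2], m[3], sep = "/"),  # "owner/name" slug
    ref  = m[4],                          # ref embedded in the URL
    path = sub("/+$", "", m[5])           # trailing slashes are optional
  )
}

parts <- parse_github_dir_url(
  "https://github.com/myorg/myrepo/tree/v1.0.0/data/observations/"
)
str(parts)
```

With a URL like this, the embedded ref and repository slug are already known, which is why ref is ignored and repo becomes optional.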
Value
A named list of tibbles, where names are the CSV file names (without
the .csv extension). Returns an empty list if no CSV files are found.
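How pattern and the list names interact can be sketched in base R. The file names below are made up, and the use of grepl() and sub() is an assumption about how such filtering and naming could be done, not a description of the package internals.

```r
# File names as they might appear in a directory listing (made-up examples).
file_names <- c("observations.csv", "metadata.csv", "README.md", "obs_2023.csv")

# The default pattern keeps only names ending in .csv.
csv_names <- file_names[grepl("\\.csv$", file_names)]

# A stricter pattern keeps only CSV files whose name starts with "obs_".
obs_names <- file_names[grepl("^obs_.*\\.csv$", file_names)]

# List names are the file names with the .csv extension removed.
list_names <- sub("\\.csv$", "", csv_names)

print(csv_names)
print(obs_names)
print(list_names)
```

Note that "observations.csv" does not match the stricter pattern, because it starts with "obse" rather than "obs_".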
Details
This function uses the GitHub API to list directory contents, filters for CSV
files, then reads each file using read_github_csv(). Authentication is
required even for public repositories when using the API.
Before using this function, run ms_setup_github() once to configure
authentication. For private repositories, your PAT must have the repo
scope.
For reproducible analyses, pin to a specific tag or commit SHA rather than
a branch name like "main", since branch contents can change over time.
Manual alternative: You can achieve the same result by using gh::gh()
to list directory contents, filtering for CSV files, then looping through
them with read_github_csv(). See the vignette for an example.
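A minimal sketch of that manual route follows; it is not the vignette's code. It assumes that read_github_csv() accepts path, ref, and repo in the same way as read_github_csv_dir(), and that the directory listing has the shape returned by GitHub's repository contents endpoint (entries with name, path, and type fields). The helper names filter_csv_entries() and read_csv_dir_manually() are hypothetical.

```r
# Keep only file entries whose name matches `pattern`.
# `entries` is assumed to be a list of items, each with at least
# `name`, `path`, and `type` fields.
filter_csv_entries <- function(entries, pattern = "\\.csv$") {
  keep <- vapply(
    entries,
    function(e) identical(e$type, "file") && grepl(pattern, e$name),
    logical(1)
  )
  entries[keep]
}

# Sketch of the manual approach (defined here, not called).
# Assumption: read_github_csv() takes `ref` and `repo` arguments like
# read_github_csv_dir() does.
read_csv_dir_manually <- function(path, repo, ref = "main") {
  endpoint <- sprintf("GET /repos/%s/contents/%s", repo, path)
  entries <- gh::gh(endpoint, ref = ref)  # `ref` is sent as a query parameter
  files <- filter_csv_entries(entries)
  tibbles <- lapply(
    files,
    function(e) read_github_csv(e$path, ref = ref, repo = repo)
  )
  file_names <- vapply(files, function(e) e$name, character(1))
  names(tibbles) <- sub("\\.csv$", "", file_names)
  tibbles
}

# The filtering step can be tried offline on a made-up listing.
fake_listing <- list(
  list(name = "observations.csv", path = "data/observations.csv", type = "file"),
  list(name = "notes.txt", path = "data/notes.txt", type = "file"),
  list(name = "archive", path = "data/archive", type = "dir")
)
kept <- filter_csv_entries(fake_listing)
print(vapply(kept, function(e) e$name, character(1)))
```

Calling read_csv_dir_manually("data/observations", repo = "myorg/myrepo") would need network access and a configured token, so only the filtering step is exercised above.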
See also
read_github_csv() for reading a single CSV file,
ms_setup_github() for authentication setup.
Examples
if (FALSE) { # \dontrun{
# First, set up authentication (run once)
ms_setup_github(repo = "myorg/myrepo")

# Read all CSV files from a directory
data_list <- read_github_csv_dir("data/observations", repo = "myorg/myrepo")

# Access individual data frames by name
observations <- data_list$observations
metadata <- data_list$metadata

# Pin to a release tag for reproducibility
data_v1 <- read_github_csv_dir(
  "data/observations",
  ref = "v1.0.0",
  repo = "myorg/myrepo"
)

# Custom pattern to match specific files
subset <- read_github_csv_dir(
  "data",
  repo = "myorg/myrepo",
  pattern = "^obs_.*\\.csv$"
)

# Pass arguments to read_csv for all files
data_typed <- read_github_csv_dir(
  "data/observations",
  repo = "myorg/myrepo",
  col_types = "ccin"
)
} # }