The purpose of this R package (likely to undergo a name change…) is to:
auk for importing and munging the large eBird datasets.*

*@cboettig and @amstrimas are currently developing an auk alternative, birddb. It is likely that, once stable, this R package will use birddb as a dependency for eBird import and manipulation. For now, however, the functions herein provide a much faster alternative to auk.
Download the development version from GitHub with:
# install.packages("remotes")
# remotes::install_github("trashbirdecology/bbsassistant") # to be safe
remotes::install_github("trashbirdecology/bbsebird")
Prior to using this package, you must have downloaded eBird data. To request and download eBird observations, visit the eBird website. Credentials are required, and approval may take up to a few business days, depending on the use case. For more information on the eBird data, see the eBird website or visit the repository for auk, the Cornell Lab of Ornithology’s official R package for munging eBird data.
When your account is approved, you will gain access to the eBird Basic Database (EBD). This package requires two components of the EBD to be saved locally:
#explicitly load some packages
pkgs <- c("bbsebird")
# install.packages("mapview") # you can use this package to get interactive map views
invisible(lapply(pkgs, library, character.only = TRUE))
rm(pkgs)
If you are using this README, this is the only R Markdown chunk you should have to edit. Most important are where the eBird data and BBS shapefiles are stored (dir.orig.data) and where you wish to save the resulting data and models (dir.proj). The latter need not exist; if it does not, the package will create the directory for you.
# REQUIRED ARGUMENTS
dir.orig.data = "C:/Users/jburnett/OneDrive - DOI/bbsebird-testing/" # this will be improved to be more intuitive re: what data?
dir.proj = "C:/users/jburnett/OneDrive - DOI/bbsebird-testing/House_Sparrow/"
species = c("House Sparrow") ## eventually need to add a lookup table to ensure species and species.abbr align.
species.abbr = c("houspa") # see the eBird filename for the abbreviation
### this needs improvement as well, e.g., a species lookup table linking common names, scientific names, and abbreviations across BBS and eBird data
##bbs arguments
usgs.layer = "US_BBS_Route-Paths-Snapshot_Taken-Feb-2020" # name of the USGS BBS route shapefile to use
cws.layer = "ALL_ROUTES" # name of the Canadian (CWS) BBS route shapefile.
##ebird arguments
mmyyyy = "dec-2021" # the month and year of the eBird data downloads on file
# Strongly suggested but optional args
##general arguments
# dir.proj = "C:/Users/jburnett/desktop/testing/"
### see bbsAssistant::region_codes
states = c("us-fl", "us-ga", "us-al")
countries = c("US") ## character vector of countries. Call bbsebird::iso.codes to find relevant codes for countries and states/provinces/territories.
# species = c("Double-crested Cormorant", "Nannopterum auritum", "phalacrocorax auritum")
# species.abbr = c("doccor","dcco", "docco")
year.range = 2008:2019
base.julian.date = lubridate::ymd(paste0(min(year.range), c("-01-01"))) # used as base date for Julian dates.
crs.target = 4326 #target CRS for all created spatial layers
##grid arguments
grid.size = 1.00 # cell size in decimal degrees (1 degree of latitude is ~111 km; a degree of longitude is shorter away from the equator)
##ebird arguments
min.yday = 91
max.yday = 245
## Munge the states and countries indexes for use in project directory creation
if(!exists("states")) states <- NULL
if(!is.null(states)){regions <- states}else{regions <- countries}
stopifnot(all(tolower(states) %in% tolower(bbsAssistant::region_codes$iso_3166_2)))
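The min.yday and max.yday arguments are day-of-year bounds (1 = 1 January), and base.julian.date anchors the Julian dates. The base-R sketch below is not the package's internal code, and the helper names are hypothetical; it only illustrates which calendar dates the default window covers and how a date maps to days since the base date. Note that in leap years day 91 falls on 31 March rather than 1 April.

```r
# Hypothetical helpers illustrating the day-of-year window and Julian dates.
# These are NOT bbsebird functions.

# Convert a day of year (1 = 1 January) to a calendar date in a given year
yday_to_date <- function(yday, year) {
  as.Date(yday - 1, origin = paste0(year, "-01-01"))
}

# TRUE for dates whose day of year lies within [min.yday, max.yday]
in_yday_window <- function(dates, min.yday, max.yday) {
  yday <- as.POSIXlt(dates)$yday + 1  # POSIXlt$yday is 0-based
  yday >= min.yday & yday <= max.yday
}

# Days elapsed since the base date (the base date itself is day 0)
days_since_base <- function(dates, base.date) {
  as.numeric(dates - base.date)
}

yday_to_date(c(91, 245), 2019)  # "2019-04-01" "2019-09-02"
yday_to_date(91, 2020)          # "2020-03-31" (leap year)

obs <- as.Date(c("2019-03-31", "2019-04-01", "2019-09-02", "2019-09-03"))
in_yday_window(obs, 91, 245)    # FALSE TRUE TRUE FALSE
days_since_base(as.Date("2008-02-01"), as.Date("2008-01-01"))  # 31
```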
This chunk creates a list of project and data directory paths (dirs) based on the project directory and data directory specified above.
# set_proj_shorthand: this will make all directories within a new dir in dir.proj. this is useful for iterating over species/time/space and saving all resulting information in those directories.
subdir.proj <- set_proj_shorthand(species.abbr, regions, grid.size, year.range)
dirs <- dir_spec(dir.orig.data = dir.orig.data,
dir.proj = dir.proj,
subdir.proj = subdir.proj) # create and/or specify
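dir_spec() creates and/or specifies the project directories, so dir.proj does not need to exist beforehand. As a rough illustration of that create-if-missing behaviour, here is a base-R sketch; the function make_project_dirs(), the subdirectory names, and the example subdir.proj string are all hypothetical and need not match what dir_spec() actually builds.

```r
# Hypothetical sketch: build a named list of project sub-directories and
# create any that do not yet exist. Not the actual dir_spec() implementation.
make_project_dirs <- function(dir.proj, subdir.proj,
                              subdirs = c("spatial", "bbs", "ebird", "plots")) {
  paths <- file.path(dir.proj, subdir.proj, subdirs)
  names(paths) <- paste0("dir.", subdirs, ".out")
  for (p in paths) {
    # recursive = TRUE also creates any missing parent directories
    if (!dir.exists(p)) dir.create(p, recursive = TRUE)
  }
  as.list(paths)
}

# Usage in a throwaway location; the sub-directory name is made up
root <- file.path(tempdir(), "bbsebird-demo")
dirs.demo <- make_project_dirs(root, "houspa_us-fl_grid1_2008-2019")
all(vapply(dirs.demo, dir.exists, logical(1)))  # TRUE
```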
The following chunk creates a spatial sampling grid of size grid.size with units defaulting to the units of crs.target.
study_area <- make_spatial_grid(
dir.out = dirs[['dir.spatial.out']],
states = states,
countries = countries,
crs.target = crs.target,
grid.size = grid.size,
hexagonal = TRUE,
overwrite = TRUE
)
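if(is.list(study_area)){
grid <- study_area$grid
overlay <- study_area$grid.overlay
}else{
grid <- study_area
overlay <- NULL
}
plot(grid)
if(!is.null(overlay)) plot(overlay)
# keep study_area: it is used by make_bbs_spatial(), make_ebird_spatial() and make_bundle() below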
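With crs.target = 4326 the grid units are decimal degrees, so a 1.00-degree cell is not a fixed distance on the ground, whether the cells are square or hexagonal. A degree of latitude is about 111 km everywhere, but a degree of longitude shrinks with the cosine of latitude. The quick check below uses a spherical-Earth approximation (mean radius 6371 km, an assumption that ignores ellipsoidal flattening); degrees_to_km() is a hypothetical helper, not part of bbsebird.

```r
# Approximate ground length (km) of a span of decimal degrees on a sphere.
degrees_to_km <- function(deg, lat, radius.km = 6371) {
  km.per.deg <- 2 * pi * radius.km / 360  # ~111.19 km per degree
  c(north_south = deg * km.per.deg,                           # along a meridian
    east_west   = deg * km.per.deg * cos(lat * pi / 180))     # along a parallel
}

# Cell dimensions for grid.size = 1 at the southern and northern ends
# of a Florida/Georgia/Alabama study area
degrees_to_km(1, lat = 25)
degrees_to_km(1, lat = 35)
```

The east-west width differs by roughly 10 km between those two latitudes, which is worth keeping in mind when comparing cell-level counts across a large study area.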
Create the BBS data. This chunk relies on the R package bbsAssistant. The resulting data are aligned with the spatial grid (see above).
## if the files already exist, don't overwrite unless you've made changes to data specs
bbs.obs.fn <- file.path(dirs$dir.bbs.out, "bbs_obs.rds")
if(file.exists(bbs.obs.fn)){
bbs_obs <- readRDS(bbs.obs.fn)
}else{
bbs_orig <- bbsAssistant::grab_bbs_data(bbs_dir = dirs$dir.bbs.out)
bbs_obs <- bbsAssistant::munge_bbs_data(
bbs_list = bbs_orig,
states = states,
species = species,
year.range = year.range)
bbs_obs <- bbsebird:::match_col_names(bbs_obs) # munge column names to mesh with eBird
saveRDS(bbs_obs, paste0(dirs$dir.bbs.out, "/bbs_obs.rds")) # suggest saving data to file for easy access
}
# Overlay BBS and study area / sampling grid
### note: sometimes when running this in a notebook/Rmd, a random ".rdf" path error occurs.
#### I have no clue what causes this bug; just try running it again. See also https://github.com/rstudio/rstudio/issues/6260
bbs.spatial.fn <- file.path(dirs$dir.bbs.out, "bbs_spatial.rds")
if(file.exists(bbs.spatial.fn)){
bbs_spatial <- readRDS(bbs.spatial.fn)
}else{
bbs_spatial <- make_bbs_spatial(
df = bbs_obs,
overwrite=TRUE,
cws.routes.dir = dirs$cws.routes.dir,
usgs.routes.dir = dirs$usgs.routes.dir,
plot.dir = dirs$dir.plots,
crs.target = crs.target,
grid = study_area,
dir.out = dirs$dir.spatial.out
)
saveRDS(bbs_spatial, paste0(dirs$dir.bbs.out, "/bbs_spatial.rds"))
}
## check out the bbs spatial data to ensure things look ok
# plot(bbs_spatial['area']) # cell area
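Both BBS chunks follow the same pattern: read the .rds file if it is already cached, otherwise compute the object and save it. If you find yourself repeating that pattern, a small wrapper keeps it in one place. cache_rds() below is a hypothetical helper, not part of bbsebird; pass overwrite = TRUE after changing the data specs.

```r
# Hypothetical helper: return the object stored at `path` if it exists,
# otherwise run compute(), save the result with saveRDS(), and return it.
cache_rds <- function(path, compute, overwrite = FALSE) {
  if (file.exists(path) && !overwrite) {
    return(readRDS(path))
  }
  result <- compute()
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  saveRDS(result, path)
  result
}

# Usage with a toy computation; in this README the compute function would wrap
# the bbsAssistant::grab_bbs_data() and munge_bbs_data() calls.
demo.path <- file.path(tempdir(), "cache-demo", "toy.rds")
unlink(demo.path)  # start from a clean state for the demo
first  <- cache_rds(demo.path, function() data.frame(x = 1:3))
second <- cache_rds(demo.path, function() stop("not called: cached copy is used"))
identical(first, second)  # TRUE
```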
Munge the eBird data (the original eBird files must already be saved locally):
## check the specified ebird original data directory for files.
(fns.ebird <- id_ebird_files(
dir.ebird.in = dirs$dir.ebird.in,
dir.ebird.out = dirs$dir.ebird.out,
mmyyyy = mmyyyy,
species = species.abbr,
states.ind = states
))
stopifnot(length(fns.ebird) > 1)
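id_ebird_files() locates the relevant EBD files from the release date (mmyyyy), species abbreviation, and states. The sketch below shows one way such matching could work; pick_ebird_files() is hypothetical, and the file-name layout it expects ("ebd_<STATE>_<species>_rel<Mon-YYYY>.txt" plus an "ebd_sampling_rel<Mon-YYYY>.txt" sampling-event file) is an assumption, so check it against your own downloads. Matching is case-insensitive because mmyyyy = "dec-2021" is lower case while release tags are typically capitalised.

```r
# Hypothetical sketch of selecting EBD files by release, species and state.
# Assumed file-name layout: "ebd_<STATE>_<species>_rel<Mon-YYYY>.txt" and
# "ebd_sampling_rel<Mon-YYYY>.txt". Not the id_ebird_files() implementation.
pick_ebird_files <- function(fns, mmyyyy, species.abbr, states) {
  fns <- basename(fns)
  # release tag, matched case-insensitively ("dec-2021" vs "Dec-2021")
  right.release <- grepl(paste0("rel", mmyyyy), fns, ignore.case = TRUE)
  is.sampling   <- grepl("sampling", fns, ignore.case = TRUE)
  has.species   <- grepl(species.abbr, fns, ignore.case = TRUE)
  has.state     <- grepl(paste(states, collapse = "|"), fns, ignore.case = TRUE)
  fns[right.release & (is.sampling | (has.species & has.state))]
}

fns <- c("ebd_US-FL_houspa_relDec-2021.txt", "ebd_US-GA_houspa_relDec-2021.txt",
         "ebd_US-TX_houspa_relDec-2021.txt", "ebd_US-FL_houspa_relNov-2021.txt",
         "ebd_sampling_relDec-2021.txt")
pick_ebird_files(fns, "dec-2021", "houspa", c("us-fl", "us-ga"))
```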
# Import and munge the desired files
ebird <- munge_ebird_data(
fns.ebird = fns.ebird,
species = c(species, species.abbr),
dir.ebird.out = dirs$dir.ebird.out,
countries = countries,
states = states,
# overwrite = FALSE, ## this function checks for existing munged files in dir.ebird.out
years = year.range
)
# Create spatial ebird
ebird_spatial <- make_ebird_spatial(
df = ebird,
crs.target = crs.target,
grid = if(is.list(study_area)) study_area[[1]] else study_area, # use the grid layer if make_spatial_grid() returned a list
overwrite = FALSE, # this function checks for an existing spatial eBird file in dir.spatial.out
dir.out = dirs$dir.spatial.out
)
## visualizing the ebird_spatial data takes a while and is not recommended.
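Conceptually, building ebird_spatial means attaching each checklist location to a cell of the sampling grid. The package works with (hexagonal) spatial geometries; the sketch below swaps in a much simpler square grid in plain longitude/latitude so that the indexing logic is visible. assign_square_cell() and its arguments are hypothetical.

```r
# Hypothetical sketch: assign lon/lat points to square grid cells of size
# grid.size degrees, with the grid anchored at (xmin, ymin).
assign_square_cell <- function(lon, lat, grid.size, xmin, ymin) {
  col <- floor((lon - xmin) / grid.size)  # 0-based column index
  row <- floor((lat - ymin) / grid.size)  # 0-based row index
  paste0("r", row, "_c", col)             # cell label
}

# Three toy checklist locations
pts <- data.frame(lon = c(-84.3, -84.9, -81.4),
                  lat = c(30.4, 30.1, 28.5))
pts$cell <- assign_square_cell(pts$lon, pts$lat, grid.size = 1,
                               xmin = -88, ymin = 24)
table(pts$cell)  # checklists per cell
```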
Create a list of lists and indexes for use in JAGS or elsewhere. We suggest creating a list using make_bundle and subsequently grabbing useful data from there. make_bundle creates site-level covariates in both long (vector) and wide (matrix) form. Matrix-form covariates are housed inside the Xsite matrix, whereas long-form covariates are within bbs.df and ebird.df.
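The relationship between long-form covariates (one row per site and year) and the wide matrix form (sites as rows, years as columns) can be illustrated with a toy example in base R. The column names site, year and value, and the helper long_to_wide(), are placeholders and not the bundle's actual names.

```r
# Toy long-form data: one row per site-year combination (placeholder names)
long <- data.frame(site  = c("A", "A", "B", "B", "C"),
                   year  = c(2008, 2009, 2008, 2009, 2009),
                   value = c(0.2, 0.4, 1.1, 0.9, 0.5))

# Long -> wide: sites as rows, years as columns; missing combinations stay NA
long_to_wide <- function(df) {
  sites <- sort(unique(df$site))
  years <- sort(unique(df$year))
  m <- matrix(NA_real_, nrow = length(sites), ncol = length(years),
              dimnames = list(sites, years))
  # two-column index matrix: (row, column) position of each long-form record
  m[cbind(match(df$site, sites), match(df$year, years))] <- df$value
  m
}

wide <- long_to_wide(long)
wide
```

JAGS models typically index such a matrix as X[site, year], whereas the long form pairs each value with explicit site and year index vectors.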
message("[note] sometimes when running this chunk in a notebook/R Markdown it crashes. Try restarting the session or running it interactively.\n")
### make a teeny little bundle for model dev/debugging
bundle.dev <- make_bundle(
bbs = bbs_spatial,
ebird = ebird_spatial,
grid = study_area,
dev.mode = TRUE
)
## recommend saving to file in case you have crashes due to memory or modeling
saveRDS(bundle.dev, paste0(dirs$dir.proj,"/dev-bundle.rds"))
### make full sized bundle
bundle <- make_bundle(
# data
bbs = bbs_spatial,
ebird = ebird_spatial,
grid = study_area,
# optional args
dev.mode = FALSE
)
saveRDS(bundle, paste0(dirs$dir.proj,"/bundle.rds"))
# ### or read in from file...
# (bundle.fns <- list.files(paste0(dirs$dir.proj), pattern="bundle.rds", full.names = TRUE))
# bundle <- readRDS(bundle.fns[1])