The purpose of this R package (likely to undergo a name change…) is to:

  1. provide a (currently) faster alternative to the R package auk for importing and munging the large eBird datasets*.
  2. integrate the BBS and eBird observation datasets for use in hierarchical modeling efforts by assigning point-referenced data to a gridded surface.
  3. provide a modeling workflow for analysing spatio-temporal dynamics in NIMBLE or JAGS.

*@cboettig and @amstrimas are currently developing an auk alternative, birddb. Once birddb is stable, this R package will likely use it as a dependency for eBird import and manipulation. For now, however, the functions herein provide a much faster alternative to auk.

Installation

Install the development version from GitHub with:

# install.packages("remotes")
# remotes::install_github("trashbirdecology/bbsassistant") # to be safe
remotes::install_github("trashbirdecology/bbsebird")

eBird Data Requirements

Before using this package, you must download eBird data. To request and download eBird observations, visit the eBird website. Credentials are required, and approval may take up to a few business days, depending on the use case. For more information on the eBird data, see the eBird website or visit the repository for auk, the Cornell Lab of Ornithology's official R package for munging eBird data.

Once your account is approved, you will gain access to the eBird Basic Dataset (EBD). This package requires two components of the EBD to be saved locally:

  1. the observations (i.e. counts)
  2. the sampling events (i.e. information about the observation process)
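
Before going further, it can help to confirm that both components are actually on disk. The following is a minimal sketch in base R and is not part of bbsebird. The helper name check_ebd_files, the placeholder file names, and the assumption that sampling-event files contain "sampling" in their name are all illustrative and should be adapted to your own downloads.

```r
# Hypothetical helper (not part of bbsebird): checks that a directory holds
# at least one observations file and at least one sampling-events file.
# Assumption: sampling-event files contain "sampling" in their file name.
check_ebd_files <- function(dir) {
  fns <- list.files(dir, pattern = "^ebd_")
  is_sampling <- grepl("sampling", fns, fixed = TRUE)
  list(
    observations = fns[!is_sampling],
    sampling     = fns[is_sampling],
    complete     = any(is_sampling) && any(!is_sampling)
  )
}

# Demonstration with empty placeholder files in a temporary directory
demo_dir <- file.path(tempdir(), "ebd-demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("ebd_US-FL_houspa_relDec-2021.txt",
                                  "ebd_sampling_relDec-2021.txt")))
res <- check_ebd_files(demo_dir)
res$complete
```

If res$complete is FALSE, one of the two components is missing from the directory.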

DATA MUNGING

Step 1: Setup

#explicitly load some packages
pkgs <- c("bbsebird")
# install.packages("mapview") # optional: you can use this package to get interactive map views
invisible(lapply(pkgs, library, character.only = TRUE))
rm(pkgs)

If you are working from this README, this is the only R Markdown chunk you should have to edit. The most important arguments are where the eBird data and BBS shapefiles are stored (dir.orig.data) and where you wish to save the resulting data and models (dir.proj). The latter need not exist; if it does not, the package will create the directory for you.

# REQUIRED ARGUMENTS
dir.orig.data  = "C:/Users/jburnett/OneDrive - DOI/bbsebird-testing/" # this will be improved to be more intuitive re: what data? 
dir.proj       = "C:/users/jburnett/OneDrive - DOI/bbsebird-testing/House_Sparrow/"
species             = c("House Sparrow") ## eventually need to add a lookup table to ensure species.abbr and species align.
species.abbr        = c("houspa") # see ebird filename for abbreviation
### this needs improvement as well, e.g. a species lookup table to link common name, species, and abbreviation across BBS and eBird data
##bbs arguments
usgs.layer          = "US_BBS_Route-Paths-Snapshot_Taken-Feb-2020" # name of the USGS BBS route shapefile to use
cws.layer           = "ALL_ROUTES" # name of the Canadian (CWS) BBS route shapefile.
##ebird arguments
mmyyyy              = "dec-2021" # the month and year of the eBird data downloads on file

# Strongly suggested but optional args
##general arguments
# dir.proj  = "C:/Users/jburnett/desktop/testing/"


### see bbsAssistant::region_codes
states              = c("us-fl", "us-ga", "us-al")
countries           = c("US") ## vector of countries; call bbsebird::iso.codes to find relevant codes for countries and states/provinces/territories.
# species             = c("Double-crested Cormorant", "Nannopterum auritum", "phalacrocorax auritum")
# species.abbr        = c("doccor","dcco", "docco")

year.range          = 2008:2019
base.julian.date    = lubridate::ymd(paste0(min(year.range), c("-01-01"))) # used as base date for Julian dates.
crs.target          = 4326 #target CRS for all created spatial layers
##grid arguments
grid.size           = 1.00 # size in decimal degrees (for US/CAN a good est is 1.00dec deg == 111.11km)

##ebird arguments
min.yday            = 91
max.yday            = 245

## Munge the states and countries indexes for use in directory/project directory creation
if(!exists("states")) states <- NULL
if(!is.null(states)){regions <- states}else{regions <- countries}
stopifnot(all(tolower(states) %in% tolower(bbsAssistant::region_codes$iso_3166_2)))
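
The min.yday and max.yday arguments above are days of the year (1 = 1 January). As a quick check of which calendar window they cover, the sketch below converts them to dates with base R. The helper name yday_to_date is hypothetical and not part of bbsebird.

```r
# Hypothetical helper: convert a day-of-year to a calendar date in a given year
yday_to_date <- function(yday, year) {
  as.Date(yday - 1, origin = paste0(year, "-01-01"))
}

min.yday <- 91
max.yday <- 245

# Non-leap year: the window runs from 1 April to 2 September
format(yday_to_date(c(min.yday, max.yday), 2019))

# Leap year: the same day numbers fall one calendar day earlier
format(yday_to_date(c(min.yday, max.yday), 2008))
```

Because the day numbers are fixed, the calendar dates shift by one day in leap years within year.range.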

This chunk creates new objects holding the project and data directory paths, based on the project directory and data directory specified above.

# set_proj_shorthand: this will make all directories within a new dir in dir.proj. this is useful for iterating over species/time/space and saving all resulting information in those directories.
subdir.proj <-  set_proj_shorthand(species.abbr, regions, grid.size, year.range)
dirs        <-  dir_spec(dir.orig.data = dir.orig.data,  
                         dir.proj = dir.proj,
                         subdir.proj = subdir.proj) # create and/or specify 
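
The exact naming scheme produced by set_proj_shorthand is not shown here. Purely as an illustration of the idea, the sketch below builds a per-run subdirectory name from the same inputs and creates the directory if it is missing. The function make_run_dir and its name format are assumptions, not the package's behaviour.

```r
# Hypothetical stand-in (not bbsebird's implementation): build one
# subdirectory per species/region/grid-size/year-range combination.
make_run_dir <- function(dir.proj, species.abbr, regions, grid.size, year.range) {
  shorthand <- paste(
    paste(species.abbr, collapse = "-"),
    paste(regions, collapse = "-"),
    paste0(grid.size, "dd"),                   # grid size in decimal degrees
    paste(range(year.range), collapse = "-"),  # first and last year
    sep = "_"
  )
  out <- file.path(dir.proj, shorthand)
  # recursive = TRUE also creates dir.proj itself if it does not exist yet
  dir.create(out, recursive = TRUE, showWarnings = FALSE)
  out
}

run_dir <- make_run_dir(tempdir(), "houspa", c("us-fl", "us-ga", "us-al"), 1, 2008:2019)
basename(run_dir)
```

Keeping each combination of inputs in its own directory makes it easier to iterate over species, regions, or years without overwriting earlier results.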

Step 2: Make Integrated Data

Create a spatial sampling grid

The following chunk creates a spatial sampling grid of size grid.size with units defaulting to the units of crs.target.

study_area <- make_spatial_grid(
                          dir.out = dirs[['dir.spatial.out']],
                          states = states,
                          countries = countries,
                          crs.target = crs.target,
                          grid.size = grid.size, 
                          hexagonal = TRUE,
                          overwrite = TRUE
                          )
if(is.list(study_area)){ 
  grid          <- study_area$grid
  overlay       <- study_area$grid.overlay
}
plot(grid)
plot(overlay)
# do not remove study_area here: later chunks (make_bbs_spatial, make_ebird_spatial, make_bundle) use it
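
Because crs.target = 4326 is a geographic CRS, grid.size is in decimal degrees, and the ground distance covered by one degree of longitude shrinks toward the poles. The sketch below uses a common spherical approximation (about 111.32 km per degree at the equator, scaled by the cosine of latitude) to show how cell width varies. The helper name is hypothetical and the values are approximations only.

```r
# Hypothetical helper: approximate east-west width (km) of a grid cell of
# `grid.size` decimal degrees at latitude `lat` (degrees), assuming a sphere.
deg_lon_to_km <- function(grid.size, lat) {
  grid.size * 111.32 * cos(lat * pi / 180)
}

lats <- c(25, 35, 45, 55)
data.frame(lat = lats, width_km = round(deg_lon_to_km(1, lats), 1))
```

The north-south height of a cell stays close to 111 km per degree, so cells defined in degrees cover less ground area at higher latitudes.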

Create the BBS data. This chunk relies on the R package bbsAssistant. The resulting data are aligned with the spatial grid (see above).

## if the files already exist, don't overwrite unless you've made changes to data specs
if("bbs_obs.rds" %in% list.files(dirs$dir.bbs.out)){
  bbs_obs <- readRDS(list.files(dirs$dir.bbs.out, "bbs_obs.rds", full.names=TRUE))
}else{
bbs_orig <- bbsAssistant::grab_bbs_data(bbs_dir = dirs$dir.bbs.out) 
bbs_obs  <- bbsAssistant::munge_bbs_data(
    bbs_list = bbs_orig,
    states   = states,
    species = species, 
    year.range = year.range)
bbs_obs <- bbsebird:::match_col_names(bbs_obs) # munge column names to mesh with eBird
saveRDS(bbs_obs, paste0(dirs$dir.bbs.out, "/bbs_obs.rds")) # suggest saving data to file for easy access
}
# Overlay BBS and study area / sampling grid
### note, sometimes when running this in a notebook/rmd a random ".rdf" path error occurs.
#### The cause of this bug is unknown. Just try running it again. See also https://github.com/rstudio/rstudio/issues/6260
if("bbs_spatial.rds" %in% list.files(dirs$dir.bbs.out)){
  bbs_spatial <- readRDS(list.files(dirs$dir.bbs.out, "bbs_spatial.rds", full.names=TRUE))
}else{
bbs_spatial <- make_bbs_spatial(
  df = bbs_obs,
  overwrite=TRUE, 
  cws.routes.dir = dirs$cws.routes.dir,
  usgs.routes.dir = dirs$usgs.routes.dir,
  plot.dir = dirs$dir.plots,
  crs.target = crs.target,
  grid = study_area,
  dir.out = dirs$dir.spatial.out
)
saveRDS(bbs_spatial, paste0(dirs$dir.bbs.out, "/bbs_spatial.rds"))
}
## check out the bbs spatial data to ensure things look ok
# plot(bbs_spatial['area']) # cell area

Munge the eBird data (must be saved to file):

## check the specified ebird original data directory for files. 
(fns.ebird    <- id_ebird_files(
  dir.ebird.in = dirs$dir.ebird.in,
  dir.ebird.out = dirs$dir.ebird.out,
  mmyyyy = mmyyyy,
  species = species.abbr,
  states.ind = states
))
stopifnot(length(fns.ebird) > 1)

# Import and munge the desired files
ebird <- munge_ebird_data(
  fns.ebird = fns.ebird,
  species = c(species, species.abbr),
  dir.ebird.out = dirs$dir.ebird.out,
  countries = countries,
  states = states,
  # overwrite = FALSE, ## this function checks for existing, munged files in dir.ebird.out
  years = year.range
)

# Create spatial ebird
ebird_spatial <- make_ebird_spatial(
  df = ebird,
  crs.target = crs.target,
  grid = if (is.list(study_area)) study_area[[1]] else study_area, # if/else instead of ifelse(), which would return only the first element of the grid
  overwrite = FALSE, # this fun checks for existing spatial ebird file in dir.spatial.out
  dir.out = dirs$dir.spatial.out 
)
## visualizing the ebird_spatial data takes a while, do not recommend.
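
The core idea behind the spatial steps, assigning point-referenced observations to grid cells, can be illustrated without any spatial packages. The sketch below bins made-up longitude/latitude points into square cells of grid.size degrees using floor() and then sums counts per cell. It is a simplified stand-in, not how make_ebird_spatial works, and all names, coordinates, and counts are hypothetical.

```r
# Hypothetical helper: label each point with the row/column of the square
# grid cell it falls in, given the grid origin (xmin, ymin) and cell size.
assign_to_grid <- function(lon, lat, grid.size, xmin, ymin) {
  col <- floor((lon - xmin) / grid.size) + 1
  row <- floor((lat - ymin) / grid.size) + 1
  paste0("r", row, "c", col)
}

# Made-up point observations
pts <- data.frame(
  lon   = c(-84.3, -84.9, -81.2, -84.1),
  lat   = c(30.4, 30.1, 28.5, 30.7),
  count = c(3, 0, 5, 2)
)
pts$cell <- assign_to_grid(pts$lon, pts$lat, grid.size = 1, xmin = -88, ymin = 24)

# Aggregate the point counts to the grid cells
aggregate(count ~ cell, data = pts, FUN = sum)
```

Three of the four points share a cell here, which is the situation the gridded surface is designed to handle: many point observations summarised per cell.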

Step 3: Bundle Data for Use in BUGS

Create a list of lists and indexes for use in JAGS or elsewhere. We suggest creating the list with make_bundle and then extracting the pieces you need from it.

make_bundle creates site-level covariates in both long (vector) and wide (matrix) form. The wide-form covariates are stored in the Xsite matrix, whereas the long-form covariates are stored in bbs.df and ebird.df.
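
To make the long versus wide distinction concrete, the sketch below reshapes a toy long-form data frame (one row per site and year) into a site-by-year matrix with base R. The data and column names are made up and do not reflect the actual contents of the bundle.

```r
# Toy long-form data: one row per site-year combination
long <- data.frame(
  site  = c("a", "a", "b", "b", "c"),
  year  = c(2008, 2009, 2008, 2009, 2009),
  count = c(4, 2, 0, 1, 7)
)

# Wide form: rows are sites, columns are years; combinations that were
# never observed (site "c" in 2008) are filled with NA
wide <- tapply(long$count, list(site = long$site, year = long$year), sum)
wide
```

The wide matrix is convenient for indexing as [site, year] inside a BUGS-style model, while the long form keeps one row per observation.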

message("[note] sometimes when running this chunk in notebook/rmarkdown it crashes. try restarting session or running interactively\n")

### make a teeny little bundle for model dev/debugging
bundle.dev <- make_bundle(
  bbs = bbs_spatial,
  ebird = ebird_spatial,
  grid = study_area,
  dev.mode = TRUE
)
## recommend saving to file in case you have crashes due to memory or modeling
saveRDS(bundle.dev, paste0(dirs$dir.proj,"/dev-bundle.rds"))

### make full sized bundle
bundle <- make_bundle(
  # data
  bbs = bbs_spatial,
  ebird = ebird_spatial,
  grid = study_area,
  # optional args
  dev.mode = FALSE
)
saveRDS(bundle, paste0(dirs$dir.proj,"/bundle.rds"))

# ### or read in from file...
# (bundle.fns <- list.files(paste0(dirs$dir.proj), pattern="bundle.rds", full.names = TRUE))
# bundle <- readRDS(bundle.fns[1])
