library(tidyverse)
library(knitr)Exploratory Data Analysis of Bat Capture Data
Rationale
All species of bats in Ontario are listed as endangered and while collecting data from them is important, capture methods can be invasive and stressful for the bats (Edwards et al., 2022). For this reason, it’s important to optimize our nights of capture to ensure that we are collecting the most data we can while disturbing the least amount of bat roosts.
I’m using previously collected bat capture data collected from the Davy Lab from the last few years to discover any noticeable trends. Trends such as number of species, number of individuals, or change over time in physical measurements taken at the different sites of capture. By analyzing capture trends in previously collected data, we can find out where to focus our resources and limit unnecessary stress to already endangered species.
Study Questions
In which year were the most bats captured? The least?
Where were the most bats captured?
Which species were most frequently caught at each site?
Has mass or forearm length changed over the time?
Libraries
Data Import
First I import the Davy Lab’s collective capture data, selecting only the columns necessary to perform my data. Each row of the data corresponds to a captured individual and includes the year of capture, location of capture, species, sex, age, forearm length (mm), and mass (g) of each bat.
dat <- read_csv("BatCaptureData.csv") %>%
select('Year', 'General Location', 'Species', 'Sex', 'Age', 'Forearm_mm', 'Mass_g')Here we have a dataframe containing 5063 observations with 7 columns of values that need to be tidied before analyzing.
Data Cleaning
Here I spend some time manipulating the data to make it easier to use, easier to read, and to ensure all of the values are consistent and in the correct format.
Renaming Columns
dat <- dat %>%
rename(
year = Year,
location = `General Location`,
species = Species,
sex = Sex,
age = Age,
forearm = Forearm_mm,
mass = Mass_g
)Cleaning: year
Expected: Numerical years between 2017 and 2021
sort(unique(dat$year))[1] 2017 2018 2019 2020 2021
class(dat$year)[1] "numeric"
Already properly formatted!
Cleaning: location
Expected: No inadvertent duplicates and consistent naming scheme
sort(unique(dat$location)) [1] "Amherst Island" "Belgrave" "Brighton" "Burlington"
[5] "Cardiff" "Cordova Mines" "Eganville" "Hillsburgh"
[9] "Kinmount" "Kirkland Lake" "Marston" "Minden"
[13] "Mount St Patrick" "Penatanguishene" "Peterborough" "Port Elgin"
[17] "Red Bay" "Roslin" "Tory Hill" "Wainfleet"
[21] "Wilberforce" "Wolfe Island"
Already properly formatted. I’ll be keeping NAs in the dataset for analyses where they won’t interfere.
Cleaning: species
Expected: No inadvertent duplicates, valid species, consistent naming scheme
unique(dat$species) [1] "EPFU" "LABO" "MYLU" "PESU" "MYSE" "PESU?" "MYLE" NA "MYLE?"
[10] "?"
dat <- dat %>%
mutate(species = case_when(
species == 'EPFU' ~ 'Big brown bat',
species == 'LABO' ~ 'Eastern red bat',
species == 'MYLE' ~ 'Eastern small-footed bat',
species == 'MYSE' ~ 'Northern long-eared bat',
species == 'PESU' ~ 'Tricoloured bat',
species == 'MYLU' ~ 'Little brown bat',
TRUE ~ NA_character_
))
unique(dat$species)[1] "Big brown bat" "Eastern red bat"
[3] "Little brown bat" "Tricoloured bat"
[5] "Northern long-eared bat" NA
[7] "Eastern small-footed bat"
Issues: Naming scheme was very jargon-heavy and there were values with some uncertainty when the observers weren’t certain in their species ID.
Fix: Set uncertain answers as NAs and replaced jargony abbreviations with the common names of the bat species.
- EPFU: Big brown bat (Eptesicus fuscus)
- LABO: Eastern red bat (Lasiurus borealis)
- MYLE: Eastern small-footed bat (Myotis leibii)
- MYSE: Northern long-eared bat (Myotis septentrionalis)
- PESU: Tricoloured bat (Perimyotis sublavus)
- MYLU: Little brown bat (Myotis lucifugus)
While we’re here, lets check how many of each species we have in our data set!
| species | Count |
|---|---|
| Little brown bat | 4243 |
| Big brown bat | 458 |
| Eastern small-footed bat | 170 |
| Tricoloured bat | 144 |
| NA | 32 |
| Northern long-eared bat | 14 |
| Eastern red bat | 2 |
Cleaning: sex
Expected: Only values should correspond with either male or female, consistent naming scheme
unique(dat$sex)[1] "M" "F" NA "?" "unknown" "A"
dat <- dat %>%
mutate(sex = case_when(
sex == 'M' ~ 'M',
sex == 'F' ~ 'F',
TRUE ~ NA
))
unique(dat$sex)[1] "M" "F" NA
Issues: Some uncertain values, ‘A’ as a value makes no sense
Fix: Set uncertain and incorrect values as NAs and keep M’s and F’s unchanged.
Cleaning: age
Expected: Only values should correspond with either adult or juvenile, consistent naming scheme
unique(dat$age) [1] "J" "A"
[3] "J?" "A?"
[5] NA "?"
[7] "unknown" "a"
[9] "J (was called A before)" "na"
[11] "F"
dat <- dat %>%
mutate(age = case_when(
age == 'a' | age == 'A' ~ 'A',
age == 'j' | age == 'J' | age == 'J (was called A before)' ~ 'J',
TRUE ~ NA
))
unique(dat$age)[1] "J" "A" NA
Issues: Lots of uncertainty and variation in the naming scheme, unnecessary text in some values, one value of ‘F’ that makes no sense in the column.
Fix: Set uncertain values to NA and change naming scheme of certain values to be consistent as either ‘J’ or ‘A’.
Cleaning: forearm
Expected: Numerical values that follow a consistent scheme
dat <- dat %>%
mutate(forearm = case_when(
forearm == 'na' | forearm == '?' | forearm == 'unknown' ~ NA,
forearm == '38..5' ~ '38.5',
TRUE ~ forearm
))
dat$forearm <- as.numeric(dat$forearm)Issues: Few uncertain values and one that contained 2 decimal points
Fix: Set uncertain values as NAs and fix the format of the one mistake
Now I want to see if the values are biologically valid
First I plot the distributions of forearm measurements by species and look for any values that are completely unreasonable using field identification guides provided by the Davy Lab:
na.omit(dat) %>%
ggplot(aes(x = forearm)) + geom_histogram() +
facet_wrap(~species, scales = 'free')Bats with values outside reasonable range:
Little brown bat: Should be <=40mm
Eastern small-footed bat:: Should be <=33mm
dat <- dat %>%
filter(!(species == "Eastern small-footed bat" & forearm >= 34),
!(species == "Little brown bat" & forearm > 40))
na.omit(dat) %>%
ggplot(aes(x = forearm)) + geom_histogram() +
facet_wrap(~species, scales = 'free')All values look reasonable now!
Cleaning: mass
Expected: Numerical values with a consistent scheme
dat <- dat %>%
mutate(mass = case_when(
mass == 'unknown' |
mass == '~19.01*' |
mass == '13.23 with transmitter' ~ NA,
TRUE ~ mass
))
dat$mass <- as.numeric(dat$mass)Issues: Some uncertain values
Fix: Changed uncertain values to NAs and set the column as numeric
Again, I want to check for unreasonable values
Mass can be variable so I’ll be more lenient with expected values, but I will be checking for any outrageous numbers.
na.omit(dat) %>%
ggplot(aes(x = mass)) + geom_histogram() +
facet_wrap(~species, scales = 'free')dat <- dat %>%
filter(!(species == 'Eastern small-footed bat' & mass >= 7),
!(species == 'Big brown bat' & mass >= 28),
!(mass > 100))Removed values: No eastern small-footed bat would be measured above 7g in weight, no big brown bat would be larger than 28g and finally there were 3 values over 500g which are an obvious error.
na.omit(dat) %>%
ggplot(aes(x = mass)) + geom_histogram() +
facet_wrap(~species, scales = 'free')Values are all reasonable now!
Final Cleaning
Finally, I want to get rid of any captures where all of the columns have NA values because they will be completely useless in my analyses!
dat <- dat %>%
filter(!(is.na(forearm) & is.na(mass) & is.na(sex) & is.na(age)))All in all I have reorganized the dataframe from 5063 observations to a clean and consistent 4550 observations ready to be plotted and analyzed.
Analysis
Question 1: In which year were the most bats captured? The Least?
plot_1 <- dat %>%
count(year) %>%
mutate(year = fct_reorder(factor(year), n)) %>%
ggplot(aes(x = year, y = n, fill = year)) +
geom_col() +
geom_text(aes(label = paste0("n = ", n)), vjust = -0.5) +
theme_minimal() +
theme(panel.grid = element_blank(), legend.position = 'none') +
labs(x = 'Sample Year', y = 'Number of Bats Captured')
plot_1Finding: 2018 had the highest number of bats captured (n =1677)
Finding: 2020 had the lowest number of bats captured (n = 128)
This steep dropoff in capture observations in 2020 was likely do the difficulties with COVID!
Question 2: Where were the most bats captured?
plot_2 <- dat %>%
filter(!is.na(location)) %>%
count(location) %>%
mutate(location = fct_reorder(location, n, .desc = TRUE)) %>%
ggplot(aes(y = location, x = n, fill = location)) +
geom_col() +
theme_minimal() +
theme(panel.grid = element_blank(), legend.position = 'none') +
labs(x = 'Number of Bats Captured', y = 'Sample Location')
plot_2Finding: Mount St Patrick easily has the highest number of bat captures, completely dwarfing the counts of all other locations.
This may be due to more rigorous capture efforts at this location, but it may also be a factor of extremely high populations in this area! I.E., this is a great place to focus research efforts.
Question 3: Which species were most frequently caught at each site?
common_sp <- dat %>%
group_by(location, species) %>%
summarise(count = n(), .groups = "drop") %>%
arrange(location, desc(count)) %>%
group_by(location) %>%
slice_max(order_by = count, n = 1)
plot_3 <- common_sp %>%
filter(!is.na(location)) %>%
ggplot(aes(x = species, y = fct_reorder(location, species), color = species)) +
geom_point(size = 5) +
labs(title = "Most Common Species per Location",
x = "Species",
y = "Location") +
theme_minimal() +
theme(panel.grid = element_blank(),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "none")
plot_3Finding: Most sites found little brown bats to be the most common (n = 15), some sites found big brown bats to be most common (n = 6), Burlington sampling found eastern small-footed bats as common as little brown bats.
In species-specific studies where you are aiming to capture a relatively difficult to eastern small-footed bats, focusing your trapping efforts at the Burlington site would likely be a great use of resources!
4a. Has forearm length changed over time?
Note: I excluded eastern red bats from the analysis as we only have 2 observations in the dataset
plot_5 <- dat %>%
filter(!is.na(forearm), !is.na(species), !is.na(year), !(species == 'Eastern red bat')) %>%
ggplot(aes(x = year, y = forearm)) +
geom_point(alpha = 0.4, color = "blue") +
geom_smooth(method = "lm", se = FALSE, color = "red") +
facet_wrap(~ species, scales = "free_y") +
labs(title = "Forearm Size Across Years",
x = "Year",
y = "Forearm Length (mm)") +
theme_minimal() +
theme(panel.grid = element_blank())
plot_5Finding: Looks as though tricoloured bats may be having a reduction in their forearm length in recent years.
Finding: Northern long-eared bats look to be having an increase in forearm length, however small sample size is likely exaggerating the trend.
Finding: The other three species look to be quite consistent in their measurements across the sampling years.
4b. Has mass changed over time?
Note: I excluded eastern red bats from the analysis as we only have 2 observations in the dataset
plot_6 <- dat %>%
filter(!is.na(mass), !is.na(species), !is.na(year), !(species == 'Eastern red bat')) %>%
ggplot(aes(x = year, y = mass)) +
geom_point(alpha = 0.4, color = "blue") +
geom_smooth(method = "lm", se = FALSE, color = "red") +
facet_wrap(~ species, scales = "free_y") +
labs(title = "Mass Across Years",
x = "Year",
y = "Mass (g)") +
theme_minimal() +
theme(panel.grid = element_blank())
plot_6Finding: All species apart from the eastern small-footed bats show a very consistent spread in their mass measurements each year.
Finding: Eastern small-footed bats look to be having an increase in their mean mass!
This could potentially have been caused by sampling late in the season as they come to the end of their foraging season and fatten up for the winter. But this could be used as rationale to perform future studies to see whether this is really the case!
I decided to explore whether there was any statistical basis in this visual finding.
First I made a subset of the data only containing eastern small-footed bats
dat_smallfooted <- dat %>%
filter(species == "Eastern small-footed bat", !is.na(mass), !is.na(year))
dat_smallfooted$year <- as.factor(dat_smallfooted$year)Then I performed a one-way anova to determine whether mass differed significantly across years. This ANOVA took a subset from the total data of n = 154 eastern small-footed bats.
mass_anova <- aov(mass ~ year, data = dat_smallfooted)
summary(mass_anova) Df Sum Sq Mean Sq F value Pr(>F)
year 4 6.825 1.7063 9.06 1.41e-06 ***
Residuals 149 28.060 0.1883
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The anova returned a p-value associated with ‘year’ of 1.41e-06, much smaller than the alpha value of 0.05, indicating significance.
I then performed a TukeyHSD to determine which years were significantly difference from each other.
TukeyHSD(mass_anova)There were 4 pairs of years that showed significantly different mass distributions:
2021-2019: p = ~0.01
2018-2017: p = ~0.002
2020-2018: p = ~0.005
2021-2018: p = ~0.0004
The most interesting of these results to me was the longest term pair from 2021-2018 being highly significant, indicating that there was quite the jump mass between these years.
In the following graph you can see a visualization of this jump in mass.
ggplot(dat_smallfooted, aes(x = year, y = mass)) +
geom_boxplot() +
geom_jitter(width = 0.2, alpha = 0.4) +
labs(title = "Eastern Small-footed Bat Mass Over Time",
x = "Year",
y = "Mass (g)") +
theme_minimal() +
theme(panel.grid = element_blank())Conclusions:
Using compiled bat capture data we are able to find which trapping sites are best to optimize our sampling efforts. We can also analyze our data to find trends within our data which can inform future exploratory studies. For example, if we wish to perform studies on the eastern small-footed bat and it’s apparent increase in mass, Burlington is likely the best site within our currently collected data.
Citations:
Phoebe D Edwards, Rudy Boonstra, Curtis O Bosson, N Jane Harms, Piia M Kukka, Craig K R Willis, Thomas S Jung, Effects of capture on stress-axis measures in endangered little brown bats (Myotis lucifugus), Journal of Mammalogy, Volume 103, Issue 1, February 2022, Pages 91–99, https://doi.org/10.1093/jmammal/gyab135
https://home.nps.gov/subjects/bats/studying-bats.htm (Figure 1)
https://www.ontario.ca/page/eastern-small-footed-myotis-recovery-strategy (Figure 2)