Anuran Abundance Trends in Georgian Bay Islands National Park - an Exploratory Data Analysis

Author

Christopher John Dennison

Published

April 14, 2024


Introduction

Amphibians are excellent indicators of ecosystem health and are in decline globally (Grant, Miller, & Muths, 2020). Loss of amphibian populations has facilitated the creation of regional monitoring programs which explore the spatial and temporal trends of amphibian abundance. For example, the Canadian Marsh Monitoring Program (MMP) has been collecting data on wetland bird and amphibian populations since 1995:

(https://www.birdscanada.org/bird-science/marsh-monitoring-program).

Similar to birds, frog and toads (Anurans) are relatively easy to detect due to their unique vocalizations during breeding. Call and visual surveys have contributed much to our understanding about species-specific trends across Canada. Outside of the MMP, federal departments and local conservation authorities also conduct regional surveys of Anurans to track declines.

For example, Anuran abundance data from Georgian Bay Islands (GBI) National Park was collected by Parks Canada staff from 2005 - 2017.

As per the official description of the dataset:

“Abundance and diversity of frogs and toads is a good indicator for assessing ecological integrity. The park visually counts adult frogs and toads in coastal wetlands after the breeding season. This method does not permit assessment of early breeding species and may overestimate frog abundance because these surveys coincide with mass emergence of newly developed frogs, which are subjected to very high mortality rates.” (Roy, 2017, p.1).

Long-term use of Anuran survey data requires that said data be maintained in a format conducive to analysis.

Here, I will use the GBI National Park Anuran Abundance Dataset to demonstrate the steps of an Exploratory Data Analysis (EDA) using the R coding language, which includes:

  1. Initial Exploration & Research Questions
  2. Data Cleaning
  3. Data Wrangling & Transformations
  4. Visualization & Analysis

I will also use the aforementioned dataset to answer some preliminary research questions, which will be established in the proceeding section.

Part I - Initial Exploration and Research Questions

Note: for this section, I will include our R code and output for illustrative purposes.

The dataset I will be using includes Anuran surveys conducted from 2005 - 2017 in GBI National Park. The structure of any dataset should be examined before transitioning into tidying and data hygiene:

# Note: ensure that the working directory is set to include the 'GBI_frogToad.csv dataset!

setwd('C:\\Users\\denni\\Desktop\\BIOL 5404W Final Project')

# Read-in dataset:

GBI_frogToad <- read.csv("GBI_frogToad.csv")

# We will remove the first row, which contains French column labels:

GBI_frogToad <- GBI_frogToad[-1, ]

# We'll use the str() and colnames() functions to identify the general structure of our dataset:

str(GBI_frogToad)
'data.frame':   13220 obs. of  7 variables:
 $ Date                 : chr  "19-09-2005" "19-09-2005" "19-09-2005" "19-09-2005" ...
 $ Year                 : chr  "2005" "2005" "2005" "2005" ...
 $ Location             : chr  "Treasure Bay" "Treasure Bay" "Treasure Bay" "Treasure Bay" ...
 $ Transect.ID          : chr  "1" "2" "3" "4" ...
 $ Species              : chr  "Bullfrog" "Bullfrog" "Bullfrog" "Bullfrog" ...
 $ Zone                 : chr  "Wet Meadow" "Wet Meadow" "Wet Meadow" "Wet Meadow" ...
 $ Number.of.Individuals: chr  "" "" "1" "" ...
colnames(GBI_frogToad)
[1] "Date"                  "Year"                  "Location"             
[4] "Transect.ID"           "Species"               "Zone"                 
[7] "Number.of.Individuals"

There are 7 columns in the dataset, each corresponding to information related to Anuran surveys conducted in GBI National Park. We will explore the structure and content of each column in the proceeding code, with the objective of identifying any errors to be addressed during the data hygiene section of our EDA:

# 1) Date

GBI_frogToad %>% 
  distinct(Date) %>% 
  nrow()
[1] 43
# 2) Year

range(GBI_frogToad$Year)
[1] "2005" "2017"
# 3) Location

GBI_locations <- GBI_frogToad$Location %>% 
  unique() %>% 
  as.vector()

length(GBI_locations)
[1] 16
GBI_locations
 [1] "Treasure Bay"     "Ojibway Bay"      "Hockeystick Bay"  "Church Bay"      
 [5] "Lily Pond Bay"    "North Bay"        "12 Miles Bay"     "Centennial Bay"  
 [9] "Turtle Bay"       "Christian Beach"  "Papoose Bay"      "beausoleil point"
[13] "treasure bay"     "Bone Dock"        "Island 226"       "Turtle bay "     
# 4) Transect.ID

GBI_frogToad %>% 
  distinct(Transect.ID)
   Transect.ID
1            1
2            2
3            3
4            4
5            5
6            6
7            7
8            8
9            9
10          10
# 5) Species

Anuran_species <- GBI_frogToad$Species %>% 
  unique() %>% 
  as.vector()

length(Anuran_species)
[1] 8
Anuran_species
[1] "Bullfrog"              "Northern Leopard Frog" "Green Frog"           
[4] "Mink Frog"             "Spring Peeper"         "Wood Frog"            
[7] "American Toad"         "Other Species"        
# 6) Zone

GBI_zones <- GBI_frogToad %>% 
  distinct(Zone)

GBI_zones
                  Zone
1           Wet Meadow
2 Land/Water Intercept
3        Shallow Water
4            Wet Medow
5 Land/water intercept
6        Shallow water
# 7) Number.of.Individuals

class(GBI_frogToad$Number.of.Individuals)
[1] "character"

From this preliminary exploration of the dataset, we have already uncovered several potential errors and areas in need of data hygiene:

i) dates within the dataset should be converted to Julian dates prior to analysis,

ii) closer inspection of the 16 survey locations (sites) reveals that two sites, ‘Treasure Bay’ and ‘Turtle Bay’, maintain duplicates owing to different spellings (i.e. ‘Turtle Bay’ and ‘Turtle bay’).

iii) Similar to the ‘Locations’ column, there are duplicates in the ’ Zone’ column:

  1. ‘Land/Water Intercept’ and ‘Land/water intercept’ (the second entry has ‘water’ and ‘intercept’ uncapitalized while the other does not)

  2. ‘Wet Meadow’ and ‘Wet Medow’ (the latter is a typo)

  3. ‘Shallow Water’ and ‘Shallow water’ (the latter has a lower-case ‘w’ for water).

Now that we understand the structure of our dataset, and have uncovered a number of errors, we can formulate a set of research questions to guide our EDA and analysis:

Research Questions

  1. Does Anuran Abundance change in GBI National Park over time?

  2. What are the site-specific and habitat-specific dynamics of Anuran abundance over time?

  3. Are any particular Anuran species declining in GBI National Park?

Part II - Data Hygiene & Cleaning

Note: for this section, I will include our R code and output for illustrative purposes.

In addition to addressing the errors uncovered in Part I, we must also ensure that our data is free of missing data and NAs which could hinder our analysis.

We can check for NA values in our columns:

GBI_frogToad %>% 
  is.na() %>% 
  colSums()
                 Date                  Year              Location 
                    0                     0                     0 
          Transect.ID               Species                  Zone 
                    0                     0                     0 
Number.of.Individuals 
                    0 

… as well as missing data in the form of blank cells:

sum(GBI_frogToad$Date == " ")
[1] 0
sum(GBI_frogToad$Year == " ")
[1] 0
sum(GBI_frogToad$Location == " ")
[1] 0
sum(GBI_frogToad$Transect.ID == " ")
[1] 0
sum(GBI_frogToad$Species == " ")
[1] 0
sum(GBI_frogToad$Zone == " ")
[1] 0
sum(GBI_frogToad$Number.of.Individuals == " ") 
[1] 0

Fortunately, there do not seem to be any missing data or NA values in our dataset.

Now we should address the errors we uncovered in Part I, namely the presence of duplicate entries for ‘Zone’ and ‘Location’.

Specifically, we will remove the duplicate ‘Locations’ and ‘Zones’ from our dataset by fixing the grammatical inconsistencies that caused these redundancies:

GBI_frogToad$Location <- gsub('Turtle bay', 'Turtle Bay', GBI_frogToad$Location)
GBI_frogToad$Location <- gsub('treasure bay', 'Treasure Bay', GBI_frogToad$Location)
GBI_frogToad$Zone<- gsub('Wet Medow', 'Wet Meadow', GBI_frogToad$Zone)
GBI_frogToad$Zone<- gsub('Land/water intercept', 'Land/Water Intercept', GBI_frogToad$Zone)
GBI_frogToad$Zone<- gsub('Shallow water', 'Shallow Water', GBI_frogToad$Zone)
GBI_frogToad$Location <- str_trim(GBI_frogToad$Location)
GBI_frogToad$Location<- gsub('beausoleil point', 'Beausoleil Point', GBI_frogToad$Location)

Remember, one of the most fundamental steps in an EDA is to ensure the dataset is free of errors or redundancies that might create spurious results or otherwise influence analysis of the data. Doing this early-on will save several headaches when analyzing the dataset, and will ensure that future users of the dataset do not encounter similar issues.

Having cleaned our dataset, we can now move onto data wrangling and transformations.

Part III - Data Wrangling and Transformations

Note: for this section, I will include our R code and output for illustrative purposes.

Data wrangling and transformation is another fundamental step in any EDA, its primary purpose being to adjust a dataset so that it is conducive to analysis and the answering of research questions.

Let’s take a look at our Anuran abundance dataframe as it stands:

glimpse(GBI_frogToad)
Rows: 13,220
Columns: 7
$ Date                  <chr> "19-09-2005", "19-09-2005", "19-09-2005", "19-09…
$ Year                  <chr> "2005", "2005", "2005", "2005", "2005", "2005", …
$ Location              <chr> "Treasure Bay", "Treasure Bay", "Treasure Bay", …
$ Transect.ID           <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "10…
$ Species               <chr> "Bullfrog", "Bullfrog", "Bullfrog", "Bullfrog", …
$ Zone                  <chr> "Wet Meadow", "Wet Meadow", "Wet Meadow", "Wet M…
$ Number.of.Individuals <chr> "", "", "1", "", "", "", "", "", "", "", "", "",…

There are a number of ways that we can make the data more suitable for analysis and the answering of our research questions.

To begin, let’s convert the ‘Number.of.Individuals’ column from a character variable into a continuous, numeric variable. We will also shorten some of our column names to make them more concise (but still descriptive). This will make our analysis more efficient. Lastly, we will convert the dates in our dataset to Julian dates instead of the current dd-mm-yyyy format:

# Convert 'Number.of.Individuals' to numeric

GBI_frogToad$Number.of.Individuals <- as.numeric(GBI_frogToad$Number.of.Individuals)

GBI_frogToad$Number.of.Individuals[is.na(GBI_frogToad$Number.of.Individuals)] <- 0

# Convert to Julian dates

GBI_frogToad$Date <- as.Date(GBI_frogToad$Date, format = "%d-%m-%y")
GBI_frogToad$Date <- yday(GBI_frogToad$Date)

# Make column names more concise

GBI_frogToad <- GBI_frogToad %>% 
  rename(year = Year,
         date = Date,
         site = Location,
         species = Species,
         zone = Zone,
         anuran_obs = Number.of.Individuals)

Now, we will use our dataset to create a ‘wide-format’ dataframe. Specifically, we want a dataframe in which each Anuran species has a dedicated column with the number of observations for that species, with rows corresponding to each year, site, habitat, and transect combination:

GBI_frogToad.W <- GBI_frogToad %>%
  pivot_wider(names_from = species, values_from = anuran_obs) %>% 
  group_by(site, date, zone)

We should also check to see if any sites are missing survey-years. This is especially important for our purposes, as our research questions relate to the site-specific, species-specific, and habitat-specific dynamics of Anuran abundance over time (i.e. temporal dynamics):

Survey_check <- GBI_frogToad.W %>% 
  group_by(site, year) %>% 
  tally()

view(Survey_check)

Unfortunately, many locations were not surveyed for Anurans during certain years. For example, ‘12 Miles Bay’ was only surveyed only 3 years (2013, 2014, and 2017). We will need to carefully consider this fact when exploring the temporal dynamics of Anuran abundance in the Park.

Lastly, we would like to further transform our wide-format dataframe to combine all Anuran observations from all transects at each location, giving us a “total count” for each species. We will do this for each site, habitat type, and for each year. We will also include a column that contains the total number of Anurans (all species) observed for each site, habitat type, and year:

GBI_frogToad.W <- GBI_frogToad.W %>% 
  group_by(site, date, year, zone) %>% 
  summarize(total_Bullfrog = sum(Bullfrog),
            total_GreenFrog = sum(`Green Frog`),
            total_WoodFrog = sum(`Wood Frog`),
            total_MinkFrog = sum(`Mink Frog`),
            total_AmericanToad = sum(`American Toad`),
            total_LeopardFrog = sum(`Northern Leopard Frog`),
            total_SpringPeeper = sum(`Spring Peeper`),
            total_Unknown = sum(`Other Species`)) %>% 
  mutate_all(~replace(., is.na(.), 0)) 

GBI_frogToad.W <- GBI_frogToad.W %>% 
  group_by(site, year, zone) %>% 
  mutate(total_Anuran = sum(total_Bullfrog, total_GreenFrog, total_WoodFrog,
                            total_MinkFrog, total_AmericanToad, total_LeopardFrog, total_SpringPeeper,
                            total_Unknown))

In addition to the ‘GBI_frogToad’ dataframe, the ‘GBI_frogToad.W’ object is a wide-transformed dataframe suitable to address our research questions. We will now proceed with these transformed, wrangled dataframes to conduct data visualization and analysis.

Part IV - Data Visualization & Analysis

Conclusions

Our EDA has uncovered some interesting trends in Anuran abundance in GBI National Park that could warrant further ecological studies. Namely, we discovered a potential temporal decline in four Anuran species in GBI National Park. Furthermore, we have found evidence that Anurans inhabiting a specific habitat type, the shoreline, may be impacted the most. This could be due to changes in shoreline composition from human-development, which, Canada-wide, is a major driver of loss in aquatic species which rely on riparian ecozones for reproduction and habitat.

Returning to our research questions:

Question 1: Does Anuran Abundance change in GBI National Park over time?

Answer: For certain species, there appears to have been a decline in abundance at sites with >3 survey-years since surveys began in 2005.

Question 2: What are the site-specific and habitat type-specific dynamics of Anuran abundance over time?

Answer: Shoreline (Land / Water Intercept) dependent species seem to have declined the most. In relation to our third research question, this would include Bullfrogs, Mink Frogs, and Green Frogs (and, to a lesser extent, Wood Frogs, which are privy to temporary ponds during breeding and forest habitats post-breeding). This habitat-specific trend was generalizable to sites with >3 survey-years.

Question 3: Are any particular Anuran species declining in GBI National Park?

Answer: Bullfrog, Green Frog, Mink Frog, and Wood Frog all seem to have experienced declines since 2005 in the park. Especially concerning is the fact that Mink Frog and Wood Frog have not been observed at several sites since the mid-late 2000s.

Future studies in GBI National Park should prioritize shoreline (riparian) habitats, and investigate if alterations to these habitats, whether from anthropogenic sources or otherwise, may be impacting Anuran species.

References

Grant, E.H.C, Miller, D.A.W., & Muths, E. (2020). A Synthesis of Evidence of Drivers of Amphibian Declines. Herpetologica, 76(2): 101 - 107. https://doi.org/10.1655/0018-0831-76.2.101

Roy, P. (2017). Frog Abundance - Georgian Bay Islands. [ecological dataset]. Parks Canada. https://open.canada.ca/data/dataset/f586e5a7-671b-4a8f-8dbe-5053a684fd37. Note: this dataset is made available under an Open Government Licence (Canada): https://open.canada.ca/en/open-government-licence-canada.

All photographs in this report were created by the authour (CDennison).