Anuran Abundance Trends in Georgian Bay Islands National Park - an Exploratory Data Analysis

Author

Christopher John Dennison

Published

April 14, 2024

Introduction

Amphibians are excellent indicators of ecosystem health and are in decline globally (Grant, Miller, & Muths, 2020). Loss of amphibian populations has facilitated the creation of regional monitoring programs which explore the spatial and temporal trends of amphibian abundance. For example, the Canadian Marsh Monitoring Program (MMP) has been collecting data on wetland bird and amphibian populations since 1995:

(https://www.birdscanada.org/bird-science/marsh-monitoring-program).

Similar to birds, frog and toads (Anurans) are relatively easy to detect due to their unique vocalizations during breeding. Call and visual surveys have contributed much to our understanding about species-specific trends across Canada. Outside of the MMP, federal departments and local conservation authorities also conduct regional surveys of Anurans to track declines.

For example, Anuran abundance data from Georgian Bay Islands (GBI) National Park was collected by Parks Canada staff from 2005 - 2017.

As per the official description of the dataset:

“Abundance and diversity of frogs and toads is a good indicator for assessing ecological integrity. The park visually counts adult frogs and toads in coastal wetlands after the breeding season. This method does not permit assessment of early breeding species and may overestimate frog abundance because these surveys coincide with mass emergence of newly developed frogs, which are subjected to very high mortality rates.” (Roy, 2017, p.1).

Long-term use of Anuran survey data requires that said data be maintained in a format conducive to analysis.

Here, I will use the GBI National Park Anuran Abundance Dataset to demonstrate the steps of an Exploratory Data Analysis (EDA) using the R coding language, which includes:

Initial Exploration & Research Questions
Data Cleaning
Data Wrangling & Transformations
Visualization & Analysis

I will also use the aforementioned dataset to answer some preliminary research questions, which will be established in the proceeding section.

Part I - Initial Exploration and Research Questions

Note: for this section, I will include our R code and output for illustrative purposes.

The dataset I will be using includes Anuran surveys conducted from 2005 - 2017 in GBI National Park. The structure of any dataset should be examined before transitioning into tidying and data hygiene:

# Note: ensure that the working directory is set to include the 'GBI_frogToad.csv dataset!

setwd('C:\\Users\\denni\\Desktop\\BIOL 5404W Final Project')

# Read-in dataset:

GBI_frogToad <- read.csv("GBI_frogToad.csv")

# We will remove the first row, which contains French column labels:

GBI_frogToad <- GBI_frogToad[-1, ]

# We'll use the str() and colnames() functions to identify the general structure of our dataset:

str(GBI_frogToad)

'data.frame':   13220 obs. of  7 variables:
 $ Date                 : chr  "19-09-2005" "19-09-2005" "19-09-2005" "19-09-2005" ...
 $ Year                 : chr  "2005" "2005" "2005" "2005" ...
 $ Location             : chr  "Treasure Bay" "Treasure Bay" "Treasure Bay" "Treasure Bay" ...
 $ Transect.ID          : chr  "1" "2" "3" "4" ...
 $ Species              : chr  "Bullfrog" "Bullfrog" "Bullfrog" "Bullfrog" ...
 $ Zone                 : chr  "Wet Meadow" "Wet Meadow" "Wet Meadow" "Wet Meadow" ...
 $ Number.of.Individuals: chr  "" "" "1" "" ...

colnames(GBI_frogToad)

[1] "Date"                  "Year"                  "Location"             
[4] "Transect.ID"           "Species"               "Zone"                 
[7] "Number.of.Individuals"

There are 7 columns in the dataset, each corresponding to information related to Anuran surveys conducted in GBI National Park. We will explore the structure and content of each column in the proceeding code, with the objective of identifying any errors to be addressed during the data hygiene section of our EDA:

# 1) Date

GBI_frogToad %>% 
  distinct(Date) %>% 
  nrow()

[1] 43

# 2) Year

range(GBI_frogToad$Year)

[1] "2005" "2017"

# 3) Location

GBI_locations <- GBI_frogToad$Location %>% 
  unique() %>% 
  as.vector()

length(GBI_locations)

[1] 16

GBI_locations

 [1] "Treasure Bay"     "Ojibway Bay"      "Hockeystick Bay"  "Church Bay"      
 [5] "Lily Pond Bay"    "North Bay"        "12 Miles Bay"     "Centennial Bay"  
 [9] "Turtle Bay"       "Christian Beach"  "Papoose Bay"      "beausoleil point"
[13] "treasure bay"     "Bone Dock"        "Island 226"       "Turtle bay "

# 4) Transect.ID

GBI_frogToad %>% 
  distinct(Transect.ID)

   Transect.ID
1            1
2            2
3            3
4            4
5            5
6            6
7            7
8            8
9            9
10          10

# 5) Species

Anuran_species <- GBI_frogToad$Species %>% 
  unique() %>% 
  as.vector()

length(Anuran_species)

[1] 8

Anuran_species

[1] "Bullfrog"              "Northern Leopard Frog" "Green Frog"           
[4] "Mink Frog"             "Spring Peeper"         "Wood Frog"            
[7] "American Toad"         "Other Species"

# 6) Zone

GBI_zones <- GBI_frogToad %>% 
  distinct(Zone)

GBI_zones

                  Zone
1           Wet Meadow
2 Land/Water Intercept
3        Shallow Water
4            Wet Medow
5 Land/water intercept
6        Shallow water

# 7) Number.of.Individuals

class(GBI_frogToad$Number.of.Individuals)

[1] "character"

From this preliminary exploration of the dataset, we have already uncovered several potential errors and areas in need of data hygiene:

i) dates within the dataset should be converted to Julian dates prior to analysis,

ii) closer inspection of the 16 survey locations (sites) reveals that two sites, ‘Treasure Bay’ and ‘Turtle Bay’, maintain duplicates owing to different spellings (i.e. ‘Turtle Bay’ and ‘Turtle bay’).

iii) Similar to the ‘Locations’ column, there are duplicates in the ’ Zone’ column:

‘Land/Water Intercept’ and ‘Land/water intercept’ (the second entry has ‘water’ and ‘intercept’ uncapitalized while the other does not)
‘Wet Meadow’ and ‘Wet Medow’ (the latter is a typo)
‘Shallow Water’ and ‘Shallow water’ (the latter has a lower-case ‘w’ for water).

Now that we understand the structure of our dataset, and have uncovered a number of errors, we can formulate a set of research questions to guide our EDA and analysis:

Research Questions

Does Anuran Abundance change in GBI National Park over time?
What are the site-specific and habitat-specific dynamics of Anuran abundance over time?
Are any particular Anuran species declining in GBI National Park?

Part II - Data Hygiene & Cleaning

Note: for this section, I will include our R code and output for illustrative purposes.

In addition to addressing the errors uncovered in Part I, we must also ensure that our data is free of missing data and NAs which could hinder our analysis.

We can check for NA values in our columns:

GBI_frogToad %>% 
  is.na() %>% 
  colSums()

                 Date                  Year              Location 
                    0                     0                     0 
          Transect.ID               Species                  Zone 
                    0                     0                     0 
Number.of.Individuals 
                    0

… as well as missing data in the form of blank cells:

sum(GBI_frogToad$Date == " ")

[1] 0

sum(GBI_frogToad$Year == " ")

[1] 0

sum(GBI_frogToad$Location == " ")

[1] 0

sum(GBI_frogToad$Transect.ID == " ")

[1] 0

sum(GBI_frogToad$Species == " ")

[1] 0

sum(GBI_frogToad$Zone == " ")

[1] 0

sum(GBI_frogToad$Number.of.Individuals == " ")

[1] 0

Fortunately, there do not seem to be any missing data or NA values in our dataset.

Now we should address the errors we uncovered in Part I, namely the presence of duplicate entries for ‘Zone’ and ‘Location’.

Specifically, we will remove the duplicate ‘Locations’ and ‘Zones’ from our dataset by fixing the grammatical inconsistencies that caused these redundancies:

GBI_frogToad$Location <- gsub('Turtle bay', 'Turtle Bay', GBI_frogToad$Location)
GBI_frogToad$Location <- gsub('treasure bay', 'Treasure Bay', GBI_frogToad$Location)
GBI_frogToad$Zone<- gsub('Wet Medow', 'Wet Meadow', GBI_frogToad$Zone)
GBI_frogToad$Zone<- gsub('Land/water intercept', 'Land/Water Intercept', GBI_frogToad$Zone)
GBI_frogToad$Zone<- gsub('Shallow water', 'Shallow Water', GBI_frogToad$Zone)
GBI_frogToad$Location <- str_trim(GBI_frogToad$Location)
GBI_frogToad$Location<- gsub('beausoleil point', 'Beausoleil Point', GBI_frogToad$Location)

Remember, one of the most fundamental steps in an EDA is to ensure the dataset is free of errors or redundancies that might create spurious results or otherwise influence analysis of the data. Doing this early-on will save several headaches when analyzing the dataset, and will ensure that future users of the dataset do not encounter similar issues.

Having cleaned our dataset, we can now move onto data wrangling and transformations.

Part III - Data Wrangling and Transformations

Note: for this section, I will include our R code and output for illustrative purposes.

Data wrangling and transformation is another fundamental step in any EDA, its primary purpose being to adjust a dataset so that it is conducive to analysis and the answering of research questions.

Let’s take a look at our Anuran abundance dataframe as it stands:

glimpse(GBI_frogToad)

Rows: 13,220
Columns: 7
$ Date                  <chr> "19-09-2005", "19-09-2005", "19-09-2005", "19-09…
$ Year                  <chr> "2005", "2005", "2005", "2005", "2005", "2005", …
$ Location              <chr> "Treasure Bay", "Treasure Bay", "Treasure Bay", …
$ Transect.ID           <chr> "1", "2", "3", "4", "5", "6", "7", "8", "9", "10…
$ Species               <chr> "Bullfrog", "Bullfrog", "Bullfrog", "Bullfrog", …
$ Zone                  <chr> "Wet Meadow", "Wet Meadow", "Wet Meadow", "Wet M…
$ Number.of.Individuals <chr> "", "", "1", "", "", "", "", "", "", "", "", "",…

There are a number of ways that we can make the data more suitable for analysis and the answering of our research questions.

To begin, let’s convert the ‘Number.of.Individuals’ column from a character variable into a continuous, numeric variable. We will also shorten some of our column names to make them more concise (but still descriptive). This will make our analysis more efficient. Lastly, we will convert the dates in our dataset to Julian dates instead of the current dd-mm-yyyy format:

# Convert 'Number.of.Individuals' to numeric

GBI_frogToad$Number.of.Individuals <- as.numeric(GBI_frogToad$Number.of.Individuals)

GBI_frogToad$Number.of.Individuals[is.na(GBI_frogToad$Number.of.Individuals)] <- 0

# Convert to Julian dates

GBI_frogToad$Date <- as.Date(GBI_frogToad$Date, format = "%d-%m-%y")
GBI_frogToad$Date <- yday(GBI_frogToad$Date)

# Make column names more concise

GBI_frogToad <- GBI_frogToad %>% 
  rename(year = Year,
         date = Date,
         site = Location,
         species = Species,
         zone = Zone,
         anuran_obs = Number.of.Individuals)

Now, we will use our dataset to create a ‘wide-format’ dataframe. Specifically, we want a dataframe in which each Anuran species has a dedicated column with the number of observations for that species, with rows corresponding to each year, site, habitat, and transect combination:

GBI_frogToad.W <- GBI_frogToad %>%
  pivot_wider(names_from = species, values_from = anuran_obs) %>% 
  group_by(site, date, zone)

We should also check to see if any sites are missing survey-years. This is especially important for our purposes, as our research questions relate to the site-specific, species-specific, and habitat-specific dynamics of Anuran abundance over time (i.e. temporal dynamics):

Survey_check <- GBI_frogToad.W %>% 
  group_by(site, year) %>% 
  tally()

view(Survey_check)

Unfortunately, many locations were not surveyed for Anurans during certain years. For example, ‘12 Miles Bay’ was only surveyed only 3 years (2013, 2014, and 2017). We will need to carefully consider this fact when exploring the temporal dynamics of Anuran abundance in the Park.

Lastly, we would like to further transform our wide-format dataframe to combine all Anuran observations from all transects at each location, giving us a “total count” for each species. We will do this for each site, habitat type, and for each year. We will also include a column that contains the total number of Anurans (all species) observed for each site, habitat type, and year:

GBI_frogToad.W <- GBI_frogToad.W %>% 
  group_by(site, date, year, zone) %>% 
  summarize(total_Bullfrog = sum(Bullfrog),
            total_GreenFrog = sum(`Green Frog`),
            total_WoodFrog = sum(`Wood Frog`),
            total_MinkFrog = sum(`Mink Frog`),
            total_AmericanToad = sum(`American Toad`),
            total_LeopardFrog = sum(`Northern Leopard Frog`),
            total_SpringPeeper = sum(`Spring Peeper`),
            total_Unknown = sum(`Other Species`)) %>% 
  mutate_all(~replace(., is.na(.), 0)) 

GBI_frogToad.W <- GBI_frogToad.W %>% 
  group_by(site, year, zone) %>% 
  mutate(total_Anuran = sum(total_Bullfrog, total_GreenFrog, total_WoodFrog,
                            total_MinkFrog, total_AmericanToad, total_LeopardFrog, total_SpringPeeper,
                            total_Unknown))

In addition to the ‘GBI_frogToad’ dataframe, the ‘GBI_frogToad.W’ object is a wide-transformed dataframe suitable to address our research questions. We will now proceed with these transformed, wrangled dataframes to conduct data visualization and analysis.

Part IV - Data Visualization & Analysis

Part VI - Section A: General Trends in Anuran Abundance

Before diving into our research questions, which relate to temporal changes in Anuran abundance (Part IV - Section B), let’s explore the general distribution of Anuran abundance GBI National Park at different sites, across different years, for different species, and in different habitat types.

First, we’ll create a table that shows the average number of Anurans observed at each site, considering all years that a site was surveyed:

Table 1. Average Anuran observations at GBI National Park Sites, 2005-2017
Site	All_Species	American.Bullfrog	Green.Frog	Wood.Frog	Mink.Frog	American.Toad	Northern.Leopard.Frog	Spring.Peeper	Unknown.Species
12 Miles Bay	13.44	11.33	1.11	0.00	0.00	0.00	0.67	0.00	0.33
Beausoleil Point	1.00	0.00	0.00	0.00	0.67	0.33	0.00	0.00	0.00
Bone Dock	0.33	0.00	0.00	0.00	0.00	0.00	0.33	0.00	0.00
Centennial Bay	2.80	0.40	0.60	0.00	0.00	0.00	1.80	0.00	0.00
Christian Beach	2.17	0.00	0.00	0.00	0.00	0.00	2.17	0.00	0.00
Church Bay	1.52	0.43	0.05	0.00	0.00	0.00	0.95	0.00	0.10
Hockeystick Bay	5.54	0.42	2.83	0.08	0.00	0.00	1.75	0.00	0.46
Island 226	6.00	0.00	5.00	0.00	0.00	0.00	1.00	0.00	0.00
Lily Pond Bay	4.50	0.72	1.28	0.00	0.06	0.00	1.17	0.00	1.28
North Bay	3.33	0.06	1.39	0.00	0.11	0.00	1.50	0.00	0.28
Ojibway Bay	3.79	0.89	1.14	0.04	0.25	0.00	1.11	0.00	0.36
Papoose Bay	3.67	0.50	1.50	0.00	0.00	0.00	1.00	0.33	0.33
Treasure Bay	4.04	0.50	0.71	0.04	0.08	0.04	1.62	0.00	1.04
Turtle Bay	3.75	0.88	1.62	0.00	0.00	0.00	0.62	0.62	0.00

From the data shown in Table 1., there appears to be some variation in Anuran abundance across sites and species (this bodes well for our temporal analysis).

Let’s also visualize site-specific Anuran observations (including all years and habitat types) using a Boxplot:

There are outliers for several of our sites, most notably the ‘12 Miles Bay’ site. This is further reflected in Figure 2. below, which shows the distribution of average Anuran observations across sites (including all surveyed years and habitat types):

What if we want to visualize year-specific distributions? For example, the average number of Anurans observed each year in GBI National Park:

There is no obvious year-specific trend observable in Figure 2. But we will not abandon hope! Remember, we will also explore site-specific and species-specific temporal trends in Anuran abundance (see Part IV - Section B).

Let’s also visualize the mean number of Anurans observed in each habitat type in the Park:

The Land / Water Intercept (shoreline habitat) is where the majority of Anuran observations occur, perhaps suggesting the importance of this habitat-type for Anurans in the Park.

But which Anuran species are, on average, being observed the most in shoreline habitats?

American Bullfrog, Green Frog, and Northern Leopard Frog seem to be the most observed species in shoreline habitats.

From the preliminary Tables and Figures produced in Part IV - Section A, we have discovered:

There appears to be a healthy amount of variation in Anuran observations in different years and across different sites,

The Land / Water Intercept (shoreline) habitat is where the most Anurans are being observed, and

American Bullfrog, Green Frog, and Northern Leopard Frog are the most-observed species in the Park.

Now, let’s answer questions about the temporal trends in Anuran abundance in GBI National Park.

Part VI - Section B: Temporal Trends in Anuran Abundance

Our primary research questions concern the temporal dynamics of Anuran abundance in GBI National Park. Specifically we are interested in whether site-specific, species-specific, or habitat-specific changes occur in Anuran abundance over time.

Before diving into answering these questions, we should address an important aspect of our data that we alluded to in Part I and III of our EDA.

Specifically, recall that many sites were surveyed more or less than others (i.e. some sites were not surveyed every year (2005-2017)). Let’s visualize the extent of this in Table 2:

Table 2. Number of years Anuran surveys occurred for GBI National Park sites
site	n
12 Miles Bay	3
Beausoleil Point	1
Bone Dock	1
Centennial Bay	3
Christian Beach	2
Church Bay	5
Hockeystick Bay	8
Island 226	1
Lily Pond Bay	5
North Bay	6
Ojibway Bay	9
Papoose Bay	2
Treasure Bay	8
Turtle Bay	4

Quite concerning is the fact that several of our sites were only surveyed in one year out of a possible 13. Since our research questions are mostly concerned with temporal trends in Anuran abundance, we will exclude some sites from our analysis. Specifically, we will only consider sites that have more than 3 survey-years to discernany temporal trends in Anuran abundance.

We’ll begin by determining which sites have 3 or less survey-years, and create a new dataframe that excludes these sites:

no_yearAnurans <- no_yearAnurans %>% 
filter(n <= 3) %>% 
  slice_max(n, by = site)

site_3less <- as.list(no_yearAnurans$site)
site_3less

[[1]]
[1] "12 Miles Bay"

[[2]]
[1] "Beausoleil Point"

[[3]]
[1] "Bone Dock"

[[4]]
[1] "Centennial Bay"

[[5]]
[1] "Christian Beach"

[[6]]
[1] "Island 226"

[[7]]
[1] "Papoose Bay"

GBI_frogToad_yearSites <- GBI_frogToad.W[ ! GBI_frogToad.W$site %in% site_3less, ]

Now that we have selected sites for our temporal analyses, we will create a line graph of site-specific Anuran abundance by year:

It would seem from Figure 5. that Anuran abundance trends are quite stochastic at sites with >3 survey-years. However, notice that the ‘Ojibway Bay’ site (yellow) exhibits a steady decline in Anuran abundance over time.

Let’s isolate ‘Ojibway Bay’ in our plot to gain a clearer visualization of this trend:

Figure 6. shows an obvious decline in Anuran abundance at the ‘Ojibway Bay’ site over time. Let’s explore this site-specific trend even further, starting with a look at any potential species-specific trends at this site. That is, are specific species declining at the ‘Ojibway Bay’ site?

Observing Figure 7., it appears that there is a general, declining trend for four species observed at Ojibway Bay:

Green Frog (Lithobates clamitans),
American Bullfrog (Lithobates catesbeianus),
Mink Frog (Lithobates septentrionalis), and
Wood Frog (Lithobates sylvaticus).

Let’s visualize these four species specifically at ‘Ojibway Bay’:

There does appear to be a decline in these four species at the ‘Ojibway Bay’ site.

But is this trend generalizable to all sites with >3 survey-years? That is, is there evidence that these species-specific declines are occurring in GBI National Park more generally?

Figure 8. reveals that:

American Bullfrog abundance remains somewhat steady in GBI National Park over time, but that this species experiences a steadier, declining trend after ~2009,
Green Frogs experience a rather steep decline after 2006, only to rebound considerably in 2017, and
Mink Frogs have not been observed since 2008, and Wood Frogs not since 2006.

Although the ecological reasons for these declines are beyond the scope of our EDA, perhaps we can uncover some evidence by observing habitat-specific trends:

Based on the trends observed in Figure 9., Anuran abundance has decreased steadily, and significantly, in shoreline (Land / Water Intercept) habitats in the park. This could be indicative of several ecological factors, including shoreline development and potential species-specific trends. Recall from Part IV - Section (A) that Land / Water Intercept habitats accounted for the most Anuran observations in the Park. Now we have discovered a potential, significant decline in Anuran observations in a habitat type important to supporting Anurans in GBI National Park.

Shoreline destruction, and the subsequent loss of riparian habitat, could be a future area of research in the Park, as our EDA has uncovered a potential decline in Anuran abundance within habitat, and a more general decline in several Anuran species.

Conclusions

Our EDA has uncovered some interesting trends in Anuran abundance in GBI National Park that could warrant further ecological studies. Namely, we discovered a potential temporal decline in four Anuran species in GBI National Park. Furthermore, we have found evidence that Anurans inhabiting a specific habitat type, the shoreline, may be impacted the most. This could be due to changes in shoreline composition from human-development, which, Canada-wide, is a major driver of loss in aquatic species which rely on riparian ecozones for reproduction and habitat.

Returning to our research questions:

Question 1: Does Anuran Abundance change in GBI National Park over time?

Answer: For certain species, there appears to have been a decline in abundance at sites with >3 survey-years since surveys began in 2005.

Question 2: What are the site-specific and habitat type-specific dynamics of Anuran abundance over time?

Answer: Shoreline (Land / Water Intercept) dependent species seem to have declined the most. In relation to our third research question, this would include Bullfrogs, Mink Frogs, and Green Frogs (and, to a lesser extent, Wood Frogs, which are privy to temporary ponds during breeding and forest habitats post-breeding). This habitat-specific trend was generalizable to sites with >3 survey-years.

Question 3: Are any particular Anuran species declining in GBI National Park?

Answer: Bullfrog, Green Frog, Mink Frog, and Wood Frog all seem to have experienced declines since 2005 in the park. Especially concerning is the fact that Mink Frog and Wood Frog have not been observed at several sites since the mid-late 2000s.

Future studies in GBI National Park should prioritize shoreline (riparian) habitats, and investigate if alterations to these habitats, whether from anthropogenic sources or otherwise, may be impacting Anuran species.

References

Grant, E.H.C, Miller, D.A.W., & Muths, E. (2020). A Synthesis of Evidence of Drivers of Amphibian Declines. Herpetologica, 76(2): 101 - 107. https://doi.org/10.1655/0018-0831-76.2.101

Roy, P. (2017). Frog Abundance - Georgian Bay Islands. [ecological dataset]. Parks Canada. https://open.canada.ca/data/dataset/f586e5a7-671b-4a8f-8dbe-5053a684fd37. Note: this dataset is made available under an Open Government Licence (Canada): https://open.canada.ca/en/open-government-licence-canada.

All photographs in this report were created by the authour (CDennison).