This exploratory data analysis (EDA) examines the paleobiogeography of dinosaur families in North America at the end of the Cretaceous Period.
The last few thousand years of the Cretaceous Period in North America are represented by a series of geological formations bearing dinosaur fossils (Scollard, Frenchman, upper Hell Creek, Lance, and Denver, among others). These formations present a unique opportunity to study Cretaceous ecosystems in a temporally-constrained window, across a wide geographical range, representing variable environments.
The southern formations, including the upper Hell Creek, Lance, and Denver formations, represent more coastal settings because of their proximity to a receding interior seaway. The Frenchman and Scollard formations in Canada represent more northern habitats farther from the coast.
The “cret_dino_abun.csv” file represents a working dataset of dinosaur fossil abundances collected from major museums and institutions across North America. Information includes:
Geological formation of origin (Scollard, Frenchman, Hell Creek, Lance, or Denver)
Abbreviation of the institution where specimens are housed
Dinosaur family/clade
County or general area of locality
Note: To maintain a large sample size, one instance of a fossil can range from a single tooth to an entire skeleton.
The “locality_dat.csv” file represents a complementary dataset on the county or area used in the “cret_dino_abun.csv” dataset. Information includes:
County or general area of locality
Average latitude and longitude of the county/area (center of the county)
Adjusted latitude and longitude of county/area (area of fossil localities)
State or province name
State or province abbreviation
Note: The average latitude and longitude values were obtained from location data available on Wikipedia. Adjusted latitude and longitude values were estimated based on locality data of fossils from the county.
The objective of the overall project is to investigate whether there are detectable differences in the relative abundance of dinosaur families between areas, and whether these differences represent significant variation in the composition of dinosaur communities across North America.
In this EDA, the objective is to visualize the distribution of dinosaur fossils across North America and identify potential geographical trends in the relative abundance of dinosaur families/clades that should be pursued further as the project continues.
This R markdown HTML document was built with R version 4.3.2.
If you wish to see the R code used throughout this report, click on the ‘Show’ buttons.
Ensure that the following packages are properly installed.
devtools
tidyverse
rnaturalearth
rnaturalearthdata
sf
ggrepel
gridExtra
maps
RColorBrewer
knitr
library(devtools)
library(tidyverse)
library(rnaturalearth)
library(rnaturalearthdata)
library(sf)
library(ggrepel)
library(gridExtra)
library(maps)
library(RColorBrewer)
library(knitr)
The package “rnaturalearthhires” must be installed manually using devtools:
devtools::install_github("ropensci/rnaturalearthhires")
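Before loading the libraries, it can help to confirm which of the listed packages are actually installed. The following is a minimal sketch using base R only; `missing_packages()` is a hypothetical helper that is not part of the original report.

```r
# Packages named in this report (rnaturalearthhires comes from GitHub)
required_pkgs <- c("devtools", "tidyverse", "rnaturalearth", "rnaturalearthdata",
                   "sf", "ggrepel", "gridExtra", "maps", "RColorBrewer", "knitr",
                   "rnaturalearthhires")

# Hypothetical helper: returns the packages that cannot be found in the
# current library paths. system.file() returns "" for a missing package
# and does not load anything.
missing_packages <- function(pkgs) {
  found <- vapply(pkgs, function(p) nzchar(system.file(package = p)), logical(1))
  pkgs[!found]
}

missing_packages(required_pkgs)  # character(0) means everything is installed
```

Any names returned can then be installed with `install.packages()`, except “rnaturalearthhires”, which uses the devtools call above.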
Note: Ensure that the working directory is set to the current folder and all required csv files are in the working directory.
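A quick way to confirm that both csv files are where R expects them is sketched below. `missing_files()` is a hypothetical helper introduced only for illustration; it assumes the files sit directly in the working directory.

```r
required_files <- c("cret_dino_abun.csv", "locality_dat.csv")

# Hypothetical helper: lists the files that are absent from a folder
missing_files <- function(files, dir = getwd()) {
  files[!file.exists(file.path(dir, files))]
}

missing_files(required_files)  # character(0) means both files were found
```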
Import the required files into R. In each case, check that the data loaded in correctly by looking at the top of the data set.
dino <- read_csv("cret_dino_abun.csv")
# Show first 6 lines
kable(head(dino), format = "html", table.attr = "class='table table-striped table-hover table-bordered' style='margin:auto;'")
| Geological Formation | Institution | Geographical Area | Dinosaur Family | Abundance |
|---|---|---|---|---|
| Scollard | AMNH | Dry Island | Ankylosauridae | 1 |
| Scollard | CMN | Dry Island | Ceratopsidae | 2 |
| Scollard | CMN | Dry Island | Thescelosauridae | 1 |
| Scollard | CMN | Dry Island | Tyrannosauridae | 3 |
| Scollard | CMN | Dry Island | Leptoceratopsidae | 3 |
| Scollard | CMN | Dry Island | Ankylosauridae | 1 |
“cret_dino_abun.csv” will be referred to as “dino”
local <- read_csv("locality_dat.csv")
# Show first 6 lines
kable(head(local), format = "html", table.attr = "class='table table-striped table-hover table-bordered' style='margin:auto;'")
| Geographical Area | Latitude | Longitude | Adjusted Latitude | Adjusted Longitude | State/Province | abbrev |
|---|---|---|---|---|---|---|
| Dry Island | 51.94 | -112.96 | 51.94 | -112.96 | Alberta | AB |
| GNP | 49.04 | -106.57 | 49.04 | -106.57 | Saskatchewan | SK |
| Eastend | 49.38 | -108.51 | 49.38 | -108.51 | Saskatchewan | SK |
| Denver | 39.74 | -104.98 | 39.74 | -104.98 | Colorado | CO |
| Garfield | 47.28 | -106.99 | 47.69 | -106.92 | Montana | MT |
| Rosebud | 46.23 | -106.72 | 46.26 | -106.59 | Montana | MT |
“locality_dat.csv” will be referred to as “local”
The following section performs a series of operations to tidy and clean the individual data frames, merge them, and clean the combined data frame. Details of the changes are annotated in the code.
Adjusted column names
Checked that values in each column are reasonable
Tidied the abundance column by separating the aggregate abundances
# Adjust column names to shorten and include no special characters
names(dino)[1] <- 'fm'
names(dino)[2] <- 'inst'
names(dino)[3] <- 'area'
names(dino)[4] <- 'fam'
names(dino)[5] <- 'abun'
# Formation column (fm) should consist only of Scollard, Frenchman, Hell Creek, Lance, and Denver
unique(dino$fm) # All values are correct
# Check institution column (inst) for typos
unique(dino$inst) # All values correct
# Check area column (area) for typos
unique(dino$area) # All values are correct
# Check family (fam) column for typos/duplicates
unique(dino$fam) # All values are correct
# Abundance column should have reasonable values greater than 0
range(dino$abun) # Range from 0 to 474. Rows with zero abundance record no specimens, so they are removed
# Overwrite the dino df, keeping only rows with an abundance greater than 0
dino <- dino[dino$abun != 0, ]
# Double check that all zero values have been removed
range(dino$abun) # Range from 1 to 474
# Since the abundance column is an aggregate of observations, it is not tidy. To tidy the dataset:
dino <- dino %>%
  uncount(weights = abun, .remove = TRUE)
# Now each column is a variable and row an observation
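To make the effect of `uncount()` concrete, the sketch below reproduces the same expansion in base R on a toy table. The values are made up for illustration and are not taken from the data set.

```r
# Toy aggregate table with the same shape as the dino data frame (made-up values)
agg <- data.frame(fam  = c("Ceratopsidae", "Tyrannosauridae"),
                  abun = c(2, 3))

# Repeat each row index 'abun' times, then keep everything except the count.
# This mirrors uncount(weights = abun, .remove = TRUE).
long <- agg[rep(seq_len(nrow(agg)), times = agg$abun), "fam", drop = FALSE]
rownames(long) <- NULL

long
nrow(long) == sum(agg$abun)  # one row per fossil
```

After the expansion, the number of rows equals the sum of the original abundance column, which is a useful sanity check on the real data as well.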
Adjusted column names
Checked that values in each column are reasonable
# Adjust column names to shorten and include no special characters
names(local)[1] <- 'area'
# These values complement those in the dino df and are given the same column name
names(local)[2] <- 'lat'
names(local)[3] <- 'long'
names(local)[4] <- 'adj_lat'
names(local)[5] <- 'adj_long'
names(local)[6] <- 'st_pr'
names(local)[7] <- 'st_pr_abb'
# Each area in the dino df needs complementary data in the local df
length(unique(dino$area)) == length(unique(local$area)) # Yields TRUE. Same number of areas in both data frames
# Latitude values should range from 0 to 90 (northern hemisphere)
range(local$lat)
range(local$adj_lat)
# Both acceptable values between 39.12 (Colorado) and 51.94 (Alberta)
# Longitude values should range from -180 to 0 (western hemisphere)
range(local$long)
range(local$adj_long)
# Both acceptable values (all negative and between -100 and -120)
# Check all state and province names are spelled correctly
unique(local$st_pr) # All correct values
# Check all abbreviations are correct
unique(local$st_pr_abb) # All correct values
# Check that each abbreviation corresponds to the correct state/province:
local %>%
select(st_pr, st_pr_abb) %>%
distinct()
# Each state/province has the correct abbreviation
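The comparison of area counts in the chunk above confirms only that both data frames hold the same number of areas, not that the names match. A stricter check compares the names themselves. The sketch below uses toy vectors (made-up values, one with a deliberate trailing space) to show the difference.

```r
dino_areas  <- c("Dry Island", "Garfield", "Rosebud")   # toy values
local_areas <- c("Dry Island", "Garfield", "Rosebud ")  # note the trailing space

# Same number of areas, yet one key does not match
length(unique(dino_areas)) == length(unique(local_areas))

setdiff(dino_areas, local_areas)   # areas in dino with no match in local
setdiff(local_areas, dino_areas)   # areas in local that dino never uses
setequal(dino_areas, local_areas)  # TRUE only if both sets of names are identical
```

On the real data, the equivalent calls would be `setdiff(dino$area, local$area)` and its reverse; both should return `character(0)` before the join.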
The two data frames are combined into “dino_loc”, which ties together the fossil information with locality information. Check that the data frames merged correctly by looking at the top of the new data frame. This data frame will be used for the visualizations.
# Merge the two data frames into a new data frame
dino_loc <- left_join(dino, local, by = 'area')
# Show first 6 lines
kable(head(dino_loc), format = "html", table.attr = "class='table table-striped table-hover table-bordered' style='margin:auto;'")
| fm | inst | area | fam | lat | long | adj_lat | adj_long | st_pr | st_pr_abb |
|---|---|---|---|---|---|---|---|---|---|
| Scollard | AMNH | Dry Island | Ankylosauridae | 51.94 | -112.96 | 51.94 | -112.96 | Alberta | AB |
| Scollard | CMN | Dry Island | Ceratopsidae | 51.94 | -112.96 | 51.94 | -112.96 | Alberta | AB |
| Scollard | CMN | Dry Island | Ceratopsidae | 51.94 | -112.96 | 51.94 | -112.96 | Alberta | AB |
| Scollard | CMN | Dry Island | Thescelosauridae | 51.94 | -112.96 | 51.94 | -112.96 | Alberta | AB |
| Scollard | CMN | Dry Island | Tyrannosauridae | 51.94 | -112.96 | 51.94 | -112.96 | Alberta | AB |
| Scollard | CMN | Dry Island | Tyrannosauridae | 51.94 | -112.96 | 51.94 | -112.96 | Alberta | AB |
Checked and adjusted formation names to match a geographical region
Checked for any NA values
# Check that each county matches a corresponding formation
dino_loc %>%
select(area, fm, st_pr_abb) %>% # Selects only area, formation, and the state/province abbreviation
distinct() %>% # Select only the distinct combinations
group_by(area) %>% # Group them by the area column
filter(n() > 1) %>% # Keeps only areas that are paired with more than one formation
arrange(area) # Arranges the answers by area so it is easy to see the duplicated areas
### Note: Differences may have resulted due to the age of the collections. Older records may refer to formations as Lance, regardless of location.
# The name of each formation corresponds to a province/state(s). The Hell Creek Formation is the only one spread over multiple states (MT, SD, and ND) and the rest of the Formations are restricted to a state/province.
# Adjust so that the formation labels correspond to the correct state/province:
dino_loc$fm[dino_loc$st_pr_abb %in% c('MT', 'SD', 'ND')] <- 'Hell Creek'
dino_loc$fm[dino_loc$st_pr_abb == 'WY'] <- 'Lance'
dino_loc$fm[dino_loc$st_pr_abb == 'CO'] <- 'Denver'
dino_loc$fm[dino_loc$st_pr_abb == 'SK'] <- 'Frenchman'
dino_loc$fm[dino_loc$st_pr_abb == 'AB'] <- 'Scollard'
# Check that there are no NA values in the data frame
anyNA(dino_loc) # Returns FALSE. No NA values
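If the join had left any area unmatched, the locality columns for those rows would be NA. The sketch below shows, on a toy data frame with made-up values, how missing values can be located by column and by row using base R.

```r
# Toy data frame with one missing latitude (made-up values)
df <- data.frame(area = c("Dry Island", "Unmatched"),
                 lat  = c(51.94, NA))

anyNA(df)                  # TRUE if any cell is missing
colSums(is.na(df))         # number of missing values in each column
df[!complete.cases(df), ]  # rows that contain at least one missing value
```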
The following section modifies the data frame to fit the needs of this EDA. Details of the changes are annotated in the code.
Updated and consolidated dinosaur family/clade names
Ordered formations from North to South
# Some classifications of the fossils are outdated or incorrect, and should be consolidated.
# The groups that need to be consolidated are as follows:
# Caenagnathidae <- Avimimidae, Oviraptoridae
# Tyrannosauridae <- Megalosauridae
# Hadrosauridae <- Iguanodontidae
# Dromaeosauridae <- Small Theropod
# Thescelosauridae <- Hypsilophodontidae
# To adjust these names:
dino_loc$fam[dino_loc$fam == 'Avimimidae' | dino_loc$fam == 'Oviraptoridae'] <- 'Caenagnathidae'
# Avimimidae is a dubious basal lineage
# Oviraptoridae only known from Asia
# Both consolidated into the closely related Caenagnathidae
dino_loc$fam[dino_loc$fam == 'Megalosauridae'] <- 'Tyrannosauridae'
# Megalosauridae not known from the Cretaceous of North America
# Consolidated into Tyrannosauridae, the only large theropod in North America at the time
dino_loc$fam[dino_loc$fam == 'Iguanodontidae'] <- 'Hadrosauridae'
# Iguanodontidae not known from the Cretaceous of North America
# Consolidated into Hadrosauridae, a related group that is abundant in the Late Cretaceous
dino_loc$fam[dino_loc$fam == 'Small Theropod'] <- 'Dromaeosauridae'
# Small theropod was a descriptor used by the RSM for unidentified small theropod teeth and made up a sizable portion of the Frenchman Formation material
# Tentatively consolidated into Dromaeosauridae, since the teeth of other groups of small theropods are fairly easily diagnosable
dino_loc$fam[dino_loc$fam == 'Hypsilophodontidae'] <- 'Thescelosauridae'
# Thescelosaurus was previously placed into Hypsilophodontidae
# Consolidated into Thescelosauridae, the new family of Thescelosaurus
# To check that names were adjusted accordingly
unique(dino_loc$fam)
# Order the Formations from North to South
dino_loc <- dino_loc %>%
mutate(fm = factor(fm, levels = c("Scollard", "Frenchman", "Hell Creek", "Lance", "Denver")))
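The same consolidation can also be written as a single lookup table, which keeps the old-to-new mapping in one place. This is an alternative sketch, not the approach used in the chunk above; `fam_lookup` and `consolidate_fam()` are hypothetical names, and the mapping is copied from the comments in that chunk.

```r
# Old name -> consolidated name
fam_lookup <- c(Avimimidae         = "Caenagnathidae",
                Oviraptoridae      = "Caenagnathidae",
                Megalosauridae     = "Tyrannosauridae",
                Iguanodontidae     = "Hadrosauridae",
                "Small Theropod"   = "Dromaeosauridae",
                Hypsilophodontidae = "Thescelosauridae")

# Hypothetical helper: replaces outdated names and leaves all others untouched
consolidate_fam <- function(fam) {
  hit <- fam %in% names(fam_lookup)
  fam[hit] <- unname(fam_lookup[fam[hit]])
  fam
}

consolidate_fam(c("Small Theropod", "Ceratopsidae", "Megalosauridae"))
```

Applied to the real data, this would read `dino_loc$fam <- consolidate_fam(dino_loc$fam)`.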
The current sample size of this data set is 9328 fossils representing 15 dinosaur families or clades.
The data set contains records from 23 institutions, collected from across 7 states and provinces in North America.
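Summary figures like these can be recomputed whenever new specimens are added. The sketch below shows one way to do so; it uses a toy data frame with made-up values in place of `dino_loc`, with the same column names.

```r
# Toy stand-in for dino_loc: one row per fossil (made-up values)
toy <- data.frame(fam   = c("Ceratopsidae", "Ceratopsidae", "Hadrosauridae"),
                  inst  = c("CMN", "AMNH", "CMN"),
                  st_pr = c("Alberta", "Alberta", "Montana"))

summary_counts <- c(fossils          = nrow(toy),
                    families         = length(unique(toy$fam)),
                    institutions     = length(unique(toy$inst)),
                    states_provinces = length(unique(toy$st_pr)))
summary_counts
```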
The following section will visualize the data in various formats, highlighting different aspects of the data.
The following graphs represent basic visualizations of the number of dinosaur fossils when sorted by geological formation, institution, geographical area, and dinosaur family/clade.
This graph shows the number of dinosaur fossils recorded per geological formation. The graph illustrates a clear sampling bias toward the Hell Creek and Lance formations.
abun_fm <- dino_loc %>%
count(fm) %>%
mutate(fm = factor(fm, levels = fm[order(-n)])) %>% # Order from largest to smallest values
ggplot(aes(x = fm, y = n)) +
geom_col() +
labs(x = "Formations", y = "Number of Specimens") +
theme_bw()
print(abun_fm)
This graph shows the number of dinosaur fossils recorded at each institution. The graph illustrates that most of the Cretaceous dinosaur fossils are housed in a small number of museums.
abun_inst <- dino_loc %>%
count(inst) %>%
mutate(inst = factor(inst, levels = inst[order(-n)])) %>%
ggplot(aes(x = inst, y = n)) +
geom_col() +
labs(x = "Institutions", y = "Number of Specimens") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(abun_inst)
This graph shows the number of dinosaur fossils recorded from each geographical area. The areas are grouped by their corresponding geological formation. The graph illustrates that most fossils for a given formation come from a few productive areas. This is important to note since it reveals that not all areas will be useful for this analysis.
abun_area <- dino_loc %>%
  count(fm, area) %>% # Number of specimens per area, keeping its formation
  arrange(fm, desc(n)) %>% # Formations north to south, largest areas first within each
  mutate(area = factor(area, levels = unique(area))) %>% # Fix that order on the x-axis
  ggplot(aes(x = area, y = n, fill = fm)) +
  geom_col() +
  labs(x = "Geographical Areas", y = "Number of Specimens", fill = "Formation") + # Edit the legend title
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(abun_area)
This graph shows the number of dinosaur fossils recorded for each dinosaur family/clade. The graph illustrates that the most common groups of dinosaurs in the data set (based on fossil counts) are the large herbivores, Ceratopsidae (horned dinosaurs) and Hadrosauridae (duck-billed dinosaurs).
abun_fam <- dino_loc %>%
count(fam) %>%
mutate(fam = factor(fam, levels = fam[order(-n)])) %>%
ggplot(aes(x = fam, y = n)) +
geom_col() +
labs(x = "Families/Clades", y = "Number of Specimens") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(abun_fam)
The following graphs represent visualizations focusing on the distribution of these abundance values across geological formations.
This graph shows the number of dinosaur fossils, per clade, present in each formation. This illustrates how common the fossils of a particular group are in a formation.
# Dinosaur abundance by formation
abun_fam_fm <- dino_loc %>%
count(fam, fm) %>%
ggplot(aes(x = fam, y = n, fill = fm)) +
geom_col(position = "dodge") +
labs(x = "Families/Clades", y = "Number of Specimens", fill = "Formations") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(abun_fam_fm)
Note: This graph shows raw abundances, but the different sample sizes of the formations (see graph 1) make the less-sampled formations harder to compare. Relative abundances make the formations more directly comparable.
A new data frame with a relative abundance value (r_abun) is required. The relative abundance represents the percentage of the total abundance that a specific group represents. This new data frame will be called “dino_abun”.
# Create a new data frame with relative abundance of dinosaur clades by formation
dino_abun <- dino_loc %>%
  group_by(fm, fam) %>%
  summarise(total_abun = n(), .groups = "drop_last") %>% # Count specimens per family; stay grouped by formation
  mutate(r_abun = (total_abun / sum(total_abun)) * 100) %>% # Percentage of each formation's total
  ungroup() %>%
  complete(fm, fam, fill = list(total_abun = 0, r_abun = 0)) # Fill clades missing from a formation with 0
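As a worked example of the relative abundance calculation, the sketch below computes the same percentages in base R on a toy set of records. The values are made up for illustration only.

```r
# Toy records: one row per fossil (made-up values)
toy <- data.frame(
  fm  = c("Scollard", "Scollard", "Scollard", "Scollard", "Lance", "Lance"),
  fam = c("Ceratopsidae", "Ceratopsidae", "Ceratopsidae", "Tyrannosauridae",
          "Ceratopsidae", "Hadrosauridae")
)

counts <- table(toy$fm, toy$fam)                # formation x family counts, zeros included
r_abun <- prop.table(counts, margin = 1) * 100  # percent of each formation's total

r_abun["Scollard", "Ceratopsidae"]  # 3 of the 4 Scollard fossils, i.e. 75
rowSums(r_abun)                     # each formation sums to 100
```

Checking that the percentages sum to 100 within each formation is a quick way to confirm that the grouping in the real pipeline behaves as intended.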
This graph shows the relative abundance of dinosaur clades present in each formation. This better illustrates how common the fossils of a particular group are in each formation and makes comparisons between formations more practical. We can see clear trends where certain groups are more abundant in specific formations.
# Relative abundance of dinosaur clades by formation
r_abun_fam_fm <- ggplot(dino_abun, aes(x = fam, y = r_abun, fill = fm)) +
geom_col(position = "dodge") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(x = "Family", y = "Relative Abundance (%)", fill = "Formation")
print(r_abun_fam_fm)
Note: Although this graph better illustrates abundance trends between formations, some formations can cover vast geographical areas. To get more detailed data, it might be necessary to further subdivide the formations. To do this, we need to look at the distribution of the areas and localities producing these fossils.
The following graphs represent the distribution of the above data across North America. The following map visualizations require map shape files to be loaded.
world <- map_data("world")
states <- ne_states(country = "united states of america", returnclass = "sf") # Load state boundaries
provinces <- ne_states(country = "canada", returnclass = "sf") # Load province boundaries
us_counties <- map_data("county") # Load US county boundaries
states_provinces <- rbind(states, provinces) # Combine state and province boundaries
This graph plots all counties and areas with records of dinosaur fossils. The graph illustrates areas where Late Cretaceous layers are exposed. Note that the Hell Creek Formation covers a vast geographical area spanning many states. In the case of Canadian areas without counties, the actual locality is plotted.
dino_dist_area <- ggplot(data = world) +
geom_map(map = world, aes(map_id = region), fill = "white", color = "black") +
geom_map(data = us_counties, map = us_counties, aes(map_id = region), fill = NA, color = "grey") + # Add US county boundaries
geom_sf(data = states_provinces, color = "black", fill = NA) + # Add state and province boundaries
geom_point(data = dino_loc, aes(x = long, y = lat, color = fm), size = 2) +
coord_sf(xlim = c(-115, -100), ylim = c(38, 53), expand = FALSE) + # Set limits for the map
labs(x = 'Longitude', y = "Latitude", color = "Formations") +
theme(axis.text.x = element_text(angle = -45, hjust = -0.1))
print(dino_dist_area)
Note: Although this graph illustrates the areas with dinosaur fossils, each point only indicates the presence of fossils in the specific county, not the actual location fossils are coming from. A better illustration would be to look at the distribution of localities (adjusted latitude and longitude values).
This graph plots the estimated location of the major fossil-bearing localities within each county. The graph better illustrates areas where Late Cretaceous layers are exposed. Where locality data could not be obtained, the point has been left at the original county-center position.
dino_dist_loc <- ggplot(data = world) +
geom_map(map = world, aes(map_id = region), fill = "white", color = "black") +
geom_map(data = us_counties, map = us_counties, aes(map_id = region), fill = NA, color = "grey") + # Add US county boundaries
geom_sf(data = states_provinces, color = "black", fill = NA) + # Add state and province boundaries
geom_point(data = dino_loc, aes(x = adj_long, y = adj_lat, color = fm), size = 2) +
coord_sf(xlim = c(-115, -100), ylim = c(38, 53), expand = FALSE) + # Set limits for the map
labs(x = 'Longitude', y = "Latitude", color = "Formations") +
theme(axis.text.x = element_text(angle = -45, hjust = -0.1))
print(dino_dist_loc)
Note: Some localities are represented by only a few isolated specimens. These can be filtered out to better illustrate the concentrations of fossils.
filtered_dino_dist_loc <- dino_loc %>%
  group_by(area, adj_long, adj_lat, fm) %>%
  summarise(n = n(), .groups = "drop") %>% # Number of specimens per locality
  filter(n > 100) # Adjust this threshold to change the minimum number of specimens per locality
This graph restricts the plots to localities with over 100 recorded specimens. This graph better illustrates clusters or “hot-spots” for dinosaur fossils.
fil_dino_dist_loc <- ggplot(data = world) +
geom_map(map = world, aes(map_id = region), fill = "white", color = "black") +
geom_map(data = us_counties, map = us_counties, aes(map_id = region), fill = NA, color = "grey") + # Add US county boundaries
geom_sf(data = states_provinces, color = "black", fill = NA) + # Add state and province boundaries
geom_text_repel(data = filtered_dino_dist_loc, aes(x = adj_long, y = adj_lat, label = area),
size = 2.5, hjust = 0.5, vjust = 0) +
geom_point(data = filtered_dino_dist_loc, aes(x = adj_long, y = adj_lat, color = fm), size = 2) +
coord_sf(xlim = c(-115, -100), ylim = c(38, 53), expand = FALSE) + # Set limits for the map
labs(x = 'Longitude', y = "Latitude", color = "Formations") +
theme(axis.text.x = element_text(angle = -45, hjust = -0.1))
print(fil_dino_dist_loc)
By filtering to areas with over 100 specimens, we can see fairly obvious clusters of sites:
Scollard - Dry Island
Frenchman - Eastend, GNP
Hell Creek NW - Garfield, McCone
Hell Creek SE - Carter, Slope, Fallon, Harding
Lance - Weston, Niobrara
Denver - Denver
The previous plot can be adjusted to also show localities with 100 or fewer specimens. Some of these localities can be added to the previously defined clusters based on distance. Black points on the map indicate localities with 100 or fewer specimens.
ggplot(data = world) +
geom_map(map = world, aes(map_id = region), fill = "white", color = "black") +
geom_map(data = us_counties, map = us_counties, aes(map_id = region), fill = NA, color = "grey") + # Add US county boundaries
geom_sf(data = states_provinces, color = "black", fill = NA) + # Add state and province boundaries
geom_text_repel(data = unique(dino_loc[, c("adj_long", "adj_lat", "area")]),
aes(x = adj_long, y = adj_lat, label = area),
size = 2, hjust = 0, vjust = 0,
max.overlaps = Inf) +
geom_point(data = dino_loc, aes(x = adj_long, y = adj_lat), size = 1) +
geom_point(data = filtered_dino_dist_loc, aes(x = adj_long, y = adj_lat, color = fm), size = 2) +
coord_sf(xlim = c(-115, -100), ylim = c(38, 53), expand = FALSE) + # Set limits for the map
labs(x = 'Longitude', y = "Latitude", color = "Formations") +
theme(axis.text.x = element_text(angle = -45, hjust = -0.1))
Some areas can be added to major clusters, and although other clusters exist, the sample sizes are too small to be meaningful.
Major groupings
Scollard - Dry Island
Frenchman - Eastend, GNP
Hell Creek NW - Garfield, McCone
Hell Creek SE - Carter, Slope, Fallon, Harding + Powder River, Bowman
Lance E - Weston, Niobrara + Converse
Denver - Denver + Jefferson
Other groupings
Hell Creek E - Sioux, Corson, Ziebach
Hell Creek S - Butte, Perkins, Meade
Lance NW - Park, Big Horn
Lance SW - Sweetwater, Carbon
Undefined
Hell Creek - Dawson, Rosebud, Petroleum, Billings, Morton
Lance - Natrona, Goshen, Hot Springs
Add a new column to dino_loc which separates the formations into geographical subdivisions, outlined above.
# Assign subdivisions based on the values in the 'area' column
dino_loc$subdivision <- NA_character_ # Create the column first; areas not listed below stay NA
dino_loc$subdivision[dino_loc$area %in% c("Dry Island")] <- "Scollard"
dino_loc$subdivision[dino_loc$area %in% c("Eastend", "GNP")] <- "Frenchman"
dino_loc$subdivision[dino_loc$area %in% c("Garfield", "McCone")] <- "Hell Creek NW"
dino_loc$subdivision[dino_loc$area %in% c("Carter", "Slope", "Fallon", "Harding", "Powder River", "Bowman")] <- "Hell Creek SE"
dino_loc$subdivision[dino_loc$area %in% c("Weston", "Niobrara", "Converse")] <- "Lance E"
dino_loc$subdivision[dino_loc$area %in% c("Denver", "Jefferson")] <- "Denver"
dino_loc$subdivision[dino_loc$area %in% c("Sioux", "Corson", "Ziebach")] <- "Hell Creek E"
dino_loc$subdivision[dino_loc$area %in% c("Butte", "Perkins", "Meade")] <- "Hell Creek S"
dino_loc$subdivision[dino_loc$area %in% c("Park", "Big Horn")] <- "Lance NW"
dino_loc$subdivision[dino_loc$area %in% c("Sweetwater", "Carbon")] <- "Lance SW"
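The same assignment can be expressed as one lookup table built from the groupings listed earlier, which makes it easier to see at a glance which areas remain unassigned. This is an alternative sketch rather than the method used above; `groups` and `subdivision_lookup` are hypothetical names.

```r
# Subdivision -> areas, copied from the groupings in this report
groups <- list(
  "Scollard"      = c("Dry Island"),
  "Frenchman"     = c("Eastend", "GNP"),
  "Hell Creek NW" = c("Garfield", "McCone"),
  "Hell Creek SE" = c("Carter", "Slope", "Fallon", "Harding", "Powder River", "Bowman"),
  "Lance E"       = c("Weston", "Niobrara", "Converse"),
  "Denver"        = c("Denver", "Jefferson"),
  "Hell Creek E"  = c("Sioux", "Corson", "Ziebach"),
  "Hell Creek S"  = c("Butte", "Perkins", "Meade"),
  "Lance NW"      = c("Park", "Big Horn"),
  "Lance SW"      = c("Sweetwater", "Carbon")
)

# Flatten into a named vector: names are areas, values are subdivisions
subdivision_lookup <- setNames(rep(names(groups), lengths(groups)),
                               unlist(groups, use.names = FALSE))

# Areas that are not in the table (for example "Dawson") come back as NA
areas <- c("Garfield", "Converse", "Dawson")
unname(subdivision_lookup[areas])
```

On the real data this would read `dino_loc$subdivision <- unname(subdivision_lookup[dino_loc$area])`, leaving the undefined areas as NA.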
This can be plotted on the previous map to better visualize how the clusters are geographically organized. Most clusters show fairly obvious separation from each other. However, some clusters (Hell Creek SE, E, and S) are harder to distinguish, as they appear to be parts of a larger exposure.
ggplot(data = world) +
geom_map(map = world, aes(map_id = region), fill = "white", color = "black") +
geom_map(data = us_counties, map = us_counties, aes(map_id = region), fill = NA, color = "grey") +
geom_sf(data = states_provinces, color = "black", fill = NA) +
geom_point(data = dino_loc, aes(x = adj_long, y = adj_lat, color = subdivision), size = 2) +
coord_sf(xlim = c(-115, -100), ylim = c(38, 53), expand = FALSE) +
labs(x = 'Longitude', y = "Latitude", color = "Fm Subdivisions") +
scale_color_manual(values = brewer.pal(11, "Paired")) +
theme(axis.text.x = element_text(angle = -45, hjust = -0.1))
A new data frame, “dino_abun_sd”, is created with the relative abundance of dinosaur clades per subdivision. Subdivisions with fewer than 100 total specimens are disregarded.
# Create a new data frame with relative abundance of dinosaur clades by subdivision
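dino_abun_sd <- dino_loc %>%
  group_by(subdivision, fm, fam) %>%
  summarise(total_abun = n(), .groups = "drop_last") %>% # Count specimens per family within each subdivision
  mutate(r_abun = (total_abun / sum(total_abun)) * 100) %>% # Percentage of each subdivision's total
  group_by(subdivision) %>%
  filter(sum(total_abun) >= 100) %>% # Keep subdivisions with at least 100 specimens
  ungroup() %>% # Drop the grouping before changing the subdivision column
  mutate(subdivision = factor(subdivision, levels = c("Scollard", "Frenchman", "Hell Creek NW", "Hell Creek SE", "Lance E", "Denver"))) %>%
  filter(!is.na(subdivision))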
This graph shows the relative abundance of dinosaur families per subdivision (for subdivisions with at least 100 specimens). Certain groups show potential north-south trends in relative abundance.
# Relative abundance of dinosaur clades by subdivision
r_abun_fam_subdivision <- ggplot(dino_abun_sd, aes(x = fam, y = r_abun, fill = subdivision)) +
geom_col(width = 0.8, position = "dodge") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(x = "Family", y = "Relative Abundance (%)", fill = "Subdivision")
print(r_abun_fam_subdivision)
This table summarizes the relative abundances (in percent) of each family/clade, categorized by each subdivision.
dino_abun_sd_table <- dino_abun_sd %>%
  group_by(fam, subdivision) %>%
  summarise(r_abun = sum(r_abun), .groups = "drop") %>%
  pivot_wider(names_from = subdivision, values_from = r_abun)
names(dino_abun_sd_table)[1] <- 'Family/Clade'
dino_abun_sd_table <- dino_abun_sd_table[,c('Family/Clade', "Scollard", "Frenchman", "Hell Creek NW", "Hell Creek SE", "Lance E", "Denver")]
kable(dino_abun_sd_table, format = "html", digits = 2, table.attr = "class='table table-striped table-hover table-bordered' style='margin:auto;'")
| Family/Clade | Scollard | Frenchman | Hell Creek NW | Hell Creek SE | Lance E | Denver |
|---|---|---|---|---|---|---|
| Alvarezsauridae | NA | 0.09 | 0.11 | 0.07 | 0.10 | NA |
| Ankylosauridae | 6.77 | 0.84 | 0.97 | 0.66 | 1.19 | NA |
| Caenagnathidae | 0.75 | 1.31 | 0.07 | 1.25 | 0.13 | NA |
| Ceratopsidae | 15.79 | 26.92 | 27.13 | 29.74 | 32.36 | 32.63 |
| Dromaeosauridae | 9.96 | 29.27 | 14.57 | 5.86 | 7.46 | 4.56 |
| Hadrosauridae | 5.83 | 15.67 | 15.04 | 29.96 | 27.32 | 31.93 |
| Leptoceratopsidae | 6.02 | 0.09 | NA | 0.59 | 0.27 | NA |
| Nodosauridae | 0.38 | 0.09 | 0.25 | 0.22 | 0.36 | NA |
| Ornithomimidae | 8.46 | 6.10 | 3.17 | 4.25 | 1.46 | 2.46 |
| Pachycephalosauridae | 2.44 | 0.47 | 1.33 | 3.37 | 1.33 | NA |
| Paronychodon sp. | 2.82 | 0.38 | 5.27 | 0.88 | 10.54 | 2.81 |
| Richardoestesia sp. | 5.64 | 1.31 | 18.40 | 2.93 | 2.79 | 21.05 |
| Thescelosauridae | 7.71 | 5.91 | 2.42 | 7.55 | 4.51 | 0.70 |
| Troodontidae | 1.50 | 0.84 | 1.15 | 1.17 | 4.38 | 0.70 |
| Tyrannosauridae | 25.94 | 10.69 | 10.10 | 11.50 | 5.80 | 3.16 |
The northernmost formation, the Scollard, is the only one to show a relative abundance of over 5% for Ankylosauridae and Leptoceratopsidae
Ornithomimidae and Thescelosauridae potentially show a gradual decrease in abundance from N to S
Alvarezsauridae and Nodosauridae are consistently rare in all formations
Ceratopsidae and Hadrosauridae are consistently abundant. Hadrosauridae show a larger difference in relative abundance between northern and southern areas (generalist vs specialist?)
Dominant small theropod groups (Dromaeosauridae, Paronychodon sp., Richardoestesia sp., Troodontidae) appear to vary by area
Troodontidae appear fairly consistently across all formations, although they are the most common in the Lance Formation
Dromaeosauridae are the dominant small theropod in the Frenchman Formation (which is possibly an artifact of lumping “small theropod” fossils into Dromaeosauridae). This will be further investigated in the near future
Paronychodon sp. are the dominant small theropod in the Lance Formation
Richardoestesia sp. are the dominant small theropod in the Denver Formation and in the northwestern Hell Creek (Hell Creek NW); in Hell Creek SE, Dromaeosauridae are more abundant
Potentially irregular abundance of Tyrannosauridae specimens in the Scollard Formation (this may affect the relative abundances of other major groups in the Scollard)
This EDA revealed a number of potential trends to be pursued further as this project progresses. These trends still need to be tested statistically, but their presence at this stage is promising. If they prove statistically significant and persist with the addition of new specimens, they may provide valuable new data on the habitat preferences of, and ecological interactions among, various dinosaur groups immediately prior to the end-Cretaceous mass extinction.
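As an illustration of what such a test might look like, the sketch below runs a chi-squared test of independence on a small contingency table. This is only one possible approach and is not necessarily the method the project will adopt. The counts and the labels ("North", "South", "Family A", "Family B") are made up and do not come from the data set.

```r
# Made-up counts for illustration only; not taken from the data set
counts <- matrix(c(40, 10,
                   25, 25),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(subdivision = c("North", "South"),
                                 fam         = c("Family A", "Family B")))

# Tests whether family composition is independent of subdivision.
# chisq.test() applies a continuity correction to 2 x 2 tables by default.
test <- chisq.test(counts)

test$expected  # expected counts under independence (should not be too small)
test$p.value   # a small p-value suggests composition differs between subdivisions
```

With the real data, rare families produce small expected counts, so they would likely need to be pooled or handled with a different test.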