BIOL 5404 Biological Data Science in R
Course Description: This course introduces the practical skills needed to answer biological research questions with large and complex datasets in a reproducible manner. There is a growing demand for data analyses that are openly available, well annotated, and reusable by others. In this course, we will cover how to tidy, transform, visualize, model, and communicate datasets in R, with a focus on large and complex data. The course is aimed at graduate students with some prior experience with statistics at the undergraduate or graduate level, but no prior experience with R is required.
| Professor: Dr. Roz Dakin | Office: CTTC 4440 |
| Fridays, 11:35 - 14:25. In person, synchronous course | Southam Hall 303 |
Goals for this Course
- Develop proficiency in the R programming language
- Practice using R to explore, wrangle, graph, and model realistic (complex) research datasets
- Gain experience troubleshooting (analyzing & solving errors in R code)
- Apply (learn to use) a new data technique and/or package in R
- Create an independent data analysis following principles of reproducible research
- Use R markdown to communicate your work
Resources
We will use these two (free!) books:
- Introduction to Data Science by Rafael A. Irizarry
- R for Data Science 2e by Wickham, Çetinkaya-Rundel & Grolemund
We may also use Advanced R 2e by Hadley Wickham
Assignments
Evaluation
| Item | Weight | Details |
|---|---|---|
| Quizzes | 10% | Short MC quizzes on Brightspace each week |
| Assignments | 40% | 6 total, only your best 5 will count |
| Peer evaluation | 5% | On each assignment, you will grade your peers and provide feedback on their code |
| Independent report | 45% | Communicate the results of your own data wrangling and exploratory analysis, done individually |
Weekly Schedule
| Week | Date | Topics | Activities |
|---|---|---|---|
| 1 | Jan 10 | Intro, philosophy, tidy data | |
| 2 | Jan 17 | R basics | |
| 3 | Jan 24 | Programming basics | |
| 4 | Jan 31 | tidyverse & data | |
| 5 | Feb 7 | tidyverse & data | |
| 6 | Feb 14 | Visualization, part 1 | |
| | | WINTER BREAK | |
| 8 | Feb 28 | Visualization, part 2 | |
| 9 | Mar 7 | Exploratory data analysis | |
| 10 | Mar 14 | **No formal meeting** | |
| 11 | Mar 21 | Strings and dates | |
| 12 | Mar 28 | markdown and git | |
| | Apr 13 | Independent report due | |
Late Policy
Work that is not submitted by the deadline will receive a grade of 0, unless you have made a specific agreement with me before the deadline.
Academic Integrity
As a class, we will comply with Carleton’s guidelines on academic integrity.
It’s OK (and encouraged) to work with other students on weekly assignments and quizzes in this course. I strongly encourage you to help each other out when troubleshooting in R. When working together, you should strive to contribute so that working together is mutually beneficial.
The exploratory data analysis and independent report must represent your own original investigation of the data (though you can obtain the data anywhere you wish – there is no requirement that you generated the data).
As part of this course, you will also be expected to evaluate and provide feedback on work by your peers. Please respect each other. I expect you to contribute by providing your peers with thoughtful feedback, and to be fair and honest in your peer evaluations.
Statement on ChatGPT/Generative AI usage
I encourage you to use AI tools to help with troubleshooting in R.
As our understanding of the uses of AI and its relationship to student work and academic integrity continues to evolve, students are required to discuss any use of AI not described here with the course instructor, to ensure that it supports the learning goals for the course.
Accommodations
Please review the course schedule and contact me with any requests for academic accommodation during the first two weeks of class, or as soon as possible after the need is known to exist.