This essay appeared in the May 2021 issue of Choice (volume 58 | issue 9)
Identifying patterns in large data sets is of increasing importance in many fields. The Covid-19 pandemic, in particular, has heightened public awareness of the capacity of large-scale data analysis to guide policy and protect human health. Estimating the basic reproduction number of SARS-CoV-2 and tracking its variation over time, for example, can help inform decisions about implementing lockdowns.(1) Moreover, the availability of reliable information can have an immediate impact on risk behavior,(2) and the availability of viral sequence data is helping track the emergence and spread of new, more contagious variants.(3)
Together, the growing visibility of big-data analysis and the burgeoning demand for workers with relevant skills have increased undergraduate interest in introductory courses in bioinformatics. Students seeking to enroll in such courses often come from one of two distinct academic backgrounds. Some are computer science students eager for an opportunity to apply their computational skills to biomedical issues of urgent importance for public health. Others are biology students who are well versed in conceptual areas of biology but have little experience in data analysis beyond standard statistical calculations applied to small data sets collected during the hands-on lab sessions required as part of their coursework.
Instructors designing introductory bioinformatics courses that enroll students from both these groups face an extreme version of a challenge that is nearly universal in teaching: how to design a curriculum that serves the needs of a varied group of students. This bibliographic essay outlines a core set of goals for such courses and gathers supporting books and online resources, with the aim of preparing students either for graduate study or for entry-level jobs in bioinformatics.(4) The essay focuses on the design of courses that use biomedical data (for example, information on disease incidence or biological sequence evolution) and help students acquire basic skills in the R statistical computing language, on the assumption that they will go on to learn Python, another language essential for bioinformatics, in a subsequent course. The central aim of the approach outlined here is to foster students’ skills in collaboration and problem solving for data analysis. Several of the resources discussed here will also be applicable to big-data courses across a wide range of disciplines and coding platforms.
1. Jan Brink Valentin, Henrik Møller, and Søren Paaske Johnsen, “The Basic Reproduction Number Can Be Accurately Estimated within 14 Days after Societal Lockdown: The Early Stage of the COVID-19 Epidemic in Denmark,” PLOS ONE 16, no. 2 (published February 16, 2021), https://doi.org/10.1371/journal.pone.0247021.
2. Riccardo Gallotti, Francesco Valle, Nicola Castaldo, Pierluigi Sacco, and Manlio De Domenico, “Assessing the Risks of ‘Infodemics’ in Response to COVID-19 Epidemics,” Nature Human Behaviour 4, no. 12 (October 2020): 1285–93, https://www.nature.com/articles/s41562-020-00994-6.
3. Meredith Wadman, “United States Rushes to Fill Void in Viral Sequencing,” Science 371, no. 6530 (February 12, 2021): 657–58, https://doi.org/10.1126/science.371.6530.657.
4. As extensively discussed in Paramjeet S. Bagga, “Development of an Undergraduate Bioinformatics Degree Program at a Liberal Arts College,” Yale Journal of Biology and Medicine 85, no. 3 (September 2012): 309–21, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3447195/.
Diane P. Genereux is a scientist at the Broad Institute of MIT and Harvard. She has taught courses in genetics, bioinformatics, and mathematical biology. Her research seeks to discover the genomic and epigenomic basis of cellular tolerance to variable blood glucose and body temperature in diverse mammalian species.