Working in bioinformatics requires fluency across traditionally discrete disciplines, including genetics, genomics, coding, probability and statistics, and scientific communication. While a course must address each of these skill sets in turn, an exceptionally challenging—and valuable—experience for students is to undertake a collaborative class project that draws on this full skill set to formulate a scientific question, find and analyze existing data sets, and present findings through figures and writing.
Some students may arrive in class with ideas for a question they wish to address, and even with relevant, publicly available data sets in mind. For most students, though, it will be useful for the instructor to provide a list of suggested resources, both to inspire questions and to help students assess what is feasible. Some of the more complex problems presented on the Rosalind site may be suitable for group projects. Instructors seeking inspiration for developing problem sets of their own may wish to consult Phillip Compeau and Pavel Pevzner’s Bioinformatics Algorithms, which ties in with Rosalind and presents bioinformatics approaches in the context of challenging questions that the field has yet to resolve, and so has high potential to pique students’ interest.
For students ready to take a more active role in formulating questions, it will be useful to suggest data from a wide range of research areas and sources. The National Center for Health Statistics website, made available by the Centers for Disease Control and Prevention (CDC), offers a rich set of resources, including life tables for the US population that incorporate data from the 1800s to the present. MD Anderson Cancer Center: Public Data Sets, published online by the University of Texas, includes data on cancer, and Workshop on Microbiome Data Package in Bioconductor, published by the Bioconductor open source software community, offers the advantage of being readily accessible via R. Because the public health and societal impacts of COVID-19 will be deeply familiar to students, the World Health Organization’s online collection COVID-19: Global Literature on Coronavirus Disease and state-level public health data (for example, Coronavirus / Michigan Data, published on the Michigan.gov site) may be of particular interest. For global data, Our World in Data, published by the UK-based Global Change Data Lab, offers infection-incidence data annotated by nation and age group. Students may also explore health inequities using, for example, the DATA2020 search provided by the US Office of Disease Prevention and Health Promotion.