
Resources for Undergraduate Courses in Bioinformatics: Coding in R

By Diane P. Genereux

Coding in R

For many biology majors, coding will be the most daunting aspect of an introductory bioinformatics course. The resources suggested here assume that courses will aim for students to develop basic skills in R, a statistical computing language that is a mainstay of bioinformatic analysis and data visualization. These materials were selected with the goal of minimizing apprehension among students without coding experience. Resources recommended for use earlier in the course guide students through the basic operations of R. The selection of resources for use later in the course is informed by the conviction that trial and error is a major part of the programming process, even for experienced professionals. To help students become comfortable with this “experimental” approach to problem solving, this section focuses on resources that help students learn to decide for themselves how to address a given challenge and how to find appropriate guidance, skills that offer greater value than knowledge of any specific set of commands.

For biology students in particular, the most valuable introductions to R are those that emphasize the power of even simple coding skills, present approaches clearly, and offer examples rooted in biology. Getting Started with R: An Introduction for Biologists by Andrew Beckerman, Dylan Childs, and Owen Petchey is highly effective as a rudimentary introduction to coding, but offers few specific biological examples. By contrast, Practical R for Biologists by Donald Quicke, Buntika Butcher, and Rachel A. Kruft Welton is comparatively light in its presentation of rudimentary coding, but offers a rich set of specific examples and associated data sets that demonstrate the capacity of R through case studies from ecology and evolution. In doing so, this work is exceptional in linking potentially arcane coding processes to specific concepts and questions that may be familiar to biology majors. Notably, its examples embrace the “messiness” typical of biological data, orienting students from the outset to the realities of data analysis. Because many introductory biology courses encourage students to use Excel for data analysis, another option is for a first course in bioinformatics to focus explicitly on helping students translate familiar Excel formulae into R code. Instructors taking this approach may wish to use R for Excel Users by John Taveras, which approaches data formatting and basic analyses from the perspective of a seasoned Excel user.

Though devoting some course time to introducing R will make it easier for students to transition into graduate study or jobs in bioinformatics, students can certainly dive directly into at least small-scale analysis of biological sequence data using only the web interfaces available from the National Center for Biotechnology Information (NCBI). Indeed, an approach that does not rely on coding, at least during the first few weeks of a course, may be preferable if the majority of students arrive without coding experience. Computational Biology: A Hypertextbook by Scott Kelley and Dennis Didulo introduces fundamental processes such as DNA sequence alignment and sequence-similarity searches using only “what you see is what you get” (WYSIWYG) web-based interfaces, offering clear examples that students can follow and then extend to address questions of their own. A coding-free introduction may both reduce barriers to entry for some students and, paradoxically, highlight the value of learning to code in the future, as students are likely to become frustrated with the limitations of existing web-based tools and come to appreciate the power of coding for writing their own applications. On the web, NCBI Training & Tutorials also offers a clear introduction to the agency’s extensive online data repositories, but these tutorials typically assume that users already have a deep conceptual understanding and require guidance only for implementation. The t-BioInfo site published by Pine Biotech (a data-analysis tools startup based in New Orleans) offers some online tutorials for beginners that may be especially useful for courses that focus on bioinformatics without coding.