Resources for Undergraduate Courses in Bioinformatics: Using UNIX

By Diane P. Genereux

Using UNIX

With their first steps in R, students will be able to load data from text files assembled in Excel or Google Sheets. As they progress to more complex work with larger data sets, though, it will become important for them to have at least rudimentary facility with the Unix environment. How much attention an individual bioinformatics course gives to developing this skill set will depend in part on its goals. If the course aims principally to orient students to questions and concepts in bioinformatics, R alone may be sufficient. If it aims to prepare them for direct transition to professional work in bioinformatics, experience with Unix will be essential.

For true beginners, M.G. Venkateshmurthy’s classic Introduction to Unix and Shell Programming (now mainly available online as a Safari book) is an invaluable resource. While the book assumes that readers have experience in at least some areas of data analysis, the text is accessible even for readers who lack experience with the structure of data-analysis systems. A clear understanding of such structures can empower students to investigate for themselves how to address specific problems. For a guide organized instead around specific “how-tos,” William Shotts’s The Linux Command Line, 2nd Edition (available via O’Reilly Online Learning) offers step-by-step instructions for accessing and arranging data. Its screenshots will help students to spot the types of small typographical errors that can undermine an entire analysis. For more advanced students who are comfortable working independently, Developing Bioinformatics Computer Skills by Cynthia Gibas and Per Jambeck proceeds clearly but rapidly from initial setup of the Unix environment to its specific use for analysis of genomic sequence data.