Lawrence Technological University
College of Arts and Science
Departments of Mathematics, Computer Sciences and Biology

Handouts

An Annotated List of Open Source Software Tools for Computational Biology

   There are many excellent packages available; these are simply my favorites as of the latter half of 2014. Please write to me with your suggestions. John M. Miller, M.D.

Perl

   Perl 5 is quite stable and has many, many helpful bioinformatics libraries. Perl 6 is an improvement in several ways but is not backward compatible with the large amount of existing Perl 5 code. If you are running OS X or most Linux distributions, Perl is usually already installed. If you are running Windows, use ActiveState's ActivePerl Community Edition.

Python

   Python is another good language. Python 2 has the largest collection of stable libraries. Python 3 has some improvements, but some important libraries do not yet have a stable Python 3 port. Python on the Raspberry Pi is a nice, inexpensive way to build a sensor network for your biology lab. If you are running OS X or most Linux distributions, Python is usually already installed. If you are running Windows, use ActiveState's ActivePython Community Edition.
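   One of the most common silent breaks between Python 2 and Python 3 is integer division: in Python 2, 4 / 6 is 0, while in Python 3 it is about 0.667. The minimal sketch below (gc_content is a hypothetical helper written for illustration, not part of any library) computes the fraction of G and C bases in a DNA sequence, and uses the standard __future__ imports so that it gives the same answer under Python 2.7 and Python 3.

```python
# Runs unchanged under Python 2.7 and Python 3.
# Without the "division" import, Python 2 would evaluate 4 / 6 as 0.
from __future__ import division, print_function


def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence string."""
    seq = seq.upper()
    if not seq:
        return 0.0  # avoid dividing by zero on an empty sequence
    gc = seq.count("G") + seq.count("C")
    return gc / len(seq)  # true division under both Python 2 and 3


if __name__ == "__main__":
    # 4 of the 6 bases are G or C
    print(gc_content("ATGCGC"))
```

   Writing new code this way is one way to keep a lab's scripts usable while its important libraries move to Python 3.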

Ruby

   Recent Ruby versions do not have the backward-compatibility problems of Perl and Python. Ruby on Rails integrates databases with a Web interface very well. The bindings for working with R and ImageMagick are good. If you are running OS X or some version of Linux, Ruby may already be installed. If you are running Windows, use RubyInstaller. While on the Ruby site, look at the "Getting Started" sidebar with "Ruby in Twenty Minutes" and "Ruby from Other Languages." JRuby is a handy option if your lab uses a lot of Java.

JavaScript

   Many problems with Web pages, including many violations of the accessibility guidelines at Access-Board.gov, come from bad JavaScript. However, many biologists spend a lot of time in Web browsers, and making those hours more efficient is reason enough to learn some sensible JavaScript. Those who never cared for Perl's one-liners may like JavaScript bookmarklets. JavaScript comes with most modern browsers but may have to be enabled in some institutions.

GNU C++

   For the most part, the higher-level, dynamically typed scripting languages will be easier for the biology team members to understand. However, if the sensor or controller network in your lab includes small boards like the Arduino, then GNU C++ is what you want. Installing the whole GNU Compiler Collection on Windows is not so easy; however, everything you need comes with the Arduino IDE, which is a fairly easy install on Windows.

Lisp and Clojure

   Common Lisp and Scheme are good dialects of this language. Emacs Lisp (eLisp) is the dialect built into Emacs, and Clojure is a dialect that runs in the Java Runtime Environment and works well with Java.

Scala

   Scala is an elegant language that, like JRuby, lets you use the extensive Java libraries while still expressing your algorithms in far fewer lines of code than Java requires.

Emacs

   Emacs is the programmer's editor that I use for all the examples here. Happily, the Bash shell's default command-line editing keys are the same as Emacs's. It works on almost any operating system, has full support for regular expressions, and includes a spell checker.

LaTeΧ

   Labs produce lab reports and papers, and LaTeΧ is excellent for both. For a lab running various operating systems, Emacs and TeX Live are essential. Eitan Gurari's DraTeX package comes with TeX Live and is excellent for illustrations.

R

   When you are tired of doing statistics with a spreadsheet and are still not satisfied with the quality of the graphics you want to publish, try R.

Octave

   GNU Octave is a very nice free alternative to MATLAB; its language is largely compatible with MATLAB's.

ImageMagick

   ImageMagick provides the tools to make superior graphics. After installing ImageMagick, you will want the bindings for your favorite language, e.g. RMagick for Ruby.

PostScript

   Sometimes it is easier and better to write PostScript directly than to use some program that writes it for you. In production you will use the PostScript interpreter built into your printing system; during development, try Ghostscript.

Processing

   DraTeX and LaTeΧ may be a stretch for any visual artists on your team. They will like Processing and, later, Processing.js.

Git

   Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency. In short, its architecture is one that should be at the core of medical records systems.
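   Part of what makes Git's design attractive for records that must not be silently altered is that it names every stored file by a hash of its contents. The minimal sketch below reproduces how Git computes the ID of a "blob" object in a default (SHA-1) repository: SHA-1 over a header of the form "blob <size>" and a NUL byte, followed by the file's raw bytes. The function name git_blob_id is hypothetical; in practice the git hash-object command does this for you.

```python
import hashlib


def git_blob_id(data):
    """Return the object ID Git (SHA-1 format) assigns to file contents `data` (bytes)."""
    # Git hashes a header "blob <size in bytes>\0" followed by the contents.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    print(git_blob_id(b"hello\n"))
    # Changing a single byte yields a completely different ID,
    # so any edit to a stored record is detectable.
    print(git_blob_id(b"hellp\n"))
```

   An empty file, for example, always gets the ID e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 in any SHA-1 Git repository, which is why identical records stored on different machines can be recognized as identical.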

Revised August 30, 2014