Lawrence Technological University
College of Arts and Science
Department of Mathematics and Computer Sciences
Bioinformatics, Genomics and the Masters of Science in
Computer Science Program
The 1990s brought rapid advances to the life
sciences and new terms such as "bioinformatics." Much of the
productivity of the past 20 years has resulted from
collaboration between researchers in biology and computer
science. The explosion of data exchange over the Internet
fundamentally changed many collaborations in science and
engineering. I hope this page suggests reasons why you
might want to join this particular collaboration, from either
the computer side or the biology side.
MCS 5603, Introduction to Bioinformatics,
and MCS 5613, Genomics, are courses in the Masters of
Science in Computer Science Program that have developed into
interdisciplinary courses suitable for students of biology,
biomedical engineering and biomedical informatics.
These courses are hands-on introductions to using, building, searching, and leveraging
life science databases.
Interdisciplinary collaboration between computer science and
biology team members, the source of much of the recent rapid
gain in the life science knowledge base, is also considered.
The Web infrastructure for this collaboration will seem
familiar to those in both disciplines. The biology student
will learn a little about the workhorse of computer science,
the relational database management system built on the model
of Edgar Codd. The computer student will learn a lot about the
major working life science database systems, which are
hierarchical and extensively cross-linked without any regard
for Codd's 12 rules.[1]
Introduction to Bioinformatics takes a hands-on
approach to understanding the important algorithms used to
discover the information in the genetic code.
Genomics also takes a hands-on approach, but applies
it to analyzing the information collected with techniques like those in
Introduction to Bioinformatics, in an attempt to understand the
behavior of the organism.
Because these courses are intended both for
computer students who want to learn more biology and for biology
students who want to learn more computing, graduate status in
either discipline is not a prerequisite. It is possible to begin with either
Introduction to Bioinformatics or Genomics, but taking the
two simultaneously is challenging.
Genetics and Biostatistics are examples of
additional courses that are recommended for interested students.
Points of interest for computer science
students in Introduction to Bioinformatics and Genomics
- Other courses in the Masters of Science in Computer Science
program cover relational database systems. Introduction to
Bioinformatics and Genomics are good opportunities to
practice database access with various hierarchical models
and information interchange with XML-like languages.
- The theoretical limits on solving computationally
intensive and NP-complete problems are discussed in courses
like Theory of Computation. Such problems are common in
biology. Trying out a dynamic programming algorithm to
quantify the similarity of DNA sequences is an adventure in
the practice of solving such problems.
- One way to determine what an organism might do is to look at
pictures of the set of proteins in its cellular machinery.
Interesting problems include:
- Predicting the 3-dimensional shapes of proteins from
linear lists of their component amino acids.
- Rendering drawings of the shapes.
- Computing the similarities among 3-dimensional shapes
assembled from dissimilar linear sequences.
- Presenting a clear visual argument derived from a really
large set of data efficiently enough to provide timely
decision support is an art, a science, and a valuable skill.
Edward Tufte is one of the few masters in this area.
- Managing the huge data sets in the life sciences is of interest
to some potential employers:
- Government agencies.
- The insurance industry.
- Biology and biomedical researchers.
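Here is a minimal Python sketch of the dynamic programming idea mentioned above for quantifying the similarity of two DNA sequences. It computes a global alignment score in the style of the Needleman-Wunsch algorithm; the function name and the scoring values (match +1, mismatch -1, gap -2) are illustrative assumptions, not values used in the courses.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Return the best global alignment score of DNA strings a and b.

    Hypothetical helper; the default scores are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against an empty sequence costs one gap per base.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Each cell extends the best of its three neighboring sub-alignments.
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diag = score[i - 1][j - 1] + pair   # pair a[i-1] with b[j-1]
            up = score[i - 1][j] + gap          # gap inserted in b
            left = score[i][j - 1] + gap        # gap inserted in a
            score[i][j] = max(diag, up, left)
    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("GATTACA", "GCATGCT"))
```

Each table cell is filled once from its three neighbors, so the work grows with the product of the two sequence lengths. Practical tools refine this with substitution matrices, affine gap penalties, and a traceback step that recovers the alignment itself rather than only its score.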
Points of interest for life science students
in Introduction to Bioinformatics and Genomics
- Biology researchers now use fewer of the software tools
that were developed on Unix and never worked nicely on
their Windows laptops. Web tools have become better and
more widely available. This Web revolution has generally helped,
but control of changes to everyday lab tools has moved
away from the researcher. The consequences are explored
in Part 3 of a series on Ars Technica by John
Timmer, where he discusses the difficulty of
reproducing scientific results when the digital tools are in
a constant state of flux. The hacker slang for the work that helps
ameliorate such problems is data munging, an
area where life scientists can use all the help they can get.
- The amount of data to be munged is growing much faster than the
number of computer people with some biology background.
- Software vetted by the marketplace is already old.
- Jules J. Berman, a series editor for Jones and Bartlett, in
his Ruby Programming for Medicine and Biology,
articulately makes the case that at least one computer
language stands ready to help biology researchers (Section
"... Ruby is so easy to learn that healthcare workers and
biomedical researchers can do their own Ruby programming and
still find time to fulfill their many professional
obligations. You will probably find that Ruby programming
saves you time and increases your work productivity."
- Joseph Adler, in his R in a Nutshell, discusses some
reasons why being able to do a little bespoke programming
would benefit a biology researcher (e.g., page 179):
"Unfortunately for me, I chose to do data mining work
professionally. Everyone loves building models, drawing
charts, and playing with cool algorithms. Unfortunately
most of the time you spend on data analysis projects is
spent on preparing data for analysis. I'd estimate that 80%
of the effort on a typical project is spent on finding,
cleaning and preparing data for analysis. Less than 5% of
the effort is devoted to analysis. (The rest of the time is
spent on writing up what you did.)"
- When it comes time to publish your findings, remember this
from Edward Tufte's One-Day-Course:
"Good graphics with Microsoft Excel is an oxymoron."
- Practical languages like Ruby can help a biologist run
experiments that are far cheaper when done by a computer
than when done in a wet lab. A little bit of theory can
help the biologist decide which experiments can reasonably
be attempted in silico.[2]
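As one small, hypothetical example of data munging, the Python sketch below turns the sequence section of a GenBank-style flat file record into a plain DNA string. Python is used here only for illustration, in place of Ruby or R; the function name is made up, and the input layout (a base position number followed by blocks of ten lowercase bases, ending with a // line) is assumed from the GenBank ORIGIN format.

```python
import re
from typing import Iterable


def sequence_from_origin_lines(lines: Iterable[str]) -> str:
    """Join GenBank-style ORIGIN lines into one uppercase DNA string.

    Hypothetical helper. Assumes lines look like
    '        1 gatcctccat atacaacggt ...' and that '//' ends the record.
    """
    parts = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("ORIGIN"):  # section header, no bases
            continue
        if stripped == "//":               # end-of-record marker
            break
        # Drop the position number and spaces; keep only the letters.
        parts.append(re.sub(r"[^A-Za-z]", "", stripped))
    return "".join(parts).upper()
```

For example, the three lines "ORIGIN", "        1 gatcctccat atacaacggt", and "//" yield "GATCCTCCATATACAACGGT". Stopping at the first // keeps a later record in the same file from being glued onto this one.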
- [1] Codd actually had 13 rules, numbered
from 0 to 12 in typical computer science fashion. For example,
Rule 3 requires the systematic treatment of null values. A
relational database management system conforming to Rule 3
supports a representation of missing or inapplicable
information that is distinct from all regular values (that
is, not zero, a blank, 9999, or xxxx). Over the years,
curators of life science databases have used various ad
hoc approaches to problematic data because, as Dr. Julie
Zwiesler-Vollick has said, "Biology is messy."
- [2] For example, Alan Turing showed more than half a
century ago that a computer, even one bigger and faster than
those in use today, that knows exactly all the code (or genome)
of another computer (or organism) cannot always predict
what the second computer will do.
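The distinction drawn in the first note above, between a true null and ad hoc markers such as 9999 or xxxx, can be sketched in Python. In this minimal, hypothetical example the marker set is an assumption about one curator's conventions; the parser maps those markers to None, Python's rough analogue of a SQL NULL, so that a genuine zero is kept as data while missing values are left out of an average.

```python
from statistics import mean
from typing import Iterable, Optional

# Hypothetical ad hoc markers one curator might have used for missing data.
MISSING_MARKERS = {"", "9999", "xxxx"}


def parse_measurement(raw: str) -> Optional[float]:
    """Return a float, or None when the field is a missing-value marker."""
    text = raw.strip()
    if text.lower() in MISSING_MARKERS:
        return None
    return float(text)  # "0" stays a real value, distinct from missing


def mean_of_present(raw_values: Iterable[str]) -> Optional[float]:
    """Average only the values that are actually present."""
    present = [v for v in map(parse_measurement, raw_values) if v is not None]
    return mean(present) if present else None
```

Whether 9999 really means "missing" depends on documentation the curator may or may not have left; a system that follows Rule 3 avoids the guesswork by storing missing values as nulls in the first place.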
Revised January 3, 2011