Lecture in Computational Biology on 14 March 2002

Content of the lecture on 14 March 2002

First, I have exposed in detail the Dynamic Programming solution to the maximum length common subsequence problem introduced during the previous lecture.

Next, I have both informally and formally introduced some problems of similarity among strings. We saw three main kinds of such problems:

• global similarity;
• local similarity;
• semiglobal similarity.
We have motivated the role of all three kinds of similarity problems here above in the realm of Computational Biology.
Considering only global similarity, we have proposed a well defined problem through the notions of alignment and scoring of an alignment. The problem is: given two sequences, produce an alignment for them which achieves the maximum score.

Homeworks

A very nice and quite high challange that we posed as possible homework already for the last time was the following: can you reduce the space complexity for finding a common subsequence of maximum length?

We decided that you can rest on this problem for the week-end.

I proposed an open research problem: find the minimum number of brakes to make a given line system good (in poly-time, of course). More details on this open problem, as well as other intriguing open problems in Computational Biology can be found in this paper (.pdf file) appeared in 2000 on Journal of Algorithms.

There is another problem that we begun proposing already in the previous lecture: try to put the algorithms seen for the maximum length common subsequence problem into real code. If you did not, then you can download from this Dynamic Programming codes page the c++ coding, done by Marco Rospocher, of the three solutions considered:

• simple recursion (exponential);
• recursion with memoization (polynomial);
• Dynamic Programming (polynomial).
At the same page you can also find the same three levels of solution and coding for another example (chain matrix multiplication) of Dynamic Programming as discussed in the Cormen et al. book (see reference here below). At the same page you can also find the proposal of coding one of the three (or all the three approaches) for the editing distance problem.

A last possible homework that we proposed was to design a dynamic programming algorithm for the global comparison among two strings problem as formalized through the notions of alignments and scores of alignments. Let me repeat what the formulation of the problem actually is: given two sequences, produce an alignment for them which achieves the maximum score.

More notions, references and details on Dynamic Programming can be found in the "Dynamic Programming" chapter of the book

• Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L., Introduction to Algorithms, MIT Press, Cambridge, MA; McGraw-Hill Book Co., New York (1990) ISBN 0-262-03141-8
An extremely good introduction to the problem of the similarity among two or more strings is given in Chaper 3 of the book recommended as a companion for this introductory course:
• J.C. Setubal and J. Meidanis, Introduction to Computational Molecular Biology, PWS Publishing Company. An International Thomson Publishing Company, (1997)

Have a nice reading and self-study!

 14-March-2002 © Dipartimento di Matematica - Università di Trento