Lecture in Computational Biology on 27 March 2002

Content of a lecture on 27 March 2002

In this lecture, we have introduced the notion of suffix tree trying also to indicate some of the possible uses of this data structure. The main body of the whole lecture has been the exposition of Ukkonen's linear time algorithm for constructing the suffix tree of any given string (over a finite alphabet). We have followed closely the exposition of this algorithm as given in Chapter 6 of the following book.

• Dan Gusfield, Algorithms on Strings, Trees, and Sequences, Computer Science and Computational Biology. Cambridge University Press, (1997)

In particular, to enter in the global architecture of the algorithm, we first propose an high level description of the algorithm, unfortunately taking cubic time to run. We then when trought a series of observations and implementational tricks as to finally get to a linear time implementation.

We proved the NP-completeness of the Double Digest Problem in Physical Mapping of chromosomes.

Homeworks

The algorithm we have seen today is quite complex. Maybe I should have spared you (and myself), but the point is that suffix trees are really important. Since the lecture has now been played, I would suggest you to read a description of Ukkonen's algorithm, to fix the ideas and fit the last details missed. It would also pay a lot to browse the many applications of suffix trees as listed in some text, read just the statement of the problems, and try to see how to solve the problem by means of suffix trees working on your own. For the kind of homeworks suggested here above the following book could serve as a companion.

• Dan Gusfield, Algorithms on Strings, Trees, and Sequences, Computer Science and Computational Biology. Cambridge University Press, (1997)

I have also suggested an homework that does not need any such companion (but of course you can look for a solution on several books, like the Garey & Johnson).
• Prove that KNAPSACK is NP-complete. (by a reduction from PARTITION);
• Prove that KNAPSACK and PARTITION are equivalent problems as far as polynomial time solvability is concerned.