Global Sequence Alignment

The idea behind this exercise is to implement the algorithm for global alignment between two sequences in order to: (a) assemble the alignment matrix, (b) return the similarity value between two sequences, and (c) the best alignment between the two sequences.

A Possible Solution

Algorithm that assembles the alignment matrix and returns the similarity value:

1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26. 27. 28. Similarity Algorithm input: sequences s and t output: similarity between s and t m = |s| // size of s n = |t| // size of t // a = matrix m*n for i = 0 to m do a[i,0] = i * g // g = penalty for gaps for j = 0 to n do a[0,j] = j * g // g = penalty for gaps for i = 1 to m do for j = 1 to n do a[i,j] = max(a[i-1,j] + g, a[i-1,j-1] + p(i,j), a[i,j-1] + g) return a[m,n] // similarity between s and t // function p(i,j) would be defined as follows: // if s[i] = t[j] // p(i,j) = +1 // else // p(i,j) = -1

Recursive algorithm that returns the best alignment based on the alignment matrix assembled in Similarity Algorithm:

1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 21. 22. 23. Alignment Algorithm input: indices i,j, array a given by algorithm Similarity output: alignment in align-s, align-t, and length in len // variables align-s and align-t are global in algorithm if i = 0 and j = 0 then len = 0 else if i>0 and a[i,j] = a[i-1] + g then Align(i-1, j, len) len = len + 1 align-s[len] = s[i] align-t[len] = _ else if i>0 and j>0 and a[i,j] = a[i-1,j-1] + p(i,j) then Align(i-1, j-1, len) len = len + 1 align-s[len] = s[i] align-t[len] = t[j] else // has to be j>0 and a[i,j] = a[i,j-1] + g Align(i, j-1, len) len = len + 1 align-s[len] = _ align-t[len] = t[j]