Global Sequence Alignment

The idea behind this exercise is to implement the algorithm for global alignment between two sequences in order to: (a) assemble the alignment matrix, (b) return the similarity value between two sequences, and (c) the best alignment between the two sequences.

A Possible Solution

Algorithm that assembles the alignment matrix and returns the similarity value:

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
Similarity Algorithm
    input: sequences s and t
    output: similarity between s and t

    m = |s| // size of s
    n = |t| // size of t

    // a = matrix m*n

    for i = 0 to m do
        a[i,0] = i * g // g = penalty for gaps

    for j = 0 to n do
        a[0,j] = j * g // g = penalty for gaps

    for i = 1 to m do
        for j = 1 to n do
            a[i,j] = max(a[i-1,j] + g,
                         a[i-1,j-1] + p(i,j),
                         a[i,j-1] + g)

    return a[m,n] // similarity between s and t

// function p(i,j) would be defined as follows:
// if s[i] = t[j]
//     p(i,j) = +1
// else
//     p(i,j) = -1

Recursive algorithm that returns the best alignment based on the alignment matrix assembled in Similarity Algorithm:

1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
Alignment Algorithm
    input: indices i,j, array a given by algorithm Similarity
    output: alignment in align-s, align-t, and length in len

    // variables align-s and align-t are global in algorithm

    if i = 0 and j = 0 then
        len = 0
    else if i>0 and a[i,j] = a[i-1] + g then
        Align(i-1, j, len)
        len = len + 1
        align-s[len] = s[i]
        align-t[len] = _
    else if i>0 and j>0 and a[i,j] = a[i-1,j-1] + p(i,j) then
        Align(i-1, j-1, len)
        len = len + 1
        align-s[len] = s[i]
        align-t[len] = t[j]
    else // has to be j>0 and a[i,j] = a[i,j-1] + g
        Align(i, j-1, len)
        len = len + 1
        align-s[len] = _
        align-t[len] = t[j]