Global/Local/Multiple Alignments
by Boyang Wei
GlobalAlignment
How to run: • GlobalAlignment -s seq1 seq2 (-m)
• GlobalAlignment -f file1 file2 (-m)
Sample output:
rns202-15.cs.stolaf.edu% GlobalAlignment -s ABB AB
ABB
AB -
ABB
A - B
GlobalAlignment
// represent each box in the matrix
struct box {
    int row;            // row index
    int col;            // column index
    int score;          // best score
    vector<box*> from;  // where does the score come from
};
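One way the box struct could be used is sketched below: a Needleman-Wunsch-style fill in which each cell records every neighbour that yields its best score, so that ties can later be traced back into several equally good alignments. The scoring values (match +2, mismatch -1, gap -1) and the name fillGlobal are assumptions made for illustration; the slides do not give the program's actual values.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// One cell of the dynamic-programming matrix (mirrors the struct above).
struct box {
    int row = 0;
    int col = 0;
    int score = 0;
    std::vector<box*> from;  // every predecessor that gives the best score
};

// Assumed scoring scheme, not taken from the original program.
const int kMatch = 2, kMismatch = -1, kGap = -1;

// Fill a (|b|+1) x (|a|+1) matrix: columns follow a, rows follow b.
// The `from` pointers refer to cells inside the returned matrix, so the
// result should be moved around, not copied, while they are in use.
std::vector<std::vector<box>> fillGlobal(const std::string& a,
                                         const std::string& b) {
    std::vector<std::vector<box>> m(b.size() + 1,
                                    std::vector<box>(a.size() + 1));
    for (std::size_t r = 0; r <= b.size(); ++r) {
        for (std::size_t c = 0; c <= a.size(); ++c) {
            box& cur = m[r][c];
            cur.row = static_cast<int>(r);
            cur.col = static_cast<int>(c);
            if (r == 0 && c == 0) continue;  // origin keeps score 0

            // Collect every legal predecessor with the score it would give.
            std::vector<std::pair<box*, int>> cand;
            if (r > 0 && c > 0) {
                int s = (a[c - 1] == b[r - 1]) ? kMatch : kMismatch;
                cand.push_back({&m[r - 1][c - 1], m[r - 1][c - 1].score + s});
            }
            if (r > 0) cand.push_back({&m[r - 1][c], m[r - 1][c].score + kGap});
            if (c > 0) cand.push_back({&m[r][c - 1], m[r][c - 1].score + kGap});

            int best = cand[0].second;
            for (const auto& p : cand) best = std::max(best, p.second);
            cur.score = best;
            // Keep all predecessors that tie for the best score.
            for (const auto& p : cand)
                if (p.second == best) cur.from.push_back(p.first);
        }
    }
    return m;
}
```

Under these assumed scores, calling fillGlobal("ABB", "AB") leaves the bottom-right cell with a score of 3 and two predecessors, so a traceback from that cell would yield two alignments.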
LocalAlignment
How to run: • LocalAlignment -s seq1 seq2 (-m)
• LocalAlignment -f file1 file2 (-m)
Sample output:
rns202-15.cs.stolaf.edu% LocalAlignment -f seq1.txt seq2.txt
AC|TAC|T
 G|TAC|
LocalAlignment
Sample output with matrix printed:
rns202-15.cs.stolaf.edu% LocalAlignment -f seq1.txt seq2.txt -m
AC|TAC|T
 G|TAC|
  - A C T A C T
- 0 0 0 0 0 0 0
G 0 0 0 0 0 0 0
T 0 0 0 2 1 0 2
A 0 2 1 1 4 3 2
C 0 1 4 3 3 6 5
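The printed matrix is consistent with a Smith-Waterman fill that uses match +2, mismatch -1 and gap -1. Those values are inferred from the numbers shown, not stated in the slides, and the name fillLocal is hypothetical. A minimal sketch under that assumption:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Scores inferred from the printed matrix; treat them as assumptions.
const int kMatch = 2, kMismatch = -1, kGap = -1;

// Smith-Waterman fill: columns follow colSeq, rows follow rowSeq,
// and no cell is allowed to drop below zero.
std::vector<std::vector<int>> fillLocal(const std::string& colSeq,
                                        const std::string& rowSeq) {
    std::vector<std::vector<int>> m(
        rowSeq.size() + 1, std::vector<int>(colSeq.size() + 1, 0));
    for (std::size_t r = 1; r <= rowSeq.size(); ++r) {
        for (std::size_t c = 1; c <= colSeq.size(); ++c) {
            int sub = (rowSeq[r - 1] == colSeq[c - 1]) ? kMatch : kMismatch;
            int diag = m[r - 1][c - 1] + sub;  // match or mismatch
            int up = m[r - 1][c] + kGap;       // gap in colSeq
            int left = m[r][c - 1] + kGap;     // gap in rowSeq
            m[r][c] = std::max({0, diag, up, left});
        }
    }
    return m;
}
```

Calling fillLocal("ACTACT", "GTAC") reproduces the rows of the matrix shown; the highest cell, 6, marks the end of the aligned region TAC.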
Global & Local Alignment
Input: • both programs require exactly two sequences as input
• not limited to A, T, G, C
• characters do not need to be uppercase
• all kinds of characters will work
MultipleAlignment
How to run: • MultipleAlignment -s seq1 seq2 seq3 ...
• MultipleAlignment -f file
Sample output:
rns202-15.cs.stolaf.edu% MultipleAlignment -s ATGC ATG ATC
ATGC
ATG -
AT - C
MultipleAlignment
// class to store a character
// its objects represent the letters in the sequence
// NOTE! since the letter class is set up in this way,
// it will store characters other than A, T, G, C as a gap,
// so this program only works for input consisting of A, T, G, C
struct letter {
    float A;    // percentage of A in this letter
    float T;    // percentage of T in this letter
    float G;    // percentage of G in this letter
    float C;    // percentage of C in this letter
    float gap;  // percentage of gap in this letter
    ......
}
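To illustrate the NOTE above, here is a sketch of how a single input character might be turned into a letter. The five-argument constructor follows the call that appears in sum on a later slide; the helper name fromChar is hypothetical, and treating lowercase input as a gap is an assumption of this sketch, since the slides do not say how MultipleAlignment handles case.

```cpp
// Profile column: fraction of each base (and of gaps) at one position.
struct letter {
    float A, T, G, C, gap;
    letter(float a, float t, float g, float c, float gp)
        : A(a), T(t), G(g), C(c), gap(gp) {}
};

// Hypothetical helper: map one input character to a letter.
// Anything other than 'A', 'T', 'G', 'C' falls through to a pure gap,
// which is why other alphabets cannot be represented.
letter fromChar(char ch) {
    switch (ch) {
        case 'A': return letter(1, 0, 0, 0, 0);
        case 'T': return letter(0, 1, 0, 0, 0);
        case 'G': return letter(0, 0, 1, 0, 0);
        case 'C': return letter(0, 0, 0, 1, 0);
        default:  return letter(0, 0, 0, 0, 1);
    }
}
```

With this mapping, fromChar('N') is indistinguishable from a real gap, so an input such as ANT would be aligned as if it were A-T.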
MultipleAlignment
// a self-defined string class
// to store the sequence/string as a sequence of letter objects
struct sequence {
    vector<letter> seq;       // the sequence
    vector<int> gapPosition;  // the gap positions at the end of aligning
    int prev[2];              // index of the previous two sequences
                              // (sometimes a sequence may be generated by
                              // combining two sequences)
    ......
}
MultipleAlignment
// calculate the score for two letters
// either a match, mismatch, or partial match
float calculateScore(letter l1, letter l2) {
    float matchPercent = min(l1.A, l2.A) + min(l1.T, l2.T)
                       + min(l1.G, l2.G) + min(l1.C, l2.C);
    float misMatchPercent = 1 - matchPercent;
    return matchPercent*matchScore + misMatchPercent*misMatchScore;
}
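A short worked example may make the partial-match case concrete. The block below restates the function in self-contained form; the values matchScore = 2 and misMatchScore = -1 are assumptions chosen for the arithmetic, since the slides do not say what the program uses.

```cpp
#include <algorithm>

// Profile column: fraction of each base (and of gaps) at one position.
struct letter {
    float A, T, G, C, gap;
    letter(float a, float t, float g, float c, float gp)
        : A(a), T(t), G(g), C(c), gap(gp) {}
};

// Assumed values for illustration only.
const float matchScore = 2.0f;
const float misMatchScore = -1.0f;

// The overlap of the two base distributions counts as a match;
// everything else (including any gap fraction) counts as a mismatch.
float calculateScore(const letter& l1, const letter& l2) {
    float matchPercent = std::min(l1.A, l2.A) + std::min(l1.T, l2.T) +
                         std::min(l1.G, l2.G) + std::min(l1.C, l2.C);
    float misMatchPercent = 1 - matchPercent;
    return matchPercent * matchScore + misMatchPercent * misMatchScore;
}
```

With these assumed scores, a pure A against a pure A gives 2, a pure A against a pure T gives -1, and a pure A against a column that is half A and half T gives 0.5 × 2 + 0.5 × (-1) = 0.5. Two pure gaps share no base fraction, so they also score -1.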
MultipleAlignment
// combine two letters to generate a new one
letter sum(letter l1, letter l2) {
    // if one of them is a gap, return the other one
    if (l1.gap == 1)
        return l2;
    else if (l2.gap == 1)
        return l1;
    // otherwise, combine the percentages
    else {
        letter l((l1.A+l2.A)/2,
                 (l1.T+l2.T)/2,
                 (l1.G+l2.G)/2,
                 (l1.C+l2.C)/2,
                 (l1.gap+l2.gap)/2);  // could just put 0 for this line
        return l;
    }
}
MultipleAlignment
Input: • requires at least one sequence as input
• if read from file:
o first line: the number of input sequences
o remaining lines: one sequence per line
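The file layout described above could be parsed along these lines. This is a sketch only: readSequences is a hypothetical name, and taking a std::istream in place of a file name is an assumption made so the function can be tried on an in-memory string as well as on a std::ifstream.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Read "<count>" followed by <count> sequences, one per line.
// Returns fewer sequences than announced if the input ends early,
// and an empty vector if the count is missing or negative.
std::vector<std::string> readSequences(std::istream& in) {
    std::vector<std::string> seqs;
    int count = 0;
    if (!(in >> count) || count < 0) return seqs;
    std::string line;
    // operator>> skips the newline after each sequence.
    while (static_cast<int>(seqs.size()) < count && (in >> line))
        seqs.push_back(line);
    return seqs;
}
```

For example, feeding it a std::istringstream holding "3\nATGC\nATG\nATC\n" yields the three sequences ATGC, ATG and ATC.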