Seqcomp Program
Seqcomp is a small program that runs a simple yet exhaustive comparison of two genomic sequences.
Sequence Comparison Algorithm
The basis of the seqcomp algorithm is a comparison of a region, or window, of fixed size from one sequence against a window of the same size in the second sequence. Each base in the window from the first sequence is compared to the base at the same position in the window from the second sequence (e.g. 1st vs. 1st, 2nd vs. 2nd, etc.). Each base that matches scores one point; each mismatch scores zero.
All windows from the first sequence are compared to all windows in the second sequence. The first window from the first sequence is compared against all windows in the second, then the second window in the first sequence (shifted forward one base) is compared to all windows in the second sequence, and so on, until all possible windows have been compared against each other.
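The per-window scoring described above can be sketched in a few lines of Python. This is only an illustration, not seqcomp's actual source: the function name `window_score` is hypothetical, and the sketch assumes bases are compared as exact, case-sensitive characters.

```python
def window_score(window_a: str, window_b: str) -> int:
    """Count positions where two equal-length windows have the same base."""
    if len(window_a) != len(window_b):
        raise ValueError("windows must be the same size")
    # One point per matching position, zero per mismatch.
    return sum(1 for a, b in zip(window_a, window_b) if a == b)


print(window_score("ACGTACGT", "ACGAACGA"))  # 6 of 8 positions match
```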
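The all-against-all scan might look like the following minimal sketch. The name `all_window_scores` and the 0-based start positions are assumptions made for illustration. In this naive form the work grows with (windows in the first sequence) x (windows in the second sequence) x (window size).

```python
from typing import Iterator, Tuple


def all_window_scores(seq_a: str, seq_b: str, window: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (start_a, start_b, score) for every window pair (0-based starts)."""
    n_a = len(seq_a) - window + 1  # number of windows in the first sequence
    n_b = len(seq_b) - window + 1  # number of windows in the second sequence
    for i in range(n_a):
        win_a = seq_a[i:i + window]
        for j in range(n_b):
            win_b = seq_b[j:j + window]
            # One point per matching position, zero per mismatch.
            score = sum(1 for x, y in zip(win_a, win_b) if x == y)
            yield i, j, score


pairs = list(all_window_scores("ACGTAC", "ACGAAC", 4))
print(len(pairs))  # 3 windows in each sequence, so 9 comparisons
```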
Algorithm Example
Take 2 sequences, A and B. Choose a window of some size, say 10 bp. Scan across all the windows in sequence A. For each window in A, compare it to all the windows in B.
A: GGACGCCCCAGGACACGACTGCTTTCTTCACCACACCTCTGACAGGACAGGACAGGGAGGAGGGGTAGAG
B: GGACAAATCAGGCCGGACAGGAGAGGGAGGGGTGGGGGACAGTGGGTGGGGATTCAGACTGCCAGCACTT
A1: GGACGCCCCA
    ||||    ||   Score = 6
B1: GGACAAATCA

A1: GGACGCCCCA
    |      |     Score = 2
B2: GACAAATCAG
And so on, A1 vs. B3...Bn, where n = (length of B) - (window size) + 1 is the number of windows in B.
And then A2 vs. B1...Bn
etc.
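As a check on the worked example, the short sketch below recomputes the A1 vs. B1 and A1 vs. B2 comparisons for the two sequences given above and prints the match line under each pair. The helper name `match_line` is hypothetical. With these 70 bp sequences and a 10 bp window it reports scores of 6 and 2, and 61 windows per sequence.

```python
SEQ_A = "GGACGCCCCAGGACACGACTGCTTTCTTCACCACACCTCTGACAGGACAGGACAGGGAGGAGGGGTAGAG"
SEQ_B = "GGACAAATCAGGCCGGACAGGAGAGGGAGGGGTGGGGGACAGTGGGTGGGGATTCAGACTGCCAGCACTT"
WINDOW = 10


def match_line(window_a: str, window_b: str) -> str:
    """Return '|' under each matching base and a space under each mismatch."""
    return "".join("|" if a == b else " " for a, b in zip(window_a, window_b))


a1 = SEQ_A[0:WINDOW]  # first window of sequence A
for label, start in (("B1", 0), ("B2", 1)):
    b = SEQ_B[start:start + WINDOW]
    line = match_line(a1, b)
    print(f"A1: {a1}")
    print(f"    {line}  Score = {line.count('|')}")
    print(f"{label}: {b}")

# Number of windows per sequence: n = length - window + 1
print(len(SEQ_B) - WINDOW + 1)
```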
Keep the 10 best matches for each window in A. If there is a tie for 10th place, keep all matches with that score.
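One way to express the keep-the-best-10-with-ties rule is sketched below. It is an assumption about how the rule could be coded, not a description of seqcomp's internals. The function name `keep_best` and the `(start_b, score)` tuple layout are hypothetical, and `keep` is assumed to be at least 1.

```python
from typing import List, Tuple


def keep_best(hits: List[Tuple[int, int]], keep: int = 10) -> List[Tuple[int, int]]:
    """Keep the top `keep` (start_b, score) hits, plus any tied with the last kept score."""
    ranked = sorted(hits, key=lambda hit: hit[1], reverse=True)
    if len(ranked) <= keep:
        return ranked
    cutoff = ranked[keep - 1][1]  # score of the hit in last kept place
    # Everything scoring at least the cutoff survives, so ties are all kept.
    return [hit for hit in ranked if hit[1] >= cutoff]


hits = [(0, 6), (1, 2), (2, 5), (3, 5), (4, 1)]
print(keep_best(hits, keep=2))  # [(0, 6), (2, 5), (3, 5)]
```

The small `keep=2` call only keeps the demonstration short; the default of 10 matches the rule stated above.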
Command:
seqcomp human_mck_pro.fa mouse_mck_pro.fa 50 10000 25 1 HvM_ckm_small_pro.xml
Seven Arguments:
| First Sequence | human_mck_pro.fa |
| Second Sequence | mouse_mck_pro.fa |
| Window Size | 50 |
| Max Sequence to read in | 10000 |
| Hard Threshold | 25 |
| Output Type | 1 |
| Output File | HvM_ckm_small_pro.xml |
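For scripted runs, the positional arguments can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of seqcomp; its parameter names are made up for illustration, and it assumes only the argument order shown in the example command.

```python
from typing import List


def build_command(first_seq: str, second_seq: str, window_size: int,
                  max_read: int, hard_threshold: int, output_type: int,
                  output_file: str, program: str = "seqcomp") -> List[str]:
    """Assemble the positional arguments in the order used by the example command."""
    return [program, first_seq, second_seq, str(window_size), str(max_read),
            str(hard_threshold), str(output_type), output_file]


cmd = build_command("human_mck_pro.fa", "mouse_mck_pro.fa", 50, 10000, 25, 1,
                    "HvM_ckm_small_pro.xml")
print(" ".join(cmd))
```

The resulting list could be passed to Python's `subprocess.run(cmd, check=True)`, assuming a `seqcomp` executable is on the PATH.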
Tristan De Buysscher,
tristan family.caltech.edu