Dotplot – Alignment of sequences related by descent from a common ancestor

6.16. Exercises

  1. Implement the simple dotplot algorithm. Write a function that takes the following two sequences and returns an array with 1 where the sequences match and 0 where they do not.

    seq1 = "CCTCTGAATAGGAGACAAGACCATGCAGGCATACTAGGTGGCGCACATAGATTT"
    seq2 = "CCTCTGAATAGGCGACGAAGACAAGACCATGCAGGCATAGGTGGCGCACATAGATTT"
    
  2. Write a function that returns the Cartesian coordinates of the matches for the same sequences, with the \(x\) and \(y\) components returned separately.

    You can check that your algorithm performs correctly using a smaller data set like the one below, where \(x\) indexes positions in seq1 and \(y\) indexes positions in seq2.

    seq1 = "CCAAA"
    seq2 = "CCTCAG"
    
    x=(0, 0, 0, 1, 1, 1, 2, 3, 4)
    y=(0, 1, 3, 0, 1, 3, 4, 4, 4)
    
  3. Plot the Cartesian coordinates using a scatter plot, with axis labels representing the sequence names.
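
As a reference point for checking your own answers, here is a minimal sketch of one way the three exercises could be approached. It is not the only valid solution. The function names `dotplot_matrix`, `match_coords` and `plot_coords` are hypothetical, a plain list of lists stands in for the array, and the plotting helper assumes matplotlib is installed.

```python
def dotplot_matrix(seq1, seq2):
    """Return a list of rows where entry [i][j] is 1 if seq1[i] == seq2[j], else 0."""
    return [[1 if a == b else 0 for b in seq2] for a in seq1]


def match_coords(seq1, seq2):
    """Return separate x and y tuples, one pair per matching position.

    x holds indices into seq1 and y holds indices into seq2.
    """
    x, y = [], []
    for i, a in enumerate(seq1):
        for j, b in enumerate(seq2):
            if a == b:
                x.append(i)
                y.append(j)
    return tuple(x), tuple(y)


def plot_coords(x, y, xlabel="seq1", ylabel="seq2"):
    """Scatter plot of the match coordinates (assumes matplotlib is available)."""
    import matplotlib.pyplot as plt  # imported lazily so the rest runs without it

    fig, ax = plt.subplots()
    ax.scatter(x, y)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    return fig


# Check against the small data set given in exercise 2.
x, y = match_coords("CCAAA", "CCTCAG")
assert x == (0, 0, 0, 1, 1, 1, 2, 3, 4)
assert y == (0, 1, 3, 0, 1, 3, 4, 4, 4)
```

The nested loop compares every position in one sequence with every position in the other, so the work grows with the product of the two sequence lengths. That is fine for the sequences in these exercises.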


Citations

[GM70]

Adrian J. Gibbs and George A. McIntyre. The Diagram, a Method for Comparing Sequences. Its Use with Amino Acid and Nucleotide Sequences. European Journal of Biochemistry, 16:1–11, 1970. URL: http://doi.wiley.com/10.1111/j.1432-1033.1970.tb01046.x.