6.1. Outline
We begin by emphasising the role of biological sequences as the store of hereditary information.
We then introduce \(k\)-mers as a basic quantity for describing biological sequences, together with algorithms for counting them.
We develop the relationship between \(k\)-mers and functional motifs and describe the experiments used to identify such motifs.
We introduce Shannon’s entropy as a measure of the information content of sequences and extend it to the identification of binding motifs.
The odds ratio is introduced as a statistical measure for identifying enrichment relative to a background distribution.
We proceed to algorithms for examining whether sequences descend from a common ancestor.
The brute-force dotplot algorithm for comparing related sequences is contrasted with an elegant dynamic programming algorithm.