8.7. Genetic Codes
A genetic code specifies how nucleotide triplets (codons) map to amino acids and stop signals during translation. cogent3
provides a dedicated object for handling genetic code information. The genetic codes included with cogent3
are listed in the following table.
Code ID | Name |
---|---|
1 | Standard |
2 | Vertebrate Mitochondrial |
3 | Yeast Mitochondrial |
4 | Mold Mitochondrial; Protozoan Mitochondrial; Coelenterate Mitochondrial; Mycoplasma; Spiroplasma |
5 | Invertebrate Mitochondrial |
6 | Ciliate Nuclear; Dasycladacean Nuclear; Hexamita Nuclear |
9 | Echinoderm Mitochondrial; Flatworm Mitochondrial |
10 | Euplotid Nuclear |
11 | Bacterial, Archaeal and Plant Plastid |
12 | Alternative Yeast Nuclear |
13 | Ascidian Mitochondrial |
14 | Alternative Flatworm Mitochondrial |
15 | Blepharisma Macronuclear |
16 | Chlorophycean Mitochondrial |
21 | Trematode Mitochondrial |
22 | Scenedesmus obliquus Mitochondrial |
23 | Thraustochytrium Mitochondrial |
24 | Rhabdopleuridae Mitochondrial |
25 | Candidate Division SR1 and Gracilibacteria |
26 | Pachysolen tannophilus Nuclear |
27 | Karyorelict Nuclear |
28 | Condylostoma Nuclear |
29 | Mesodinium Nuclear |
30 | Peritrich Nuclear |
31 | Blastocrithidia Nuclear |
32 | Balanophoraceae Plastid |
33 | Cephalodiscidae Mitochondrial |
27 rows x 2 columns
Use the top-level get_code() function to get a specific genetic code, passing a code ID from the table above.
from cogent3 import get_code
gc = get_code(1)
gc
aa | IUPAC code | codons |
---|---|---|
Alanine | A | GCT,GCC,GCA,GCG |
Cysteine | C | TGT,TGC |
Aspartic Acid | D | GAT,GAC |
Glutamic Acid | E | GAA,GAG |
Phenylalanine | F | TTT,TTC |
Glycine | G | GGT,GGC,GGA,GGG |
Histidine | H | CAT,CAC |
Isoleucine | I | ATT,ATC,ATA |
Lysine | K | AAA,AAG |
Leucine | L | TTA,TTG,CTT,CTC,CTA,CTG |
Methionine | M | ATG |
Asparagine | N | AAT,AAC |
Proline | P | CCT,CCC,CCA,CCG |
Glutamine | Q | CAA,CAG |
Arginine | R | CGT,CGC,CGA,CGG,AGA,AGG |
Serine | S | TCT,TCC,TCA,TCG,AGT,AGC |
Threonine | T | ACT,ACC,ACA,ACG |
Valine | V | GTT,GTC,GTA,GTG |
Tryptophan | W | TGG |
Tyrosine | Y | TAT,TAC |
STOP | * | TAA,TAG,TGA |
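Each code ID in the first table names a mapping that reassigns a small number of codons relative to the standard code. As a rough illustration that does not use cogent3, the sketch below compares codes 1 and 2 using their amino acid strings in the style of the NCBI translation tables. The variable names are hypothetical, and the two strings are assumptions that should be checked against the NCBI tables before being relied on.

```python
from itertools import product

BASES = "TCAG"  # base order assumed for the amino acid strings below

# One character per codon, codons enumerated as TTT, TTC, TTA, TTG, TCT, ...
# Assumed to match NCBI translation tables 1 and 2.
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
VERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"

# All 64 codons in the same order as the characters of the strings
codon_order = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# Codons whose meaning differs between the two codes:
# codon -> (standard, vertebrate mitochondrial)
differences = {
    codon: (aa_std, aa_mito)
    for codon, aa_std, aa_mito in zip(codon_order, STANDARD, VERT_MITO)
    if aa_std != aa_mito
}
print(differences)
```

With cogent3 itself, the equivalent comparison would start from get_code(1) and get_code(2).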
8.7.1. Useful GeneticCode attributes
sense_codons
The codons that encode an amino acid. The trinucleotide string is the key, and the single-character IUPAC code for the amino acid is the value.
codons
Maps every codon string to its corresponding amino acid IUPAC code.
synonyms
Maps every amino acid IUPAC code to its codons. (The reverse of codons.)
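To make the relationship between these three mappings concrete, here is a minimal plain-Python sketch that builds dictionaries with the same shapes. It is not how cogent3 constructs them. The names BASES and AMINO_ACIDS are hypothetical, and the amino acid string for the standard code is an assumption laid out in the NCBI table style.

```python
from itertools import product

BASES = "TCAG"
# Standard code, one character per codon in TTT, TTC, TTA, TTG, TCT, ... order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Shaped like `codons`: every codon -> amino acid, with "*" for stop
codons = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Shaped like `sense_codons`: the same mapping without the stop codons
sense_codons = {codon: aa for codon, aa in codons.items() if aa != "*"}

# Shaped like `synonyms`: the inverse of `codons`, amino acid -> list of codons
synonyms = {}
for codon, aa in codons.items():
    synonyms.setdefault(aa, []).append(codon)

print(len(codons), len(sense_codons))  # 64 61
print(synonyms["Y"])  # ['TAT', 'TAC']
```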
8.7.2. Using GeneticCode instances
Genetic code objects act like dictionaries for trinucleotide strings or single-letter strings. The former is interpreted as RNA or DNA, the latter as the single-character amino acid code.
You can get the encoded amino acid from an RNA triplet
aa = gc["UAC"]
aa
'Y'
or DNA triplet.
aa = gc["TAC"]
aa
'Y'
The mapping from codon to amino acid is provided by the genetic code instance's sense_codons attribute. So, calling list() on that dict returns just the sense codons [1].
list(gc.sense_codons)[:4]
['TTT', 'TTC', 'TTA', 'TTG']
You can get all the codons that encode an amino acid.
codons = gc["Y"]
codons
['TAT', 'TAC']
You can check whether a codon is a start
gc.is_start("ATG")
True
or stop codon
gc.is_stop("TAA")
True
Stop codons are represented by the "*" character.
gc["TGA"]
'*'
gc["*"]
['TAA', 'TAG', 'TGA']
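The dictionary-style lookups shown so far can be mimicked with a small class. The sketch below is a hypothetical stand-in (ToyCode is not a cogent3 class, and this is not the library's implementation). It assumes that RNA triplets can be handled by replacing U with T, and that key length is enough to tell a codon from an amino acid code.

```python
from itertools import product

BASES = "TCAG"
# Standard code, one character per codon in TTT, TTC, TTA, TTG, TCT, ... order
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


class ToyCode:
    """Hypothetical stand-in for a genetic code object."""

    def __init__(self, amino_acids):
        codon_order = ("".join(t) for t in product(BASES, repeat=3))
        self._codons = dict(zip(codon_order, amino_acids))
        # Invert the codon map: amino acid -> list of codons
        self._synonyms = {}
        for codon, aa in self._codons.items():
            self._synonyms.setdefault(aa, []).append(codon)

    def __getitem__(self, key):
        key = key.upper()
        if len(key) == 3:
            # Treat an RNA triplet as DNA by replacing U with T
            return self._codons[key.replace("U", "T")]
        if len(key) == 1:
            # A single character is an amino acid code, or "*" for stop
            return self._synonyms[key]
        raise KeyError(key)


toy = ToyCode(STANDARD)
print(toy["UAC"], toy["TAC"])  # Y Y
print(toy["Y"])  # ['TAT', 'TAC']
print(toy["*"])  # ['TAA', 'TAG', 'TGA']
```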
You can translate a string.
gc.translate("TCGACCGTTTAAGCC")
'STV*A'
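Conceptually, translating a string means reading it three bases at a time and looking up each codon. The following sketch shows that idea in plain Python. The translate function and CODONS table here are hypothetical and are not the cogent3 method. Raising an error for lengths that are not a multiple of three is a choice made for this example only.

```python
from itertools import product

BASES = "TCAG"
# Standard code, one character per codon in TTT, TTC, TTA, TTG, TCT, ... order
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = dict(zip(("".join(t) for t in product(BASES, repeat=3)), STANDARD))


def translate(dna, codon_map=CODONS):
    """Translate an in-frame DNA string; stop codons appear as '*'."""
    if len(dna) % 3:
        raise ValueError("sequence length must be a multiple of 3")
    # Step through the sequence one codon (three bases) at a time
    return "".join(codon_map[dna[i : i + 3]] for i in range(0, len(dna), 3))


print(translate("TCGACCGTTTAAGCC"))  # STV*A
```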
You can get the code as a Table.
table = gc.to_table()
table
aa | IUPAC code | codons |
---|---|---|
Alanine | A | GCT,GCC,GCA,GCG |
Cysteine | C | TGT,TGC |
Aspartic Acid | D | GAT,GAC |
Glutamic Acid | E | GAA,GAG |
Phenylalanine | F | TTT,TTC |
Glycine | G | GGT,GGC,GGA,GGG |
Histidine | H | CAT,CAC |
Isoleucine | I | ATT,ATC,ATA |
Lysine | K | AAA,AAG |
Leucine | L | TTA,TTG,CTT,CTC,CTA,CTG |
Methionine | M | ATG |
Asparagine | N | AAT,AAC |
Proline | P | CCT,CCC,CCA,CCG |
Glutamine | Q | CAA,CAG |
Arginine | R | CGT,CGC,CGA,CGG,AGA,AGG |
Serine | S | TCT,TCC,TCA,TCG,AGT,AGC |
Threonine | T | ACT,ACC,ACA,ACG |
Valine | V | GTT,GTC,GTA,GTG |
Tryptophan | W | TGG |
Tyrosine | Y | TAT,TAC |
STOP | * | TAA,TAG,TGA |
21 rows x 3 columns
See the cogent3 cookbook documentation for more on using genetic codes.
8.7.3. Exercises
Identify all sense codons that differ from each other at only one of the codon positions. Group these pairs by codon position [2]. The following questions refer to these groupings.
[2] 1st, 2nd and 3rd codon position.
Pick a genetic code and, for each codon position group, count the number of changes that are synonymous. Does the proportion of synonymous changes differ between codon positions?
Does the property measured in the previous question differ between the genetic codes?
Categorise the codon differences by whether they are transition or transversion changes (see Point mutations). Assess whether the fraction of synonymous changes differs between transition and transversion changes.
Is there variation (between the genetic codes) in the number of stop codons? Assess this programmatically.
Hint: look at the attributes on the genetic code instance.