8.7. Genetic Codes

A genetic code is the mapping from nucleotide triplets (codons) to amino acids and translation stop signals. cogent3 provides a dedicated object for handling genetic code information. The genetic codes included with cogent3 are listed in the following table.

Specify a genetic code using either 'Name' or Code ID (as an integer or string)
Code ID  Name
1        Standard
2        Vertebrate Mitochondrial
3        Yeast Mitochondrial
4        Mold Mitochondrial; Protozoan Mitochondrial; Coelenterate Mitochondrial; Mycoplasma; Spiroplasma
5        Invertebrate Mitochondrial
6        Ciliate Nuclear; Dasycladacean Nuclear; Hexamita Nuclear
9        Echinoderm Mitochondrial; Flatworm Mitochondrial
10       Euplotid Nuclear
11       Bacterial, Archaeal and Plant Plastid
12       Alternative Yeast Nuclear
13       Ascidian Mitochondrial
14       Alternative Flatworm Mitochondrial
15       Blepharisma Macronuclear
16       Chlorophycean Mitochondrial
21       Trematode Mitochondrial
22       Scenedesmus obliquus Mitochondrial
23       Thraustochytrium Mitochondrial
24       Rhabdopleuridae Mitochondrial
25       Candidate Division SR1 and Gracilibacteria
26       Pachysolen tannophilus Nuclear
27       Karyorelict Nuclear
28       Condylostoma Nuclear
29       Mesodinium Nuclear
30       Peritrich Nuclear
31       Blastocrithidia Nuclear
32       Balanophoraceae Plastid
33       Cephalodiscidae Mitochondrial

27 rows x 2 columns

Use the top level get_code() function to get a specific genetic code. Use a code ID from the above table.

from cogent3 import get_code

gc = get_code(1)
gc
Standard
aa             IUPAC code  codons
Alanine        A           GCT,GCC,GCA,GCG
Cysteine       C           TGT,TGC
Aspartic Acid  D           GAT,GAC
Glutamic Acid  E           GAA,GAG
Phenylalanine  F           TTT,TTC
Glycine        G           GGT,GGC,GGA,GGG
Histidine      H           CAT,CAC
Isoleucine     I           ATT,ATC,ATA
Lysine         K           AAA,AAG
Leucine        L           TTA,TTG,CTT,CTC,CTA,CTG
Methionine     M           ATG
Asparagine     N           AAT,AAC
Proline        P           CCT,CCC,CCA,CCG
Glutamine      Q           CAA,CAG
Arginine       R           CGT,CGC,CGA,CGG,AGA,AGG
Serine         S           TCT,TCC,TCA,TCG,AGT,AGC
Threonine      T           ACT,ACC,ACA,ACG
Valine         V           GTT,GTC,GTA,GTG
Tryptophan     W           TGG
Tyrosine       Y           TAT,TAC
STOP           *           TAA,TAG,TGA

8.7.1. Useful GeneticCode attributes

sense_codons

The codons that encode an amino acid. The trinucleotide string is the key, and the single-character IUPAC code for the amino acid is the value.

codons

Maps each codon string to its corresponding amino acid IUPAC code.

synonyms

Maps all amino acid IUPAC codes to their codons. (The reverse of codons.)
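To make the relationship between these three attributes concrete, the following is a minimal plain-Python sketch that builds equivalent mappings for the standard code. It does not use cogent3 and is not how cogent3 implements them; the variable names simply mirror the attribute names, and the 64-character amino acid string is the NCBI translation table 1 ordering (each codon position cycling through T, C, A, G).

```python
from itertools import product

# NCBI translation table 1 (the standard code). Codons are ordered by
# cycling each position through T, C, A, G, so the first codon is TTT.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# codon -> amino acid IUPAC code ("*" marks a stop), for all 64 codons
codons = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# the same mapping restricted to codons that encode an amino acid
sense_codons = {codon: aa for codon, aa in codons.items() if aa != "*"}

# amino acid IUPAC code (or "*") -> list of codons, the reverse of `codons`
synonyms = {}
for codon, aa in codons.items():
    synonyms.setdefault(aa, []).append(codon)
```

In this sketch `synonyms["Y"]` is `['TAT', 'TAC']`, matching the Tyrosine row of the table above.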

8.7.2. Using GeneticCode instances

Genetic code objects act like dictionaries for trinucleotide strings or single-letter strings. The former is interpreted as RNA or DNA, the latter as the single-character amino acid code.

You can get the encoded amino acid from an RNA triplet

aa = gc["UAC"]
aa
'Y'

or DNA triplet.

aa = gc["TAC"]
aa
'Y'

The mapping from codon to amino acid is provided by the genetic code instance's sense_codons attribute. So, calling list() on that dict returns just the sense codons [1].

list(gc.sense_codons)[:4]
['TTT', 'TTC', 'TTA', 'TTG']

You can get all the codons that encode an amino acid.

codons = gc["Y"]
codons
['TAT', 'TAC']

You can check whether a codon is a start

gc.is_start("ATG")
True

or stop codon

gc.is_stop("TAA")
True

Stop codons are represented by the "*" character.

gc["TGA"]
'*'
gc["*"]
['TAA', 'TAG', 'TGA']
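The dictionary-style lookups shown so far can be imitated in a few lines of plain Python. The sketch below is illustrative only: `lookup` is a hypothetical helper, not part of cogent3, and it assumes that a three-character key is a codon (with U treated as T) and a one-character key is an amino acid code or "*".

```python
from itertools import product

# Standard code (NCBI translation table 1), codon positions cycle T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def lookup(key):
    """Hypothetical helper imitating the dictionary-style access above."""
    key = key.upper()
    if len(key) == 3:
        # RNA and DNA triplets are handled alike by converting U to T
        return CODON_TO_AA[key.replace("U", "T")]
    if len(key) == 1:
        # an amino acid code (or "*") returns every codon that maps to it
        return [codon for codon, aa in CODON_TO_AA.items() if aa == key]
    raise KeyError(key)
```

With this helper, `lookup("UAC")` and `lookup("TAC")` both give `'Y'`, and `lookup("*")` gives the three stop codons of the standard code.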

You can translate a string.

gc.translate("TCGACCGTTTAAGCC")
'STV*A'
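Conceptually, translation splits the sequence into consecutive triplets and looks each one up. The following plain-Python sketch shows that idea for the standard code; it is not cogent3's implementation, and the choice to raise an error for a length that is not a multiple of 3 is an assumption of this sketch rather than documented cogent3 behaviour.

```python
from itertools import product

# Standard code (NCBI translation table 1), codon positions cycle T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate a DNA or RNA string, reading from the first base."""
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3:
        # assumption of this sketch: refuse incomplete trailing codons
        raise ValueError("sequence length is not a multiple of 3")
    # step through the sequence three bases at a time
    return "".join(CODON_TO_AA[seq[i : i + 3]] for i in range(0, len(seq), 3))
```

Applied to the sequence used above, `translate("TCGACCGTTTAAGCC")` returns `'STV*A'`: the triplets TCG, ACC, GTT, TAA and GCC map to S, T, V, * and A.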

You can get the code as a Table.

table = gc.to_table()
table
Standard
aa             IUPAC code  codons
Alanine        A           GCT,GCC,GCA,GCG
Cysteine       C           TGT,TGC
Aspartic Acid  D           GAT,GAC
Glutamic Acid  E           GAA,GAG
Phenylalanine  F           TTT,TTC
Glycine        G           GGT,GGC,GGA,GGG
Histidine      H           CAT,CAC
Isoleucine     I           ATT,ATC,ATA
Lysine         K           AAA,AAG
Leucine        L           TTA,TTG,CTT,CTC,CTA,CTG
Methionine     M           ATG
Asparagine     N           AAT,AAC
Proline        P           CCT,CCC,CCA,CCG
Glutamine      Q           CAA,CAG
Arginine       R           CGT,CGC,CGA,CGG,AGA,AGG
Serine         S           TCT,TCC,TCA,TCG,AGT,AGC
Threonine      T           ACT,ACC,ACA,ACG
Valine         V           GTT,GTC,GTA,GTG
Tryptophan     W           TGG
Tyrosine       Y           TAT,TAC
STOP           *           TAA,TAG,TGA

21 rows x 3 columns

See the cogent3 cookbook documentation for more on using genetic codes.

8.7.3. Exercises

Identify all sense codons that differ from each other at only one of the codon positions. Group these pairs by codon position [2]. The following questions refer to these groupings.

  1. Pick a genetic code and, for each such codon position group, count the number of changes that are synonymous. Does the proportion of synonymous changes differ among codon positions?

  2. Does the property measured in the previous question differ between the genetic codes?

  3. Categorise the codon differences by whether they are transition or transversion changes (see Point mutations). Assess whether the fraction of synonymous changes differs between transition and transversion changes.

  4. Is there variation (between the genetic codes) in the number of stop codons? Assess this programmatically.

    Hint: look at the attributes on the genetic code instance.
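As a possible starting point for the grouping step described at the top of these exercises, here is a plain-Python sketch. The function names are hypothetical (they are not cogent3 functions), codon positions are indexed 0, 1, 2, and codons are assumed to be DNA strings. Counting synonymous changes is left to the reader.

```python
from itertools import combinations

PURINES = {"A", "G"}


def differing_position(codon1, codon2):
    """Return the index at which two codons differ, or None unless they
    differ at exactly one position."""
    diffs = [i for i, (a, b) in enumerate(zip(codon1, codon2)) if a != b]
    return diffs[0] if len(diffs) == 1 else None


def group_by_position(codons):
    """Group all codon pairs that differ at exactly one position by the
    index of that position."""
    groups = {0: [], 1: [], 2: []}
    for c1, c2 in combinations(codons, 2):
        pos = differing_position(c1, c2)
        if pos is not None:
            groups[pos].append((c1, c2))
    return groups


def is_transition(base1, base2):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return base1 != base2 and ((base1 in PURINES) == (base2 in PURINES))
```

The `codons` argument could be the list of sense codons obtained from a genetic code instance, for example `list(gc.sense_codons)` as shown earlier.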