6.4. Gene expression in a nutshell

This section does not present an authoritative synopsis of gene expression. Instead, it aims to present a simplified overview of what we understand about the nature of information encoded in DNA that influences gene expression, including how that information is organised.

Figure: The structural elements of eukaryotic genes. Image by Thomas Shafee.

The figure “The structural elements of eukaryotic genes” illustrates both the generalised information content associated with genes and the stepwise processes by which a gene is transcribed and ultimately translated into a protein. I want to draw your attention to the annotated segments listed as regulatory sequences. Such segments exist at the 5’- and 3’-boundaries of a gene. Between them lies the gene body, consisting of exons and introns, and immediately flanking the gene body are the UTRs (untranslated regions). Preceding the 5’-UTR is the promoter region, which is further divided into “core” and “proximal” parts. The promoter region is the binding target for molecules that facilitate and/or initiate transcription of the downstream gene into RNA. The transcript includes the UTRs as well as the gene body. One important feature not explicitly represented in this figure is the site at which transcription actually starts, commonly referred to as the TSS (transcription start site). The TSS is largely defined by the core promoter [JGHTK08]. As that citation argues, the TSS of an individual gene can vary. We are primarily interested in how information is encoded in promoter regions.
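To make the layout described above concrete, here is a minimal Python sketch that represents a gene as an ordered list of labelled intervals. Everything in it is an assumption made for illustration: the `Segment` class, the `kind` labels, the helper functions and all coordinates are hypothetical and are not taken from the figure. It also treats the TSS as the first transcribed position, which is a simplification given that the TSS can vary, and it assumes the standard picture in which introns are removed from the primary transcript before translation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A labelled half-open interval [start, end) on the plus strand."""

    name: str
    kind: str  # one of "promoter", "utr", "exon", "intron" (hypothetical labels)
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start


# Hypothetical coordinates, chosen only to illustrate the ordering of elements.
gene = [
    Segment("proximal promoter", "promoter", 0, 200),
    Segment("core promoter", "promoter", 200, 300),
    Segment("5'-UTR", "utr", 300, 350),
    Segment("exon 1", "exon", 350, 500),
    Segment("intron 1", "intron", 500, 800),
    Segment("exon 2", "exon", 800, 1000),
    Segment("3'-UTR", "utr", 1000, 1100),
]


def primary_transcript(segments):
    """Segments copied into RNA: the UTRs plus the gene body, not the promoter."""
    return [s for s in segments if s.kind != "promoter"]


def mature_mrna(segments):
    """Transcribed segments that remain once introns are removed."""
    return [s for s in primary_transcript(segments) if s.kind != "intron"]


# Simplification: take the TSS to be the first transcribed position.
tss = primary_transcript(gene)[0].start
primary_length = sum(s.length for s in primary_transcript(gene))
mrna_length = sum(s.length for s in mature_mrna(gene))

print(f"TSS at position {tss}")
print(f"primary transcript: {primary_length} bases")
print(f"mature mRNA: {mrna_length} bases")
```

Keeping the segments in 5’ to 3’ order means that the questions “what is transcribed?” and “what remains after intron removal?” become simple filters over the list.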

A somewhat more detailed cartoon presenting concepts of regulatory encoding is shown in the figure on regulatory element organisation. One elementary encoding unit of regulatory information is the TFBS (transcription factor binding site). These are the sequences to which transcription factors bind (TFs, see Binding to DNA). As illustrated, some TFBS are located close to the TSS while others are more distant. The distance, measured as the length of DNA sequence between the TFBS and the TSS of the gene, can be quite large [1]. Also of interest here are cis-regulatory modules (CRMs), in which multiple TFBS occur in a cluster, indicating that binding by multiple TFs is involved in regulating gene expression. The figure incorrectly implies that gene regulatory signals exist only outside the gene; enhancer elements located within introns have been reported.
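The two ideas in this paragraph, distance of a TFBS from the TSS and clustering of TFBS into candidate CRMs, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the positions are invented, the gene is assumed to lie on the plus strand, and the `max_gap` and minimum cluster size thresholds are arbitrary choices rather than an accepted definition of a CRM.

```python
def distance_to_tss(site_position, tss):
    """Signed offset of a site from the TSS; negative values are upstream."""
    return site_position - tss


def cluster_sites(positions, max_gap):
    """Group positions so that neighbouring sites in a cluster are at most max_gap apart."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)  # close enough to the previous site
        else:
            clusters.append([pos])  # start a new cluster
    return clusters


# Hypothetical TSS and TFBS positions on a plus-strand gene.
tss = 10_000
tfbs_positions = [9_950, 9_900, 4_000, 4_030, 4_075, 10_600]

offsets = sorted(distance_to_tss(p, tss) for p in tfbs_positions)

# Arbitrary illustrative rule: a candidate CRM is a cluster of 3 or more sites.
clusters = cluster_sites(tfbs_positions, max_gap=50)
candidate_crms = [c for c in clusters if len(c) >= 3]

print(offsets)
print(candidate_crms)
```

In this toy data the site at position 10,600 has a positive offset, meaning it lies downstream of the TSS and so within the transcribed region, which is consistent with the remark that regulatory elements are not restricted to sequence outside the gene.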

This is a grossly simplified representation of how gene regulation happens. Regulatory control is a complicated process mediated by multiple elements.

How can we, as data analysts, inform the understanding of this complex problem? All statistical and/or computational analyses should start simple. Start with a simple hypothesis, evaluate it, and then define a new hypothesis. By iterating this process we can gradually build well-founded, more complicated models. Ideally, this process also includes empirical experiments.

6.5. Exercises

  1. Make a visual model of how information is transformed from its genomic encoding into molecular action. From the above, draw a “simple” schematic [2] that shows the essential components of a gene. Add to that drawing elements that illustrate the presumed causal relationship of TFs and TFBS to the transcription of the gene into RNA. A drawing on paper is fine! You want this model to reflect the essential patterns of this process. Imagine trying to explain this process to a first-year student using your schematic. (Your schematic should be simpler than the one above.)

  2. Use your model to explain the case of no gene expression.


Citations

[FLP+09]

Melissa J Fullwood, Mei Hui Liu, You Fu Pan, Jun Liu, Han Xu, Yusoff Bin Mohamed, Yuriy L Orlov, Stoyan Velkov, Andrea Ho, Poh Huay Mei, Elaine G Y Chew, Phillips Yao Hui Huang, Willem-Jan Welboren, Yuyuan Han, Hong Sain Ooi, Pramila N Ariyaratne, Vinsensius B Vega, Yanquan Luo, Peck Yean Tan, Pei Ye Choy, K D Senali Abayratna Wansa, Bing Zhao, Kar Sian Lim, Shi Chi Leow, Jit Sin Yow, Roy Joseph, Haixia Li, Kartiki V Desai, Jane S Thomsen, Yew Kok Lee, R Krishna Murthy Karuturi, Thoreau Herve, Guillaume Bourque, Hendrik G Stunnenberg, Xiaoan Ruan, Valere Cacheux-Rataboul, Wing-Kin Sung, Edison T Liu, Chia-Lin Wei, Edwin Cheung, and Yijun Ruan. An oestrogen-receptor-alpha-bound human chromatin interactome. Nature, 462:58–64, 2009. doi:10.1038/nature08497.

[JGHTK08]

Tamar Juven-Gershon, Jer-Yuan Hsu, Joshua Wm Theisen, and James T Kadonaga. The RNA polymerase II core promoter - the gateway to transcription. Curr Opin Cell Biol, 20:253–9, 2008. doi:10.1016/j.ceb.2008.03.003.

[LAvBW+09]

Erez Lieberman-Aiden, Nynke L van Berkum, Louise Williams, Maxim Imakaev, Tobias Ragoczy, Agnes Telling, Ido Amit, Bryan R Lajoie, Peter J Sabo, Michael O Dorschner, Richard Sandstrom, Bradley Bernstein, M A Bender, Mark Groudine, Andreas Gnirke, John Stamatoyannopoulos, Leonid A Mirny, Eric S Lander, and Job Dekker. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326:289–93, 2009. doi:10.1126/science.1181369.