2. Prologue

Section author: Gavin Huttley

How is life encoded in nucleic acids? What is the minimum number of nucleotides required to encode a virus? Do all genetic differences between individuals contribute to phenotypic differences between them? What genetic change(s) cause humans to have unique cognitive abilities compared to our nearest great ape relatives? How many different types of microorganisms live in our gut and what role do they play in our biology? If you had \(\sim 10^{10}\) DNA sequences, each 250 bp long, that were randomly sampled from a single human, could you assemble that individual’s genome from scratch? If you sequence DNA extracted from soil, can you identify what species the DNA comes from? If you have an assembled genome of a novel species, can you identify where the genes in the genome are?

We do not know the answer to many of these questions. Despite that, and despite the fact that some of them are strictly concerned with biological phenomena, I am confident that they all require computers to attain a solution. I will go even further and claim that any question in the biological sciences we can come up with will require the use of computers to tackle it. Based on this, I think it is reasonable to make the general claim that biologists have a near absolute reliance on computers for undertaking their research [1]. But we can make that statement more specific: Biologists have a near absolute reliance on software to undertake their research [2].

Just about every part of a biology wet-lab has some type of digital device, from pipettes and scales to centrifuges, and from confocal microscopes to scanning tunnelling electron microscopes. The further data is transformed from the natural system, the greater the involvement of software in controlling those devices, and the greater our reliance on it. But science is an enterprise whose objective is much more than simply the accumulation of data. Similarly, software can be much more than instructions for controlling the capture of data.

Science is about producing general models of how the natural world works. Good scientific models are powerful because they allow us to make predictions about what we will observe in circumstances that have not yet been encountered. Computer programs are one way in which we represent those models. They provide us with the means to evaluate, on a large scale, how well our model matches actual data. In other words, software is integral to scientific practice.
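To make the idea of a program representing a model a little more concrete, here is a minimal sketch in Python. It is not taken from the course material: the "model" (all four nucleotides are equally frequent), the function names and the toy sequence are assumptions chosen purely for illustration. The program counts the bases in a sequence and uses a chi-square statistic to summarise how far the observed counts are from what the model predicts.

```python
from collections import Counter


def base_counts(seq: str) -> dict[str, int]:
    """Count A, C, G and T in seq, ignoring case and any other characters."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ACGT"}


def chi_square_equal_freqs(seq: str) -> float:
    """Chi-square statistic for the (assumed) model of equal base frequencies.

    A value of 0 means the data match the model exactly; larger values
    mean a poorer match between model and data.
    """
    observed = base_counts(seq)
    total = sum(observed.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    expected = total / 4  # the model predicts each base is equally common
    return sum((obs - expected) ** 2 / expected for obs in observed.values())


if __name__ == "__main__":
    toy_seq = "ATGCATGCATGCAAAAAAAAATTT"  # made-up sequence, not real data
    print(base_counts(toy_seq))
    print(chi_square_equal_freqs(toy_seq))
```

The same function works unchanged on a sequence of any length, which is the sense in which software lets us evaluate a model against data on a large scale. Turning the statistic into a decision about the model requires statistical reasoning that this sketch leaves out.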

The above is a big-picture view of the role of software in science, but where does this course fit in? I can say with absolute certainty that we will not be answering most of the questions raised above (Doh!). What you will be doing is acquiring the skills necessary to understand what the analyses needed to answer such questions are, how to perform them and whether you should believe the results. You will develop the skills to write custom software whose output you can trust. In your future career, that software may be as simple as code for handling data formats, or as complex as completely novel models of biological systems. In all cases, your software will be key to solving the scientific questions that you find interesting.

At this stage, I’m sure you have many questions, such as: What is “bioinformatics”? I’ve heard the phrase “Data Science”; is that what bioinformatics is?

Data Science is the joint application of algorithm development and statistical modelling to extracting information from what is referred to as “big data”. Bioinformatics, also referred to as computational biology, certainly fits within this rather loose definition. In essence, bioinformatics is the union of algorithms and statistics focussed on extracting information from big biological data sets to advance knowledge of biological systems.

What will I get out of this course?

An understanding of how computer programs work. Improvements in your logical reasoning. An understanding of how to take advantage of computing to advance your own scientific interests. An appreciation of the pitfalls of software!

All of this adds up to a superpower that will accelerate your work [5].