10. Introduction to version control

Section author: Gavin Huttley

“Version control” refers to software tools that are designed to efficiently keep track of changes to your plain text files. As the phrase implies, the support relates to recording different versions of files. Here, the word version is not limited to a release version [1].

While these tools are used predominantly for programming [2], they can be applied to any work that uses plain text files [3]. Familiarity with version control is thus crucial for bioinformaticians. Here’s a short list of some advantages of using it:

  • makes it easier to experiment with different solutions to a problem

  • makes it easier to collaborate with other people

  • makes moving your code between computers easier

  • makes it easier to ensure the reproducibility of your work

As a budding professional computational scientist, you will find that writing code is your core business, so anything that makes that job easier is a good thing!

In this topic I provide a functional introduction [4] to the version control tool git. git is a sophisticated (and very complex) command line tool with extensive capabilities. It is not the only such tool available [5]. You will become familiar with using it in a terminal. Note that some IDEs provide a sophisticated graphical user interface to git that makes using it much easier.

10.1. Getting set up

These instructions are focussed on having a repository (see Glossary of key version control terms for definitions) that is hosted at GitHub and for which there will be a clone on a computer that you will use for writing your code. If you don’t already have one, sign up for a GitHub account.

You also need git installed on the machine where you will be writing / running the code [6].

On your development machine, you need to tell git your user name and email address. These details are used to “sign” every commit you make: they are your attribution and tell others who made which changes. In the terminal, run:

$ git config --global user.name "Your Name"
$ git config --global user.email "YourEmail@example.com"

I also strongly recommend changing the log message editor from the default (typically vim) to nano [7].

$ git config --global core.editor nano
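
To confirm that these settings were recorded, you can ask git to read them back. The following is a minimal sketch. It points HOME at a throwaway directory purely so that the demonstration does not touch your real configuration; that redirection is an assumption of this example and is not needed in practice.

```shell
# Demo only: use a throwaway HOME so your real ~/.gitconfig is left untouched.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME

git config --global user.name "Your Name"
git config --global user.email "YourEmail@example.com"
git config --global core.editor nano

# Read individual settings back.
configured_name="$(git config --global --get user.name)"
configured_editor="$(git config --global --get core.editor)"
echo "name:   $configured_name"
echo "editor: $configured_editor"

# Or list everything stored in the global configuration.
git config --global --list
```

On your own machine, the two `--get` commands and the `--list` command are the only lines you would need.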

10.2. A demo project

10.2.1. Create a demo project on GitHub

Once your account is set up, create a new repository. For the purpose of demonstration, I’m going to assume you name it demo.

Check the “Add a README file” option. Check the “Add .gitignore” option and select Python from the popup. Check the “Choose a license” option and pick whichever one you like.

10.2.2. Cloning the repository to your development computer

In this case, you will clone onto the machine where you will be developing your code. I assume you have gone through the process of creating an ssh key and followed GitHub’s instructions for adding that to your account [8].

$ git clone git@github.com:YourUserName/YourRepo.git

This creates a directory named YourRepo (demo, if you used the name suggested above) in your current working directory.

10.2.3. Add a Python file to your repository

You first need to change into the directory that contains your repository. In the terminal, this is

$ cd YourRepo

When you list all [9] the contents of this directory, you will see the .git directory.
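
A minimal sketch of that listing follows. Because it cannot assume that your GitHub clone exists, it creates a scratch repository with `git init` as a stand-in (an assumption of this example). In your own clone you would only run the `ls -a` line.

```shell
# Stand-in for your clone: a scratch repository in a temporary directory.
scratch="$(mktemp -d)"
git init -q "$scratch/YourRepo"
cd "$scratch/YourRepo"

# -a includes entries whose names start with a dot, such as .git
ls -a
```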

10.2.3.1. Create a file to add

Note

Skip this step if you already have a file you want to add!

Now create a Python file that contains just a print statement:

$ echo 'print("Hello World")' > demo.py

10.2.3.2. Add a file

Tell git that you want to add the file to your repository using:

$ git add demo.py

This command just “stages” the file, meaning you have told git to include this change when you make the next commit.
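
To see what staging does, you can compare the output of `git status` before and after `git add`. This is a sketch only: it uses a scratch repository created with `git init` as a stand-in for your clone (an assumption of this example).

```shell
# Stand-in for your clone: a scratch repository in a temporary directory.
scratch="$(mktemp -d)"
git init -q "$scratch/YourRepo"
cd "$scratch/YourRepo"

echo 'print("Hello World")' > demo.py

# Before staging, git reports the file as untracked ("??").
before="$(git status --short)"
echo "$before"

git add demo.py

# After staging, the file is marked as added ("A") and awaits a commit.
after="$(git status --short)"
echo "$after"
```

Running `git status` without `--short` gives the same information in a longer, more descriptive form.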

10.2.3.3. Commit the file!

You have not finished this until you commit the staged change!

$ git commit -m "Added a demo python script"

10.2.4. Look at the history of your repository

$ git log
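
The full log can get long, so a one-line-per-commit view is often handy. The sketch below rebuilds the demo steps in a scratch repository (a stand-in for your clone, and an assumption of this example) and then shows both forms.

```shell
# Stand-in for your clone: a scratch repository in a temporary directory.
scratch="$(mktemp -d)"
git init -q "$scratch/YourRepo"
cd "$scratch/YourRepo"

# Identity for this scratch repository only (no --global).
git config user.name "Your Name"
git config user.email "YourEmail@example.com"

echo 'print("Hello World")' > demo.py
git add demo.py
git commit -q -m "Added a demo python script"

# Full history: commit hash, author, date and log message.
git log

# Compact history: abbreviated hash and log message, one line per commit.
summary="$(git log --oneline)"
echo "$summary"
```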

10.2.5. Push your change to GitHub

$ git push
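
If you want to try the clone, commit and push cycle without touching GitHub, a local bare repository can play the role of the remote. The following is a sketch under that assumption; the directory names remote.git and YourRepo are placeholders chosen for this example.

```shell
scratch="$(mktemp -d)"
cd "$scratch"

# Stand-in for GitHub: a bare repository (history only, no working copy).
git init -q --bare remote.git

# Clone it, as you would clone from GitHub.
git clone -q remote.git YourRepo
cd YourRepo
git config user.name "Your Name"
git config user.email "YourEmail@example.com"

echo 'print("Hello World")' > demo.py
git add demo.py
git commit -q -m "Added a demo python script"

# The stand-in starts empty, so name the destination on this first push.
# In a clone of a GitHub repository that already has commits, plain `git push` suffices.
git push -q origin HEAD

# The commit is now recorded in the other repository too.
git --git-dir="$scratch/remote.git" log --all --oneline
```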

10.2.6. Tips for effective use of version control

10.2.6.1. Do

  • track text files

  • commit changes that are logically related

  • think of log messages as your lab notebook entries to help you (and others) to understand what you were thinking when you changed the files

  • write meaningful log messages

  • commit often

  • push to GitHub often [10]

10.2.6.2. Do NOT

  • add really big files to a repository

  • add binary files to a repository

  • add secrets [11] to a repository!

  • include a massive number of changes in one commit
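
One way to guard against adding secrets or big files by accident is to list matching patterns in .gitignore. The sketch below is illustrative only: the names secrets.env and big_data/ are hypothetical, and the repository is a scratch stand-in for your clone.

```shell
# Stand-in for your clone: a scratch repository in a temporary directory.
scratch="$(mktemp -d)"
git init -q "$scratch/YourRepo"
cd "$scratch/YourRepo"

# Hypothetical patterns: a file of credentials and a directory of large data files.
printf '%s\n' 'secrets.env' 'big_data/' > .gitignore

echo 'API_TOKEN=not-a-real-token' > secrets.env
echo 'print("Hello World")' > demo.py

# check-ignore exits with status 0 when a path matches an ignore pattern.
if git check-ignore -q secrets.env; then
    echo "secrets.env is ignored"
fi

# Untracked files that git is willing to add; secrets.env is not among them.
visible="$(git status --short)"
echo "$visible"
```

Note that .gitignore only affects files that are not yet tracked; it does not remove a file that has already been committed.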

10.3. Glossary of key version control terms

add

Adding a file to your repository.

clone

An independent copy of a repository. It is not required to be identical to the original.

commit

The act of recording changes to a file by version control software.

config

Configure the version control software.

conflict

Where someone else has made a change to a repository affecting the same lines as your change.

diff

A comparison of contents of two files / directories that shows only the differences.

.gitignore

A file that contains patterns that match files you do not want to be included in the repository.

log

Command to show the history of commits.

log message

Text that describes the purpose of the changes being committed to a repository.

manifest

Listing of files that are being tracked in a repository.

merge

The step of combining different versions of a repository, including resolving any conflicts between them.

repository

Short for software repository. This is a directory of (typically plain text source code) files pertaining to a project.

repo

See repository.

tracked

Refers to files whose contents are being recorded by version control software.

pull

Updating a repository by pulling changes from another repository (possibly on another computer).

push

Pushing changes recorded locally to another repository (possibly on another computer).

reset

See revert.

revert

To remove all changes made to the working copy of a file.

stage

Staging a file means informing git that changes to that file are to be included in the next commit.

working copy

The files in a repository that are visible (they are not under the .git directory).
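
Several of the terms above (working copy, diff, revert) can be seen together in a short session. This is a sketch in a scratch repository, which is an assumption of this example. It uses `git checkout -- <file>` for revert in the glossary's sense; git's own `git revert` command does something different (it creates a new commit that undoes an earlier one).

```shell
# Stand-in for your clone: a scratch repository in a temporary directory.
scratch="$(mktemp -d)"
git init -q "$scratch/YourRepo"
cd "$scratch/YourRepo"
git config user.name "Your Name"
git config user.email "YourEmail@example.com"

echo 'print("Hello World")' > demo.py
git add demo.py
git commit -q -m "Added a demo python script"

# Change the working copy without staging it.
echo 'print("Goodbye")' >> demo.py

# diff: show only the lines that differ from the last commit.
git diff
changed="$(git diff --name-only)"

# revert (glossary sense): discard the uncommitted change to the working copy.
git checkout -- demo.py

# The working copy matches the last commit again, so the status is empty.
remaining="$(git status --short)"
echo "changed before revert: $changed"
```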