5.39. Extending capabilities using import

In Python, the core namespace is relatively stripped down. Much of the power of the language comes from the availability of modules. For instance, the math module is part of what is referred to as the Python standard library, i.e. it comes standard with all Python installations. We gain access to modules through the import statement. math provides many basic mathematical operations, e.g. log() or sqrt(), as functions. We access those using the . notation.

import math

math.sqrt(4)
2.0

Modules have their own type.

type(math)
module

Just as with standard Python objects, you can see what capabilities a module has using dir() [1]. help() works too.

dir(math)[:10]
['__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'acos',
 'acosh',
 'asin',
 'asinh']

math.log10(10)
1.0

math.log(10)
2.302585092994046
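
As a small sketch of how dir() can be put to work: the list it returns can be filtered to hide the double-underscore names, leaving only the public functions and constants. The variable name public_names is just for illustration.

```python
import math

# dir() returns a list of strings, so it can be filtered like any other list
public_names = [name for name in dir(math) if not name.startswith("_")]

print(public_names[:5])
print("sqrt" in public_names)  # True

# a module's documentation string lives in its __doc__ attribute,
# which is part of what help(math) displays
print(math.__doc__)
```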

You can also import a specific function from a module using the from keyword.

from math import sqrt

sqrt(4)
2.0

You can import multiple functions by separating them with a comma.

from math import sqrt, log2

log2(4)
2.0

Modules also help simplify code. They let you put logically related functions into a single file, and they facilitate reuse of those functions in different programs, thus reducing redundancy and increasing the robustness of software.

Modules can be organised hierarchically, meaning that some modules are nested within others. How Python achieves this is actually dead simple: the name of a directory containing some Python scripts becomes the import name [2]. For instance, the Python standard library includes (among a multitude of goodies) the os module, which is used for handling operating system related calls. Inside this module is another one called path that contains useful functions, among which is the dirname() function. Using . notation, we fully specify that function as os.path.dirname.

import os

os.path.dirname("data/nested_dir/somefile.txt")
'data/nested_dir'

We can also import just that function.

from os.path import dirname

dirname("data/nested_dir/somefile.txt")
'data/nested_dir'
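
The idea that a directory name becomes an import name can be sketched with a throwaway package. This is a minimal illustration, not how os itself is built: the names mytools, text and shout are hypothetical, and the code relies on sys.path (the list of places Python searches for modules, discussed under "Writing your own modules").

```python
import importlib
import sys
import tempfile
from pathlib import Path

# build a throwaway directory layout:
#   <tmp>/mytools/__init__.py
#   <tmp>/mytools/text.py
tmp_dir = Path(tempfile.mkdtemp())
package_dir = tmp_dir / "mytools"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")  # marks the directory as a package
(package_dir / "text.py").write_text(
    "def shout(word):\n"
    "    return word.upper() + '!'\n"
)

# make the parent directory visible to the import statement
sys.path.insert(0, str(tmp_dir))
importlib.invalidate_caches()

# the directory name and the file name are joined with "."
from mytools.text import shout

print(shout("hello"))  # HELLO!
```

Here mytools plays the role that os plays above, and text the role of path.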

You can rename imported modules and functions using the as keyword.

from math import sqrt as msqrt

msqrt(16)
4.0
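
The as keyword works the same way for a whole module or a nested module. A short sketch, where the aliases m and ospath are arbitrary choices made for illustration:

```python
import math as m
from os import path as ospath

# the alias refers to exactly the same module object
print(m.sqrt(16))  # 4.0
print(m.__name__)  # math

# nested modules can be given a shorter name too
print(ospath.basename("data/nested_dir/somefile.txt"))  # somefile.txt
```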

5.39.1. “Third party” libraries

An even greater appeal of Python is the availability of highly sophisticated modules written by others.

Of particular note is numpy (numerical Python). This library is arguably the main reason Python is so popular in science. Numpy provides critical routines in numerical mathematics, particularly linear algebra. But it is also very broadly useful: array operations are often ~10x faster than equivalent pure Python implementations. It also allows succinct expressions for arrays and provides very useful methods on arrays.
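
To give a flavour of the succinct array expressions mentioned above, here is a minimal sketch that squares a list of numbers, first in plain Python and then with numpy. Because numpy is not part of the standard library, the import is wrapped in try/except so the snippet still runs where numpy has not been installed. The variable names are illustrative only.

```python
values = [1.0, 2.0, 3.0, 4.0]

# plain Python: loop over the elements explicitly
squares = [v ** 2 for v in values]
print(squares)  # [1.0, 4.0, 9.0, 16.0]

try:
    import numpy as np
except ImportError:
    # third party libraries must be installed before they can be imported
    print("numpy is not installed")
else:
    # numpy applies the operation to every element of the array at once
    arr = np.array(values)
    print(arr ** 2)
    print(arr.sum())
```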

Other invaluable libraries for science are SciPy, pandas, Matplotlib, IPython and biology-specific libraries (such as cogent3).

We will cover numpy in a separate section.

5.39.2. Why use libraries written by others?

  • Widely scrutinised, so less chance of code errors

  • Typically better performance

  • May provide algorithms that are simply too difficult to write yourself!

There is an increasing number of biology-specific libraries. My own lab produces a number of open source libraries for genomic biology (e.g. cogent3, which we will use later in the course).

5.39.3. Writing your own modules

Since a Python script is a module, all you have to do is write your code in a Python script. If that script is on what is called the Python path, then it can be imported and any functions within it can be used.

The Python path refers to the places on your computer where Python will look for modules. The first is the directory containing the script being run (or the current working directory in an interactive session). The second is the “installed packages” location, typically a directory called site-packages which is “within” the Python installation itself. The third is a custom location which you have to tell Python about, for instance using the special PYTHONPATH environment variable.
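
The following sketch shows one way the pieces could fit together. It writes a one-function module to a temporary directory, prints the start of sys.path (the list Python actually searches), and then starts a fresh Python process with PYTHONPATH pointing at that directory so the module can be imported. The module name my_helpers and the function greet are hypothetical, chosen only for this illustration.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# write a one-function module into a throwaway directory
module_dir = Path(tempfile.mkdtemp())
(module_dir / "my_helpers.py").write_text(
    "def greet(name):\n"
    "    return 'hello ' + name\n"
)

# sys.path is the list of locations Python searches, in order
print(sys.path[:3])

# start a fresh Python with PYTHONPATH pointing at that directory,
# so that `import my_helpers` succeeds there
env = dict(os.environ, PYTHONPATH=str(module_dir))
result = subprocess.run(
    [sys.executable, "-c", "import my_helpers; print(my_helpers.greet('world'))"],
    env=env,
    capture_output=True,
    text=True,
)
print(result.stdout.strip())  # hello world
```

In everyday use you would more likely set PYTHONPATH in your shell, or simply keep the module file next to the script that imports it.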