5.45. Data Visualisation using Plotly

Plotly is a JavaScript-based plotting library that has interfaces for Python and several other programming languages [1]. Its JavaScript foundation is what allows it to display interactive plots within web browsers. Below I introduce Plotly Express [2].

5.45.1. Scatter plots

A quirk of Plotly is that Cartesian coordinates (i.e. \(x\), \(y\) coordinates) are represented as separate series for the \(x\) and \(y\) components. For instance, let’s display a single point at \(x=1,y=3\) (or \((1,3)\)) on a scatter plot using the scatter() function. This returns a Plotly figure object, which we display by calling its show() method.

import plotly.express as px

fig = px.scatter(x=[1], y=[3])
fig.show()

To add another point at (3, 7) we append 3 to the x list and 7 to the y list.

import plotly.express as px

fig = px.scatter(x=[1, 3], y=[3, 7])
fig.show()

5.45.2. Modifying axis labels

We can specify our own labels for the x and y axes using the labels argument.

import plotly.express as px

fig = px.scatter(x=[1, 3], y=[3, 7], labels={"x": "species 1", "y": "species 2"})
fig.show()

5.45.3. Modifying display

5.45.4. Modifying figure size

The figure dimensions will adjust to your browser width unless you specify its width and/or height. The units for those settings are pixels. Here we make a square plot.

import plotly.express as px

fig = px.scatter(x=[1, 3], y=[3, 7], width=400, height=400)
fig.show()

5.45.4.1. Selecting different symbols and/or sizes

Making more refined changes to display properties requires some inspection of the underlying objects. As mentioned above, dictionaries are the basis for all Plotly objects, and the figure dict has two top-level components: “data” and “layout”. The data consists of a series of “traces”. Attributes such as the coordinates of scatter points and the type of plot are recorded in individual traces. Let’s inspect the last figure from above.

len(fig.data) # there's a single trace
1
fig.data[0]
Scatter({
    'hovertemplate': 'x=%{x}<br>y=%{y}<extra></extra>',
    'legendgroup': '',
    'marker': {'color': '#636efa', 'symbol': 'circle'},
    'mode': 'markers',
    'name': '',
    'orientation': 'v',
    'showlegend': False,
    'x': array([1, 3]),
    'xaxis': 'x',
    'y': array([3, 7]),
    'yaxis': 'y'
})

We can access an individual element using standard dictionary operations.

fig.data[0]["marker"]
scatter.Marker({
    'color': '#636efa', 'symbol': 'circle'
})

We can change these values and the change will affect the figure [3].

fig.data[0]["marker"]["size"] = 18
fig.data[0]["marker"]["symbol"] = "square"
fig.show()

5.45.5. Histograms

import plotly.express as px
import numpy as np

x = np.random.randn(1000)

fig = px.histogram(x=x)
fig.show()
x[:10]
array([ 2.46586219,  0.07501903, -0.0100098 , -0.29889198,  0.46080855,
        0.69983817,  0.16465083,  1.02149517,  1.30200869,  0.7501532 ])

5.45.6. Bar charts

When dealing with genomic data, we frequently deal with genomic coordinates. One type of question that is raised in these circumstances is whether observations are random across the genome [4]. We can use a bar plot to visually examine the density of observations.

This specific example is contrived as I’m using simulated data points, but the approach here will be useful.

Generate 100 random integers between 0 and 20 inclusive (the high argument of randint() is exclusive).

from numpy.random import randint

nums = randint(low=0, high=21, size=100)
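Because these numbers are random, the counts shown below will differ each time the code is run. If reproducible values are wanted, one option is NumPy's seeded Generator interface. This is a sketch rather than the approach used in this section, and the seed value 42 is an arbitrary choice.

```python
import numpy as np

# A seeded generator produces the same "random" numbers on every run.
# The seed 42 is arbitrary and chosen only for this sketch.
rng = np.random.default_rng(42)

# As with randint(), `high` is exclusive by default, so values lie in 0..20.
nums = rng.integers(low=0, high=21, size=100)

print(nums.min(), nums.max())
```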

Use the Counter class from Python's standard library to count the number of occurrences of each integer [5].

from collections import Counter

counts = Counter(nums)
print(counts)
Counter({17: 10, 12: 8, 11: 7, 8: 7, 18: 7, 13: 6, 14: 6, 0: 6, 3: 6, 4: 5, 19: 5, 1: 5, 20: 4, 7: 4, 16: 3, 6: 3, 9: 3, 2: 2, 5: 2, 10: 1})

Generate the x and y series for plotting.

x, y = [], []
for n in sorted(counts):
    x.append(n)
    y.append(counts[n])
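The counting and sorting steps can also be combined using numpy.unique() with return_counts=True, which returns the sorted distinct values together with how often each occurs. The sketch below is an alternative to the Counter approach, not a replacement for it. It generates its own seeded numbers (the seed is arbitrary) so that it can be run on its own, and it cross-checks the result against Counter.

```python
from collections import Counter

import numpy as np

# Self-contained input for this sketch; the seed 42 is arbitrary.
rng = np.random.default_rng(42)
nums = rng.integers(low=0, high=21, size=100)

# Sorted distinct values and the number of times each one occurs.
values, occurrences = np.unique(nums, return_counts=True)
x = values.tolist()
y = occurrences.tolist()

# Cross-check against the Counter-based approach.
counts = Counter(nums.tolist())
assert x == sorted(counts)
assert y == [counts[n] for n in x]
```

As with the loop above, integers that never occur are simply absent from x and y.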

Construct the bar chart.

import plotly.express as px

fig = px.bar(x=x, y=y)
fig.show()

5.46. Exercises

  1. Look at the Plotly documentation and convert one of the scatter plots into a line plot.

  2. In the bar chart example above, the numbers were generated with randint(low=0, high=21), so they range from 0 to 20. The midpoint of this range is 10 (there are 10 smaller numbers and 10 larger numbers). Modify the x-axis values so that instead of ranging from 0 to 20, centred on 10, they range from -10 to 10, centred on 0. The result should look identical to the above, but any current x-axis values < 10 will be negative.

  3. The elements of coords are conventional Cartesian coordinates, i.e. \((x, y)\). Display them as a scatter plot.

    coords = [(2, 7), (-2, -4), (1, 3)]