5.24. Dealing with dictionaries

The dict type is a mutable collection data structure that is extraordinarily useful. Dictionaries provide very fast lookups of items and allow a more natural specification of how things are stored and retrieved. In general, a dictionary consists of key/value pairs [1]. The value is the object of interest and the key is how you retrieve it from a dictionary instance. This means items are accessed by key, not by position. Since Python 3.7 a dict preserves the order in which keys were inserted when you loop over it, but there is no “first” element that you can retrieve with a numeric index.

5.24.1. Creating an empty dict

There are two approaches to defining a dict. The first uses “empty” curly braces [2].

x = {}
x
{}

The second calls the builtin dict() without any arguments.

x = dict()
x
{}

Dictionaries have a length, which in this case is 0.

len(x)
0

5.24.2. Creating a non-empty dict

The syntax differs between the two approaches. When using {}, you separate each key from its value using : and each key/value pair from the next using ,.

names = {"first": "GAVIN", "last": "Huttley"}
names
{'first': 'GAVIN', 'last': 'Huttley'}

Note

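Only hashable objects can be used as dictionary keys. In practice this means immutable data types such as str, int, or a tuple of immutable items. A list cannot be a key.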
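The restriction on keys can be checked directly. The following is a minimal sketch; the codon_counts name and its contents are made up for illustration.

```python
# A tuple of strings is immutable (and hashable), so it can serve as a key.
codon_counts = {("A", "T", "G"): 1}
codon_counts[("T", "A", "A")] = 2

# A list is mutable and unhashable, so using one as a key raises TypeError.
try:
    codon_counts[["A", "T", "G"]] = 3
    list_key_failed = False
except TypeError:
    list_key_failed = True

print(codon_counts)
print(list_key_failed)
```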

Using the dict() function allows a quite different approach, which can only be applied if the keys are strings. In this approach, the keys become argument names to dict() and the values are assigned to them in the function call. In the following, notice that we use a standard keyword argument within a function call: we remove the quotes from the key and use = to assign the value.

names = dict(first="GAVIN", last="Huttley")
names
{'first': 'GAVIN', 'last': 'Huttley'}

Warning

This approach only works if the string is a valid Python identifier (that is, it could be used as a variable name). For instance “first_name” would work, but “first name” would not.
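When a key is not a valid identifier, dict() can still be used, because it also accepts an iterable of (key, value) pairs. A short sketch, with illustrative key names that contain a space:

```python
# Each pair is (key, value), so the key can be any hashable object,
# including a string that contains a space.
pairs = [("first name", "GAVIN"), ("last name", "Huttley")]
names = dict(pairs)
print(names)

# zip() builds the same pairs from two parallel sequences.
keys = ["first name", "last name"]
values = ["GAVIN", "Huttley"]
same_names = dict(zip(keys, values))
print(same_names == names)
```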

Note

Dictionary keys are always unique. If you make successive assignments to a dict with the same key name, you are simply overwriting the previous value for that key.
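A small sketch of what overwriting looks like in practice. The duplicated variable is only there to illustrate that the same rule applies inside a single {} expression.

```python
names = {"first": "GAVIN", "last": "Huttley"}

# Assigning to an existing key replaces its value, it does not add a new item.
names["first"] = "Gavin"
print(names)
print(len(names))

# Repeating a key inside the braces behaves the same way: the last value wins.
duplicated = {"first": "GAVIN", "first": "Gavin"}
print(duplicated)
```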

5.24.3. Retrieving values from a dict by “indexing”

We obtain values from a dict instance using the key. Using the names instance, we can get the value corresponding to the key "first" with the standard indexing syntax (i.e. using []).

f = names["first"]
f
'GAVIN'

If you try to get a key that does not exist, Python raises a KeyError.

f = names["first name"]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[7], line 1
----> 1 f = names["first name"]

KeyError: 'first name'

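If a missing key is a possibility you want to handle rather than let the program stop, one option is to catch the exception. This is a minimal sketch; the missing_key variable is introduced here only for illustration.

```python
names = {"first": "GAVIN", "last": "Huttley"}

try:
    f = names["first name"]
except KeyError as err:
    # The key that could not be found is stored as the first argument
    # of the exception.
    missing_key = err.args[0]
    f = None

print(missing_key, f)
```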

5.24.4. Retrieving values from a dict using the get() method

The get() method is an alternative to indexing using []. If a key does not exist, it returns None by default instead of raising a KeyError.

v = names.get("first name")
type(v), v
(NoneType, None)

You can provide your own “default” value for when a key is missing by passing it as the second argument to get(). If we were using a dict to record counts of nucleotides, for instance, we can define a default value of 0 (for an alternative approach to counting).

counts = {}
seq = "ACGGCCG"
for nucleotide in seq:
    counts[nucleotide] = counts.get(nucleotide, 0) + 1

counts
{'A': 1, 'C': 3, 'G': 3}
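To see what get() with a default is doing, it may help to compare it with an explicit membership check. The sketch below builds the counts both ways; the names counts_get and counts_check are illustrative.

```python
seq = "ACGGCCG"

# Using get() with a default of 0 for keys not seen yet.
counts_get = {}
for nucleotide in seq:
    counts_get[nucleotide] = counts_get.get(nucleotide, 0) + 1

# The same logic written with an explicit check for the key.
counts_check = {}
for nucleotide in seq:
    if nucleotide in counts_check:
        counts_check[nucleotide] += 1
    else:
        counts_check[nucleotide] = 1

print(counts_get == counts_check)

# get() only looks a key up, it never adds the key to the dict.
t_count = counts_get.get("T", 0)
print(t_count, "T" in counts_get)
```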

5.24.5. Looping over a dict

The dict object is an iterable data type, which means you can loop over it. Looping yields the keys of the instance.

for k in counts:
    # printing both the key and its value
    print(k, counts[k])
A 1
C 3
G 3
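If you need the keys in a particular order, for instance alphabetical, one option is to loop over sorted() applied to the dict. A sketch, using a counts dict whose keys were deliberately entered out of alphabetical order:

```python
counts = {"G": 3, "A": 1, "C": 3}

# sorted() iterates over the dict (i.e. its keys) and returns them as a sorted list.
ordered = []
for k in sorted(counts):
    ordered.append((k, counts[k]))

print(ordered)
```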

5.24.6. Seeing if a dict contains a key

Whether a key is present is tested using the in operator.

has_a = "A" in counts
has_a
True
has_t = "T" in counts
has_t
False
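Note that in checks the keys only, not the values. The following sketch makes the distinction explicit; the variable names are illustrative.

```python
counts = {"A": 1, "C": 3, "G": 3}

key_present = "A" in counts            # "A" is a key
value_as_key = 3 in counts             # 3 is a value, not a key
value_present = 3 in counts.values()   # search the values explicitly
key_absent = "T" not in counts         # the negated form of the operator

print(key_present, value_as_key, value_present, key_absent)
```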

5.24.7. Displaying all the keys or all the values or all the items of a dict

5.24.7.1. Getting all the keys

To find what keys are present in a dict, we use the aptly named keys() method. This returns a custom type [3], which can be iterated over.

v = counts.keys()
type(v)
dict_keys

You can use that to get the keys as a different data type, e.g. a tuple or list, using the respective builtin functions.

keys = tuple(counts.keys())
keys
('A', 'C', 'G')

But you can get the same thing by passing the dict instance itself. This works because the tuple() and list() functions take an iterable as their argument and, as we showed above, iterating over a dict returns the keys.

keys = tuple(counts)
keys
('A', 'C', 'G')
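One practical difference between the object returned by keys() and a tuple made from it is that the former is a live view of the dict, while the tuple is a fixed copy. A minimal sketch of that behaviour, with illustrative variable names:

```python
counts = {"A": 1, "C": 3, "G": 3}

key_view = counts.keys()   # a view onto the dict's keys
snapshot = tuple(counts)   # an independent copy of the keys

# Adding a key after the fact changes the view but not the tuple.
counts["T"] = 0

print(len(key_view), "T" in key_view)
print(snapshot)
```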

5.24.7.2. Getting all the values

This is what the values() method does. It returns a custom data type [3] which can be iterated over.

counts.values()
dict_values([1, 3, 3])

5.24.7.3. Getting all the key/value pairs

We can achieve this by using the items() method which, again, returns a custom data type [3].

counts.items()
dict_items([('A', 1), ('C', 3), ('G', 3)])

A common usage pattern for the items() method is for looping with assignment unpacking.

for key, value in counts.items():
    print(f"key={key} and value={value}")
key=A and value=1
key=C and value=3
key=G and value=3
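The same unpacking pattern is handy for building one dict from another. As a sketch, the following converts the counts into proportions; the freqs and total names are introduced here for illustration.

```python
counts = {"A": 1, "C": 3, "G": 3}

# values() gives every count, so sum() returns the sequence length.
total = sum(counts.values())

# Loop over key/value pairs and store each count as a fraction of the total.
freqs = {}
for key, value in counts.items():
    freqs[key] = value / total

print(total)
print(freqs)
```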

5.24.8. Adding new items to a dict

Adding a new item to an existing dict is just an assignment.

counts["T"] = 0
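A short sketch showing the effect of the assignment on the dict's length. It also uses the dict update() method, which is not covered in the text above, as one way to add several items in a single call.

```python
counts = {"A": 1, "C": 3, "G": 3}
size_before = len(counts)

# Assigning to a key that does not exist yet adds a new item.
counts["T"] = 0
size_after = len(counts)
print(size_before, size_after)

# update() adds (or overwrites) several items at once from another dict.
counts.update({"N": 0, "-": 0})
print(counts)
```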

5.24.9. Updating an existing item

Dicts become really valuable when you need to dynamically update a value. We showed this above when constructing our dict of nucleotide counts (the counts are incremented). Now consider the case where the value is a mutable data type, such as a list. Let’s consider the following data.

data = [['FlyingFox', '8.57'],
        ['DogFaced', '7.66'],
        ['edge.0', '4.66']]

Say we want to convert the second column to floats. We can do this by iterating over the rows and converting only the element at index 1. Another approach is to construct separate lists for each column and convert the entire column [4]. We start by defining our dictionary with each key assigned an empty list. (I’m using assignment unpacking again.)

by_column = {"name": [], "stat": []}
for name, stat in data:
    by_column["name"].append(name)
    by_column["stat"].append(stat)

by_column
{'name': ['FlyingFox', 'DogFaced', 'edge.0'], 'stat': ['8.57', '7.66', '4.66']}

In the above, by_column[<key name>] returns the value for that key. We can then directly access methods on the returned object using the . syntax. In this case we call the list's append() method to add a new value to it. This is an example of method chaining (see Method chaining).

We can now apply our casting to the numerical column only.

by_column["stat"] = [float(v) for v in by_column["stat"]]
by_column
{'name': ['FlyingFox', 'DogFaced', 'edge.0'], 'stat': [8.57, 7.66, 4.66]}

This pattern of modifying the value associated with a key based on its current value is extremely useful.
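A variation on this pattern, for when the keys are not known in advance, is the dict setdefault() method. It is not used in the text above. It returns the existing value for a key, or inserts and returns the supplied default if the key is missing. The following sketch builds the same by_column result without predefining the empty lists, converting to float as it goes.

```python
data = [["FlyingFox", "8.57"],
        ["DogFaced", "7.66"],
        ["edge.0", "4.66"]]

by_column = {}
for name, stat in data:
    # setdefault() returns the list already stored under the key, or stores
    # a new empty list first if the key is missing; we then append to it.
    by_column.setdefault("name", []).append(name)
    by_column.setdefault("stat", []).append(float(stat))

print(by_column)
```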