5.42. Miscellaneous Python features

5.42.1. Raw strings and escaping characters

Some characters have special meaning, and including them literally in a string requires “escaping” them. This is done using a \ character. For instance, the sequence \n in a string normally produces a newline character.

text = "hello\nworld"

If we don’t want that to happen, we can either “escape” the \ character

text = "hello\\nworld"

or define it as a raw string by prefixing the string literal with the r character. The main advantage of this approach is that it’s easier to write.

text = r"hello\nworld"
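
To make the difference concrete, here is a small sketch comparing the three spellings above. The variable names are mine, chosen for illustration only.

```python
newline = "hello\nworld"   # \n is a single newline character
escaped = "hello\\nworld"  # a backslash followed by the letter n
raw = r"hello\nworld"      # the same characters as `escaped`

print(len(newline))    # 11
print(len(escaped))    # 12
print(raw == escaped)  # True
```

The raw and escaped versions are the same string, so which one you use is a matter of readability.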

5.42.2. bytes strings

A bytes object is a sequence of bytes that behaves much like a string. The available methods are substantially the same as for str objects, with some important exceptions. You can create a bytes instance using the b string prefix.

btext = b"some text"
b'some text'

We can convert a bytes instance to a standard string using the decode() method [1].

text = btext.decode(encoding="utf8")
'some text'

We can convert a standard string into a bytes instance using the encode() method.

back = text.encode(encoding="utf8")
b'some text'
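
Here is a minimal sketch of why the distinction matters, using a sample word of my own choosing that contains one non-ASCII character. It also shows one of the exceptions mentioned above: indexing a bytes instance gives an int rather than a one-character string.

```python
text = "caf\u00e9"  # hypothetical sample: "café" with a single accented character
data = text.encode(encoding="utf8")

print(len(text))  # 4 characters
print(len(data))  # 5 bytes, the accented character needs two bytes in UTF-8
print(data[0])    # 99, an int (the byte value of "c")
print(text[0])    # c, a str of length 1

round_trip = data.decode(encoding="utf8")
print(round_trip == text)  # True
```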

5.42.3. open() files in binary mode

Using mode="rb" opens a file in binary mode. The file contents are returned as bytes without any decoding.

with open("python/misc.rst", mode="rb") as infile:
    line = infile.readline()

b'.. jupyter-execute::\n'
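
To compare binary and text mode side by side, here is a self-contained sketch. It writes a throwaway file first (the file name demo.txt and its contents are assumptions for illustration) and then reads the first line in each mode.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.txt")  # hypothetical file name
    # newline="\n" stops Python translating line endings on write
    with open(path, mode="w", encoding="utf8", newline="\n") as outfile:
        outfile.write("first line\nsecond line\n")

    with open(path, mode="rb") as infile:
        raw_line = infile.readline()  # bytes, no decoding applied

    with open(path, mode="r", encoding="utf8") as infile:
        text_line = infile.readline()  # str, decoded from UTF-8

print(type(raw_line).__name__, raw_line)
print(type(text_line).__name__, repr(text_line))
```

Decoding the bytes yourself with raw_line.decode(encoding="utf8") gives the same string as reading in text mode.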

5.42.4. Empty series evaluate to False

One property of Python’s built-in series (such as str, list and tuple) is that they evaluate to False when they are empty. Values that evaluate to False are referred to as “falsy”, and values that evaluate to True as “truthy”.

sample_data = ["some text", ""]
for text in sample_data:  # yes, lists are iterable too!
    if text:
        print("YES", text)
    else:
        print("NO Empty string")
YES some text
NO Empty string


I iterated over elements of the list sample_data. I also used conditionals within the for loop.

The values 0, 0.0 and None also evaluate to False.
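
A quick way to check how a value will behave in a conditional is to pass it to bool(). The sketch below uses two lists of sample values that I picked for illustration. Note that a non-empty series is truthy even when its contents look "empty".

```python
falsy_values = ["", [], (), {}, set(), 0, 0.0, None]
truthy_values = ["0", " ", [0], [[]]]  # non-empty, so they evaluate to True

for value in falsy_values:
    print(repr(value), bool(value))

for value in truthy_values:
    print(repr(value), bool(value))
```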

5.42.5. Checking correctness using assert

It’s essential to check the correctness of your code. Knowing where and when you do this is a skill that you will develop by programming. For now I just demonstrate the syntax for using the assert statement.

name = "Gav"
assert type(name) == str, f"name {name} is not a string"
print("Sanity check passed!")
Sanity check passed!

This is what it looks like when it fails.

name = 0
assert type(name) == str, f"name {name} is not a string"
AssertionError                            Traceback (most recent call last)
Cell In[13], line 2
      1 name = 0
----> 2 assert type(name) == str, f"name {name} is not a string"

AssertionError: name 0 is not a string
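
One common place for such a check is at the top of a function. The following is only a sketch: greet() is a hypothetical helper, and I use isinstance() in place of the type comparison above, which also accepts subclasses of str. Be aware that assert statements are skipped entirely when Python is run with the -O option.

```python
def greet(name):
    """Hypothetical helper that only accepts a string."""
    assert isinstance(name, str), f"name {name!r} is not a string"
    return f"Hello {name}"


print(greet("Gav"))

try:
    greet(0)
except AssertionError as err:
    message = str(err)
    print("Caught:", message)
```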

5.42.6. “Comprehensions”

A comprehension is a succinct way of writing a simple for loop that builds a collection. Comprehensions are usually a little faster than the equivalent explicit loop and are widely useful.

List comprehensions

Here’s an example for converting floats into strings.

nums = [
s = [str(v) for v in nums]
 '0.8966182758861486']

Dictionary comprehensions

So many uses for a dict! Here is a simple demonstration using a list of key-value pairs. Notice that in this case I unpack each pair into two variables, k and v, in the for clause.

k_v = [["A", 0.1], ["C", 0.2], ["G", 0.3], ["T", 0.4]]
d = {k: v for k, v in k_v}
{'A': 0.1, 'C': 0.2, 'G': 0.3, 'T': 0.4}
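
To show what a comprehension replaces, here is a sketch that writes the list case both ways and adds an optional if clause to the dict case. The values in nums are made up for illustration, as is the 0.2 cutoff.

```python
nums = [0.25, 0.5, 0.75]  # hypothetical values

# List comprehension and the explicit loop it replaces
as_strings = [str(v) for v in nums]

looped = []
for v in nums:
    looped.append(str(v))

print(as_strings == looped)  # True

# Dict comprehension with a condition that filters the pairs
k_v = [["A", 0.1], ["C", 0.2], ["G", 0.3], ["T", 0.4]]
above_cutoff = {k: v for k, v in k_v if v > 0.2}
print(above_cutoff)  # {'G': 0.3, 'T': 0.4}
```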

5.42.7. Zipping / Unzipping series

Say you have two data series, of equal length, and you want them combined into a single object. This can be done using the built-in zip(). For example, here’s a zip operation performed on two strings:

seq1, seq2 = "AGTAA", "AGTAA"
columns = list(zip(seq1, seq2))
[('A', 'A'), ('G', 'G'), ('T', 'T'), ('A', 'A'), ('A', 'A')]

You can also unzip series. For example, consider the following list of lists. We can decompose that into 2 separate series using zip with the argument prefaced by *.

coords = [[0, 23], [42, 42], [13, 27]]
x, y = zip(*coords)
(23, 42, 27)
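
The following sketch puts zipping and unzipping together as a round trip. The two sequences seq_a and seq_b are hypothetical, chosen so that they differ at one position. It also shows what happens when the inputs are not of equal length: zip() stops at the shortest one.

```python
seq_a = "AGTAA"  # hypothetical sequences of equal length
seq_b = "AGTCA"

columns = list(zip(seq_a, seq_b))

# Unzipping gives back the original characters, as tuples
back_a, back_b = zip(*columns)
print("".join(back_a) == seq_a)  # True

# Positions at which the two sequences differ
diffs = [i for i, (a, b) in enumerate(columns) if a != b]
print(diffs)  # [3]

# With unequal lengths, the extra elements are silently dropped
short = list(zip("AGT", "AG"))
print(short)  # [('A', 'A'), ('G', 'G')]
```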

5.42.8. Method chaining

When you make multiple method calls in a row, each applied to the result of the previous call, this is called “chaining” or “method chaining”. It works when each method call returns an object that has the next method. These expressions are read left to right. For example, in the following, I chain the string methods strip() and split().

text = "A\tB\t\n"
data = text.strip().split()
['A', 'B']

These types of expressions are used to save creating intermediate variables and, some argue, for clarity.
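
For comparison, here is a sketch of the same kind of operation written both as a chain and with intermediate variables. It also includes a case where chaining does not work, because list.sort() modifies the list in place and returns None.

```python
text = "A\tB\t\n"

# Chained: strip() returns a str, which has a split() method
chained = text.strip().split("\t")

# The same steps with intermediate variables
stripped = text.strip()
fields = stripped.split("\t")

print(chained == fields)  # True

# sort() returns None, so nothing can be chained after it
result = [3, 1, 2].sort()
print(result)  # None
```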