Pulling it together – writing your own programs

5.40. Pulling it together – writing your own programs

5.40.1. Break the problem into pieces

While it can seem daunting, the key to writing a program is to break the problem into smaller pieces and solve each one individually. How do I recognise what to work on first, and how do I decide what a piece is? I suggest thinking in terms of order of operations. To illustrate, let’s look at a simple equation.

\[x=b+\sqrt{b^2 + 4ac}\]

Here’s a listing of the parts:

1. \(b\)
2. the \(\sqrt{~}\), which contains
    2.1. \(b^2\), which is
        \(b\times b\)
    2.2. \(4ac\), which is
        \(4\times a \times c\)

Solving this problem requires that we work from the inside (the most indented bullet points) towards the outside (the least indented bullet points). So let’s do this thing!

I’m going to rewrite the bullet list, defining the names of the Python variables I will use for each level.

1. b (well that was easy!)
2. sqrt_term
    2.1. b_sq
    2.2. four_ac

The first thing we have to do is define the variables we will use before we actually use them. We will just give them some starting numerical values (we know they have to be numbers because maths!) [1].

b = 5
a = 1.1
c = 32

That actually defines b (1.), so the first of our parts is solved. Now let’s go to the innermost pieces and define b_sq (2.1).

b_sq = b * b

Then four_ac (2.2).

four_ac = 4 * a * c

Now we can compute sqrt_term (2.).

from math import sqrt

sqrt_term = sqrt(b_sq + four_ac)

And finally the solution (x).

x = b + sqrt_term
x

17.876334882255897

We can, of course, write this as a single statement.

x = b + (b**2 + 4 * a * c) ** 0.5
x

17.876334882255897

Now this is a simple problem. For more challenging problems, as discussed below, breaking problems into pieces and making sure each piece works is a more successful strategy.
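As a hedged illustration of making sure each piece works, the sketch below wraps the step-by-step calculation in a function and compares its result with the single-statement version. The function name solve_x is an assumption made for this example and is not defined elsewhere in this chapter.

```python
from math import isclose, sqrt


def solve_x(a, b, c):
    """computes b + sqrt(b^2 + 4ac) one piece at a time (hypothetical helper)"""
    b_sq = b * b                      # piece 2.1
    four_ac = 4 * a * c               # piece 2.2
    sqrt_term = sqrt(b_sq + four_ac)  # piece 2.
    return b + sqrt_term


x_steps = solve_x(a=1.1, b=5, c=32)
x_single = 5 + (5**2 + 4 * 1.1 * 32) ** 0.5

# isclose() avoids relying on exact equality between floating point results
assert isclose(x_steps, x_single)
```

Comparing with isclose() rather than == is a cautious choice, since two different routes to the same floating point number are not guaranteed to agree in the last digit.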

5.40.2. Look for patterns

Part of what we have just done is to look at the “problem” (evaluating an equation) and recognise patterns in it (based on the mathematical order of operations). That approach also applies to more complicated challenges.

Let’s say we want to read in a plain text file which contains a header row followed by rows of numbers, where fields are delimited by the tab character. Here are the first few lines of just such a file.

length      kappa
0.017963959082536105        8.567983199899585
0.036913880515213056        7.658395694530731

Algorithmically, the top level problems are:

1. Open the file (see Working with files)
2. Read the file line by line (see Working with files)
    2.1. Transform each line into usable data

That last point is the innermost, so we focus our attention on the challenge of transforming lines. We look at the sample of the file to identify any patterns and notice 2 features. The first is that all lines have the same number of fields (separated by \t). The second is that the header row is different, in that its values are not numbers. We now modify the enumeration to give some more detail.

1. Open the file (see Working with files)
2. Read the first line in the file
    2.1. Split the line into fields
3. Read the remaining lines in the file (see Working with files)
    3.1. Split a line into fields
        3.1.1. Convert the line items into floats
4. Close the file (see Working with files)
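The two patterns noticed in the sample can be checked directly. The sketch below is only an illustration: it assumes the sample lines really are tab delimited, writes them out by hand as strings, and uses a made-up helper called is_number() that is not part of the chapter’s code.

```python
# the first three lines of the sample, written with explicit \t and \n
sample_lines = [
    "length\tkappa\n",
    "0.017963959082536105\t8.567983199899585\n",
    "0.036913880515213056\t7.658395694530731\n",
]

# pattern 1: every line has the same number of tab separated fields
field_counts = [len(line.split("\t")) for line in sample_lines]
assert field_counts == [2, 2, 2]


def is_number(text):
    """hypothetical helper: True if float() can convert text"""
    try:
        float(text)
    except ValueError:
        return False
    return True


# pattern 2: the header fields are not numbers, the other fields are
header_is_numeric = [is_number(f) for f in sample_lines[0].split("\t")]
row_is_numeric = [is_number(f) for f in sample_lines[1].split("\t")]
assert header_is_numeric == [False, False]
assert row_is_numeric == [True, True]
```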

So I suggest the place to start is 3.1.1. I’m going to write a separate function for each of these steps, because doing so lets us reuse code [2], makes it easier to check that the code is correct, and reduces building more complex algorithms to combining functions we have already written.

We start this program with a function that takes a list of strings where every value needs to be converted into a float. I’m going to write it and test it, using an assert, with some sample data.

def cast_to_floats(values):
    """turns a series of strings into floats"""
    result = []
    for value in values:
        value = float(value)
        result.append(value)
    return result

sample = ["0.0", "24.3", "13.5"]
got = cast_to_floats(sample)
assert got == [0.0, 24.3, 13.5]
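A single assert only covers one case. As a sketch of some further checks one might add (the extra sample values here are invented for illustration), the same function can be tried on inputs with surrounding white space, integers written as text, exponent notation and an empty series.

```python
def cast_to_floats(values):
    """turns a series of strings into floats"""
    result = []
    for value in values:
        value = float(value)
        result.append(value)
    return result


# float() ignores surrounding white space and accepts integers and exponents
assert cast_to_floats([" 1.5 ", "7", "2e3"]) == [1.5, 7.0, 2000.0]

# an empty series gives an empty list
assert cast_to_floats([]) == []

# a tuple of strings also works, since the function only loops over the values
assert cast_to_floats(("0.5", "0.25")) == [0.5, 0.25]
```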

Yay! So that’s 3.1.1 out of the way. The next step out is to solve 3.1. We do this by writing another separate function, which we check using some synthetic data to make sure it gives the result we expect.

def line_to_fields(line):
    """splits at tab characters and cleans up the elements"""
    line = line.split("\t")
    # I think we should remove any leading / trailing white space from elements
    result = []
    for item in line:
        result.append(item.strip())
    return result

# this sample is \t delimited with a \n character at the end
# just as it would be if read from a file
sample = "0.0\t24.3\t13.5\n"
got = line_to_fields(sample)
assert got == ["0.0", "24.3", "13.5"]

Double Yay! That’s 3.1 (and thus 2.1) out of the way [3].
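Because each function has been checked on its own, combining them takes very little code. The following is a minimal sketch that applies both functions to a header line and a data line; the two lines are typed in by hand on the assumption that the file is tab delimited, rather than being read from a file.

```python
def cast_to_floats(values):
    """turns a series of strings into floats"""
    result = []
    for value in values:
        value = float(value)
        result.append(value)
    return result


def line_to_fields(line):
    """splits at tab characters and cleans up the elements"""
    result = []
    for item in line.split("\t"):
        result.append(item.strip())
    return result


header_line = "length\tkappa\n"
data_line = "0.017963959082536105\t8.567983199899585\n"

# step 2.1: the header only needs splitting into fields
header = line_to_fields(header_line)
assert header == ["length", "kappa"]

# steps 3.1 and 3.1.1: split a data line, then convert the fields
numbers = cast_to_floats(line_to_fields(data_line))
assert numbers == [0.017963959082536105, 8.567983199899585]
```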

Returning to the task list, we remove the steps we’ve already done, making it simpler to see what remains.

Open the file
Read the first line in the file
Read the remaining lines in the file
Close the file

The first and last are easy (see Working with files). The remaining tasks (listed in the Exercises below) need to be solved before these 4 steps can be combined into a single function. That function should use the line_to_fields() and cast_to_floats() functions that we have already written. At which point, job well done!

5.41. Exercises

Using any text file, identify how to read just the first line.
Identify how to loop over all the lines in a file.
Identify how you can keep all the results of converting lines into floats.
Write a function parser() that completes the algorithm. You can apply it to the sample data you make up that looks like the above, or use this file.