5.27. Indexing and Slicing

5.27.1. Indexing

Indexing works the same way for Python strings, tuples and lists. Indexing other data types is similar in some respects and different in others.

An index is the position of an element in a series. Elements of an instance of a series data type (e.g. str and list) can be referenced by their index.

For all data types, square brackets [] are used to specify the index.

Note

Python indexes start at 0.

#       0123...
data = "ACGTACGTACGT"
print(data[1])
C

Indexing is also used for assignment. In the following, we assign the value -2 to index 0 of the list more_data.

more_data = ["some text", 4, 23.4]
more_data[0] = -2
print(more_data)
[-2, 4, 23.4]

Assignment is not possible with an immutable data type like a string.

data[0] = "T"
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 data[0] = "T"

TypeError: 'str' object does not support item assignment
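
If a modified version of a string is needed, one common workaround is to build a new object rather than change the original. The following is a minimal sketch, not the only approach: it copies the characters into a list (which is mutable), assigns by index, and joins the result back into a string. The names chars and mutated are illustrative only, and data is redefined so the example is self-contained.

```python
data = "ACGTACGTACGT"

# A list is mutable, so convert, assign by index, then join back into a str.
chars = list(data)
chars[0] = "T"
mutated = "".join(chars)

print(mutated)  # TCGTACGTACGT
print(data)     # ACGTACGTACGT (the original string is unchanged)
```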

Indexing beyond the length of a series causes an exception. Here is a little gotcha: although more_data is 3 elements long, there is no element at index 3. That's because Python indexes start at 0, so the last valid index is 2.

print(more_data[3])
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[4], line 1
----> 1 print(more_data[3])

IndexError: list index out of range
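
The relationship between the length of a series and its valid indices can be checked with len(). The sketch below assumes the more_data list as it stands after the assignment above (redefined here so the block runs on its own); last_index and went_out_of_range are illustrative names.

```python
more_data = [-2, 4, 23.4]

# Valid indices run from 0 to len(more_data) - 1.
last_index = len(more_data) - 1
print(more_data[last_index])  # 23.4

# Indexing at len(more_data) is one past the end and raises IndexError,
# which can be caught like any other exception.
try:
    more_data[len(more_data)]
    went_out_of_range = False
except IndexError:
    went_out_of_range = True

print(went_out_of_range)  # True
```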

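5.27.2. Indexing in multiple dimensions

Indexing a list of lists, such as the one I created earlier, applies [] once per level: the first index selects an inner list and the second selects an element within it.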

seq_records = [["label 1", "AA"], ["label 2", "TT"]]
seq_records[0]
['label 1', 'AA']
seq_records[0][1]
'AA'
seq_records[1][1]
'TT'
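
Chained indexing can go as deep as the nesting allows, and because the innermost elements here are strings, a third index reaches individual characters. The following sketch reuses seq_records from above; the other variable names are illustrative. It also shows that looping with unpacking is often a readable alternative to writing the indices by hand.

```python
seq_records = [["label 1", "AA"], ["label 2", "TT"]]

# The first [] selects an inner list, the second [] an element of that list.
first_label = seq_records[0][0]

# A third [] indexes into the string stored at that position.
first_base = seq_records[0][1][0]

# Unpacking each inner list in a loop avoids explicit indices.
labels = []
for label, seq in seq_records:
    labels.append(label)

print(first_label, first_base, labels)
```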

5.27.3. Slicing

Slicing is an indexing operation that refers to a range of elements. A slice operation selects a run of elements in their existing order. The syntax for a slice is [start:end:stride], and any of these terms can be omitted.

  • start refers to the first index from which elements will be sampled. Defaults to 0.

  • end refers to the index up to (but not including) which elements will be sampled. Defaults to the length of the series.

  • stride refers to the separation between selected elements. Defaults to 1.

data
codon1 = data[:3]
codon1
'ACG'

Note

I omitted the start and just used the :. Python interpreted this as “slice from the start of the string up to (but not including) index 3”.
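
The defaults for start, end and stride can be checked directly. This is a small sketch using the same data string (redefined so it is self-contained); explicit, implicit and codon2 are illustrative names.

```python
data = "ACGTACGTACGT"

# Omitted terms fall back to their defaults, so these are the same slice.
explicit = data[0:len(data):1]
implicit = data[:]
print(explicit == implicit == data)  # True

# start and end together select a half-open range: indices 3, 4 and 5.
codon2 = data[3:6]
print(codon2)  # TAC
```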

Negative indices count from the end of the series, so the following slice selects the last three elements.

data[-3:]
'CGT'
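
Negative values can be used for single-element indexing as well as for either end of a slice. A minimal sketch, with illustrative variable names:

```python
data = "ACGTACGTACGT"

# A negative index i refers to the same element as len(data) + i.
last = data[-1]
same_last = data[len(data) - 1]
print(last == same_last)  # True

# A negative end drops elements from the tail: everything except the last codon.
without_last_codon = data[:-3]
print(without_last_codon)  # ACGTACGTA
```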

You can also specify a stride, which causes the slice to select elements in steps of the specified length. Below I set the stride to 3 (which is what you would do if you wanted to select first codon positions, for example).

data[0:9:3]
'ATG'

Slicing to beyond the length of a series does not cause an exception.

data[:15]
'ACGTACGTACGT'
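
This is a useful contrast with single-element indexing. The sketch below, with illustrative names, shows that out-of-range slice bounds are quietly clipped, while indexing at the same position still raises an exception.

```python
data = "ACGTACGTACGT"

# Out-of-range slice bounds are clipped to the ends of the series.
print(data[:15] == data)  # True

# A slice that starts past the end is simply empty.
past_end = data[20:25]
print(repr(past_end))  # ''

# Indexing a single element at that position raises IndexError.
try:
    data[20]
    raised = False
except IndexError:
    raised = True

print(raised)  # True
```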

5.28. Exercises

  1. Consider the dict defined below

    d = {0: "value for 0", ("a-key",): "funky key"}
    

    Get each value of d using index notation [1].

  2. What does nums[::-1] do on the following?

    nums = [0, 1, 2, 3, 4]
    
  3. For the simple protein coding DNA sequence ATGATGATG [2], use a slice to extract the first codon [3]. Do the same for the last codon.

  4. For the same sequence, use a slice operation to obtain the first nucleotide of each codon. Slicing a string returns a string, so you should produce "AAA" (or ["A", "A", "A"] if you convert the result to a list). Do this for the second codon position (producing "TTT") and then the third codon position.

  5. Split the sequence ATGAAATAA into codons (non-overlapping letter triples). (The most succinct solution uses a list comprehension.)