5.27. Indexing and Slicing
5.27.1. Indexing
Indexing works the same way for Python strings, tuples and lists. Indexing other data types has some similarities to this, and some differences.
An index refers to the position of an element in a series. Elements of a series data type (e.g. str and list) can be referenced by their index number.
For all of these data types, square brackets ([]) are used to specify the indices.
Note
Python indexes start at 0.
# 0123...
data = "ACGTACGTACGT"
print(data[1])
C
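Because indexing works the same way for strings, tuples and lists, the same index selects the element at the same position in each. The following is a minimal sketch that stores the same letters three ways. The names as_tuple and as_list are only for illustration.

```python
data = "ACGTACGTACGT"
as_tuple = tuple(data)  # ('A', 'C', 'G', 'T', ...)
as_list = list(data)    # ['A', 'C', 'G', 'T', ...]

# index 1 is the second element in every case
print(data[1], as_tuple[1], as_list[1])
```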
Indexing is also used for assignment. In the following, we assign the value -2 to index 0 of the list more_data.
more_data = ["some text", 4, 23.4]
more_data[0] = -2
print(more_data)
[-2, 4, 23.4]
Assignment is not possible with an immutable data type like a string.
data[0] = "T"
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[3], line 1
----> 1 data[0] = "T"
TypeError: 'str' object does not support item assignment
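Tuples are immutable too, so item assignment on a tuple fails in the same way. This small sketch uses a hypothetical tuple named record and catches the error instead of letting it stop the program.

```python
record = ("label 1", "AA")

message = None
try:
    record[1] = "GG"  # tuples, like strings, cannot be changed in place
except TypeError as err:
    message = str(err)

print(message)
print(record)  # the tuple is unchanged
```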
Indexing beyond the length of a series raises an exception. Here is a little gotcha: although more_data is 3 elements long, there is no element at index 3. That's because Python indexes start at 0, so the valid indices are 0, 1 and 2.
print(more_data[3])
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[4], line 1
----> 1 print(more_data[3])
IndexError: list index out of range
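Because indexing starts at 0, the last valid index of a series is one less than its length. This sketch reuses the value of more_data from the assignment above. It also shows that assigning to an out-of-range index raises an IndexError, just like reading from one does.

```python
more_data = [-2, 4, 23.4]

last = len(more_data) - 1  # the highest valid index
print(last, more_data[last])

# assigning at index len(more_data) is also out of range
error_message = ""
try:
    more_data[len(more_data)] = 0
except IndexError as err:
    error_message = str(err)

print(error_message)
```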
5.27.2. Indexing in multiple dimensions
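Here I use the list of lists created earlier. The first index selects an inner list, and a second index selects an element of that inner list.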
seq_records = [["label 1", "AA"], ["label 2", "TT"]]
seq_records[0]
['label 1', 'AA']
seq_records[0][1]
'AA'
seq_records[1][1]
'TT'
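A chained index such as seq_records[1][1] is two separate indexing operations applied from left to right. This sketch shows the two steps explicitly. Because the inner lists are mutable, it also shows that chained indexing works for assignment. The names record and seq are only for illustration.

```python
seq_records = [["label 1", "AA"], ["label 2", "TT"]]

# step by step: first select the inner list, then an element of it
record = seq_records[1]  # ['label 2', 'TT']
seq = record[1]          # 'TT'
print(seq == seq_records[1][1])

# chained indexing on the left-hand side modifies the inner list in place
seq_records[0][1] = "GG"
print(seq_records)
```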
5.27.3. Slicing
Slicing is just an indexing operation that refers to a range of elements. A slice operation selects a sequence of elements, optionally spaced at regular intervals. The syntax for a slice is [start:end:stride], and each of these terms can be omitted.
start: the index of the first element to be sampled. Defaults to 0.
end: the index up to (but not including) which elements are sampled. Defaults to the length of the series.
stride: the separation between selected elements. Defaults to 1.
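To see the defaults in action, the following sketch compares slices with and without explicit terms. It also uses Python's built-in slice(), which builds the same start, end and stride triple that the colon syntax does.

```python
data = "ACGTACGTACGT"

# omitting start, end and stride is the same as spelling out the defaults
print(data[:] == data[0:len(data):1])

# omitting start and end but giving a stride: every 3rd element, to the end
print(data[::3])  # indices 0, 3, 6 and 9

# the colon syntax and a slice object select the same elements
print(data[0:3] == data[slice(0, 3)])
```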
data
codon1 = data[:3]
codon1
'ACG'
Note
I omitted the start and just used the :. Python interpreted this as “slice from the start of the string up to (but not including) index 3”.
Negative indices count from the end of the series, so data[-3:] selects the last three elements.
data[-3:]
'CGT'
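A negative index -k refers to the same position as len(data) - k. This sketch checks that equivalence. It also uses a negative end to drop the last codon.

```python
data = "ACGTACGTACGT"

# -1 is the final element, i.e. position len(data) - 1
print(data[-1] == data[len(data) - 1])

# the same rule applies inside a slice
print(data[-3:] == data[len(data) - 3:])

# a negative end excludes elements counted from the end
print(data[:-3])
```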
You can even specify a stride, which makes the slice take steps of the specified length. Below I set the stride to 3 (which is what you would do if you wanted to select 1st codon positions, for example).
data[0:9:3]
'ATG'
Slicing beyond the length of a series does not raise an exception; the slice simply stops at the end of the series.
data[:15]
'ACGTACGTACGT'
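This differs from indexing a single element. The sketch below contrasts the two. A slice that runs past the end is truncated, and a slice that starts past the end is empty. A single index past the end still raises an IndexError.

```python
data = "ACGTACGTACGT"

print(data[:15] == data)  # the slice stops at the end of the string
print(repr(data[20:]))    # a slice starting past the end is empty

error_message = ""
try:
    data[15]  # a single index past the end still raises
except IndexError as err:
    error_message = str(err)

print(error_message)
```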
5.28. Exercises
Consider the dict defined below:
d = {0: "value for 0", ("a-key",): "funky key"}
Get each value of d using index notation [1].

What does nums[::-1] do on the following?
nums = [0, 1, 2, 3, 4]

For the simple protein coding DNA sequence ATGATGATG [2], use a slice to extract the first codon [3]. Do the same for the last codon.

For the same sequence, use a slice operation to obtain the first nucleotide of each codon, i.e. you should produce ["A", "A", "A"]. Do this for the second codon position (producing ["T", "T", "T"]) and then the third codon position.

Split the sequence ATGAAATAA into codons (non-overlapping letter triples). (The most succinct solution uses a list comprehension.)