7. Indices, slices and half-open intervals¶
Strings are the first kind of ordered collection we have come across in Python. Objects in each of these types can contain multiple objects, and we use indices to refer to individual elements. Python also has a notation called slicing to pick out subsets of multiple elements. We explore this functionality here with each of these collections, and note that indexing is useful with other Python collections that we will see, too.
Here, we use strings to examine index properties and notations in Python, but these all apply directly to other ordered collections.
7.1. Indexing¶
When we have types that contain more than one element, it is useful to be able to refer to individual elements. For example, if we have the string:
my_str = 'SOMETHING'
... we might want to be able to specify the element M. That is the role indices play, and here we discuss indexing notation.
In mathematics, you might have used indices with vector or matrix elements, or in sequences. These are often written with subscripts, so we might say that $X$ is an indexed quantity, and each element $X_i$ is the $i$th one. We can extend this notation to the ordered collections here.
The first thing to note is that we cannot actually write subscripts in programming; we simply don't have the text formatting. So, instead, we opt for denoting indices within square brackets [...]. Thus, if we see $X_8$ in mathematics, we would write X[8] in programming.
Secondly, indices themselves must be of type int. We cannot have "element 1.5" from the string; it would have to be either element 1 or 2.
Thirdly, we need to know the allowed range of values that indices can take.
For the starting index value, Python makes the choice to use 0, which is called zero-based indexing. This is also the syntax choice made in the C and C++ programming languages (and differs from the one-based indexing used in Fortran, Matlab and R, for example). We just have to get used to this, but it's not too bad!
For the final or last index, we must consider the length N = len(...) of a given string (or of any collection), which simply quantifies how many elements are in it. In our zero-based indexing case, the index of the last element is then given by N-1.
Therefore, we can now specify the character M within the string above: my_str[2]. Again, the index value is not 3, due to our zero-based syntax. We could specify any other character in the string by using indices 0 through 8, inclusively.
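As a quick check of the points above, here is a minimal sketch using the same my_str. The variable names N, first, third and last are only illustrative and do not appear elsewhere in this chapter.

```python
my_str = 'SOMETHING'

N = len(my_str)          # number of elements in the string
first = my_str[0]        # index 0 is the first element
third = my_str[2]        # index 2 is the third element
last = my_str[N - 1]     # the last element sits at index N-1

print(N, first, third, last)

# Indices must be of type int; a float index raises a TypeError.
try:
    my_str[1.5]
except TypeError as err:
    print("TypeError:", err)
```

The exact wording of the TypeError message can differ between Python versions, so it is printed here rather than assumed.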
Note
Human language tends to be more "one-based" than zero-based. So, sometimes it can be confusing that the "first element" in a string is my_str[0], while element my_str[1] is the second element, etc.
When in doubt, using computer syntax will be exact, and therefore that should be given preference over human language. Stating, "Show me the [1]th element," or "Show me the index-1 element," is clearer than, "Show me the second element." We must be clear about which element(s) we mean in any context.
Consider:
S = 'Walking Still'
The quotes that wrap a string (whether single, double or triple) are not part of it, and hence not included in its length or in index counting; they just define the boundaries of it. Thus each element (spaces included!) has its own index as follows:
string element: | W | a | l | k | i | n | g |   | S | t | i  | l  | l  |
index:          | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
From the above, we should be unsurprised to see that the following displays of string length and element values:
print("len of S :", len(S))
print("val of elements 0, 2 and 6 :", S[0], S[2], S[6])
... produce:
len of S : 13
val of elements 0, 2 and 6 : W l g
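The index table can also be generated rather than written by hand. The sketch below assumes the built-in enumerate(), which this section has not introduced; it pairs each element with its zero-based index. The name pairs is illustrative.

```python
S = 'Walking Still'

# enumerate() pairs each element with its zero-based index.
for index, element in enumerate(S):
    # repr() adds quotes, which makes the space at index 7 visible.
    print(index, repr(element))

# The same pairs collected into a list, for inspection.
pairs = list(enumerate(S))
```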
Q: Let var = 'The fox jumps'. What is the largest index we can use? And what is each of the following?
print( var[1] )
print( var[4] )
print( var[6] + var[7] + var[8] )
7.2. Index ranges and special index behavior¶
Consider the following string:
Y = 'abcdefg'
Quickly check yourself on the values of the following: len(Y), type(Y), Y[1] and Y[6]. This should remind us that Python collections are zero-based. Since len(Y) is 7, the allowed indices are integers in the interval $[0, 6]$.
What happens if we try putting in an index that is too large, like:
print(Y[7])
? We get a helpful error message:
IndexError                                Traceback (most recent call last)
Input In [5], in <module>
----> 1 print(Y[7])

IndexError: string index out of range
As we should expect, Python doesn't like us asking for a string element at an index that is too large, since no such element exists. The IndexError here is appropriate: our index selector is "out of bounds" of the allowed range of this string.
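If an out-of-range index is a real possibility in a program, the error can be caught instead of stopping execution. The following is a minimal sketch; safe_get is a hypothetical helper written for this illustration, not a Python built-in.

```python
Y = 'abcdefg'
N = len(Y)

def safe_get(text, index):
    """Return text[index], or None if the index is out of range.

    Hypothetical helper, for illustration only.
    """
    try:
        return text[index]
    except IndexError:
        return None

print(safe_get(Y, N - 1))   # the last valid index
print(safe_get(Y, N))       # one past the end: no such element
```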
What if we try putting in an index that is too small (that is, negative), like:
print(Y[-1])
? Well, we might expect a similar error message to the previous one, and in most other programming languages we would receive one. However, the Python interpreter doesn't complain, and produces this:
g
Amazing! What has happened here? Python has made a conversion of the indices internally: when a negative index -P is used to select a string (or collection) element, Python will evaluate it as N-P, where N is the object's length. In the above case, $N = 7$, so the index -1 was converted to 7-1, which is 6, and indeed, Y[6] contains the value g. This functionality can be both convenient and a danger: in a long string, counting backward from the end can simplify element selection and reduce the chance of mistakes. However, if the occurrence of a negative index should really signal a bug in calculations, the interpreter won't tell us. So use this power wisely!
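The conversion from -P to N-P can be checked directly. The sketch below loops over every valid negative index of Y, and then shows that a negative index below -N is out of range, just as an index above N-1 is. The name too_negative_ok is illustrative.

```python
Y = 'abcdefg'
N = len(Y)

# For P = 1, ..., N the index -P selects the same element as N-P.
for P in range(1, N + 1):
    assert Y[-P] == Y[N - P]
    print(-P, N - P, Y[-P])

# A negative index below -N is out of range and raises an IndexError.
try:
    Y[-(N + 1)]
    too_negative_ok = True
except IndexError:
    too_negative_ok = False

print("Is Y[-8] allowed?", too_negative_ok)
```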
Q: Let var = 'The fox jumps'. What is var[2]? What is var[-2]? What is var[13]? What is var[-13]? What is var[-14]?
7.3. Half-open intervals¶
From above, we see that for a collection of length N, the range of allowed indices can be expressed mathematically in any of the following ways (and note again that indices must be integers, which we don't write separately but just assume from here onwards):
index notation: $i = 0, 1, \ldots, N-1$
closed interval: $[0, N-1]$
half-open interval: $[0, N)$
These are all mathematically equivalent, so it is useful to recognize each and be able to move among them conceptually. However, in practice, it is also worth noting that Python tends to use the half-open interval notation quite often. If we call the boundaries in the interval "start" and "stop" (which is also common Python terminology), then an interval [start, stop) means that start is included within the interval while stop is not.
This half-open interval preference might seem a bit of an odd choice at first, but it has several convenient features. Even here, notice that we don't need to include "-1" in the half-open interval case, making the syntax a bit cleaner. So, we should start getting practice thinking in half-open intervals, because that is the more common Python syntax!
The two other important parameters or quantities associated with an interval are the step, which is the interval between the elements, and the total number of elements N (which is often called num in Python help docstrings). For the indices above, the step was 1, because we were just using consecutive integers, and the total number was N. How general are these relations? Let's take a look at a couple other examples, allowing start to be nonzero and step to differ from unity (these cases will apply just below, when discussing index slicing).
Ex. 1: What are the start, stop, step and N for this integer interval: $[4, 10)$? We can read start=4 and stop=10 immediately, and we know step=1 also. If we check the set of values, we see they would be $\{4, 5, 6, 7, 8, 9\}$, therefore N=6.
Ex. 2: Consider the same interval as in Ex. 1, but let's define step=2. By definition, start=4 and stop=10 are the same as above, but the set of values is $\{4, 6, 8\}$, therefore N=3.
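These two examples can be reproduced with Python's built-in range(), which is not introduced in this section but follows the same half-open start, stop, step convention. The names ex1 and ex2 are illustrative.

```python
# Ex. 1: start=4, stop=10, step=1 (the default step)
ex1 = list(range(4, 10))
print(ex1, len(ex1))

# Ex. 2: same start and stop, but step=2
ex2 = list(range(4, 10, 2))
print(ex2, len(ex2))
```

In both cases the stop value 10 is absent from the result, as expected for a half-open interval.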
Can we observe a pattern? Indeed, in general for half-open intervals, the following important relation holds:
$$N = \left\lceil \frac{\mathrm{stop} - \mathrm{start}}{\mathrm{step}} \right\rceil \tag{1}$$
The use of the ceiling function (which rounds values up to the next integer) is necessary: when step is not 1, the difference between stop and start need not be an exact multiple of step, but N must be an integer. We must also have $N \ge 0$, so any negative result is replaced by 0.
And actually, the above relation holds even if we do not restrict start, stop and step to be integers. It is a very general relation that will be useful with different Python functions and behavior that use half-open intervals.
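The counting relation for half-open intervals (the ceiling of (stop - start)/step, with negative results replaced by 0) can be sketched as a small function. Here count_elements is a hypothetical helper written for this illustration, and the cross-check against range() assumes integer inputs, since range() only accepts integers.

```python
import math

def count_elements(start, stop, step):
    """Number of values in the half-open interval [start, stop) for a given step.

    Hypothetical helper: ceiling of (stop - start) / step, with any
    negative result replaced by 0.
    """
    if step == 0:
        raise ValueError("step must be nonzero")
    return max(0, math.ceil((stop - start) / step))

print(count_elements(4, 10, 1))    # Ex. 1
print(count_elements(4, 10, 2))    # Ex. 2
print(count_elements(10, 4, -2))   # counting downward: 10, 8, 6
print(count_elements(4, 10, -1))   # wrong direction: no values at all

# Cross-check against range() for a few integer cases.
for start, stop, step in [(4, 10, 1), (4, 10, 2), (0, 7, 3), (10, 4, -2), (4, 10, -1)]:
    assert count_elements(start, stop, step) == len(range(start, stop, step))
```

With non-integer inputs, floating-point rounding in the division can shift the result by one in edge cases, so this sketch is most trustworthy for integers.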
Q: What are start, stop, step and N for the following set of values: ? Does the above relation hold?
Q: What are start, stop, step and N for the following set of values: ? Does the above relation hold?
7.4. Slicing¶
We can also specify an interval of indices in order to select more than one element at a time, with what is called slicing. Let's use the same Y = 'abcdefg' string defined above. To select elements with indices [3], [4] and [5], we would use the following:
Y[3:6]
That is, we define the start and stop of a half-open interval [start, stop), and apply it as Y[start:stop]. The output is itself a string, the substring starting at the [start] element and going up to (but not including!) the [stop] element. Note that the length of the output substring here is stop - start; this convenient formula for length is one of the benefits of having a half-open interval.
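A short sketch of this slice and its length follows. The loop over range(start, stop) is only there to show that the slice gathers exactly the elements at indices start through stop-1; the names sub and by_hand are illustrative.

```python
Y = 'abcdefg'
start, stop = 3, 6

sub = Y[start:stop]
print(sub, len(sub))

# The same elements, gathered one index at a time over [start, stop).
by_hand = ''
for i in range(start, stop):
    by_hand += Y[i]

print(by_hand == sub)
```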
Note
Just because we write half-open slice intervals with the mathematical syntax $[A, B)$, that does not mean that we change how we bracket the indices on the Python variables themselves. In our code, we only use square brackets for indexing: Y[A:B].
There are some special cases in this syntax. If the start is 0, one does not need to include it. Or if one wants to extend the interval through the last index, one does not need to include a stop value. Thus:
print( Y[:3] )
print( Y[3:] )
print( Y[:] )
produces:
abc
defg
abcdefg
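One consequence of the half-open convention is that a string split at any index k rejoins without overlap or gap. The sketch below checks this for every k, and also illustrates a behavior not discussed above: a slice that runs past the end of the string is cut off rather than raising an IndexError. The names head, tail and clipped are illustrative.

```python
Y = 'abcdefg'

# Splitting at any index k gives two pieces that rejoin to the original,
# because Y[:k] stops just before k and Y[k:] starts at k.
for k in range(len(Y) + 1):
    assert Y[:k] + Y[k:] == Y

head, tail = Y[:3], Y[3:]
print(head, tail, head + tail)

# Unlike single-element indexing, a slice that runs past the end does not
# raise an IndexError; it is simply cut off at the end of the string.
clipped = Y[3:100]
print(clipped)
```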
There is an extended syntax, as well, where you can select multiple elements separated by a given step: Y[start:stop:step]. The following provide examples of usage (note that negative steps are allowed):
print( Y[1:4:2] )
print( Y[3::3] )
print( Y[::2] )
print( Y[::-1] )
... with the outputs:
bd
dg
aceg
gfedcba
The relations from the half-open interval discussion, above, hold here, and are worth reviewing. We can compute the size of the substring (that is, its number of elements N) using Eq. (1). The step can be any nonzero integer, positive or negative, but when the step is negative, one would likely want to have a start that is greater than the stop. Otherwise, one will end up with an empty string (which is certainly allowed, if that is what one wants!).
Slicing is useful to select subsets of strings, with a pretty broad generality.
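To connect slices back to the half-open counting relation, the sketch below predicts the length of Y[start:stop:step] and compares it with the actual result. It relies on the built-in slice object and its indices() method, which fills in omitted values and resolves negative or out-of-range indices for a given length. The function slice_length is a hypothetical helper written for this illustration.

```python
import math

Y = 'abcdefg'

def slice_length(text, start, stop, step):
    """Predicted length of text[start:stop:step] (hypothetical helper).

    slice.indices() returns concrete start, stop and step values for a
    sequence of the given length; the count is then the ceiling of
    (stop - start) / step, with negative results replaced by 0.
    """
    s, e, st = slice(start, stop, step).indices(len(text))
    return max(0, math.ceil((e - s) / st))

# None stands for an omitted value, as in Y[3::3] or Y[::-1].
cases = [(1, 4, 2), (3, None, 3), (None, None, 2), (None, None, -1), (2, 5, -1)]
for start, stop, step in cases:
    sub = Y[start:stop:step]
    print(repr(sub), len(sub), slice_length(Y, start, stop, step))
    assert len(sub) == slice_length(Y, start, stop, step)
```

The last case has a negative step with start less than stop, so the slice is empty and the predicted length is 0.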
Q: What are the outputs of each of the following, using the string S = 'Walking Still', above?
print( S[-1] )
print( S[:3] )
print( S[3:] )
print( S[:] )
print( S[1:4:2] )
print( S[3::3] )
print( S[::2] )
In summary:
Indices are useful, if not necessary, features of working with ordered collections.
We should recognize indices when they appear in square brackets.
Python uses zero-based indexing, so allowed integer values of indices run from 0 through N-1, inclusively.
Negative indices are mapped to "count from the end of the collection" in Python, which can be convenient. It can also hide indexing mistakes, so we should be careful when using them.
Slicing provides a nice way to select subsets of our collection. We should recognize the half-open interval syntax used here (and which will be used in many Python functions and notations).
7.5. Practice¶
What is the output of each of the following lines of code:
print('Days of the week'[4:12:3])
print('Days of the week'[9])
print('Days of the week'[-7])
print('I love python programming'[:8])
print('I love python programming'[4:])
print('You love python programming'[::3])
print('You love python programming'[-8:-2])
print('I love python programming'[:-5])
Given the string very_true = 'I love Python programming', use indices/slicing to display:
i. the substring I love Python.
ii. the word Python.
iii. every fifth character starting from the first o.
What is the full set of allowed index values for the string 'apple'?
Fix each of the following expressions based on the shown error message:
C = 'Here we go.'
print(BC[1.0:3])
D = 'There we went.'
print(D[2:)
What are start, stop, step and N for the following set of values, and does the relation in the above Eq. (1) hold?:
What stop value should be used to have values if start=-10 and step=5?