16. Loops, I: for
We have noted previously how Python will generally move through a script file while interpreting it (its flow): by default, the interpreter starts at the top of the file and moves downward, executing each successive line of commands/expressions one time.
We also saw one way that we could affect this flow, using the if/elif/else conditionals: these allow us to direct the interpreter to execute some commands only if some condition(s) are true and perhaps skip over others. Even in that case, however, each set of commands in a conditional branch was executed at most once.
In the present section, we introduce a control statement that allows us to have a set of commands be executed repeatedly, by instructing Python to "loop" back over a set of lines for a specific number of times. This allows us to translate a large class of mathematical expressions (particularly ones that use indices) to compact code, which yet again conveniently mirrors the maths itself. All this is done with a for loop.
16.1. Motivating example for loops
In the previous sections we looked at making arrays and then at visualizing some. These objects store a set of values of a particular type in sequence. There were several ways to create and provide values for arrays. We could quickly make an array with a constant value (all zeros or ones) or one with evenly spaced values, but anything else became much more difficult: we had to type the values ourselves. While this might be understandable when entering essentially arbitrary values, what about when we have a mathematical formula?
Let's revisit making the arrays used in the basic plotting scenario: to plot the relation $y = x^2$ over the interval $[-2, 2]$. We do this by making two arrays of the same length, one of x-values and the other of corresponding y-values. We can make the former with a nice, evenly spaced set of values through our chosen interval:
x = np.linspace(-2, 2, 5)
Mathematically, we now have a set of indexed x-values, and our quadratic relation means we should be able to generate the accompanying y-values for each one. This could look like the following rule (first declaring y, formally):
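$$y \in \mathbb{R}^5: \quad y_0 = x_0^2, \quad y_1 = x_1^2, \quad y_2 = x_2^2, \quad y_3 = x_3^2, \quad y_4 = x_4^2 \qquad (1)$$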
OK, we can translate that into Python by first defining an (empty) array of floats, and then assigning the values to each element, one-by-one:
y = np.zeros(5) # same length as x, initialized with 0s
y[0] = x[0]**2
y[1] = x[1]**2
y[2] = x[2]**2
y[3] = x[3]**2
y[4] = x[4]**2
That translation is doable, and it does produce the correct numbers, but what if our arrays each had 1000 elements? We would have to get pretty good at copying-and-pasting, and that seems like far too much work.
Well, let's go back to the mathematics first. Looking at the relations in (1), we might notice a pattern in the relations that generate each y-component: each expression is the same, just changing the index. So, we could write it more compactly as a single rule that applies many times, as follows:
$$y_i = x_i^2, \quad \text{for } i \in [0, 4] \qquad (2)$$
This is a nice compact mathematical expression. Is there a nice way to translate this to programming? Yes, there is, using the same keyword as we see there: for.
We can think of the expression in (2) as having two parts, each of which translates pretty directly: specification of the index values over which the relation holds; and then the expression itself, to be performed for each index value. We could translate each part as in the following table:
| Math | Comp |
|---|---|
| for $i \in [0, 4]$ | `for i in np.arange(5)` |
| $y_i = x_i^2$ | `y[i] = x[i]**2` |
Note
Recall that np.arange has a default start=0 and step=1. Due to the half-open interval specification "[start, stop)", a stop value of 5 is appropriate. If we wrote the possible integer indices as $[0, 5)$, this would be directly appreciable. And np.arange would "guess" the appropriate dtype of int here.
We put together the above lines with a bit of specific syntax, creating our for loop:
1  y = np.zeros(5)
2
3  for i in np.arange(5):
4      y[i] = x[i]**2
As promised, it very closely matches the mathematical description in (2). Just the order of pieces has been rearranged to begin the code structure with for, which is a new keyword, similar to if, which we met in the earlier introduction to conditionals. Also similar is the style of denoting which expressions will be inside the for-loop, by indenting them; in this case, only the line y[i] = x[i]**2.
The syntax and behavior are detailed below, but briefly what happens in the above code is as follows. The interpreter reaches the keyword for in Line 3, and will then assign the first value of np.arange(5) (which is 0) to i, and then execute Line 4, which would read y[0] = x[0]**2. After that, the code does not proceed to any Line 5, but instead it loops back to Line 3, and i receives the next value from the array (which is 1); then it executes Line 4, which now reads y[1] = x[1]**2; and loops back to Line 3. This continues until i has iterated through all the values in the array and executed Line 4 a final time, after which the code finally proceeds to Line 5. In the end y looks as it should:
array([4., 1., 0., 1., 4.])
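For completeness, we would write the full bit of code, including the definition of x and a couple small tweaks, as:
import numpy as np

N = 5                        # number of elements, defined once
x = np.linspace(-2, 2, N)    # evenly spaced x-values
y = np.zeros(N)              # storage for y-values, initialized with 0s

for i in range(N):
    y[i] = x[i]**2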
Briefly, so many quantities depend on the number of elements in one array that it is nicer to have a single variable defined with that value. If we want to change the problem to have 100 points, now we can do so by just changing the value of N, rather than hunting for every corresponding use of 5. Additionally, we changed np.arange(N) to range(N). Why? Well, the syntax and output of values are exactly the same here; however, as discussed below, range is a more efficient way of generating the index values (especially for large N), and it only outputs ints. So we will typically prefer to use range in these cases.
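As a quick check of the claim that range(N) and np.arange(N) supply the same index values here, the following minimal sketch fills the y-array both ways and compares the results. The names `y_arange` and `y_range` are introduced only for this illustration.

```python
import numpy as np

N = 5
x = np.linspace(-2, 2, N)

# Fill one array using index values from np.arange
y_arange = np.zeros(N)
for i in np.arange(N):
    y_arange[i] = x[i]**2

# Fill a second array using index values from range
y_range = np.zeros(N)
for i in range(N):
    y_range[i] = x[i]**2

print(y_range)
print(np.array_equal(y_arange, y_range))   # True: both loops give the same values
```

Either iterable works for indexing; the sketch only confirms that switching to range does not change the result.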
There are a few things to note:
- This is very powerful. Instead of using N lines of code to populate N elements, we can just use 2 lines! Try replacing 5 with 100 in the above for-loop code. Much nicer than extending the first programmatic approach?
- This type of statement can be used quite generally when there is a pattern or rule among elements in an array. If we can write a clear mathematical expression in terms of the index, the rest of our work is just translating a couple lines, like above.
- For-loops are also useful outside of working with arrays: any time we want simple repetition of a command, or repetition with variations on a pattern, or counting, or ... the list of applications is extensive.
- Take a look again at the mathematical expression and at the computational for-loop. Read each one out loud. Note how similar they are. This is yet another excellent example of what we mean that programming is not something totally foreign or new: it is mainly translating mathematics to be understood by a computer.
- Computers are not smart. So we have to learn Python's keywords (here, for and in) and use its syntax (e.g., : and indentation). But computers are fast at calculations. So we could generate 100,000 values here in a split second. That makes it worth us spending a bit of time to get comfortable with programming.
16.2. for: basic syntax and repetition
The primary purpose of the for loop is to allow us to repeat a set of commands a specific number of times. There is also a variable that changes value each time, which we can use in various ways or just ignore. The basic syntax to repeat one or more commands is:
1  for <loop variable> in <iterable>:
2      <a command to repeat>
3      <maybe more commands to repeat>
4  <other commands, unrepeated>
The keyword for at the start of Line 1 alerts Python to the start of the looping structure. Then a loop variable is given, which will take values in sequence from the iterable one at a time. In the motivating example above, our first iterable was an array of ints, but we can actually use any Python type that has a collection of values as an iterable, even non-numeric ones like a string. We can also use functions that generate a set of values one-by-one such as range, which was also introduced above; the values matched those of the np.arange array, but we didn't need to store them all at once. The colon : ends the line, part of Python syntax we have seen before with conditionals; the "for" structure is complete.
Following the colon, we have one or more lines of commands/expressions to be repeatedly evaluated each time the loop variable gets a new value. We specify the commands in the "for" loop by indenting them (just as we did to specify which commands were "in" a conditional branch). Any subsequent line that is not indented is outside the loop, and will not be repeated.
How does Python evaluate this "for" structure? When the interpreter reaches Line 1, it assigns the first iterable value to the loop variable, and then executes all the indented lines in order; these lines can contain the loop variable, in which case their expressions are evaluated with that first value being used. After reaching the end of the indented lines, instead of proceeding to the next line, Python loops back to Line 1. It now assigns the next iterable value to the loop variable, and executes the lines inside the loop again; any lines using the loop variable now use its new value when they are evaluated. Python then loops back to Line 1, and the pattern repeats until all values of the iterable have been used once. After evaluating the commands with that last value, the interpreter exits the loop and continues on to evaluate the unindented commands like normal (without repetition); if the loop was at the end of the program, it just stops.
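The text above notes that an iterable need not be numeric. As a small illustrative sketch (the string "loop" and the names `ch` and `letters_seen` are made up for this example), a for-loop can step through the characters of a string one at a time:

```python
letters_seen = 0          # counter, updated once per iteration

for ch in "loop":         # the loop variable takes each character in turn
    print("ch =", ch)
    letters_seen += 1

# This line is not indented, so it runs only once, after the loop ends
print("number of iterations:", letters_seen)
```

The loop body runs once per character, so the number of iterations equals the length of the string.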
Consider the following example:
1  for var in range(-2, 2):
2      print("- I am inside the loop :)")
3      print("-- Right now, var =", var)
4      var2 = var**2
5      print("--- & var-squared =", var2)
6  print("I am outside :(")
7  print("And out here now, var =", var)
Here, the loop variable is var and the iterable is range(-2, 2), which sequentially creates integers in the interval [-2, 2). The Python interpreter reads Line 1 and the loop variable receives its first value from the iterable (var=-2). Then it executes Lines 2-5, knowing that these are inside the loop due to their indentation after :. Since it knows there are more values in the iterable, it loops back to Line 1 to reassign the loop variable the next one (var=-1). Then it executes Lines 2-5, etc. until it has worked its way through the iterable. At that point, it exits the for-loop and continues on to Lines 6-7, which are each executed just once. This produces the following output:
- I am inside the loop :)
-- Right now, var = -2
--- & var-squared = 4
- I am inside the loop :)
-- Right now, var = -1
--- & var-squared = 1
- I am inside the loop :)
-- Right now, var = 0
--- & var-squared = 0
- I am inside the loop :)
-- Right now, var = 1
--- & var-squared = 1
I am outside :(
And out here now, var = 1
Indeed, the loop variable var has a different value at each iteration (or "during each loop over the commands"). It can be used in any expressions. When Python exits the loop, var maintains the last value it had, and it can be used as a normal variable in the code.
Q: In the above example (where var is the loop variable), does var2 exist after the loop finishes? If so, what value does it have?
16.3. for: with arrays
One of the most powerful uses of the for loop is to combine it with arrays. Why is this so?
Well, any array has to have a well-defined length to be created; for example for some 1D array A, we can find its length as N = len(A). We then know that the set of indices of A's elements are the integers in the range $[0, N-1]$. In the common for-loop syntax for i in range(N):, what are the values i takes across all repetitions? Conveniently, each of the integer values in $[0, N-1]$. So, if we want to go through each element of an array and perform some calculation (using each element's value, assigning each element's value, etc.), then the for-loop has a very convenient structure, such as:
A = np.zeros(N)
for i in range(N):
    A[i] = <some expression, likely using i>
Notice that we do have to declare (and initialize) A first, before looping over all of its values. This seems logical: before we can talk about filling in element A[37], we have to know that the element even exists. (Python will not just create an array with at least 38 entries for us if we assign to A[37]; we need to define A before using any of its elements.)
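To see why the array must exist (and be long enough) before we assign to its elements, here is a minimal sketch of what goes wrong otherwise. The names `A_short` and `message` are hypothetical, and the try/except construct is used here only to catch and display the error rather than stop the script.

```python
import numpy as np

A_short = np.zeros(10)        # only indices 0 through 9 exist

try:
    A_short[37] = 1.0         # no such element: the array is not grown for us
except IndexError as err:
    message = str(err)
    print("IndexError:", message)

print(len(A_short))           # the length is unchanged: 10
```

If `A_short` had not been defined at all, the same assignment would instead fail with a NameError.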
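Consider, for example, wanting to make some integer values according to this mathematical expression:
$$B \in \mathbb{Z}^N: \quad B_k = k^2 - 4k + 2, \quad \text{for } k \in [0, N-1] \qquad (3)$$
We can translate each of the pieces as follows:
| Math expression | Comp expression |
|---|---|
| $B \in \mathbb{Z}^N$ | `B = np.zeros(N, dtype=int)` |
| $B_k = k^2 - 4k + 2$ | `B[k] = k**2 - 4*k + 2` |
| for $k \in [0, N-1]$ | `for k in range(N)` |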
... where we have again used the fact that range behaves like np.arange. Therefore the above mathematical sequence would be stored in an array by adding one final piece of information: an actual value for N, for example 8:
N = 8
B = np.zeros(N, dtype=int)
for k in range(N):
    B[k] = k**2 - 4*k + 2
Note how similar the mathematical formula and the programmed version are!
Why do we initialize the storage array with 0s? Well, consider the following potential coding mistake:
N = 8
Berr = np.zeros(N, dtype=int)
for k in range(4):              # mistake in range, not N
    Berr[k] = k**2 - 4*k + 2
That doesn't lead to an error from the Python interpreter, but what happens when we print(Berr)? We see:
[ 2 -1 -2 -1  0  0  0  0]
... where the last four values of 0 jump out pretty quickly, and we can go back to our code and see our mismatch in array length and iterable size. (Though, there are certainly times where 0s in the output will be expected; thus, we have to know the context and have an expectation for whether seeing zeros is actually bad or not. But many times it is a useful guard.) In many cases, it is just a matter of convenience to initialize with zeros, namely for this demonstrated reason.
Q: What do you need to change about the (correct) version of the above for-loop to be able to calculate over the interval ?
Note
Above, we have seen the explicit statement of vector set and size provided in each case, e.g., , . This is fairly formal, and not always included when problems are expressed, particularly in more applied realms. Even if it is not provided, we should consider what such a statement would look like based on the problem at hand in order to guide our array declaration.
If we altered the mathematical expression in (3) slightly to the following one:
(4)¶
... then what factors do we need to consider in adapting our previous result to calculate an array C? Well, some things to think about might be:
- How many elements will C have, and what is their dtype? (Always a consideration when using arrays.)
- How will we calculate the functional values of ?
- Does using a function like in the elementwise expression change the way we assign elements in an array?
Q: Ponder the above questions, and write your own code to calculate C for N = 8.
Q: Plot both B and C (where N = 8 for both) on the same graph (since we covered plotting arrays earlier).
In summary, the key feature of both arrays and for-loops in this context is that the length/size of each is known. Thus, it is straightforward to traverse all the elements in a 1D array with the index values from the for-loop, repeating one or more commands in each case. The mathematical part of each element will often translate directly, as well, so be sure to write down clear expressions before starting your code. Any time you are creating arrays of sequences or other values with a formula based on index, consider using a for-loop!
16.4. range syntax
The above examples all specified intervals of the loop index starting from 0, by including a single argument to be the "stop" value; this is often useful when using a for-loop to go through all elements in some array. However, one does not need to be restricted to this, and one may be interested in other applications where different ranges of loop indices are desired. The range function can take up to three arguments, changing the "start" and "step" as well. The syntax mirrors that of np.arange presented earlier, and in typical Python style the interval is half-open, specified as [start, stop). One can verify the following cases (we will talk about lists later, but for now we use them to store and display the values produced by range):
list(range(10)) # def: start = 0, step = 1
list(range(4, 12)) # def: step = 1
list(range(3, 15, 2))
These yield the following, respectively:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[4, 5, 6, 7, 8, 9, 10, 11]
[3, 5, 7, 9, 11, 13]
Consider, for example, printing odd numbers from 1 to 100. This can be done quickly with the range settings in a for-loop:
for n in range(1, 100, 2):
print("odd numbers: ", n)
This could also be done having a loop index $n \in [0, 49]$, since each nth odd number (counting from $n = 0$) can be written as $2n+1$. Translating this to a for-loop, one has:
for n in range(50):
print("the same odd numbers: ", 2*n+1)
As is often the case in programming, there are several ways to do a given task. Sometimes multiple ones are equivalent, while sometimes other considerations of a problem dictate which is preferable or meaningfully more efficient. For example, in the above case the values of n themselves differ, which may or may not matter in an actual programming case. Additionally, the range arguments in the first case clearly show the interval of printed values $[1, 100)$, while those in the latter clearly show how many times the loop will run (50). Which is more useful or easier to think about likely depends on the program and the programmer.
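A minimal sketch to confirm that the two loops above visit the same odd numbers: here the values are collected in lists (used only for storage and comparison) instead of being printed one at a time. The names `odds_a` and `odds_b` are illustrative.

```python
odds_a = []
for n in range(1, 100, 2):   # n itself is the odd number
    odds_a.append(n)

odds_b = []
for n in range(50):          # n counts 0, 1, ..., 49; 2*n+1 is the odd number
    odds_b.append(2*n + 1)

print(odds_a == odds_b)                      # True
print(len(odds_a), odds_a[0], odds_a[-1])    # 50 1 99
```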
16.5. Sequences and recursion
As noted above, whenever we want to store mathematical sequences, we might first consider using arrays. Why is that? Because the sequences are ordered sets of values and they use indices (starting at 0), just like arrays---so it is a convenient match. And consider going through the sequence (or array), index-by-index (or element-by-element): what would be a convenient way to traverse the indices? How about a for-loop...
Actually, several of the above examples have already demonstrated the application of for-loops and arrays with sequences. We introduced the expressions in (3) and (4) by saying they were arrays, but we could have just as easily said that they were sequences to be saved as arrays. Even (2) can be viewed as a sequence for $y_i$. It can be pretty hard to tell the two apart in computing, actually.
But we should note that all the previous examples happened to be from a particular category of sequence, non-recursive ones. That is, no element depended explicitly on any other one in the same sequence. However, it is certainly possible to represent recursive sequences in essentially the same manner, with one extra mathematical consideration to translate. For example, a rule stating that each element is half of the previous element would be recursive.
A recursion rule states that a given element depends on one or more previous elements, which in turn depend on one or more previous ones, etc. This could go on forever, and we wouldn't be able to actually calculate any values unless we were provided with an extra piece of information: an explicit starting value (or a set of them). This is called having an initial condition. For example, consider the following sequence:
(5)¶
Note that it mostly looks the same as previous sequences above except that:
1. An extra expression is included, the initial condition (here, the [0]th element's value).
2. The range of indices in the general formula does not start at 0 (here, it starts at 1).
We can translate both of these aspects into the computational representation, again following the mathematics veeery closely. For point #1, we just have to assign a value to S[0] separately from the loop, and since it is called an initial condition, we might take the hint that it should be done before the loop starts. For point #2, we just have to make sure that the values our loop variable gets start with 1 instead of 0, which is controlled with our iterable range; we can do that!
The remaining considerations remain the same: knowing the total number of elements and their dtype, initializing the array, translating the mathematical expression, rewriting mathematical subscripts as bracketed indices, etc.
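To illustrate the pattern on a different, made-up sequence (not the one in Eq. (5)): suppose $T_0 = 1$ and $T_n = T_{n-1} + n$ for $n \in [1, N-1]$. The rule, the name `T`, and the choice `N = 6` are all assumptions chosen for this sketch.

```python
import numpy as np

N = 6
T = np.zeros(N, dtype=int)   # declare and initialize the storage array

T[0] = 1                     # initial condition, assigned before the loop
for n in range(1, N):        # the general rule starts at index 1, not 0
    T[n] = T[n-1] + n        # each element uses the previous one

print(T)                     # element values: 1, 2, 4, 7, 11, 16
```

The two new aspects map directly onto the code: one assignment before the loop for the initial condition, and a start value of 1 in range.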
Q: Have a try at translating the mathematical expression in Eq. (5) to Python. What does your final answer look like?
After making your own code, check how similar the above expression (and yours) is to the mathematical one above. Note how the new aspects translate directly: we really don't have any "extra considerations" to ponder here, we just let the mathematics guide us as usual. The above code outputs the following (and check that you have the same element values, to within reasonable floating point approximation):
This methodology generalizes for any numbers of elements that are specified by the initial conditions. We might just have more lines of initial conditions, and a new starting index. It is good to first make a mental note of whether your sequence is recursive or not. But in general if you first write out the full mathematical formulation for the sequence (including the range of indices and element type), that will clearly show how to do the coding. So be sure to do that, and make your life easier!
Note
Above, Python's choice of the number of decimals to specify for each element may seem odd; we discuss controlling this feature with "string formatting" in a later section.
Q: What would happen in the above calculation if we forgot to follow the correct range range(1, N), instead using range(N)? Do we get an error, different results or no change? Why?
Q: What are the first ten elements of the following sequence mentioned above , if ?
16.6. Elementwise multiplication and dot products
Let's take two mathematical vectors of real numbers, each of length 12, i.e., $U, V \in \mathbb{R}^{12}$. Consider performing elementwise multiplication to create a new vector $W$, so its new components would be calculated as:
$$W_0 = U_0 V_0, \quad W_1 = U_1 V_1, \quad \dots, \quad W_{11} = U_{11} V_{11} \qquad (6)$$
The new vector would necessarily have the same length as the other two, so $W \in \mathbb{R}^{12}$.
We are now always on the lookout for indexed expressions with patterns that allow us to write them compactly. For the above, we might notice we could write the rules compactly as:
$$W_i = U_i V_i, \quad \text{for } i \in [0, 11] \qquad (7)$$
Now that we have written it down mathematically and clearly, we should see a lot of similarity in structure with our previous examples.
Q: Translate Eq. (7) to Python, and calculate W from:
U = np.array([5.3, 3.3, 2.3, 5. , 9. , 1.9, 2.9, 7.4, 3.2, 6.9, 0.6, 1. ])
V = np.array([9.1, 5.8, 6.3, 2. , 9.4, 4.2, 5.4, 5.5, 3. , 0.9, 8.3, 2.5])
Let's now try taking the dot product of the two vectors. This procedure is described mathematically in index notation as:
$$\begin{aligned} D = U \cdot V &= U_0 V_0 + U_1 V_1 + \dots + U_{11} V_{11} \\ &= \sum_{i=0}^{11} U_i V_i \end{aligned} \qquad (8)$$
The second line contains a compact way of writing the dot product, using the summation notation with $\Sigma$. Note that the output in this case is a scalar, not a vector.
We see indices, but how can we translate Eq. (8)? Let's start by reading the summation syntax: for i in range 0 through 11, sum up the product of $U_i$ and $V_i$. This expression sounds and looks like it actually has several features that are similar to Eqs. (6) and (7) above: we go through all indices and calculate the elementwise products for each one. But instead of storing the resulting products in separate [i]th components, we add them all up and accumulate the result in D.
So, if we think about forming our programmatic expression, we might think of some pieces. We know we need to calculate U[i] * V[i] for each index i, and also do something to store it each iteration; here is a start to coding this:
N = 12
for i in range(N):
    # zeroth attempt: a schematic or brainstorming idea
    <somehow store U[i]*V[i] within D>
How can we accumulate the calculated products? As a first guess at what to put there, we might write something like the following (NB: since we are making a few attempts, we use D1 for try #1, D2 for try #2, etc. as the variable storing the summation, to be able to compare results):
N = 12
for i in range(N):
    # first attempt at accumulating products (spoiler alert: not successful)
    D1 = U[i]*V[i]

print("D1 is:", D1)
What is the output? D1 is: 2.5. Does that seem correct? It definitely seems a little smaller than what I would expect given the numbers in the original arrays... In fact, the above code overwrites D1 at each iteration, and we end up with D1 being assigned the result of only U[11]*V[11], rather than accumulating this sum with the other eleven elementwise products. To see this, you could put a print(D1) command inside the loop, to see what the intermediate values of D1 really are; this is a useful debugging approach.
Q: Add a print inside the loop to debug what is happening in this part of the code.
The way forward is to use an operator that doesn't just assign the value each time, but adds to the value already stored in the variable from the previous operation. We have seen such an operator previously, namely the in-place addition +=. Think about what the following would do in Line 4 for each iteration:
1  N = 12
2  for i in range(N):
3      # second attempt at accumulating products (better)
4      D2 += U[i]*V[i]
It would first take the result U[0]*V[0] and add it to D2; in the next iteration, it would take U[1]*V[1] and add it to D2; etc. That is what a summation should do!
... except that, if we run the above code, we get the following error:
<ipython-input-46-20ecc4e1957a> in <module>
1 N = 12
2 for i in range(N):
----> 3 D2+= U[i]*V[i]
NameError: name 'D2' is not defined
... because in the first loop iteration, we tried to add to D2, even though it hadn't been defined or assigned any value yet.
Note
You might not see an error message, because you might have D2 already used somewhere in your existing example code. Even so, we need to properly define D2 for this application here, because it is accumulating a value, not being replaced/overwritten.
We must initialize D2 before we start our summation in the loop, picking an appropriate type+value to start. What value would be useful? Well, before we add U[0]*V[0], D2 shouldn't have any magnitude, so how about one more try:
N = 12
D3 = 0.0  # initialize our accumulator

for i in range(N):
    # third attempt: accumulating well, and D3 is initialized well :)
    D3 += U[i]*V[i]
print("D3 is:", D3)
At this point we should get the more reasonable estimate of D3:
D3 is: 264.09
This might have been a lengthy discussion, but now we see how to translate the math expression for summation (and in a compact way): use a for-loop with the in-place addition += and an initial condition like D = 0.0.
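As a sanity check on the accumulator approach, the loop result can be compared with NumPy's built-in np.dot function. This is a minimal sketch; the names `D_loop` and `D_numpy` are introduced here for illustration only.

```python
import numpy as np

U = np.array([5.3, 3.3, 2.3, 5. , 9. , 1.9, 2.9, 7.4, 3.2, 6.9, 0.6, 1. ])
V = np.array([9.1, 5.8, 6.3, 2. , 9.4, 4.2, 5.4, 5.5, 3. , 0.9, 8.3, 2.5])

N = 12
D_loop = 0.0                      # initialize the accumulator before the loop
for i in range(N):
    D_loop += U[i]*V[i]           # add each elementwise product to the total

D_numpy = np.dot(U, V)            # NumPy's dot product, for comparison

# Compare with a tolerance, since floating point sums can differ very slightly
print(np.isclose(D_loop, D_numpy))   # True
```

Agreement with an independent calculation is a useful check that the loop, the accumulator, and its initial value are all set up correctly.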
Q: What would the following tweaked-version of the above code produce? What (bad thing) is happening?
N = 12
for i in range(N):
    D4 = 0.0
    D4 += U[i]*V[i]
print("D4 is:", D4)
Q: What might be a better way to define N in all of the above summation code examples?
Note
A quick final note here: summations do not always just increase with each iteration or have large values. Here, all components of U and V were positive, so D did increase with each product. But quite often, components can be negative or zero, so the "accumulation" of products can decrease or not increase the value of D.
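A minimal sketch of this point, using two short made-up vectors (`a_vec` and `b_vec` are hypothetical values chosen for illustration) and printing the running sum at each iteration:

```python
import numpy as np

a_vec = np.array([1.0, -2.0, 0.0,  3.0])
b_vec = np.array([4.0,  1.0, 5.0, -1.0])

D = 0.0                                  # initialize the accumulator
for i in range(len(a_vec)):
    D += a_vec[i]*b_vec[i]
    print("after i =", i, " D =", D)     # running sum: 4.0, 2.0, 2.0, -1.0
```

Here the running sum rises, falls, stays level, and ends up negative, even though it is still an accumulation of products.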
16.7. Practice
What are the first 31 multiples of 17 (not including 0)?
Make an array of 25 evenly spaced floats in the interval . Then print out a 3 column table of values: each element index, each element value, and the cube of each element value.
What is the sum of the first 31 multiples of 17 (not including 0)?
Let c be an array of 151 floating point values evenly spaced in the interval . Make an array z where:
(9)
where . Plot the result (i.e., z vs c).
Translate to an array:
(10)
Translate to an array:
(11)
The following is a method for calculating the square root of a positive number S (known as the Babylonian method or Heron's method):
(12)¶
Use 15 iterations to estimate , and . Check the approximation by squaring the result. (NB: one can actually select to be any positive number; setting it to be closer to the actual square root accelerates the convergence. However, is fine for convergence here.)
a) Write a code to calculate, store and print the first 20 numbers of the Fibonacci sequence, whose nth number is the sum of the previous two. The first few numbers are: 0, 1, 1, 2, 3, 5, 8, 13, ...
b) What is the ratio of successive elements of the Fibonacci sequence, with the larger number in the numerator (ignoring the divide-by-zero case)? In the limit of having an infinite number of terms, this value approaches and defines the Golden Ratio.
Write a code block that asks the user to input an integer n and then draws a right angle isosceles triangle with asterisks ('*') of height n. For example, for n=4, the output should look like:
*
**
***
****
Make an array whose values are , for . Also make an array whose values are , over the same domain of . (Choose a number of elements per array that makes a pretty plot.)
- Plot P vs t and Q vs t on the same graph (label each).
- Plot P vs Q on a separate graph.
Consider the sequence defined by and . Ask the user for an integer , then calculate and print the first terms of the sequence in a simple table. For example, for we would get:
i   x[i]
---------
0   2
1   3
2   5
3   9
4   17