14. Arrays, I, and indexing: 1D data
In this section we will be using a lot of the NumPy module, so it should be imported:
import numpy as np
14.1. Translating mathematical vectors to arrays
Up to this point, we have discussed storing just one mathematical value at a time in a variable. That is, we have translated scalar quantities to computational types like int, float, bool and complex. What about when we have more general algebraic forms like vectors, tensors and matrices? In this section, we discuss translating vectors into our codes, and will later generalize to the other cases.
In mathematics and physics books, there are lots of different notations for vectors. Consider an example case of a 3-dimensional (3D) velocity vector, which is made up of three scalar components. You might see a mathematical variable written in one of these ways to visually denote its "vectorness": , , or . And you might see any of the following subscript notations for writing the components that comprise it:
(For the moment, we are ignoring any distinction between row- and column-vectors, using what is called "ordered set" notation; we will be more specific later when we interact with matrices.) In the representations above, each ith component is referenced with an index as $v_i$, but notice how there are several valid systems of subscripts accepted across math and physics. Within each, the allowed values of the index i are determined by the initial index and the length of the vector.
An additional mathematical consideration is that we often explicitly reference the number set to which the vector components belong (though, sometimes physicists take this for granted, or assume it is implied!). For velocity, we might expect that component values would be real numbers, so we would write:
$\mathbf{v} \in \mathbb{R}^3$
This notation shows both the number of components (3) and the kind of number each component is (real, $\mathbb{R}$). In general, all components of a vector come from a single set, rather than being mixed (a tensor could have multiple sets, but that is beyond our present scope).
OK, so let us now translate this vector into Python. We will essentially have all of the same parts, just with some small differences in naming.
Firstly, the mathematical vector is stored as an array on the computer, which we might choose to call v. Unlike in the math/physics cases, there is no bold font or arrow we can use to visually mark it as a vector. So, we will just have to remember it or perhaps reflect it in the variable name (e.g., varr). If we are ever unsure, we can become certain by checking the type of the variable. In Python, arrays are part of the NumPy module, and the type is called np.ndarray: the np. is the module abbreviation, and the "nd" before array stands for "N-dimensional".
What are called "components" in a vector are called elements of the array. And while we still use indices to refer to specific elements, computing doesn't use subscripts, so we put the index values in square brackets: v[...]. Additionally, Python uses just one indexing notation, to make life easier for both the interpreter and the programmer. Python indices are integers starting from zero, which is called zero-based counting. Therefore, we would denote and access elements of the array v as: v[0], v[1], v[2]. (Note that different programming languages have different counting systems; C and C++ use zero-based counting, while Fortran, R and Matlab count from one.)
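As a minimal sketch of zero-based indexing, the snippet below builds a three-element array and accesses each element. The component values are made up for illustration, and the try/except construct is only used here to show what happens when an index runs past the end of the array.

```python
import numpy as np

# Hypothetical velocity components, chosen only for illustration
v = np.array([1.5, -2.0, 0.25])

print(type(v))           # the container type: numpy.ndarray
print(v[0], v[1], v[2])  # valid indices for 3 elements are 0, 1, 2

# The index 3 is out of bounds for a length-3 array
try:
    v[3]
except IndexError as err:
    print("IndexError:", err)
```

So for an array of length N, the valid non-negative indices run from 0 to N-1.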
And just as we specified the number of components and their type when creating the vector, so will we specify the length of the 1D array (its number of elements, often referred to as N) and the type of each element---and all elements of an array will have a single type, specifically called their datatype. This process is called declaring the array.
One additional consideration is that when we create an array, we must also give it values. In math, we could define an abstract vector and work with it without explicit values---not so in programming. However, we can change the element values later (keeping the same type), if we want. This process of specifying initial element values is called initializing the array.
The following table summarizes the translation from mathematical vectors to computational arrays:
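Math name/aspect              | Comp name/aspect
------------------------------|----------------------------------------------
vector                        | array
component                     | element
index                         | index
index in subscript            | index in square brackets [ ]
$x_{17}$                      | x[17]
$\mathbf{y} \in \mathbb{Z}^9$ | y has length 9 and datatype int; y has 9 ints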
Thus, the array notation closely follows that of the mathematics/physics communities, with some minor tweaks. And, importantly, much of the mathematics for computing with vectors will also translate directly into programs, as well.
Note
There is a small terminology difference with "dimension" when referring to either a (math) vector or a (comp) array. Consider the 9-component integer vector from above, which would get translated to an array y of 9 ints. We would say that we have a 9-dimensional vector and a 1-dimensional array (with 9 elements). So, 9 is always the number of components or elements, respectively, but we tend to use the word "dimension" differently. We don't know why.
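The translation and the "dimension" terminology can be checked directly. The sketch below assumes arbitrary values for the nine integer components, and uses the ndim attribute of NumPy arrays, which reports the array's number of dimensions (not its number of elements).

```python
import numpy as np

# A 9-component integer vector stored as an array of 9 ints
# (the values themselves are made up for this example)
y = np.array([2, 7, 1, 8, 2, 8, 1, 8, 2], dtype=int)

print(len(y))    # number of elements: 9
print(y.ndim)    # number of array dimensions: 1
print(y.dtype)   # datatype shared by all elements (an integer type)
```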
14.2. Basic 1D array properties
There are many ways to generate 1D arrays and to assign values to them in Python. But in each case, we need to define the same set of fundamental properties from the outset:
the length (or number of elements), which is half of declaring the array;
the datatype (or dtype) of its elements, which is the other half of declaring the array;
some initial set of values for the elements, which may be altered later. This action is called initialization.
Again, all elements of a particular array must be of the same type---one cannot mix floats and ints within an array.
We saw above that the computational step of declaring the size and type of an array mirrors the mathematical process of defining a vector. When introducing a vector in a mathematical derivation, one would often describe it with something like:
a vector in $\mathbb{Z}^{24}$; that is, a vector of 24 integers,
a vector in $\mathbb{R}^{3}$; that is, a 3D vector of real numbers,
a vector in $\mathbb{C}^{10}$; that is, a 10-dimensional vector of complex numbers,
etc.
Additionally, having to do the same in Python makes sense from a "computers are physical machines" point of view. We want to tell Python how much space to allocate for the array: both the number of spots and the element datatype are needed to determine the total amount of memory. (Truth be told, in Python the size allocated per datatype can be flexible; in other languages it is much stricter.)
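To make the memory argument concrete, here is a small sketch using the itemsize and nbytes attributes of NumPy arrays. The dtype np.int64 is chosen explicitly as an assumption for this example, so that the size per element is unambiguous.

```python
import numpy as np

# 24 integers stored with 64 bits (8 bytes) each
a = np.array(range(24), dtype=np.int64)

print(a.itemsize)            # bytes per element
print(a.nbytes)              # total bytes used by the element data
print(len(a) * a.itemsize)   # the same total, computed by hand
```

The total is just the number of elements multiplied by the size of each element, which is why both pieces of information are needed up front.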
We now discuss ways of generating arrays.
14.3. np.array: arrays of known values
Most basically, one can make an array directly using the np.array() function:
import numpy as np
v = np.array([4, 5, 6])
print("Here is my full array:", v)
print("Here is the first element of my array:", v[0])
(Note the required syntax of using both parentheses (...) and square brackets [...] here.) Then v is an array containing three elements, and its [0]th value is 4. We can check the array's fundamental properties of type and dtype. See the distinct outputs of:
type(v)
and:
type(v[0])
Since we entered all integer values when using np.array, NumPy will guess that int is an appropriate dtype. Specifically, on most platforms it uses the particular variety of int called int64 in NumPy; the "64" refers to the number of bits used to store each value.
Q: Above we only looked at a single element's dtype. Is this good enough, or should we check the same for each element?
The other fundamental property of any 1D array, as mentioned above, is its length. We get this from Python's len() function:
print("Length of v is:", len(v))
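Putting these checks side by side, the following sketch prints the container type, the type of one element, the array-wide dtype attribute, and the length. The exact integer type reported may differ between platforms, so the comments avoid naming a specific one.

```python
import numpy as np

v = np.array([4, 5, 6])

print(type(v))      # type of the container: numpy.ndarray
print(type(v[0]))   # type of one element: a NumPy integer type
print(v.dtype)      # datatype shared by every element
print(len(v))       # number of elements
```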
If we wish to change any value in the array, all we need to do is assign a new value to that particular element. For example, running:
v[1] = -100
print(v)
yields:
[   4 -100    6]
What happens if we try to input, say, a float value into this already-declared array? Well, the result of the following:
v[2] = 19.9999
print(v)
... is:
[   4 -100   19]
Again, this array will only hold int values, so the new value was converted to int before being assigned. The conversion truncates the fractional part rather than rounding, so the value actually stored looks like the result of int(19.9999).
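A short sketch of this conversion, including a negative value (an extra case added here for illustration) to show that the fractional part is dropped toward zero rather than rounded:

```python
import numpy as np

v = np.array([4, -100, 6])   # integer dtype inferred from the inputs

v[2] = 19.9999               # stored as 19, not rounded up to 20
v[0] = -2.7                  # stored as -2: truncation is toward zero
print(v)
print(int(19.9999), int(-2.7))   # the built-in int() behaves the same way
```

No warning is raised when this happens, so it is worth checking the dtype before assigning non-integer values.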
Q: What do you think the dtype of each element will be if we input the following array?
w = np.array([7, 8.5, 9])
Note
Above, we have often displayed the array by printing it. There is no deep reason for this; it just looks nicer. Consider the difference between displaying the short array above directly with v:
array([   4, -100,   19])
... or with print(v):
[   4 -100   19]
There is no difference in the values shown, just in aesthetics.
14.4. np.zeros: arrays of all zeros
As another example of NumPy functionality to make arrays, we discuss np.zeros(). This function makes an array whose elements are each zero of a given (constant) type.
Looking at the docstring at the top of the help for this function (np.zeros?):
Docstring:
zeros(shape, dtype=float, order='C')

Return a new array of given shape and type, filled with zeros.
... we see that there is one required argument called shape; for a 1D array, this is just the number of elements it will have, or its length. The kwarg dtype controls the (data)type of the elements, and we see that by default the elements would have a float type.
So, if we run the following:
np.zeros(5)
... we see the output array:
array([0., 0., 0., 0., 0.])
Indeed, that looks like 5 zeros, each of which is a float. To see some other cases where we change the number or dtype, consider these examples:
Z1 = np.zeros(7) # def: dtype=float
Z2 = np.zeros(4, dtype=bool)
Z3 = np.zeros(6, dtype=complex)
print(Z1)
print(Z2)
print(Z3)
... which respectively yield:
[0. 0. 0. 0. 0. 0. 0.]
[False False False False]
[0.+0.j 0.+0.j 0.+0.j 0.+0.j 0.+0.j 0.+0.j]
Sometimes, arrays of zeros are loosely referred to as empty arrays. But just because they look "empty", that doesn't mean that they aren't useful! We will see later that this is a really useful way to initialize arrays whose elements will be filled in afterwards, such as when generating a sequence or evaluating a formula.
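As a small sketch of this pattern, the snippet below creates an array of zeros and then fills each element from a formula. The formula (squares of the index) and the names N and squares are arbitrary choices for illustration, and the for loop is a construct that may only be introduced later in the notes.

```python
import numpy as np

N = 6
squares = np.zeros(N, dtype=int)   # declare and initialize with zeros

# Fill each element from a formula; here the (assumed) rule is i**2
for i in range(N):
    squares[i] = i**2

print(squares)
```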
Q: Check the datatype of elements in Z2. And how would you change its last value to True?
14.5. np.ones: arrays of all ones
We briefly note that a similar function to np.zeros exists, called np.ones(). You can read the helpfile with np.ones?, but essentially it mirrors np.zeros in everything, except that it initializes all array values to unity in that type. So the following:
N1 = np.ones(8) # def: dtype=float
N2 = np.ones(4, dtype=bool)
N3 = np.ones(6, dtype=complex)
print(N1)
print(N2)
print(N3)
... produces:
[1. 1. 1. 1. 1. 1. 1. 1.]
[ True True True True]
[1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j]
This will also be useful in cases later.
14.6. np.arange: evenly spaced arrays
One particularly useful family of 1D arrays is those with evenly spaced elements. For example, when making plots, we might want evenly spaced values over part of the x-axis. In general, the functions to make these arrays require knowing the endpoints (start and stop) of a desired interval of values, and then either the spacing between elements (step) or the number of elements (num or N) in the array.
For np.arange(), we use the start, stop and step size to specify the array. Recalling Python helpfile syntax, we can read how to use this function from the top of its help (np.arange?):
Docstring:
arange([start,] stop[, step,], dtype=None)

Return evenly spaced values within a given interval.

Values are generated within the half-open interval ``[start, stop)``
(in other words, the interval including `start` but excluding `stop`).
...
We see one required argument, the stop value. Additionally, we see that a half-open interval is used: the start value is included in the interval, but the stop value isn't. Most Python intervals are half-open, so we should get used to this formulation.
So how are start and step, which we see in square brackets, specified? Reading down further in the np.arange help, we see some additional information on these parameters:
Parameters
----------
start : number, optional
    Start of interval. The interval includes this value. The default
    start value is 0.
stop : number
    End of interval. The interval does not include this value, except
    in some cases where `step` is not an integer and floating point
    round-off affects the length of `out`.
step : number, optional
    Spacing between values. For any output `out`, this is the distance
    between two adjacent values, ``out[i+1] - out[i]``. The default
    step size is 1. If `step` is specified as a position argument,
    `start` must also be given.
start and step are listed as optional parameters; they have default values of 0 and 1, respectively. And indeed, when arguments are shown in the help docstring surrounded by square brackets, it means that we can provide one or both of them new values as arguments-by-position, or we can choose to not specify them (and they will keep their default values). Because they are optional arguments by position, they get determined by the number of arguments provided: if we input one argument, that value is interpreted as stop; if we input two arguments, they are interpreted as start and stop, in that order; if we input three arguments, they are start, stop, and step.
Let's look at a few usage cases to understand this better---for each example, we note what default value(s) will be used based on the number of inputs we have:
D1 = np.arange(10) # def: start = 0, step = 1
D2 = np.arange(-4, 8) # def: step = 1
D3 = np.arange(-15, 15, 3) # no default
D4 = np.arange(start=-15, stop=15, step=3) # no default, all keyword args
Looking at how D1 is made, we have specified 10 to be the stop and used defaults for the other values, so the array is created from numbers in the half-open interval $[0, 10)$, traversed in steps of size 1. The syntax of D4 just shows that these values can also be specified as keywords, funnily enough. The above assignments yield the following, respectively, when printed:
[0 1 2 3 4 5 6 7 8 9]
[-4 -3 -2 -1  0  1  2  3  4  5  6  7]
[-15 -12  -9  -6  -3   0   3   6   9  12]
[-15 -12  -9  -6  -3   0   3   6   9  12]
Readers can verify that the expected values and number of elements occur in each case. We highlight, however, that the single argument example to make D1 shows that np.arange(N) is a very efficient way to generate the first N non-negative integers (0 through N-1), which might come in useful later...
Another thing to notice is the length of the array in each case, and how it relates to the start, stop and step. In any of the above cases, the number of elements N has the following relation:
$N = \dfrac{\mathrm{stop} - \mathrm{start}}{\mathrm{step}}$    (1)
... with the caveat that if step were not an exact factor of the interval range, then the actual number of elements would be the ceiling of N, which could be calculated as int(np.ceil(N)). Additionally, N cannot be negative, instead having a minimum value of zero.
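The length relation and its two caveats can be checked numerically. In the sketch below, arange_length is a hypothetical helper written only for this illustration (it is not part of NumPy), and only integer steps are used so that floating point round-off does not enter.

```python
import numpy as np

def arange_length(start, stop, step):
    """Predicted number of elements in np.arange(start, stop, step).

    Hypothetical helper for illustration; it is not part of NumPy.
    """
    N = (stop - start) / step
    return max(0, int(np.ceil(N)))   # ceiling, with a minimum of zero

# (start, stop, step) cases, using integer steps only
cases = [(0, 10, 1), (-4, 8, 1), (-15, 15, 3), (0, 10, 3), (5, 0, 1)]
for start, stop, step in cases:
    arr = np.arange(start, stop, step)
    print(start, stop, step, "->", len(arr), arange_length(start, stop, step))
```

The case (0, 10, 3) exercises the ceiling, and the case (5, 0, 1) exercises the minimum of zero, since a positive step can never get from 5 to 0.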
Finally, we note it is possible to have a negative step. In that case one would likely want to have start > stop. Consider:
X = np.arange(5, -5, -1)
... for which print(X) displays:
[ 5  4  3  2  1  0 -1 -2 -3 -4]
Q: How would you use np.arange
to make an array of values
in the interval with steps of one half?
Q: How would you use np.arange to make an array of the first 20 even numbers (including zero)?
14.7. np.linspace: more evenly spaced arrays
The function np.linspace() has a lot of similarity to np.arange():
It also outputs an array of evenly spaced values.
When we use it, we specify start and stop values, and it has a close relation among these and step and N.
However, there are two important differences:
When we specify the range of values, we use a closed interval: [start, stop]. This is one of the few instances of using a fully closed (rather than half-open) interval in Python.
When creating the array, we select the total number of elements N it will have (rather than the step size). The step in this case will be calculated internally based on the following relation:
$\mathrm{step} = \dfrac{\mathrm{stop} - \mathrm{start}}{N - 1}$    (2)
The above relation always holds (to within floating point precision/rounding error) since the step is not constrained to be an int. And having a negative step is allowed, for example if start > stop.
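One way to see the relation in Eq. (2) at work is to ask np.linspace for the step it computed, using its retstep keyword. The sketch below compares that value with the formula; the specific start, stop and N values are arbitrary choices for illustration.

```python
import numpy as np

start, stop, N = -2, 2, 11

# retstep=True makes np.linspace also return the step it computed
arr, step = np.linspace(start, stop, num=N, retstep=True)

print(step)                          # step chosen by np.linspace
print((stop - start) / (N - 1))      # step predicted by Eq. (2)
print(arr[0], arr[-1])               # both endpoints are included

# start > stop gives a negative step
arr_down, step_down = np.linspace(2, -2, num=5, retstep=True)
print(arr_down, step_down)
```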
Q: Look at the help for np.linspace. What is the default number of elements the array will have? And are there any constraints on the number of elements? (NB: the keyword/option is num, instead of the N we have referred to above.)
Looking at the helpfile with np.linspace?, we see that exactly two arguments are required: start and stop. Let's try a couple of cases:
L1 = np.linspace(10, 20) # use default num value
L2 = np.linspace(-2, 2, num=11) # specify num
Printing the above arrays yields the following, respectively:
[10. 10.20408163 10.40816327 10.6122449 10.81632653 11.02040816
11.2244898 11.42857143 11.63265306 11.83673469 12.04081633 12.24489796
12.44897959 12.65306122 12.85714286 13.06122449 13.26530612 13.46938776
13.67346939 13.87755102 14.08163265 14.28571429 14.48979592 14.69387755
14.89795918 15.10204082 15.30612245 15.51020408 15.71428571 15.91836735
16.12244898 16.32653061 16.53061224 16.73469388 16.93877551 17.14285714
17.34693878 17.55102041 17.75510204 17.95918367 18.16326531 18.36734694
18.57142857 18.7755102 18.97959184 19.18367347 19.3877551 19.59183673
19.79591837 20. ]
[-2. -1.6 -1.2 -0.8 -0.4 0. 0.4 0.8 1.2 1.6 2. ]
You can verify the length of each array is as it should be. Looking at the outputs, we might notice that the L1 elements look pretty "messy": they each have a looooot of significant digits. In contrast, the L2 elements appear to have "cleaner" steps, with fewer decimals. While this may not really matter (numbers are just numbers), in some cases one might prefer a "cleaner" set of array elements. We actually can control this, if we want, by choosing num carefully, so that step is a "rounder" number.
Q: Looking at the formula for step in Eq. (2), can you suggest a number close to 50 that might provide "rounder" step sizes?
Q: We can eyeball the step values in some cases. But a better way would be to get the step from the array data itself. How could we do this?
Q: Use the approach from your previous question to calculate the actual step size for L2. Are you surprised by it? What is happening?
The reader can decide on preferred properties and step sizes when creating arrays.
14.8. Final comment
We will use arrays a lot in programming, and we will explore more general shapes and useful properties that they have. We have already seen how they play the role of translating vectors into computing.
Arrays are one example of an ordered collection in Python. That is, they store many things (of a particular type), and we access the elements through their indices (which store the order). Additionally, arrays are mutable, which means we can reassign their element values (e.g., if x is an array of all zeros, we might write x[4] = -100).
We will meet other collections later on, both ordered and unordered, as well as mutable and immutable. Some of these properties will dictate what kinds of collections we choose to use in various instances.
In summary:
Arrays are useful objects which help us store ordered collections of numerical values. In Python, we often use the NumPy module when creating and using them.
Using arrays, we can translate mathematical operations for essentially any expression that uses subscripts or indices. This includes sequences, vectors, time series, coordinate pairs in plots, and much, much more.
Arrays are mutable, meaning we can change the values within an existing array.
The (data)type of an array is both uniform and fixed. Thus, all elements have the same datatype (like int, or float, etc.), and we cannot change the type of elements once we have created an array.
Array values can be initialized in many ways:
directly with np.array(), which can be useful for short arrays (or when converting from other Python collection types)
with np.zeros() or np.ones(), which are helpful for initializing arrays whose values we will likely change within the code.
with np.arange() or np.linspace(), which create sequences of evenly spaced numbers within an interval, and will be especially useful when plotting (making x-axis values) or creating the domain of some input function.
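The difference between mutable values and a fixed dtype can be sketched as follows. The astype() method used here is not covered in this section; it is brought in as an assumption to show that getting a different element type means building a new array rather than altering the existing one.

```python
import numpy as np

x = np.zeros(5, dtype=int)
x[4] = -100              # mutable: an element value can be reassigned
print(x, x.dtype)

# The dtype of x itself is fixed; astype() builds a *new* array
xf = x.astype(float)
xf[0] = 2.5              # the float copy can hold non-integer values
print(xf, xf.dtype)
print(x, x.dtype)        # the original array is unchanged
```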
14.9. Practice 1
Make the following arrays with (your choice of) NumPy functions:
.
.
An array A of integers in the range .
An array xint of integers in the range .
An array spanning the range , with values evenly spaced by 0.2.
An array xfl of 81 evenly spaced values in the range .
Answer each of the following:
Let arr1 = np.arange(10). What is the type of arr1[2]?
Let arr2 = np.zeros(100, dtype=float). What is the type of arr2?
Let arr3 = np.arange(15, dtype=float). What is the length of arr3?
Let arr4 = np.ones(19, dtype=bool). What is the value of arr4[3]?
Let arr5 = np.zeros(5, dtype=int). What is the value of arr5[5]?
Let arr6 = np.arange(-3, 3, 2, dtype=int). What is the value of arr6[-1]? What is the length of arr6?
Let arr7 = np.linspace(-3, 3, 2, dtype=int). What is the value of arr7[-1]? What is the length of arr7?
Let arr8 = np.linspace(-5, 5, 11). What is the length of arr8[3:6]? What is the length of arr8[:6]? What is the length of arr8[6:]? What is the length of arr8[:]?
Let arr9 = np.arange(0, 20, 2). What is the value and type of arr9[:5]?
Fix each of the following expressions based on the shown error message:
B = np.arange(-5, 5, 11, type=int)