14. Arrays, I, and indexing: 1D data

In this section we will be using a lot of the NumPy module, so it should be imported:

import numpy as np

14.1. Translating mathematical vectors to arrays

Up to this point, we have discussed storing just one mathematical value at a time in a variable. That is, we have translated scalar quantities to computational types like int, float, bool and complex. What about when we have more general algebraic forms like vectors, tensors and matrices? In this section, we discuss translating vectors into our codes, and will later generalize to the other cases.

In mathematics and physics books, there are lots of different notations for vectors. Consider an example case of a 3-dimensional (3D) velocity vector, which is made up of three scalar components. You might see a mathematical variable written in one of these ways to visually denote its "vectorness": \vec{v}, \overline{v}, \underline{v} or \mathbf{v}. And you might see any of the following subscript notations for writing the components that comprise it:

\mathbf{v} &= ( v_x, v_y, v_z ) \\
\mathbf{v} &= ( v_1, v_2, v_3 ) \\
\mathbf{v} &= ( v_0, v_1, v_2 )

(For the moment, we are ignoring any distinction between row- and column-vectors, using what is called "ordered set" notation; we will be more specific later when we interact with matrices.) In the representations above, each ith component is referenced with an index as v_i, but notice how there are several valid systems of subscripts accepted across math and physics. Within each, the allowed values of the index i are determined by the initial index and the length of the vector.

An additional mathematical consideration is that we often explicitly reference the number set to which the vector components belong (though, sometimes physicists take this for granted, or assume it is implied!). For velocity, we might expect that component values would be real numbers, so we would write:

\mathbf{v} \in \mathbb{R}^3

This notation shows both the number of components (3) and the kind of number each component is (real, \mathbb{R}). In general, all components of a vector come from a single set, rather than being mixed (a tensor could have multiple sets, but that is beyond our present scope).

OK, so let us now translate this vector into Python. We will essentially have all of the same parts, just with some small differences in naming.

Firstly, the mathematical vector \textbf{v} is stored as an array on the computer, which we might choose to call v. Unlike in the math/physics cases, there is no bold font or arrow we can use to visually mark it as a vector. So, we will just have to remember it or perhaps reflect it in the variable name (e.g., varr). If we are ever unsure, we can become certain by checking the type of the variable. In Python, arrays are part of the NumPy module, and the type is called np.ndarray: the np. is the module abbreviation, and the "nd" before array stands for "N-dimensional".

What are called "components" in a vector are called elements of the array. And while we still use indices to refer to specific elements, computing doesn't use subscripts so we put the index values in square brackets v[...]. Additionally, Python uses just one indexing notation, to make life easier for both the interpreter and the programmer. Python indices are integers starting from zero, which is called zero-based counting. Therefore, we would denote and access elements of the array v as: v[0], v[1], v[2]. (Note that different programming languages have different counting systems; C and C++ use zero-based counting, while Fortran, R and Matlab count from one.)
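As a minimal sketch of these indexing rules (the values in v are arbitrary, chosen only for illustration, and np.array() is described in a later subsection), the valid indices of a length-3 array run from 0 to 2, and asking for an index past the end raises an IndexError:

```python
import numpy as np

# A 3-component velocity vector stored as an array (illustrative values)
v = np.array([1.5, -2.0, 0.25])

print(type(v))            # the array type, np.ndarray
print(v[0], v[1], v[2])   # the three elements, using zero-based indices

# Valid indices run from 0 to len(v)-1, so index 3 is out of range here
try:
    v[3]
    out_of_range = False
except IndexError:
    out_of_range = True
print("v[3] is out of range:", out_of_range)
```

The try/except construct is used here only so that the example runs to completion; in an interactive session, typing v[3] directly would display the error message.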

And just as we specified the number of components and their type when creating the vector, so will we specify the length of the 1D array (its number of elements, often referred to as N) and the type of each element---and all elements of an array will have a single type, specifically called their datatype. This process is called declaring the array.

One additional consideration is that when we create an array, we must also give it values. In math, we could define an abstract vector and work with it without explicit values---not so in programming. However, we can change the element values later (keeping the same type), if we want. This process of specifying initial element values is called initializing the array.

The following table summarizes the translation from mathematical vectors to computational arrays:

Math name/aspect               \rightarrow   Comp name/aspect
vector                         \rightarrow   array
component                      \rightarrow   element
index                          \rightarrow   index
index in subscript             \rightarrow   index in square bracket [ ]
x_{17}                         \rightarrow   x[17]
\mathbf{y} \in \mathbb{Z}^9    \rightarrow   y has length 9 and datatype int; y has N=9 ints

Thus, the array notation closely follows that of the mathematics/physics communities, with some minor tweaks. And, importantly, much of the mathematics for computing with vectors will translate directly into programs, as well.

Note

There is a small terminology difference with "dimension" when referring to either a (math) vector or (comp) array. Consider the vector \mathbf{y} \in \mathbb{Z}^9 from above, which would get translated to an array y of 9 ints. We would say that we have: a 9-dimensional vector, and a 1-dimensional array (with 9 elements). So, 9 is always the number of components or elements, respectively, but we tend to use the word "dimension" differently. We don't know why.
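The distinction can be checked on the array itself. The sketch below assumes the array is made with np.zeros() (described in a later subsection) and uses the ndim and shape attributes, which are not otherwise discussed in this section:

```python
import numpy as np

# y in Z^9: a "9-dimensional vector" in math terms
y = np.zeros(9, dtype=int)

print(len(y))    # number of elements: 9
print(y.ndim)    # number of array dimensions: 1
print(y.shape)   # shape tuple: (9,)
```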

14.2. Basic 1D array properties

There are many ways to generate 1D arrays and to assign values to them in Python. But in each case, we need to define the same set of fundamental properties from the outset:

  1. the length (or number of elements), which is half of declaring the array;

  2. the datatype (or dtype) of its elements, which is the other half of declaring the array;

  3. some initial set of values for the elements, which may be altered later. This action is called initialization.

Again, all elements of a particular array must be of the same type---one cannot mix floats and ints within an array.

We saw above that the computational step of declaring the size and type of an array mirrors the mathematical process of defining a vector. When introducing a vector in a mathematical derivation, one would often describe it with something like:

  • \mathbf{g} \in \mathbb{Z}^{24}; that is, \mathbf{g} is a vector of 24 integers,

  • \mathbf{v} \in \mathbb{R}^3; that is, \mathbf{v} is a 3D vector of real numbers,

  • \mathbf{c} \in \mathbb{C}^{10}; that is, \mathbf{c} is a 10-dimensional vector of complex numbers,

etc.

Additionally, having to do the same in Python makes sense from a "computers are physical machines" perspective. We want to tell Python how much space to allocate for the array: both the number of spots and the element datatype are needed to determine the total amount of memory. (Truth be told, in Python the size allocated per datatype can be flexible; in other languages it is much stricter.)
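To make the "length times size per element" idea concrete, here is a small sketch. It assumes the arrays are created with np.zeros() (described in a later subsection) and uses the itemsize and nbytes attributes, which are not otherwise discussed in this section; the dtype np.int64 is requested explicitly so that the per-element size is the same on every platform:

```python
import numpy as np

g = np.zeros(24, dtype=np.int64)   # 24 integers, 8 bytes (64 bits) each
c = np.zeros(10, dtype=complex)    # 10 complex numbers, 16 bytes each

# itemsize: bytes per element; nbytes: bytes for all elements together
print(g.itemsize, g.nbytes)   # 8 192
print(c.itemsize, c.nbytes)   # 16 160
```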

We now discuss ways of generating arrays.

14.3. np.array: arrays of known values

Most basically, one can make an array directly using the np.array() function:

import numpy as np

v = np.array([4, 5, 6])
print("Here is my full array:", v)
print("Here is the first element of my array:", v[0])

(Note the required syntax of using both parentheses (...) and square brackets [...] here.) Now v is an array containing three elements, and its [0]th value is 4. We can check the array's fundamental properties of type and dtype. See the distinct outputs of:

type(v)

and:

type(v[0])

Since we entered all integer values when using np.array, Python will guess that int is an appropriate dtype. Specifically, on most platforms it uses the particular variety of int called int64 in NumPy; the "64" refers to the number of bits used to store each value.
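The datatype is also stored once on the array as a whole, in its dtype attribute, and np.array() accepts a dtype keyword for cases where we would rather choose the type than have it guessed. A brief sketch (the name vf is just illustrative):

```python
import numpy as np

v = np.array([4, 5, 6])
print(v.dtype)     # an integer dtype; typically int64, but platform dependent

# Request a specific dtype explicitly instead of letting NumPy infer it
vf = np.array([4, 5, 6], dtype=float)
print(vf.dtype)    # float64
print(vf)          # [4. 5. 6.]
```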

Q: Above we only looked at a single element's dtype. Is this good enough, or should we check the same for each element?


The other fundamental property of any 1D array, as mentioned above, is its length. We get this from Python's len() function:

print("Length of v is:", len(v))

If we wish to change any value in the array, all we need to do is assign a new value to that particular element. For example, after:

v[1] = -100

print(v) yields:

[   4 -100    6]

What happens if we try to input, say, a float value into this already-declared array? Well, the result of the following:

v[2] = 19.9999
print(v)

... is:

[   4 -100   19]

Again, this array will only hold int values, so Python converted that new value to int before assigning it. Hence, the new value actually input looks like the result of int(19.9999).
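The conversion drops the fractional part rather than rounding, and for negative values that means moving toward zero. The sketch below illustrates this and shows one possible way to keep fractional values, by first making a float copy with the astype() method (not otherwise discussed in this section; the name vf is illustrative):

```python
import numpy as np

v = np.array([4, -100, 6])

v[2] = 19.9999    # fractional part is dropped, not rounded
v[0] = -2.7       # truncation goes toward zero, so this stores -2
print(v)          # [  -2 -100   19]

# To keep fractional values, make a float copy first (astype returns a new array)
vf = v.astype(float)
vf[2] = 19.9999
print(vf.dtype, vf[2])
```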

Q: What do you think the dtype of each element will be if we input the following array?

w = np.array([7, 8.5, 9])

Note

Above, we have often displayed the array by printing it. There is no deep reason for this---it just looks nicer. Consider the difference between displaying the short array above directly with v:

array([   4, -100,   19])

... or with print(v):

[   4 -100   19]

There is no difference in the values shown, just in aesthetics.

14.4. np.zeros: arrays of all zeros

As another example of NumPy functionality to make arrays, we discuss np.zeros(). This function makes an array whose elements are each zero of a given (constant) type.

Looking at the docstring at the top of the help for this function (np.zeros?):

Docstring:
zeros(shape, dtype=float, order='C')

Return a new array of given shape and type, filled with zeros.

... we see that there is one required argument called shape; for a 1D array, this is just the number of elements it will have, or its length. The kwarg dtype controls the (data)type of the elements, and we see by default the elements would have a float type.

So, if we run the following:

np.zeros(5)

... we see the output array:

array([0., 0., 0., 0., 0.])

Indeed, that looks like 5 zeros, each of which is a float. To see what happens if we change the number of elements or the dtype, consider these examples:

Z1 = np.zeros(7)                  # def: dtype=float
Z2 = np.zeros(4, dtype=bool)
Z3 = np.zeros(6, dtype=complex)

print(Z1)
print(Z2)
print(Z3)

... which respectively yield:

[0. 0. 0. 0. 0. 0. 0.]
[False False False False]
[0.+0.j 0.+0.j 0.+0.j 0.+0.j 0.+0.j 0.+0.j]

Sometimes, arrays of zeros are referred to as empty arrays. But just because they look "empty", that doesn't mean that they aren't useful! We will see later that this is a really useful way to initialize arrays whose elements will be filled in afterwards, such as generating a sequence or obeying a formula.
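As a small preview of that pattern, the sketch below declares an array of zeros and then overwrites each element using a formula. The array name squares and the formula i**2 are illustrative choices, and the for loop is used only as one simple way to visit every index:

```python
import numpy as np

N = 6
squares = np.zeros(N, dtype=int)   # declare and initialize: six int zeros

# Overwrite each element with the value from a formula, here i**2
for i in range(N):
    squares[i] = i**2

print(squares)   # [ 0  1  4  9 16 25]
```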

Q: Check the datatype of elements in Z2. And how would you change its last value to True?


14.5. np.ones: arrays of all ones

We briefly note that a similar function to np.zeros exists, called np.ones(). You can read the helpfile with np.ones?, but essentially it mirrors np.zeros in everything, except that it initializes all array values to unity in that type. So the following:

N1 = np.ones(8)                   # def: dtype=float
N2 = np.ones(4, dtype=bool)
N3 = np.ones(6, dtype=complex)

print(N1)
print(N2)
print(N3)

... produces:

[1. 1. 1. 1. 1. 1. 1. 1.]
[ True  True  True  True]
[1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j 1.+0.j]

This will also be useful in cases later.

14.6. np.arange: evenly spaced arrays

One particularly useful family of 1D arrays is those with evenly spaced elements. For example, when making plots, we might want evenly spaced values over part of the x-axis. In general, the functions to make these arrays require knowing the endpoints (start and stop) of a desired interval of values, and then either the spacing between elements (step) or the number of elements (num or N) in the array.

For np.arange(), we use the start, stop and step size to specify the array. Recalling Python helpfile syntax, we can read how to use this function from the top of its help (np.arange?):

Docstring:
arange([start,] stop[, step,], dtype=None)

Return evenly spaced values within a given interval.

Values are generated within the half-open interval ``[start, stop)``
(in other words, the interval including `start` but excluding `stop`).
...

We see one required argument, the stop value. Additionally, we see that a half-open interval is used: the start value is included in the interval, but the stop value isn't. Most Python intervals are half-open, so we should get used to this formulation.

So how are start and step, which we see in square brackets, specified? Reading down further in the np.arange help, we see some additional information on these parameters:

Parameters
----------
start : number, optional
    Start of interval.  The interval includes this value.  The default
    start value is 0.
stop : number
    End of interval.  The interval does not include this value, except
    in some cases where `step` is not an integer and floating point
    round-off affects the length of `out`.
step : number, optional
    Spacing between values.  For any output `out`, this is the distance
    between two adjacent values, ``out[i+1] - out[i]``.  The default
    step size is 1.  If `step` is specified as a position argument,
    `start` must also be given.

start and step are listed as optional parameters; they have default values of 0 and 1, respectively. And indeed, when arguments are shown in the help docstring surrounded by square brackets, it means that we can provide new values for one or both of them as arguments-by-position, or we can choose to not specify them (and they will keep their default values). Because they are optional arguments by position, they are determined by the number of arguments provided: if we input one argument, that value is interpreted as stop; if we input two arguments, they are interpreted as start and stop, in that order; if we input three arguments, they are start, stop, and step.

Let's look at a few usage cases to understand this better---for each example, we note what default value(s) will be used based on the number of inputs we have:

D1 = np.arange(10)                          # def: start = 0, step = 1
D2 = np.arange(-4, 8)                       # def: step = 1
D3 = np.arange(-15, 15, 3)                  # no default
D4 = np.arange(start=-15, stop=15, step=3)  # no default, all keyword args

Looking at how D1 is made, we have specified 10 to be the stop and used defaults for the other values, so the array is created from numbers in the half-open interval [0, 10), traversed in steps of size 1. The syntax of D4 just shows that these values can also be specified as keywords, funnily enough. The above assignments yield the following, respectively, when printed:

[0 1 2 3 4 5 6 7 8 9]
[-4 -3 -2 -1  0  1  2  3  4  5  6  7]
[-15 -12  -9  -6  -3   0   3   6   9  12]
[-15 -12  -9  -6  -3   0   3   6   9  12]

Readers can verify that the expected values and number of elements occur in each case. We highlight, however, that the single argument example to make D1 shows that np.arange(N) is a very efficient way to generate the first N integers, which might come in useful later...

Another thing to notice is the length of array in each case, and how it relates to the start, stop and step. In any of the above cases, the number of elements N has the following relation:

\mbox{\ttfamily N = (stop - start)/step\,,} \qquad (1)

... with the caveat that if step were not an exact factor of the interval range, then the actual number of elements would be the ceiling of N, which could be calculated as int(np.ceil(N)). Additionally, N cannot be negative, instead having a minimum value of zero.
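The relation in Eq. (1), together with the ceiling and minimum-of-zero caveats, can be checked against actual arrays. In the sketch below, arange_length is a hypothetical helper written for this illustration (it is not a NumPy function), and only integer arguments are used so that floating point round-off does not enter:

```python
import numpy as np

def arange_length(start, stop, step):
    """Predicted number of elements (hypothetical helper, not part of NumPy)."""
    N = (stop - start) / step
    # Round up when step does not divide the range evenly; never go below zero
    return max(0, int(np.ceil(N)))

# Each row prints the actual length next to the predicted one
for start, stop, step in [(0, 10, 1), (-15, 15, 3), (0, 10, 3), (5, 0, 1)]:
    arr = np.arange(start, stop, step)
    print(len(arr), arange_length(start, stop, step))
```

The last case has start > stop with a positive step, so the array has no elements at all.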

Finally, we note it is possible to have a negative step. In that case one would likely want to have start > stop. Consider:

X = np.arange(5, -5, -1)

... for which print(X) displays:

[ 5  4  3  2  1  0 -1 -2 -3 -4]

Q: How would you use np.arange to make an array of values in the interval [-3, 3) with steps of one half?


Q: How would you use np.arange to make an array of the first 20 even numbers (including zero)?


14.7. np.linspace: more evenly spaced arrays

The function np.linspace() has a lot of similarity to np.arange():

  • It also outputs an array of evenly spaced values.

  • When we use it, we specify start and stop values, and it has a close relation among these and step and N.

However, there are two important differences:

  • When we specify the range of values, we use a closed interval: [start, stop]. This is one of the few instances of using a fully closed (rather than half-open) interval in Python.

  • When creating the array, we select the total number of elements N it will have (rather than step size). The step in this case will be calculated internally based on the following relation:

    \mbox{\ttfamily step = (stop - start)/(N - 1)\,,} \qquad (2)

    The above relation always holds (to within floating point precision/rounding error) since the step is not constrained to be an int. And having a negative step is allowed, for example if start > stop.
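One way to see Eq. (2) at work is to ask np.linspace() to report the step it used, via its retstep keyword, and compare that with the formula. The values of start, stop and N below are arbitrary choices for illustration:

```python
import numpy as np

start, stop, N = 0.0, 1.0, 5

# retstep=True makes np.linspace return the step along with the array
arr, step = np.linspace(start, stop, N, retstep=True)

print(arr)                              # [0.   0.25 0.5  0.75 1.  ]
print(step, (stop - start) / (N - 1))   # 0.25 0.25

# The stop value is included, unlike with np.arange
print(arr[-1] == stop)                  # True

# A decreasing array: start > stop gives a negative step
print(np.linspace(2, -2, 5))            # [ 2.  1.  0. -1. -2.]
```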

Q: Look at the help for np.linspace. What is the default number of elements the array will have? And are there any constraints on the number of elements? (NB: the keyword/option is num, instead of the N we have referred to above.)


Looking at the helpfile with np.linspace?, we see that exactly two arguments are required: start and stop. Let's try a couple of cases:

L1 = np.linspace(10, 20)          # use default num value
L2 = np.linspace(-2, 2, num=11)   # specify num

Printing the above arrays yields the following, respectively:

[10.         10.20408163 10.40816327 10.6122449  10.81632653 11.02040816
 11.2244898  11.42857143 11.63265306 11.83673469 12.04081633 12.24489796
 12.44897959 12.65306122 12.85714286 13.06122449 13.26530612 13.46938776
 13.67346939 13.87755102 14.08163265 14.28571429 14.48979592 14.69387755
 14.89795918 15.10204082 15.30612245 15.51020408 15.71428571 15.91836735
 16.12244898 16.32653061 16.53061224 16.73469388 16.93877551 17.14285714
 17.34693878 17.55102041 17.75510204 17.95918367 18.16326531 18.36734694
 18.57142857 18.7755102  18.97959184 19.18367347 19.3877551  19.59183673
 19.79591837 20.        ]

[-2.  -1.6 -1.2 -0.8 -0.4  0.   0.4  0.8  1.2  1.6  2. ]

You can verify the length of each array is as it should be. Looking at the outputs, we might notice that the L1 elements look pretty "messy": they each have a looooot of significant digits. In contrast, the L2 elements appear to have "cleaner" steps, with fewer decimals. While this may not really matter (numbers are just numbers), in some cases one might prefer a "cleaner" set of array elements. We actually can control this, if we want, by choosing num carefully, so that step is a "rounder" number.

Q: Looking at the formula for step in Eq. (2), can you suggest a number close to 50 that might provide "rounder" step sizes?


Q: We can eyeball the step values in some cases. But a better way would be to get the step from the array data itself. How could we do this?


Q: Use the approach from your previous question to calculate the actual step size for L2. Are you surprised by it? What is happening?


The reader can decide on preferred properties and step sizes when creating arrays.

14.8. Final comment

We will use arrays a lot in programming, and we will explore more general shapes and useful properties that they have. We have already seen how they play the role of translating vectors into computing.

Arrays are one example of an ordered collection in Python. That is, they store many things (of a particular type), and we access the elements through their indices (which store the order). Additionally, arrays are mutable, which means we can reassign their element values (e.g., if x is an array of all zeros, we might write x[4] = -100).

We will meet other collections later on, both ordered and unordered, as well as mutable and immutable. Some of these properties will dictate what kinds of collections we choose to use in various instances.

In summary:

  • Arrays are useful objects which help us store ordered collections of numerical values. In Python, we often use the NumPy module when creating and using them.

  • Using arrays, we can translate mathematical operations for essentially any expression that uses subscripts or indices. This includes sequences, vectors, time series, coordinate pairs in plots, and much, much more.

  • Arrays are mutable, meaning we can change the values within an existing array.

  • The (data)type of an array is both uniform and fixed. Thus, all elements have the same datatype (like int, or float, etc.), and we cannot change the type of elements once we have created an array.

  • Array values can be initialized in many ways:

    • directly with np.array(), which can be useful for short arrays (or when converting from other Python collection types)

    • with np.zeros() or np.ones(), which are helpful for initializing arrays whose values we will likely change within the code.

    • with np.arange() or np.linspace(), which create sequences of evenly spaced numbers within an interval, and will be especially useful when plotting (making x-axis values) or creating the domain of some input function.

14.9. Practice 1

  1. Make the following arrays with (your choice of) NumPy functions:

    1. \displaystyle \textbf{v} = (-1.1, 5, 7.6).

    2. \displaystyle \textbf{u} = (2, 4, 6).

    3. An array A of integers in the range [0, 11].

    4. An array xint of integers in the range [-15, 15].

    5. An array spanning the range [-15, 15], with values evenly spaced by 0.2.

    6. An array xfl of 81 evenly spaced values in the range [0, 20].

  2. Answer each of the following:

    1. Let arr1 = np.arange(10). What is the type of arr1[2]?

    2. Let arr2 = np.zeros(100, dtype=float). What is the type of arr2?

    3. Let arr3 = np.arange(15, dtype=float). What is the length of arr3?

    4. Let arr4 = np.ones(19, dtype=bool). What is the value of arr4[3]?

    5. Let arr5 = np.zeros(5, dtype=int). What is the value of arr5[5]?

    6. Let arr6 = np.arange(-3, 3, 2, dtype=int). What is the value of arr6[-1]? What is the length of arr6?

    7. Let arr7 = np.linspace(-3, 3, 2, dtype=int). What is the value of arr7[-1]? What is the length of arr7?

    8. Let arr8 = np.linspace(-5, 5, 11). What is the length of arr8[3:6]? What is the length of arr8[:6]? What is the length of arr8[6:]? What is the length of arr8[:]?

    9. Let arr9 = np.arange(0, 20, 2). What are the value and type of arr9[:5]?

  3. Fix each of the following expressions based on the shown error message:

    1. B = np.arange(-5, 5, 11, type=int)