Arrays, II
=======================================

.. contents::
   :local:

.. highlight:: python

In this section, we assume that ``numpy`` has already been imported using its standard abbreviation ``np``:

.. code:: python

   import numpy as np

Index notation
----------------

The index notation used to translate mathematical vectors into arrays carries over directly to matrices. In computing, these are simply called multidimensional arrays, or more commonly just "arrays". (Conceptually, a multidimensional array is an array whose elements are themselves arrays, an "array of arrays", but that is rarely relevant when using them.) All of the notation and terms introduced for vectors still apply.

Matrices typically have multiple dimensions and therefore multiple indices. These translate directly from subscripts in the mathematics to square brackets in the code. For example, the element in the first row and first column of a 2-dimensional matrix :math:`\mathbf{A}` is ``A[0, 0]``; the next element down that column is ``A[1, 0]``; and so on.

.. NTS: this section came from the first arrays page: is for higher dimensional cases

One can make 2D (or higher dimensional) arrays in a similar way, explicitly stating each element one by one. For example::

   sigma2 = np.array([[0, -1j], [1j, 0]])

(Comparing the nested square brackets here with those of 1D arrays helps explain why multidimensional arrays are called "arrays of arrays".)

In this case the entered numbers appear to come from different sets (complex and integer), but, as noted previously, an array has only a single dtype. ``type(sigma2[0, 0])`` shows that the chosen type is ``numpy.complex128``, a complex type, i.e. the more "complicated" of the two. This can be verified by printing ``sigma2``: even the integer entries are displayed in complex form, e.g. ``0.+0.j``.
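
The index notation and the single-dtype rule described above can be checked with a short sketch. The array ``A`` below is a hypothetical example chosen for illustration (it does not appear elsewhere in the text); ``sigma2`` is the array defined above.

.. code:: python

   import numpy as np

   # Hypothetical 2x3 example array, written as a list of rows
   A = np.array([[1, 2, 3],
                 [4, 5, 6]])

   print(A[0, 0])  # first row, first column -> 1
   print(A[1, 0])  # next element down the first column -> 4

   # A[1] is itself a 1D array (the second row), so the chained
   # indexing A[1][0] reaches the same element as A[1, 0]
   print(A[1], A[1][0])  # [4 5 6] 4

   # Integer and complex entries share one dtype: NumPy picks a complex type
   sigma2 = np.array([[0, -1j], [1j, 0]])
   print(sigma2.dtype)        # complex128
   print(type(sigma2[0, 0]))  # <class 'numpy.complex128'>

The comma form ``A[1, 0]`` is the one used throughout this section; the chained form ``A[1][0]`` is shown only to make the "array of arrays" picture concrete.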

To check the dimensions of the array, we can use NumPy's ``shape()`` function::

   print("The shape of sigma2 is:", np.shape(sigma2))

Note that ``np.shape()`` returns the shape of a multidimensional array as a collection of numbers in parentheses, here ``(2, 2)``. This is actually a Python *tuple*, a type that will be discussed later, but for now it is useful to think of the several numbers describing the shape as a single object. Since knowing the shape of an array is a key part of using it, we will make much use of this function (and of ``len()``, which always returns a single ``int``; for a multidimensional array, this is the length of the first axis).

.. NTS: more from the arrays, I, discussion to be moved here:

When specifying a multidimensional array, the shape is a collection of numbers, and, as noted above, we can group them in parentheses to pass them as a single argument::

   Z4 = np.zeros((5, 2))              # default: dtype=float
   Z5 = np.zeros((3, 4), dtype=bool)

yielding::

   [[ 0.  0.]
    [ 0.  0.]
    [ 0.  0.]
    [ 0.  0.]
    [ 0.  0.]]

   [[False False False False]
    [False False False False]
    [False False False False]]

If we had tried to pass the two dimensions without the parentheses, such as::

   Z4 = np.zeros(5, 2)

then Python would interpret this as passing two separate arguments: the first a 1D shape and the second a dtype. Since ``2`` is not a known data type, a ``TypeError`` occurs. Older NumPy versions report::

   TypeError: data type not understood

while newer versions report ``TypeError: Cannot interpret '2' as a data type``.

Arrays with more than two dimensions can be created in the same manner. Since we don't have higher-dimensional screens, though, they are displayed to the user as a sequence of 2D "slices" of the full array::

   Z6 = np.zeros((2, 2, 3), dtype=int)

yields::

   [[[0 0 0]
     [0 0 0]]

    [[0 0 0]
     [0 0 0]]]
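
The shape rules above can be checked directly. This sketch reuses ``Z4`` and ``Z6`` from the text, compares ``np.shape()`` with the ``.shape`` attribute (covered later in this section) and with ``len()``, and catches the ``TypeError`` from the unparenthesized call; the printed message is left uncommented because its wording depends on the NumPy version.

.. code:: python

   import numpy as np

   Z4 = np.zeros((5, 2))                # shape passed as one tuple argument
   Z6 = np.zeros((2, 2, 3), dtype=int)  # three dimensions

   print(np.shape(Z4), Z4.shape)  # (5, 2) (5, 2): function and attribute agree
   print(type(Z4.shape))          # <class 'tuple'>
   print(Z6.ndim, len(Z6))        # 3 2: len() only counts the first axis

   # Passing the dimensions as separate arguments makes 2 the dtype argument
   try:
       np.zeros(5, 2)
   except TypeError as err:
       print("TypeError:", err)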

.. NTS : vectorized operations moved here

**Numerical operations on arrays:** All arithmetic operates elementwise::

   import numpy as np

   v = np.array([4, 2, 5, 9, 0])
   print("v + 1 =", v + 1)    # Output: v + 1 = [ 5  3  6 10  1]
   print("2**v =", 2**v)      # Output: 2**v = [ 16   4  32 512   1]

   u = np.array([0, 1, 2, 3, 4])
   print("u + v =", u + v)    # Output: u + v = [ 4  3  7 12  4]
   print("u * v =", u * v)    # Output: u * v = [ 0  2 10 27  0]

As the example above shows, **array multiplication is not a dot product**. The latter is obtained as follows::

   print("u.dot(v) =", u.dot(v))       # Output: u.dot(v) = 39

or::

   print("dot(u,v) =", np.dot(u, v))   # Output: dot(u,v) = 39

Creating arrays with known values
-----------------------------------

In the previous sections, we introduced various ``numpy`` functions for easily creating special types of vectors and matrices:

.. code:: python

   np.zeros((3, 4))

.. parsed-literal::

   array([[ 0.,  0.,  0.,  0.],
          [ 0.,  0.,  0.,  0.],
          [ 0.,  0.,  0.,  0.]])

.. code:: python

   np.ones((2, 3))

.. parsed-literal::

   array([[ 1.,  1.,  1.],
          [ 1.,  1.,  1.]])

.. code:: python

   v3 = np.arange(10)           # v3: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
   v4 = np.linspace(-2, 2, 5)   # v4: array([-2., -1.,  0.,  1.,  2.])

We have also introduced list comprehensions, and it is important to note that an array can be created by simply passing the resulting list to the ``array`` constructor:

.. code:: python

   squares = np.array([n**2 for n in range(6)])
   print(squares)

.. parsed-literal::

   [ 0  1  4  9 16 25]

NumPy offers more functions to create arrays. ``np.full`` creates an array of the given shape initialized with the given value. Here's a 2x3 array full of π:

.. code:: python

   np.full((2, 3), np.pi)

.. parsed-literal::

   array([[ 3.14159265,  3.14159265,  3.14159265],
          [ 3.14159265,  3.14159265,  3.14159265]])

Creating arrays with random values
-----------------------------------

Sometimes we need to create arrays initialized with random values. NumPy's ``random`` module provides many functions to create such arrays. We'll start by defining three random arrays: a one-dimensional, a two-dimensional, and a three-dimensional array.

.. code:: python

   a1 = np.random.randint(20, size=6)          # One-dimensional array
   a2 = np.random.randint(20, size=(3, 4))     # Two-dimensional array
   a3 = np.random.randint(20, size=(3, 4, 5))  # Three-dimensional array

Run these lines several times and look at the resulting arrays: each time you run the code, different values are generated (integers from 0 up to, but not including, 20). To ensure that the same random arrays are generated each time this code is run, we *seed* NumPy's random number generator with a fixed value before creating the random arrays:

.. code:: python

   np.random.seed(0)  # seed for reproducibility

It is also possible to create a random array by sampling from a specific distribution. For example, here is a 3x4 array initialized with random floats between 0 (inclusive) and 1 (exclusive), drawn from a uniform distribution:

.. code:: python

   np.random.rand(3, 4)

.. parsed-literal::

   array([[ 0.15802446,  0.43477402,  0.81614133,  0.62811013],
          [ 0.57390644,  0.69407189,  0.89299862,  0.58584783],
          [ 0.25014968,  0.10522317,  0.36393147,  0.72826021]])

Here's a 3x4 array containing random floats sampled from a univariate normal (Gaussian) distribution with mean 0 and variance 1:

.. code:: python

   np.random.randn(3, 4)

.. parsed-literal::

   array([[-0.37698199, -0.71092541,  0.77823299,  1.8110648 ],
          [ 0.1160549 , -0.21966641, -0.31370215, -0.33533435],
          [ 1.38330925,  0.29539509,  0.28905152,  0.1313947 ]])

To give you a feel for what these distributions look like, let's use the matplotlib module:

.. code:: python

   import matplotlib.pyplot as plt

.. code:: python

   # density=True normalizes each histogram so that its area is 1
   plt.hist(np.random.rand(100000), density=True, bins=100, histtype="step", color="blue", label="rand")
   plt.hist(np.random.randn(100000), density=True, bins=100, histtype="step", color="red", label="randn")
   plt.axis([-2.5, 2.5, 0, 1.1])
   plt.legend(loc="upper left")
   plt.title("Random distributions")
   plt.xlabel("Value")
   plt.ylabel("Density")
   plt.show()

.. image:: media/RandomArraysDistribution.png

Attributes of arrays
-----------------------------------

Each array has the attributes ``ndim`` (the number of dimensions), ``shape`` (the size of each dimension), and ``size`` (the total number of elements in the array):

.. code:: python

   np.random.seed(0)
   a = np.random.randint(10, size=(4, 3))

.. code:: python

   print("a: ", a)
   print("a ndim: ", a.ndim)
   print("a shape:", a.shape)
   print("a size: ", a.size)

.. parsed-literal::

   a:  [[5 0 3]
    [3 7 9]
    [3 5 2]
    [4 7 6]]
   a ndim:  2
   a shape: (4, 3)
   a size:  12

Another useful attribute is ``dtype``, the data type of the array (which we discussed previously in `Arrays, I: Creating arrays`):

.. code:: python

   print("dtype:", a.dtype)

.. parsed-literal::

   dtype: int64

The default integer type is platform dependent; for example, on Windows with NumPy versions before 2.0 it is ``int32``.

Accessing single elements
----------------------------------------------------

In a one-dimensional array, a single element is accessed by its index, as discussed in the section `Arrays, I`. In a multidimensional array, items can be accessed using a comma-separated sequence of indices:

.. code:: python

   np.random.seed(0)
   a = np.random.randint(10, size=(4, 3))
   a

.. parsed-literal::

   array([[5, 0, 3],
          [3, 7, 9],
          [3, 5, 2],
          [4, 7, 6]])

.. code:: python

   a[0, 0]

.. parsed-literal::

   5

.. code:: python

   a[2, 0]

.. parsed-literal::

   3

.. code:: python

   a[2, -1]

.. parsed-literal::

   2

Values can also be modified using any of the above index notations:

.. code:: python

   a[0, 0] = 8
   a

.. parsed-literal::

   array([[8, 0, 3],
          [3, 7, 9],
          [3, 5, 2],
          [4, 7, 6]])

Keep in mind that NumPy arrays have a fixed type.
This means, for example, that if you attempt to insert a floating-point value into an integer array, the value will be silently truncated to an integer:

.. code:: python

   a[1, 2] = 4.2  # this will be converted to integer!
   a

.. parsed-literal::

   array([[8, 0, 3],
          [3, 7, 4],
          [3, 5, 2],
          [4, 7, 6]])

Slicing of arrays
---------------------------------------

Multidimensional slices work in the same way as slices of one-dimensional arrays (introduced in `Arrays, I`), with multiple slices separated by commas. For example:

.. code:: python

   np.random.seed(1)
   x = np.random.randint(10, size=(3, 4))
   x

.. parsed-literal::

   array([[5, 8, 9, 5],
          [0, 0, 1, 7],
          [6, 9, 2, 4]])

.. code:: python

   x[:2, :3]  # two rows, three columns

.. parsed-literal::

   array([[5, 8, 9],
          [0, 0, 1]])

.. code:: python

   x[:3, ::2]  # all rows, every other column

.. parsed-literal::

   array([[5, 9],
          [0, 1],
          [6, 2]])

One commonly needed routine is accessing single rows or columns of an array. This can be done by combining indexing and slicing, using an empty slice marked by a single colon (``:``):

.. code:: python

   print(x[:, 0])  # first column of x

.. parsed-literal::

   [5 0 6]

.. code:: python

   print(x[0, :])  # first row of x

.. parsed-literal::

   [5 8 9 5]

In the case of row access, the empty slice can be omitted for a more compact syntax:

.. code:: python

   print(x[0])  # equivalent to x[0, :]

One extremely important and useful thing to know about array slices is that they return *views* rather than *copies* of the array data. To illustrate this, let us consider our two-dimensional array from before:

.. code:: python

   print(x)

.. parsed-literal::

   [[5 8 9 5]
    [0 0 1 7]
    [6 9 2 4]]

Let's extract a :math:`2 \times 2` subarray from this:

.. code:: python

   x_subar = x[:2, :2]
   print(x_subar)

.. parsed-literal::

   [[5 8]
    [0 0]]

Now if we modify this subarray, we'll see that the original array is changed! Observe:

.. code:: python

   x_subar[1, 0] = 57
   print(x_subar)

.. parsed-literal::

   [[ 5  8]
    [57  0]]

.. code:: python

   print(x)

.. parsed-literal::

   [[ 5  8  9  5]
    [57  0  1  7]
    [ 6  9  2  4]]

This default behavior is actually quite useful: it means that when we work with large datasets, we can access and process pieces of these datasets without needing to copy the underlying data buffer.

Nevertheless, it is sometimes useful to explicitly copy the data of interest into a new array, so that the copy can be modified without affecting the original. This is most easily done with the ``copy()`` method (note that ``x`` still contains the 57 written through the view above):

.. code:: python

   x_subar_copy = x[:2, :2].copy()
   print(x_subar_copy)

.. parsed-literal::

   [[ 5  8]
    [57  0]]

If we now modify this copy, the original array is not affected:

.. code:: python

   x_subar_copy[1, 0] = 42
   print(x_subar_copy)

.. parsed-literal::

   [[ 5  8]
    [42  0]]

.. code:: python

   print(x)

.. parsed-literal::

   [[ 5  8  9  5]
    [57  0  1  7]
    [ 6  9  2  4]]

Reshaping of arrays
-------------------

Another useful type of operation is reshaping of arrays. The most flexible way of doing this is with the ``reshape`` method. For example, if you want to put the numbers 1 through 16 in a :math:`4 \times 4` grid, you can do the following:

.. code:: python

   grid = np.arange(1, 17).reshape((4, 4))
   print(grid)

.. parsed-literal::

   [[ 1  2  3  4]
    [ 5  6  7  8]
    [ 9 10 11 12]
    [13 14 15 16]]

Note that for this to work, the size of the initial array must match the size of the reshaped array.

Another common reshaping pattern is the conversion of a one-dimensional array into a two-dimensional row or column matrix:

.. code:: python

   x = np.array([1, 2, 3])

   # row vector via reshape
   x.reshape((1, 3))

.. parsed-literal::

   array([[1, 2, 3]])

.. code:: python

   # column vector via reshape
   x.reshape((3, 1))

.. parsed-literal::

   array([[1],
          [2],
          [3]])

More generally, we can reshape into any shape, as long as the total number of elements is the same in both shapes.

Concatenation and splitting of arrays
--------------------------------------

All of the preceding routines worked on single arrays.
It's also possible to combine multiple arrays into one, and conversely to split a single array into multiple arrays. We'll take a look at those operations here.

Concatenation of arrays
~~~~~~~~~~~~~~~~~~~~~~~

Concatenation, or joining of two arrays in NumPy, is primarily accomplished using the routines ``np.concatenate``, ``np.vstack``, and ``np.hstack``. ``np.concatenate`` takes a tuple or list of arrays as its first argument, as we can see here:

.. code:: python

   x = np.array([1, 2, 3])
   y = np.array([4, 5, 6])
   np.concatenate([x, y])

.. parsed-literal::

   array([1, 2, 3, 4, 5, 6])

You can also concatenate more than two arrays at once:

.. code:: python

   z = [7, 8, 9]
   print(np.concatenate([x, y, z]))

.. parsed-literal::

   [1 2 3 4 5 6 7 8 9]

It can also be used for two-dimensional arrays:

.. code:: python

   grid1 = np.array([[1, 2, 3],
                     [4, 5, 6]])
   grid2 = np.array([[ 7,  8,  9],
                     [10, 11, 12]])

.. code:: python

   # concatenate along the first axis
   np.concatenate([grid1, grid2])

.. parsed-literal::

   array([[ 1,  2,  3],
          [ 4,  5,  6],
          [ 7,  8,  9],
          [10, 11, 12]])

.. code:: python

   # concatenate along the second axis (zero-indexed)
   np.concatenate([grid1, grid2], axis=1)

.. parsed-literal::

   array([[ 1,  2,  3,  7,  8,  9],
          [ 4,  5,  6, 10, 11, 12]])

For working with arrays of mixed dimensions, it can be clearer to use the ``np.vstack`` (vertical stack) and ``np.hstack`` (horizontal stack) functions:

.. code:: python

   x = np.array([1, 2, 3])
   grid = np.array([[4, 5, 6],
                    [7, 8, 9]])

   # vertically stack the arrays
   np.vstack([x, grid])

.. parsed-literal::

   array([[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]])

.. code:: python

   # horizontally stack the arrays
   y = np.array([[99],
                 [100]])
   np.hstack([grid, y])

.. parsed-literal::

   array([[  4,   5,   6,  99],
          [  7,   8,   9, 100]])

Similarly, ``np.dstack`` will stack arrays along the third axis.

Splitting of arrays
~~~~~~~~~~~~~~~~~~~

The opposite of concatenation is splitting, which is implemented by the functions ``np.split``, ``np.hsplit``, and ``np.vsplit``.
For each of these, we can pass a list of indices giving the split points:

.. code:: python

   x = [5, 8, 9, 4, 3, 0, 1, 7, 6, 10]
   x1, x2, x3 = np.split(x, [3, 5])
   print(x1, x2, x3)

.. parsed-literal::

   [5 8 9] [4 3] [ 0  1  7  6 10]

Notice that *N* split points lead to *N + 1* subarrays. The related functions ``np.hsplit`` and ``np.vsplit`` are similar:

.. code:: python

   grid = np.arange(16).reshape((4, 4))
   grid

.. parsed-literal::

   array([[ 0,  1,  2,  3],
          [ 4,  5,  6,  7],
          [ 8,  9, 10, 11],
          [12, 13, 14, 15]])

.. code:: python

   upper, lower = np.vsplit(grid, [2])
   print(upper)
   print(lower)

.. parsed-literal::

   [[0 1 2 3]
    [4 5 6 7]]
   [[ 8  9 10 11]
    [12 13 14 15]]

.. code:: python

   left, right = np.hsplit(grid, [2])
   print(left)
   print(right)

.. parsed-literal::

   [[ 0  1]
    [ 4  5]
    [ 8  9]
    [12 13]]
   [[ 2  3]
    [ 6  7]
    [10 11]
    [14 15]]

Similarly, ``np.dsplit`` will split arrays along the third axis.

.. NTS: examples from arrays, earlier:

   #. An :math:`\displaystyle 2\times3` matrix of bools, with the first row ``True`` and the second row ``False``.

   #. A matrix of zeros: :math:`~~\displaystyle \texttt{Mint} \in \mathbf{Z}^{5\times10}`.

   #. A matrix of zeros: :math:`~~\displaystyle \texttt{Mfl} \in \mathbf{R}^{2\times3\times4}`. Print the shape of ``Mfl`` to verify its dimensions.
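
Because splitting is the inverse of concatenation, stacking the pieces back together along the same axis should reproduce the original array. The sketch below checks this for each pair of functions mentioned above, including ``np.dstack`` and ``np.dsplit``, which the text describes without an example. The arrays used here, including the three-dimensional ``cube``, are illustrative only.

.. code:: python

   import numpy as np

   grid = np.arange(16).reshape((4, 4))

   # np.split with split points [1, 3] gives rows [0], [1, 2] and [3]
   pieces = np.split(grid, [1, 3])
   print([p.shape for p in pieces])                     # [(1, 4), (2, 4), (1, 4)]
   print(np.array_equal(np.concatenate(pieces), grid))  # True

   # vsplit/vstack and hsplit/hstack undo each other
   upper, lower = np.vsplit(grid, [2])
   left, right = np.hsplit(grid, [2])
   print(np.array_equal(np.vstack([upper, lower]), grid))  # True
   print(np.array_equal(np.hstack([left, right]), grid))   # True

   # dstack/dsplit work along the third axis
   cube = np.dstack([grid, 10 * grid])   # illustrative 3D array, shape (4, 4, 2)
   front, back = np.dsplit(cube, [1])    # each piece has shape (4, 4, 1)
   print(cube.shape, front.shape, back.shape)       # (4, 4, 2) (4, 4, 1) (4, 4, 1)
   print(np.array_equal(back[:, :, 0], 10 * grid))  # True

``np.array_equal`` is used here because ``==`` on arrays returns an elementwise array of booleans rather than a single ``True`` or ``False``.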