NumPy is a first-rate library for numerical programming. It is widely used in academia, finance, and industry.
The Pandas library introduced in Chapter 9 is also built on top of NumPy. It provides high-performance, easy-to-use data structures and data analysis tools that make data manipulation and visualization more convenient.
If you have Anaconda installed, NumPy is already installed with it.
If you have a standalone Python3 and Jupyter Notebook installation, open a command prompt / terminal and type in:
pip3 install numpy
The numpy package is a module which you can simply import. It is usually aliased with the np abbreviation:
import numpy as np
The most important structure that NumPy defines is an array data type formally called numpy.ndarray, short for N-dimensional array.
import numpy as np
a = np.zeros(3)
a
type(a)
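NumPy arrays are somewhat like native Python lists, except that all elements must have the same type, and that type must be one of the data types (dtypes) provided by NumPy.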
The most important of these dtypes are:
float64: 64 bit floating-point number
int64: 64 bit integer
bool: 8 bit True or False
There are also dtypes to represent complex numbers, unsigned integers, etc. The default dtype for arrays created with functions such as np.zeros() is float64:
a = np.zeros(3)
type(a[0])
If we want integers instead, we can specify the dtype explicitly:
a = np.zeros(3, dtype=int)
type(a[0])
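Because every element of an array shares one dtype, NumPy converts mixed input to a common type. The following is a minimal sketch of that behaviour and of converting between dtypes with astype(); the names mixed and as_int are illustrative and not part of the original text.

```python
import numpy as np

# Mixing integers and floats in one array upcasts everything to a common dtype.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)    # float64

# astype() returns a new array converted to the requested dtype.
# Converting floats to integers truncates toward zero.
as_int = mixed.astype(np.int64)
print(as_int.dtype)   # int64
print(as_int)         # [1 2 3]
```

The original array mixed is left unchanged by astype().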
In the example below, b is a flat (one-dimensional) array, neither a row nor a column vector.
The dimension is recorded in the shape attribute, which is a tuple.
b = np.zeros(10)
b.shape
To give it dimension, we can change the shape attribute:
b.shape = (10, 1)
b
Make it a 2 by 2 array:
b = np.zeros(4)
b.shape = (2, 2)
b
The dimensions can also be specified at creation time by passing a tuple to the np.zeros() function.
b = np.zeros((2, 2))
b
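As an alternative to assigning to the shape attribute, the reshape() method returns a reshaped array and leaves the original shape alone. This is a small sketch; np.arange() and the names flat and grid are not used in the original text and are chosen here only for illustration.

```python
import numpy as np

flat = np.arange(6)          # the integers 0..5 in a flat array
grid = flat.reshape(2, 3)    # 2 by 3 arrangement of the same values

print(flat.shape)            # (6,)
print(grid.shape)            # (2, 3)
print(grid.ndim)             # 2, the number of dimensions
```

The total number of elements must stay the same, so a 6-element array can become 2 by 3 or 3 by 2, but not 2 by 2.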
You can probably guess what np.ones creates.
b = np.ones(10)
b
We have already discussed np.zeros() and np.ones().
Set up a grid of evenly spaced numbers.
b = np.linspace(2, 4, 5)
b
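To make the behaviour of np.linspace() concrete: by default both endpoints are included, so 5 points between 2 and 4 are 0.5 apart. The sketch below checks this with np.diff() and shows the endpoint=False variant; the variable names are illustrative.

```python
import numpy as np

grid = np.linspace(2, 4, 5)
print(grid)             # values 2.0, 2.5, 3.0, 3.5, 4.0
print(np.diff(grid))    # spacing between neighbours, 0.5 each

# With endpoint=False the stop value is excluded and the spacing becomes 0.4.
half_open = np.linspace(2, 4, 5, endpoint=False)
print(half_open)        # values 2.0, 2.4, 2.8, 3.2, 3.6
```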
Create an identity matrix.
b = np.identity(3)
b
NumPy arrays can be created from Python lists, tuples, etc.
b = np.array([10, 20])
b
The data type can also be configured; here float is equivalent to np.float64:
b = np.array((10, 20), dtype=float)
b
Create a 2 dimensional, 2 by 2 array:
b = np.array([[1, 2], [3, 4]])
b
For a flat array, indexing is the same as Python sequences.
c = np.linspace(1, 2, 5)
c
c[0]
c[1:3]
c[-1]
For 2D arrays we use an index position for each dimension.
d = np.array([[1, 2], [3, 4]])
d
d[0, 1]
Note that indices are still zero-based, to maintain compatibility with Python sequences.
Columns and rows can be extracted as follows:
d[0, :]
d[:, 1]
NumPy arrays of integers can also be used to extract elements.
indices = np.array((0, 2, 3))
c[indices]
A NumPy array of boolean values can be used to filter elements at the True locations.
e = np.array([0, 1, 1, 0, 0], dtype=bool)
e
c[e]
NumPy arrays have useful methods, many of which should be familiar from previous lectures.
f = np.array((3, 2, 4, 1))
f
f.sort() # Sorts f in place
f
f.sum() # Sum
f.mean() # Mean
f.max() # Max
f.argmax() # Returns the index of the maximal element
f.cumsum() # Cumulative sum of the elements
f.cumprod() # Cumulative product of the elements
f.var() # Variance
f.std() # Standard deviation
f.shape = (2, 2)
f
f.transpose() # or simply f.T
Many of the methods discussed above have equivalent functions in the NumPy namespace, e.g.:
print("Sum: {0}".format(np.sum(f)))
print("Mean: {0:.2f}".format(np.mean(f)))
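For 2D arrays, most of these reductions also accept an axis argument, which the examples above do not use. The following sketch, with an illustrative array m, shows how axis=0 reduces down the columns and axis=1 reduces along the rows.

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])

print(m.sum())              # 10, sum over all elements
print(m.sum(axis=0))        # [4 6], one sum per column
print(m.sum(axis=1))        # [3 7], one sum per row
print(np.mean(m, axis=0))   # [2. 3.], column means via the function form
```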
The operators +, -, *, / and ** all act elementwise on NumPy arrays.
a = np.array([1, 2, 3, 4])
b = np.array([5, 6, 7, 8])
a + b
a * b
a + 10
a * 10
Multidimensional arrays follow the same general rules.
a.shape = (2, 2)
b.shape = (2, 2)
a + b
a + 10
a * b
Calculate the dot product of two NumPy arrays. For 2D arrays such as a and b here, this is the matrix product.
np.dot(a, b)
The @ operator does the same thing.
a @ b
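The difference between * and @ is easy to overlook, so here is a small side-by-side sketch. It uses fresh arrays p and q (illustrative names holding the same values as a and b) so the arrays above are not modified.

```python
import numpy as np

p = np.array([[1, 2], [3, 4]])
q = np.array([[5, 6], [7, 8]])

print(p * q)    # elementwise product: [[5, 12], [21, 32]]
print(p @ q)    # matrix product:      [[19, 22], [43, 50]]

# For flat (1D) arrays, np.dot() returns the scalar inner product.
print(np.dot(p.ravel(), q.ravel()))   # 70
```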
Calculate the cross product of two NumPy arrays.
np.cross(a, b)
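The cross product is most commonly defined for 3-element vectors, and 2-element input to np.cross() is, as far as is known, deprecated from NumPy 2.0 onward. The sketch below therefore uses 3-element vectors; the names u, v and w are illustrative.

```python
import numpy as np

u = np.array([1, 0, 0])   # unit vector along x
v = np.array([0, 1, 0])   # unit vector along y

w = np.cross(u, v)
print(w)                  # [0 0 1], the unit vector along z

# The result is perpendicular to both inputs, so both dot products are 0.
print(np.dot(w, u), np.dot(w, v))   # 0 0
```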
Generate random numbers from the standard normal distribution:
g = np.random.randn(3)
g
Generate random integers between a lower (inclusive) and a higher (exclusive) bound:
g = np.random.randint(0, 100, 5)
g
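The functions above draw from NumPy's global random state, so results differ on every run. For reproducible results, one option is the Generator interface created by np.random.default_rng() with a fixed seed. This is a sketch of an alternative, not the approach used in the text above; the seed 42 and the variable names are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
normals = rng.standard_normal(3)        # 3 draws from the standard normal
ints = rng.integers(0, 100, size=5)     # 5 integers in [0, 100)

# A second generator with the same seed reproduces the same sequence.
rng_again = np.random.default_rng(seed=42)
same_normals = rng_again.standard_normal(3)
print(np.array_equal(normals, same_normals))   # True
```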
NumPy arrays are mutable data types, like Python lists. In other words, their contents can be altered (mutated) in memory after initialization.
Plain assignment only binds a second name to the same array. To make an independent copy of a NumPy array, use the copy() method or the np.copy() function.
h = g
i = g.copy()
h[0] = 42
print(g)
print(h)
print(i)
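One way to check whether two names refer to the same underlying data is np.shares_memory(). The sketch below repeats the experiment with illustrative names (src, alias, dup) and a fixed array, so the outcome does not depend on random values.

```python
import numpy as np

src = np.array([1, 2, 3])
alias = src          # second name for the same array
dup = src.copy()     # independent copy with its own data

alias[0] = 42        # also changes src, but not dup

print(src)                            # [42  2  3]
print(dup)                            # [1 2 3]
print(alias is src)                   # True
print(np.shares_memory(src, alias))   # True
print(np.shares_memory(src, dup))     # False
```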
The np.vectorize() function creates a vectorized version of a function, which can then be applied to a NumPy array elementwise.
# is_even() can be called on an integer number
def is_even(x):
    return x % 2 == 0
# is_even_vectorized() can be called on an array of integers
is_even_vectorized = np.vectorize(is_even)
is_even_vectorized(g)
The NumPy function np.where() provides a vectorized alternative.
np.where(g % 2 == 0, 1, 0)
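Since arithmetic and comparison operators already act elementwise, a simple test like evenness can also be written directly on the array, without np.vectorize(). A minimal sketch with a fixed illustrative array nums (used instead of the random g so the result is predictable):

```python
import numpy as np

nums = np.array([3, 8, 15, 42])

even_mask = nums % 2 == 0      # boolean array, one entry per element
print(even_mask)               # [False  True False  True]

# np.where() picks from the second or third argument based on the condition.
labels = np.where(even_mask, "even", "odd")
print(labels)                  # ['odd' 'even' 'odd' 'even']
```

Note that np.vectorize() is mainly a convenience and still calls the Python function once per element, so expressions like the one above are usually the faster choice.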
As a rule, comparisons on arrays are done elementwise.
z = np.array([2, 3])
y = np.array([2, 3])
z == y
y[0] = 5
z == y
z != y
The situation is similar for >, <, >= and <=.
We can also do comparisons against scalars:
x = np.linspace(0, 10, 5)
x
x > 3
This is particularly useful for conditional extraction:
cond = x > 3
x[cond]
Of course we can - and frequently do - perform this in one step:
x[x > 3]
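Several conditions can be combined with the elementwise operators & (and) and | (or). The sketch below reuses the same grid under the illustrative name grid; each comparison needs its own parentheses because & and | bind more tightly than the comparison operators.

```python
import numpy as np

grid = np.linspace(0, 10, 5)          # 0.0, 2.5, 5.0, 7.5, 10.0

inside = grid[(grid > 3) & (grid < 8)]
print(inside)                         # values 5.0 and 7.5

outside = grid[(grid < 3) | (grid > 8)]
print(outside)                        # values 0.0, 2.5 and 10.0
```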
k = np.array([[1, 2], [3, 4]])
k
Compute the determinant:
np.linalg.det(k)
Compute the inverse:
np.linalg.inv(k)
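A quick sanity check on these results is to multiply the matrix by its inverse and compare with the identity. Because floating-point results are rarely exact, the sketch uses np.isclose() and np.allclose() rather than ==; the name mat is illustrative.

```python
import numpy as np

mat = np.array([[1, 2], [3, 4]])

det = np.linalg.det(mat)
mat_inv = np.linalg.inv(mat)

# 1*4 - 2*3 = -2, up to floating-point rounding.
print(np.isclose(det, -2.0))                       # True

# A matrix times its inverse should give the identity matrix.
print(np.allclose(mat @ mat_inv, np.identity(2)))  # True
```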
Generate 20 evenly spaced numbers between 0 and 10 and store them in x. Compute the sine of each element of x and store the results in y.
x = np.linspace(0, 10, 20)
y = np.sin(x)
print(x)
print(y)
Generate 100 evenly spaced numbers between 0 and 10 and store them in xvals. Compute the linearly interpolated value for each element of xvals, based on x and y, and store the results in yinterp.
xvals = np.linspace(0, 10, 100)
yinterp = np.interp(xvals, x, y)
print(xvals)
print(yinterp)
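To get a feel for how good the interpolation is, the interpolated values can be compared with the true sine at the same points. The sketch below recomputes the arrays so it runs on its own, using illustrative names (coarse_x, fine_x, and so on), and reports the largest absolute error.

```python
import numpy as np

coarse_x = np.linspace(0, 10, 20)      # the 20 sample points
coarse_y = np.sin(coarse_x)

fine_x = np.linspace(0, 10, 100)       # the 100 evaluation points
fine_y = np.interp(fine_x, coarse_x, coarse_y)

# Largest gap between the piecewise-linear estimate and the true sine.
max_err = np.max(np.abs(fine_y - np.sin(fine_x)))
print("Max abs error: {0:.4f}".format(max_err))
```

With samples roughly 0.53 apart, the error stays small (a few hundredths at most); using more sample points in coarse_x shrinks it further.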
Visualize the results on a plot. (For plotting, see Chapter 10.)
import matplotlib.pyplot as plt
%matplotlib inline
plt.plot(x, y, 'o')
plt.plot(xvals, yinterp, '-x')
plt.show()