Efficient Python arrays : Numpy#

REFS:

  • Scipy lecture notes

  • A primer of Scientific Programming with Python

  • https://betterprogramming.pub/numpy-illustrated-the-visual-guide-to-numpy-3b1d4976de1d

  • https://jalammar.github.io/visual-numpy/

  • https://realpython.com/numpy-tutorial/

The numerical python package, numpy, allows for easy and efficient manipulation of vectorial data. In this context, it is important to remark some of the so-called vectorial computation. Let’s give an example. Imagine a two dimensional vector \(a\). How would you represent it using python? You coul use either a list, a tuple, etc:

a1 = [x, y]
a2 = (x, y)

Of course, it can be generalized to more dimensions:

a1 = [x, y, z, w]
a2 = [0.0, -0.9, 1, -3, 9, ... , 90]

Managing vectors by using these typical python constructs is very good from a general programmer point of view, but it has its costs. For example, iterating over a lits by means of a for loop could be very slow. Actually, for typical problemas of computational mathematics and physics, only homogeneous storage structs with fast access are needed, like the arrays of languages like c++, fortran, etc. Arrays might be more limited than general lists, but could vastly outperform lists and tuples for vectorial computations. The numpy package provides array like structs to perform fast mathematical operations on numerical data.

Furthermore, vectorial operations are natural on numpy arrays. But what is a vectorial operation? Let’s assume we have a vector \(v\) of \(n\)-components. What would be the meaning of something like u = sin(v) ? in the context of vectorial computing, it would mean to apply the function sin to every component of the vector v, therefore \(u_i = \sin(v_i)\). In general, given a function \(f\), the expression \(u = f(v)\) means \(u_i = f(v_i)\). Numpy allows for this kind of computing. What would be the meaning of u = v^2*cos(v)*e^v + 2?

https://miro.medium.com/max/700/1*D-I8hK4WXC8wtpR5tvR0fw.png

See: https://betterprogramming.pub/numpy-illustrated-the-visual-guide-to-numpy-3b1d4976de1d

Basic numpy#

Try the following code snippet:

import numpy as np
a = np.array([0, 1.0, '3', 5])
print (a)
a.dtype
['0' '1.0' '3' '5']
dtype('<U32')

This means: we are importing the numpy package with the name np. Then, we create an numpy array by using the np.array function from a list of values, and assign the result to the variable a. What is a? use a? . The array a has several attributes which you can use later, like the shape and type of the internal data.

Let’s now compare the efficiency of a list versus a numpy array, by means of the %timeit magic function of ipython:

L = range(10000)
%timeit [i**2 for i in L]
415 μs ± 87.4 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
a = np.arange(10000)
%timeit a**2
3.03 μs ± 14.9 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

You can extract some info like the shape and the dimension as

a.ndim
1
a.shape
(10000,)
a.dtype
dtype('int64')

Alternative ways to create arrays#

a = np.arange(10)
b = np.arange(1, 9, 2)

The function linspace is very useful. It allows to create a uniform partition on a given interval for a given number of points. For example, to create an array of 100 points uniformly on the interval [2, 3], you can use

a = np.linspace(2, 3, 100)
print a
  Cell In[7], line 2
    print a
    ^
SyntaxError: Missing parentheses in call to 'print'. Did you mean print(...)?

Check the documentation.

You can also create several other types of useful arrays like

a = np.ones((3, 4)) # shape is a tuple
print a
a = np.random.rand(4)
print a
https://miro.medium.com/max/700/1*cyN_FxUVbkdDyrULhfTIGw.png

Ref: https://betterprogramming.pub/numpy-illustrated-the-visual-guide-to-numpy-3b1d4976de1d

# Truncation errors
data = np.arange(0.5, 0.8, 0.1)
print(f"{data = }") # should not include 0.8
# Solution: dont use floats, use integers
data = 0.1*np.arange(5, 8)
print(f"{data = }")
data = 0.5 + 0.1*np.arange(0, 3)
print(f"{data = }")

Indexing and slicing#

Numpy allows for powerfull and efficient access to internal members of arrays

from IPython.core.display import Image 
Image(filename='slicing.png')

Copies and Views#

A slicing operation creates a view of the original array, not a copy (in contrast, an slice of a list creates a new list). A modification of a view, modifies the original array:

a = np.arange(10)
print (a)
b = a[::2]
print (b)
b[0] = 12
print (b)
print (a)

But, if you really need a copy, you can use the .copy method:

a = np.arange(10)
print (a)
c = a[::2].copy()
c[0] = 12
print (a)
print (c)

Fancy indexing (masked arrays)#

Using arrays to mask indexes is an useful way to manipulate arrays. Numpy provides this functionality. Note: Fancy indexing creates copies, not views.

Boolean masks#

np.random.seed(3)
a = np.random.random_integers(0, 20, 15)
print (a)
print (a % 3 == 0)
mask = (a % 3 == 0)
print ("Mask = ", mask)
extract_from_a = a[mask]
print (extract_from_a)
print (a)
a[a%3 == 0] = -1
print (a)
https://miro.medium.com/max/700/1*nFGcXav_xxD7TXGiRYMpHg.png

Ref: https://betterprogramming.pub/numpy-illustrated-the-visual-guide-to-numpy-3b1d4976de1d

Indexing with integers#

a = np.arange(0, 100, 10)
print a
a[[2, 3, 2, 4, 2]]
a[[9, 7]] = -100
print a
Image(filename='fancy_indexing.png')

Numerical operations (or vector computing)#

As stated at the beginning, numpy arrays are well suited for the so called numerical computing. In this section we will see some examples to get familiar with this kind of operations. https://miro.medium.com/max/700/1*XbZqTBXZmnQtVZ_llbGJTw.png From: https://betterprogramming.pub/numpy-illustrated-the-visual-guide-to-numpy-3b1d4976de1d

a = np.array([1, 2, 3, 4])
print (a) 
print (a + 1)
print (2*a)
b = np.ones(4) + 1
print (b)
b-a
j = np.arange(5)
2**(j + 1) - j

Performance comparison:

a = np.arange(10000)
%timeit a + 1
l = range(10000)
%timeit [i+1 for i in l]

Take into account that array multiplication should be done through function np.dot

You can use a lot of internal functions (https://numpy.org/doc/stable/reference/routines.math.html) and random generators (https://numpy.org/doc/stable/reference/random/index.html)

2D Arrays#

https://miro.medium.com/max/700/1*aLMuXA81pDXaw0J0QdKvRQ.png

REF: https://betterprogramming.pub/numpy-illustrated-the-visual-guide-to-numpy-3b1d4976de1d

Axis#

https://miro.medium.com/max/700/1*jmXqsVUNaBaUsBAkHgqb3A.png

REF: https://betterprogramming.pub/numpy-illustrated-the-visual-guide-to-numpy-3b1d4976de1d

Matrix aggregation#

https://jalammar.github.io/images/numpy/numpy-matrix-aggregation-4.png

REF: https://jalammar.github.io/visual-numpy/

https://miro.medium.com/max/1400/1*SJbmp5_qJixVztVMJQn9sw.png https://miro.medium.com/max/1400/1*ZUFxtl3Z2cdDZhcLUNpE0Q.png

Exercises#

Reading a file#

Create a Reading a two column file: Make a program who reads a two column file and stores the first column in a list called x and the second one in a list called y. Then convert the list to arrays and plot them. Test it with some example file.

# YOUR CODE HERE
raise NotImplementedError()

Reading file with comments#

Extend the previous exercise to be able to read a data file with comments. The comment chracter is supposed to be #. Every line starting with # should be ignored. Test.

# YOUR CODE HERE
raise NotImplementedError()

Using loadtxt#

Improve exercise 1 and 2 by using the numpy.loadtxt() function. You should rerad the documentation. Test and compare.

# YOUR CODE HERE
raise NotImplementedError()

Printing with comments#

Write a program which prints tabulated data for a given function, but also printing some comments on it using the # character. Use the previous program to make sure you can read back the data.

# YOUR CODE HERE
raise NotImplementedError()

Kinematics from file#

Assume that you are given a file which has printed the values \(a_0, a_1, \ldots, a_k\) for the acceleration of a given system at specified intervals of size \(\Delta t\), that is, \(t_k = k\Delta t\). Your task is to read those values and to compute the velocity of the system at some time \(t\). To do that remember that the acceleration can be given as \(a(t) = v'(t)\). Therefore, to find \(v\), you must integrate the acceleration as

\[v(t) = v(0) + \int\limits_0^t d\tau\ a(\tau).\]

If \(a(t)\) is only known at discrete points, as in this case, you have to approximate the integral. You can use the trapezoidal rule to get

\[v(t_k) \simeq \Delta t \left( \frac{1}{2}a_0 + \frac{1}{2}a_k + \sum\limits_{i=1}^{k-1}a_i \right), 1 \le k \le n-1 .\]

Assume that \(v(0) = 0\). Your program should: Read the values for \(a\) from the array. Then, compute the values for velocity and finally plot the acceleration and the velocity as a function of time. Good test cases for this problem are null values for the acceleration, and constant values for the acceleration, whose theoretical solution you already know. The \(\Delta t\) value should be specified at the command line (use the sys module to read command line arguments).

# YOUR CODE HERE
raise NotImplementedError()

3D matrix example: Image#

https://jalammar.github.io/images/numpy/numpy-grayscale-image.png https://jalammar.github.io/images/numpy/numpy-color-image.png

REF: https://jalammar.github.io/visual-numpy/

Image tutorial example#

We will use the following kitty image: https://files.realpython.com/media/kitty.90952ca484f1.jpg from https://realpython.com/numpy-tutorial/ . The local name will be just kitty.jpg .

11-IntroPython/kitty.jpg
import numpy as np
import matplotlib.image as mpimg

img = mpimg.imread("kitty.jpg")
print(type(img))
print(img.shape)
import matplotlib.pyplot as plt
imgplot = plt.imshow(img)
# Now remove the red and green channels
output = img.copy()  # The original image is read-only!
output[:, :, :2] = 0
mpimg.imsave("kitty-blue.jpg", output)
img_blue = mpimg.imread("kitty-blue.jpg")
imgplot_blue = plt.imshow(img_blue)
# Now make it only green
# Now let's make it gray-scale
# This means averaging all the channels
averages = img.mean(axis=2)  # Take the average of each R, G, and B
mpimg.imsave("kitty-bad-gray.jpg", averages, cmap="gray")
img_gray = mpimg.imread("kitty-bad-gray.jpg")
imgplot_gray = plt.imshow(img_gray)
# Now improve it by using the luminosity method:
# Differente weigths for each channel
weights = np.array([0.3, 0.59, 0.11])
grayscale = img @ weights # dot product
mpimg.imsave("kitty-good-gray.jpg", grayscale, cmap="gray")
img_gray = mpimg.imread("kitty-good-gray.jpg")
imgplot_gray = plt.imshow(img_gray)
#!pip3 install --user scikit-image
!conda install -y scikit-image
from skimage import data
cat = data.chelsea()
type(cat)
cat.shape
plt.imshow(cat)

Exercise#

Obtain the following image (zoom on 1/4 to 3/4 on each axis): 11-IntroPython/chelsea-zoom-green.jpg