Python arrays : Numpy

Python arrays : Numpy#

REF: Scipy tutorial and A primer of Scientific Programming with Python.

The numerical python package, numpy, allows for easy and efficient manipulation of vectorial data. In this context, it is important to remark some of the so-called vectorial computation. Let’s give an example. Imagine a two dimensional vector \(a\). How would you represent it using python? You coul use either a list, a tuple, etc:

    a1 = [x, y]
    a2 = (x, y)

Of course, it can be generalized to more dimensions:

    a1 = [x, y, z, w]
    a2 = [0.0, -0.9, 1, -3, 9, ... , 90]

Managing vectors by using these typical python constructs is very good from a general programmer point of view, but it has its costs. For example, iterating over a lits by means of a for loop could be very slow. Actually, for typical problemas of computational mathematics and physics, only homogeneous storage structs with fast access are needed, like the arrays of languages like c++, fortran, etc. Arrays might be more limited than general lists, but could vastly outperform lists and tuples for vectorial computations. The numpy package provides array like structs to perform fast mathematical operations on numerical data.

Furthermore, vectorial operations are natural on numpy arrays. But what is a vectorial operation? Let’s assume we have a vector \(v\) of \(n\)-components. What would be the meaning of something like u = sin(v) ? in the context of vectorial computing, it would mean to apply the function sin to every component of the vector v, therefore \(u_i = \sin(v_i)\). In general, given a function \(f\), the expression \(u = f(v)\) means \(u_i = f(v_i)\). Numpy allows for this kind of computing. What would be the meaning of u = v^2*cos(v)*e^v + 2 ?

REFS:

https://jalammar.github.io/visual-numpy/
https://betterprogramming.pub/numpy-illustrated-the-visual-guide-to-numpy-3b1d4976de1d

Basic numpy#

Try the following code snippet:

import numpy as np
a = np.array([0, 1.0, '3', 5])
print (a)
a.dtype

['0' '1.0' '3' '5']

dtype('<U32')

This means: we are importing the numpy package with the name np. Then, we create an numpy array by using the np.array function from a list of values, and assign the result to the variable a. What is a? use a? . The array a has several attributes which you can use later, like the shape and type of the internal data.

Let’s now compare the efficiency of a list versus a numpy array, by means of the %timeit magic function of ipython:

L = range(10000)
%timeit [i**2 for i in L]

418 μs ± 2.96 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

a = np.arange(10000)
%timeit a**2

3.1 μs ± 49.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

You can extract some info like the shape and the dimension as

a.ndim

a.shape

(10000,)

a.dtype

dtype('int64')

Alternative ways to create arrays#

a = np.arange(10)
b = np.arange(1, 9, 2)

The function linspace is very useful. It allows to create a uniform partition on a given interval for a given number of points. For example, to create an array of 100 points uniformly on the interval [2, 3], you can use

a = np.linspace(2, 3, 100)
print (a)

[2.         2.01010101 2.02020202 2.03030303 2.04040404 2.05050505
06060606 2.07070707 2.08080808 2.09090909 2.1010101  2.11111111
12121212 2.13131313 2.14141414 2.15151515 2.16161616 2.17171717
18181818 2.19191919 2.2020202  2.21212121 2.22222222 2.23232323
24242424 2.25252525 2.26262626 2.27272727 2.28282828 2.29292929
3030303  2.31313131 2.32323232 2.33333333 2.34343434 2.35353535
36363636 2.37373737 2.38383838 2.39393939 2.4040404  2.41414141
42424242 2.43434343 2.44444444 2.45454545 2.46464646 2.47474747
48484848 2.49494949 2.50505051 2.51515152 2.52525253 2.53535354
54545455 2.55555556 2.56565657 2.57575758 2.58585859 2.5959596
60606061 2.61616162 2.62626263 2.63636364 2.64646465 2.65656566
66666667 2.67676768 2.68686869 2.6969697  2.70707071 2.71717172
72727273 2.73737374 2.74747475 2.75757576 2.76767677 2.77777778
78787879 2.7979798  2.80808081 2.81818182 2.82828283 2.83838384
84848485 2.85858586 2.86868687 2.87878788 2.88888889 2.8989899
90909091 2.91919192 2.92929293 2.93939394 2.94949495 2.95959596
96969697 2.97979798 2.98989899 3.        ]

# Check the memory address
print(a[0].data, a[1].data, a[2].data)

<memory at 0x7f1b98593380> <memory at 0x7f1b98593e20> <memory at 0x7f1b98593740>

Check the documentation.

You can also create several other types of useful arrays like

a = np.ones((3, 4)) # shape is a tuple
print (a)

[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]

a = np.random.rand(4)
print (a)

[0.97134854 0.39947064 0.95508733 0.29132007]

Indexing and slicing#

Numpy allows for powerfull and efficient access to internal members of arrays

from IPython.core.display import Image 
Image(filename='slicing.png')

../_images/c01ea71d0eac220fc517d22cf8b21095d9d4ae290f89a3a7ca49b087f1f2f0de.png

Copies and Views#

A slicing operation creates a view of the original array, not a copy (in contrast, an slice of a list creates a new list). A modification of a view, modifies the original array:

a = np.arange(10)
print (a)
b = a[::2]
print (b)
b[0] = 12
print (b)
print (a)

[0 1 2 3 4 5 6 7 8 9]
[0 2 4 6 8]
[12  2  4  6  8]
[12  1  2  3  4  5  6  7  8  9]

But, if you really need a copy, you can use the .copy method:

a = np.arange(10)
print (a)
c = a[::2].copy()
c[0] = 12
print (a)
print (c)

[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4 5 6 7 8 9]
[12  2  4  6  8]

Indexing with integers#

a = np.arange(0, 100, 10)
print (a)
a[[2, 3, 2, 4, 2]]

[ 0 10 20 30 40 50 60 70 80 90]

array([20, 30, 20, 40, 20])

a[[9, 7]] = -100
print (a)

[   0   10   20   30   40   50   60 -100   80 -100]

mask = a>=50
print(mask)
a[mask]

[False False False False False  True  True False  True False]

array([50, 60, 80])

Numerical operations (or vector computing)#

As stated at the beginning, numpy arrays are well suited for the so called numerical computing. In this section we will see some examples to get familiar with this kind of operations.

a = np.array([1, 2, 3, 4])
print (a) 
print (a + 1)
print (2*a)

[1 2 3 4]
[2 3 4 5]
[2 4 6 8]

b = np.ones(4) + 1
print (b)
print(b-a)

[2. 2. 2. 2.]
[ 1.  0. -1. -2.]

Take into account that array multiplication should be done through function np.dot

Exercises#

Reading a file#

Create a Reading a two column file: Make a program who reads a two column file and stores the first column in a list called x and the second one in a list called y. Then convert the list to arrays and plot them. Test it with some example file.

# YOUR CODE HERE
raise NotImplementedError()

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[19], line 2
      1 # YOUR CODE HERE
----> 2 raise NotImplementedError()

NotImplementedError: 

Reading file with comments#

Extend the previous exercise to be able to read a data file with comments. The comment chracter is supposed to be #. Every line starting with # should be ignored. Test.

# YOUR CODE HERE
raise NotImplementedError()

Using loadtxt#

Improve exercise 1 and 2 by using the numpy.loadtxt() function. You should rerad the documentation. Test and compare.

# YOUR CODE HERE
raise NotImplementedError()

Printing with comments#

Write a program which prints tabulated data for a given function, but also printing some comments on it using the # character. Use the previous program to make sure you can read back the data.

# YOUR CODE HERE
raise NotImplementedError()

Kinematics from file#

Assume that you are given a file which has printed the values \(a_0, a_1, \ldots, a_k\) for the acceleration of a given system at specified intervals of size \(\Delta t\), that is, \(t_k = k\Delta t\). Your task is to read those values and to compute the velocity of the system at some time \(t\). To do that remember that the acceleration can be given as \(a(t) = v'(t)\). Therefore, to find \(v\), you must integrate the acceleration as

\[v(t) = v(0) + \int\limits_0^t d\tau\ a(\tau).\]

If \(a(t)\) is only known at discrete points, as in this case, you have to approximate the integral. You can use the trapezoidal rule to get

\[v(t_k) \simeq \Delta t \left( \frac{1}{2}a_0 + \frac{1}{2}a_k + \sum\limits_{i=1}^{k-1}a_i \right), 1 \le k \le n-1 .\]

Assume that \(v(0) = 0\). Your program should: Read the values for \(a\) from the array. Then, compute the values for velocity and finally plot the acceleration and the velocity as a function of time. Good test cases for this problem are null values for the acceleration, and constant values for the acceleration, whose theoretical solution you already know. The \(\Delta t\) value should be specified at the command line (use the sys module to read command line arguments).

# YOUR CODE HERE
raise NotImplementedError()