Numerical errors (very short intro)#

Computers are finite. Therefore, they cannot represent the whole range of integer or real numbers. This implies some intrinsic errors when performing numerical computations, errors that one must be aware and control as best as possible. Those errors could appear when you represent a very large (overflow) or very small (underflow) numbers, or when the computer numbers available are different from the one you want to represent (truncation/round-off), or when you perform computations among numbers that are very different, etc. Fortunately, there is a standard that allow us to have the same behavior (and errors) on almost all platforms, the IEEE754 standard. In the following we will explore some of the aspect associated with the integer but specially floating point arithmetic.

Binary representation#

Every number is represented as a binary one. Check

  • https://bartaz.github.io/ieee754-visualization/

  • https://float.exposed/0x44c00000

  • https://www.h-schmidt.net/FloatConverter/IEEE754.html

  • https://en.wikipedia.org/wiki/Floating-point_arithmetic

  • https://trekhleb.dev/blog/2021/binary-floating-point/

having finite memory, this also means that:

  • You have a finite range of integer and float number to represent

  • The density of the float numbers is not uniform (is a power law). In single precision, you have almost half of the whole available numbers between 0 and 1, and only 8000 between 1023 and 1024. So NORMALIZE your models! https://upload.wikimedia.org/wikipedia/commons/b/b6/FloatingPointPrecisionAugmented.png

  • In general, single precision numbers have a relative precision of about \(10^{-6}\), while double precision numbers have a relative precision of \(10^{-15}\).

If you want to learn more, check : https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

Let’s start now to check the typical errors, the underflow and the underflow

Underflow and overflow#

Underflows means representing a number smaller thatn the available ones. This usually rounds to zero (good). Overflows means representing a numer larger than the available ones This usually rounds to inf (bad) or , for integers in languages like C++, to negative numbers (very bad). You can check the possible range here: https://en.cppreference.com/w/cpp/language/types

NOTE: In python, integers could have arbitrary precision. This removes the problem of overflow, but makes them slow when they have to be simulated in software.

# Overflow, underflow 
# EXERCISE:  Find the N that produces overflow and underflow, for both np.float32 and np.float64

import numpy as np
under = 1.
over = 1.
N = 150
odata = np.zeros(N, dtype=np.float32)
udata = np.zeros(N, dtype=np.float32)
for ii in range(N):
    under /= 2.0
    over *= 2.0
    odata[ii] = over
    udata[ii] = under
    #print(f"{ii:25d}\t{udata[ii]:25.16e}\t{odata[ii]:25.16e}\n")

import matplotlib.pyplot as plt
%matplotlib inline
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].semilogy(np.arange(N), odata)
axes[1].semilogy(udata)
fig.tight_layout()
/tmp/ipykernel_2252/1874025384.py:13: RuntimeWarning: overflow encountered in cast
  odata[ii] = over
../_images/6a275169bd66b2b9281d84e28acc45d59092fa9b685396857e01ae22afa1d271.png
print(odata[-30:])
print(udata[-10:])
[2.6584560e+36 5.3169120e+36 1.0633824e+37 2.1267648e+37 4.2535296e+37
 8.5070592e+37 1.7014118e+38           inf           inf           inf
           inf           inf           inf           inf           inf
           inf           inf           inf           inf           inf
           inf           inf           inf           inf           inf
           inf           inf           inf           inf           inf]
[3.59e-43 1.79e-43 8.97e-44 4.48e-44 2.24e-44 1.12e-44 5.61e-45 2.80e-45
 1.40e-45 0.00e+00]

Machine precision#

You can also compute/verify the machine precision \(\epsilon_m\) of your machine according to different types. The machine precision is a number that basically rounds to zero and in opertation. It can be represented as \(1_c + \epsilon_m = 1_c\), where \(1_c\) is the computational representation of the number 1. Actually, that means than any real number \(x\) is actually represented as $\(x_c = x(1+\epsilon), |\epsilon| \le \epsilon_m .\)$

Implement the following algorithm to compute it and report your results:

eps = 1.0
begin do N times
    eps = eps/2.0
    one = 1.0 + eps
    print

What do you obtain for np.float32, np.float32, and np.float64?

Practical example: Series expansion of \(e^{-x}\)#

The function \(e^{-x}\) can be expanded as $\(e^{-x} = \sum_{i=0}^{\infty} (-1)^i \frac{x^i}{i!} = 1 - x + \frac{x^2}{2} - \frac{x^3}{6} + \ldots \ (|x| < \infty)\)$

This is a great expansion, valid for all finite values of \(x\). But, what numerical problems do you see?

Implement a function that receives \(x\) and \(N\) (max number of terms), and saves the iteration value of the series as a function of \(i\) in a file called sumdata.txt. Then load the data and plot it. Use \(N = 10000\) and \(x=1.8766\). You will need to implement also a factorial function.

# use fout.write(f"...") to save the data 
def exp_expansion(x, N, fname):
    """Add docs here"""
    total = 0.0
    fout = open(fname, "w")
    # YOUR CODE HERE
    raise NotImplementedError()
    fout.close()

def factorial(n):
    """Add docs here"""
    result = 1.0
    for ii in range(1,n+1):
        result *= ii
    return result
 
# Call the function  
exp_expansion(x=1.8766, N=10000, fname="sumdata.txt")
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
Cell In[3], line 18
     15     return result
     17 # Call the function  
---> 18 exp_expansion(x=1.8766, N=10000, fname="sumdata.txt")

Cell In[3], line 7, in exp_expansion(x, N, fname)
      5 fout = open(fname, "w")
      6 # YOUR CODE HERE
----> 7 raise NotImplementedError()
      8 fout.close()

NotImplementedError: 
%matplotlib inline
# Plot the data

import matplotlib.pyplot as plt
import numpy as np
x, y = np.loadtxt('sumdata.txt', unpack = True)
plt.plot(x, y, 'ro')

As you can see, there are many problems with this approach. When doing computational tasks, you cannot think that you are doing maths, you need to think also about the computer.

Now try to avoid the intrinsic overflows, underflows (and substractive cancelations) by reworking the sum term in a recurrent way: if $\(a_i = (-1)^i \frac{x^i}{i!}\)\( how could you write \)a_{i+1}\( as a function of \)a_i$? Thin about it and then check the next cell.

\[a_{i+1} = \frac{-x a_i}{i+1}\]

Now your task is to implement a second version using the recurrence function and compare the results with the first one. Write a function that prints i expansion1 expansion2 and plot the. Is there any advantage on using the new form?

# use fout.write(f"...") to save the data 
def exp_expansion_new(x, N, fname):
    """Add docs here"""
    total = 0.0
    a_i = 1.0
    total2 = 0.0
    fout = open(fname, "w")
    # YOUR CODE HERE
    raise NotImplementedError()
    fout.close()

# Call the NEW function  
exp_expansion_new(x=1.8766, N=10000, fname="sumdatanew.txt")
%matplotlib notebook
# Plot the data

import matplotlib.pyplot as plt
import numpy as np
x, y, z = np.loadtxt('sumdatanew.txt', unpack = True)
plt.plot(x, y, 'ro')
plt.plot(x, z, 'b-')