Modern arrays#

Arrays are useful to store large sets of data that share the same data type. Arrays are very fast to access and modify data, and are useful in several cases, like

  • Reading and storing data.

  • Applying the same operation to a large collection of data.

  • Store data in a systematic way instead of having many variables.

  • Model vectors, matrices, tensors.

  • Model vectorial and matrix operations.

  • Solve large linear problems, linear algebra problems, compute eigen values and eigen vectors.

  • Solve ODE systems.

  • Apply finite differences and finite elements.

  • Vectorize operations and use HPC.

  • Model data structs, from physical systems to graphical structs and so on.

  • Manipulate memory contents and program close to the metal.

There are many more data structs or containers: lists, dictionaries(maps), queue, binary trees, etc (see https://www.cs.usfca.edu/~galles/visualization/Algorithms.html). Arrays, in particular, are very fast to access and retrieve data from memory. But they are not fast if you need to insert and delete data in the middle. In general, for scientific computing, arrays are the default data struct, but you need always to consider which one would be the best for your case.

Since c++11 it is possible to use modern data structs that are equivalent to the primitive data arrays but give a better user interface, also handling automatically the memory. Although there are two analog structures, std::array and std::vector (and in the future we will use std::valarray), we will focus mainly on the std::vector since it handles dynamic memory (std::array is for static arrays). The key here is that these data structures create a continuous block memory block that can be accessed very fast: https://hackingcpp.com/cpp/std/sequence_containers.html

Image Description
From: "https://hackingcpp.com/cpp/std/vector.png"

The following example shows the basic features for both of them and also a new kind of for. It is recommended to run this on a graphical debugger , either locally in emacs or visual code, or online on the python tutor or onlinegdb, at the link https://onlinegdb.com/CKeYkLEiz .

#include <iostream>
#include <array>
#include <vector>

int main(void)
{
  std::array<double, 10> data1;
  data1[5] = 9.87;
  data1[0] = -7.8765;
  data1[9] = 1000.343;
  std::cout << data1[5] << "\n";
  for (auto val : data1) {
    std::cout << val << "\n";
  }

  std::vector<double> data2;
  data2.resize(4); // size = 4, capacity 4
  data2[3] = 9.87;
  data2[0] = -7.8765;
  data2[1] = 1000.343;
  std::cout << data2[2] << std::endl;
  for (auto & val : data2) { // reference
    val *= 2; // modify the value
    std::cout << val << "\n";
  }
  data2.resize(5); // size = 5, capacity = 8 (double the initial, to avoid frequent mallocs)
  data2.push_back(0.987); // size = 6, capacity = 8
  data2.push_back(12343.987); // size = 7, capacity = 8
  data2.push_back(-0.000987); // size = 8, capacity = 8
  data2.push_back(3.986544); // size = 9, capacity = 16

 return 0;
}

As you can see, a vector is a dynamc array of homogeneous contigous elements.

Initial exercises#

  1. Find the maximum size you can use for a std::array and a std::vector . Are they the same?

  2. Print the address of the first and second elements in a std::array and std::vector. Are they contigous? To print the memory address of the first element use the & operator as &data[0]

  3. Use the algorithm foreach with a lambda function to divide by two each element in an array. Use also for each to print the elements and their memory address.

Using std::array : Fixed size, goes to the stack, automatic for#

// g++ -g -fsanitize=address
#include <iostream>
#include <array>

int main(void)
{
  const int N = 10;
  //double data[N] {0};
  std::array<double, N> data;

  std::cout << "size: " << data.size() << std::endl;
  std::cout << "max_size: " << data.max_size() << std::endl;
  std::cout << "&data[0]: " << &data[0] << std::endl;
  std::cout << "&data[1]: " << &data[1] << std::endl;

  //std::cout << data[-1] << std::endl; // detected by sanitizers address

// initialize the array with even numbers
  for(int ii = 0; ii < N; ++ii) {
    data[ii] = 2*ii;
  }


// print the array
  for(const auto & val : data) {
    std::cout << val << "  ";
  }
  std::cout << "\n";    

// compute the mean
  double sum = 0.0;
  for(auto val : data) {
    sum += val;
  }
  std::cout << "Average = " << sum/data.size() << std::endl;

  return 0;
}

You might visualize this in the python/c++ tutor.

And we can simplified the code even more if we use the standard library:

// g++ -g -fsanitize=address
#include <numeric>
#include <algorithm>
#include <iostream>
#include <array>

int main(void)
{
  const int N = 10;
  //double data[N] {0};
  std::array<double, N> data;

  std::cout << "size: " << data.size() << std::endl;
  std::cout << "max_size: " << data.max_size() << std::endl;
  std::cout << "&data[0]: " << &data[0] << std::endl;
  std::cout << "&data[1]: " << &data[1] << std::endl;

  //std::cout << data[-1] << std::endl; // detected by sanitizers address

// initialize the array with even numbers
  int ii = 0 ;
  auto init  = [&ii](double & x){ x = 2*ii; ii++; };
  std::for_each(data.begin(), data.end(), init);

// print the array
  auto print = [](const int & x) { std::cout << x << "  " ; };
  std::for_each(data.begin(), data.end(), print);
  std::cout << "\n";

// compute mean
  double avg = std::accumulate(data.begin(), data.end(), 0.0)/data.size();
  std::cout << "Average = " << avg << std::endl;
}

Exercises#

  1. Find the largest std::array size you can get in your system. Is it related to the ram size? check the command

    ulimit -s
    
  2. Write a program that computes the dot product between two std::arrays. You can use either indexes or the inner_product or transform_reduce algorithms.

  3. Can you read the std::array size from the command line using argv? try it.

Using std::vector : Dynamic size, uses the heap#

Vectors allow to use dynamic size arrays very easily and are the recommend data structure for our current math vector and matrix applications. You can see more info here:

For now, let’s just declare a vector

// g++ -g -fsanitize=address
#include <algorithm>
#include <numeric>
#include <iostream>
#include <vector>

int main(int argc, char **argv)
{
  const int N = std::atoi(argv[1]);
  std::vector<double> data;
  data.reserve(5); //affects capacity, can store 15 numbers but its size is still 0
  data.resize(N); // real size, capacity >= N

  std::cout << "size: " << data.size() << std::endl;
  std::cout << "capacity: " << data.capacity() << std::endl;
  std::cout << "max_size: " << data.max_size() << std::endl;
  std::cout << "&data[0]: " << &data[0] << std::endl;
  std::cout << "&data[1]: " << &data[1] << std::endl;

  //std::cout << data[-1] << std::endl; // detected by sanitizers address

// initialize the array with even numbers
  int ii = 0;
  auto init  = [&ii](double & x){ x = 2*ii; ii++; };
  std::for_each(data.begin(), data.end(), init);

// print the array
  auto print = [](const double & x){ std::cout << x << "  ";};
  std::for_each(data.begin(), data.end(), print);
  std::cout << "\n";

// compute mean
  std::cout << "Average: " << std::accumulate(data.begin(), data.end(), 0.0)/data.size() << "\n";

// Add more data!
  std::cout << "size: " << data.size() << ", capacity: " << data.capacity() << std::endl;
  for (auto x : {200.0987, 300.987, 400.09754}){
    data.push_back(x);
    std::cout << "size: " << data.size() << ", capacity: " << data.capacity() << std::endl;
  }
  std::for_each(data.begin(), data.end(), print);
  std::cout << "\n";

  return 0;
}

Exercises#

Always try to use the STL, and also use functions to implement the following:

  1. Compute the norm of a vector.

  2. Implement the Gauss 7 point method, storing the weights and the coordinates points in a vector.

  3. Compute the dot product between two vectors. Use std::inner_product.

  4. Assume that a vector represents the coefficients of a n-degree polynomial. Compute the derivative of the polynomial and save the coefficients in another vector.

        data = {1 , 3, 4.5}; // 1*x^0 + 3*x^1 + 4.5*x^2
        newdata = {3, 9.0}; // 0 + 3*x^0 + 2*4.5*x^1
    
  5. Compute the min of a vector.

  6. Compute the max of a vector.

  7. Compute the argmin (index of the min) of a vector

  8. Compute the argmax (index of the max) of a vector

  9. Compute the pnorm

    (17)#\[\begin{equation} \left(\sum_{i} x_{i}^{p}\right)^{1/p} \ . \end{equation}\]
  10. Read the contents of a large file into a vector. Compute and print the mean, the median, the 25, 50, 75 percentiles, the min value, and the max value.

  11. Use the standard algorithm sort to sort a data array.

  12. Fill a vector with random numbers

        // llenar un vector con numeros aleatorios
        
        #include <iostream>
        #include <string>
        #include <vector>
        #include "random_vector.h"
        
        int main(int argc, char **argv) {
        
          // read variables
          if (argc != 2) {
            std::cerr << "Usage: " << argv[0] << " N" << std::endl;
            std::cerr << "N: size of the vector" << std::endl;
            return 1;
          }
          const int N = std::stoi(argv[1]);
        
          // set the vector
          std::vector<double> data;
          data.resize(N);
        
          // fill the vector
          fill_randomly(data);
        
          // print the vector
          std::string fname = "data.txt";
          print(data, fname);
        
          return 0;
        }
    
  13. Create a vector of ints, and then use std::shuffle to randomly shuffle it.

  14. Use a vector to compute the counter histogram for some data read from a file.

  15. Histogram from random numbers, saving to file. Write a program that generates N random numbers , with the weibull distribution and a given seed. Both N, the seed, and the weibull parameters must read from the command line. Also, you must compute the histogram and print it to a file.