Modern arrays

Modern arrays#

Arrays are useful to store large sets of data that share the same data type. Arrays are very fast to access and modify data, and are useful in several cases, like

Reading and storing data.
Applying the same operation to a large collection of data.
Store data in a systematic way instead of having many variables.
Model vectors, matrices, tensors.
Model vectorial and matrix operations.
Solve large linear problems, linear algebra problems, compute eigen values and eigen vectors.
Solve ODE systems.
Apply finite differences and finite elements.
Vectorize operations and use HPC.
Model data structs, from physical systems to graphical structs and so on.
Manipulate memory contents and program close to the metal.
…

There are many more data structs or containers: lists, dictionaries(maps), queue, binary trees, etc (see https://www.cs.usfca.edu/~galles/visualization/Algorithms.html). Arrays, in particular, are very fast to access and retrieve data from memory. But they are not fast if you need to insert and delete data in the middle. In general, for scientific computing, arrays are the default data struct, but you need always to consider which one would be the best for your case.

Since c++11 it is possible to use modern data structs that are equivalent to the primitive data arrays but give a better user interface, also handling automatically the memory. Although there are two analog structures, std::array and std::vector (and in the future we will use std::valarray), we will focus mainly on the std::vector since it handles dynamic memory (std::array is for static arrays). The key here is that these data structures create a continuous block memory block that can be accessed very fast: https://hackingcpp.com/cpp/std/sequence_containers.html

From: "https://hackingcpp.com/cpp/std/vector.png"

The following example shows the basic features for both of them and also a new kind of for. It is recommended to run this on a graphical debugger , either locally in emacs or visual code, or online on the python tutor or onlinegdb, at the link https://onlinegdb.com/CKeYkLEiz .

#include <iostream>
#include <array>
#include <vector>

int main(void)
{
  std::array<double, 10> data1;
  data1[5] = 9.87;
  data1[0] = -7.8765;
  data1[9] = 1000.343;
  std::cout << data1[5] << "\n";
  for (auto val : data1) {
    std::cout << val << "\n";
  }

  std::vector<double> data2;
  data2.resize(4); // size = 4, capacity 4
  data2[3] = 9.87;
  data2[0] = -7.8765;
  data2[1] = 1000.343;
  std::cout << data2[2] << std::endl;
  for (auto & val : data2) { // reference
    val *= 2; // modify the value
    std::cout << val << "\n";
  }
  data2.resize(5); // size = 5, capacity = 8 (double the initial, to avoid frequent mallocs)
  data2.push_back(0.987); // size = 6, capacity = 8
  data2.push_back(12343.987); // size = 7, capacity = 8
  data2.push_back(-0.000987); // size = 8, capacity = 8
  data2.push_back(3.986544); // size = 9, capacity = 16

 return 0;
}

As you can see, a vector is a dynamc array of homogeneous contigous elements.

Initial exercises#

Find the maximum size you can use for a std::array and a std::vector . Are they the same?
Print the address of the first and second elements in a std::array and std::vector. Are they contigous? To print the memory address of the first element use the & operator as &data[0]
Use the algorithm for_each with a lambda function to divide by two each element in an array. Use also for each to print the elements and their memory address.

Using `std::array` : Fixed size, goes to the stack, automatic for#

// g++ -g -fsanitize=address
#include <iostream>
#include <array>

int main(void)
{
  const int N = 10;
  //double data[N] {0};
  std::array<double, N> data;

  std::cout << "size: " << data.size() << std::endl;
  std::cout << "max_size: " << data.max_size() << std::endl;
  std::cout << "&data[0]: " << &data[0] << std::endl;
  std::cout << "&data[1]: " << &data[1] << std::endl;

  //std::cout << data[-1] << std::endl; // detected by sanitizers address

// initialize the array with even numbers
  for(int ii = 0; ii < N; ++ii) {
    data[ii] = 2*ii;
  }


// print the array
  for(const auto & val : data) {
    std::cout << val << "  ";
  }
  std::cout << "\n";    

// TODO: compute the mean
  std::cout << "Average = " << sum/data.size() << std::endl;

  return 0;
}

You might visualize this in the python/c++ tutor.

And we can simplified the code even more if we use the standard library:

// g++ -g -fsanitize=address
#include <numeric>
#include <algorithm>
#include <iostream>
#include <array>

int main(void)
{
  const int N = 10;
  //double data[N] {0};
  std::array<double, N> data;

  std::cout << "size: " << data.size() << std::endl;
  std::cout << "max_size: " << data.max_size() << std::endl;
  std::cout << "&data[0]: " << &data[0] << std::endl;
  std::cout << "&data[1]: " << &data[1] << std::endl;

  //std::cout << data[-1] << std::endl; // detected by sanitizers address

// initialize the array with even numbers
  int ii = 0 ;
  auto init  = [&ii](double & x){ x = 2*ii; ii++; };
  std::for_each(data.begin(), data.end(), init);

// print the array
  auto print = [](const int & x) { std::cout << x << "  " ; };
  std::for_each(data.begin(), data.end(), print);
  std::cout << "\n";

// compute mean
  double avg = std::accumulate(data.begin(), data.end(), 0.0)/data.size();
  std::cout << "Average = " << avg << std::endl;
}

Exercises#

Find the largest std::array size you can get in your system. Is it related to the ram size? check the command
```
ulimit -s
```
Write a program that computes the dot product between two std::arrays. You can use either indexes or the inner_product or transform_reduce algorithms.
Can you read the std::array size from the command line using argv? try it.

Using `std::vector` : Dynamic size, uses the heap#

Vectors allow to use dynamic size arrays very easily and are the recommend data structure for our current math vector and matrix applications. You can see more info here:

To declare a vector of double, you just use

...
std::vector<double> data1(1000); // 1000 doubles
std::vector<double> data2; // no size yet
data2.resize(1200); // reserve space to store 1200 double
std::vector<double> data3 {1, 2, 3}; // 3 ints, type deduced

Exploration : Declaration and operation on a vector#

Declare a vector of double whose size is read from the command line as an argument. Fill it as

(26)#\[\begin{equation} x[i] = i \end{equation}\]

Compute the average.

Exploration: Vectors and functions#

Same as before, but write a function to fill the vector and another to compute the vector average. Print the average in scientific notation. Test it with several sizes.

Exploration: Vectors and lambda functions#

Same as before, but use a lambda function to fill the vector.

Exploration: Vectors and `std` algorithms#

Same as before, but use lambda functions and standar algorithms to do the same

// g++ -g -fsanitize=address
#include <algorithm>
#include <numeric>
#include <iostream>
#include <vector>

int main(int argc, char **argv)
{
// TODO
  return 0;
    
}

A short discussion about primitive arrays#

Primitive arrays in C++ are fixed-size collections of elements of the same type stored in contiguous memory locations. They can be declared as

// Declaration and initialization
int numbers[5] = {1, 2, 3, 4, 5};
char name[6] = {'C', 'l', 'a', 'u', 'd', 'e'};

// Access elements using index
int third_number = numbers[2]; // Gets 3 (zero-indexed)

Characteristics of Primitive Arrays:

Fixed size: Size must be known at compile time
Stack allocation: Usually stored on the stack (unless global)
No bounds checking: C++ doesn’t prevent accessing invalid indices
No built-in size information: Size must be tracked separately
Cannot be returned directly from functions (decays to pointer)

Exercise Declare a primitive array of doubles of large size. What is the maximum size that you can get?

Primitive arrays with dynamic memory#

Dynamic arrays allow you to determine array size at runtime using pointers and heap memory allocation. See https://hackingcpp.com/cpp/lang/memory_basics.html. In this case, you must use a pointer, which is a variable that stores a memory address, and then ask the OS for memory (and release it), all manually.

// Allocate memory for 10 integers
int * dynamic_array = new int[10];
// todo: check if memory was actually allocated

// Use the array
dynamic_array[0] = 42;
dynamic_array[1] = 73;

// IMPORTANT: Must deallocate manually
delete[] dynamic_array;

Characteristics of Dynamic Arrays:

Runtime size determination: Size can be decided during execution
Heap allocation: Memory is allocated on the heap
Manual memory management: You must remember to delete[]
Raw pointers: No built-in bounds checking or size information
Potential memory leaks: If not properly deallocated

Comparison with `std::vector`#

std::vector from the C++ Standard Library provides a more robust, safe, and flexible alternative.

#include <vector>

// Create a vector
std::vector<int> numbers = {1, 2, 3, 4, 5};

// Add elements
numbers.push_back(6);

// Get size
size_t size = numbers.size(); // Returns 6

// Access with bounds checking
int safe_access = numbers.at(2); // Throws exception for invalid index

Advantages of std::vector:

Dynamic sizing: Automatically resizes as elements are added
Automatic memory management: No manual deallocation required
Bounds checking: Option to use .at() method for safer access
Rich functionality: Includes methods for insertion, deletion, searching: push_back, pop_back, insert, erase, clear, size, empty, and more.
Compatible with algorithms: Works with the entire C++ Standard Library
Contiguous storage: Same performance characteristics as arrays
Size information: Built-in tracking of current element count
Iterator support: Can be used with standard iterators

Almost always prefer to use std::vector.

EXERCISE: Comparing primitive arrays, dynamics memory and std::vector Please write two programs. The first one, using a primitive array and dynamic memory, that starts with 5 elemments like {2, 3, 4, 5, 6}; then, print the memory address of the first element (&data[0]), and then append the number 10 at the end. Print the final array. Now write the second program doing the same but using a std::vector.

Exercises#

Always try to use the STL, and also use functions to implement the following:

Compute the norm of a vector.
Implement the Gauss 7 point method, storing the weights and the coordinates points in a vector.
Compute the dot product between two vectors. Use std::inner_product.
Assume that a vector represents the coefficients of a n-degree polynomial. Compute the derivative of the polynomial and save the coefficients in another vector.
```
    data = {1 , 3, 4.5}; // 1*x^0 + 3*x^1 + 4.5*x^2
    newdata = {3, 9.0}; // 0 + 3*x^0 + 2*4.5*x^1
```
Compute the min of a vector.
Compute the max of a vector.
Compute the argmin (index of the min) of a vector
Compute the argmax (index of the max) of a vector
Compute the pnorm

(27)#\[\begin{equation} \left(\sum_{i} x_{i}^{p}\right)^{1/p} \ . \end{equation}\]
Read the contents of a large file into a vector. Compute and print the mean, the median, the 25, 50, 75 percentiles, the min value, and the max value.
Use the standard algorithm sort to sort a data array.

Fill a vector with random numbers

    // llenar un vector con numeros aleatorios
    
    #include <iostream>
    #include <string>
    #include <vector>
    #include "random_vector.h"
    
    int main(int argc, char **argv) {
    
      // read variables
      if (argc != 2) {
        std::cerr << "Usage: " << argv[0] << " N" << std::endl;
        std::cerr << "N: size of the vector" << std::endl;
        return 1;
      }
      const int N = std::stoi(argv[1]);
    
      // set the vector
      std::vector<double> data;
      data.resize(N);
    
      // fill the vector
      fill_randomly(data);
    
      // print the vector
      std::string fname = "data.txt";
      print(data, fname);
    
      return 0;
    }

Create a vector of ints, and then use std::shuffle to randomly shuffle it.
Use a vector to compute the counter histogram for some data read from a file.
Histogram from random numbers, saving to file. Write a program that generates N random numbers, with the Weibull distribution and a given seed. Both N, the seed, and the Weibull parameters must read from the command line. Also, you must compute the histogram and print it to a file.

Modern arrays

Contents

Modern arrays#

Initial exercises#

Using std::array : Fixed size, goes to the stack, automatic for#

Exercises#

Using std::vector : Dynamic size, uses the heap#

Exploration : Declaration and operation on a vector#

Exploration: Vectors and functions#

Exploration: Vectors and lambda functions#

Exploration: Vectors and std algorithms#

A short discussion about primitive arrays#

Primitive arrays with dynamic memory#

Comparison with std::vector#

Exercises#

Using `std::array` : Fixed size, goes to the stack, automatic for#

Using `std::vector` : Dynamic size, uses the heap#

Exploration: Vectors and `std` algorithms#

Comparison with `std::vector`#