Modern arrays#
Arrays are useful to store large sets of data that share the same data type. Arrays are very fast to access and modify data, and are useful in several cases, like
Reading and storing data.
Applying the same operation to a large collection of data.
Store data in a systematic way instead of having many variables.
Model vectors, matrices, tensors.
Model vectorial and matrix operations.
Solve large linear problems, linear algebra problems, compute eigen values and eigen vectors.
Solve ODE systems.
Apply finite differences and finite elements.
Vectorize operations and use HPC.
Model data structs, from physical systems to graphical structs and so on.
Manipulate memory contents and program close to the metal.
…
There are many more data structs or containers: lists, dictionaries(maps), queue, binary trees, etc (see https://www.cs.usfca.edu/~galles/visualization/Algorithms.html). Arrays, in particular, are very fast to access and retrieve data from memory. But they are not fast if you need to insert and delete data in the middle. In general, for scientific computing, arrays are the default data struct, but you need always to consider which one would be the best for your case.
Since c++11 it is possible to use modern data structs that are equivalent to the
primitive data arrays but give a better user interface, also handling
automatically the memory. Although there are two analog structures, std::array
and std::vector (and in the future we will use std::valarray), we will focus
mainly on the std::vector
since it handles dynamic memory (std::array
is for
static arrays). The key here is that these data structures create a continuous
block memory block that can be accessed very fast: https://hackingcpp.com/cpp/std/sequence_containers.html
The following example shows the basic features for both of them and also a new
kind of for
. It is recommended to run this on a graphical debugger , either
locally in emacs or visual code, or online on the python tutor or onlinegdb, at
the link https://onlinegdb.com/CKeYkLEiz .
#include <iostream>
#include <array>
#include <vector>
int main(void)
{
std::array<double, 10> data1;
data1[5] = 9.87;
data1[0] = -7.8765;
data1[9] = 1000.343;
std::cout << data1[5] << "\n";
for (auto val : data1) {
std::cout << val << "\n";
}
std::vector<double> data2;
data2.resize(4); // size = 4, capacity 4
data2[3] = 9.87;
data2[0] = -7.8765;
data2[1] = 1000.343;
std::cout << data2[2] << std::endl;
for (auto & val : data2) { // reference
val *= 2; // modify the value
std::cout << val << "\n";
}
data2.resize(5); // size = 5, capacity = 8 (double the initial, to avoid frequent mallocs)
data2.push_back(0.987); // size = 6, capacity = 8
data2.push_back(12343.987); // size = 7, capacity = 8
data2.push_back(-0.000987); // size = 8, capacity = 8
data2.push_back(3.986544); // size = 9, capacity = 16
return 0;
}
As you can see, a vector is a dynamc array of homogeneous contigous elements.
Initial exercises#
Find the maximum size you can use for a
std::array
and astd::vector
. Are they the same?Print the address of the first and second elements in a
std::array
andstd::vector
. Are they contigous? To print the memory address of the first element use the&
operator as&data[0]
Use the algorithm foreach with a lambda function to divide by two each element in an array. Use also for each to print the elements and their memory address.
Using std::array
: Fixed size, goes to the stack, automatic for#
// g++ -g -fsanitize=address
#include <iostream>
#include <array>
int main(void)
{
const int N = 10;
//double data[N] {0};
std::array<double, N> data;
std::cout << "size: " << data.size() << std::endl;
std::cout << "max_size: " << data.max_size() << std::endl;
std::cout << "&data[0]: " << &data[0] << std::endl;
std::cout << "&data[1]: " << &data[1] << std::endl;
//std::cout << data[-1] << std::endl; // detected by sanitizers address
// initialize the array with even numbers
for(int ii = 0; ii < N; ++ii) {
data[ii] = 2*ii;
}
// print the array
for(const auto & val : data) {
std::cout << val << " ";
}
std::cout << "\n";
// compute the mean
double sum = 0.0;
for(auto val : data) {
sum += val;
}
std::cout << "Average = " << sum/data.size() << std::endl;
return 0;
}
You might visualize this in the python/c++ tutor.
And we can simplified the code even more if we use the standard library:
// g++ -g -fsanitize=address
#include <numeric>
#include <algorithm>
#include <iostream>
#include <array>
int main(void)
{
const int N = 10;
//double data[N] {0};
std::array<double, N> data;
std::cout << "size: " << data.size() << std::endl;
std::cout << "max_size: " << data.max_size() << std::endl;
std::cout << "&data[0]: " << &data[0] << std::endl;
std::cout << "&data[1]: " << &data[1] << std::endl;
//std::cout << data[-1] << std::endl; // detected by sanitizers address
// initialize the array with even numbers
int ii = 0 ;
auto init = [&ii](double & x){ x = 2*ii; ii++; };
std::for_each(data.begin(), data.end(), init);
// print the array
auto print = [](const int & x) { std::cout << x << " " ; };
std::for_each(data.begin(), data.end(), print);
std::cout << "\n";
// compute mean
double avg = std::accumulate(data.begin(), data.end(), 0.0)/data.size();
std::cout << "Average = " << avg << std::endl;
}
Exercises#
Find the largest
std::array
size you can get in your system. Is it related to the ram size? check the commandulimit -s
Write a program that computes the dot product between two
std::arrays
. You can use either indexes or theinner_product
ortransform_reduce
algorithms.Can you read the
std::array
size from the command line usingargv
? try it.
Using std::vector
: Dynamic size, uses the heap#
Vectors allow to use dynamic size arrays very easily and are the recommend data structure for our current math vector and matrix applications. You can see more info here:
https://hackingcpp.com/cpp/std/sequence_containers.html#vector
https://www.learncpp.com/cpp-tutorial/an-introduction-to-stdvector/
https://www.learncpp.com/cpp-tutorial/stdvector-capacity-and-stack-behavior/
For now, let’s just declare a vector
// g++ -g -fsanitize=address
#include <algorithm>
#include <numeric>
#include <iostream>
#include <vector>
int main(int argc, char **argv)
{
const int N = std::atoi(argv[1]);
std::vector<double> data;
data.reserve(5); //affects capacity, can store 15 numbers but its size is still 0
data.resize(N); // real size, capacity >= N
std::cout << "size: " << data.size() << std::endl;
std::cout << "capacity: " << data.capacity() << std::endl;
std::cout << "max_size: " << data.max_size() << std::endl;
std::cout << "&data[0]: " << &data[0] << std::endl;
std::cout << "&data[1]: " << &data[1] << std::endl;
//std::cout << data[-1] << std::endl; // detected by sanitizers address
// initialize the array with even numbers
int ii = 0;
auto init = [&ii](double & x){ x = 2*ii; ii++; };
std::for_each(data.begin(), data.end(), init);
// print the array
auto print = [](const double & x){ std::cout << x << " ";};
std::for_each(data.begin(), data.end(), print);
std::cout << "\n";
// compute mean
std::cout << "Average: " << std::accumulate(data.begin(), data.end(), 0.0)/data.size() << "\n";
// Add more data!
std::cout << "size: " << data.size() << ", capacity: " << data.capacity() << std::endl;
for (auto x : {200.0987, 300.987, 400.09754}){
data.push_back(x);
std::cout << "size: " << data.size() << ", capacity: " << data.capacity() << std::endl;
}
std::for_each(data.begin(), data.end(), print);
std::cout << "\n";
return 0;
}
Exercises#
Always try to use the STL, and also use functions to implement the following:
Compute the norm of a vector.
Implement the Gauss 7 point method, storing the weights and the coordinates points in a vector.
Compute the dot product between two vectors. Use
std::inner_product
.Assume that a vector represents the coefficients of a n-degree polynomial. Compute the derivative of the polynomial and save the coefficients in another vector.
data = {1 , 3, 4.5}; // 1*x^0 + 3*x^1 + 4.5*x^2 newdata = {3, 9.0}; // 0 + 3*x^0 + 2*4.5*x^1
Compute the min of a vector.
Compute the max of a vector.
Compute the argmin (index of the min) of a vector
Compute the argmax (index of the max) of a vector
Compute the pnorm
(17)#\[\begin{equation} \left(\sum_{i} x_{i}^{p}\right)^{1/p} \ . \end{equation}\]Read the contents of a large file into a vector. Compute and print the mean, the median, the 25, 50, 75 percentiles, the min value, and the max value.
Use the standard algorithm sort to sort a data array.
Fill a vector with random numbers
// llenar un vector con numeros aleatorios #include <iostream> #include <string> #include <vector> #include "random_vector.h" int main(int argc, char **argv) { // read variables if (argc != 2) { std::cerr << "Usage: " << argv[0] << " N" << std::endl; std::cerr << "N: size of the vector" << std::endl; return 1; } const int N = std::stoi(argv[1]); // set the vector std::vector<double> data; data.resize(N); // fill the vector fill_randomly(data); // print the vector std::string fname = "data.txt"; print(data, fname); return 0; }
Create a vector of ints, and then use
std::shuffle
to randomly shuffle it.Use a vector to compute the counter histogram for some data read from a file.
Histogram from random numbers, saving to file. Write a program that generates N random numbers , with the weibull distribution and a given seed. Both N, the seed, and the weibull parameters must read from the command line. Also, you must compute the histogram and print it to a file.