Map data format

After map making, it is more convenient to think of the data explicitly as a vector with associated matrices. The primary module for storing, reading and writing data of this kind is core.algebra, which provides vector and matrix interfaces tailored to the needs of intensity mapping analysis.

core.algebra

At the heart of this module is the need to organize multidimensional data (such as a map, which has three axes: ra, dec and frequency) as a 1D vector for linear algebra operations, without losing the multidimensional organization of the data. Keeping that organization also brings efficiencies: for example, some matrices are block diagonal because they do not couple different frequency bins, and we would like to exploit this. To address these needs, the module provides classes that store data as numpy ndarrays but know how to reorganize it into matrix or vector form. Some basic matrix operations are also provided.
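The following numpy-only sketch illustrates both points: a 3-axis map viewed as a 1D vector and reshaped back, and a matrix that is block diagonal in frequency stored as just its diagonal blocks. It does not use core.algebra; the array sizes and the (freq, ra, dec) axis order are assumptions made for the illustration.

```python
import numpy as np

# Illustrative sizes; the (freq, ra, dec) axis order is an assumption of this sketch.
n_freq, n_ra, n_dec = 4, 5, 3
n_pix = n_ra * n_dec

rng = np.random.default_rng(0)
sky = rng.standard_normal((n_freq, n_ra, n_dec))

# View the map as a 1D vector for linear algebra; reshaping back recovers the axes.
vec = sky.reshape(-1)              # length n_freq * n_pix
assert np.shares_memory(vec, sky)  # no copy is made for a contiguous array
restored = vec.reshape(n_freq, n_ra, n_dec)

# A matrix that does not couple frequencies is block diagonal: store only the
# n_freq diagonal blocks (each n_pix x n_pix) instead of the full square matrix.
blocks = rng.standard_normal((n_freq, n_pix, n_pix))

# Apply it block by block: result[f] = blocks[f] @ (map at frequency f, flattened).
result_blocks = np.einsum("fpq,fq->fp", blocks, sky.reshape(n_freq, n_pix))

# Same answer as the full (n_freq*n_pix) x (n_freq*n_pix) dense matrix.
dense = np.zeros((n_freq * n_pix, n_freq * n_pix))
for f in range(n_freq):
    s = slice(f * n_pix, (f + 1) * n_pix)
    dense[s, s] = blocks[f]
result_dense = dense @ vec

print(np.allclose(result_blocks.reshape(-1), result_dense))  # True
print(blocks.size, dense.size)  # 900 3600
```

The blocked form stores a quarter of the numbers here, and the saving grows linearly with the number of frequency bins.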

The module also fully supports writing data to and reading it from disk in a standard, portable format that can easily be read from other languages; this format is based on numpy's npy format. Memory mapping, the ability to manipulate an array stored on disk as if it were in memory, is fully supported as well. This will be very important as data sets become too large to hold in memory and when operations need to be performed in parallel using ScaLAPACK.
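A minimal sketch of the on-disk side, using only numpy's own npy and memmap facilities (np.save, np.load with mmap_mode, and np.lib.format.open_memmap). How core.algebra stores the info metadata alongside the array is not shown; the file names are arbitrary.

```python
import os
import tempfile

import numpy as np

# Write an array in numpy's portable .npy format (a small header plus raw data).
path = os.path.join(tempfile.mkdtemp(), "map.npy")
data = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
np.save(path, data)

# Memory-map the file instead of reading it all into memory.
mm = np.load(path, mmap_mode="r+")
print(type(mm).__name__, mm.shape)  # memmap (2, 3, 4)

# Modify the on-disk array in place as if it were in memory, then flush to disk.
mm[0, 0, :] = -1.0
mm.flush()

# Reloading the file shows that the change was written.
reloaded = np.load(path)
print(reloaded[0, 0])  # [-1. -1. -1. -1.]

# A new on-disk .npy array can also be created directly, without building it in memory.
big = np.lib.format.open_memmap(
    os.path.join(tempfile.mkdtemp(), "big.npy"),
    mode="w+",
    dtype=np.float64,
    shape=(100, 100),
)
big[:] = 0.0
big.flush()
```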

Matrix and Vector classes

The highest-level objects defined in this module are the mat and the vect. Each comes in two flavours depending on whether the data is stored in memory (a numpy array) or on disk (a numpy memmap): mat_array, mat_memmap, vect_array and vect_memmap. To first order, these objects are just their numpy counterparts with an added info attribute (mat.info). info is a python dictionary that holds extra metadata; it is stored as a dictionary to make writing it to disk easy. For the most part you never need to (and never should) access the info dictionary yourself. Everything you need from it should be available through aliased attributes; for instance, mat.info['axes'] can be retrieved and set through mat.axes.
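To make the "numpy array plus info dictionary" idea concrete, here is a hypothetical ndarray subclass, InfoArray, built with numpy's standard subclassing hooks (__new__ and __array_finalize__). It sketches the pattern only; it is not core.algebra's mat_array or vect_array, and only the axes alias is shown.

```python
import numpy as np


class InfoArray(np.ndarray):
    """Hypothetical ndarray subclass carrying an `info` metadata dict.

    Not core.algebra's implementation: it only illustrates keeping metadata
    in a plain dict and exposing it through aliased attributes.
    """

    def __new__(cls, input_array, axes=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = {"axes": tuple(axes) if axes is not None else None}
        return obj

    def __array_finalize__(self, obj):
        # Runs for views and slices: give the new array a copy of the parent's info.
        if obj is None:
            return
        self.info = dict(getattr(obj, "info", {"axes": None}))

    @property
    def axes(self):
        # Alias for self.info["axes"], so callers never touch the dict directly.
        return self.info["axes"]

    @axes.setter
    def axes(self, value):
        self.info["axes"] = tuple(value)


a = InfoArray(np.zeros((2, 3)), axes=("ra", "dec"))
print(a.axes)      # ('ra', 'dec')
a.axes = ("x", "y")
print(a.info)      # {'axes': ('x', 'y')}
print(a[:1].axes)  # ('x', 'y'): slices keep the metadata
```

Keeping the metadata in one plain dict means it can be serialized in a single step, while the property keeps user code independent of the dict's keys.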

mats and vects are multidimensional arrays in which each dimension has a name; the names are found in the .axes attribute (a tuple of strings, or None). The metadata tells us how to sort the multidimensional data into 1D vectors and 2D matrices. For vects this is simple: the data is just flattened to 1D in numpy's default (row-major) order. For mats it is considerably more complicated. mats have attributes 'rows' and 'cols', which are tuples of integers (dimension indices) telling us whether a given dimension varies along the rows or along the columns of the 2D matrix. Every dimension must appear in at least one of rows or cols. A dimension may appear in both, which means the matrix is block diagonal over that dimension. For the time being, the dimensions must be ordered with the diagonal dimensions (those in both rows and cols) leftmost, followed by the row-only dimensions, with the column-only dimensions rightmost. There is no reason this restriction cannot be lifted in the future; someone just has to implement it.
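The mapping from a multidimensional index to a position in the 2D matrix can be sketched with np.ravel_multi_index. The helper matrix_position below is hypothetical (not part of core.algebra); it assumes the ordering rule just described and row-major flattening over the row and column dimensions separately.

```python
import numpy as np


def matrix_position(index, shape, rows, cols):
    """Return (row, col, matrix_shape) for element `index` of a mat-style array.

    Hypothetical helper for illustration, not part of core.algebra. It assumes
    the ordering rule: diagonal dimensions leftmost, then row-only dimensions,
    then column-only dimensions.
    """
    rows, cols = sorted(rows), sorted(cols)
    diag = [d for d in rows if d in cols]
    row_only = [d for d in rows if d not in cols]
    col_only = [d for d in cols if d not in rows]
    # Also catches dimensions that appear in neither rows nor cols.
    if diag + row_only + col_only != list(range(len(shape))):
        raise ValueError("dimensions must be ordered: diagonal, row-only, column-only")
    row_dims = tuple(shape[d] for d in rows)
    col_dims = tuple(shape[d] for d in cols)
    # Each flat index is a row-major (C-order) ravel over the relevant dimensions.
    row = np.ravel_multi_index(tuple(index[d] for d in rows), row_dims)
    col = np.ravel_multi_index(tuple(index[d] for d in cols), col_dims)
    return int(row), int(col), (int(np.prod(row_dims)), int(np.prod(col_dims)))


# Dim 0 (e.g. frequency) is block diagonal; dim 1 varies over rows, dim 2 over columns.
print(matrix_position((1, 2, 3), (2, 3, 4), rows=(0, 1), cols=(0, 2)))
# (5, 7, (6, 8))
```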

This is best illustrated with an example. Let's say you have a mat_array object called Mat with the following attributes:

Mat.shape (attribute inherited from numpy arrays) = (h, i, j, k, m, n), where all these letters are integers.
Mat.rows = (0, 1, 2)
Mat.cols = (0, 1, 3, 4, 5)
Mat.axes = ('a', 'b', 'c', 'd', 'e', 'f')

Mat is an $(hij) \times (hikmn)$ matrix. If we subscript Mat[3,5,2,7,6,1], we get the element of the matrix at row $3ij + 5j + 2$ and column $3ikmn + 5kmn + 7mn + 6n + 1$. The matrix is block diagonal, with one block of size $j \times kmn$ for each of the $hi$ combinations of axes 'a' and 'b'.
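The worked example can be checked numerically by expanding the stored array into its full dense matrix. The helper to_dense is hypothetical (not a core.algebra function), and the values chosen for h, i, j, k, m, n are arbitrary, just large enough for the index (3, 5, 2, 7, 6, 1) to be valid.

```python
import numpy as np


def to_dense(data, rows, cols):
    """Expand a mat-style array into its full 2D matrix (hypothetical helper).

    Assumes the ordering rule: diagonal dimensions leftmost, then row-only
    dimensions, then column-only dimensions.
    """
    diag = [d for d in rows if d in cols]
    n_diag = int(np.prod([data.shape[d] for d in diag]))
    n_row = int(np.prod([data.shape[d] for d in rows if d not in cols]))
    n_col = int(np.prod([data.shape[d] for d in cols if d not in rows]))
    # One (n_row x n_col) block per combination of the diagonal indices.
    blocks = data.reshape(n_diag, n_row, n_col)
    dense = np.zeros((n_diag * n_row, n_diag * n_col), dtype=data.dtype)
    for b in range(n_diag):
        dense[b * n_row:(b + 1) * n_row, b * n_col:(b + 1) * n_col] = blocks[b]
    return dense


# The worked example, with small illustrative sizes.
h, i, j, k, m, n = 4, 6, 3, 8, 7, 2
mat = np.random.default_rng(1).standard_normal((h, i, j, k, m, n))
dense = to_dense(mat, rows=(0, 1, 2), cols=(0, 1, 3, 4, 5))

print(dense.shape == (h * i * j, h * i * k * m * n))  # True
row = 3 * i * j + 5 * j + 2
col = 3 * i * k * m * n + 5 * k * m * n + 7 * m * n + 6 * n + 1
print(dense[row, col] == mat[3, 5, 2, 7, 6, 1])  # True
```

The off-diagonal blocks of dense are all zero; they are never stored in mat itself, which is where the saving comes from.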

Eventually, I intend to add support for reordering the axes, so that, for instance, the slowest-varying row axis of a matrix can be swapped with the fastest-varying one. This would correspond to rotating the matrix into a particular frame.
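Since axis reordering is described as future work, the sketch below uses plain numpy rather than any core.algebra interface. It illustrates the underlying idea: swapping two row axes of a matrix (np.transpose on the multidimensional array) amounts to multiplying its 2D form by a permutation matrix P, which is orthogonal. The array sizes are arbitrary.

```python
import numpy as np

# Illustrative matrix with two row axes (a, b) and one column axis (x).
na, nb, nx = 2, 3, 4
M = np.arange(na * nb * nx, dtype=float).reshape(na, nb, nx)

# Swap the two row axes so that b becomes the slowest-varying row axis.
M_swapped = np.transpose(M, (1, 0, 2))

# In 2D form the swap only reorders the matrix rows: M2_swapped = P @ M2.
M2 = M.reshape(na * nb, nx)
M2_swapped = M_swapped.reshape(nb * na, nx)
# Row r = b*na + a of the swapped matrix is row a*nb + b of the original.
perm = np.arange(na * nb).reshape(na, nb).T.reshape(-1)
P = np.eye(na * nb)[perm]

print(np.array_equal(M2_swapped, P @ M2))     # True
print(np.allclose(P @ P.T, np.eye(na * nb)))  # True: P is orthogonal
```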

Here is an example of how the named axes are useful. Consider a matrix 'M' with three row dimensions named ('a', 'b', 'c') and two column dimensions named ('x', 'y'). Now consider a vector 'V' with two dimensions named ('x', 'y'), where axis 'x' has the same length as axis 'x' of M, and likewise for 'y'. What should the structure of the vector W = M*V be? Clearly W should have three dimensions, named ('a', 'b', 'c'). This is implemented in the 'dot' function of this module.
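A hypothetical sketch of the idea, not the module's actual dot function: a helper named_dot that checks the axis names and contracts the column axes of M against V with np.tensordot. It assumes M has no block-diagonal dimensions. The last lines check that the result matches the ordinary 2D matrix-vector product.

```python
import numpy as np


def named_dot(mat, mat_axes, mat_cols, vec, vec_axes):
    """Contract a matrix over its column axes with a vector (hypothetical sketch).

    Assumes no block-diagonal dimensions: every axis of `mat` is either a row
    axis or a column axis, and the column axes match `vec_axes` by name.
    Returns the result array and the names of its axes (the row axes).
    """
    col_names = tuple(mat_axes[d] for d in mat_cols)
    if col_names != tuple(vec_axes):
        raise ValueError(f"column axes {col_names} do not match vector axes {vec_axes}")
    # tensordot also raises if the matched axes have different lengths.
    result = np.tensordot(mat, vec, axes=(list(mat_cols), list(range(vec.ndim))))
    row_names = tuple(name for d, name in enumerate(mat_axes) if d not in mat_cols)
    return result, row_names


na, nb, nc, nx, ny = 2, 3, 4, 5, 6
rng = np.random.default_rng(2)
M = rng.standard_normal((na, nb, nc, nx, ny))
V = rng.standard_normal((nx, ny))

W, W_axes = named_dot(M, ("a", "b", "c", "x", "y"), (3, 4), V, ("x", "y"))
print(W.shape, W_axes)  # (2, 3, 4) ('a', 'b', 'c')

# Same numbers as the ordinary 2D matrix-vector product, reshaped back.
flat = M.reshape(na * nb * nc, nx * ny) @ V.reshape(-1)
print(np.allclose(W, flat.reshape(na, nb, nc)))  # True
```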