Daniel edited this page May 18, 2017 · 3 revisions

Input Data Format

There are three components to the input data format:

  • A specification of the 3D grid used to sample the correlation function
  • A correlation function data vector specified on the grid
  • An estimate of the data vector covariance

The 3D grid is specified via runtime configuration options that are normally saved in a file but can also be provided on the command line (see also the custom-grid option described below). The data vector and covariance matrix are stored in files using one of the following extensions:

| Extension | Meaning |
|---|---|
| .data | Unweighted data vector (default) |
| .wdata | Inverse-covariance-weighted data vector (use --load-wdata option) |
| .cov | Covariance matrix (default) |
| .icov | Inverse covariance matrix (use --load-icov option) |

The data and covariance can be sparse, and can also be split into subsamples for resampling methods.

The grid used to sample the correlation function consists of 3 axes:

  • Separation along the line of sight, e.g., r∥, Δv, log(λ1/λ2).
  • Separation transverse to the line of sight, e.g., r⊥, μ, Δθ, multipole index.
  • Average pair distance, e.g., z, D(z).

The following combinations are currently supported via the data-format command-line/config option, but other conventions are easy to add:

| data-format | Axis 1 | Axis 2 | Axis 3 | Notes |
|---|---|---|---|---|
| comoving-cartesian | r(par) in Mpc/h | r(perp) in Mpc/h | redshift | use --axis(1,2,3)-bins options |
| comoving-polar | r in Mpc/h | mu = r(par)/r | redshift | use --axis(1,2,3)-bins options |
| comoving-multipole | r in Mpc/h | multipole ell | redshift | use --axis(1,2,3)-bins options |
| quasar | Δlog(λ) | Δθ in arcmins | redshift | use --axis(1,2,3)-bins options |
| cosmolib | Δlog(λ) | Δθ in arcmins | redshift | use cosmolib options (BOSS legacy mode) |
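The comoving-cartesian and comoving-polar conventions describe the same pair separation in different coordinates. Using only the definition mu = r(par)/r from the table above, a minimal Python sketch of the coordinate change looks like this (the function name `cartesian_to_polar` is hypothetical and not part of the package):

```python
import math


def cartesian_to_polar(r_par: float, r_perp: float) -> tuple[float, float]:
    """Hypothetical helper: convert (r_par, r_perp) in Mpc/h to (r, mu).

    r is the total separation and mu = r_par / r, matching the
    comoving-polar axis definitions.
    """
    r = math.hypot(r_par, r_perp)
    if r == 0.0:
        raise ValueError("mu is undefined at zero separation")
    return r, r_par / r


# A pair separated by 30 Mpc/h along and 40 Mpc/h across the line of sight.
r, mu = cartesian_to_polar(30.0, 40.0)
print(r, mu)  # 50.0 0.6
```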

The values along each axis do not need to be equally spaced and are specified via command-line or configuration-file options, for example:

# use 50 equally spaced bins for the first axis, covering 0-200
axis1-bins = [0:200]*50
# use the specified bin centers for the second axis
axis2-bins = {0.1,0.3,0.4,0.45,0.5,0.6,0.8}
# use a single bin for the third axis, with the specified bin center
axis3-bins = {2.35}

Read the documentation for createBinning in AbsBinning.h for details.
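To make the two binning forms in the example concrete, here is a hypothetical Python sketch that expands an axisN-bins value into a list of bin centers. It assumes that "[lo:hi]*n" means n equal-width bins spanning lo to hi with centers at the bin midpoints, and that "{a,b,...}" lists bin centers directly; createBinning in AbsBinning.h remains the authoritative parser, and `expand_bins` is an illustrative name only.

```python
import re

# Assumed syntax for the uniform form, e.g. "[0:200]*50".
_UNIFORM = re.compile(r"\[\s*([-+.\deE]+)\s*:\s*([-+.\deE]+)\s*\]\s*\*\s*(\d+)")
# Assumed syntax for the explicit form, e.g. "{0.1,0.3,0.4}".
_EXPLICIT = re.compile(r"\{(.*)\}")


def expand_bins(spec: str) -> list[float]:
    """Hypothetical expansion of an axisN-bins value into bin centers.

    ASSUMPTION: "[lo:hi]*n" gives n equal-width bins covering lo..hi with
    centers at the midpoints; "{a,b,...}" gives the centers explicitly.
    """
    spec = spec.strip()
    m = _UNIFORM.fullmatch(spec)
    if m:
        lo, hi, n = float(m.group(1)), float(m.group(2)), int(m.group(3))
        if n <= 0:
            raise ValueError("number of bins must be positive")
        width = (hi - lo) / n
        return [lo + (k + 0.5) * width for k in range(n)]
    m = _EXPLICIT.fullmatch(spec)
    if m:
        return [float(tok) for tok in m.group(1).split(",")]
    raise ValueError(f"unsupported binning spec: {spec!r}")


centers = expand_bins("[0:200]*50")
print(len(centers), centers[0], centers[-1])  # 50 2.0 198.0
print(expand_bins("{2.35}"))                   # [2.35]
```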

The grid must be rectangular but can be sparsely populated. Any 3D grid point (i1,i2,i3) is uniquely specified by its global index j:

0 <= i1 < N1 , 0 <= i2 < N2 , 0 <= i3 < N3
j = (i1*N2+i2)*N3+i3
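The formula above stores the grid with i3 varying fastest and i1 slowest. The following Python sketch implements the mapping and its inverse (the function names are illustrative, not part of the package); the example uses the grid from the configuration snippet above, which has N1 = 50, N2 = 7 and N3 = 1.

```python
def global_index(i1: int, i2: int, i3: int, n1: int, n2: int, n3: int) -> int:
    """Flatten a grid point (i1, i2, i3) into j = (i1*N2 + i2)*N3 + i3."""
    if not (0 <= i1 < n1 and 0 <= i2 < n2 and 0 <= i3 < n3):
        raise IndexError("grid point outside the rectangular grid")
    return (i1 * n2 + i2) * n3 + i3


def grid_point(j: int, n1: int, n2: int, n3: int) -> tuple[int, int, int]:
    """Invert the mapping: recover (i1, i2, i3) from the global index j."""
    if not 0 <= j < n1 * n2 * n3:
        raise IndexError("global index outside the rectangular grid")
    rest, i3 = divmod(j, n3)  # i3 is the fastest-varying index
    i1, i2 = divmod(rest, n2)
    return i1, i2, i3


j = global_index(10, 3, 0, 50, 7, 1)
print(j, grid_point(j, 50, 7, 1))  # 73 (10, 3, 0)
```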

Optionally, a 3D custom grid of non-uniform sampling points can be used via the custom-grid option. It is read from a text input file with the ".grid" extension whose columns are (global index, axis1 bin center, axis2 bin center, axis3 bin center):

j1 axis1bin(j1) axis2bin(j1) axis3bin(j1)
j2 axis1bin(j2) axis2bin(j2) axis3bin(j2)
j3 axis1bin(j3) axis2bin(j3) axis3bin(j3)
...

Entries can appear in any order and the custom grid does not need to be rectangular. A default rectangular grid defining the data format always has to be specified (but will not be used explicitly).
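A minimal Python sketch of reading a ".grid" file into a lookup table keyed by global index follows. It assumes whitespace-separated columns and skips blank lines; treating a repeated global index as an error is a choice made for this sketch, not documented behavior. `read_grid` is a hypothetical name.

```python
import io


def read_grid(stream) -> dict[int, tuple[float, float, float]]:
    """Hypothetical .grid reader: 'j axis1 axis2 axis3' per line (bin centers).

    Lines may appear in any order. ASSUMPTION: blank lines are skipped and a
    repeated global index is rejected.
    """
    grid = {}
    for lineno, line in enumerate(stream, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4:
            raise ValueError(f"line {lineno}: expected 4 columns, got {len(fields)}")
        j = int(fields[0])
        if j in grid:
            raise ValueError(f"line {lineno}: duplicate global index {j}")
        grid[j] = (float(fields[1]), float(fields[2]), float(fields[3]))
    return grid


grid = read_grid(io.StringIO("7 42.0 0.3 2.35\n2 10.0 0.1 2.35\n"))
print(sorted(grid))  # [2, 7]
print(grid[7])       # (42.0, 0.3, 2.35)
```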

The data vector is specified by a text input file using the ".data" extension and consisting of (global index, correlation estimate) pairs:

j1 xi(j1)
j2 xi(j2)
j3 xi(j3)
...

Entries can appear in any order and a missing entry implies that there is no information available (rather than zero correlation). Values of Cinv.xi (the inverse-covariance-weighted data vector) can be provided instead of xi; in this case the file should use the ".wdata" extension and the load-wdata option must be specified.
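Because a missing entry means "no information" rather than zero, a natural in-memory representation is a sparse mapping in which unmeasured bins are simply absent. The Python sketch below (with the hypothetical name `read_data`) reads ".data" or ".wdata" pairs this way; rejecting duplicate indices is an assumption of the sketch.

```python
import io


def read_data(stream) -> dict[int, float]:
    """Hypothetical reader for 'j value' pairs from a .data or .wdata file.

    Bins absent from the file are absent from the returned dict: they carry
    no information, which differs from a measured value of zero.
    ASSUMPTION: duplicate global indices are treated as an error.
    """
    data = {}
    for lineno, line in enumerate(stream, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected 2 columns, got {len(fields)}")
        j, value = int(fields[0]), float(fields[1])
        if j in data:
            raise ValueError(f"line {lineno}: duplicate global index {j}")
        data[j] = value
    return data


xi = read_data(io.StringIO("12 0.0\n5 -0.0031\n"))
# Bin 12 was measured as zero; bin 7 was never measured.
print(xi.get(12), xi.get(7))  # 0.0 None
```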

The covariance matrix is specified by a separate text input file using the ".cov" or ".icov" extension and consisting of triplets (global index 1, global index 2, covariance estimate):

j1 j2 cov(j1,j2)
j1 j3 cov(j1,j3)
j4 j5 cov(j4,j5)
...

Entries can appear in any order and a missing entry implies that the corresponding covariance is zero. Because the covariance is symmetric, each off-diagonal element only needs to be specified once, as either (j1,j2) or (j2,j1). It is an error for the covariance to refer to global indices that are not present in the data vector. Values of the inverse covariance can be provided instead of the covariance, in which case the files should use the ".icov" extension and the "load-icov" option should be included on the command line or in the config file.
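The rules above (missing entries are zero, symmetric pairs are given once, indices must exist in the data vector) can be sketched in Python as a sparse reader that mirrors each entry. The function names are hypothetical, and raising on conflicting values for (j1,j2) and (j2,j1) is an assumption of this sketch rather than documented behavior.

```python
import io


def read_cov(stream, data_indices: set[int]) -> dict[tuple[int, int], float]:
    """Hypothetical .cov/.icov reader for 'j1 j2 value' triplets.

    Each entry is stored at both (j1, j2) and (j2, j1), so a symmetric
    element only needs to appear once. Indices missing from the data
    vector are rejected. ASSUMPTION: conflicting duplicates raise an error.
    """
    cov = {}
    for lineno, line in enumerate(stream, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 columns, got {len(fields)}")
        j1, j2, value = int(fields[0]), int(fields[1]), float(fields[2])
        for j in (j1, j2):
            if j not in data_indices:
                raise ValueError(f"line {lineno}: index {j} is not in the data vector")
        for key in ((j1, j2), (j2, j1)):
            if key in cov and cov[key] != value:
                raise ValueError(f"line {lineno}: conflicting values for {key}")
            cov[key] = value
    return cov


def cov_element(cov: dict[tuple[int, int], float], j1: int, j2: int) -> float:
    """Look up an element; entries absent from the file are zero."""
    return cov.get((j1, j2), 0.0)


data_indices = {2, 5, 7}
cov = read_cov(io.StringIO("2 2 1.5e-6\n2 5 3.0e-7\n7 7 2.0e-6\n"), data_indices)
print(cov_element(cov, 5, 2), cov_element(cov, 5, 7))  # 3e-07 0.0
```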

It is often useful to divide input data into independent datasets (e.g., based on subregions of the sky) for bootstrapping and other resampling methods. In this case, an additional "plate list" file specifies the dataset file names (without the ".data" or ".cov" / ".icov" extensions), with one file name per line. Multiple datasets must all use the same binning, including the same set of unused bins (if any). Small variations between datasets in which bins have valid correlation function estimates can be accommodated in two ways: either take the intersection of all valid bins (dropping bins that are only sometimes invalid), or use the union and assign a large error to invalid bins. When using many datasets, it is more efficient to provide ".icov" files than ".cov" files.
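The two strategies for reconciling valid bins across datasets reduce to simple set operations on the global indices present in each data vector. The Python sketch below reads a plate list and computes both: the intersection of valid bins, and, for the union strategy, the bins each dataset would need to pad with a large error. The function names and dataset names are hypothetical, and the sketch does not choose the padding value.

```python
import io


def read_plate_list(stream) -> list[str]:
    """Hypothetical reader: one dataset name per line, without extensions."""
    return [line.strip() for line in stream if line.strip()]


def intersect_bins(datasets: list[dict[int, float]]) -> set[int]:
    """Intersection strategy: keep only bins valid in every dataset."""
    return set.intersection(*(set(d) for d in datasets))


def union_padding(datasets: list[dict[int, float]]) -> list[list[int]]:
    """Union strategy: for each dataset, list the bins it lacks.

    These are the bins that would be assigned a large error.
    """
    union = set().union(*(set(d) for d in datasets))
    return [sorted(union - set(d)) for d in datasets]


plates = read_plate_list(io.StringIO("region-a\nregion-b\n\n"))
a = {1: 0.01, 2: 0.02, 3: 0.03}
b = {2: 0.05, 3: 0.04, 4: 0.01}
print(plates)                          # ['region-a', 'region-b']
print(sorted(intersect_bins([a, b])))  # [2, 3]
print(union_padding([a, b]))           # [[4], [1]]
```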

As an analysis option, a distortion matrix describing a correction to the continuum fitting broadband distortion may be provided as an additional data product. The distortion matrix is specified by a separate text input file using the ".dmat" extension, with a file path that may be independent of the compulsory data components, and consisting of triplets (global index 1, global index 2, distortion estimate):

j1 j1 distortion(j1,j1)
j1 j2 distortion(j1,j2)
j1 j3 distortion(j1,j3)
...

Entries have to appear in order and a missing entry implies that the corresponding distortion is zero.
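A minimal Python sketch of loading a ".dmat" file into sparse rows and applying it to a vector on the same global-index grid follows. It assumes, as is usual for distortion matrices, that the matrix multiplies a model correlation vector; that usage, and the names `read_dmat` and `apply_dmat`, are illustrative assumptions. The sketch does not check the required entry ordering.

```python
import io


def read_dmat(stream) -> dict[int, dict[int, float]]:
    """Hypothetical .dmat reader: 'j1 j2 value' triplets into sparse rows.

    Missing entries are treated as zero. The required ordering of entries
    is NOT validated here.
    """
    rows: dict[int, dict[int, float]] = {}
    for lineno, line in enumerate(stream, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 columns, got {len(fields)}")
        j1, j2, value = int(fields[0]), int(fields[1]), float(fields[2])
        rows.setdefault(j1, {})[j2] = value
    return rows


def apply_dmat(rows: dict[int, dict[int, float]], model: dict[int, float]) -> dict[int, float]:
    """ASSUMED usage: (D . model)[j1] = sum over j2 of D[j1, j2] * model[j2]."""
    return {
        j1: sum(v * model.get(j2, 0.0) for j2, v in row.items())
        for j1, row in rows.items()
    }


dmat = read_dmat(io.StringIO("0 0 0.5\n0 1 -0.25\n1 1 0.75\n"))
print(apply_dmat(dmat, {0: 1.0, 1: 2.0}))  # {0: 0.0, 1: 1.5}
```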
