Istvan — Normalisation of Expression Data

Istvan is a collection of modules for normalising microarray expression data. The modules implement various methods for inferring an invariant set, i.e. a set of genes with little or no difference in actual expression levels, from a set of pairs of microarray measured expression levels. They also implement methods for inferring normalising functions from a set of pairwise data, possibly an invariant set determined by one of the implemented methods. The implemented invariant set methods are

and the implemented normalising function types are

The supplied normalisation function types do not directly support taking spatial and print tip information into account. However, as most of the invariant set methods can be made to return the invariant set or the weights computed, these modules can easily be used in combination with spatial normalisation methods. For print tip dependent normalisation, the full data set just needs to be separated into one data set per print tip. The accompanying README file provides more extensive descriptions of the functions available for invariant set and normalisation function inference, and of the directives affecting the compilation.
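The separation by print tip can be sketched as follows. This is a minimal illustration only, not part of the Istvan library: the function name split_by_print_tip, the representation of the data as (x, y) tuples, and the integer print tip labels are all assumptions made for the example.

```python
from collections import defaultdict


def split_by_print_tip(pairs, print_tips):
    """Group (x, y) intensity pairs by print tip label, preserving order.

    Hypothetical helper: each returned group can then be normalised on
    its own, as if it were an independent data set.
    """
    if len(pairs) != len(print_tips):
        raise ValueError("need exactly one print tip label per pair")
    groups = defaultdict(list)
    for pair, tip in zip(pairs, print_tips):
        groups[tip].append(pair)
    return dict(groups)


# Made-up data: four spots printed by two different tips.
pairs = [(120.0, 130.5), (980.0, 1010.0), (45.0, 40.2), (300.0, 310.0)]
tips = [1, 2, 1, 2]
by_tip = split_by_print_tip(pairs, tips)
```

Each value of by_tip is a smaller set of pairwise data that can be passed to the invariant set and normalising function inference separately.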

Installation

The Istvan suite of methods has been implemented in C. To obtain Istvan, first download the source code and unpack it (tar xvf istvan.tar). The source code files and a sample data file should now be available in the current directory. The main usage of Istvan is expected to be as a library of efficient implementations of normalisation methods that can be incorporated into general microarray analysis programs. Running make or gmake should, however, also compile istvan, an executable front end providing normalisation with combinations of the various methods contained in the library. Run it with option -? or -h to get a short usage description. The istvan program assumes a very simple data format: the measured intensities should be passed as a white space separated list of pairs (or pair of lists) of real numbers. Once you have successfully compiled this front end, you can also use the istvan.py Python script as an interface. This script parses data in column format, allowing genes to be grouped and/or deselected based on the content of specific columns. Run the script with option -? or -h for more extensive information.
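To make the simple data format concrete, the following sketch reads white space separated real numbers in either of the two layouts. It is an illustration under assumptions, not code from the suite: the function parse_pairs and its layout parameter are hypothetical, and how the istvan program itself distinguishes a list of pairs from a pair of lists is not described here, so the caller chooses explicitly.

```python
def parse_pairs(text, layout="pairs"):
    """Parse white space separated reals into a list of (x, y) tuples.

    layout="pairs": x1 y1 x2 y2 ...        (a list of pairs)
    layout="lists": x1 ... xn y1 ... yn    (a pair of lists; assumed to
                                            mean all x values first)
    """
    values = [float(token) for token in text.split()]
    if len(values) % 2 != 0:
        raise ValueError("expected an even number of values")
    if layout == "pairs":
        # Even positions are x values, odd positions are y values.
        return list(zip(values[0::2], values[1::2]))
    if layout == "lists":
        # First half holds the x values, second half the y values.
        half = len(values) // 2
        return list(zip(values[:half], values[half:]))
    raise ValueError("unknown layout: " + repr(layout))


as_pairs = parse_pairs("120.0 130.5\n980.0 1010.0\n")
as_lists = parse_pairs("120.0 980.0\n130.5 1010.0\n", layout="lists")
```

Both calls produce the same two pairs, (120.0, 130.5) and (980.0, 1010.0).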

The standard compilation of istvan uses the data type double for computing reliabilities. It is our experience that for larger data sets, overflows will cause this to fail. So if you intend to normalise larger data sets using reliabilities, compile istvan using the rational number data type from the GNU Multiple Precision Arithmetic Library (GMP) instead. If you have the GMP library installed, simply compile using gmake USE_GMP=yes (after possibly modifying LOADLIBES in Makefile to refer to the right location of the GMP library). This will only affect normalisation using reliabilities, which will be slower but will no longer risk failure due to overflow.
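The general trade-off between double precision and exact rational arithmetic can be illustrated with a small sketch. It does not reproduce the reliability computation in Istvan, which is not described here; the ratio of two factorial-like products is simply an assumed stand-in for a quantity whose intermediate values grow too large for a double even though the final result is small. Python's float is a double, and fractions.Fraction plays the role of an exact rational type.

```python
import math
from fractions import Fraction


def ratio_of_products_float(n, m):
    """Compute (1*2*...*n) / (1*2*...*m) using double precision floats."""
    numerator = 1.0
    for i in range(1, n + 1):
        numerator *= i  # silently becomes inf once it exceeds the double range
    denominator = 1.0
    for i in range(1, m + 1):
        denominator *= i
    return numerator / denominator


def ratio_of_products_exact(n, m):
    """Compute the same ratio using exact rational arithmetic."""
    numerator = Fraction(1)
    for i in range(1, n + 1):
        numerator *= i
    denominator = Fraction(1)
    for i in range(1, m + 1):
        denominator *= i
    return numerator / denominator


approximate = ratio_of_products_float(200, 198)  # inf / inf, which is nan
exact = ratio_of_products_exact(200, 198)        # 200 * 199 = 39800
```

The exact version is slower, since the intermediate integers grow without bound, but it cannot overflow.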

References

The library is described in

A library for gene expression data normalisation, Rune Lyngsø, Istvan Miklos, Charles Sugnet, and Jotun Hein.

and a more comprehensive description of methods and results can be found in

Robust Invariant Set Methods for Normalisation, Rune Lyngsø, István Miklós, Charles Sugnet, Peter Underhill, Sarah Webb, and Jotun Hein.

Both are unpublished.

Supplementary Material

If you do not want to bother with installing the Istvan suite, you can explore the results generated in our experiment using the modules of the library on self–self hybridisation data with simulated differential expression. This archive includes a small Python script for arranging the average values in tables in LaTeX format. Plots of data points and inferred normalising functions from one run of our experiment are also available.


Rune Lyngsø, rlyngsoe@daimi.au.dk