Istvan is a collection of modules for two related tasks. The first is inferring an invariant set, i.e. a set of genes with little or no difference in actual expression levels, from a set of pairs of expression levels measured by microarray. The second is inferring normalising functions from a set of pairwise data, possibly an invariant set determined by one of the implemented methods. The implemented invariant set methods are
and the implemented normalising function types are
The supplied normalisation function types do not directly support taking spatial and print tip information into account. However, as most of the invariant set methods can be made to return the computed invariant set or weights, these modules can easily be used in combination with spatial normalisation methods. For print tip dependent normalisation, the full data set just needs to be separated into one data set per print tip. The accompanying README file provides more extensive descriptions of the functions available for invariant set and normalisation function inference, and of directives affecting the compilation.
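The per print tip separation described above can be sketched in a few lines of Python. This is only an illustration and not part of Istvan: the helper name `split_by_print_tip` and the record layout (print tip identifier followed by the two measured intensities) are assumptions made for the example.

```python
from collections import defaultdict


def split_by_print_tip(records):
    """Group (print_tip, intensity_a, intensity_b) records by print tip.

    Hypothetical helper: the record layout is an assumption for this
    sketch, not a format defined by Istvan.
    Returns a dict mapping each print tip to its list of intensity pairs.
    """
    groups = defaultdict(list)
    for tip, a, b in records:
        groups[tip].append((a, b))
    return dict(groups)


# Made-up example records: two spots from print tip 1, one from print tip 2.
records = [(1, 120.0, 130.5), (2, 80.0, 75.2), (1, 300.0, 310.0)]
per_tip = split_by_print_tip(records)
```

Each value in `per_tip` is then a separate pairwise data set that could be normalised on its own.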
The Istvan suite of methods has been implemented in C. To obtain Istvan, first download the source code and unpack it (tar xvf istvan.tar). The source code files and a sample data file should now be available in the current directory. The main usage of Istvan is expected to be as a library of efficient implementations of normalisation methods that can be incorporated into general microarray analysis programs. Running make or gmake should nevertheless compile istvan, an executable front end providing normalisation with combinations of the various methods contained in the library. Run it with option -? or -h to get a short usage description. The istvan program assumes a very simple data format: the measured intensities should be passed as a white space separated list of pairs (or pair of lists) of real numbers. Once you have successfully compiled this front end, you can also use the istvan.py Python script as an interface. This script parses data in column format, allowing genes to be grouped and/or deselected based on the content of specific columns. Run the script with option -? or -h for more extensive information.
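As a rough sketch of the simple data format mentioned above, the following Python functions render intensity pairs as white space separated text. The function names are hypothetical, and the reading of "pair of lists" as all first values followed by all second values is an assumption. How the text is handed to istvan (standard input or a file) is not stated here, so consult the usage description for that.

```python
def format_as_pairs(pairs):
    """Render intensity pairs as one 'x y' pair per line.

    Any white space would serve as separator; newlines between
    pairs are used here only for readability.
    """
    return "\n".join(f"{x!r} {y!r}" for x, y in pairs)


def format_as_pair_of_lists(pairs):
    """Render the same data as two lists: all first values, then all second.

    Assumption: this is what 'pair of lists' means in the description.
    """
    first = " ".join(repr(x) for x, _ in pairs)
    second = " ".join(repr(y) for _, y in pairs)
    return first + "\n" + second


# Made-up intensities for illustration.
pairs = [(120.0, 130.5), (80.0, 75.25)]
as_pairs = format_as_pairs(pairs)
as_lists = format_as_pair_of_lists(pairs)
```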
The standard compilation of istvan uses the data type double for computing reliabilities. It is our experience that for larger data sets overflows will cause this to fail. So if you intend to normalise larger data sets using reliabilities, compile istvan using the rational number data type from the GNU Multiple Precision Arithmetic Library (GMP) instead. If you have the GMP library installed, simply compile using gmake USE_GMP=yes (after possibly modifying LOADLIBES in Makefile to refer to the right location of the GMP library). This will only affect normalisation using reliabilities, which will be slower but will not risk failure due to overflow.
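The trade-off between double precision and rational arithmetic can be illustrated outside Istvan. The sketch below is not the reliability computation, which is not described here. It uses a running product as a stand-in for any computation whose intermediate values grow quickly, and Python's `fractions.Fraction` as an analogue of a GMP rational number.

```python
from fractions import Fraction
import math


def running_product(factors, one):
    """Multiply all factors together, starting from the given unit value."""
    result = one
    for f in factors:
        result = result * f
    return result


# Forty factors of 1e10 give 1e400, beyond the double range (about 1.8e308).
factors = [1e10] * 40

# Double precision: the product silently overflows to infinity.
as_float = running_product(factors, 1.0)

# Exact rationals: slower, but the value is represented without overflow.
as_rational = running_product([Fraction(f) for f in factors], Fraction(1))

float_overflowed = math.isinf(as_float)
```

The rational version pays for exactness with arbitrary-size integers, which matches the note above that the GMP build is slower but not at risk of overflow.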
The library is described in
A library for gene expression data normalisation, Rune Lyngsø, Istvan Miklos, Charles Sugnet, and Jotun Hein.
and a more comprehensive description of methods and results can be found in
Robust Invariant Set Methods for Normalisation, Rune Lyngsø, István Miklós, Charles Sugnet, Peter Underhill, Sarah Webb, and Jotun Hein.
Both are unpublished.
If you do not want to install the Istvan suite, you can explore the results generated in our experiment, which applied the modules of the library to self–self hybridisation data with simulated differential expression. This archive includes a small Python script for arranging the average values in tables in LaTeX format. Plots of data points and inferred normalising functions from one run of our experiment are also available.