==============================================================================
=                                                                            =
=                              SPRINT USER GUIDE                             =
=                                                                            =
==============================================================================

Latest release SPRINT Beta 0.3.0 - 20.05.2011
Previous release SPRINT Beta 0.2.0 - 22.06.2010

-------
Content
-------

    1.  Introduction
    2.  Requirements
        a. Notes 
    3.  Installing SPRINT
        a. Notes
        b. Testing the Installation
    4.  Using SPRINT
        a. pcor()
        b. pmaxT()
        c. ppam()
        d. Performances
    5.  Troubleshooting     
    6.  Configuration of systems used for testing SPRINT
        a. Internal Cluster
        b. Local Cluster
        c. National Cluster

---------------
1. Introduction
---------------

SPRINT (Simple Parallel R INTerface) is a parallel framework for R. It 
provides a High Performance Computing (HPC) harness which allow R scripts to 
run on HPC clusters. SPRINT contains a library of selected R functions that 
have been parallelized. Functions are named after the original R function with
the added prefix 'p', i.e. the parallel version of cor() in SPRINT is called 
pcor(). Call to the parallel R functions are included directly in standard R 
scripts.


---------------
2. Requirements
---------------

Multi-core or HPC platform running:
 * R - latest version, SPRINT was tested using 2.12.1, 2.10.1, 2.10.0 and 
       2.9.2.
 * MPI2 - any version will full MPI-2 support will work, although there are
          complications when using non-gcc compilers. MPICH2 is preferred.         
 * Unix/Linux - SPRINT is designed for HPC systems and these all run on
                Unix/Linux.
                
a. Notes
--------
  
 * ff package is also needed but only when using the pcor() function. SPRINT 
   was tested using version 2.1-1. ff is available from CRAN 
  (http://cran.r-project.org/web/packages/ff/index.html).
 
 * There are known issues with open MPI parallel file IO support. SPRINT 
   pcor() function uses MPI-IO and tests carried on an installation with open 
   MPI 1.3.2 on IBM General Parallel File System (GPFS) have shown 
   inconsistent results. Sometimes wrong results were written out to file by 
   one or more processes when using pcor() on more than one node. The SPRINT 
   team therefore strongly recommends NOT to use open MPI 1.3.2.  

 * User access to HPC platforms will vary from service to service. The 
   installation of software is likely to be limited to system administrators.
   Therefore, help from your system administrator may be required to ensure
   that the required environment is set up on your HPC system. Running jobs is 
   often only allowed through a batch queue system rather interactively. In
   such case, R scripts using SPRINT will need to be submitted to the batch 
   queue using the appropriate utility specific to your HPC system (i.e. 
   mpiexec, qsub). 
   
   
--------------------
3. Installing SPRINT
--------------------

SPRINT is distributed as a single compressed tar file. All source files are
included in one single directory called 'sprint'.

The installation process is carried out with a single command line from the
directory containing SPRINT:

> R CMD INSTALL sprint

a. Notes
--------

 * If you are using a central installation of R and do not have the root  
   privileges to install libraries there, you can install the sprint library 
   locally by setting $R_LIBS accordingly.

   > R CMD INSTALL -l $R_LIBS sprint 

 * R tests if the installed package can be loaded during the installation. 
   SPRINT requires MPI to run and if you try to install it without MPI then 
   the installation will fail. If you are installing the SPRINT library on a 
   cluster where MPI is only installed on the back-end nodes but not on the 
   front-end nodes then you may need to use the "--no-test-load" flag during 
   the installation process. 
 
   > R CMD INSTALL --no-test-load sprint
   
 * optional arguments to the installation command:

   --configure-args="--with-wrapper-script=$WRAPPER_SCRIPT"

   where:

   WRAPPER_SCRIPT contains the compiler to be used for building SPRINT, e.g.
   "mpicc".
   
   The configure script automatically identifies the appropriate compiler for
   building SPRINT. This option should only be used if the script fails to 
   locate the MPI compiler.  
   
b. Testing the Installation
---------------------------

  * The SPRINT library includes a function to test the installation called
    ptest(). It simply prints a message identifying each processor in the 
    compute cluster. For example, when using SPRINT with 5 processors and you
    will get the following output:
             
    [1] "HELLO, FROM PROCESSOR: 0" "HELLO, FROM PROCESSOR: 2"
    [3] "HELLO, FROM PROCESSOR: 1" "HELLO, FROM PROCESSOR: 3"
    [5] "HELLO, FROM PROCESSOR: 4" 

    This is obtained by running the following sample R script, install_test.R:

        library("sprint")
        ptest()
        pterminate()
        quit()

    > mpiexec -n 5 R -f install_test.R


---------------
4. Using SPRINT
---------------

To make use of SPRINT it is necessary to include the library first. Within 
your R script use 'library("sprint")'. Then include calls to the SPRINT 
functions you wish to use. Note that it is likely that you will have to use an
R script, rather than simply typing in the commands into R. This is due to 
most High Performance Computing (HPC) machines executing codes on a 'back-end'
rather than interactively. It is worth testing your script on a desktop first 
with the unparallelized versions, before running it on a HPC machine. Finally,
all SPRINT enabled scripts require that pterminate() is called before the 
final quit() command. This calls MPI_FINALIZE and shuts down SPRINT.

For example, a simple R script which calls one single function called ptest()
will look like this:

    library("sprint")
    ptest()
    pterminate()
    quit()

This only gives access to SPRINT within R, it will not give you multiple
processors. You will need to run MPI to do this. How this is done dependents
on your system set-up. You will have to specified the location of the script 
name and the number of processors to be used.

For example, this command will run the install_test.R script on 5
processors.

    > mpiexec -n 5 R -f install_test.R

The available functions in SPRINT are:
 - ppam(): a clustering function (partioning around medoids)
 - pmaxT(): a permutation test 
 - pcor(): a parallel Pearson's correlation
 - ptest(): a simple test function to test SPRINT

a. pcor()
---------

Pcor() carried out a parallel Pearson's correlation. It either takes a 2D 
array as input and correlates each row with every other row or takes two 2D
arrays and correlates the columns of the first matrix with the columns of the 
second matrix. The output can either be the matrix of correlation coefficient 
or the distance matrix.

To use pcor():

    pcor(data_x, data_x, distance = FALSE, caching_ = "mmeachflush", 
         filename_ = NULL)

where:

    - 'data_x' is the input matrix data,
    - 'data_y' is the second input matrix with compatible dimensions to 
      data_x,
    - 'distance' is a boolean indicating whether the output is to be a 
      distance matrix rather than the correlation coefficient matrix,
    - 'caching_' caching scheme for the backend, currently "mmnoflush" or
      "mmeachflush" (flush mmpages at each swap) if no name is specified
      the default value is "mmeachflush",
    - 'filename' is a string and is optional. It specifies the name of
      a file where the results will be saved. By default, the results are
      saved to a temporary file that is delete after exiting from SPRINT.

Examples of valid calls to pcor are:

    - ff_obj <- pcor(t(inData))
    - ff_obj <- pcor(t(inData_x), t(inData_y))
    - ff_obj <- pcor(t(inData), filename_="output.dat") 
    - ff_obj <- pcor(data, caching_="mmeachflush", filename_="output.dat")
    - ff_obj <- pcor(t(inData), distance=TRUE, filename_="output.dat")
    
The first four are parallel equivalent to the call the sequential cor():

    results <- cor(t(inData))
    
The last one also implements a parallel equivalent to cor() but returns a 
different output that is the distance matrix.   

b. pmaxT()
----------

PmaxT() implements a parallel version of the mt.maxT function from the 
multtest package 
(http://www.bioconductor.org/packages/release/bioc/html/multtest.html). It 
computes the adjusted p-values for step-down multiple testing procedures. 

To use pmaxT():

    pmaxT(X, classlabel, test = "t", side = "abs", B = 10000, 
          na = .mt.naNUM, fixed.seed.sampling = "y", nonpara = "n") 
where:

    - 'X' is the input data array,
    - 'classlabel' is the class labels of the columns of the input dataset,
    - 'test' is the statistical method used for testing the null hypothesis. 
      The following six methods are supported:
        . t: Tests based on a two-sample Welch t-statistics 
             (unequal variances)
        . t.equalvar: tests based on a two-sample t-statistics with equal 
                      variance for the two samples.
        . Wilcoxon: Tests based on standardized rank sum Wilcoxon statistics.
        . F: Tests based on F-statistics.
        . Pair-T: Tests based on paired t-statistics.
        . Block-F: Tests based on F-statistics which adjust for block 
                   differences.
    - 'side' is the type of rejection region. The following values are 
      available:
        . "abs" for absolute difference
        . "upper" for the maximum difference
        . "lower" for the minimum difference
    - 'B' is the number of permutations. If set to "0" then the complete 
       permutations of the data will be computed. 
    - 'na' is the representation used for missing values. Missing values are
      excluded from all computations.    
    - 'fixed.seed.sampling' can either be: 
        . "y" to compute the permutations on the fly
        . "n" to save all permutations in memory prior to computations
    - 'nonpara' can either be:
        . "y" for non-parametric test statistics
        . "n" otherwise.
                                
The interface and parameters to the parallel pmaxt() are identical to those 
for the sequential mt.maxt():

    mt.maxT(X, classlabel, test = "t", side = "abs", B = 10000, 
            na = .mt.naNUM, fixed.seed.sampling = "y", nonpara = "n") 

c. ppam()
----------

Ppam() is a clustering function that performs a Parallel Partitioning Around 
Medoids and is based on the pam() function from the cluster R package 
(http://cran.r-project.org/web/packages/cluster/index.html). 
   
The interface and parameters to parallel function ppam() are similar to the 
serial function pam() but not identical. Ppam() requires a distance matrix as 
input parameters. Although, ppam() does not include the option to calculate 
the distance matrix, this can easily be done using SPRINT pcor() function 
with the 'distance' parameter set to TRUE. 

To use ppam():

    ppam (x, k, medoids = NULL, is_dist = inherits(x, "dist"),
          cluster.only = FALSE, do.swap = TRUE, trace.lev = 0)

where:

    - 'x' is the input distance matrix or dissimilarity matrix, depending on 
      the value of the "dist" argument. This can either be a matrix or an ff 
      object,
    - 'k' is a positive integer indicating the number of clusters. It must be 
      less than the number of observations,
    - 'medoids' is either a vector specifying the initial 'k' medoids or the 
      default value NULL which indicates that the initial medoids will be 
      selected by the algorithm, 
    - 'is_dist' is a boolean indicating whether the input matrix is a distance
      or dissimilarity matrix (TRUE) or a symmetric matrix (FALSE),
    - 'cluster.only' is a boolean when set to TRUE only the clustering will be
       computed and returned. The default value is FALSE,
    - 'do.swap' is a boolean indicating if the swap phase of the algorithm 
      should take place. The default is TRUE. The swap phase is computer 
      intensive and can be skipped by setting the 'do.swap' option to FALSE,
    - 'trace.lev' is an integer specifying the trace level for printing
      diagnostics during the build and swap phases of the algorithm. The 
      default value is 0 which does not produce any output. Increasing values 
      print increasing level of detailed information.

Examples of valid calls to ppam():

    # Pre-processing step using pcor() to return an ff object containing a
    # distance matrix. 
    mcor <- pcor(matrix(rnorm(1:10000), ncol=100), distance = TRUE)
    
    p1m <- ppam(mcor, 4)
    p2m <- ppam(mcor, 4, medoids = c(1,16))
    p3m <- ppam(mcor, 3, trace = 2)
    p4m <- ppam(dist(x), 12)
              
d. Performances
---------------
SPRINT parallel functions run on multiple processors reducing the time taken 
for the calculation to complete. Note, that the speed-up depends heavily on 
the size of the data set being analyzed. A small data set will show no 
speed-up on 3 or more processors. However, tests on larger data sets have 
shown an almost perfect scaling for up to 512 cores.

------------------------------------------------
 5. Troubleshooting
------------------------------------------------

Know issues in open MPI result in unreliable results when running pcor() on 
more than one node, see section 2.a. Sometimes the result matrix will be 
wrong. The symptoms for this issue are entire columns of zero (0) values and
data shifted towards the right, especially the expected diagonal line of one 
(1) values.  

 
------------------------------------------------
 6. Configuration of Systems used to test SPRINT
------------------------------------------------

SPRINT beta 0.3.0 has been developed and tested on an internal cluster, Ness
(http://www.epcc.ed.ac.uk/facilities/ness/), a local cluster, ECDF
(http://www.ecdf.ed.ac.uk/) and on the UK national supercomputing service, 
HECToR (http://www.hector.ac.uk/).

The setup and version details are listed below:

a. Internal Cluster
-------------------

Tested and working using MPICH2 1.0.7, gcc 4.1.2 and R 2.12.1

[user@ness ~]$ uname -a
Linux ness.epcc.ed.ac.uk 2.6.18-194.32.1.el5 #1 SMP Tue Jan 4 12:47:36 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
[mewissen@ness ~]$ which R
~/R-HOME/R-2.12.1/bin/R
[user@ness ~]$ uname -a
Linux ness.epcc.ed.ac.uk 2.6.18-194.32.1.el5 #1 SMP Tue Jan 4 12:47:36 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
[user@ness ~]$ mpicc -v
mpicc for 1.0.7
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=x86_64-redhat-linux
Thread model: posix
gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)
[user@ness ~]$ R

R version 2.12.1 (2010-12-16)
Copyright (C) 2010 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

b. Local Cluster
----------------

Tested and working using gcc 4.1.2 and LAM R 2.10.1

Note: pcor() produces unreliable results in this configuration due to known
      issues in open MPI, see section 2.a. 

[mmewisse@frontend02 ~]$ uname -a
Linux frontend02 2.6.18-164.15.1.el5 #1 SMP Tue Mar 16 18:44:51 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
[mmewisse@frontend02 ~]$ mpicc -v
Using built-in specs.
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-libgcj-multifile --enable-languages=c,c++,objc,obj-c++,java,fortran,ada --enable-java-awt=gtk --disable-dssi --enable-plugin --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre --with-cpu=generic --host=x86_64-redhat-linux
Thread model: posix
gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)
[mmewisse@frontend02 ~]$ R
WARNING: ignoring environment value of R_HOME

R version 2.10.1 (2009-12-14)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

c. National Cluster
-------------------

Tested and working using gcc/4.4.4 with xt-mpt/5.0.1, PrgEnv-gnu/2.2.48B and
R 2.12.0.

hector@nid00015:~> uname -a
Linux nid00015 2.6.16.60-0.39_1.0102.4784.2.2.48B-ss #1 SMP Thu Sep 16 15:50:08 CDT 2010 x86_64 x86_64 x86_64 GNU/Linux
hector@nid00015:~> gcc --version
gcc (GCC) 4.4.4 20100429 (Cray Inc.)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
hector@nid00015:~> R

R version 2.12.1 (2010-12-16)
Copyright (C) 2010 The R Foundation for Statistical Computing
ISBN 3-900051-07-0


===============================================================================
SPRINT Team

email: sprint@ed.ac.uk
http://www.r-sprint.org

Copyright  2008,2009,2010,2011 The University of Edinburgh.
===============================================================================
