\author{Hao Wu}

\name{read.madata}
\alias{read.madata}

\title{Read microarray data from a TAB-delimited text file}

\description{
  Reads microarray experiment data from a TAB-delimited plain text
  file, together with the accompanying design file.
}

\section{Preparing data file}{
  Before using the package, you need to prepare the input data
  file, which is a TAB-delimited text file. Each row of the file
  corresponds to a gene. The leading columns can hold gene-specific
  information, e.g., the Clone ID or GenBank ID, and the grid location
  of the spot. Most importantly, the pmt data must follow these
  columns. Most microarray gridding software generates one file per
  slide, so you need to manually combine these files into a single
  data file. You also need to decide which data to use in the
  analysis, e.g., mean versus median, background subtracted or not,
  etc. For an N-dye array, the pmt data should have N adjacent columns
  for each array. You can put the spot flag in a column after the pmt
  data for each array. (Note that with a flag, you will have N+1
  columns of data for each array.) Replicated measurements of the same
  clone on the same array should appear in adjacent rows.

  For example, suppose you have a 2-dye cDNA array experiment with
  four slides scanned by GenePix, giving four files. First open your
  favorite spreadsheet editor, e.g., MS Excel, and copy the Clone ID
  and Cluster ID into the first 2 columns. Then open one of the files
  generated by GenePix and copy the grid location into the next 4
  columns (you only need to do this once because the grid locations
  are the same for all four slides). Then, for each of the four files,
  copy the two columns of foreground median values (if that is what
  you want to use) and the flag column into the file in the order
  Cy5, Cy3, flag. Finally, select the whole file, sort the rows by
  Clone ID, and save the file as TAB-delimited text.


  The data file must be "full", that is, all rows must have the same
  number of fields. Leading and trailing TABs in the text file can
  cause problems, depending on the operating system, so check for
  them carefully.
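
  One possible way to check the field counts before calling
  \code{read.madata} is sketched below with base R only. The small
  file written here is made up for illustration and is not part of the
  package; in practice \code{datafile} would point to your own file.

  \preformatted{
# Made-up example file: line 3 starts with a stray leading TAB
datafile <- tempfile(fileext = ".txt")
writeLines(c("ID\tCy5\tCy3",
             "g1\t100\t200",
             "\tg2\t150\t250"), datafile)

# Count TAB-separated fields on every line; quoting and comment
# handling are switched off so that ' or # in gene names is not misread
nf <- count.fields(datafile, sep = "\t", quote = "", comment.char = "")

# Line numbers whose field count differs from that of the first line
bad <- which(nf != nf[1])
bad
  }

  An empty \code{bad} means every line has the same number of fields
  as the first line.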
}

\section{Preparing design file}{
  The design file is another TAB-delimited text file. The number of
  rows in this file equals the number of arrays times N (the number of
  dyes). The number of columns depends on the experimental design. For
  example, you can have "Strain", "Diet", "Sex", etc. in your design
  file. You *MUST* have a column named "Sample" (case sensitive) in
  the design file. Its values should be integers that identify the
  biological individuals. Reference samples should have a Sample
  number of zero (0). A reference sample is always treated as a fixed
  factor in the mixed model and is not involved in any test. You must
  also have "Array" and "Dye" columns in the design file. You must NOT
  have "Spot" or "Label" columns; these names are reserved for
  spotting and labelling effects.
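
  As a rough sketch of these rules, a design file for a hypothetical
  experiment with 2 arrays and 2 dyes could be built in base R as
  follows. The values used for Array, Dye and Strain, the row order,
  and the output location are assumptions made for illustration only.

  \preformatted{
# Made-up design: 2 arrays x 2 dyes = 4 rows; Sample 0 is the reference
design <- data.frame(
  Array  = c("A1", "A1", "A2", "A2"),
  Dye    = c("Cy5", "Cy3", "Cy5", "Cy3"),
  Sample = c(1L, 0L, 0L, 2L),
  Strain = c("X", "ref", "ref", "Y")
)

# Required columns are present; reserved names are absent
stopifnot(all(is.element(c("Array", "Dye", "Sample"), names(design))))
stopifnot(!any(is.element(c("Spot", "Label"), names(design))))

# Write a TAB-delimited file without quotes or row names
designfile <- file.path(tempdir(), "design.txt")
write.table(design, designfile, sep = "\t", quote = FALSE,
            row.names = FALSE)
  }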

  Note that you don't have to *USE* all factors in the design file.
  When making the model object with
  \code{\link[maanova]{makeModel}}, the experimental design is
  determined by the design and a formula. You can put all factors in
  the design file and turn them on or off in the formula.
}

\section{Read pre-transformed data}{
  You can use other software to transform the data and then read the
  pre-transformed data into R/maanova. In that case, you should
  skip the log2 transformation in \code{\link[maanova]{createData}} by
  setting \code{log.trans=FALSE}. You should also run
  \code{\link[maanova]{riplot}} and \code{\link[maanova]{arrayview}}
  on the result of \code{createData}, because \code{riplot} and
  \code{arrayview} assume that the data in a \code{rawdata} object
  are on the raw scale.
}

\usage{
read.madata(datafile, designfile="design.txt", header=TRUE, spotflag=TRUE,
            metarow, metacol, row, col, pmt, ...)
}

\arguments{
  \item{datafile}{The name of the data file, including its path, as a
    string.}
  \item{designfile}{The name of the design file, including its path,
    as a string.}
  \item{header}{A logical value indicating whether the data file
    contains column headers.}
  \item{spotflag}{A logical value indicating whether the input file
    contains a flag column for bad spots.}
  \item{metarow}{The column number for meta row. By default, all
    values are set to 1.}
  \item{metacol}{The column number for meta column. By default, all
    values are set to 1.}
  \item{row}{The column number for row. Default value is NA.}
  \item{col}{The column number for column. Default value is NA.}
  \item{pmt}{The number of the first column of pmt data.}
  \item{\dots}{Column numbers of other gene information in the data
    file, given as named arguments (e.g., \code{Name=5}).}
}

\value{
  An object of class \code{rawdata}, which is a list with the following
  components:
  \item{n.array}{Number of arrays in the experiment.}
  \item{n.dye}{Number of dyes.}
  \item{data}{Two channel experiment data.}
  \item{flag}{A matrix of spot flags. Each element corresponds to one
    spot. 0 means a normal spot; any other value means a bad spot.}
  \item{metarow}{Meta row for each spot.}
  \item{metacol}{Meta column for each spot.}
  \item{row}{Row for each spot.}
  \item{col}{Column for each spot.}
  \item{ArrayName}{A list of strings giving the names of the pmt
    data. There are two names per array.}
  \item{design}{An object representing the experimental design.}
  \item{Others}{Other experiment information listed in the data file
    and specified by the user.}
}

\examples{
# Note that the data files are not distributed with the package.
# Read in a file with spot flags
\dontrun{kidney.raw <- read.madata("kidney.txt", designfile="kidneydesign.txt", 
        metarow=1, metacol=2, col=3, row=4, Name=5, ID=6,
        pmt=7, spotflag=TRUE)}
# read in a file without spot flag
\dontrun{rawdata <- read.madata("paigen.txt", designfile="design.txt",
        cloneid=1, metarow=2, metacol=3, row=4, col=5, pmt=6,
        spotflag=FALSE)}
}

\keyword{IO}
