Bioconductor Changelog
   



       
Thu, 17 Jul 2008

Rtreemix
adapted to the new version of the package graph and updated documentation

[/Rtreemix] permanent link

Wed, 30 Apr 2008

Rtreemix
Updating the version number for the trunk post-BioC 2.2 release.
Updating the version number for the BioC 2.2 release.

[/Rtreemix] permanent link

Fri, 25 Apr 2008

Rtreemix
update of ExtendedVignette.pdf
updated bib file for Rtreemix.Rnw
updated bib file for Rtreemix.Rnw
updated Rtreemix.Rnw
Rplots file deleted
updated man files

[/Rtreemix] permanent link

Thu, 10 Apr 2008

Rtreemix
Fixed man files to allow creation of Rtreemix-manual.pdf document.

[/Rtreemix] permanent link

Sat, 29 Mar 2008

Rtreemix
\name{confIntGPS-methods}
\docType{methods}
\alias{confIntGPS}
\alias{confIntGPS-methods}
\alias{confIntGPS,RtreemixData,numeric-method}
\title{Method for calculating GPS values and their 95\% bootstrap
confidence intervals}
\description{
The method first calculates the genetic progression score (GPS) for the
patterns in a given dataset \code{data} based on a fitted mutagenetic trees
mixture model with \code{K} components. The \code{data} and \code{K}
have to be specified. Then, it derives 95\% confidence intervals for
the GPS values with bootstrap analysis.
}
\usage{
\S4method{confIntGPS}{RtreemixData,numeric}(data, K, ...)
}
\arguments{
\item{data}{An \code{RtreemixData} object containing the samples
(patterns of genetic events) for which the GPS values and their
bootstrap confidence intervals are to be calculated. The number
of genetic events should NOT be greater than 20.}
\item{K}{An \code{integer} larger than 0 specifying the number of
branchings in the mixture model.}
\item{...}{
\code{sampling.mode} is a \code{character} that specifies the
sampling mode ("constant" or "exponential") used in the waiting time
simulations. Its default value is "exponential".
\code{sampling.param} is a \code{numeric} that specifies the
sampling parameter corresponding to the sampling mode given by
\code{sampling.mode}. Its default value is 1.
\code{no.sim} is an \code{integer} larger than 0 giving the number of
iterations for the waiting time simulation. Its default value is
10000.
\code{B} is an \code{integer} larger than 0 specifying the number of
bootstrap samples used in the bootstrap analysis. Its default value
is 1000.
\code{equal.star} is a \code{logical} specifying whether to use
equal edge weights in the noise component. The default value is
\code{TRUE}.
}
}
\value{
The function returns an object from the \code{RtreemixGPS} class that
contains the calculated GPS values, their 95\% confidence intervals,
the model used for the computation, the data, and so on (see
\code{\link{RtreemixGPS-class}}). The GPS values are represented as a
\code{numeric} vector with length equal to the number of samples in
\code{data}. Their corresponding confidence intervals are given in a
matrix with two columns.
}
\note{
The data for which the GPS values and their corresponding
confidence intervals are to be calculated should not have more
than 20 genetic events. The computationally intensive simulations calculate GPS values
for all possible patterns, and with 20 events there are already 2^20 of them,
so more events would demand too much memory.
The GPS examples are time consuming. They are commented out because of the time restrictions of the package check.
To try out the code, copy it and uncomment it.
}
\author{Jasmina Bogojeska }
\seealso{
\code{\link{RtreemixGPS-class}}, \code{\link{gps-methods}},
\code{\link{RtreemixData-class}}, \code{\link{RtreemixModel-class}},
\code{\link{fit-methods}}
}
\examples{
}
\keyword{methods}
\name{gps-methods}
\docType{methods}
\alias{gps}
\alias{gps-methods}
\alias{gps,RtreemixModel,RtreemixData-method}
\alias{gps,RtreemixModel,matrix-method}
\alias{gps,RtreemixModel,missing-method}
\title{Methods for predicting the GPS of a given dataset
using a given mutagenetic trees mixture model}
\description{
These functions compute the genetic progression score (GPS) of each
sample in the given \code{data} by performing a waiting time
simulation along the branchings of the mixture model \code{model}. The
model has to be specified. If no dataset is given, the GPS is calculated
for all possible patterns. The number of events of the samples
in \code{data} must equal the number of genetic events in the \code{model}.
}
\usage{
\S4method{gps}{RtreemixModel,RtreemixData}(model, data, ...)
\S4method{gps}{RtreemixModel,matrix}(model, data, ...)
\S4method{gps}{RtreemixModel,missing}(model, data, ...)
}
\section{Methods}{\describe{
\item{model = "RtreemixModel", data = "RtreemixData", ...}{A method for calculating
the GPS values of the data given as \code{RtreemixData} object.}
\item{model = "RtreemixModel", data = "matrix", ...}{A method for calculating
the GPS values of the data given as 0-1 \code{matrix}.}
\item{model = "RtreemixModel", data = "missing", ...}{A method for calculating
the GPS values of the set of all possible patterns.}
}}
\arguments{
\item{model}{An object of the class \code{RtreemixModel} specifying
the mutagenetic trees mixture model used for deriving the GPS values.
The model should NOT have more than 20 genetic events.}
\item{data}{An \code{RtreemixData} object or a 0-1 \code{matrix}
containing the samples (patterns of genetic events) for which the GPS values
are to be calculated. The length of each of them has to be equal
to the number of genetic events in the \code{model}.}
\item{...}{
\code{sampling.mode} is a \code{character} that specifies the
sampling mode ("constant" or "exponential") used in the waiting time
simulations. Its default value is "exponential".
\code{sampling.param} is a \code{numeric} that specifies the
sampling parameter corresponding to the sampling mode given by
\code{sampling.mode}. Its default value is 1.
\code{no.sim} is an \code{integer} larger than 0 giving the number of
iterations for the waiting time simulations. Its default value is 10.
\code{seed} is a positive \code{integer} specifying the random generator
seed. Its default value is (-1), in which case the time is used to
seed the random generator.
}
}
\value{
The function returns an object from the \code{RtreemixGPS} class that
contains the calculated GPS values, the model used for the
computation, the data, and so on (see
\code{\link{RtreemixGPS-class}}). The GPS values are represented as a
\code{numeric} vector with length equal to the number of samples in \code{data}.
}
\references{Estimating cancer survival and clinical outcome based on
genetic tumor progression scores, J. Rahnenf\"uhrer et al. }
\note{
The mixture model used for deriving the GPS values should not have more than
20 genetic events. The computationally intensive simulations calculate GPS values
for all possible patterns, and with 20 events there are already 2^20 of them.
The GPS examples are time consuming. They are commented out because of the time restrictions of the package check.
To try out the code, copy it and uncomment it.
}
\author{Jasmina Bogojeska}

\seealso{
\code{\link{RtreemixGPS-class}}, \code{\link{RtreemixData-class}},
\code{\link{RtreemixModel-class}},
\code{\link{fit-methods}}, \code{\link{confIntGPS-methods}}
}
\examples{
}
\keyword{methods}
\keyword{survival}
%\VignetteIndexEntry{Rtreemix}
%\VignetteDepends{methods, graph, Biobase, Hmisc, Rgraphviz}
%\VignetteKeywords{Rtreemix, mtreemix, disease progression, HIV}
%\VignettePackage{Rtreemix}
%\documentclass[a4paper, oneside, 10pt]{article}
\documentclass[a4paper,12pt,twoside]{article}
\usepackage[pdftex]{graphicx}
\usepackage{calc}
\usepackage{sectsty}
\usepackage{caption}
\renewcommand{\captionfont}{\it\sffamily}
\renewcommand{\captionlabelfont}{\bf\sffamily}
\allsectionsfont{\sffamily}
% page style %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\usepackage[a4paper, left=23mm, right=23mm, top=20mm, bottom=20mm, nohead]{geometry}
\setlength{\parskip}{1.5ex}
\setlength{\parindent}{0cm}
\pagestyle{empty}
\usepackage{Sweave}
\SweaveOpts{prefix.string = Rtreemix}
\title{\vspace*{-6ex} {\bf Rtreemix}: a package for estimating mutagenetic trees mixture models and genetic progression scores}
\author{Jasmina Bogojeska, J\"org Rahnenf\"uhrer}
\date{\today \\%
\texttt{http://www.mpi-sb.mpg.de/$\sim$jasmina}}
\begin{document}
\maketitle
<<>>=
options(width = 70)
@
\section{Introduction}
The mixture of mutagenetic trees introduced in \cite{Beer12005} is an
evolutionary model that provides an interpretable probabilistic
framework for modeling multiple paths of ordered accumulation of permanent genetic changes that
can be used for describing many disease processes. Each path captures a possible route of
disease development. These models are used to model \texttt{HIV} progression characterized
by accumulation of resistance mutations in the viral genome under drug pressure \cite{Beer12005} and
cancer progression by accumulation of chromosomal aberrations in
tumor cells \cite{Rahn2005}. From the mixture model, a genetic progression score (\texttt{GPS})
can be computed for each patient \cite{Rahn2005} that gives an estimate of the
disease progression and can be used for specifying therapies or estimating survival times of the patients.
Both the mixture model itself and the derived \texttt{GPS} values are shown to improve the
interpretation of disease progression and to have predictive power for estimating the drug resistance
in \texttt{HIV} \cite{Beer12005} or the survival time in cancer \cite{Rahn2005}.
Beerenwinkel \textit{et al.} \cite{Beer22005} introduced the \texttt{Mtreemix} package, implemented in
\texttt{C/C++}, which provides efficient code for estimating mutagenetic trees mixture models from cross-sectional
data and using them for various predictions. Building on the \texttt{Mtreemix} software and using the \texttt{C/C++} API that
\texttt{R} provides, we created the \texttt{Rtreemix} package. By reusing the existing \texttt{C} functions,
our package provides users with \texttt{R} functions as efficient as the
programs available in \texttt{Mtreemix}.
Similar to \texttt{Mtreemix}, the \texttt{Rtreemix} package provides functions
for learning the mixture model from given data, simulation, likelihood computation
and estimation of the GPS values. Furthermore, it introduces new functionality for estimating genetic
progression scores with corresponding bootstrap confidence intervals and for performing stability
analysis of the mixture models \cite{Bogo2008}.
\section{Structure and Functionality}
The class structure of the \texttt{Rtreemix} package is given in Figure~\ref{fig:classes}.
\begin{figure}[!h]
\centering
\includegraphics[width=0.7\linewidth]{ClassDiagram.pdf}
\caption{Class diagram of the package \texttt{Rtreemix}. The diagram illustrates the classes with their attributes and the relationships among them.}
\label{fig:classes}
\end{figure}
The classes pack together all data structures and information connected to an entity, such as the mixture model or the data used for model estimation,
which makes the code compact and easy to understand and work with. In what follows we present
a working scenario in which we use and briefly discuss most of the functions available in \texttt{Rtreemix}. More detailed information
about the parameters of all the functions used in the text below and their default values can be found in the help files of the package.
First, we load the package.
<<>>=
library(Rtreemix)
@
\subsection{The Dataset and the \textit{RtreemixData} class}
The datasets used for estimating the mixture models consist of binary patterns that describe the occurrence
of a set of genetic events in a set of patients. Each pattern corresponds to a single patient. The set of genetic
events comprises genetic changes relevant for the disease taken into consideration. In \texttt{Rtreemix} the data used for
estimating a mutagenetic trees mixture model is represented as an object of the class \texttt{RtreemixData}.
We consider the dataset from the Stanford \texttt{HIV} Drug Resistance Database \cite{Rhee2003} that comprises genetic
measurements of 364 \texttt{HIV} patients treated only with the drug \textit{zidovudine}. This dataset is given as an \texttt{RtreemixData} object
\textit{hiv.data.RData} in the \textit{data} folder of the package and can be loaded and displayed as follows.
<<>>=
data(hiv.data)
show(hiv.data) ## show the RtreemixData object
@
Note that a text file with a specific format can also be used for creating an object of class \texttt{RtreemixData}. An example of
such a file is \textit{treemix.pat} in the \textit{examples} directory of the package. The text files used to create an \texttt{RtreemixData}
object should follow the format of this file. Using \textit{treemix.pat}, the \texttt{RtreemixData} object is created as follows.
<<>>=
## system.file() locates a file shipped with an installed package
ex.data <- new("RtreemixData", File = system.file("examples", "treemix.pat", package = "Rtreemix"))
show(ex.data) ## show the RtreemixData object
@
One can also create an \texttt{RtreemixData} object by specifying the set of patient profiles as a binary matrix as shown in the code below.
<<>>=
bin.mat <- cbind(c(1, 0, 0, 1, 1), c(0, 1, 0, 0, 1), c(1, 1, 0, 1, 0))
toy.data <- new("RtreemixData", Sample = bin.mat)
show(toy.data)
@
Additionally, there are functions for listing the set of profiles, the genetic events, the patient IDs, the number of events and the number of patients in the dataset.
<<>>=
Sample(hiv.data)
Events(hiv.data)
Patients(hiv.data)
eventsNum(hiv.data)
sampleSize(hiv.data)
@
\subsection{Learning mutagenetic trees mixture models}
Given a set of patterns that indicate the occurrence of genetic events for a group of patients,
we can learn a mutagenetic trees mixture model. The model is an object of the \texttt{RtreemixModel} class that extends the
\texttt{RtreemixData} class. We fit a 2-trees mixture model for the \texttt{HIV} data \cite{Rhee2003}.
<<>>=
mod <- fit(data = hiv.data, K = 2, equal.edgeweights = TRUE, noise = TRUE)
show(mod)
@
The tree components of the fitted model can be visualized as follows.
\setkeys{Gin}{width=.9\linewidth}
\begin{figure}[!t]
\centering
<<fig=TRUE>>=
plot(mod, fontSize = 15)
@
\caption{The mutagenetic trees mixture model for the \texttt{HIV} dataset.}
\label{fig:hiv}
\end{figure}
When the mixture model contains a large number of tree components it is convenient to be able to plot a specific tree component. The following code plots the
second tree component of the mixture model learned from the \texttt{HIV} dataset.
<<>>=
dev.off()
plot(mod, k=2, fontSize = 14)
@
It is assumed that mixture models with at least two components always have the noise (star) component as a first component. When only one tree component is fitted to the given data it can be either a star or a nontrivial tree component based on the choice of the parameter \texttt{noise}.
The mixture components comprising the model are represented as a list of directed \texttt{graphNEL} objects, and their weights (the mixture parameters) are given as a numeric vector. There are functions for getting the mixture parameters of the model, the number of tree components, the dataset used for estimating the model, etc.
<<>>=
Weights(mod)
Trees(mod)
getTree(object = mod, k = 2) ## Get a specific tree component k
edgeData(getTree(object = mod, k = 2), attr = "weight") ## Conditional probabilities assigned to edges of the 2nd tree component
numTrees(mod)
getData(mod)
@
This class can also contain other useful information connected with the mixture model: an indicator for the presence of the star component, a matrix of the responsibilities of each tree component for each pattern of the data used for learning the model, a matrix of the complete dataset in case of missing data, etc.
<<>>=
Star(mod)
Resp(mod)
CompleteMat(mod)
@
The mutagenetic trees mixture model encodes a probability distribution on the set of all possible patterns \cite{Beer12005}.
<<>>=
distr <- distribution(model = mod)
distr$probability
@
One can also generate a random mutagenetic mixture model. In this case each tree component from the model is drawn uniformly at random from the tree topology space by using the Pr\"ufer encoding of trees. The number of tree components and the number of genetic events have to be specified. Additionally, one can specify the range from which the edge weights of the tree components are randomly drawn ($[0.2, 0.8]$ in the example below).
<<>>=
rand.mod <- generate(K = 3, no.events = 9, noise.tree = TRUE, prob = c(0.2, 0.8)) ## K and no.events chosen for illustration
show(rand.mod)
@
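To make the Pr\"ufer encoding concrete, the following base-\texttt{R} sketch decodes a Pr\"ufer sequence into the edges of the labeled tree it encodes. It is an illustration only, not the code used inside \texttt{Rtreemix}: the helper \texttt{prufer\_to\_edges} is a hypothetical name, and the sketch ignores how the package roots the tree and assigns edge weights. Because every sequence of length $n-2$ over $\{1,\dots,n\}$ corresponds to exactly one labeled tree on $n$ vertices, drawing the sequence uniformly draws the tree topology uniformly.

```r
# Decode a Prufer sequence (length n - 2, entries in 1..n) into the
# n - 1 edges of the labeled tree on vertices 1..n that it encodes.
prufer_to_edges <- function(code) {
  n <- length(code) + 2
  degree <- tabulate(code, nbins = n) + 1   # a vertex is a leaf iff degree 1
  edges <- matrix(NA_integer_, nrow = n - 1, ncol = 2)
  for (i in seq_along(code)) {
    leaf <- which(degree == 1)[1]           # smallest remaining leaf
    edges[i, ] <- c(leaf, code[i])
    degree[leaf] <- degree[leaf] - 1        # remove the leaf
    degree[code[i]] <- degree[code[i]] - 1  # its neighbour loses one edge
  }
  edges[n - 1, ] <- which(degree == 1)      # join the last two vertices
  edges
}

set.seed(1)
n <- 6
code <- sample.int(n, n - 2, replace = TRUE)  # uniform random sequence
edges <- prufer_to_edges(code)
edges
```

Each row of \texttt{edges} is one undirected edge; a tree on $n$ vertices always has $n-1$ of them.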
It is also possible to fit a mixture model and analyze its variance by deriving confidence intervals for the mixture parameters and the edge weights
(resulting from a bootstrap analysis). An example for this is given in the PDF file \textit{ExtendedVignette} in the \textit{doc} folder of the package.
\subsection{The \texttt{likelihoods} method}
The package \texttt{Rtreemix} implements the function \texttt{likelihoods} which calculates the (log, weighted) likelihoods for
the patterns in a given dataset (\texttt{RtreemixData} object) derived with respect to a given \texttt{RtreemixModel}.
The likelihoods are contained in an object of class \texttt{RtreemixStats} which extends the \texttt{RtreemixData} class.
The number of the genetic events in the patterns from the given dataset has to be equal to the number of genetic events in
the branchings from the given mixture model. In the code that follows we calculate the likelihoods of the \texttt{HIV} dataset
with respect to the fitted mixture model.
<<>>=
mod.stat <- likelihoods(model = mod, data = hiv.data)
Model(mod.stat)
getData(mod.stat)
LogLikelihoods(mod.stat)
WLikelihoods(mod.stat)
@
Given the weighted likelihoods, one can easily derive the responsibilities of the model components for generating the patterns
in the specified dataset.
<<>>=
getResp(mod.stat)
@
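The step from weighted likelihoods to responsibilities can be sketched in base \texttt{R}. The matrix \texttt{wlik} below holds made-up numbers, and the sketch assumes the usual mixture-model definition in which the responsibility of a component for a pattern is that component's share of the pattern's total weighted likelihood. It is not taken from the package source.

```r
# Hypothetical weighted likelihoods: rows are patterns, columns are tree
# components; entry [i, k] stands for alpha_k * L(pattern_i | tree_k).
wlik <- rbind(c(0.02, 0.10),
              c(0.05, 0.05),
              c(0.09, 0.01))

# Responsibility of component k for pattern i: its share of the row total.
resp <- wlik / rowSums(wlik)
resp
```

Every row of \texttt{resp} sums to one, so each row can be read as a probability distribution over the components for that pattern.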
\subsection{The \texttt{sim} method}
The mutagenetic trees mixture model encodes a probability distribution on the set of all possible patterns for a specified set of genetic events.
The \texttt{sim} method provides the possibility of simulating (drawing) patterns from a given \texttt{RtreemixModel}. The simulated patterns
are then returned as an \texttt{RtreemixData} object. Let's draw a specified number of patterns from our randomly generated model.
<<>>=
data <- sim(model = rand.mod, no.draws = 300)
show(data)
@
When the sampling mode and its sampling parameter are specified in addition to the mixture model, this function simulates
patterns together with their waiting and sampling times from the respective model. The waiting and sampling times result from a waiting
time simulation along the branchings of the mixture model. The \texttt{sim} method presents the results in an \texttt{RtreemixSim} object.
An example illustrating this is given in the PDF file \textit{ExtendedVignette}.
\subsection{The genetic progression score (\texttt{GPS})}
Assuming independent Poisson processes for the occurrence of events on the edges of each mixture component
and for the sampling time of the disease, waiting times can be mapped on the tree edges of the branchings.
Consequently, a genetic progression score (\texttt{GPS}) that incorporates correlations among events and time intervals among
occurrences of events can be associated to the mixture model as proposed in \cite{Rahn2005}. The \texttt{GPS} estimates the
stage of the disease and is important for estimating survival times and giving patients proper therapies.
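The waiting time idea can be illustrated with a small base-\texttt{R} simulation on a single tree. Everything in it is an assumption made for illustration: the tree (\texttt{parent}), the edge rates (\texttt{rate}) and the sampling rate are invented, and the code is not the simulation implemented in \texttt{Rtreemix}. Each edge gets an independent exponential waiting time, an event occurs once all edges on the path from the root have fired, and the observed pattern contains the events that occurred before an exponential sampling time.

```r
set.seed(42)

# Hypothetical tree on events 1..3 rooted at 0 (the null event):
# parent[j] is the parent of event j, rate[j] the rate on the edge into j.
parent <- c(0, 1, 0)
rate   <- c(1.0, 0.5, 2.0)
sampling.rate <- 1   # rate of the exponential sampling time

simulate_once <- function() {
  wait <- rexp(length(rate), rate = rate)   # waiting time on each edge
  occur <- numeric(length(rate))
  for (j in seq_along(parent)) {            # parents precede children here
    start <- if (parent[j] == 0) 0 else occur[parent[j]]
    occur[j] <- start + wait[j]
  }
  t.sample <- rexp(1, rate = sampling.rate)
  as.integer(occur < t.sample)              # events seen at sampling time
}

patterns <- t(replicate(1000, simulate_once()))
colMeans(patterns)   # empirical frequency of each event
```

For an event attached directly to the root with rate $\lambda$ and sampling rate $\lambda_s$, the probability of observing it is $\lambda / (\lambda + \lambda_s)$, so the first column should have a frequency close to $0.5$ in this setup. Event 2 can only appear in patterns that also contain its parent, event 1.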
In the \texttt{Rtreemix} package the method \texttt{gps} calculates the \texttt{GPS} of a given set of patterns
with respect to a specified mixture model (an \texttt{RtreemixModel} object). The \texttt{GPS} values are derived in a waiting time
simulation for a given large number of simulation iterations, and for a specified sampling mode (``constant'' or ``exponential'')
and its corresponding sampling parameter. The result of the \texttt{GPS} calculation is given as an
object of the class \texttt{RtreemixGPS} which extends the class \texttt{RtreemixData}. Moreover, with the function \texttt{confIntGPS}
the package makes it possible to analyze the variance of the \texttt{GPS} values.
This function first calculates the \texttt{GPS} for the patterns in a given dataset based on a fitted mutagenetic trees
mixture model with \texttt{K} components. Then, it derives 95\% confidence intervals for the \texttt{GPS} values with
bootstrap analysis. The data and \texttt{K} have to be specified. The confidence intervals reflect the variability of the
estimated \texttt{GPS} values. The results are again given as an \texttt{RtreemixGPS} object. The usage of the functions \texttt{gps}
and \texttt{confIntGPS} is demonstrated in the PDF file \textit{ExtendedVignette}.
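The bootstrap step can be sketched generically in base \texttt{R}. The function \texttt{score\_patterns} is a hypothetical stand-in for ``fit a mixture model and compute the \texttt{GPS}'' and has nothing to do with the real computation. The sketch only shows the resampling scheme assumed here: resample the patients with replacement, rescore the original patterns, and take the 2.5\% and 97.5\% percentiles per pattern.

```r
set.seed(7)
data <- matrix(rbinom(50 * 4, 1, 0.4), nrow = 50)   # 50 toy patterns, 4 events

# Hypothetical stand-in for "fit a model, then score each pattern";
# rarer events contribute more to the score.
score_patterns <- function(train, patterns) {
  as.vector(patterns %*% (1 - colMeans(train)))
}

B <- 200
boot <- replicate(B, {
  idx <- sample.int(nrow(data), replace = TRUE)     # resample patients
  score_patterns(data[idx, , drop = FALSE], data)   # rescore original patterns
})

# One row per pattern, two columns: lower and upper percentile bounds.
ci <- t(apply(boot, 1, quantile, probs = c(0.025, 0.975)))
point <- score_patterns(data, data)
head(cbind(point, ci))
```

The result has the same shape as the one described for \texttt{confIntGPS}: one score per pattern and a two-column matrix of interval bounds.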
\section{Stability analysis with the \texttt{Rtreemix} package}
The \texttt{Rtreemix} package implements functions that can be used for assessing the quality and analyzing the stability of different attributes of the
mutagenetic trees mixture model (the probability distributions induced by the model, the number of tree components, the
topologies of the tree components, the GPS values and so on). This is usually done by first choosing a model attribute of
interest and its appropriate similarity measure, and then inspecting its stability in a simulation setting.
In the simulation study, attributes of simulated true mixture models are typically compared with the corresponding attributes of
models fitted to observations drawn from these true models. The similarity between the true and the fitted model is then
compared to the similarities between the true model and a sufficient number of random models sampled uniformly from the mixture model space.
The quality of a fitted attribute is then assessed by estimating a p-value as the percentage of cases in which the true model is closer
to a random model than to the fitted model with respect to the chosen model attribute. More details about performing stability analysis
of mutagenetic trees mixture models and interpreting the results can be found in \cite{Bogo2008}. Simple code examples for the
stability analysis of mutagenetic trees mixture models are given in the \textit{doc} folder of the package in the PDF file \textit{ExtendedVignette}.
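The p-value estimate described above can be sketched in base \texttt{R} with toy data. The attribute is taken to be a probability vector over eight patterns, the random ``models'' are simply normalized uniform draws, and the $L_1$ distance is an arbitrary choice of dissimilarity. All of these are assumptions for illustration and do not reproduce the stability functions of the package.

```r
set.seed(3)

# Hypothetical attribute: a probability vector over 8 patterns.
normalize <- function(x) x / sum(x)
true.attr   <- normalize(runif(8))
fitted.attr <- normalize(true.attr + runif(8, 0, 0.02))   # close to the truth

# Distance used as the dissimilarity measure; L1 is an arbitrary choice here.
l1 <- function(p, q) sum(abs(p - q))

d.fit  <- l1(true.attr, fitted.attr)
d.rand <- replicate(1000, l1(true.attr, normalize(runif(8))))

# Fraction of random models that are closer to the truth than the fitted one.
p.value <- mean(d.rand < d.fit)
p.value
```

A small value indicates that the fitted attribute is closer to the true one than almost all random alternatives.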

\bibliography{Rtreemix}
\bibliographystyle{unsrt}
\end{document}

[/Rtreemix] permanent link

Sat, 17 Nov 2007

Rtreemix
added the Rtreemix package and removed the bim package

[/Rtreemix] permanent link