Bioconductor Code: ClassifyR

Raw Blame Patch Log History
% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/rankingPlot.R
\name{rankingPlot}
\alias{rankingPlot}
\alias{rankingPlot,list-method}
\alias{rankingPlot,ClassifyResult-method}
\title{Plot Pair-wise Overlap of Ranked Features}
\usage{
\S4method{rankingPlot}{ClassifyResult}(results, ...)

\S4method{rankingPlot}{list}(
  results,
  topRanked = seq(10, 100, 10),
  comparison = "within",
  referenceLevel = NULL,
  characteristicsList = list(),
  orderingList = list(),
  sizesList = list(lineWidth = 1, pointSize = 2, legendLinesPointsSize = 1, fonts = c(24,
    16, 12, 12, 12, 16)),
  lineColours = NULL,
  xLabelPositions = seq(10, 100, 10),
  yMax = 100,
  title = if (comparison[1] == "within") "Feature Ranking Stability" else
    "Feature Ranking Commonality",
  yLabel = if (is.null(referenceLevel)) "Average Common Features (\%)" else
    paste("Average Common Features with", referenceLevel, "(\%)"),
  margin = grid::unit(c(1, 1, 1, 1), "lines"),
  showLegend = TRUE,
  parallelParams = bpparam()
)
}
\arguments{
\item{results}{A list of \code{\link{ClassifyResult}} objects.}

\item{...}{Not used by end user.}

\item{topRanked}{A sequence of thresholds of number of the best features to
use for overlapping.}

\item{comparison}{Default: \code{"within"}. The aspect of the experimental design to
compare. Can be any characteristic that all results share or special value
"within" to compared between all pairwise iterations of cross-validation.}

\item{referenceLevel}{The level of the comparison factor to use as the
reference to compare each non-reference level to. If \code{NULL}, then each
level has the average pairwise overlap calculated to all other levels.}

\item{characteristicsList}{A named list of characteristics. The name must be
one of \code{"lineColour"}, \code{"pointType"}, \code{"row"} or
\code{"column"}. The value of each element must be a characteristic name, as
stored in the \code{"characteristic"} column of the results' characteristics
table.}

\item{orderingList}{An optional named list. Any of the variables specified
to \code{characteristicsList} can be the name of an element of this list and
the value of the element is the order in which the factor should be
presented in.}

\item{sizesList}{Default: \code{lineWidth = 1, pointSize = 2,
legendLinesPointsSize = 1, fonts = c(24, 16, 12, 12, 12, 16)}. A list which
must contain elements named \code{lineWidth}, \code{pointSize},
\code{legendLinesPointsSize} and \code{fonts}. The first three specify the
size of lines and points in the graph, as well as in the plot legend.
\code{fonts} is a vector of length 6.  The first element is the size of the
title text. The second element is the size of the axes titles.  The third
element is the size of the axes values. The fourth element is the size of
the legends' titles.  The fifth element is the font size of the legend
labels. The sixth element is the font size of the titles of grouped plots,
if any are produced. Each list element must numeric.}

\item{lineColours}{A vector of colours for different levels of the line
colouring parameter, if one is specified by
\code{characteristicsList[["lineColour"]]}. If none are specified but,
\code{characteristicsList[["lineColour"]]} is, an automatically-generated
palette will be used.}

\item{xLabelPositions}{Locations where to put labels on the x-axis.}

\item{yMax}{The maximum value of the percentage to plot.}

\item{title}{An overall title for the plot.}

\item{yLabel}{Label to be used for the y-axis of overlap percentages.}

\item{margin}{The margin to have around the plot.}

\item{showLegend}{If \code{TRUE}, a legend is plotted next to the plot. If
FALSE, it is hidden.}

\item{parallelParams}{An object of class \code{\link{MulticoreParam}} or
\code{\link{SnowParam}}.}
}
\value{
An object of class \code{ggplot} and a plot on the current graphics
device, if \code{plot} is \code{TRUE}.
}
\description{
Pair-wise overlaps can be done for two types of analyses. Firstly, each
cross-validation iteration can be considered within a single classification.
This explores the feature ranking stability. Secondly, the overlap may be
considered between different classification results. This approach compares
the feature ranking commonality between different results. Two types of
commonality are possible to analyse. One summary is the average pair-wise
overlap between all possible pairs of results. The second kind of summary is
the pair-wise overlap of each level of the comparison factor that is not the
reference level against the reference level. The overlaps are converted to
percentages and plotted as lineplots.
}
\details{
If \code{comparison} is \code{"within"}, then the feature selection overlaps
are compared within a particular analysis. The result will inform how stable
the selections are between different iterations of cross-validation for a
particular analysis. Otherwise, the comparison is between different
cross-validation runs, and this gives an indication about how common are the
features being selected by different classifications.

Calculating all pair-wise set overlaps for a large cross-validation result
can be time-consuming.  This stage can be done on multiple CPUs by providing
the relevant options to \code{parallelParams}.
}
\examples{

  predicted <- DataFrame(sample = sample(10, 100, replace = TRUE),
                          permutation = rep(1:2, each = 50),
                          class = rep(c("Healthy", "Cancer"), each = 50))
  actual <- factor(rep(c("Healthy", "Cancer"), each = 5))
  allFeatures <- sapply(1:100, function(index) paste(sample(LETTERS, 3), collapse = ''))
  rankList <- list(allFeatures[1:100], allFeatures[c(15:6, 1:5, 16:100)],
                   allFeatures[c(1:9, 11, 10, 12:100)], allFeatures[c(1:50, 61:100, 60:51)])
  result1 <- ClassifyResult(DataFrame(characteristic = c("Data Set", "Selection Name", "Classifier Name", "Cross-validation"),
                            value = c("Melanoma", "t-test", "Diagonal LDA", "2 Permutations, 2 Folds")),
                            LETTERS[1:10], allFeatures, rankList,
                            list(rankList[[1]][1:15], rankList[[2]][1:15],
                                 rankList[[3]][1:10], rankList[[4]][1:10]),
                            list(function(oracle){}), NULL,
                            predicted, actual)
  
  predicted[, "class"] <- sample(predicted[, "class"])
  rankList <- list(allFeatures[1:100], allFeatures[c(sample(20), 21:100)],
  allFeatures[c(1:9, 11, 10, 12:100)], allFeatures[c(1:50, 60:51, 61:100)])
  result2 <- ClassifyResult(DataFrame(characteristic = c("Data Set", "Selection Name", "Classifier Name",
                                                         "Cross-validations"),
                            value = c("Melanoma", "t-test", "Random Forest", "2 Permutations, 2 Folds")),
                            LETTERS[1:10], allFeatures, rankList,
                            list(rankList[[1]][1:15], rankList[[2]][1:15],
                                 rankList[[3]][1:10], rankList[[4]][1:10]),
                            list(function(oracle){}), NULL,
                            predicted, actual)
                            
  rankingPlot(list(result1, result2), characteristicsList = list(pointType = "Classifier Name"))

}
\author{
Dario Strbenac
}