Data Description Toolbox (dd_tools) 2.0.0

A Matlab toolbox for data description, outlier and novelty detection,
for PRTools 5.0
Contents

1 This manual
2 Introduction
  2.1 Classification in Prtools
  2.2 What is one-class classification?
  2.3 Error minimization in one-class
  2.4 Receiver Operating Characteristic curve
  2.5 Introduction dd_tools
3 Datasets
  3.1 Creating one-class datasets
  3.2 Inspecting one-class datasets
4 Classifiers
  4.1 Prtools classifiers
  4.2 Creating one-class classifiers
  4.3 Inspecting one-class classifiers
  4.4 Available classifiers
  4.5 Combining one-class classifiers
  4.6 Multi-class classification using one-class classifiers
  4.7 Note for programmers
5 Error computation
  5.1 Basic errors
  5.2 Precision and recall
  5.3 Area under the ROC curve
  5.4 Cost curve
  5.5 Generating artificial outliers
  5.6 Cross-validation
6 General remarks
Chapter 1
This manual
The dd_tools Matlab toolbox provides tools, classifiers and evaluation
functions for research in one-class classification (or data description).
The dd_tools toolbox is an extension of the Prtools toolbox, more
specifically Prtools 5.0. Prtools defines Matlab objects for datasets and
mappings, called prdataset and prmapping. dd_tools uses these objects and
their methods, but extends (and sometimes restricts) them to one-class
classification. This means that before you can use dd_tools to its full
potential, you need to know a bit about Prtools. When you are completely
new to pattern recognition, Matlab or Prtools, please familiarize yourself
a bit with them first (see http://www.prtools.org for more information on
Prtools).
This short document should give the reader some idea of what the data
description toolbox (dd_tools) for Prtools offers. It provides some
background information about one-class classification and about some
implementation issues, and it gives some practical examples. It does not
try to be complete, though, because each new version of dd_tools will
probably include new commands and possibilities. The file Contents.m in the
dd_tools directory gives the up-to-date list of all functions and
classifiers in the toolbox. The most up-to-date information can be found on
the dd_tools webpage, currently at:
http://prlab.tudelft.nl/david-tax/dd_tools.html.
Note that this is not a cookbook solving all your problems; it should point
out the basic philosophy of dd_tools. You should always have a look at the
help provided by each command (try help dd_tools). The help should show all
possible combinations of parameter arguments and output arguments. When a
parameter is listed in the Matlab code but not in the help, it often
indicates an undocumented feature, which means: be careful! In that case
I'm not 100% sure whether it will work, how useful it is, and whether it
will survive the next dd_tools version.
In chapter 2 a basic introduction to one-class classification/novelty
detection/outlier detection is given: what is the goal, and how is the
performance measured? You can skip that if you're familiar with one-class
classification. In section 2.5 the basic idea of dd_tools is given. Then in
chapters 3 and 4 the specific use of datasets and classifiers is shown. In
chapter 5 the computation of the error is explained, and finally in chapter
6 some general remarks are given.
Chapter 2
Introduction
target class : this class is assumed to be sampled well, in the sense that
many (training) example objects of this class are available. This does not
necessarily mean that the sampling of the training set is done completely
according to the target distribution found in practice. It might be that
the user sampled the target class according to his/her idea of how
representative these objects are. It is assumed, though, that the training
data reflect the area that the target data covers in the feature space.
outlier class : this class can be sampled very sparsely, or can be totally
absent. It might be that this class is very hard to measure, or it might be
very expensive to do the measurements on these types of objects. In
principle, a one-class classifier should be able to work solely on the
basis of target examples. The other extreme is also possible, where the
outliers are so abundant that a good sampling of the outliers is not
feasible.
2.3 Error minimization in one-class

In order to find a good one-class classifier, two types of errors have to
be minimized: the fraction false positives and the fraction false
negatives. In table 2.1 all possible classification situations for
one-class classification are shown.
The fraction false negatives can be estimated using (for instance)
cross-validation on the target training set. Unfortunately, the fraction
false positives is much harder to estimate. When no example outlier objects
are available, this fraction cannot be estimated at all. Minimizing just
the fraction false negatives will result in a classifier that labels all
objects as target objects. In order to avoid this degenerate solution,
outlier examples have to be available, or artificial outliers have to be
generated (see also section 5.5).
2.4 Receiver Operating Characteristic curve

[Figure: an example ROC curve; horizontal axis: outliers accepted (FP),
vertical axis: targets accepted (TP).]
2.5 Introduction dd_tools

In dd_tools it is possible to define special one-class datasets and
one-class classifiers. Furthermore, the toolbox provides methods for
generating artificial outliers, estimating the different errors the
classifiers make (false positive and false negative errors), estimating the
ROC curve, the AUC (Area Under the ROC curve) error and the AUC over a
limited integration domain, and it provides many classifiers.
This is reflected in the setup of this manual. Each of the ingredients is
discussed in a separate chapter:
1. One-class datasets,
2. One-class classifiers,
3. Error computation.
Before you can use the toolbox, you have to use Prtools and you have to put
your data into a special dataset format. Let us first start with the data.
Chapter 3
Datasets
3.1 Creating one-class datasets

• gendatoc: this function creates a one-class dataset directly from two
plain data matrices, one containing target objects and one containing
outlier objects:

>> xt = randn(40,2);      % 40 target objects in 2D
>> xo = gendatb([20,0]);  % 20 outlier objects from a banana distribution
>> x = gendatoc(xt,xo);

Either data matrix may be empty, so it is also possible to create a
one-class dataset containing only outliers:

>> xo = 10*randn(25,2);
>> x = gendatoc([],xo);
Note that when xt or xo are already labeled Prtools datasets, this label
information is lost: all data in xt will be labeled target and all data in
xo outlier, without exception.
• oc_set: this function relabels an existing Prtools dataset such that one
of the classes becomes the target class, and all others become outlier. You
have to supply the label of the class that you want to be the target class.
Assume you generate data from a banana-shaped distribution, and you want
the class labeled 1 to be the target class:
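>> x = gendatb([20 20]);  % two-class banana data, 20 objects per class
>> x = oc_set(x,'1');     % class '1' becomes target, class '2' outlier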
Now you still have 40 objects; half are labeled target, the other half
outlier.
The function oc_set also accepts several classes to be labeled as target
class. When a 10-class problem is loaded, a subset of these classes can be
assigned to be the target class:
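>> % sketch: suppose a is a dataset with 10 classes (labels 1..10); then
>> x = oc_set(a,[1 2 3]);  % classes 1, 2 and 3 together become target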
When you don’t supply labels, it is assumed that all data is target data:
>> x = rand(20,2);
>> x = oc_set(x)
• target_class: this function labels one of the classes as target
(identical to oc_set), but furthermore removes all other objects from the
dataset:
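>> x = gendatb([20 20]);
>> x = target_class(x,'1');  % keep only the 20 objects of class '1'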
Now dataset x just contains 20 target objects. You can achieve the same in
this way:
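>> x = oc_set(gendatb([20 20]),'1');
>> x = target_class(x);      % extract the target objects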
3.2 Inspecting one-class datasets

To check whether a dataset is a valid one-class dataset, use isocset:

>> isocset(x)

This is often not very useful for normal users, but it becomes important
when you're constructing one-class classifiers yourself.
When a one-class dataset is constructed, you can extract which objects are
target and which are outlier:
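>> [It,Io] = find_target(x);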
Thus find_target returns the indices of the target objects (in It) and the
outlier objects (in Io). This is often used to split a one-class dataset
into target and outlier objects. This is cheaper than running target_class
twice:
>> xt = target_class(x);
>> xo = target_class(x,’outlier’);
This last construction has the drawback that xo is now labeled target. The
advantage of using target_class is that in one command you can extract the
target class from the data into a new Prtools dataset. This avoids the
lengthy construction It = find_target(x); xt = x(It,:).
The only special thing about one-class datasets is that they contain just
one or two classes, with the labels target and/or outlier. When you define
your own dataset with these two labels, it will automatically be recognized
as a one-class dataset.[1] Other datasets with two classes are not one-class
datasets, because there it is not clear which of the two should be
considered the target class.

[1] The toolbox also offers the function relabel for relabeling a dataset.
It redefines the field lablist in the dataset. The user first has to find
out in which order the classes are labeled in the dataset before he/she can
relabel them. Therefore I do not recommend it, although it becomes very
useful when you want to label several classes as target and the rest as
outlier.
Chapter 4
Classifiers
4.2 Creating one-class classifiers

The one-class classifiers should be trained on datasets like the ones from
the previous chapter. Many one-class classifiers do not know how to use
example outliers in their training data. They may therefore complain, or
just ignore the outlier objects completely, if you supply them in your
training data. For now, I consider this the responsibility of the user...
All one-class classifiers share the same characteristics:
1. Their first argument is always the training dataset,
2. Their second argument is always the error they may make on the target
class (the fraction false negative),
3. In the remaining arguments, classifier-specific parameters can be given
(for instance, for the k-means clustering method, it is the number of
clusters k).
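For instance, a typical call looks like this (training data, the allowed
error on the target class, and one extra parameter):

>> x = target_class(gendatb([50,0]),'1');
>> w = kmeans_dd(x,0.1,5);  % reject at most 10% of the targets, k = 5 clusters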
These one-class classifiers are normal mappings in the Prtools sense, so
they can be plotted by plotc and combined with other mappings by [], *,
etc. To check whether a classifier is a one-class classifier (i.e. it
labels objects as target or outlier), use isocc.
>> x = oc_set(gendatb([50,10]),'1')
>> scatterd(x,'legend')
>> w = svdd(target_class(x),0.1,8);
>> plotc(w)
>> w = svdd(x,0.1,8);
>> plotc(w)

[Figure: scatter plot of the data (Feature 1 versus Feature 2) with the
decision boundaries of the two svdd classifiers.]
4.3 Inspecting one-class classifiers

A trained one-class classifier can be unpacked with W = +w. This W contains
several sub-fields with the optimized parameters. Which parameters are
stored depends on the classifier. The format is free, except that one
parameter, threshold, should always be there.
An example is the support vector data description. In a quadratic
optimization procedure the weights α are optimized. Assume I want to have
10% of the data on the boundary, using a Gaussian kernel with kernel
parameter σ = 5 (forget the details, they are not important). Now I'm
interested in what the optimal α's will be:
>> x = target_class(gendatb([50,0]),'1');
>> w = svdd(x,0.1,5);
>> W = +w;   % unpack the mapping into a structure
>> W.a       % the optimized weights
Note: the SVDD has been changed in this version of the toolbox. Have a look
at the remarks at the end of this manual (chapter 6).
Another example is the Mixture of Gaussians. Let us see if we can plot the
boundary and the centers of the clusters. First, we create some data and
train the classifier (using 5 clusters):
>> x = target_class(gendatb([100,0]),’1’);
>> w = mog_dd(x,0.1,5);
Now we inspect the trained classifier and give some visual feedback:
>> W = +w
>> scatterd(x);
>> plotc(w); hold on;
>> scatterd(W.m,’r*’)
Apparently, the means of the clusters are stored in the m field of the structure
in the classifier.
4.4 Available classifiers

• rob_gauss_dd: Gaussian target distribution, but robustified. Can be
significantly better for data with long tails. The mathematical description
of the method is identical to gauss_dd; the difference is in the
computation of µ and Σ. This procedure reweights the objects in the
training set according to their proximity to the (previously estimated)
mean. Remote objects (candidate outliers) are downweighted such that a more
robust estimate is obtained. It will not be strictly robust, because these
outliers will always have some influence.
• mog_dd: Mixture of Gaussians. Here the target class is modeled using a
mixture of K Gaussians, to create a more flexible description. The model
looks like:

    f(x) = \sum_{i=1}^{K} P_i \exp\left( -(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) \right)    (4.3)
This 'background' outlier cluster is fixed and will not be adapted in the
EM algorithm (although it will be used in the computation of the
probability density). This results in the following model:

    f(x) = \sum_{i=1}^{K_t} P_i \exp\left( -(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) \right)
           - P_* \exp\left( -(x - \mu_*)^T \Sigma_*^{-1} (x - \mu_*) \right)
           - \sum_{j=1}^{K_o} P_j \exp\left( -(x - \mu_j)^T \Sigma_j^{-1} (x - \mu_j) \right)    (4.5)
• kmeans_dd: the k-means data description, where the data is described by
k clusters, placed such that the average distance to a cluster center is
minimized. The cluster centers c_i are placed using the standard k-means
clustering procedure ([Bis95]). The target class is then characterized by:

    f(x) = \min_i \| x - c_i \|^2
• kcenter_dd: the k-center data description, where the data is described by
k clusters, placed such that the maximal distance to a cluster center is
minimized [YD98]. When the clusters are placed, the mathematical
description of the method is similar to kmeans_dd.
• pca_dd: the principal component analysis data description. The target
data is described by a linear subspace spanned by k eigenvectors of the
data covariance matrix, and a new object is evaluated by its reconstruction
error with respect to this subspace. The default setting of the method is
that the k eigenvectors with the largest eigenvalues are used. It appears
that this does not always give the best performance [TM03]. It is also
possible to choose the k eigenvectors with the smallest eigenvalues. This
is an extra feature in the toolbox.
• som_dd: the Self-Organizing Map data description. The SOM is a
clustering method in which the cluster centers are constrained in their
placing. The construction of the SOM is such that all objects in the
feature space retain as much as possible their distance and neighborhood
relations in the mapped space.
The mapping is performed by a specific type of neural network, equipped
with a special learning rule. Assume that we want to map a d-dimensional
measurement space to a k-dimensional feature space, where k < d. In the
feature space, we define a finite orthogonal grid with K × K grid points
x^{SOM}. At each grid point we place a neuron. Each neuron stores a
d-dimensional vector that serves as a cluster center. By defining a grid
for the neurons, each neuron does not only have neighboring neurons in the
measurement space, it also has neighboring neurons in the grid. During the
learning phase, neighboring neurons in the grid are forced to also be
neighbors in the measurement space. By doing so, the local topology is
preserved. In this implementation of the SOM, only k = 1 or k = 2 is
allowed.
To evaluate whether a new object fits this model, again a reconstruction
error is defined. This reconstruction error is the difference between the
object and its closest cluster center (neuron) in the SOM:

    f(x) = \min_i \| x - x_i^{SOM} \|^2    (4.11)
to the k-th nearest neighbor is used. Slightly more advanced methods use
averaged distances, which work somewhat better. This simple method is often
very good in high-dimensional feature spaces.
• incsvdd: the incremental support vector data description, which uses its
own optimization routine. This makes it possible to optimize the SVDD
without the use of an external quadratic programming optimizer, and to use
any kernel. In future versions this will be adapted to cope with
dynamically changing data (data distributions that change in time).
Currently this is my preferred way to train an SVDD.
• dlpdd: the linear programming distance-data description [PTD03]. This
data description is specifically constructed to describe target objects
that are represented in terms of distances to other objects. In some cases
it may be much easier to define distances between objects than informative
features (for instance when shapes have to be distinguished). To stress
that the classifier operates on distance data, the name starts with a d.
The classifier basically has the following form:

    f(x) = \sum_i w_i d(x, x_i)    (4.12)
The weights w are optimized such that just a few weights stay non-zero,
and the boundary is as tight as possible around the data.
• lpdd: the linear programming data description. The fact that dlpdd uses
distance data instead of standard feature data makes it harder to simply
apply it to normal feature data. Therefore lpdd was created; it is
basically a wrapper combining some high-level algorithms to make the
application of dlpdd simpler.
• mpm_dd: the minimax probability machine by Lanckriet [LEGJ03]. It
tries to find the linear classifier that separates the data from the origin,
rejecting maximally a specific fraction of the target data. In the original
version, an upper bound on this rejection error is used (applying a very
general bound using only the mean and covariance matrix of the target
data). Unfortunately, in practice this bound is so loose that it is not
useful. Therefore the rejection threshold is re-derived from the target
data.
• stump_dd: puts a threshold on one of the features (by default the first
feature). Everything above the threshold is labeled target, everything
below is outlier. Although it is a stupid classifier, it can be used as a
base classifier to create more complex ones.
To get more information about each individual classifier, have a look at
its help (for instance help gauss_dd).
4.5 Combining one-class classifiers

Many real-world datasets have a much more complicated distribution than can
be modeled by, say, a mixture of Gaussians. It appears that it can be very
beneficial to combine classifiers. Each of the classifiers can focus on a
specific feature or characteristic in the data. By combining the
classifiers, one hopes to combine all their strong points and obtain a much
more flexible model.
As with normal classifiers, there is the problem that the outputs of the
classifiers have to be rescaled in such a way that they become comparable.
For trainable combining this is not very essential, but when fixed
combination rules like the mean-rule, max-rule or median-rule are
considered, the outputs of the classifiers should be rescaled.
In Prtools, the outputs of many classifiers are scaled by fitting a sigmoid
function and then normalized such that the sum of the outputs becomes 1. In
this way, the outputs of the classifier can be interpreted as (an
approximation to) the class posterior probabilities. This scaling is done
by the function classc (see also prex_combining.m).
For one-class classifiers there is a small complication. Here there are two
types of classifiers: classifiers based on density estimates, and
classifiers based on distances to a model. Normalizing the first type of
classifier is no problem; it directly follows the strategy of Prtools. The
output of the second type of classifier, on the other hand, causes
problems. Imagine an object belonging to the target class. It will have a
distance to the target model smaller than some threshold. According to
Prtools, objects are assigned to the class with the highest output.
Therefore, in dd_tools, the distances of the distance-based classifiers are
negated, such that the output for the target class will be higher than the
threshold.
To normalize the outputs of these distance-based classifiers, their outputs
have to be negated again. This means that the standard classc cannot be
applied. For this, a new function dd_normc is introduced. The standard
approach is to multiply all classifiers by default with dd_normc. It does
not change the classification by the classifier.
>> a = target_class(gendatb);
>> w1 = gauss_dd(a,0.1); % define 4 arbitrary OC classifiers
>> w2 = pca_dd(a,0.1,1);
>> w3 = kmeans_dd(a,0.1,3);
>> w4 = mog_dd(a,0.1,2);
>> % combine them with the mean comb rule:
>> W = [w1*dd_normc w2*dd_normc w3*dd_normc w4*dd_normc] * meanc;
>> scatterd(a);
>> plotc(W);
The result is an elliptical decision boundary around all data, and inside
the ellipse a linear decision boundary between the two classes.
The second approach is to fit a one-class classifier on each of the
individual classes in the multi-class dataset. This has the advantage that
a specialized one-class classifier can be fitted on each separate class,
making the result potentially much more flexible and powerful. The drawback
is that it is not so simple to compare the outputs of different one-class
classifiers: how to compare an output from a Gaussian model with an output
of a k-means clustering? This is solved by normalizing the output of each
of the one-class classifiers. The normalization is different for each of
the classifiers, and it is performed by dd_normc.
Training using this second approach is implemented by the function multic.
Assume we want to fit a Gaussian model on the first class, and a mixture of
Gaussians (with two clusters) on the second class:
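A sketch of how this could look (assuming that multic accepts a cell array
of untrained one-class classifiers; see help multic for the exact
interface):

>> a = gendatb([30 30]);                     % a two-class dataset
>> u = {gauss_dd([],0.1) mog_dd([],0.1,2)};  % untrained one-class classifiers
>> w = multic(a,u);                          % one classifier per class
>> scatterd(a); plotc(w)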
4.7 Note for programmers

% W = RANDOM_DD(A,FRACREJ)
%
% This is the trivial one-class classifier, randomly assigning labels
% and rejecting FRACREJ of the data objects. This procedure is just to
% show the basic setup of a Prtools classifier, and what is required
% to define a one-class classifier.
function W = random_dd(a,fracrej)

if ~ismapping(fracrej)          %training
   % train it:
   % this trivial classifier cannot be trained. for each object we will
   % output a random value between 0 and 1, indicating the probability
   % that an object belongs to class 'target'
   % if we would like to train something, we should do it here.
   W.threshold = fracrej;       % store the only 'parameter'
   % pack the result in a trained Prtools mapping:
   W = prmapping(mfilename,'trained',W,char('target','outlier'),size(a,2),2);
   W = setname(W,'Random one-class classifier');
else                            %testing
   W = getdata(fracrej);        %unpack
   [m,k] = size(a);
   % This classifier only contains the threshold, nothing more.
   % Output a random value per object, next to the stored threshold:
   newout = [rand(m,1) repmat(W.threshold,m,1)];
   W = setdat(a,newout,fracrej);
end
return
Chapter 5
Error computation
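5.1 Basic errors

The basic errors of a one-class classifier are computed with dd_error. A
minimal sketch (see help dd_error for the exact interface):

>> x = oc_set(gendatb,1);        % training set, class 1 is target
>> w = gauss_dd(x,0.1);          % train a one-class classifier
>> z = oc_set(gendatb(200),1);   % test set with targets and outliers
>> e = dd_error(z,w);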
The first entry e(1) gives the false negative rate (i.e. the error on the
target class), while e(2) gives the false positive rate (the error on the
outlier class).
Question: can you imagine what would happen if you replaced oc_set in the
third line by target_class?
5.2 Precision and recall

precision : defined as

    precision = \frac{\#\text{ of correct target predictions}}{\#\text{ of target predictions}},

recall : defined as

    recall = \frac{\#\text{ of correct target predictions}}{\#\text{ of target objects}}.
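5.3 Area under the ROC curve

The ROC curve is computed with dd_roc, for instance (reusing the w and z
defined above):

>> e = dd_roc(z,w);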
First the classifier is trained on x for a specific threshold. Then, for
varying thresholds, the classifier is evaluated on dataset z. The results
are returned as a ROC curve, given in a matrix e with two columns, the
first indicating the false negatives, the second the false positives.
The resulting curve can be plotted using plotroc:

>> plotroc(e);

[Figure 5.1: the ROC curve of the gauss_dd classifier; horizontal axis:
outliers accepted (FP), vertical axis: targets accepted (TP).]

It is also possible to open the ROC curve of a trained classifier
interactively:

>> a = oc_set(gendatb,1);
>> w = gauss_dd(a,0.1);
>> h = plotroc(w,a)
By moving the mouse and clicking, the user can change the position of the
operating point. Inside the figure, a new mapping with this new operating
point is stored. This mapping can be retrieved in the Matlab workspace by:

>> w2 = getrocw(h)
Because it is very hard to compare ROC curves of different classifiers,
often the AUC error (Area Under the ROC Curve) is taken. In my definition
of the AUC error, the larger the value, the better the one-class
classifier. It is computed from the ROC curve values using the function
dd_auc:
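>> e = dd_roc(z,w);
>> err = dd_auc(e);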
In many cases only a restricted range for the false negatives is of interest:
for instance, we want to reject less than half of the target objects. In these
cases one may want to set bounds on the range of the AUC error:
>> e = dd_roc(z,w);
>> err = dd_auc(e,[0.05 0.5]);
Similarly, a precision-recall curve can be computed with dd_prc, plotted
with plotroc, and summarized by the mean precision with dd_meanprec:

>> b = oc_set(gendatb);
>> w = gauss_dd(b);
>> r = dd_prc(b,w);
>> plotroc(r)
>> e = dd_meanprec(r)
5.4 Cost curve

Besides ROC curves, dd_tools can also compute cost curves, using dd_costc
and plotcostc:

>> a = oc_set(gendatb,1);
>> w = gauss_dd(a,0.1);
>> c = a*w*dd_costc;
>> plotcostc(c)

[Figure 5.2: The cost curve derived from the same dataset and classifier as
in Figure 5.1; horizontal axis: probability cost function.]
5.5 Generating artificial outliers

When you are not so fortunate to have example outliers available for
testing, you can create them yourself. Say that z is a set of test target
objects. Artificial outliers can be generated by:

>> z_o = gendatblockout(z,100)

This creates a new dataset from z, containing both the target objects from
z and 100 new artificial outliers. The outliers are drawn from a
block-shaped uniform distribution that covers the target objects in z. (An
alternative is to draw samples not from a box but from a Gaussian
distribution; this is implemented by gendatoutg.)
This works well in practice for low-dimensional datasets. For higher
dimensions it becomes very inefficient: most of the data will end up in the
'corners' of the box. In those cases it is better to generate data
uniformly in a sphere:
>> z_o = gendatout(z,100)

Here the tightest hypersphere around the data is fitted. Given the center
and radius of this sphere, data can be generated uniformly using randsph
(this is not trivial!).
For (very) high-dimensional feature spaces there is yet another method to
generate outliers, gendatouts. This method first estimates the PCA subspace
in which the target data is distributed, and then generates outliers in
this subspace. The outliers are again drawn from a sphere.
5.6 Cross-validation

The toolbox has an extra procedure to facilitate cross-validation. In
cross-validation a dataset is split into B batches. Of these batches, B − 1
are used to train a classifier, and the left-out batch is used to evaluate
it. This is repeated B times, and the performances are averaged. The
advantage is that given a limited training set, it is still possible to
obtain a relatively good classifier and to estimate its performance on an
independent set.
In practice, this cross-validation procedure is applied over and over
again: not only to evaluate and compare the performance of classifiers, but
also to optimize hyperparameters. To keep it as flexible as possible, the
cross-validation is kept as simple as possible. An index vector is
generated that indicates to which batch each object in a training set
belongs. By repeatedly applying the procedure, the different batches are
combined into a training and an evaluation set. The following piece of code
sketches how this is done in practice:
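A sketch, following dd_ex9 and assuming the interface
[xtr,xte,I] = dd_crossval(x,I), where I is initialized with the number of
folds (see help dd_crossval for the exact interface):

>> x = oc_set(gendatb,1);
>> nrfolds = 10;
>> I = nrfolds;                            % initialize the fold indices
>> for i = 1:nrfolds
>>   [xtr,xte,I] = dd_crossval(x,I);       % next train/test split
>>   w = gauss_dd(target_class(xtr),0.1);  % train on the target data
>>   e(i,:) = dd_error(xte,w);             % FN and FP on the left-out fold
>> end
>> mean(e)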
Note that the procedure takes class priors into account: it tries to keep
the number of objects per class in each fold proportional to the class
distribution in the total dataset.
Chapter 6
General remarks
In this chapter I have collected some remarks that are important to see
once, but did not fit in the flow of the previous chapters.
1. If you want to know more about a classifier or function, always try the
help command.
2. Also have a look at the file Contents.m. This contains the full list of
functions and classifiers defined in the toolbox.
optimization simplifies significantly with this assumption. Using ksvdd or
incsvdd this restriction is now lifted. In particular incsvdd is
recommended, because it does not rely on external quadratic programming
optimizers, which always create problems.
6. I'm not responsible for the correct functioning of the toolbox, but of
course I do my best to make the toolbox as useful and bug-free as possible.
Please email me at [email protected] when you have found a bug. I'm
also very interested to hear when people have defined new one-class
classifiers.
Chapter 7

Contents.m
%kwhiten rescale data to unit variance in kernel space
%gower compute the Gower similarities
%
%One-class classifiers
%---------------------
%random_dd description which randomly assigns labels
%stump_dd threshold the first feature
%gauss_dd data description using normal density
%rob_gauss_dd robustified gaussian distribution
%mcd_gauss_dd Minimum Covariance Determinant gaussian
%mog_dd mixture of Gaussians data description
%mog_extend extend a Mixture of Gaussians data description
%parzen_dd Parzen density data description
%nparzen_dd Naive Parzen density data description
%
%autoenc_dd auto-encoder neural network data description
%kcenter_dd k-center data description
%kmeans_dd k-means data description
%pca_dd principal component data description
%som_dd Self-Organizing Map data description
%mst_dd minimum spanning tree data description
%
%nndd nearest neighbor based data description
%knndd K-nearest neighbor data description
%ball_dd L_p-ball data description
%lpball_dd extended L_p-ball data description
%svdd Support vector data description
%incsvdd Incremental Support vector data description
%(incsvc incremental support vector classifier)
%ksvdd SVDD on general kernel matrices
%lpdd linear programming data description
%mpm_dd minimax probability machine data description
%lofdd local outlier fraction data description
%lofrangedd local outlier fraction over a range
%locidd local correlation integral data description
%abof_dd angle-based outlier fraction data description
%
%dkcenter_dd distance k-center data description
%dnndd distance nearest neighbor based data description
%dknndd distance K-nearest neighbor data description
%dlpdd distance-linear programming data description
%dlpsdd distance-linear progr. similarity description
%
%isocc true if classifier is one-class classifier
%
%AUC optimizers
%--------------
%rankboostc Rank-boosting algorithm
%auclpm AUC linear programming mapping
%
%Classifier postprocessing/optimization/combining.
%--------------------------------------
%consistent_occ optimize the hyperparameter using consistency
%optim_auc optimize the hyperparameter by maximizing AUC
%dd_normc normalize oc-classifier output
%multic construct a multi-class classifier from OCC’s
%ocmcc one-class and multiclass classifier sequence
%
%Error computation.
%-----------------
%dd_error false positive and negative fraction of classifier
%dd_confmat confusion matrix
%dd_kappa Cohen’s kappa coefficient
%dd_f1 F1 score computation
%dd_eer equal error rate
%dd_roc         computation of the Receiver-Operating Characteristic curve
%dd_prc computation of the Precision-Recall curve
%dd_auc error under the ROC curve
%dd_meanprec mean precision of the Precision-Recall curve
%dd_costc cost curve
%dd_aucostc area under the cost curve
%dd_delta_aic AIC error for density estimators
%dd_fp compute false positives for given false negative
% fraction
%simpleroc basic ROC curve computation
%dd_setfn set the threshold for a false negative rate
%roc2prc convert ROC to precision-recall curve
%
%Plot functions.
%--------------
%plotroc plot an ROC curve or precision-recall curve
%plotcostc plot the cost curve
%plotg plot a 2D grid of function values
%plotw plot a 2D real-valued output of classifier w
%askerplot plot the FP and FN fraction wrt the thresholds
%plot_mst plot the minimum spanning tree
%lociplot plot a lociplot
%
%Support functions.
%-----------------
%dd_version current version of dd_tools, with upgrade possibility
%istarget true if an object is target
%find_target gives the indices of target and outlier objs from a dataset
%getoclab returns numeric labels (+1/-1)
%dist2dens map distance to posterior probabilities
%dd_threshold give percentiles for a sample
%randsph create outlier data uniformly in a unit hypersphere
%makegriddat auxiliary function for constructing grid data
%relabel relabel a dataset
%dd_kernel general kernel definitions
%center center the kernel matrix in kernel space
%gausspdf multi-variate Gaussian prob.dens.function
%mahaldist Mahalanobis distance
%sqeucldistm square Euclidean distance
%mog_init initialize a Mixture of Gaussians
%mog_P probability density of Mixture of Gaussians
%mog_update update a MoG using EM
%mogEMupdate EM procedure to optimize Mixture of Gaussians
%mogEMextend smartly extend a MoG and apply EM
%mykmeans own implementation of the k-means clustering algorithm
%getfeattype find the nominal and continuous features
%knn_optk optimization of k for the knndd using leave-one-out
%volsphere compute the volume of a hypersphere
%scale_range compute a reasonable range of scales for a dataset
%nndist_range compute the average nearest neighbor distance
%inc_setup startup function incsvdd
%inc_add add one object to an incsvdd
%inc_remove remove one object from an incsvdd
%inc_store store the structure obtained from inc_add to prtools mapping
%unrandomize unrandomize objects for incsvc
%plotroc_update support function for plotroc
%roc_hull convex hull over a ROC curve
%lpball_dist lp-distance to a center
%lpball_vol volume of a lpball
%lpdist fast lp-distance between two datasets
%nndist (average) nearest neighbor distance
%dd_message printf with colors
%
%Examples
%--------
%dd_ex1 show performance of nndd and svdd
%dd_ex2 show the performances of a list of classifiers
%dd_ex3 shows the use of the svdd and ksvdd
%dd_ex4 optimizes a hyperparameter using consistent_occ
%dd_ex5 shows the construction of lpdd from dlpdd
%dd_ex6 shows the different Mixture of Gaussians classifiers
%dd_ex7 shows the combination of one-class classifiers
%dd_ex8 shows the interactive adjustment of the operating point
%dd_ex9 shows the use of dd_crossval
%dd_ex10 shows the use of the incremental SVDD
%dd_ex11 the construction of a multi-class classifier using OCCs
%dd_ex12 the precision-recall-curve and the ROC curve
%dd_ex13 kernelizing the AUCLPM
%dd_ex14 show the combination of a one-class and multi-class
%dd_ex15 show the parameter optimization mapping
%
% Copyright: D.M.J. Tax, [email protected]
% Faculty EWI, Delft University of Technology
% P.O. Box 5031, 2600 GA Delft, The Netherlands
Bibliography