
IPVS – Institute for Parallel and Distributed Systems

Analytic Computing

8 Support Vector Machines

Prof. Dr. Steffen Staab


https://www.ipvs.uni-stuttgart.de/departments/ac/
• Based on slides by
  • Thomas Gottron, U. Koblenz-Landau, https://west.uni-koblenz.de/de/studying/courses/ws1718/machine-learning-and-data-mining-1
  • Andrew Zisserman, http://www.robots.ox.ac.uk/~az/lectures/ml/lect2.pdf
1 Perceptron Algorithm
Binary classification

• Given training data $\{(x_i, y_i)\}_{i=1}^{N}$ with $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$,
• learn a classifier $\hat{f}$ such that

$$\hat{f}(x_i) \;\begin{cases} > 0, & \text{if } y_i = +1 \\ < 0, & \text{if } y_i = -1 \end{cases}$$

• Correct classification: $\hat{f}(x_i)\, y_i > 0$
Linear separability

(Figures: examples of linear separability, Zisserman 2015)
Linear classifiers

(Related to, but different from the chapter "From linear regression to classification" in 4-LogisticRegression.)

• A linear classifier has the form

$$\hat{f}(x) = w^T x + b$$

• In 2D the discriminant is a line
• $w$ is the normal to the line, and $b$ is the bias
• $w$ is known as the weight vector

(Zisserman 2015)
Linear classifiers

• Let's assume $x_i = (1, x_{i,1}, \ldots, x_{i,d})$, i.e. $x_0 = 1$ (as in linear regression)

• Then we can write

$$\hat{f}(x) = w^T x$$

(Zisserman 2015)
Linear classifiers

• A linear classifier has the form

$$\hat{f}(x) = w^T x$$

• In 3D the discriminant is a plane
• In $n$ dimensions the discriminant is a hyperplane
• Only $w$ (which now includes $b$) is needed to classify new data

(Zisserman 2015)
The perceptron classifier

• Given training data $\{(x_i, y_i)\}_{i=1}^{N}$ with $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$,
• how do we find a weight vector $w$, i.e. the separating hyperplane, such that the two categories are separated on the dataset?

• Perceptron algorithm (a code sketch follows below):
  1. Initialize $w = 0$
  2. While there is an $i$ such that $\hat{f}(x_i)\, y_i < 0$ do
     • $w := w - \alpha\, x_i\, \mathrm{sign}(\hat{f}(x_i)) = w + \alpha\, x_i\, y_i$  (the rewriting does not hold for $\mathrm{sign}(\ldots) = 0$)
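A minimal NumPy sketch of this algorithm (assuming the bias is handled by a constant feature $x_0 = 1$ as above; `X`, `y`, `alpha` are illustrative names, and the misclassification test uses $\le 0$ so that the all-zero initialization makes progress):

```python
import numpy as np

def perceptron(X, y, alpha=1.0, max_epochs=1000):
    """Perceptron: X has shape (N, d+1) with a leading 1-column for the bias, y holds labels in {-1, +1}."""
    w = np.zeros(X.shape[1])                  # 1. initialize w = 0
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X, y):
            if y_i * (w @ x_i) <= 0:          # misclassified (or exactly on the boundary)
                w += alpha * y_i * x_i        # 2. w := w + alpha * y_i * x_i
                mistakes += 1
        if mistakes == 0:                     # converged: every point classified correctly
            return w
    return w
```
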
Example for perceptron algorithm in 2D

1. Initialize $w = 0$
2. While there is an $i$ such that $\hat{f}(x_i)\, y_i < 0$ do
   • $w := w - \alpha\, x_i\, \mathrm{sign}(\hat{f}(x_i))$

At convergence: $w = \sum_{i=1}^{N} \alpha_i x_i$   (Zisserman 2015)

Example

• If the data is linearly separable, then the algorithm will converge.
• Convergence can be slow …
• The separating line ends up close to the training data.

(Zisserman 2015)
2 Support Vectors
What is the best 𝒘?

• Idea: the maximum margin solution is most stable under perturbations of the inputs

(Zisserman 2015)
Support Vector Machine

$$\hat{f}(x) = \sum_i \alpha_i\, y_i\, x_i^T x + b$$

(Zisserman 2015)
SVM Optimization Problem (1)

• Distance $\delta(x_i, h)$ of data point $x_i$ from hyperplane $h$:

$$\delta(x_i, h) = \frac{|w^T x_i + b|}{\|w\|}$$
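As a quick numerical check of this formula (a tiny sketch with made-up values for `w`, `b`, `x`):

```python
import numpy as np

w = np.array([3.0, 4.0])   # normal vector of the hyperplane
b = -5.0                   # bias
x = np.array([2.0, 1.0])   # a data point

delta = abs(w @ x + b) / np.linalg.norm(w)   # |w^T x + b| / ||w||
print(delta)               # 1.0, since |6 + 4 - 5| / 5 = 1
```
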
SVM Optimization Problem (1)

• In general, the length of the vector $w$ does not matter
• Fix the scale of $w$ such that the support vectors $x_s$ satisfy $y_s (w^T x_s + b) = 1$
• Then the positive and negative support vectors have distance

$$\delta(x_s, h) = \frac{1}{\|w\|}$$

from the hyperplane – which we want to maximize
• Standard formulation: minimize $\frac{\|w\|}{2}$ (the inverse margin)
Support Vector Machine

(Zisserman 2015)

SVM Optimization Problem (2)

$$\max_{w} \frac{2}{\|w\|}$$

subject to

$$y_i (w^T x_i + b) \ge 1, \quad \text{for } i = 1 \ldots N$$

Or equivalently

$$\min_{w} \|w\|^2$$

subject to the same constraints.

This is a quadratic optimization problem subject to linear constraints, and it has a unique minimum.
Compare the two optimization criteria for classification of linearly separable data: classification by linear regression vs. classification by SVM

• Linear classification using regression: the decision line is the average between regression lines; all data points are considered.
• Linear classification using SVM: the decision line maximizes the margins between support vectors; far-away data points are irrelevant.
3 Soft Margin and
Hinge Loss
Re-visiting linear separability

• Points can be linearly separated, but only with a very narrow margin

(Zisserman 2015)
Re-visiting linear separability

• Points can be linearly separated, but only with a very narrow margin
• Possibly the large-margin solution is better, even though one constraint is violated
• Trade-off between the margin and the number of mistakes on the training data

(Zisserman 2015)
Introduce „slack“ variables

(Zisserman 2015)
Soft margin solution
Revised optimization problem
$$\min_{w \in \mathbb{R}^d,\; \xi_i \in \mathbb{R}^+} \|w\|^2 + C \sum_{i=1}^{N} \xi_i$$

subject to

$$y_i (w^T x_i + b) \ge 1 - \xi_i, \quad \text{for } i = 1 \ldots N$$

• Every constraint can be satisfied if $\xi_i$ is sufficiently large
• $C$ is a regularization parameter:
  - small $C$ allows constraints to be easily ignored ⟹ large margin
  - large $C$ makes constraints hard to ignore ⟹ narrow margin
  - $C = \infty$ enforces all constraints ⟹ hard margin
• Still a quadratic optimization problem with a unique minimum
• One hyperparameter $C$ (see the sketch below)
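To see the effect of $C$ empirically, one can train soft-margin linear SVMs for several values of $C$ (a sketch using scikit-learn, assuming it is available; the toy data set is made up for illustration):

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2D data: two classes, with one +1 point pushed toward the -1 class
X = np.array([[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3], [1.5, 1.5]])
y = np.array([-1, -1, -1, +1, +1, +1, +1])

for C in [0.01, 1.0, 1e6]:          # small C -> wide margin, huge C -> (almost) hard margin
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w, b = clf.coef_[0], clf.intercept_[0]
    margin = 2 / np.linalg.norm(w)  # geometric margin 2 / ||w||
    print(f"C={C:g}: margin={margin:.3f}, #support vectors={len(clf.support_)}")
```
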
Loss function

• Given the constraints:

$$y_i (w^T x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0$$

• We can rewrite $\xi_i$ as:

$$\xi_i = \max(0,\, 1 - y_i \hat{f}(x_i))$$

• Hence, we can solve the unconstrained optimization problem over $w$:

$$\min_{w \in \mathbb{R}^d} \frac{1}{C}\|w\|^2 + \sum_{i=1}^{N} \max(0,\, 1 - y_i \hat{f}(x_i))$$

where the first term acts as the regularization and the sum is the loss function.
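A small NumPy sketch of this unconstrained objective, following the scaling used on this slide where the $1/C$ factor sits on the regularizer (all variable names are illustrative):

```python
import numpy as np

def svm_objective(w, b, X, y, C):
    """Soft-margin SVM objective: (1/C) * ||w||^2 + sum_i max(0, 1 - y_i * (w^T x_i + b))."""
    scores = X @ w + b                          # f(x_i) for all i
    hinge = np.maximum(0.0, 1.0 - y * scores)   # per-example hinge loss
    return (1.0 / C) * (w @ w) + hinge.sum()
```
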
Loss function
$$\min_{w \in \mathbb{R}^d} \frac{1}{C}\|w\|^2 + \sum_{i=1}^{N} \max(0,\, 1 - y_i \hat{f}(x_i))$$

(Zisserman 2015)

Hinge loss

(Zisserman 2015)
4 Gradient descent over
convex function
Gradient descent/ascent

https://en.wikipedia.org/wiki/Gradient_descent#/media/File:Gradient_descent.svg
Climb down a hill / climb up a hill

Given a differentiable function $f(\boldsymbol{x})$ describing the height of a hill at position $\boldsymbol{x} = (x_1, \ldots, x_n)$:

How do we climb up/down fastest?

Go in the direction in which

$$\frac{df(\boldsymbol{x})}{d\boldsymbol{x}} = \nabla_{\boldsymbol{x}} f(\boldsymbol{x})$$

is maximal/minimal.

In general, this challenge can be difficult.
Gradient Descent (- but without posts)

https://goo.gl/images/JKN6zm
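A generic gradient descent loop, as a minimal sketch (the quadratic example function and step size are made up for illustration):

```python
import numpy as np

def gradient_descent(grad_f, x0, eta=0.1, steps=100):
    """Iterate x := x - eta * grad_f(x) to walk downhill on f."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - eta * grad_f(x)
    return x

# Example: f(x) = (x1 - 3)^2 + (x2 + 1)^2, whose gradient is 2 * (x - [3, -1])
x_min = gradient_descent(lambda x: 2 * (x - np.array([3.0, -1.0])), x0=[0.0, 0.0])
print(x_min)   # close to [3, -1]
```
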
Optimization continued

(Zisserman 2015)

Questions
• Does this cost function have a unique solution?
• Do we find it using gradient descent?
• Does the solution we find using gradient descent depend on the starting point?

To the rescue:
• If the cost function is convex, then a locally optimal point is globally optimal (provided the optimization is over constraints that form a convex set – given in our case).
Convex functions

(Zisserman 2015)

Convex function examples

• A non-negative sum of convex functions is convex (Zisserman 2015)

Applied to hinge loss and regularization

• Both the hinge loss and the regularizer $\|w\|^2$ are convex, so their non-negative sum is convex.
Gradient descent algorithm for SVM
To minimize a cost function $\mathcal{C}(w)$, use the iterative update

$$w_{t+1} := w_t - \eta_t \nabla_w \mathcal{C}(w_t)$$

where $\eta$ is the learning rate.

Let's rewrite the minimization problem as an average, with $\lambda = \frac{2}{NC}$:

$$\mathcal{C}(w) = \frac{1}{NC}\|w\|^2 + \frac{1}{N}\sum_{i=1}^{N} \max(0,\, 1 - y_i \hat{f}(x_i)) = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{\lambda}{2}\|w\|^2 + \max(0,\, 1 - y_i \hat{f}(x_i))\right)$$

and $\hat{f}(x_i) = w^T x_i + b$.
Sub-gradient for hinge loss

$$\mathcal{L}(x_i, y_i; w) = \max(0,\, 1 - y_i \hat{f}(x_i)), \qquad \hat{f}(x_i) = w^T x_i + b$$

A sub-gradient (used in the update on the next slide) is $\nabla_w \mathcal{L}(x_i, y_i; w) = -y_i x_i$ if $y_i \hat{f}(x_i) < 1$, and $0$ otherwise.

(Zisserman 2015)
Sub-gradient descent algorithm for SVM
$$\mathcal{C}(w) = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{\lambda}{2}\|w\|^2 + \mathcal{L}(x_i, y_i; w)\right)$$

The iterative update is

$$w_{t+1} := w_t - \eta\, \nabla_w \mathcal{C}(w_t) = w_t - \eta\, \frac{1}{N}\sum_{i=1}^{N}\left(\lambda w_t + \nabla_w \mathcal{L}(x_i, y_i; w)\right)$$

Then each iteration $t$ involves cycling through the training data with the updates:

$$w_{t+1} := \begin{cases} w_t - \eta\,(\lambda w_t - y_i x_i), & \text{if } y_i \hat{f}(x_i) < 1 \\ w_t - \eta\, \lambda w_t, & \text{otherwise} \end{cases}$$

Typical learning rate in Pegasos: $\eta_t = \frac{1}{\lambda t}$
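A compact sketch of this Pegasos-style update loop (the bias is folded into `w` via a constant feature, and the data names are illustrative):

```python
import numpy as np

def pegasos(X, y, lam=0.01, epochs=10):
    """Pegasos-style sub-gradient descent for the linear SVM.
    X: (N, d) array with a constant 1-column appended for the bias, y: labels in {-1, +1}."""
    N, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):
        for i in np.random.permutation(N):        # cycle through the training data
            t += 1
            eta = 1.0 / (lam * t)                 # learning rate eta_t = 1 / (lambda * t)
            if y[i] * (w @ X[i]) < 1:             # hinge loss is active
                w = w - eta * (lam * w - y[i] * X[i])
            else:                                 # only the regularizer contributes
                w = w - eta * lam * w
    return w
```
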
Questions 4: Gradient descent


5 The dual problem
Primal vs dual problem
• The SVM is a linear classifier: $\hat{f}(x) = w^T x + b$
• The primal problem: an optimization problem over $w$:

$$\min_{w \in \mathbb{R}^d} \frac{1}{C}\|w\|^2 + \sum_{i=1}^{N} \max(0,\, 1 - y_i \hat{f}(x_i))$$

• The dual problem: getting rid of $w$ via a slightly different representation of $\hat{f}(x)$ leads to

$$\hat{f}(x) = \sum_{i=1}^{N} \alpha_i\, y_i\, x_i^T x + b$$

and a new optimization problem with the same solution, but several advantages. Let us show this on the following slides …
Revisit Optimization Problem for Hard Margin Case
• Minimize the quadratic form

$$\frac{\|w\|^2}{2} = \frac{w^T w}{2}$$

• with constraints

$$y_i (w^T x_i + b) \ge 1 \quad \forall i$$

• The constraints will reach a value of exactly 1 for at least one instance.
• Include the hard constraints in the loss function:

$$\mathcal{L}(w, b, \alpha) = \frac{\|w\|^2}{2} - \sum_{i=1}^{N} \alpha_i \left( y_i (w^T x_i + b) - 1 \right)$$

• Violated constraints "punish" the objective function.
Excursion: Lagrange Multiplier

• We want to maximize a function $f(x)$ under the constraint $g(x) = a$

Solution with a Lagrange multiplier

• Optimize the Lagrangian

$$f(x) - \lambda\,(g(x) - a)$$

instead!

A nicely visual explanation of Lagrange optimization:
https://www.svm-tutorial.com/2016/09/duality-lagrange-multipliers/
Algorithm for optimization with a Lagrange multiplier

1. Write down the Lagrangian $f(x) - \lambda\,(g(x) - a)$
2. Take the derivative of the Lagrangian w.r.t. $x$ and set it to 0, to find an estimate of $x$ that depends on $\lambda$
3. Plug your estimate of $x$ into the Lagrangian, take the derivative w.r.t. $\lambda$, and set it to 0, to find the optimal value of the Lagrange multiplier $\lambda$
4. Plug the Lagrange multiplier into your estimate for $x$
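A tiny worked example of these four steps (the function and constraint are made up for illustration): maximize $f(x, y) = xy$ subject to $g(x, y) = x + y = 4$.

1. Lagrangian: $\Lambda(x, y, \lambda) = xy - \lambda\,(x + y - 4)$
2. $\partial_x \Lambda = y - \lambda = 0$ and $\partial_y \Lambda = x - \lambda = 0$ give $x = y = \lambda$
3. Plugging in: $\Lambda(\lambda, \lambda, \lambda) = \lambda^2 - \lambda\,(2\lambda - 4) = -\lambda^2 + 4\lambda$; setting its derivative $-2\lambda + 4$ to 0 gives $\lambda = 2$
4. Hence $x = y = 2$, and the constrained maximum is $f(2, 2) = 4$
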
Revisit Optimization Problem for Hard Margin Case
• Minimize the quadratic form

$$\frac{\|w\|^2}{2} = \frac{w^T w}{2}$$

• with constraints

$$y_i (w^T x_i + b) \ge 1 \quad \forall i$$

• The constraints will reach a value of exactly 1 for at least one instance.
• Include the hard constraints in the loss function:

$$\mathcal{L}(w, b, \alpha) = \frac{\|w\|^2}{2} - \sum_{i=1}^{N} \alpha_i \left( y_i (w^T x_i + b) - 1 \right)$$

where the $\alpha_i$ are the Lagrange multipliers.
• Violated constraints "punish" the objective function.
Lagrangian primal problem
• The Lagrangian primal problem is:

$$\min_{w, b} \max_{\alpha} \mathcal{L}(w, b, \alpha)$$

subject to $\forall i: \alpha_i \ge 0$
Finding the optimum

• The loss is a function of $w$, $b$, and $\alpha$:

$$\mathcal{L}(w, b, \alpha) = \frac{\|w\|^2}{2} - \sum_{i=1}^{N} \alpha_i \left( y_i (w^T x_i + b) - 1 \right)$$

• Find the optimum using derivatives:

$$\frac{\partial}{\partial b}\mathcal{L}(w, b, \alpha) = 0 \;\Longrightarrow\; 0 = \sum_{i=1}^{N} \alpha_i y_i$$

$$\frac{\partial}{\partial w_j}\mathcal{L}(w, b, \alpha) = 0 \;\Longrightarrow\; w_j = \sum_{i=1}^{N} \alpha_i y_i x_{i,j} \;\Longrightarrow\; w = \sum_{i=1}^{N} \alpha_i y_i x_i$$

→ $w$ is a linear combination of the data instances!
Substitution into ℒ(𝑤, 𝑏, 𝛼)
$$\mathcal{L}(w, b, \alpha)\Big|_{w = \sum_j \alpha_j y_j x_j} = \frac{\|w\|^2}{2} - \sum_{i=1}^{N} \alpha_i \left( y_i (w^T x_i + b) - 1 \right) =$$

$$= \frac{1}{2}\left(\sum_{j=1}^{N} \alpha_j y_j x_j\right)^{T} \left(\sum_{k=1}^{N} \alpha_k y_k x_k\right) - \sum_{i=1}^{N} \alpha_i \left( y_i \left(\sum_{j=1}^{N} \alpha_j y_j x_j^T x_i + b\right) - 1 \right) =$$

$$= \frac{1}{2}\sum_{j,k} \alpha_j \alpha_k y_j y_k x_j^T x_k - \sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j - b \underbrace{\sum_{i=1}^{N} \alpha_i y_i}_{=0} + \sum_{i=1}^{N} \alpha_i =$$

$$= \mathcal{L}(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{j,k} \alpha_j \alpha_k y_j y_k x_j^T x_k$$
Wolfe dual problem

$$\max_{\alpha} \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{j,k} \alpha_j \alpha_k y_j y_k x_j^T x_k$$

subject to $\forall i: \alpha_i \ge 0$, and $0 = \sum_{i=1}^{N} \alpha_i y_i$

• This problem is solvable with quadratic programming, because it fulfills the Karush-Kuhn-Tucker conditions on the $\alpha_i$, which handle the inequality constraints (≥ 1) in the Lagrange optimization (not given here!).
• It gives us the classification function

$$\hat{f}(x) = \sum_{j=1}^{N} \alpha_j\, y_j\, x_j^T x + b$$

where $\alpha_i$ is positive if $x_i$ is a support vector.
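For small, linearly separable toy datasets, this dual can be solved directly with a general-purpose constrained optimizer. The sketch below uses SciPy's SLSQP method (an assumption of convenience; practical implementations use dedicated QP or SMO-style solvers such as the one in LIBSVM), and all variable names are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def svm_dual_fit(X, y):
    """Solve the hard-margin Wolfe dual:
    max_a sum_i a_i - 0.5 * sum_{j,k} a_j a_k y_j y_k x_j^T x_k,
    subject to a_i >= 0 and sum_i a_i y_i = 0 (only sensible for separable data)."""
    N = X.shape[0]
    Q = (y[:, None] * X) @ (y[:, None] * X).T            # Q[j, k] = y_j y_k x_j^T x_k

    def neg_dual(a):                                     # negate: scipy minimizes
        return 0.5 * a @ Q @ a - a.sum()

    res = minimize(neg_dual, x0=np.zeros(N), method="SLSQP",
                   bounds=[(0.0, None)] * N,             # a_i >= 0
                   constraints=[{"type": "eq", "fun": lambda a: a @ y}])  # sum_i a_i y_i = 0
    a = res.x
    w = ((a * y)[:, None] * X).sum(axis=0)               # w = sum_i a_i y_i x_i
    sv = a > 1e-6                                        # support vectors have a_i > 0
    b = np.mean(y[sv] - X[sv] @ w)                       # from y_s (w^T x_s + b) = 1
    return w, b, a
```
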
Non-separable case (similar to before)
• Introduce (positive) "slack variables" $\xi_i$ to allow deviations from the minimum distance:

$$y_i (w^T x_i + b) \ge 1 - \xi_i$$

• Include a penalizing term in the objective function:

$$C \sum_{i=1}^{N} \xi_i$$

• Transform to the Lagrangian
  • with additional Lagrange multipliers for the slack variables, which are constrained to positive values …
Summary: Primal and dual formulations

• Primal version of the classifier:

$$\hat{f}(x) = w^T x + b$$

• Dual version of the classifier:

$$\hat{f}(x) = \sum_{i=1}^{N} \alpha_i\, y_i\, x_i^T x + b$$

The dual-form classifier seems to work like a kNN classifier: it requires the training data points $x_i$. However, many of the $\alpha_i$ are zero; the ones that are non-zero define the support vectors $x_i$.
Summary: Primal and dual formulations
• The Lagrangian primal problem is:

$$\min_{w, b} \max_{\alpha} \mathcal{L}(w, b, \alpha)$$

subject to $\forall i: \alpha_i \ge 0$

• The Lagrangian dual problem is:

$$\max_{\alpha} \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{j,k} \alpha_j \alpha_k y_j y_k x_j^T x_k$$

subject to $\forall i: \alpha_i \ge 0$, and $0 = \sum_{i=1}^{N} \alpha_i y_i$
6 Kernelization Tricks
in SVMs
Non-linear Case

• Not all classes can be separated via a hyperplane
• Essential:
  • The dual representation uses only inner products of data instances:

$$\hat{f}(x) = \sum_{i=1}^{N} \alpha_i\, y_i\, x_i^T x + b$$

  • $x_i$: i-th training instance
  • $\alpha_i$: weight for the i-th training instance

• The same holds for the Lagrangian …


Feature engineering using $\phi(x)$  (cf. the lecture on regression, chapter "beyond linear input")

• Classifier: given $x_i \in \mathbb{R}^d$, $\phi: \mathbb{R}^d \to \mathbb{R}^D$, $w \in \mathbb{R}^D$,

$$\hat{f}(x) = w^T \phi(x) + b$$

• Learning:

$$\min_{w \in \mathbb{R}^D} \frac{1}{C}\|w\|^2 + \sum_{i=1}^{N} \max(0,\, 1 - y_i \hat{f}(x_i))$$
Example 1: From 1-dim to 2-dim

Example 2: From 2-dim to 3-dim

(Zisserman 2015)

Feature engineering using $\phi(x)$

• Classifier: given $x_i \in \mathbb{R}^d$, $\phi: \mathbb{R}^d \to \mathbb{R}^D$, $w \in \mathbb{R}^D$,

$$\hat{f}(x) = w^T \phi(x) + b$$

• Learning:

$$\min_{w \in \mathbb{R}^D} \frac{1}{C}\|w\|^2 + \sum_{i=1}^{N} \max(0,\, 1 - y_i \hat{f}(x_i))$$

• $\phi(x)$ maps to a high-dimensional space $\mathbb{R}^D$ where the data is separable
• If $D \gg d$, then there are many more parameters of $w$ to learn (see the feature-map sketch below)
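To make the upcoming kernel trick concrete: the classic map $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ from 2D to 3D satisfies $\phi(x)^T\phi(z) = (x^T z)^2$, so the inner product in the transformed space never has to be computed explicitly (an illustrative example, not necessarily the exact map used on the slides):

```python
import numpy as np

def phi(x):
    """Explicit feature map R^2 -> R^3: phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(phi(x) @ phi(z))      # inner product in the transformed space: 1.0
print((x @ z) ** 2)         # same value computed directly in R^2 ("kernel trick")
```
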
Dual classifier in transformed feature space
Classifier:

$$\hat{f}(x) = \sum_{i=1}^{N} \alpha_i\, y_i\, x_i^T x + b \;\Longrightarrow\; \hat{f}(x) = \sum_{i=1}^{N} \alpha_i\, y_i\, \phi(x_i)^T \phi(x) + b$$

Learning:

$$\max_{\alpha} \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{j,k} \alpha_j \alpha_k y_j y_k x_j^T x_k \;\Longrightarrow\; \max_{\alpha} \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{j,k} \alpha_j \alpha_k y_j y_k\, \phi(x_j)^T \phi(x_k)$$

subject to $\forall i: \alpha_i \ge 0$, and $0 = \sum_{i=1}^{N} \alpha_i y_i$

Observation: $\phi(x)$ only occurs in pairs $\phi(x_j)^T \phi(x_i)$ ⟹ kernels: $k(x_j, x_i) = \phi(x_j)^T \phi(x_i)$
Dual classifier using kernels
Classifier:

$$\hat{f}(x) = \sum_{i=1}^{N} \alpha_i\, y_i\, \phi(x_i)^T \phi(x) + b \;\Longrightarrow\; \hat{f}(x) = \sum_{i=1}^{N} \alpha_i\, y_i\, k(x_i, x) + b$$

Learning:

$$\max_{\alpha} \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{j,k} \alpha_j \alpha_k y_j y_k\, \phi(x_j)^T \phi(x_k) \;\Longrightarrow\; \max_{\alpha} \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{j,k} \alpha_j \alpha_k y_j y_k\, k(x_j, x_k)$$

subject to $\forall i: \alpha_i \ge 0$, and $0 = \sum_{i=1}^{N} \alpha_i y_i$
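A direct transcription of this kernelized classifier (a sketch; `alpha`, `support_X`, `support_y`, `b` would come from solving the dual, and the names are illustrative):

```python
import numpy as np

def decision_function(x, support_X, support_y, alpha, b, kernel):
    """Kernelized dual classifier: f(x) = sum_i alpha_i * y_i * k(x_i, x) + b,
    where the sum effectively runs over the support vectors (alpha_i > 0)."""
    return sum(a_i * y_i * kernel(x_i, x)
               for a_i, y_i, x_i in zip(alpha, support_y, support_X)) + b

def linear_kernel(u, v):
    return u @ v    # with this kernel the classifier reduces to the primal w^T x + b

# predicted label: np.sign(decision_function(x_new, support_X, support_y, alpha, b, linear_kernel))
```
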
Example kernels
• Linear kernels: $k(x, x') = x^T x'$
• Polynomial kernels: $k(x, x') = (1 + x^T x')^d$, for any $d > 0$
  • contains all polynomial terms up to degree $d$
• Gaussian kernels: $k(x, x') = e^{-\frac{\|x - x'\|^2}{2\sigma^2}}$, for $\sigma > 0$
  • infinite-dimensional feature space
  • also called radial basis function (RBF) kernel
  • often works quite well!
• Graph kernels: random walk
• String kernels: …
• Build your own kernel for your own problem! (A few of these kernels are sketched below.)
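A few of these kernels written out (a minimal NumPy sketch; the degree and bandwidth values are arbitrary illustrations):

```python
import numpy as np

def linear_kernel(x, z):
    return x @ z                                   # k(x, x') = x^T x'

def polynomial_kernel(x, z, d=3):
    return (1.0 + x @ z) ** d                      # k(x, x') = (1 + x^T x')^d

def rbf_kernel(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))   # Gaussian / RBF kernel

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(linear_kernel(x, z), polynomial_kernel(x, z), rbf_kernel(x, z))
```
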
Summary on kernels

• "Instead of inventing funny non-linear features, we may directly invent funny kernels" (Toussaint 2019)
• Inventing a kernel is intuitive:
  • $k(x, x')$ expresses how correlated $y$ and $y'$ should be
  • it is a measure of similarity; it compares $x$ and $x'$.
• Specifying how 'comparable' $x$ and $x'$ are is often more intuitive than defining "features that might work".
Background reading and more

• Smooth reading about SVMs: Alexandre Kowalczyk, Support Vector Machines Succinctly. Syncfusion. Free ebook:
  https://www.syncfusion.com/ebooks/support_vector_machines_succinctly
  • Also covers the most efficient algorithms for finding support vectors (it is neither of the two presented here!)
Questions 6: Kernel


7 Transductive
Classification
Transductive learning characteristics

Characteristics:
• Training data AND test data are known at learning time
• Learning happens specifically for the given test cases

Use cases:
• news recommender
• spam classifier
• document reorganization

Thorsten Joachims: Transductive Inference for Text Classification using Support Vector Machines. ICML 1999: 200-209
Maximum margin hyperplane

Training data $\ldots, (\vec{x}_i, y_i), \ldots$
Test data $\ldots, \vec{x}_j^{*}, \ldots$

Loss function:

$$\frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i + C^{*} \sum_{j=1}^{k} \xi_j^{*}$$

subject to:

$$\forall_{i=1}^{n}: y_i (\vec{w} \cdot \vec{x}_i + b) \ge 1 - \xi_i, \qquad \forall_{j=1}^{k}: y_j^{*} (\vec{w} \cdot \vec{x}_j^{*} + b) \ge 1 - \xi_j^{*}$$
$$\forall_{i=1}^{n}: \xi_i > 0, \qquad \forall_{j=1}^{k}: \xi_j^{*} > 0$$

Naive, intractable approach:
• for every hyperplane:
  • classify the $\vec{x}_j^{*}$
  • compute the loss
Reuters data set experiments (3299 test documents)

Reuters data set experiments (17 training documents)

IPVS

Thank you!

Steffen Staab

E-mail: [email protected]
Phone: +49 (0) 711 685-To be defined
Web: www.ipvs.uni-stuttgart.de/departments/ac/

Universität Stuttgart
Analytic Computing, IPVS
Universitätsstraße 32, 70569 Stuttgart
