Matrix Computations: Direct Methods I
April 30, 2014
Lecture 10
Outline of Next 2 Lectures
• Motivation for parallel solution of linear
algebra problems using direct methods
• Brief discussion of existing sequential methods
for most relevant operations:
– Gaussian Elimination / Matrix Factorization
– Matrix-vector and matrix-matrix multiplication
– Eigenvalue / eigenvector calculation
Outline of Next 2 Lectures
• Parallel algorithms for these matrix operations
with complexity estimates
• Existing parallel linear algebra subroutines and
libraries (PBLAS, ScaLAPACK, ATLAS, etc.)
• Similar discussions for these operations
performed on sparse matrices
Motivation: Dense Linear Algebra
• Most problems in computational physics
can be reduced to the form
Ax = b
• This is true whether the original problems
are linear or non-linear (with appropriate
linearization), whether the problems are 1D,
2D, or 3D, and whether an approximate
factorization has been performed or not
Motivation: Dense Linear Algebra
• In such cases, we may be solving for a
subset of the problem every time we solve
the equation A x = b, since A can be written
as
Ax = (A_I A_J A_K) x = b
with some factorization error
• It is true that these matrices are typically banded, and therefore the
cost of a full factorization need not be incurred. Bear with us.
Motivation: Continuous Variables,
Continuous Parameters
Examples of such systems include
• Heat flow: Temperature(position, time)
• Diffusion: Concentration(position, time)
• Electrostatic or Gravitational Potential:
Potential(position)
• Fluid flow: Velocity, Pressure, Density(position, time)
• Quantum mechanics: Wave-function(position, time)
• Elasticity: Stress, Strain(position, time)
Example: Deriving the Heat Equation
[Diagram: insulated bar from 0 to 1 with grid points x-h, x, x+h]
Consider a simple problem
• A bar of uniform material, insulated except at ends
• Let u(x,t) be the temperature at position x at time t
• Heat travels from x-h to x+h at rate proportional to:
d u(x,t)/dt = C * [ (u(x-h,t) - u(x,t))/h - (u(x,t) - u(x+h,t))/h ] / h
• As h → 0, we get the heat equation:
d u(x,t)/dt = C * d^2 u(x,t)/dx^2
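A minimal NumPy sketch of the discrete rate equation above, stepped forward explicitly in time (the bar length, grid size, constant C, and initial condition are made up for illustration):

import numpy as np

# Hypothetical parameters: bar of unit length, n interior points, ends held at 0
n, C = 50, 1.0
h = 1.0 / (n + 1)                 # grid spacing
dt = 0.4 * h**2 / C               # small step so the explicit scheme stays stable
u = np.zeros(n + 2)               # u[0] and u[-1] are the boundary values
u[n // 2] = 1.0                   # initial hot spot in the middle

for _ in range(100):
    # du/dt = C * (u(x-h) - 2*u(x) + u(x+h)) / h^2, applied to all interior points
    u[1:-1] += dt * C * (u[:-2] - 2 * u[1:-1] + u[2:]) / h**2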
Implicit Solution
• As with many (stiff) ODEs, need an implicit method
• This turns into solving the following equation
(I + (z/2)*T) * u[:,i+1] = (I - (z/2)*T) * u[:,i]
• Here I is the identity matrix and T is the tridiagonal matrix

        [  2 -1          ]
        [ -1  2 -1       ]
    T = [    -1  2 -1    ]
        [       -1  2 -1 ]
        [          -1  2 ]

[Figure: graph of the 1D grid and its “stencil” (-1  2  -1)]
• I.e., essentially solving Poisson’s equation in 1D
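A minimal NumPy sketch of one such implicit step (assuming z is the usual mesh ratio C*dt/h^2, which the slide does not spell out; sizes and values are made up):

import numpy as np

n, C, h, dt = 50, 1.0, 1.0 / 51, 1e-3
z = C * dt / h**2                               # assumption: z = C*dt/h^2

# Tridiagonal T with 2 on the diagonal and -1 off it (built dense here for clarity)
T = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
I = np.eye(n)

u = np.zeros(n)
u[n // 2] = 1.0

# One implicit step: solve (I + (z/2)*T) * u_new = (I - (z/2)*T) * u_old
u = np.linalg.solve(I + 0.5 * z * T, (I - 0.5 * z * T) @ u)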
2D Implicit Method
• Similar to the 1D case, but the matrix T is now
(for a 3 x 3 grid of unknowns, N = 9, in natural row ordering)

        [  B -I    ]          [  4 -1    ]
    T = [ -I  B -I ]  ,  B =  [ -1  4 -1 ]  ,  I = the 3 x 3 identity
        [    -I  B ]          [    -1  4 ]

i.e. every diagonal entry is 4 and each grid point is coupled to its four
nearest neighbors by -1 entries
[Figure: 2D grid graph and its 5-point “stencil” (center 4, neighbors -1)]
• Multiplying by this matrix (as in the explicit case) is simply a
nearest-neighbor computation on the 2D grid (sketched below)
• To solve this system, there are several techniques
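A small NumPy check of the nearest-neighbor claim (illustrative only; the grid size m and the Kronecker-product construction of T are my own, not from the slides):

import numpy as np

m = 4                                                     # m x m interior grid, N = m*m unknowns
T1 = 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)     # 1D Poisson matrix
T2 = np.kron(np.eye(m), T1) + np.kron(T1, np.eye(m))      # 2D matrix: 4 on diagonal, -1 to neighbors

u = np.random.rand(m, m)

# Multiplying by T2 is just the 5-point nearest-neighbor stencil on the grid
v = 4 * u
v[:-1, :] -= u[1:, :]       # neighbor below
v[1:, :]  -= u[:-1, :]      # neighbor above
v[:, :-1] -= u[:, 1:]       # neighbor to the right
v[:, 1:]  -= u[:, :-1]      # neighbor to the left

assert np.allclose(T2 @ u.ravel(), v.ravel())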
Algorithms for 2D Poisson Equation with N unknowns
Algorithm         Serial      PRAM            Memory     #Procs
• Dense LU        N^3         N               N^2        N^2
• Band LU         N^2         N               N^(3/2)    N
• Jacobi          N^2         N               N          N
• Explicit Inv.   N^2         log N           N^2        N^2
• Conj. Grad.     N^(3/2)     N^(1/2)*log N   N          N
• RB SOR          N^(3/2)     N^(1/2)         N          N
• Sparse LU       N^(3/2)     N^(1/2)         N*log N    N
• FFT             N*log N     log N           N          N
• Multigrid       N           log^2 N         N          N
• Lower bound     N           log N           N
Building Blocks in Linear Algebra
• BLAS (Basic Linear Algebra Subprograms)
created / defined in 1979 by Lawson et al.
• BLAS modularizes linear algebra by identifying the typical operations that
appear inside complex algorithms and defining a standard interface to them
• This way, hardware vendors can optimize their
own version of BLAS and allow users’ programs
to run efficiently with simple recompilation
• Optimized BLAS implementations are usually
hand-tuned (and coded in assembly language)
Building Blocks in Linear Algebra
• BLAS routines have to be simple enough that high
levels of optimization can be obtained
• BLAS routines have to be general enough so that
complex algorithms can be constructed as
sequences of calls to these basic routines
• Others (LINPACK, LAPACK, EISPACK, etc.)
have followed suit and have tried to do a similar
job for a variety of linear algebra problems
Building Blocks in Linear Algebra
• BLAS advantages:
– Robustness: BLAS routines are programmed with
robustness in mind. Various exit conditions can be
diagnosed from the routines themselves, overflow is
predicted, and general pivoting algorithms are
implemented
– Portability: the calling API is fixed; hardware vendors
optimize behind-the-scenes
– Readability: since BLAS routine names are standardized, one knows exactly
what a program is doing just by reading the source code; the calls are
effectively self-documenting.
BLAS Level 1 Routines
• Perform low level functions (typically operations
between vectors like dot products, sums, etc.)
• Routines have 4- or 5-letter names preceded by s, d, c, or z to indicate
the precision / type. For example, DAXPY is the double-precision update of a
vector by a scalar multiple of another vector (y ← αx + y)
• Typical operations are O(n), where n is the length
of the vectors being operated on
• Low ratio of floating point operations to memory loads and stores prevents
a high Mflop rating of these routines in most computers
BLAS Level 1 Routines
• Typical operations
y ← αx + y
x ← αx
dot ← x^T y
asum ← ||re(x)||_1 + ||im(x)||_1
nrm2 ← ||x||_2
amax ← first k such that |re(x_k)| + |im(x_k)| = ||x||_∞
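What these operations compute, written with NumPy equivalents (the corresponding double-precision BLAS names would be DAXPY, DSCAL, DDOT, DASUM, DNRM2, IDAMAX; the vectors and scalar below are made-up examples):

import numpy as np

alpha = 2.5
x = np.array([3.0, -1.0, 4.0])
y = np.array([1.0, 5.0, -9.0])

y = alpha * x + y              # AXPY:  y <- alpha*x + y
x = alpha * x                  # SCAL:  x <- alpha*x
dot = x @ y                    # DOT:   x^T y
asum = np.sum(np.abs(x))       # ASUM:  sum of |x_i| (|re| + |im| for complex x)
nrm2 = np.sqrt(x @ x)          # NRM2:  2-norm of x
amax = np.argmax(np.abs(x))    # AMAX:  index of the entry of largest magnitude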
BLAS Level 1 Routines
• One of the most basic operations, a matrix-vector multiply, can in fact be
done as a sequence of n SAXPY operations, but then the result vector is
stored to memory and re-loaded from it at every step, when it could have
stayed in fast memory (registers / cache) for the whole computation.
• BLAS Level 2 routines add functionality to help
out in this situation
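A NumPy sketch of that point (n and the random data are made up): the column-by-column AXPY loop updates y once per column, so a real BLAS 1 implementation would store and re-load y at every step, whereas one matrix-vector product (GEMV, in Level 2 terms) does the same work in a single call:

import numpy as np

n = 500
A = np.random.rand(n, n)
x = np.random.rand(n)

# Matrix-vector product as n AXPY operations, one per column of A
y = np.zeros(n)
for j in range(n):
    y += x[j] * A[:, j]

# The equivalent single Level 2 operation: y = A*x in one call,
# letting the result vector stay in registers/cache while A is streamed
assert np.allclose(y, A @ x)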
BLAS Level 2 Routines
• Level 1 BLAS routines do not have enough
granularity to achieve high performance: reuse of
registers must occur because of high cost of
memory accesses and limitations in current chip
architectures
• Optimization at least at the level of matrix-vector operations is
necessary; Level 1 prevents this by hiding those details from the compiler
• Level 2 BLAS includes these kinds of operations
which typically involve O(m n) operations, where
the matrices involved have size m x n
BLAS Level 2 Routines
• Typical operations involve:
y = αAx + βy
y = αA^T x + βy
y = Tx
y = T^T x
x = T^(-1) x
• as well as rank-1 and rank-2 updates to a matrix (used, for example, in
optimization algorithms).
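A sketch of these Level 2 operations with NumPy/SciPy (the triangular matrix T, sizes, and scalars are made-up examples; the last line is the rank-1 update that BLAS calls GER):

import numpy as np
from scipy.linalg import solve_triangular

n, alpha, beta = 4, 2.0, 0.5
A = np.random.rand(n, n)
T = np.triu(np.random.rand(n, n)) + n * np.eye(n)   # well-conditioned upper triangular matrix
x = np.random.rand(n)
y = np.random.rand(n)

y = alpha * (A @ x) + beta * y          # GEMV:  y = alpha*A*x + beta*y
y = alpha * (A.T @ x) + beta * y        # GEMV (transposed)
x = T @ x                               # TRMV:  x = T*x
x = solve_triangular(T, x)              # TRSV:  x = T^(-1)*x
A = A + np.outer(x, y)                  # GER:   rank-1 update A = A + x*y^T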
BLAS Level 2 Routines
• Additional operations for banded, Hermitian,
triangular, etc. matrices are also available (look at
“man blas” on junior)
• Efficiency of implementations can be increased in
this way, but there are drawbacks for cache-based
architectures which still want to reuse memory as
much as possible.
• Level 3 BLAS addresses this problem
BLAS Level 3 Routines
• Sometimes it is preferable to decompose matrices
into blocks to perform various operations on a
matrix-matrix basis
• Data reuse is enhanced in this way
• Typically one obtains O(n^3) operations with O(n^2) data references
(similar to the granularity / surface-to-volume effect discussed earlier)
• Two opportunities for parallelism:
– operations on distinct blocks may be done in parallel
– operations within a block may have loop-level
parallelism
BLAS Level 3 Routines
• Typical operations involve matrix-matrix products
C = αAB + βC
C = αAA^T + βC
B = αTB
B = αT^(-1) B
• as well as rank-k updates and solutions of systems
involving triangular matrices
• Better performance is achieved
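A NumPy sketch of the data-reuse idea (matrix size n and block size b are made up, with b dividing n): each b x b block multiply performs O(b^3) operations on O(b^2) data, which is exactly the reuse a tuned Level 3 BLAS exploits:

import numpy as np

n, b = 256, 64
A = np.random.rand(n, n)
B = np.random.rand(n, n)
C = np.zeros((n, n))

# Blocked matrix-matrix multiply: O(b^3) flops per block pair on O(b^2) data
for i in range(0, n, b):
    for j in range(0, n, b):
        for k in range(0, n, b):
            C[i:i+b, j:j+b] += A[i:i+b, k:k+b] @ B[k:k+b, j:j+b]

assert np.allclose(C, A @ B)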
BLAS Level 3 Routines
[Figure: Mflop rate vs. matrix size - BLAS 3 routines approach the machine peak, while BLAS 2 and BLAS 1 level off far below it]
Matrix Problem Solution, Ax=b
• The main steps in the solution process are
– Fill: computing the matrix elements of A
– Factor: factoring the dense matrix A
– Solve: solving for one or more right-hand sides, b

Task          Work       Parallelism         Parallel Speed
Fill          O(n**2)    embarrassing        low
Factor        O(n**3)    moderately diff.    very high
Solve         O(n**2)    moderately diff.    high
Field Calc.   O(n)       embarrassing        high
Review of Gaussian Elimination (GE) for
solving Ax=b
• Add multiples of each row to later rows to make A upper
triangular
• Solve resulting triangular system Ux = c by substitution
… for each column i
… zero it out below the diagonal by adding multiples of row i to later rows
for i = 1 to n-1
  … for each row j below row i
  for j = i+1 to n
    … add a multiple of row i to row j
    for k = i to n
      A(j,k) = A(j,k) - (A(j,i)/A(i,i)) * A(i,k)
Refine GE Algorithm (1)
• Initial Version
… for each column i
… zero it out below the diagonal by adding multiples of row i to later rows
for i = 1 to n-1
  … for each row j below row i
  for j = i+1 to n
    … add a multiple of row i to row j
    for k = i to n
      A(j,k) = A(j,k) - (A(j,i)/A(i,i)) * A(i,k)
• Remove computation of constant A(j,i)/A(i,i)
from inner loop
for i = 1 to n-1
  for j = i+1 to n
    m = A(j,i)/A(i,i)
    for k = i to n
      A(j,k) = A(j,k) - m * A(i,k)
Refine GE Algorithm (2)
• Last version
for i = 1 to n-1
  for j = i+1 to n
    m = A(j,i)/A(i,i)
    for k = i to n
      A(j,k) = A(j,k) - m * A(i,k)
• Don’t compute what we already know:
zeros below diagonal in column i
for i = 1 to n-1
  for j = i+1 to n
    m = A(j,i)/A(i,i)
    for k = i+1 to n
      A(j,k) = A(j,k) - m * A(i,k)
Refine GE Algorithm (3)
• Last version
for i = 1 to n-1
  for j = i+1 to n
    m = A(j,i)/A(i,i)
    for k = i+1 to n
      A(j,k) = A(j,k) - m * A(i,k)
• Store multipliers m below diagonal in zeroed
entries for later use
for i = 1 to n-1
  for j = i+1 to n
    A(j,i) = A(j,i)/A(i,i)
    for k = i+1 to n
      A(j,k) = A(j,k) - A(j,i) * A(i,k)
Refine GE Algorithm (4)
• Last version
for i = 1 to n-1
  for j = i+1 to n
    A(j,i) = A(j,i)/A(i,i)
    for k = i+1 to n
      A(j,k) = A(j,k) - A(j,i) * A(i,k)
• Express using matrix operations (BLAS)
for i = 1 to n-1
  A(i+1:n,i) = A(i+1:n,i) / A(i,i)
  A(i+1:n,i+1:n) = A(i+1:n,i+1:n) - A(i+1:n,i) * A(i,i+1:n)
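A minimal NumPy transcription of this final version (my own sketch, not library code): it assumes every pivot A(i,i) is nonzero, stores the multipliers below the diagonal, and leaves U on and above it:

import numpy as np

def lu_in_place(A):
    # GE without pivoting: multipliers overwrite the zeroed entries below
    # the diagonal, U overwrites the diagonal and above
    n = A.shape[0]
    for i in range(n - 1):
        A[i+1:, i] = A[i+1:, i] / A[i, i]                    # BLAS 1: scale a vector
        A[i+1:, i+1:] -= np.outer(A[i+1:, i], A[i, i+1:])    # BLAS 2: rank-1 update
    return A

A = np.array([[4.0, 3.0], [6.0, 3.0]])
LU = lu_in_place(A.copy())
L = np.tril(LU, -1) + np.eye(2)          # L = I + M (unit lower triangular)
U = np.triu(LU)
assert np.allclose(L @ U, A)             # recovers A = L*U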
What GE really computes
for i = 1 to n-1
  A(i+1:n,i) = A(i+1:n,i) / A(i,i)
  A(i+1:n,i+1:n) = A(i+1:n,i+1:n) - A(i+1:n,i) * A(i,i+1:n)
• Call the strictly lower triangular matrix of multipliers M, and
let L = I+M
• Call the upper triangle of the final matrix U
• Lemma (LU Factorization): If the above algorithm
terminates (does not divide by zero) then A = L*U
• Solving A*x=b using GE
  – Factorize A = L*U using GE (cost = 2/3 n^3 flops)
  – Solve L*y = b for y, using substitution (cost = n^2 flops)
  – Solve U*x = y for x, using substitution (cost = n^2 flops)
• Thus A*x = (L*U)*x = L*(U*x) = L*y = b as desired
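A NumPy/SciPy sketch of the whole factor-then-solve sequence (the diagonally dominant test matrix is made up so that no pivoting is needed; solve_triangular performs the two substitutions):

import numpy as np
from scipy.linalg import solve_triangular

n = 6
A = np.random.rand(n, n) + n * np.eye(n)      # diagonally dominant: GE needs no pivoting
b = np.random.rand(n)

LU = A.copy()                                  # factor: about 2/3 n^3 flops
for i in range(n - 1):
    LU[i+1:, i] /= LU[i, i]
    LU[i+1:, i+1:] -= np.outer(LU[i+1:, i], LU[i, i+1:])
L = np.tril(LU, -1) + np.eye(n)                # L = I + M
U = np.triu(LU)

y = solve_triangular(L, b, lower=True)         # forward substitution: L*y = b, ~n^2 flops
x = solve_triangular(U, y)                     # back substitution:    U*x = y, ~n^2 flops
assert np.allclose(A @ x, b)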
Problems with basic GE algorithm
• What if some A(i,i) is zero? Or very small?
– Result may not exist, or be “unstable”, so need to pivot
• Current computation all BLAS 1 or BLAS 2, but we know
that BLAS 3 (matrix multiply) is fastest
for i = 1 to n-1
  A(i+1:n,i) = A(i+1:n,i) / A(i,i)                            … BLAS 1 (scale a vector)
  A(i+1:n,i+1:n) = A(i+1:n,i+1:n) - A(i+1:n,i) * A(i,i+1:n)   … BLAS 2 (rank-1 update)
[Figure: the same Mflop chart as before - BLAS 3 near peak, BLAS 2 and BLAS 1 far below]
Pivoting in Gaussian Elimination
° A = [ 0  1 ]  fails completely, even though A is “easy”
      [ 1  0 ]
° Illustrate problems in 3-decimal-digit arithmetic:
  A = [ 1e-4  1 ]  and  b = [ 1 ] ,  correct answer to 3 places is  x = [ 1 ]
      [ 1     1 ]           [ 2 ]                                       [ 1 ]
° Result of LU decomposition is
  L = [ 1            0 ]  =  [ 1    0 ]        … No roundoff error yet
      [ fl(1/1e-4)   1 ]     [ 1e4  1 ]
  U = [ 1e-4  1            ]  =  [ 1e-4   1   ]  … Error in 4th decimal place
      [ 0     fl(1-1e4*1)  ]     [ 0    -1e4  ]
  Check: A = L*U = [ 1e-4  1 ]                   … (2,2) entry entirely wrong
                   [ 1     0 ]
° Algorithm “forgets” (2,2) entry, gets same L and U for all |A(2,2)|<5
° Numerical instability
° Computed solution x totally inaccurate
° Cure: Pivot (swap rows of A) so entries of L and U bounded
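The same failure can be reproduced in double precision, with a 1e-20 pivot standing in for the 3-digit example above (a made-up illustration): elimination without pivoting “forgets” the (2,2) entry and returns a badly wrong x, while LAPACK's pivoted solve does not:

import numpy as np

A = np.array([[1e-20, 1.0],
              [1.0,   1.0]])
b = np.array([1.0, 2.0])           # true solution is very close to x = [1, 1]

# GE without pivoting: the multiplier is 1e20 and fl(1 - 1e20) loses the A(2,2) info
m = A[1, 0] / A[0, 0]
u22 = A[1, 1] - m * A[0, 1]        # rounds to -1e20
y2 = b[1] - m * b[0]
x2 = y2 / u22
x1 = (b[0] - 1.0 * x2) / A[0, 0]
print([x1, x2])                    # -> [0.0, 1.0]  (first component completely wrong)

print(np.linalg.solve(A, b))       # GEPP via LAPACK -> approximately [1.0, 1.0]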
Gaussian Elimination with Partial Pivoting
(GEPP)
° Partial Pivoting: swap rows so that each multiplier
|L(i,j)| = |A(j,i)/A(i,i)| <= 1
for i = 1 to n-1
  find and record k where |A(k,i)| = max{i <= j <= n} |A(j,i)|
    … i.e. largest entry in rest of column i
  if |A(k,i)| = 0
    exit with a warning that A is singular, or nearly so
  elseif k != i
    swap rows i and k of A
  end if
  A(i+1:n,i) = A(i+1:n,i) / A(i,i)   … each quotient lies in [-1,1]
  A(i+1:n,i+1:n) = A(i+1:n,i+1:n) - A(i+1:n,i) * A(i,i+1:n)
° Lemma: This algorithm computes A = P*L*U, where P is a
permutation matrix
° Since each entry of |L(i,j)| <= 1, this algorithm is considered
numerically stable
° For details see LAPACK code at www.netlib.org/lapack/single/sgetf2 and
Dongarra’s book
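A NumPy sketch of the GEPP pseudocode above (illustrative only; the real implementation is the LAPACK routine cited on this slide):

import numpy as np

def gepp(A):
    # Returns the packed L and U factors of a copy of A, plus the pivot order,
    # so that A[piv] = L*U (i.e. P*A = L*U)
    A = A.copy()
    n = A.shape[0]
    piv = np.arange(n)
    for i in range(n - 1):
        k = i + np.argmax(np.abs(A[i:, i]))          # largest entry in rest of column i
        if A[k, i] == 0:
            raise ValueError("A is singular, or nearly so")
        if k != i:
            A[[i, k], :] = A[[k, i], :]              # swap rows i and k
            piv[[i, k]] = piv[[k, i]]
        A[i+1:, i] /= A[i, i]                        # each multiplier lies in [-1, 1]
        A[i+1:, i+1:] -= np.outer(A[i+1:, i], A[i, i+1:])
    return A, piv

A = np.array([[1e-20, 1.0], [1.0, 1.0]])
LU, piv = gepp(A)
L, U = np.tril(LU, -1) + np.eye(2), np.triu(LU)
assert np.allclose(L @ U, A[piv])                    # P*A = L*U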