CS 294-73
Software Engineering for
Scientific Computing
[email protected]
[email protected]
Lecture 1: Introduction
Grading
• 5-6 homework assignments, adding up to 60% of the grade.
• The final project is worth 40% of the grade.
- Project will be a scientific program, preferably in an area related to
your research interests or thesis topic.
- Novel architectures and technologies are not encouraged (projects will
need to run on a standard Mac OS X or Linux workstation)
- For the final project only, you will self-organize into teams to develop
your proposal. Undergraduates may need additional help developing a
project proposal.
Hardware/Software Requirements
• Laptop or desktop computer on which you have root permission
• Mac OS X or Linux operating system
- Cygwin or MinGW on Windows *might* work, but we have limited
experience there to help you.
• Installed software (this is your IDE)
- gcc or clang
- GNU Make
- gdb or lldb
- ssh
- VisIt
- Doxygen
- emacs
- LaTeX
Homework and Project submission
• Submission will be done via the class source code repository (git).
• At midnight on the deadline date, the homework submission
directory is made read-only.
• We will be setting up times for you to get accounts.
What we are not going to teach you in class
• Navigating and using Unix
• Unix commands you will want to know
- ssh
- scp
- tar
- gzip/gunzip
- ls
- mkdir
- chmod
- ln
• Emphasis in class lectures will be explaining what is really going on,
not syntax issues. We will rely heavily on online reference material,
available at the class website.
• Students with no prior experience with C/C++ are strongly urged to
take CS9F.
What is Scientific Computing ?
We will be mainly interested in scientific computing as it arises in
simulation.
The scientific computing ecosystem:
• A science or engineering problem that requires simulation.
• Models – must be mathematically well posed.
• Discretizations – replacing continuous variables by a finite number
of discrete variables.
• Software – correctness, performance.
• Data – inputs, outputs.
• Hardware.
• People.
What will you learn from taking this course ?
The skills and tools to allow you to understand (and perform) good
software design for scientific computing.
• Programming: expressiveness, performance, scalability to large
software systems (otherwise, you could do just fine in matlab).
• Data structures and algorithms as they arise in scientific
applications.
• Tools for organizing a large software development effort (build tools,
source code control).
• Debugging and data analysis tools.
Why C++ ?
(Compare to Matlab, Python, ...).
• Strong typing + compilation. Catch a large class of errors at compile
time, rather than at run time.
• Strong scoping rules. Encapsulation, modularity.
• Abstraction, orthogonalization. Use of libraries and layered
design.
C++, Java, and some dialects of Fortran support these techniques well,
to varying degrees. The trick is doing so without sacrificing
performance. In this course, we will use C++.
- Strongly typed language with a mature compiler technology.
- Powerful abstraction mechanisms.
Who should take this course ?
Students who don’t have the skills listed above, and expect to need them
soon.
• Expect to take CS 267.
• Building or adding to a large software system as part of your research.
• Interested in scientific computing.
• Interested in high-performance computing.
• Prior to this semester, EECS graduate students were not permitted to take
this course.
A Cartoon View of Hardware
What is a performance model ?
• A “faithful cartoon” of how source code gets executed.
• Languages / compilers / run-time systems that allow you to
implement based on that cartoon.
• Tools to measure performance in terms of the cartoon, and close
the feedback loop.
The Von Neumann Architecture / Model
[Diagram: a CPU with registers connected to memory, which holds both
instructions and data, plus attached devices.]
• Data and instructions are equivalent in terms of the memory.
• Instructions are executed in a sequential order implied by the
source code.
• Really easy cartoon to understand and program to.
• The extent to which the cartoon is an illusion can have substantial
impact on the performance of your program.
Memory Hierarchy
• Take advantage of the principle of locality to:
- Present as much memory as is available in the cheapest technology
- Provide access at the speed offered by the fastest technology
[Diagram: a multicore processor with per-core caches and a shared second-level
cache (SRAM), connected through a memory controller to main memory
(DRAM/FLASH/PCM), secondary storage (disk / cloud storage), and tertiary
storage (tape).]
Latency (ns): ~1 ~5-10 ~100 ~10^7 ~10^10
Size (bytes): ~10^6 ~10^9 ~10^12 ~10^15
The Principle of Locality
• The Principle of Locality:
- Programs access a relatively small portion of the address space at any
instant of time.
• Two Different Types of Locality:
- Temporal Locality (Locality in Time): If an item is referenced, it will tend
to be referenced again soon (e.g., loops, reuse)
- so, keep a copy of recently read memory in cache.
- Spatial Locality (Locality in Space): If an item is referenced, items whose
addresses are close by tend to be referenced soon
(e.g., straightline code, array access)
- Guess where the next memory reference is going to be based on
your access history.
• Processors have relatively high bandwidth to memory, but also very
high latency. Cache is a way to hide latency.
- Lots of pins, but talking over the pins is slow.
- DRAM is (relatively) cheap and slow. Banking gives you more bandwidth.
Programs with locality cache well ...
[Figure: memory address (one dot per access) plotted against time, showing
bands of temporal locality, bands of spatial locality, and a region of bad
locality behavior.]
Donald J. Hatfield, Jeanette Gerald: Program Restructuring for Virtual Memory.
IBM Systems Journal 10(3): 168-192 (1971)
Memory Hierarchy: Terminology
• Hit: data appears in some block in the upper level (example: Block X)
- Hit Rate: the fraction of memory access found in the upper level
- Hit Time: Time to access the upper level which consists of
RAM access time + Time to determine hit/miss
• Miss: data needs to be retrieved from a block in the lower level (Block
Y)
- Miss Rate = 1 - (Hit Rate)
- Miss Penalty: Time to replace a block in the upper level +
Time to deliver the block to the processor
• Hit Time << Miss Penalty
[Diagram: the processor exchanges data with the upper level memory (holding
block X), which in turn exchanges blocks with the lower level memory (holding
block Y).]
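A standard way to combine these quantities is the average memory access time (AMAT):
$$\text{AMAT} = \text{Hit Time} + \text{Miss Rate} \times \text{Miss Penalty}$$
Because Hit Time << Miss Penalty, even a small miss rate can dominate the average.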
Consequences for programming
• A common way to exploit spatial locality is to try to get stride-1
memory access (see the loop-ordering sketch at the end of this list)
- Cache fetches a cache line worth of memory on each cache miss
- Cache line can be 32-512 bytes (or more)
• Each cache miss causes an access to the next deeper memory
hierarchy
- Processor usually will sit idle while this is happening
- When that cache line arrives, some existing data in your cache will be
evicted, which can cause a subsequent memory access to miss as well. When
this happens with high frequency it is called cache thrashing.
• Caches are designed to work best for programs where data access
has lots of simple locality.
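As a concrete illustration (a minimal sketch, not from the course materials; the names are illustrative), the two loops below do the same work on a row-major array, but only the first walks memory with stride 1:

```cpp
#include <cstddef>
#include <vector>

// Stride-1 traversal: the inner loop visits consecutive addresses, so every
// cache line fetched on a miss is fully used before it is evicted.
void scaleRowOrder(std::vector<double>& a, std::size_t nx, std::size_t ny)
{
  for (std::size_t j = 0; j < ny; ++j)
    for (std::size_t i = 0; i < nx; ++i)
      a[j*nx + i] *= 2.0;            // a is stored row-major: a[j*nx + i]
}

// Same arithmetic, but the inner loop strides by nx doubles; for large nx
// each cache line contributes only one useful element before being evicted,
// the kind of access pattern that leads to thrashing.
void scaleColumnOrder(std::vector<double>& a, std::size_t nx, std::size_t ny)
{
  for (std::size_t i = 0; i < nx; ++i)
    for (std::size_t j = 0; j < ny; ++j)
      a[j*nx + i] *= 2.0;
}
```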
But processor architectures keep changing
• SIMD (vector) instructions: a(i) = b(i) + c(i), i = 1, …, 4 is as fast as
a single a0 = b0 + c0
• Non-uniform memory access
• Many processing elements with varying performance
I will have someone give a guest lecture on this during the
semester. Otherwise, not our problem (but it will be in CS 267).
Take a peek at your own computer
• Most Linux machines
- >cat /proc/cpuinfo
• Mac
- >sysctl -a hw
Seven Motifs of Scientific Computing
Simulation in the physical sciences and engineering is carried out using
various combinations of the following core algorithms.
• Structured grids
• Unstructured grids
• Dense linear algebra
• Sparse linear algebra
• Fast Fourier transforms
• Particles
• Monte Carlo (We won’t be doing this one)
Each of these has its own distinctive combination of computation and
data access.
There is a corresponding list for data (with significant overlap).
Seven Motifs of Scientific Computing
• Blue Waters usage patterns, in terms of motifs (dwarf/algorithm
classification by node hours): Structured Grid 26%, FFT 16%, Sparse Matrix 16%,
Dense Matrix 14%, N-Body 13%, I/O 10%, Monte Carlo 4%, Unstructured Grid 1%.
(Figure 2.3-1: Colella's seven dwarf classification of recognized applications
run on Blue Waters, by total node hours, in the study period, assuming equal
weighting if an application is using more than one algorithm in Table 10.0-1
in Appendix IV.)
A “Big-O, Little-o” Notation
$f = \Theta(g)$ if and only if $f = O(g)$ and $g = O(f)$
Structured Grids
Used to represent continuously varying
quantities in space in terms of values on a
regular (usually rectangular) lattice.
$$\phi = \phi(x) \;\rightarrow\; \phi_i \approx \phi(ih), \qquad \phi : B \rightarrow \mathbb{R}, \quad B \subset \mathbb{Z}^D$$
If B is a rectangle, data is stored in a contiguous block of memory:
$$B = [1, \dots, N_x] \times [1, \dots, N_y], \qquad \phi_{i,j} = \mathrm{chunk}(i + (j-1)N_x)$$
Typical operations are stencil operations, e.g. to compute finite
difference approximations to derivatives.
$$L(\phi)_{i,j} = \frac{1}{h^2}\left(\phi_{i,j+1} + \phi_{i,j-1} + \phi_{i+1,j} + \phi_{i-1,j} - 4\,\phi_{i,j}\right)$$
Small number of flops per memory access, mixture of unit stride
and non-unit stride.
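A minimal sketch of such a stencil operation (assuming a plain contiguous array with a one-cell boundary layer; illustrative, not the course's library):

```cpp
#include <cstddef>
#include <vector>

// 5-point Laplacian on an nx-by-ny grid stored contiguously, i fastest.
// Interior points only; the boundary layer supplies the missing neighbors.
void laplacian(const std::vector<double>& phi, std::vector<double>& lphi,
               std::size_t nx, std::size_t ny, double h)
{
  auto idx = [nx](std::size_t i, std::size_t j) { return j*nx + i; };
  for (std::size_t j = 1; j + 1 < ny; ++j)
    for (std::size_t i = 1; i + 1 < nx; ++i)
      // (i±1, j) are unit-stride accesses; (i, j±1) are a whole row (nx) away.
      lphi[idx(i,j)] = (phi[idx(i,j+1)] + phi[idx(i,j-1)]
                      + phi[idx(i+1,j)] + phi[idx(i-1,j)]
                      - 4.0*phi[idx(i,j)]) / (h*h);
}
```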
Structured Grids
In practice, things can get much more complicated: for example, if B is a
union of rectangles, represented as a list.
$$\phi = \phi(x) \;\rightarrow\; \phi_i \approx \phi(ih), \qquad \phi : B \rightarrow \mathbb{R}, \quad B \subset \mathbb{Z}^D$$
To apply stencil operations, need to get values from neighboring
rectangles.
$$L(\phi)_{i,j} = \frac{1}{h^2}\left(\phi_{i,j+1} + \phi_{i,j-1} + \phi_{i+1,j} + \phi_{i-1,j} - 4\,\phi_{i,j}\right)$$
Can also have a nested hierarchy of grids, which means that
missing values must be interpolated.
Algorithmic / software issues: sorting, caching addressing
information, minimizing costs of irregular computation.
Unstructured Grids
• Simplest case: triangular / tetrahedral
elements, used to fit complex geometries.
Grid is specified as a collection of nodes,
organized into triangles.
$$N = \{x_n : n = 1, \dots, N_{\mathrm{nodes}}\}$$
$$E = \{(x^e_{n_1}, \dots, x^e_{n_{D+1}}) : e = 1, \dots, N_{\mathrm{elts}}\}$$
• Discrete values of the function to be
represented are defined on nodes of the
grid.
$\phi = \phi(x)$ is approximated by $\phi : N \rightarrow \mathbb{R}$, $\phi_n \approx \phi(x_n)$
• Other access patterns are required to solve PDE problems, e.g. find all of
the nodes that are connected to a node by an element. Algorithmic issues:
sorting, graph traversal.
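A minimal data-structure sketch for the node / element description above (triangles, D = 2; the layout and names are illustrative, not the course framework):

```cpp
#include <array>
#include <cstddef>
#include <vector>

struct Node { double x, y; };                // coordinates x_n
using Element = std::array<int, 3>;          // indices of the 3 nodes of a triangle

struct TriangularGrid
{
  std::vector<Node>    nodes;                // N = {x_n : n = 1, ..., N_nodes}
  std::vector<Element> elements;             // E = {(n_1, n_2, n_3) : e = 1, ..., N_elts}
  std::vector<double>  phi;                  // one field value per node
};

// One of the "other access patterns": for every node, list the elements that
// contain it, from which the nodes connected to a node can be found.
std::vector<std::vector<int>> nodeToElements(const TriangularGrid& g)
{
  std::vector<std::vector<int>> adj(g.nodes.size());
  for (std::size_t e = 0; e < g.elements.size(); ++e)
    for (int n : g.elements[e])
      adj[n].push_back(static_cast<int>(e));
  return adj;
}
```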
Dense Linear Algebra
Want to solve system of equations
$$
\begin{pmatrix}
a_{1,1} & a_{1,2} & a_{1,3} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & a_{2,3} & \cdots & a_{2,n} \\
a_{3,1} & a_{3,2} & a_{3,3} & \cdots & a_{3,n} \\
\vdots  & \vdots  & \vdots  & \ddots & \vdots  \\
a_{n,1} & a_{n,2} & a_{n,3} & \cdots & a_{n,n}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{pmatrix}
$$
Dense linear algebra
Gaussian elimination: eliminate the subdiagonal entries one column at a time.
Starting from
$$
\begin{pmatrix}
a_{1,1} & a_{1,2} & a_{1,3} & \cdots & a_{1,n} \\
a_{2,1} & a_{2,2} & a_{2,3} & \cdots & a_{2,n} \\
a_{3,1} & a_{3,2} & a_{3,3} & \cdots & a_{3,n} \\
\vdots  & \vdots  & \vdots  & \ddots & \vdots  \\
a_{n,1} & a_{n,2} & a_{n,3} & \cdots & a_{n,n}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{pmatrix},
$$
the first row reduction
$$
a_{k,l} := a_{k,l} - \frac{a_{k,1}}{a_{1,1}}\, a_{1,l}, \qquad
b_k := b_k - \frac{a_{k,1}}{a_{1,1}}\, b_1
$$
zeroes the first column below the diagonal:
$$
\begin{pmatrix}
a_{1,1} & a_{1,2} & a_{1,3} & \cdots & a_{1,n} \\
0       & a_{2,2} & a_{2,3} & \cdots & a_{2,n} \\
0       & a_{3,2} & a_{3,3} & \cdots & a_{3,n} \\
\vdots  & \vdots  & \vdots  & \ddots & \vdots  \\
0       & a_{n,2} & a_{n,3} & \cdots & a_{n,n}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{pmatrix}.
$$
The second row reduction
$$
a_{k,l} := a_{k,l} - \frac{a_{k,2}}{a_{2,2}}\, a_{2,l}, \qquad
b_k := b_k - \frac{a_{k,2}}{a_{2,2}}\, b_2
$$
zeroes the second column below the diagonal, and so on:
$$
\begin{pmatrix}
a_{1,1} & a_{1,2} & a_{1,3} & \cdots & a_{1,n} \\
0       & a_{2,2} & a_{2,3} & \cdots & a_{2,n} \\
0       & 0       & a_{3,3} & \cdots & a_{3,n} \\
\vdots  & \vdots  & \vdots  & \ddots & \vdots  \\
0       & 0       & a_{n,3} & \cdots & a_{n,n}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{pmatrix}.
$$
The p-th row reduction costs $2(n-p)^2 + O(n)$ flops, so that the total cost is
$$
\sum_{p=1}^{n-1} 2(n-p)^2 + O(n^2) = O(n^3).
$$
Good for performance: unit stride access, and O(n) flops per word of
data accessed. But, if you have to write back to main memory...
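The elimination loop maps directly onto nested loops; here is a minimal sketch (dense row-major storage, no pivoting, illustrative only):

```cpp
#include <cstddef>
#include <vector>

// Forward elimination on an n-by-n matrix a (row-major, a[k*n + l]) and
// right-hand side b.  After step p, column p is zero below the diagonal.
void forwardEliminate(std::vector<double>& a, std::vector<double>& b, std::size_t n)
{
  for (std::size_t p = 0; p < n; ++p)
    for (std::size_t k = p + 1; k < n; ++k)
    {
      double m = a[k*n + p] / a[p*n + p];   // multiplier a_{k,p} / a_{p,p}
      for (std::size_t l = p; l < n; ++l)   // ~2(n-p) flops per row => O(n^3) total
        a[k*n + l] -= m * a[p*n + l];
      b[k] -= m * b[p];
    }
}
```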
Sparse Linear Algebra
$$
A = \begin{pmatrix}
1.5 & 0    & 0   & 0   & 0   & 0   & 0   & 0   \\
0   & 2.3  & 0   & 1.4 & 0   & 0   & 0   & 0   \\
0   & 0    & 3.7 & 0   & 0   & 0   & 0   & 0   \\
0   & -1.6 & 0   & 2.3 & 9.9 & 0   & 0   & 0   \\
0   & 0    & 0   & 0   & 5.8 & 0   & 0   & 0   \\
0   & 0    & 0   & 0   & 0   & 7.4 & 0   & 0   \\
0   & 0    & 1.9 & 0   & 0   & 0   & 4.9 & 0   \\
0   & 0    & 0   & 0   & 0   & 0   & 0   & 3.6
\end{pmatrix}
$$
Want to store only non-zeros, so use compressed-sparse-row storage (CSR) format.
JA  (column indices): 1 2 4 3 2 4 5 5 6 3 7 8
StA (nonzero values): 1.5 2.3 1.4 3.7 -1.6 2.3 9.9 5.8 7.4 1.9 4.9 3.6
IA  (row pointers):   1 2 4 5 8 9 10 12 13
Sparse Linear Algebra
• Matrix multiplication: indirect addressing.
Not a good fit for cache hierarchies.
$$(Ax)_k = \sum_{j = IA_k}^{IA_{k+1} - 1} (StA)_j \, x_{JA_j}, \qquad k = 1, \dots, 8$$
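A minimal sketch of that sum in C++ (assuming IA, JA, and StA already use 0-based indexing; the slide's arrays are 1-based):

```cpp
#include <cstddef>
#include <vector>

// y = A*x for a matrix stored in compressed-sparse-row form.
std::vector<double> csrMatVec(const std::vector<int>&    IA,   // row pointers, size n+1
                              const std::vector<int>&    JA,   // column index of each nonzero
                              const std::vector<double>& StA,  // value of each nonzero
                              const std::vector<double>& x)
{
  const std::size_t n = IA.empty() ? 0 : IA.size() - 1;
  std::vector<double> y(n, 0.0);
  for (std::size_t k = 0; k < n; ++k)
    for (int j = IA[k]; j < IA[k + 1]; ++j)
      y[k] += StA[j] * x[JA[j]];   // x[JA[j]] is the indirect (gather) access
  return y;
}
```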
• Gaussian elimination: fills in any column
below a nonzero entry all the way to the
diagonal. Can attempt to minimize this by
reordering the variables.
• Iterative methods for sparse matrices are based on applying the matrix to
the vector repeatedly. This avoids memory blowup from Gaussian
elimination, but need to have a good approximate inverse to work well.
Fast Fourier Transform (Cooley and Tukey, 1965)
We also have
$$F_k^P(x) = F_{k+P}^P(x).$$
So the number of flops to compute $F^N(x)$ is 2N, given that you have
$F^{N/2}(E(x))$ and $F^{N/2}(O(x))$.
Fast Fourier Transform
If $N = 2^M$, we can apply this to $F^{N/2}(E(x))$ and $F^{N/2}(O(x))$:
the number of flops to compute these smaller Fourier transforms is also
$2 \times 2 \times (N/2) = 2N$, given that you have the N/4 transforms. We can
continue this process until we are computing $2^{M-1}$ sets of $F^2$, each of
which costs O(1) flops. So the total number of flops is $O(MN) = O(N \log N)$.
The algorithm is recursive, and the data access pattern is complicated.
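A minimal recursive sketch of this idea (assumes N is a power of two and uses one common sign convention; illustrative only, not an optimized FFT such as FFTW):

```cpp
#include <complex>
#include <cstddef>
#include <vector>

std::vector<std::complex<double>> fft(const std::vector<std::complex<double>>& x)
{
  const std::size_t N = x.size();
  if (N == 1) return x;                        // F^1 is the identity

  std::vector<std::complex<double>> even(N/2), odd(N/2);
  for (std::size_t j = 0; j < N/2; ++j) { even[j] = x[2*j]; odd[j] = x[2*j + 1]; }

  auto E = fft(even);                          // F^{N/2}(E(x))
  auto O = fft(odd);                           // F^{N/2}(O(x))

  std::vector<std::complex<double>> X(N);
  const double pi = 3.141592653589793;
  for (std::size_t k = 0; k < N/2; ++k)
  {
    auto w = std::polar(1.0, -2.0*pi*double(k)/double(N)) * O[k];
    X[k]       = E[k] + w;                     // combine: ~2N flops per level,
    X[k + N/2] = E[k] - w;                     // using F^{N/2}_{k+N/2} = F^{N/2}_k
  }
  return X;
}
```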
Particle Methods
Collection of particles, either representing physical particles, or a
discretization of a continuous field.
$$\{x_k, v_k, w_k\}_{k=1}^{N}$$
$$\frac{dx_k}{dt} = v_k, \qquad \frac{dv_k}{dt} = F(x_k)$$
$$F(x) = \sum_{k'} w_{k'}\, (\nabla \phi)(x - x_{k'})$$
To evaluate the force for a single particle requires N evaluations
of $\nabla \phi$, leading to an O(N^2) cost per time step.
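A minimal sketch of the O(N^2) direct evaluation (the inverse-square kernel below is an assumed stand-in for the per-pair term $w_{k'} (\nabla\phi)(x - x_{k'})$; illustrative only):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

// Assumed kernel: w * dx / |dx|^3, i.e. (up to sign) the gradient of a 1/|x| potential.
Vec3 forceKernel(const Vec3& dx, double w)
{
  double r2 = dx.x*dx.x + dx.y*dx.y + dx.z*dx.z;
  double s  = w / (r2 * std::sqrt(r2));
  return Vec3{s*dx.x, s*dx.y, s*dx.z};
}

// N kernel evaluations per particle -> O(N^2) work per time step.
std::vector<Vec3> directForces(const std::vector<Vec3>& pos, const std::vector<double>& w)
{
  const std::size_t N = pos.size();
  std::vector<Vec3> F(N, Vec3{0.0, 0.0, 0.0});
  for (std::size_t k = 0; k < N; ++k)
    for (std::size_t kp = 0; kp < N; ++kp)
    {
      if (kp == k) continue;
      Vec3 dx{pos[k].x - pos[kp].x, pos[k].y - pos[kp].y, pos[k].z - pos[kp].z};
      Vec3 f = forceKernel(dx, w[kp]);
      F[k].x += f.x; F[k].y += f.y; F[k].z += f.z;
    }
  return F;
}
```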
Particle Methods
To reduce the cost, need to localize the force calculation. For typical
force laws arising in classical physics, there are two cases.
• Short-range forces (e.g. Lennard-Jones potential).
$$\phi(x) = \frac{C_1}{|x|^{6}} - \frac{C_2}{|x|^{12}}$$
$$\phi(x) \approx 0 \quad \text{if } |x| \text{ exceeds the cutoff distance}$$
The forces fall off sufficiently rapidly that the approximation
introduces acceptably small errors for practical values of the cutoff
distance.
Particle Methods
• Coulomb / Newtonian potentials
$$\phi(x) = \frac{1}{|x|} \ \text{in 3D}, \qquad \phi(x) = \log(|x|) \ \text{in 2D}$$
cannot be localized by cutoffs without
an unacceptable loss of accuracy.
However, the far field of a given
particle, while not small, is smooth,
with rapidly decaying derivatives. Can
take advantage of that in various
ways. In both cases, it is necessary to
sort the particles in space, and
organize the calculation around which
particles are nearby / far away.
Options: “Buy or Build?”
• “Buy”: use software developed and maintained by someone else.
• “Build”: write your own.
• Some problems are sufficiently well-characterized that there are
bulletproof software packages freely available: LAPACK (dense
linear algebra), FFTW. You still need to understand their properties and
how to integrate them into your application.
• “Build” – but what do you use as a starting point ?
- Program everything from the ground up.
- Use a framework that has some of the foundational components built
and optimized.
• Unlike LAPACK and FFTW, frameworks typically are not “black
boxes” – you will need to interact more deeply with them.
Tradeoffs
• Models – How faithfully does the model reproduce reality, versus
the cost of computing with that model ? Well-posedness, especially
stability to small perturbations in inputs (because numerical
approximations generate them).
• Discretizations – replacing continuous variables by a finite number
of discrete variables. Numerical stability – the discrete system must
be resilient to arbitrary small perturbations to the inputs. Robustness
to off-design use.
• Software – correctness, performance. How difficult is this to
implement / modify, especially for high performance ? Correctness /
performance debugging.
• Data – inputs, outputs. How much data does this generate ? If it is
large, how do you look at it ?
The art of designing simulation software is navigating the tradeoffs
among these considerations to get the best scientific throughput.
Roofline Model
• An example of a cartoon for performance.
[Figure: Empirical Roofline Graph (Results.cori1.nersc.gov.05/Run.002).
GFLOPs/sec versus FLOPs/Byte on log-log axes, with a peak of 844.5 GFLOPs/sec
(Maximum) and bandwidth ceilings for the L1, L2, and L3 caches and DRAM.]
$$L(\phi)_{i,j} = \frac{1}{h^2}\left(\phi_{i,j+1} + \phi_{i,j-1} + \phi_{i+1,j} + \phi_{i-1,j} - 4\,\phi_{i,j}\right)$$
6 floating point operations (flops), 16 bytes of data read / written.
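Spelling out the arithmetic, the stencil's arithmetic intensity is
$$\frac{6 \ \text{flops}}{16 \ \text{bytes}} \approx 0.375 \ \text{flops/byte},$$
and the attainable performance in the roofline model is roughly
$\min(\text{peak flop rate},\ 0.375 \times \text{memory bandwidth})$, which places
this kernel in the bandwidth-limited region of the graph.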
Roofline Model
• An example of a cartoon for performance.
[Figure: Single Socket Roofline for NERSC's Cori (Haswell partition, Cray XC40).
GFLOP/s versus FLOP/Byte, with ceilings for the GFLOP/s spec (multiply/add and
add), the L1, L2, and L3 caches, and DRAM, and measured points for GMG/Cheby
(fused), GMG/Cheby, AMG/SpMV, FFT (2M), FFT (1K), and DGEMM.]
What will you learn from taking this course ?
The skills and tools to allow you to understand (and perform) good
software design for scientific computing.
• Programming: expressiveness, performance, scalability to large
software systems (otherwise, you could do just fine in matlab).
• Data structures and algorithms as they arise in scientific
applications.
• Tools for organizing a large software development effort (build tools,
source code control).
• Debugging and data analysis tools.