Program Optimization
Professor Jennifer Rexford http://www.cs.princeton.edu/~jrex
Goals of Today's Class
Improving program performance
When and what to optimize
Better algorithms & data structures vs. tuning the code
Exploiting an understanding of underlying system
Compiler capabilities
Hardware architecture
Program execution
Why?
To be effective, and efficient, at making programs faster
Avoid optimizing the fast parts of the code
Help the compiler do its job better
To review material from the second half of the course
Improving Program Performance
Most programs are already fast enough
No need to optimize performance at all
Save your time, and keep the program simple/readable
Most parts of a program are already fast enough
Usually only a small part makes the program run slowly
Optimize only this portion of the program, as needed
Steps to improve execution (time) efficiency
Do timing studies (e.g., gprof)
Identify hot spots
Optimize that part of the program
Repeat as needed
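For a quick measurement without a profiler, a minimal sketch: bracket the suspected hot spot with calls to the standard clock() function (for per-function detail, compile with gcc -pg and run gprof instead). The loop below is only a stand-in for the code under test:

#include <stdio.h>
#include <time.h>

int main(void) {
    long i;
    volatile double sum = 0.0;        /* volatile: keep the loop from being optimized away */
    clock_t start = clock();
    for (i = 0; i < 100000000L; i++)  /* stand-in for the suspected hot spot */
        sum += i;
    printf("CPU time: %.3f seconds\n",
           (double)(clock() - start) / CLOCKS_PER_SEC);
    return 0;
}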
Ways to Optimize Performance
Better data structures and algorithms
Improves the asymptotic complexity
Better scaling of computation/storage as input grows
E.g., going from an O(n²) sorting algorithm to O(n log n)
Clearly important if large inputs are expected
Requires understanding data structures and algorithms
Better source code the compiler can optimize
Improves the constant factors
Faster computation during each iteration of a loop
E.g., going from 1000n to 10n running time
Clearly important if a portion of code is running slowly
Requires understanding hardware, compiler, execution
Helping the Compiler Do Its Job
Optimizing Compilers
Provide efficient mapping of program to machine
Register allocation
Code selection and ordering
Eliminating minor inefficiencies
Don't (usually) improve asymptotic efficiency
Up to the programmer to select best overall algorithm
Have difficulty overcoming optimization blockers
Potential function side-effects
Potential memory aliasing
Limitations of Optimizing Compilers
Fundamental constraint
Compiler must not change program behavior
Ever, even under rare pathological inputs
Behavior that may be obvious to the programmer can be obfuscated by languages and coding styles
Data ranges more limited than variable types suggest
Array elements remain unchanged by function calls
Most analysis is performed only within functions
Whole-program analysis is too expensive in most cases
Most analysis is based only on static information
Compiler has difficulty anticipating run-time inputs
Avoiding Repeated Computation
A good compiler recognizes simple optimizations
Avoiding redundant computations in simple loops
Even so, the programmer may want to make it explicit
Example
Repetition of computation: n * i
for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
        a[n*i + j] = b[j];

for (i = 0; i < n; i++) {
    int ni = n * i;
    for (j = 0; j < n; j++)
        a[ni + j] = b[j];
}
Worrying About Side Effects
Compiler cannot always avoid repeated computation
May not know if the code has a side effect that makes the transformation change the code's behavior
Is this transformation okay?
int func1(int x) {
    return f(x) + f(x) + f(x) + f(x);
}

int func1(int x) {
    return 4 * f(x);
}
Not necessarily, if
int counter = 0;

int f(int x) {
    return counter++;
}
And this function may be defined in another file, known only at link time!
Another Example on Side Effects
Is this optimization okay?
for (i = 0; i < strlen(s); i++) {
    /* Do something with s[i] */
}

length = strlen(s);
for (i = 0; i < length; i++) {
    /* Do something with s[i] */
}
Short answer: it depends
Compiler often cannot tell
Most compilers do not try to identify side effects
Programmer knows best
And can decide whether the optimization is safe
Memory Aliasing
Is this optimization okay?
void twiddle(int *xp, int *yp) {
    *xp += *yp;
    *xp += *yp;
}
void twiddle(int *xp, int *yp) {
    *xp += 2 * *yp;
}
Not necessarily: what if xp and yp are equal?
First version: result is 4 times *xp
Second version: result is 3 times *xp
Memory Aliasing
Memory aliasing
Single data location accessed through multiple names
E.g., two pointers that point to the same memory location
Modifying the data using one name
Implicitly modifies the values seen through other names
Blocks optimization by the compiler
The compiler cannot tell when aliasing may occur and so must forgo optimizing the code
Programmer often does know
And can optimize the code accordingly
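In C99, the programmer can make the no-aliasing promise explicit with the restrict qualifier, which tells the compiler the two pointers never refer to the same object — a sketch applied to twiddle:

/* restrict promises that xp and yp never alias, so the
   compiler may safely combine the two updates itself */
void twiddle(int *restrict xp, int *restrict yp) {
    *xp += *yp;
    *xp += *yp;
}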
Another Aliasing Example
Is this optimization okay?
int *x, *y;
*x = 5;
*y = 10;
printf("x=%d\n", *x);
printf("x=5\n");
Not necessarily
If y and x point to the same location in memory, the correct output is "x=10"
Summary: Helping the Compiler
Compiler can perform many optimizations
Register allocation
Code selection and ordering
Eliminating minor inefficiencies
But often the compiler needs your help
Knowing if code is free of side effects
Knowing if memory aliasing will not happen
Modifying the code can lead to better performance
Profile the code to identify the hot spots
Look at the assembly language the compiler produces
Rewrite the code to get the compiler to do the right thing
Exploiting the Hardware
Underlying Hardware
Implements a collection of instructions
Instruction set varies from one architecture to another
Some instructions may be faster than others
Registers and caches are faster than main memory
Number of registers and sizes of caches vary
Exploiting both spatial and temporal locality
Exploits opportunities for parallelism
Pipelining: decoding one instruction while running another
Benefits from code that runs in a sequence
Superscalar: perform multiple operations per clock cycle
Benefits from operations that can run independently
Speculative execution: performing instructions before knowing they will be reached (e.g., without knowing the outcome of a branch)
Addition Faster Than Multiplication
Adding instead of multiplying
Addition is faster than multiplication
Recognize sequences of products
Replace multiplication with repeated addition
for (i = 0; i < n; i++) {
    int ni = n * i;
    for (j = 0; j < n; j++)
        a[ni + j] = b[j];
}

int ni = 0;
for (i = 0; i < n; i++) {
    for (j = 0; j < n; j++)
        a[ni + j] = b[j];
    ni += n;
}
Bit Operations Faster Than Arithmetic
Shift operations to multiply/divide by powers of 2
x >> 3 is faster than x / 8
x << 3 is faster than x * 8

Example: 53 is 00110101; 53 << 2 is 11010100
Bit masking is faster than mod operation
x & 15 is faster than x % 16

Example: 53 is 00110101, 15 is 00001111; 53 & 15 is 00000101, i.e., 5
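A small sketch combining both tricks; the shift and mask forms match division and mod only for non-negative values, so unsigned types are the safe choice:

#include <stdio.h>

int main(void) {
    unsigned int x = 53;
    printf("%u %u\n", x >> 3, x / 8);    /* both print 6 */
    printf("%u %u\n", x & 15, x % 16);   /* both print 5 */
    return 0;
}

(In practice, the compiler usually makes these substitutions itself when the divisor is a constant power of 2.)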
Caching: Matrix Multiplication
Caches
Slower than registers, but faster than main memory
Both instruction caches and data caches
Locality
Temporal locality: recently-referenced items are likely to be referenced in the near future
Spatial locality: items with nearby addresses tend to be referenced close together in time
Matrix multiplication
Multiply n-by-n matrices A and B, and store in matrix C
Performance heavily depends on effective use of caches
Matrix Multiply: Cache Effects
for (i = 0; i < n; i++) {
    for (j = 0; j < n; j++) {
        for (k = 0; k < n; k++)
            c[i][j] += a[i][k] * b[k][j];
    }
}
Reasonable cache effects
Good spatial locality for A
Poor spatial locality for B
Good temporal locality for C
(diagram: row (i,*) of A, column (*,j) of B, element (i,j) of C)
Matrix Multiply: Cache Effects
for (j = 0; j < n; j++) {
    for (k = 0; k < n; k++) {
        for (i = 0; i < n; i++)
            c[i][j] += a[i][k] * b[k][j];
    }
}
(diagram: column (*,k) of A, element (k,j) of B, column (*,j) of C)
Rather poor cache effects
Bad spatial locality for A
Good temporal locality for B
Bad spatial locality for C
Matrix Multiply: Cache Effects
for (k = 0; k < n; k++) {
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++)
            c[i][j] += a[i][k] * b[k][j];
    }
}
Good cache effects
Good temporal locality for A
Good spatial locality for B
Good spatial locality for C
(diagram: element (i,k) of A, row (k,*) of B, row (i,*) of C)
Parallelism: Loop Unrolling
What limits the performance?
for (i = 0; i < length; i++) sum += data[i];
Limited apparent parallelism
One main operation per iteration (plus book-keeping)
Not enough work to keep multiple functional units busy
Disruption of instruction pipeline from frequent branches
Solution: unroll the loop
Perform multiple operations on each iteration
Parallelism: After Loop Unrolling
Original code
for (i = 0; i < length; i++) sum += data[i];
After loop unrolling (by three)
/* Combine three elements at a time */
limit = length - 2;
for (i = 0; i < limit; i += 3)
    sum += data[i] + data[i+1] + data[i+2];

/* Finish any remaining elements */
for ( ; i < length; i++)
    sum += data[i];
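Going one step further (beyond the original example), the sum can be split across two accumulators so the additions in each iteration are independent and can keep multiple functional units busy — a sketch:

/* Two independent accumulators expose parallelism */
sum0 = sum1 = 0;
limit = length - 1;
for (i = 0; i < limit; i += 2) {
    sum0 += data[i];
    sum1 += data[i+1];
}
for ( ; i < length; i++)   /* finish any remaining element */
    sum0 += data[i];
sum = sum0 + sum1;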
Program Execution
Avoiding Function Calls
Function calls are expensive
Caller saves registers and pushes arguments on the stack
Callee saves registers and pushes local variables on the stack
Call and return disrupt the sequential flow of the code
Function inlining:
void g(void) {
    /* Some code */
}

void f(void) {
    g();
}
Some compilers support the inline keyword.
void f(void) {
    /* Some code */
}
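A sketch of requesting inlining explicitly; in C99 (and as an extension in many compilers) inline is only a hint, and the compiler may still ignore it:

static inline void g(void) {
    /* Some code */
}

void f(void) {
    g();   /* compiler may expand g's body here, avoiding call overhead */
}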
Writing Your Own Malloc and Free
Dynamic memory management
malloc() to allocate blocks of memory
free() to free blocks of memory
Existing malloc and free implementations
Designed to handle a wide range of request sizes
Good most of the time, but rarely the best for all workloads
Designing your own dynamic memory management
Forgo using traditional malloc/free, and write your own
E.g., if you know all blocks will be the same size
E.g., if you know blocks will usually be freed in the order allocated
E.g., <insert your known special property here>
A sketch of the first case follows.
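A minimal sketch, assuming every request is for the same known size (myMalloc and myFree are hypothetical names): freed blocks are kept on a free list, chained through their own first bytes, and reused without going back to the system allocator:

#include <stdlib.h>

enum { BLOCK_SIZE = 64 };        /* assumed fixed request size (>= sizeof(void *)) */

static void *freeList = NULL;    /* singly-linked list of freed blocks */

void *myMalloc(void) {
    void *p = freeList;
    if (p != NULL) {
        freeList = *(void **)p;  /* pop a recycled block */
        return p;
    }
    return malloc(BLOCK_SIZE);   /* free list empty: fall back to malloc */
}

void myFree(void *p) {
    *(void **)p = freeList;      /* push onto the free list */
    freeList = p;
}

Allocation and freeing are then a handful of instructions each, with no searching or coalescing.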
Conclusion
Work smarter, not harder
No need to optimize a program that is fast enough
Optimize only when, and where, necessary
Speeding up a program
Better data structures and algorithms: better asymptotic behavior
Optimized code: smaller constants
Techniques for speeding up a program
Coax the compiler
Exploit capabilities of the hardware
Capitalize on knowledge of program execution
Course Wrap Up
The Rest of the Semester
Dean's Date: Tuesday, May 12
Final assignment due at 9pm
Cannot be accepted after 11:59pm
Final Exam: Friday May 15
1:30-4:20pm in Friend Center 101
Exams from previous semesters are online at
http://www.cs.princeton.edu/courses/archive/spring09/cos217/exam2prep/
Covers entire course, with emphasis on second half of the term
Open book, open notes, open slides, etc. (just no computers!)
No need to print/bring the IA-32 manuals
Office hours during reading/exam period
Daily, times TBA on course mailing list
Review sessions
May 13-14, time TBA on course mailing list
Goals of COS 217
Understand boundary between code and computer
Machine architecture
Operating systems
Compilers
Learn C and the Unix development tools
C is widely used for programming low-level systems
Unix has a rich development environment
Unix is open and well-specified, good for study & research
Improve your programming skills
More experience in programming
Challenging and interesting programming assignments
Emphasis on modularity and debugging
Relationship to Other Courses
Machine architecture
Logic design (306) and computer architecture (471)
COS 217: assembly language and basic architecture
Operating systems
Operating systems (318)
COS 217: virtual memory, system calls, and signals
Compilers
Compiling techniques (320)
COS 217: compilation process, symbol tables, assembly and machine language
Software systems
Numerous courses, independent work, etc.
COS 217: programming skills, UNIX tools, and ADTs
Lessons About Computer Science
Modularity
Well-defined interfaces between components
Allows changing the implementation of one component without changing another
The key to managing complexity in large systems
Resource sharing
Time sharing of the CPU by multiple processes
Sharing of the physical memory by multiple processes
Indirection
Representing address space with virtual memory
Manipulating data via pointers (or addresses)
Lessons Continued
Hierarchy
Memory: registers, cache, main memory, disk, tape, …
Balancing the trade-off between fast/small and slow/big
Bits can mean anything
Code, addresses, characters, pixels, money, grades, …
Arithmetic can be done through logic operations
The meaning of the bits depends entirely on how they are accessed, used, and manipulated
Have a Great Summer!!!