CS 267 HW 1
Ben Brock
Optimizing Matrix Multiply
- In HW 1, you’ll be optimizing matrix multiply
- C = C + AB, where A, B, and C are dense matrices
- For simplicity, we’ll consider the case of square matrices
Problem Pseudocode
for i = 1 to n:
    for j = 1 to n:
        for k = 1 to n:
            c[i, j] = c[i, j] + a[i, k] * b[k, j]
3 nested loops => O(n^3) complexity
Your Job: Implement This Interface
void square_dgemm(int n, double* A, double* B, double* C);
You write this function; we call it from a test harness.
Your job is to make it run as fast as possible.
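As a starting point, the pseudocode can be translated directly into the interface above. This is a minimal reference sketch, not a tuned implementation. It assumes column-major storage (element (i, j) lives at index i + j*n); the slides do not state the layout, so check the test harness before relying on it.

```c
/* Naive reference: C = C + A * B for n-by-n matrices.
 * Assumption: column-major storage, so element (i, j) is at i + j * n. */
void square_dgemm(int n, double* A, double* B, double* C) {
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            /* Accumulate in a local so C is read and written once per (i, j). */
            double cij = C[i + j * n];
            for (int k = 0; k < n; ++k) {
                cij += A[i + k * n] * B[k + j * n];
            }
            C[i + j * n] = cij;
        }
    }
}
```

A version like this is useful mainly as a correctness check for the optimized versions.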
Optimization Techniques
1) Blocking
a) L1 blocking
b) Register blocking
c) L2 blocking
2) Copy optimization
a) Copy to an aligned buffer
b) Transpose?
3) Vectorization
a) Write small, fixed-size (n=8-16) GEMM, examine assembly
b) Intrinsics
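For item 3a, one possible small fixed-size kernel is sketched below. The name dgemm_8x8 and the size constant NB are hypothetical, and column-major storage is assumed. With a compile-time size, restrict-qualified pointers, and a unit-stride inner loop, a compiler has a reasonable chance of unrolling and vectorizing the loop, but whether it does depends on the compiler and flags.

```c
#define NB 8  /* hypothetical fixed kernel size, within the 8-16 range above */

/* C = C + A * B for NB-by-NB matrices.
 * Assumptions: column-major storage, and A, B, C do not overlap (restrict). */
void dgemm_8x8(const double* restrict A, const double* restrict B,
               double* restrict C) {
    for (int j = 0; j < NB; ++j) {
        for (int k = 0; k < NB; ++k) {
            double bkj = B[k + j * NB];
            /* Unit-stride over i: walks down a column of A and a column of C. */
            for (int i = 0; i < NB; ++i) {
                C[i + j * NB] += A[i + k * NB] * bkj;
            }
        }
    }
}
```

To examine the assembly with GCC, something like gcc -O3 -march=native -S kernel.c (file name is a placeholder) writes kernel.s, where you can look for packed vector instructions in the inner loop.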
Blocking (or Tiling)
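The idea is to split the matrices into tiles small enough to stay in cache, then multiply tile by tile. The sketch below shows one level of blocking. BLOCK_SIZE, do_block, and square_dgemm_blocked are hypothetical names, the tile size of 32 is an untuned placeholder, and column-major storage is assumed.

```c
#define BLOCK_SIZE 32  /* hypothetical tile size; tune for the target cache */

static int min_int(int a, int b) { return a < b ? a : b; }

/* Multiply one M-by-K tile of A by one K-by-N tile of B into an M-by-N tile
 * of C. The pointers address the tiles' top-left elements inside the full
 * n-by-n matrices, so n is still the column stride. */
static void do_block(int n, int M, int N, int K,
                     const double* A, const double* B, double* C) {
    for (int j = 0; j < N; ++j) {
        for (int k = 0; k < K; ++k) {
            double bkj = B[k + j * n];
            for (int i = 0; i < M; ++i) {
                C[i + j * n] += A[i + k * n] * bkj;
            }
        }
    }
}

/* C = C + A * B, one tile at a time (column-major storage assumed). */
void square_dgemm_blocked(int n, double* A, double* B, double* C) {
    for (int j = 0; j < n; j += BLOCK_SIZE) {
        for (int k = 0; k < n; k += BLOCK_SIZE) {
            for (int i = 0; i < n; i += BLOCK_SIZE) {
                /* Edge tiles are smaller when n is not a multiple of BLOCK_SIZE. */
                int M = min_int(BLOCK_SIZE, n - i);
                int N = min_int(BLOCK_SIZE, n - j);
                int K = min_int(BLOCK_SIZE, n - k);
                do_block(n, M, N, K,
                         A + i + k * n, B + k + j * n, C + i + j * n);
            }
        }
    }
}
```

L2 and register blocking follow the same pattern: wrap another level of tiling outside this one, or replace do_block with a small kernel that keeps a tile of C in registers.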
Copy Optimization
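The idea is to copy data into a contiguous, aligned buffer before the multiply so the inner loop reads memory at unit stride. The sketch below combines both items from the outline: it copies A, transposed, into a 64-byte-aligned buffer. It assumes column-major storage and a C11 library that provides aligned_alloc; square_dgemm_packed and At are hypothetical names. A tuned version would more likely copy one tile at a time rather than the whole matrix.

```c
#include <stdlib.h>

/* C = C + A * B using a transposed, aligned copy of A.
 * Assumptions: column-major storage; C11 aligned_alloc is available. */
void square_dgemm_packed(int n, double* A, double* B, double* C) {
    if (n <= 0) return;

    /* aligned_alloc requires the size to be a multiple of the alignment. */
    size_t bytes = (size_t)n * (size_t)n * sizeof(double);
    size_t padded = (bytes + 63) / 64 * 64;
    double* At = aligned_alloc(64, padded);
    if (At == NULL) abort();  /* sketch only; real code would want a fallback */

    /* Copy with transpose: At(k, i) = A(i, k). */
    for (int k = 0; k < n; ++k) {
        for (int i = 0; i < n; ++i) {
            At[k + i * n] = A[i + k * n];
        }
    }

    for (int j = 0; j < n; ++j) {
        const double* b_col = B + j * n;      /* column j of B, contiguous */
        for (int i = 0; i < n; ++i) {
            const double* a_row = At + i * n; /* row i of A, now contiguous */
            double cij = C[i + j * n];
            for (int k = 0; k < n; ++k) {
                cij += a_row[k] * b_col[k];   /* both operands at unit stride */
            }
            C[i + j * n] = cij;
        }
    }

    free(At);
}
```

The copy costs O(n^2) extra work against O(n^3) for the multiply, so it can pay off for larger n, but measure before assuming it helps.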