Advanced Data Layouts in Taichi
N-body systems
Mandelbulb MPM88 + March Squares N-body with black hole(s) N-body with black hole(s)
@rockeyshao @wangfeng70117 @szl2 @logic-three-body
Gifts for the gifted
• Check your GitHub issues ☺
Outline Today
• Advanced dense data layouts
• Sparse data layouts
Advanced dense data layouts
Taichi: A data-oriented programming language
• ti.field()
• @ti.kernel
• Optimized for ti.field()
• OOP
• @data_oriented
import taichi as ti
import math

ti.init(ti.gpu)

# ---- Data ----
PI = math.pi  # used by initialize(); its definition is not shown on the slide
# number of planets
N = 300
# unit mass
m = 5
# galaxy size
galaxy_size = 0.4
# planet radius (for rendering)
planet_radius = 2
# init vel
init_vel = 120
# time-step size
h = 1e-5
# substepping
substepping = 10
# per-planet state, declared as in the SoA example later in these slides
pos = ti.Vector.field(2, ti.f32, N)
vel = ti.Vector.field(2, ti.f32, N)
force = ti.Vector.field(2, ti.f32, N)

# ---- Computation ----
@ti.kernel
def initialize():
    center = ti.Vector([0.5, 0.5])
    for i in range(N):
        theta = ti.random() * 4 * PI
        r = (ti.sqrt(ti.random()) * 0.7 + 0.3) * galaxy_size
        offset = r * ti.Vector([ti.cos(theta), ti.sin(theta)])
        pos[i] = center+offset
        vel[i] = [-offset.y, offset.x]
        vel[i] *= init_vel

@ti.kernel
def compute_force():
    # clear force
    for i in range(N):
        force[i] = ti.Vector([0.0, 0.0])

    # compute gravitational force
    for i in range(N):
        p = pos[i]
        for j in range(N):
            if i != j: # double the computation for a better memory footprint and load balance
                diff = p-pos[j]
                r = diff.norm(1e-5)
                # (the line that accumulates the force into force[i] is cut off on the slide)

@ti.kernel
def update():
    dt = h/substepping
    for i in range(N):
        # symplectic Euler
        vel[i] += dt*force[i]/m
        pos[i] += dt*vel[i]

# ---- Visualization ----
# gui is a ti.GUI window; its creation is not shown on the slide
initialize()
while gui.running:
    for i in range(substepping):
        compute_force()
        update()
    ...
    gui.clear(0x112F41)
    gui.circles(pos.to_numpy(), color=0xffffff, radius=planet_radius)
    gui.show()
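The update rule in the kernels above can be checked on a tiny system in plain Python. This is a minimal sketch, not the course code: the force line is cut off on the slide, so an inverse-square law with an assumed gravitational constant G = 1 is used, the distance is softened in the spirit of diff.norm(1e-5), and the function names simply mirror the kernels.

```python
import math

G = 1.0   # assumed gravitational constant (not shown on the slide)
m = 5.0   # unit mass, as on the slide


def compute_force(pos):
    """Brute-force O(N^2) gravity; every pair is visited twice, like the kernel."""
    n = len(pos)
    force = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                # softened distance so that r never reaches zero
                r = math.sqrt(dx * dx + dy * dy + 1e-5)
                f = -G * m * m / r**3
                force[i][0] += f * dx
                force[i][1] += f * dy
    return force


def update(pos, vel, force, dt):
    """Symplectic Euler: velocity first, then position with the new velocity."""
    for i in range(len(pos)):
        vel[i][0] += dt * force[i][0] / m
        vel[i][1] += dt * force[i][1] / m
        pos[i][0] += dt * vel[i][0]
        pos[i][1] += dt * vel[i][1]


pos = [[0.4, 0.5], [0.6, 0.5]]
vel = [[0.0, 0.0], [0.0, 0.0]]
force = compute_force(pos)
update(pos, vel, force, dt=1e-6)
print(force, vel)
```

With two bodies at rest, the forces come out equal and opposite, so the total momentum stays at zero after the step.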
Performance @CPU…
[Chart: wall-clock time split into computation and data access; one share is labeled 80%]
Performance @GPU…
[Chart: wall-clock time split into computation and data access; one share is labeled 80%]
Brick-moving (搬砖) example (a slide from @禹鹏)
Before we go: packed mode
• Set in ti.init()
• Decides whether each field dimension is padded up to the next power of two
• Default choice: packed=False, which does the padding
• We assume packed=True in this class for simplicity
ti.init(packed=True)
a = ti.field(ti.i32, shape=(18, 65)) # no padding
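To see what the padding costs, the arithmetic can be done by hand. The sketch below assumes that each dimension is rounded up independently to the next power of two, as the bullet describes; the helper names are made up for illustration and are not Taichi API.

```python
def next_pow2(n):
    """Smallest power of two that is >= n (for n >= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p


def padded_shape(shape):
    """Round every dimension up to the next power of two."""
    return tuple(next_pow2(s) for s in shape)


shape = (18, 65)
padded = padded_shape(shape)
cells_packed = shape[0] * shape[1]
cells_padded = padded[0] * padded[1]
print(padded, cells_packed, cells_padded)  # (32, 128) 1170 4096
```

Under this assumption the (18, 65) field would occupy 4096 cells instead of 1170 when padding is on.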
Taichi: optimized for data-access
x = ti.field(ti.i32, shape=16)

@ti.kernel
def fill():
    for i in x:
        x[i] = i

fill()
[Animation, built up over several slides: the elements 0 1 2 … 15 appear step by step; legend: data in memory, data prefetched. In the final frame, access order matches memory order.]
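Why it helps when access order matches memory order can be illustrated with a toy cache. The sketch below is a simplification with assumed numbers: a cache that holds a single line of 4 elements and a hypothetical strided access order used as a counterexample. Real hardware has larger lines and more of them.

```python
LINE = 4  # elements per cache line (illustrative only)


def lines_fetched(access_order, line=LINE):
    """Count line fetches for a cache that only remembers the last line."""
    fetches = 0
    current = None
    for idx in access_order:
        ln = idx // line
        if ln != current:   # element is not in the line we already hold
            fetches += 1
            current = ln
    return fetches


sequential = list(range(16))
strided = [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]

print(lines_fetched(sequential))  # 4
print(lines_fetched(strided))     # 16
```

Both orders touch the same 16 elements, but the sequential one reuses every fetched line four times.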
How about multi-dimensional fields?
@ti.kernel
def fill():
    for i,j in x:
        x[i,j] = 10*i + j

fill()
N-D fields are stored in our 1-D memory…
However the access pattern is not determined…
[Figure: two possible access orders over the 2-D field]
What we want:
• Store our data in a memory-access-friendly way.
Ideal memory layout of an N-D field:
The numbers in each grid give the access order; the ideal memory order is the same sequence.
Row by row:
1 2 3 4
5 6 7 8
Column by column:
1 5
2 6
3 7
4 8
Block by block:
1 2 5 6
3 4 7 8
Memory Order: 1 2 3 4 5 6 7 8
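The three grids can be reproduced by ranking the cells of a grid according to a chosen storage order. This is a plain-Python sketch for illustration; the function name and the order labels are not part of Taichi.

```python
def memory_rank(rows, cols, order, block=(2, 2)):
    """Return a rows x cols grid whose entries are 1-based memory positions."""
    cells = [(i, j) for i in range(rows) for j in range(cols)]
    if order == "row":
        key = lambda c: (c[0], c[1])
    elif order == "col":
        key = lambda c: (c[1], c[0])
    elif order == "block":
        bi, bj = block
        # sort by block first, then by position inside the block
        key = lambda c: (c[0] // bi, c[1] // bj, c[0] % bi, c[1] % bj)
    else:
        raise ValueError(order)
    grid = [[0] * cols for _ in range(rows)]
    for position, (i, j) in enumerate(sorted(cells, key=key), start=1):
        grid[i][j] = position
    return grid


for name, r, c in (("row", 2, 4), ("col", 4, 2), ("block", 2, 4)):
    print(name)
    for line in memory_rank(r, c, name):
        print(" ".join(str(v) for v in line))
```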
What we want:
• Store our data in a memory-access-friendly way.
• Access Order = Memory Order
What we want:
• memory-access-friendly?
• using ti.field()?
x = ti.field(ti.i32, shape = (4, 4))
Access row/col-major arrays/fields in C/C++
But that requires a huge stack in my brain …
// C/C++
int x[3][2]; // row-major
int y[2][3]; // column-major
void foo() {
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 2; j++) {
            do_something(x[i][j]);
        }
    }
}
Upgrade your ti.field()
Layout 101: from shape to ti.root
x = ti.Vector.field(3, ti.f32, shape = 16)
<==>
x = ti.Vector.field(3, ti.f32)
ti.root.dense(ti.i, 16).place(x)
ti.root in English:
Each cell of root has a dense container with 16 cells along the ti.i axis. Each cell of the dense container holds field x.
[Figure: Root, then a dense container with 16 cells along the ti.i axis, each cell holding a ti.Vector.field(3, ti.f32) element]
ti.root: more examples:
x = ti.field(ti.f32, shape=())
<==>
x = ti.field(ti.f32)
ti.root.place(x)
ti.root: the root of an SNode-tree
• SNode: Structural Node
• An SNode tree:
• ti.root: the root of the SNode-tree
• .dense(): a dense container describing shape
• .place(ti.field()): a field describing cell data
• …
[Figure: SNode tree with root, dense and field nodes]
The SNode-tree
x = ti.field(ti.i32, shape = (4, 4))
<==>
x = ti.field(ti.i32)
ti.root.dense(ti.ij, (4, 4)).place(x)
<==>
x = ti.field(ti.i32)
ti.root.dense(ti.i, 4).dense(ti.j, 4).place(x)
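The equivalence of the flat and the chained definition can be checked with a small address model. The sketch below is a conceptual model in plain Python, not Taichi's implementation: a layout is written as a list of (axes, sizes) levels from root to leaf, and the offset function walks the levels the way nested dense containers would.

```python
def offset(levels, index):
    """Linear offset of `index` in a chain of dense levels.

    levels: list of (axes, sizes) from root to leaf, e.g. [("ij", (4, 4))]
            or [("i", (4,)), ("j", (4,))]
    index:  dict mapping an axis name to the global index along that axis
    """
    # total extent of each axis = product of its sizes over all levels
    extent = {}
    for axes, sizes in levels:
        for a, s in zip(axes, sizes):
            extent[a] = extent.get(a, 1) * s
    rest = dict(index)
    off = 0
    for axes, sizes in levels:
        for a, s in zip(axes, sizes):
            extent[a] //= s              # indices covered by one cell of this level
            local = rest[a] // extent[a]  # which cell of this level we are in
            rest[a] %= extent[a]
            off = off * s + local
    return off


flat = [("ij", (4, 4))]
chained = [("i", (4,)), ("j", (4,))]
col_major = [("j", (4,)), ("i", (4,))]

print(offset(flat, {"i": 1, "j": 2}))       # 6
print(offset(chained, {"i": 1, "j": 2}))    # 6
print(offset(col_major, {"i": 1, "j": 2}))  # 9
```

In this model the flat and chained layouts give the row-major offset 4*i + j for every index, while swapping the two levels gives 4*j + i.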
Row-major vs. column-major
x = ti.field(ti.i32)
y = ti.field(ti.i32)
ti.root.dense(ti.i, 3).dense(ti.j, 2).place(x) # row-major
ti.root.dense(ti.j, 2).dense(ti.i, 3).place(y) # column-major
[Figure: SNode trees of field x and field y, each under its own Root]
Row-major access
x = ti.field(ti.i32)
ti.root.dense(ti.i, 4).dense(ti.j, 4).place(x) # row-major
Access row/col-major fields
import taichi as ti
ti.init(arch = ti.cpu, cpu_max_num_threads=1)

x = ti.field(ti.i32)
ti.root.dense(ti.i, 3).dense(ti.j, 2).place(x)   # row-major
# column-major version of the same program:
# ti.root.dense(ti.j, 2).dense(ti.i, 3).place(x)

@ti.kernel
def fill():
    for i,j in x:
        x[i, j] = i*10 + j

@ti.kernel
def print_field():
    for i,j in x:
        print("x[",i,",",j,"]=",x[i,j],sep='', end=' ')

fill()
print_field()

C/C++:
foo(){
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 2; j++) {
            do_something(x[i][j]);
        }
    }

Taichi:
@ti.kernel
def foo():
    for i,j in x:
        do_something(x[i, j])

    for i,j in y:
        do_something(y[i, j])
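What the two programs would print can be previewed without Taichi. The sketch assumes that a single-threaded struct-for visits cells in memory order, which is what cpu_max_num_threads=1 is meant to expose; the helper names are hypothetical.

```python
def struct_for_order(outer_axis, shape=(3, 2)):
    """Index pairs (i, j) in memory order for a two-level dense layout."""
    ni, nj = shape
    if outer_axis == "i":   # dense(ti.i, 3).dense(ti.j, 2): row-major
        return [(i, j) for i in range(ni) for j in range(nj)]
    # dense(ti.j, 2).dense(ti.i, 3): column-major
    return [(i, j) for j in range(nj) for i in range(ni)]


def printed(order):
    """Mimic the output format of print_field() after fill()."""
    return " ".join(f"x[{i},{j}]={i * 10 + j}" for i, j in order)


print(printed(struct_for_order("i")))
print(printed(struct_for_order("j")))
```

The values stored at each index are the same in both cases; only the order in which they are visited differs.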
A special case:
x = ti.field(ti.i32)
ti.root.dense(ti.i, 4).dense(ti.i, 4).place(x) # what is this?

Hierarchical layouts
x = ti.field(ti.i32)
ti.root.dense(ti.i, 4).dense(ti.i, 4).place(x) # A hierarchical 1-D field
[Figure: Root with two nested dense levels, both along ti.i]
Access a hierarchical 1-D field
import taichi as ti
ti.init(arch = ti.cpu)

x = ti.field(ti.i32)
ti.root.dense(ti.i, 4).dense(ti.i, 4).place(x)
# A hierarchical 1-D field

@ti.kernel
def print_id():
    for i in x:
        print(i, end = ' ')

print_id()

My first execution:
[Taichi] Starting on arch=x64
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
My second execution:
[Taichi] Starting on arch=x64
0 1 2 3 4 5 6 7 12 13 14 15 8 9 10 11
My third execution:
[Taichi] Starting on arch=x64
4 5 6 7 12 13 14 15 0 1 2 3 8 9 10 11
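In all three runs the indices stay together in groups of four, and only the order of the groups changes. A toy model reproduces that pattern: it assumes the four outer cells may be processed in any order while each inner run of four stays in order. It is consistent with the printed runs but says nothing about how Taichi actually schedules the work.

```python
import random


def print_id_order(num_blocks=4, block_size=4, seed=None):
    """Indices of a hierarchical 1-D field, with the outer blocks in random order."""
    rng = random.Random(seed)
    blocks = list(range(num_blocks))
    rng.shuffle(blocks)   # outer cells can be picked up in any order
    order = []
    for b in blocks:
        # the inner dense level is walked in order
        order.extend(range(b * block_size, (b + 1) * block_size))
    return order


for seed in range(3):
    print(" ".join(str(i) for i in print_id_order(seed=seed)))
```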
Block-major access?
Block-major access using hierarchical fields
x = ti.field(ti.i32)
ti.root.dense(ti.ij, (2,2)).dense(ti.ij, (2,2)).place(x) # block-major
[Figure: Root with two nested dense levels, both along ti.ij]
Flat layouts vs. hierarchical layouts
import taichi as ti
ti.init(arch = ti.cpu, cpu_max_num_threads=1)

# z is declared with either a flat or a hierarchical layout
# (the declaration is not shown on the slide)

@ti.kernel
def fill():
    for i,j in z:
        z[i, j] = i*10 + j

@ti.kernel
def print_field():
    for i,j in z:
        print("z[",i,",",j,"]=",z[i,j],sep='', end=' ')

fill()
print_field()

Flat layout: first loop over ti.j, then ti.i
Hierarchical layout: first loop over ti.j, then ti.i, in 2x2 blocks
Why do we need block-major access?
@ti.kernel
def update_flow(self):
    for P in ti.grouped(self.Flow):
        # self.Flow[P]=self.FlowNext[P]
        for i in ti.static(range(9)):
            prePos = P-self.e[i]
            self.Flow[P][i] = self.kar_avg[P][i] * \
                (self.FlowNext[P][self.k[i]] - self.FlowNext[prePos][i]) + \
                self.FlowNext[prePos][i]
[Figure: 9-point stencil]
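A 9-point stencil reads a 3x3 neighborhood, so the layout decides how many cache lines one update touches. The sketch below counts them for a row-major and a block-major layout. All numbers are assumptions chosen for illustration (a 64x64 grid, 16 elements per cache line, 4x4 blocks), and the model ignores everything else a real cache does.

```python
N = 64      # grid is N x N (assumed)
LINE = 16   # elements per cache line (assumed)
B = 4       # block edge; one B x B block fills exactly one line


def row_major_line(i, j):
    """Cache line holding cell (i, j) in a row-major layout."""
    return (i * N + j) // LINE


def block_major_line(i, j):
    """Cache line holding cell (i, j) when cells are stored block by block."""
    blocks_per_row = N // B
    block_id = (i // B) * blocks_per_row + (j // B)
    local = (i % B) * B + (j % B)
    return (block_id * B * B + local) // LINE


def avg_lines_touched(line_of):
    """Average number of distinct lines read by a 9-point stencil (interior cells)."""
    total = 0
    count = 0
    for i in range(1, N - 1):
        for j in range(1, N - 1):
            lines = {line_of(i + di, j + dj)
                     for di in (-1, 0, 1) for dj in (-1, 0, 1)}
            total += len(lines)
            count += 1
    return total / count


print("row-major  :", avg_lines_touched(row_major_line))
print("block-major:", avg_lines_touched(block_major_line))
```

In this model the row-major layout always needs at least three lines (one per row of the stencil), while the block-major layout often needs only one or two.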
Array of structures (AoS) vs. structure of arrays (SoA) in C/C++
// SoA
struct S1
{
    int x[8];
    int y[8];
};
S1 soa;
do_something(soa.x[0]);
do_something(soa.x[1]);

// AoS
struct S2
{
    int x;
    int y;
};
S2 aos[8];
do_something(aos[0].x);
do_something(aos[0].y);

AoS vs. SoA, which one is better?
• It really depends…
SoA in Taichi
x = ti.field(ti.i32)
y = ti.field(ti.i32)
ti.root.dense(ti.i, 8).place(x)
ti.root.dense(ti.i, 8).place(y)
# address: low ........................... high
# x[0] x[1] … x[7] y[0] y[1] … y[7]
AoS in Taichi
x = ti.field(ti.i32)
y = ti.field(ti.i32)
ti.root.dense(ti.i, 8).place(x, y)
# address: low .......................... high
# x[0] y[0] x[1] y[1] … x[7] y[7]
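The two address comments can be generated from a simple ordering rule, which also shows how far apart x[i] and y[i] end up. This is a plain-Python illustration with made-up helper names, not Taichi code.

```python
def soa_order(n, fields=("x", "y")):
    """Memory order when every field gets its own dense block."""
    return [f"{f}[{i}]" for f in fields for i in range(n)]


def aos_order(n, fields=("x", "y")):
    """Memory order when all fields are placed in the same dense cell."""
    return [f"{f}[{i}]" for i in range(n) for f in fields]


def distance(order, a, b):
    """How many elements apart two entries are in memory."""
    return abs(order.index(a) - order.index(b))


soa = soa_order(8)
aos = aos_order(8)
print(" ".join(soa))
print(" ".join(aos))
print(distance(soa, "x[0]", "y[0]"), distance(aos, "x[0]", "y[0]"))  # 8 1
print(distance(soa, "x[0]", "x[1]"), distance(aos, "x[0]", "x[1]"))  # 1 2
```

SoA keeps consecutive elements of one field adjacent, while AoS keeps the fields of one element adjacent.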
AoS in Taichi
• Only same-shaped fields can be placed in AoS fashion
x = ti.field(ti.i32)
y = ti.field(ti.i32)
ti.root.dense(ti.i, 8).place(x)
ti.root.dense(ti.i, 16).place(y)
# fields x and y have different shapes, so they cannot be placed in AoS fashion
Switching between AoS and SoA in Taichi
Only the layout definition changes; the kernel stays the same.
x = ti.field(ti.i32)
y = ti.field(ti.i32)
ti.root.dense(ti.i, 8).place(x, y)   # AoS
# SoA version:
# ti.root.dense(ti.i, 8).place(x)
# ti.root.dense(ti.i, 8).place(y)

@ti.kernel
def foo():
    for i in x:
        do_something(x[i])

    for i in y:
        do_something(y[i])
SoA Example, N-body:
pos = ti.Vector.field(2, ti.f32, N)
vel = ti.Vector.field(2, ti.f32, N)
force = ti.Vector.field(2, ti.f32, N)
...
@ti.kernel
def update():
    dt = h/substepping
    for i in range(N):
        # symplectic Euler
        vel[i] += dt*force[i]/m
        pos[i] += dt*vel[i]

AoS Example, N-body:
pos = ti.Vector.field(2, ti.f32)
vel = ti.Vector.field(2, ti.f32)
force = ti.Vector.field(2, ti.f32)
ti.root.dense(ti.i, N).place(pos, vel, force)
...
@ti.kernel
def update():
    dt = h/substepping
    for i in range(N):
        # symplectic Euler
        vel[i] += dt*force[i]/m
        pos[i] += dt*vel[i]
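The update kernel reads pos[i], vel[i] and force[i] for the same i, so it is worth counting how many cache lines one particle update touches under each layout. The sketch below is a rough model under stated assumptions: 64-byte cache lines, tightly packed 8-byte 2-D f32 vectors, and arrays starting at byte 0. Taichi's real alignment may differ, and measured performance still depends on the whole program.

```python
LINE_BYTES = 64   # assumed cache-line size
VEC2_BYTES = 8    # two f32 components
N = 300           # number of planets, as on the slides


def lines(byte_ranges):
    """Number of distinct cache lines covered by (start, size) byte ranges."""
    touched = set()
    for start, size in byte_ranges:
        first = start // LINE_BYTES
        last = (start + size - 1) // LINE_BYTES
        touched.update(range(first, last + 1))
    return len(touched)


def soa_update_lines(i):
    """pos, vel and force live in three separate arrays."""
    bases = (0, N * VEC2_BYTES, 2 * N * VEC2_BYTES)
    return lines([(b + i * VEC2_BYTES, VEC2_BYTES) for b in bases])


def aos_update_lines(i):
    """pos, vel and force of particle i sit next to each other."""
    record = 3 * VEC2_BYTES
    return lines([(i * record + k * VEC2_BYTES, VEC2_BYTES) for k in range(3)])


print(max(soa_update_lines(i) for i in range(N)))
print(max(aos_update_lines(i) for i in range(N)))
```

In this model every SoA update touches three lines, while an AoS update touches one line, or two when the 24-byte record straddles a line boundary.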
Loop over advanced data layouts
• Note = None
• You can access your advanced data layouts using struct-for(s) as if they were your old friend ti.field() defined with shape.
Before moving to the next topic…
• Generate advanced dense data layouts using ti.root
• Tree-structured: SNode-trees
• SNode stands for "Structural Node"
• All fields in Taichi are built using SNode-trees
• ti.root is actually "the root" of an SNode-tree
• x=ti.field(ti.f32, shape=N) <==> x=ti.field(ti.f32) + ti.root.dense(ti.i, N).place(x)
• ti.root.dense(ti.ij, (N, M)) <==> ti.root.dense(ti.i, N).dense(ti.j, M)
• You can append (multiple) dense cells to other dense cells
• Row/col-major: ti.root.dense(ti.i, N).dense(ti.j, M)
• Hierarchical layouts: ti.root.dense(ti.i, N).dense(ti.i, M)
• SoA/AoS: ti.root.dense(ti.i, N).place(x, y, z)
• You do not need to worry about the access of your data layouts
• The Taichi struct-for handles it for you
Before moving to the next topic…
• Generate advanced dense data layouts using ti.root
• ti.i, ti.j, ti.k, ti.l <==> ti.axes(0), ti.axes(1), ti.axes(2), ti.axes(3)
• Currently Taichi supports at most 8 axes, ti.axes(0) through ti.axes(7)
• ti.root.dense(ti.axes(0), 1).dense(ti.axes(1), 2).dense(ti.axes(2), 3).dense(ti.axes(3), 4).dense(ti.axes(4), 5).dense(ti.axes(5), 6).dense(ti.axes(6), 7).dense(ti.axes(7), 8).place(x)
• Update your Taichi to get the correct behavior for row/col-major fields
• We have a new release (ver. 0.8.3) today (10/12/2021)
Sparse data layouts
The SNode-tree
• root: the root of the data structure
• dense: a fixed-length contiguous array
• bitmasked: similar to dense, but it also uses a mask to maintain sparsity information, one bit per child
• pointer: stores pointers instead of the whole structure to save memory and maintain sparsity
[Figure: SNode tree with root, dense and field nodes]
Sparse computation! But why?
• MPM simulation [Figure: simulation frame next to its grid occupancy]
• 256x256 grid cells in total
• Subdivided into 16x16 blocks
• Each block has 16x16 grid cells
• Allocating memory for all 256x256 grid cells is a waste.
• The dark blocks are filled with zeros anyway
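How much memory the block structure can save is easy to estimate. The sketch below uses the grid and block sizes from the slide; the particle positions are made up for illustration, and the helper is plain Python rather than Taichi.

```python
GRID = 256    # grid cells per axis
BLOCK = 16    # grid cells per block edge


def active_blocks(particles):
    """Blocks that contain at least one particle; positions lie in [0, 1)^2."""
    blocks = set()
    for x, y in particles:
        i = int(x * GRID)
        j = int(y * GRID)
        blocks.add((i // BLOCK, j // BLOCK))
    return blocks


# hypothetical data: ten particles on a short horizontal segment
particles = [(0.5 + 0.01 * k, 0.5) for k in range(10)]

blocks = active_blocks(particles)
dense_cells = GRID * GRID
sparse_cells = len(blocks) * BLOCK * BLOCK
print(len(blocks), sparse_cells, dense_cells)
```

Only the blocks that actually contain particles need storage; every other block can stay unallocated.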
Sparse computation! Then how?
• A dense SNode-tree:
x = ti.field(ti.i32)
block1 = ti.root.dense(ti.i, 3)
block2 = block1.dense(ti.j, 3)
block2.place(x)
# equivalent to ti.root.dense(ti.i,3).dense(ti.j,3).place(x)
[Figure: the 3x3 field
1 0 0
0 0 0
0 0 0
is stored under Root as nine contiguous cells: 1 0 0 0 0 0 0 0 0]
From .dense() to .pointer()
• A sparse SNode-tree:
x = ti.field(ti.i32)
block1 = ti.root.pointer(ti.i, 3)
block2 = block1.dense(ti.j, 3)
block2.place(x)
# equivalent to ti.root.pointer(ti.i,3).dense(ti.j,3).place(x)
[Figure: the same 3x3 field
1 0 0
0 0 0
0 0 0
with Root pointing to dense blocks of three cells: 1 0 0 0 0 0 0 0 0]
Activation
• A sparse SNode-tree is born empty:
x = ti.field(ti.i32)
block1 = ti.root.pointer(ti.i, 3)
block2 = block1.dense(ti.j, 3)
block2.place(x)
# equivalent to ti.root.pointer(ti.i,3).dense(ti.j,3).place(x)
[Figure: Root with no active cells: 0 0 0 0 0 0 0 0 0]
• Writing to an inactive cell activates it:
x[0,0] = 1
# activates block1[0] and thereby block2[0], block2[1] and block2[2]
[Figure: Root with the first block active: 1 0 0 0 0 0 0 0 0]
Data access in a sparse field (a sparse SNode-tree)
• Use Taichi struct-for to access a sparse field
• Inactive pointers are skipped
@ti.kernel
def access_all():
    for i,j in x:
        print(x[i, j]) # 1, 0, 0
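Activation and the skipping of inactive pointers can be mimicked with a dictionary. The class below is a toy stand-in for ti.root.pointer(ti.i, 3).dense(ti.j, 3).place(x), written in plain Python with hypothetical method names; it only models the behavior described on these slides.

```python
class PointerDense:
    """Toy model of a pointer level over dense blocks."""

    def __init__(self, n=3, m=3):
        self.n, self.m = n, m
        self.blocks = {}   # i -> list of m values; a missing key is an inactive pointer

    def write(self, i, j, value):
        if i not in self.blocks:
            # activating a pointer allocates its whole dense block
            self.blocks[i] = [0] * self.m
        self.blocks[i][j] = value

    def read(self, i, j):
        # reading an inactive cell gives 0 and does not activate anything
        block = self.blocks.get(i)
        return 0 if block is None else block[j]

    def struct_for(self):
        """Visit active cells only, like a struct-for over a sparse field."""
        for i in sorted(self.blocks):
            for j in range(self.m):
                yield i, j, self.blocks[i][j]


x = PointerDense()
print(list(x.struct_for()))   # []
x.write(0, 0, 1)
print(list(x.struct_for()))   # [(0, 0, 1), (0, 1, 0), (0, 2, 0)]
```

The single write makes three cells visible to the loop because the whole dense block under the pointer is allocated at once.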
Why are x[0, 1] and x[0, 2] activated as well?
• Because they belong to the same dense block
x = ti.field(ti.i32)
block1 = ti.root.pointer(ti.i, 3)
block2 = block1.dense(ti.j, 3)
block2.place(x)
# equivalent to ti.root.pointer(ti.i,3).dense(ti.j,3).place(x)
[Figure: Root with the first dense block active: 1 0 0 0 0 0 0 0 0]
Why not use pointers everywhere?
• Bad design idea:
x = ti.field(ti.i32)
block1 = ti.root.pointer(ti.i, 3)
block2 = block1.pointer(ti.j, 3)
block2.place(x)
# equivalent to ti.root.pointer(ti.i,3).pointer(ti.j,3).place(x)
• a ti.f32 → 32 bits
• a Taichi pointer → 64 bits
[Figure: Root with one pointer per leaf cell: 1 0 0 0 0 0 0 0 0]
Use bitmasks if you really want to flag leaf cells one at a time…
[Figure: Root with nine leaf cells, each flagged by its own bitmask bit (BM): 1 0 0 0 0 0 0 0 0]
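The cost of flagging leaf cells with pointers rather than bits can be put into numbers. This is a back-of-the-envelope sketch for the 3x3 example, using the 64-bit pointer and 32-bit value sizes quoted on the slides. It assumes every cell is active and ignores allocator bookkeeping and alignment, so the figures are only indicative.

```python
POINTER_BITS = 64
VALUE_BITS = 32
MASK_BITS = 1


def metadata_bits(layout, outer=3, inner=3):
    """Bits spent on sparsity bookkeeping when every cell is active."""
    top = outer * POINTER_BITS                 # the pointer level under root
    if layout == "pointer-dense":
        return top
    if layout == "pointer-pointer":
        return top + outer * inner * POINTER_BITS
    if layout == "pointer-bitmasked":
        return top + outer * inner * MASK_BITS
    raise ValueError(layout)


data_bits = 3 * 3 * VALUE_BITS
for layout in ("pointer-dense", "pointer-pointer", "pointer-bitmasked"):
    print(layout, metadata_bits(layout), "metadata bits for", data_bits, "data bits")
```

With a pointer per leaf cell, the bookkeeping is larger than the data itself; a bitmask adds only one bit per cell.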
Manual sparse field manipulation
• API
x = ti.field(ti.i32)
Putting things together
• Previous section:
• Row-major vs. col-major, flat vs. hierarchical layouts
• This section:
• .dense() vs. .pointer()/.bitmasked()
Putting things together
• A column-major 2x4 2-D sparse field:
x = ti.field(ti.i32)
ti.root.pointer(ti.j,4).dense(ti.i,2).place(x)
[Figure: Root with four pointer cells (P), each leading to a dense block (D) of two cells]
Putting things together
• A block-major (block size = 3) 9x1 1-D sparse field:
x = ti.field(ti.i32)
ti.root.pointer(ti.i,3).bitmasked(ti.i,3).place(x)
[Figure: Root with three pointer cells (P), each leading to a bitmasked block (B)]
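With a bitmasked level under the pointers, activation becomes per cell instead of per block. The class below is a toy stand-in for ti.root.pointer(ti.i, 3).bitmasked(ti.i, 3).place(x) in plain Python; the method names are hypothetical and the model only covers the behavior described in this section.

```python
class PointerBitmasked1D:
    """Toy model: a pointer level over bitmasked blocks of a 1-D field."""

    def __init__(self, num_blocks=3, block_size=3):
        self.num_blocks = num_blocks
        self.block_size = block_size
        self.blocks = {}   # block id -> (mask, values); missing key = inactive pointer

    def write(self, i, value):
        b, k = divmod(i, self.block_size)   # block index, position inside the block
        if b not in self.blocks:
            self.blocks[b] = ([False] * self.block_size, [0] * self.block_size)
        mask, values = self.blocks[b]
        mask[k] = True                      # only this one leaf cell is flagged
        values[k] = value

    def struct_for(self):
        """Visit only cells whose mask bit is set."""
        for b in sorted(self.blocks):
            mask, values = self.blocks[b]
            for k in range(self.block_size):
                if mask[k]:
                    yield b * self.block_size + k, values[k]


x = PointerBitmasked1D()
x.write(4, 7)
print(list(x.struct_for()))   # [(4, 7)]
```

Writing x[4] allocates the middle block, but the neighbors x[3] and x[5] stay inactive, unlike in the pointer-over-dense layout.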
Putting things together
• I wrote this because I could:
• x: a column-major 2x3 2-D sparse field
• y/z: block-major 4x1 1-D sparse fields
• y and z share the same sparsity pattern on p2
d21.place(y)
b22.place(z)
[Figure: SNode tree with Root, pointer nodes p1 and p2, and node d21 holding y]
A rolling Taichi [Code]
n = 512
x = ti.field(ti.i32)
Sparse data layouts
• Append more types to your SNode-tree:
• .pointer() to represent sparse cells
• .bitmasked() to represent sparse leaf cells
• Activate cells (and their ancestors) by writing
• x[0,0] = 1
• Use Taichi struct-for(s) to access sparse fields
• as if they were dense ☺
Sparse data layouts
• Limited backend compatibility
• Supported by CPU/CUDA/Metal backends
• Sparse matrices are usually NOT implemented in Taichi via sparse data
layouts.
• Will cover it next week
Remark
• Advanced data layouts for
• Dense data structures:
• .dense()
• row-major vs. col-major, hierarchical vs. flat, AoS vs. SoA
• Sparse data structures:
• .pointer() / .bitmasked()
A bigger picture
• The SNode-tree
• root: the root of the data structure
• dense: a fixed-length contiguous array
• bitmasked: similar to dense, but it also uses a mask to maintain sparsity information, one bit per child
• pointer: stores pointers instead of the whole structure to save memory and maintain sparsity
• dynamic: variable-length array, with a predefined maximum length
[Figure: SNode tree with root, dense and field nodes]
Taichi: a data-oriented programming language
• Focus on data access
• Faster data access ≈ better performance on GPU
Homework
N-body: [Link]
• Check the performance
• SoA vs. AoS
Perlin noise: [Link]
• Check the performance
• Flat layout vs. hierarchical layout
A rolling Taichi: [Link]
• Check the performance
• Sparse (.pointer()) layout vs. dense (.dense()) layout
Share your homework
• Could be ANYTHING you programmed using Taichi
Gifts for the gifted
• Next check: Nov. 9th 2021
Final tip
• Update your Taichi to 0.8.3 (released today on 10/12/2021)
• python -m pip install taichi --upgrade
Questions?
Q&A session for this lecture: 10/14
Next live stream: 10/19
Live-stream replays: search for 「太极图形」 on Bilibili
Homepage & slides: https://github.com/taichiCourse01