ChainerX
and How to Take Part
Hiroyuki Vincent Yamazaki, @hvy @ Preferred Networks.
Mar. 30, 2019.
Chainer Meetup #09 @ Preferred Networks.
What makes
a modern deep learning framework?
• Speed
• Fast trial-and-error
• Fast training and inference
• Environment Support
• Quick adoption of new hardware/environments
• Quick Deployment
• Quick application of research outcomes
Chainer
• Speed
• Fast trial-and-error
• Fast training and inference
• Environment Support
• Quick adoption of new hardware/environments
• Quick Deployment
• Quick application of research outcomes
Chainer
ChainerX
This talk is about ChainerX and...
• how it makes Chainer a modern deep learning framework
• how it started and where it is heading
• how to contribute to it
Hopefully, after this talk, you...
• understand ChainerX and some of its internals
• are ready to try ChainerX
• are curious to modify it to your needs
What is ChainerX?
A NumPy-like ndarray library with autograd,
built from scratch, drawing on experience from Chainer
• Subproject of Chainer started in late 2017
• With both internal and external Chainer developers
• Merged into master as of v6.0.0b1 and will be included in v6
https://github.com/chainer/chainer/tree/master/chainerx
https://github.com/chainer/chainer/tree/master/chainerx_cc
How it started
@beam2d @niboshi @asi1024 @hvy @sonots @takagi
import chainerx as chx
# Array creation, chx.ndarray, similar to NumPy
x = chx.ones((2, 3), dtype=chx.float32, device='native')
# Flag to record computational graph
x.require_grad()
# Define-by-run/eager forward pass, again similar to NumPy
y = chx.exp(x + 1).sum()
# Backpropagation
chx.backward(y)
# Computed gradient is also a chx.ndarray
gx = x.grad
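ChainerX arrays also interoperate with NumPy. A minimal sketch (assuming NumPy is available) using chx.array and chx.to_numpy:
import numpy as np
import chainerx as chx
# Create a chx.ndarray from a NumPy array (data is copied onto the given device)
x = chx.array(np.arange(6, dtype=np.float32).reshape(2, 3), device='native')
# Copy back to a NumPy array on the host
x_np = chx.to_numpy(x)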
chainerx.add
chainerx.amax
chainerx.arange
chainerx.argmax
chainerx.array
chainerx.asanyarray
chainerx.asarray
chainerx.ascontiguousarray
chainerx.average_pool
chainerx.batch_norm
chainerx.broadcast_to
chainerx.clip
chainerx.concatenate
chainerx.conv
chainerx.conv_transpose
chainerx.copy
chainerx.diag
chainerx.diagflat
chainerx.divide
chainerx.dot
chainerx.empty
chainerx.empty_like
chainerx.equal
chainerx.exp
chainerx.eye
chainerx.fixed_batch_norm
chainerx.floor_divide
chainerx.frombuffer
chainerx.fromfile
chainerx.fromfunction
chainerx.fromiter
chainerx.fromstring
chainerx.full
chainerx.full_like
chainerx.greater
chainerx.greater_equal
chainerx.hstack
chainerx.identity
chainerx.isfinite
chainerx.isinf
chainerx.isnan
chainerx.less
chainerx.less_equal
chainerx.linear
chainerx.linspace
chainerx.loadtxt
chainerx.log
chainerx.log_softmax
chainerx.logical_not
chainerx.logsumexp
chainerx.max
chainerx.max_pool
chainerx.maximum
chainerx.minimum
chainerx.multiply
chainerx.ndarray
chainerx.negative
chainerx.not_equal
chainerx.ones
chainerx.ones_like
chainerx.ravel
chainerx.relu
chainerx.reshape
chainerx.sigmoid
chainerx.split
chainerx.sqrt
chainerx.square
chainerx.squeeze
chainerx.stack
chainerx.subtract
chainerx.sum
chainerx.take
chainerx.tanh
chainerx.to_numpy
chainerx.transpose
chainerx.true_divide
chainerx.vstack
chainerx.zeros
chainerx.zeros_like
chainerx.activation
chainerx.creation
chainerx.random
chainerx.manipulation
chainerx.math
chainerx.dtype
chainerx.bool
chainerx.bool_
chainerx.float
chainerx.float16
chainerx.float32
chainerx.float64
chainerx.int
chainerx.int16
chainerx.int32
chainerx.int64
chainerx.int8
chainerx.uint8
chainerx.all_dtypes
chainerx.Context
chainerx.ContextScope
chainerx.Backend
chainerx.BackpropId
chainerx.BackpropScope
chainerx.Device
chainerx.DeviceScope
chainerx.ForceBackpropMode
chainerx.NoBackpropMode
chainerx.grad
chainerx.backprop_scope
chainerx.backward
chainerx.check_backward
chainerx.check_double_backward
chainerx.context_scope
chainerx.force_backprop_mode
chainerx.get_backend
chainerx.get_default_context
chainerx.get_default_device
chainerx.get_device
chainerx.is_available
chainerx.is_backprop_required
chainerx.no_backprop_mode
chainerx.set_default_context
chainerx.using_device
chainerx.newaxis
…
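Most of these routines mirror their NumPy counterparts and compose with the graph-control APIs listed above. An illustrative sketch combining a few of them:
import chainerx as chx
x = chx.arange(6, dtype=chx.float32, device='native')
x = chx.reshape(x, (2, 3))
x.require_grad()
# Temporarily disable graph recording, e.g. for inference
with chx.no_backprop_mode():
    y_eval = chx.relu(x).sum()
# Recording is enabled again outside the scope
loss = chx.tanh(x).sum()
chx.backward(loss)
gx = x.grad  # same shape as x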
Why ChainerX?
Speed, environment support and quick deployment
• Written in C++
• Speed
• No Python runtime required for deployment
• Python binding on top
• Lightweight
• 1-to-1 C++ mappings
• Pluggable backends
• Extensible to new hardware/environments
[Architecture diagram: a Python binding sits on top of the autograd engine and the backpropable ndarray, which talk to a Backend/Device interface implemented by a Native Backend/Device, a CUDA Backend/Device, and custom Backends/Devices]
#include "chainerx.h"
namespace chx = chainerx;
// Array creation, similar to NumPy
chx::Array x = chx::Ones(
    {2, 3}, chx::Dtype::kFloat32,
    chx::GetDevice("native"));
// Flag to record the computational graph
x.RequireGrad();
// Define-by-run/eager forward pass
chx::Array y = chx::Exp(x + 1).Sum();
// Backpropagation
chx::Backward(y);
// Computed gradient is also a chx::Array
chx::Array gx = *x.GetGrad();
C++ API
import chainerx as chx
x = chx.ones(
(2, 3), dtype=chx.float32,
device='native')
x.require_grad()
y = chx.exp(x + 1).sum()
chx.backward(y)
gx = x.grad
Python API
ChainerX internals
Explaining basic types and functions
// Create input ndarrays
chx::Array x = ...
chx::Array w = ...
chx::Array b = ...
// Flag to record the computational graph
x.RequireGrad();
w.RequireGrad();
b.RequireGrad();
// Call a routine to build the graph.
// Internally uses chx::BackwardBuilder to do so
chx::Array y = chx::Conv(x, w, b, {1, 1}, {1, 1});
[Graph diagram: Arrays x, w and b, each holding an ArrayBody with an ArrayNode, feed into an OpNode (Conv), which produces Array y with its own ArrayBody and ArrayNode]
chainerx namespace omitted for clarity
chainerx::Array (chainerx::ArrayBody)
• Core data type in ChainerX, an ndarray with autograd
• Has ndarray properties such as
• pointer to allocated data, shape, dtype, strides
• Associated with a single device
• Data resides on e.g. "native" or "cuda:2"
• Holds references to its
• gradients, also chainerx::Arrays
• nodes in the computational graphs
[Diagram: Array x points to its ArrayBody, which references the device, the allocated data, an ArrayNode, and the gradient Array gx with its own ArrayBody]
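From Python, these properties are exposed directly on chainerx.ndarray; a minimal sketch:
import chainerx as chx
x = chx.ones((2, 3), dtype=chx.float32, device='native')
x.require_grad()
y = (x * 2).sum()
chx.backward(y)
# ndarray properties
print(x.shape, x.dtype, x.strides, x.device)
# The gradient is itself a chainerx.ndarray on the same device
print(x.grad.device)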
chainerx::ArrayNode
• A node representing an array in the computational graph
• Owned by chainerx::ArrayBody
[Same graph diagram as above: each input Array's ArrayNode connects to the OpNode (Conv), which produces the output Array y]
chainerx::OpNode
• A node representing an operation in the computational graph
• Referenced by chainerx::ArrayNode
[Same graph diagram as above]
chainerx::Device (1/2)
• An array is constructed by specifying the allocating device
chainerx::Device& gpu = chainerx::GetDevice("cuda:0");
chainerx::Array x =
    chainerx::Ones({2, 3}, chainerx::Dtype::kFloat32, gpu);
• A device defines
• how memory is allocated and freed
• chainerx::Device::Allocate
• operations on data
• chainerx::Device::{Fill,Arange,Add,Subtract,Multiply,Divide,Sum,Dot,...}
chainerx::Device (2/2)
• chainerx::Device is an interface
• Concrete implementations provided by ChainerX
• chainerx::native::NativeDevice
• chainerx::cuda::CudaDevice
• Can be implemented for other devices and dynamically loaded as shared libraries
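From Python, backends and devices are selected by name, and routines dispatch to the chosen device. An illustrative sketch using the native backend (the cuda backend requires a CUDA-enabled build):
import chainerx as chx
# Look up a device by "<backend>:<index>"
cpu = chx.get_device('native:0')
x = chx.ones((2, 3), dtype=chx.float32, device=cpu)
# Temporarily change the default device for array creation
with chx.using_device('native:0'):
    y = chx.zeros((2, 3), dtype=chx.float32)
# Move data to another device, e.g. a GPU (if available)
# x_gpu = x.to_device('cuda:0')
print(x.device, y.device)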
Routines (1/2)
• Backpropable autograd operations on chainerx::Arrays
• chainerx::{
Add,Subtract,Multiply,Divide,
Sum,Transpose,Reshape,Dot,
Conv,BatchNorm,MaxPool,...}
Routines (2/2)
• Defines the forward and backward logic using chainerx::BackwardBuilder
• Delegates actual computations to the device methods
• chainerx::Dot calls chainerx::Device::Dot
Array Dot(const Array& a, const Array& b, Dtype dtype) {
    int64_t m = a.shape()[0];
    int64_t k = a.shape()[1];
    int64_t n = b.shape()[1];
    Array out = Empty({m, n}, dtype, a.device());
    {
        // Forward computation is delegated to the device, without recording a graph
        NoBackpropModeScope scope{};
        a.device().Dot(a, b, out);
    }
    {
        // Define the backward logic for each differentiable input
        BackwardBuilder bb{"dot", {a, b}, out};
        if (BackwardBuilder::Target bt = bb.CreateTarget(0)) {
            // Gradient w.r.t. a: gy . b^T
            bt.Define([b_tok = bb.RetainInput(1), a_dtype = a.dtype()](BackwardContext& bctx) {
                const Array& b = bctx.GetRetainedInput(b_tok);
                bctx.input_grad() = Dot(*bctx.output_grad(), b.Transpose(), a_dtype);
            });
        }
        if (BackwardBuilder::Target bt = bb.CreateTarget(1)) {
            // Gradient w.r.t. b: a^T . gy
            bt.Define([a_tok = bb.RetainInput(0), b_dtype = b.dtype()](BackwardContext& bctx) {
                const Array& a = bctx.GetRetainedInput(a_tok);
                bctx.input_grad() = Dot(a.Transpose(), *bctx.output_grad(), b_dtype);
            });
        }
        bb.Finalize();
    }
    return out;
}
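The corresponding Python routine is chainerx.dot; a minimal sketch exercising both backward definitions above:
import chainerx as chx
a = chx.ones((2, 3), dtype=chx.float32)
b = chx.ones((3, 4), dtype=chx.float32)
a.require_grad()
b.require_grad()
y = chx.dot(a, b).sum()
chx.backward(y)
print(a.grad.shape, b.grad.shape)  # (2, 3) (3, 4)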
Chainer integration
How ChainerX can be used from Chainer
Architecture
[Architecture diagram: Chainer's training and model APIs and its Variable and functions APIs sit on an autograd layer backed by NumPy and CuPy, shown alongside the ChainerX stack (Python binding, autograd, backpropable ndarray, Backend/Device interface with Native, CUDA and custom Backends/Devices)]
• Various APIs in Chainer v6 work with and utilize chainerx
• Variable and FunctionNode delegate autograd computations to ChainerX
Chainer
import chainer as ch
import cupy as cp
class ResNet50(ch.Chain):
…
model = ResNet50()
model.to_device(0)
arr = cp.array(...)
x = ch.Variable(arr)
y = model(x)
loss = …
loss.backward()
[Architecture diagram: the Chainer stack (training and model APIs, Variable and functions APIs, autograd) running on NumPy and CuPy, shown alongside the ChainerX stack]
Chainer
on ChainerX
import chainer as ch
import chainerx as chx
class ResNet50(ch.Chain):
…
model = ResNet50()
model.to_device('cuda:0')
arr = chx.array(...)
x = ch.Variable(arr)
y = model(x)
loss = …
loss.backward()
[Architecture diagram repeated: the Chainer stack with its Variable and functions APIs and autograd delegating to the ChainerX stack (Python binding, autograd, backpropable ndarray, Backend/Device interface with Native, CUDA and custom Backends/Devices)]
How to take part in developing ChainerX
Contribution guide explained
It’s all documented
• A section in the Chainer documentation
https://docs.chainer.org/en/latest/chainerx/index.html
• On GitHub
• Look for issues/PRs labeled contribution-welcome and ChainerX
• ChainerX needs to support more routines
• A list of unimplemented routines:
https://github.com/chainer/chainer/issues/6423
Future of ChainerX
Future roadmap
• Integrate into Chainer
• Wider range of supported routines
• Dynamic device operation registration
• Concrete third party backends
• Stable C++ interface
• Wider coverage of “compiled models”
Summary
ChainerX is implemented in C++ with far less host-side overhead,
can be deployed without a Python runtime, and lets third parties
implement backends and devices to support new hardware/environments.
It takes Chainer to the next level
by being accessible from Python and used by Chainer,
and you can take part in ChainerX on GitHub.
Contributions, ideas and discussions are welcome
• Follow @ChainerOfficial on Twitter
• Join chainer on Slack
• Job application to https://www.preferred-networks.jp/en/jobs
We are hiring
Additional resources
• ChainerX documentation
• ChainerX Product Backlog
• ChainerX examples (MLP, ResNet50)
• ChainerX Python bindings
• ChainerX C++ Backpropagation
ChainerX and How to Take Part