Wasm
Andreas Haas Andreas Rossberg Derek L. Schuff∗ Ben L. Titzer Michael Holman
∗
Google GmbH, Germany / Google Inc, USA Microsoft Inc, USA
{ahaas,rossberg,dschuff,titzer}@google.com [email protected]
Modules A binary takes the form of a module. It contains definitions for functions, globals, tables, and memories. Each definition may be exported under one or more names. Definitions can also be imported, specifying a module/item name pair and a suitable type. Imports can be re-exported.

While a module corresponds to the static representation of a program, an instance of a module corresponds to a dynamic representation, complete with mutable memory and an execution stack. The instantiation operation for modules is provided by the embedder, such as a JavaScript virtual machine or an operating system. Instantiating a module requires providing definitions for all imports, which may be exports from previously created WebAssembly instances. WebAssembly computation can then be initiated by invoking a function exported from the instance.

Functions The code in a module is organized into individual functions. Each function takes a sequence of WebAssembly values as parameters and returns a sequence of values as results as defined by its function type. Functions can call each other, including recursively. Functions are not first class and cannot be nested within each other. As we will see later, the contents of the call stack for execution are not exposed, and thus cannot be directly accessed by a running WebAssembly program, even a buggy or malicious one.

Traps Some instructions may produce a trap, which immediately aborts the current computation. Traps cannot currently be handled by WebAssembly code, but an embedder will typically provide means to handle this condition. Embedded in JavaScript, a WebAssembly trap will throw a JavaScript exception containing a stacktrace with both JavaScript and WebAssembly stack frames. It can be caught and inspected by the surrounding JavaScript code.

Machine Types WebAssembly has only four basic value types t, all of which are available in common hardware. These are integers and IEEE 754 floating point numbers, each in 32 and 64 bit width. 32 bit integers can be used as addresses in the linear memory (Section 2.2), and indexes into function tables (Section 2.4). Most WebAssembly instructions simply execute operators on values of these basic data types. The grammar in Figure 1 conveniently distinguishes several categories, such as unary and binary operators, tests and comparisons. WebAssembly provides conversions between all four types, and the ability to reinterpret the bits of values across equally-sized types. Like common hardware, WebAssembly has no distinction between signed and unsigned integer types. Instead, when the signedness of values matters to an instruction, a sign extension suffix _u or _s selects either unsigned or 2’s complement signed behavior.
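The signedness convention described under Machine Types can be illustrated with a small sketch. It is not taken from the paper: the helper names (`as_signed32`, `i32_lt_u`, and so on) are hypothetical, and 32 bit values are modelled as Python integers masked to 32 bits. The point is only that one bit pattern yields different results depending on the `_u` or `_s` suffix of the instruction that consumes it.

```python
MASK32 = 0xFFFFFFFF  # an i32 is modelled as an int in [0, 2**32)


def as_signed32(bits: int) -> int:
    """Reinterpret a 32 bit pattern as a 2's complement signed integer."""
    bits &= MASK32
    return bits - (1 << 32) if bits & 0x80000000 else bits


def i32_lt_u(a: int, b: int) -> int:
    """Unsigned comparison: operands are taken as-is."""
    return int((a & MASK32) < (b & MASK32))


def i32_lt_s(a: int, b: int) -> int:
    """Signed comparison: operands are reinterpreted first."""
    return int(as_signed32(a) < as_signed32(b))


def i32_shr_u(a: int, k: int) -> int:
    """Logical shift right; the shift count is taken modulo 32."""
    return (a & MASK32) >> (k % 32)


def i32_shr_s(a: int, k: int) -> int:
    """Arithmetic shift right; the sign bit is replicated."""
    return (as_signed32(a) >> (k % 32)) & MASK32
```

For example, the pattern `0xFFFFFFFF` is the largest value under `i32_lt_u` but equals -1 under `i32_lt_s`; the value itself carries no sign.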
Instructions WebAssembly computation is based on a stack machine; code for a function consists of a sequence of instructions that manipulate values on an implicit operand stack, popping argument values and pushing result values. However, thanks to the type system (Section 4), the layout of the operand stack can be statically determined at any point in the code, so that actual implementations can compile the data flow between instructions directly without ever materializing the operand stack. The stack organization is merely a way to achieve a compact program representation, as it has been shown to be smaller than a register machine [38].²

² We also explored compressed, byte-encoded ASTs for WebAssembly, first with a pre-order encoding and then later with a post-order encoding, even going so far as to field full-scale production prototypes and development tools for both representations. We found that post-order ASTs decode and verify faster than pre-order ASTs, but that the stack machine, which can be seen as a generalization of the post-order format, more easily extended to multi-value support and allowed even more space optimizations.

Local Variables Functions f declare mutable local variables of types $t^*$. Locals are zero-initialized and read or written by index via the get_local and set_local instructions, respectively; tee_local allows writing a local variable while leaving the input value on the operand stack, which is very common in real code. The index space for local variables starts with and includes the function parameters, meaning that function parameters are also mutable.
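To make the operand stack and the local-variable instructions described above concrete, here is a minimal interpreter sketch. It is an illustration only: the tuple encoding of instructions and the function name `run` are assumptions, only a handful of i32 instructions are covered, and real engines compile the stack away rather than materializing it.

```python
MASK32 = 0xFFFFFFFF


def run(code, params, num_locals):
    """Execute straight-line code; return (operand stack, locals).

    Locals share one index space: parameters first, then the declared
    locals, which start out zero-initialized.
    """
    locals_ = list(params) + [0] * num_locals
    stack = []
    for op, *imm in code:
        if op == "i32.const":
            stack.append(imm[0] & MASK32)
        elif op == "get_local":
            stack.append(locals_[imm[0]])
        elif op == "set_local":
            locals_[imm[0]] = stack.pop()      # consumes the operand
        elif op == "tee_local":
            locals_[imm[0]] = stack[-1]        # operand stays on the stack
        elif op == "i32.add":
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & MASK32)
        elif op == "i32.mul":
            b, a = stack.pop(), stack.pop()
            stack.append((a * b) & MASK32)
        else:
            raise ValueError(f"unsupported instruction: {op}")
    return stack, locals_


# (p0 + 1) is stored into local 1 with tee_local and squared.
square_of_successor = [
    ("get_local", 0), ("i32.const", 1), ("i32.add",),
    ("tee_local", 1), ("get_local", 1), ("i32.mul",),
]
```

Calling `run(square_of_successor, [4], 1)` leaves a single value on the stack and shows that local 1, which follows the one parameter in the index space, was written without an extra get_local after the store.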
Global Variables A module may also declare typed global variables accessed with the get_global and set_global instructions to read or write individual values. Globals can be either mutable or immutable and require an initializer which must be a constant expression that evaluates without access to any function, table, memory, local or mutable global. Importing globals and initializer expressions allow a limited form of configurability, e.g. for linking.

So far so boring. In the following sections we turn our attention to more interesting or unusual features of the WebAssembly semantics.

2.2 Linear Memory

The main storage of a WebAssembly program is a large array of bytes referred to as a linear memory or simply memory.

Creation and Growing Each module can define at most one memory, which may be shared with other instances via import/export. Memory is created with an initial size but may be dynamically grown with the grow_memory instruction. Growing may fail with an out-of-memory condition indicated by grow_memory returning −1 to be handled by the program.³ The size can be queried with the current_memory instruction. The unit of size and growth is a page, which is defined to be 64 KiB, the least common multiple of minimum page sizes on modern hardware. The page size allows reusing virtual memory hardware for bounds checks (Section 7). Page size is fixed instead of being system-specific to prevent a common portability hazard.

³ To support additional optimizations, WebAssembly also allows declaring an upper limit for each memory’s size, which we omit in this presentation.

Access Memory is accessed with load and store instructions that take a static alignment exponent a, a positive static offset o, an optional static width expressed as a packed type tp, and the dynamic i32 address. Addresses are simply unsigned integers starting at 0. The effective address of an access is the sum of the 32 bit static offset o and the dynamic i32 address as a 33 bit address (i.e., no wraparound), which allows specific optimizations (Section 7). All memory access is dynamically checked against the memory size; out of bounds access results in a trap. Memory can be accessed with 8, 16, 32, or 64 bit wide loads and stores, with packed integer loads performing a zero or sign extension sx to either 32 or 64 bits. Unaligned access, where $2^a$ is smaller than the (packed) type’s width, is supported, e.g. accessing a 32 bit value on an odd address. Such access may be slow on some platforms, but always produces the same unexciting results.

Endianness Byte order in memory is observable to programs that load and store to aliased locations with different types. Contemporary hardware seems to be converging on little-endian byte order, either being natively little-endian or having optional endian conversion included in memory access, or being architecturally neutral with both variants available. Recognizing this convergence, we chose to define WebAssembly memory to have little-endian byte order. Of course, that entails that big-endian platforms require explicit endian conversions. However, these conversions can be subjected to classical compiler optimizations such as redundancy elimination and code motion by the WebAssembly engine. Thus the semantics of memory access is completely deterministic and portable across all engines and platforms, even for unaligned accesses and unrestricted type-punning.

Security Linear memory is disjoint from code space, the execution stack, and the engine’s data structures; therefore compiled programs cannot corrupt their execution environment, jump to arbitrary locations, or perform other undefined behavior. At worst, a buggy or exploited WebAssembly program can make a mess of the data in its own memory. This means that even untrusted modules can be safely executed in the same address space as other code. Achieving fast in-process isolation was a necessary design constraint for interacting with untrusted JavaScript and the full complement of Web APIs in a high-performance way. It also allows a WebAssembly engine to be embedded into any other managed language runtime without violating memory safety, as well as enabling programs with many independent instances with their own memory to exist in the same process.

2.3 Structured Control Flow

WebAssembly represents control flow differently from most stack machines. It does not offer simple jumps but instead provides structured control flow constructs more akin to a programming language. This ensures by construction that control flow cannot form irreducible loops, contain branches to blocks with misaligned stack heights, or branch into the middle of a multi-byte instruction. These properties allow WebAssembly code to be validated in a single pass, compiled in a single pass, or even transformed to an SSA-form intermediate form in a single pass. Structured control flow disassembled to a text format is also easier to read, an often overlooked but important human factor on the web, where users are accustomed to inspecting the code of Web pages to learn and share in an open manner.

Control Constructs and Blocks As required by the grammar in Figure 1, the block, loop and if constructs must be terminated by an end opcode and be properly nested to be considered well-formed. The inner instruction sequences $e^*$ in these constructs form a block. Note that loop does not automatically iterate its block but allows constructing a loop manually with explicit branches. The if construct encloses two blocks separated by an else opcode. The else can be omitted if the second block is empty. Executing an if pops an i32 operand off the stack and executes one of the blocks depending on whether the value is non-zero.

Branches and Labels Branches have “label” immediates that do not reference program positions in the instruction stream but instead reference outer control constructs by relative nesting depth. That means that labels are effectively scoped: branches can only reference constructs in which they are nested. Taking a branch “breaks from” that construct’s block;⁴ the exact effect depends on the target construct: in case of a block or if it is a forward jump, resuming execution after the matching end (like a break statement); with a loop it is a backward jump, restarting the loop (like a continue statement).

The br_if instruction branches if its input is non-zero, and br_table selects a target from a list of label immediates based on an index operand, with the last label being the target for out-of-bounds index values. These two instructions allow minimal code that avoids any jumps-to-jumps.

⁴ The instruction name br can also be read as “break” wrt. to a block.

Block Signatures and Unwinding Every control construct is annotated with a function type $tf = t_1^* \to t_2^*$ that describes how it changes the stack.⁵ Conceptually, blocks execute like function calls. Each block pops its argument values $t_1^*$ off the stack, creates a new stack, pushes the arguments onto the new stack, executes, pops its results off the internal stack, and then pushes its results $t_2^*$ onto the outer stack. Since the beginning and end of a block represent control join points, all branches must also produce compatible stacks. Consequently, branch instructions themselves expect operands, depending on whether they jump to the start or end of the block, i.e., with types $t_1^*$ for loop targets and $t_2^*$ for block or if.

Branching unwinds a block’s local operand stack by implicitly popping all remaining operands, similar to returning from a function call. When a branch crosses several block boundaries, all respective stacks up to and including the target block’s are unwound. This liberates producers from having to track stack height across sub-expressions in order to make them match up at branches by adding explicit drops.

Production implementations perform register allocation and compile away the operand stack when generating machine code. However, the design still allows simple interpreters, e.g., to implement debuggers. An interpreter can have a contiguous stack and just remember the height upon entry to each block in a separate control stack. Further, it can make a prepass to construct a mapping from branches to instruction position and avoid dynamically searching for end opcodes, making all interpreter operations constant-time.⁶

⁵ In the initial version of WebAssembly, $t_1^*$ must be empty and $|t_2^*| \le 1$.
⁶ That is the approach V8 takes in its debugging interpreter.

Expressiveness Structured control flow may seem like a severe limitation. However, most control constructs from higher-level languages are readily expressible with the suitable nesting of blocks. For example, a C-style switch statement with fall-through,

    switch (x) {                         block block block block
      case 0: ...A...                      br_table 0 1 2
      case 1: ...B... break;   becomes     end ...A...
      default: ...C...                   end ...B... br 1
    }                                    end ...C...
                                         end

Slightly more finesse is required for fall-through between unordered cases. Various forms of loops can likewise be expressed with combinations of loop, block, br and br_if.

By design, unstructured and irreducible control flow using goto is impossible in WebAssembly. It is the responsibility of producers to transform unstructured and irreducible control flow into structured form. This is the established approach to compiling for the Web (e.g. the relooper algorithm [43]), where JavaScript is also restricted to structured control. In our experience building an LLVM backend for WebAssembly, irreducible control flow is rare, and a simple restructuring algorithm is all that is necessary to translate any CFG to WebAssembly. The benefit of requiring reducible control flow by construction is that many algorithms in consumers are much simpler and faster.

2.4 Function Calls and Tables

A function body is a block (Section 2.3) whose signature maps the empty stack to the function’s result. The arguments to a function are stored in the first local variables of the function. Execution of a function can complete in one of three ways: (1) by reaching the end of the block, in which case the operand stack must match the function’s result types; (2) by a branch targeting the function block, with the result values as operands; (3) by executing return, which is simply shorthand for a branch that targets the function’s block.

Direct Calls Functions can be invoked directly using the call instruction which takes an index immediate identifying the function to call. The call instruction pops the function arguments from the operand stack and pushes the function’s return values upon return.

Indirect Calls Function pointers can be emulated with the call_indirect instruction which takes a runtime index into a global table of functions defined by the module. The functions in this table are not required to have the same type. Instead, the type of the function is checked dynamically against an expected type supplied to the call_indirect instruction. The dynamic signature check protects integrity of the execution environment; a successful signature check ensures that a single machine-level indirect jump to the compiled code of the target function is safe. In case of a type mismatch or an out of bounds table access, a trap occurs. The heterogeneous nature of the table is based on experience with asm.js’s multiple homogeneous tables; it allows more faithful representation of function pointers and simplifies dynamic linking. To aid dynamic linking scenarios further, exported tables can be grown and mutated dynamically through external APIs.

External and Foreign Calls Functions can be imported to a module and are specified by name and signature. Both direct and indirect calls can invoke an imported function, and through export/import, multiple module instances can communicate.

Additionally, the import mechanism serves as a safe foreign function interface through which a WebAssembly program can communicate with its embedding environment. For
example, when WebAssembly is embedded in JavaScript, imported functions may be host functions that are defined in JavaScript. Values crossing the language boundary are automatically converted according to JavaScript rules.⁷

⁷ Where trying to communicate an i64 value produces a JavaScript type error, because JavaScript cannot yet represent such values adequately.

2.5 Determinism

The design of WebAssembly has sought to provide a portable target for low-level code without sacrificing performance. Where hardware behavior differs it usually is corner cases such as out-of-range shifts, integer divide by zero, overflow or underflow in floating point conversion, and alignment. Our design gives deterministic semantics to all of these across all hardware with only minimal execution overhead. However, there remain three sources of implementation-dependent behavior that can be viewed as non-determinism:

NaN Payloads WebAssembly follows the IEEE-754 standard for floating point arithmetic. However, IEEE-754 does not specify the exact bit pattern for NaN values in all cases, and we found that CPUs differ significantly, while normalizing after every numeric operation is too expensive. However, we still want to enable compilers targeting WebAssembly to employ techniques like NaN-boxing [21]. Based on our experience with JavaScript engines, we established sufficient rules: (1) instructions only output canonical NaNs with a non-deterministic sign bit, unless (2) if an input is a non-canonical NaN, then the output NaN is non-deterministic.

Resource Exhaustion Available resources are always finite and differ wildly across devices. In particular, an engine may be out of memory when trying to grow the linear memory – semantically a grow_memory instruction non-deterministically returns −1. A call or call_indirect instruction may also experience stack overflow, but this is not semantically observable from within WebAssembly itself.

Host Functions WebAssembly programs can call host functions which are themselves non-deterministic or change WebAssembly state. Naturally, the effect of calling host functions is outside the realm of WebAssembly’s semantics.

WebAssembly does not (yet) have threads, and therefore no non-determinism arising from concurrent memory access. Adding threads and a memory model is the subject of ongoing work beyond the scope of this paper.

2.6 Omissions and Restrictions

WebAssembly as presented here is almost complete except for some minor omissions related to module initialization:
• Tables can be partially initialized, and initialization can be applied to imported tables.
• Memory segments can be pre-initialized with static data.
• A module can specify a designated startup function.
• Tables and memories can have an optional maximum size that limits how much they can be grown.

The initial release of WebAssembly also imposes a few restrictions, likely lifted in future releases:
• Blocks and functions may produce at most one value.
• Blocks may not consume outer operands.
• Constant expressions for globals may only be of the form (t.const c) or (get_global i), where i refers to an import.

3. Execution

Presenting WebAssembly as a language provides us with convenient and effective formal tools for specifying and reasoning about its semantics very precisely. In this section we define execution in terms of a standard reduction relation.

3.1 Stores and Instances

Execution operates relative to a global store s. The upper part of Figure 2 defines syntax for representations of stores and other runtime objects. A store is a record of the lists of module instances, tables and memories that have been allocated in it. Indices into these lists can be thought of as addresses, and “allocation” simply appends to these lists.

As described in Section 2.1, a module must be instantiated before it can be used. The result is an initialized instance. Figure 2 represents such an instance as a record of the entities it defines. Tables and memories reside in the global store and are only referenced by address, since they can be shared between multiple instances. The representation of instantiated tables is simply a list of closures cl and that of memories a list of bytes b.

A closure is the runtime representation of a function, consisting of the actual function definition and a reference to the instance from which it originated. The instance is used to access stateful objects such as the globals, tables, and memories, since functions can be imported from a different instance. An implementation can eliminate closures by specializing generated machine code to an instance.

Globals are represented by the values they hold. Since mutable globals cannot be aliased, they reside in their defining instance. Values are simply represented by a t.const instruction, which is convenient for the reduction rules.

We use notation like $s_{func}$ to refer to the func component of a store record s; similarly for other records. Indexing $xs(i)$ denotes the i-th element in a sequence xs. We extend indexing to stores with the following short-hands:

\[
\begin{array}{ll}
s_{func}(i, j) = s_{inst}(i)_{func}(j) & s_{tab}(i, j) = s_{tab}(s_{inst}(i)_{tab})(j) \\
s_{glob}(i, j) = s_{inst}(i)_{glob}(j) & s_{mem}(i, j) = s_{mem}(s_{inst}(i)_{mem})(j)
\end{array}
\]

For memories, we generalize indexing notation to slices, i.e., $s_{mem}(i, j, k)$ denotes the byte sequence $s_{mem}(i, j) \dots s_{mem}(i, j + k - 1)$; plus, $s_{mem}(i, *)$ is the complete memory in instance i. Finally, we write “s with glob(i, j) = v” for the store $s'$ that is equal to s, except that $s'_{glob}(i, j) = v$.

3.2 Instantiation

Instantiating a module $m = (\mathsf{module}\ f^*\ glob^*\ tab^?\ mem^?)$ in a store s requires providing external function closures
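\[
\begin{array}{llcl}
\text{(store)} & s & ::= & \{\mathsf{inst}\ inst^*,\ \mathsf{tab}\ tabinst^*,\ \mathsf{mem}\ meminst^*\} \\
\text{(instances)} & inst & ::= & \{\mathsf{func}\ cl^*,\ \mathsf{glob}\ v^*,\ \mathsf{tab}\ i^?,\ \mathsf{mem}\ i^?\} \\
 & tabinst & ::= & cl^* \\
 & meminst & ::= & b^* \\
\text{(closures)} & cl & ::= & \{\mathsf{inst}\ i,\ \mathsf{code}\ f\} \quad \text{(where $f$ is not an import and has all exports $ex^*$ erased)} \\
\text{(values)} & v & ::= & t.\mathsf{const}\ c \\
\text{(administrative operators)} & e & ::= & \dots \mid \mathsf{trap} \mid \mathsf{call}\ cl \mid \mathsf{label}_n\{e^*\}\ e^*\ \mathsf{end} \mid \mathsf{local}_n\{i; v^*\}\ e^*\ \mathsf{end} \\
\text{(local contexts)} & L^0 & ::= & v^*\ [\_]\ e^* \\
 & L^{k+1} & ::= & v^*\ \mathsf{label}_n\{e^*\}\ L^k\ \mathsf{end}\ e^*
\end{array}
\]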
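As a rough illustration of the runtime structures above, the following sketch models stores, instances and closures as plain records, together with the indexing short-hands of Section 3.1. The class and function names are assumptions made for this example, function definitions are stood in for by strings, and no bounds checking is performed.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Value = Tuple[str, int]  # (type, constant), mirroring "t.const c"


@dataclass
class Closure:
    inst: int   # index of the originating instance in the store
    code: str   # stand-in for the function definition f


@dataclass
class Instance:
    func: List[Closure]
    glob: List[Value]
    tab: Optional[int] = None  # address of a table in the store, if any
    mem: Optional[int] = None  # address of a memory in the store, if any


@dataclass
class Store:
    inst: List[Instance] = field(default_factory=list)
    tab: List[List[Closure]] = field(default_factory=list)
    mem: List[bytearray] = field(default_factory=list)


def s_func(s: Store, i: int, j: int) -> Closure:
    return s.inst[i].func[j]


def s_glob(s: Store, i: int, j: int) -> Value:
    return s.inst[i].glob[j]


def s_tab(s: Store, i: int, j: int) -> Closure:
    # Tables live in the store; the instance only holds their address.
    return s.tab[s.inst[i].tab][j]


def s_mem(s: Store, i: int, j: int, k: int = 1) -> bytes:
    # Slice of k bytes starting at offset j of instance i's memory.
    return bytes(s.mem[s.inst[i].mem][j:j + k])
```

Because instances refer to tables and memories by address, two instances whose `mem` field holds the same address observe each other's writes, which is how sharing via import/export shows up in this model.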
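Reduction $\quad s; v^*; e^* \hookrightarrow_i s'; v'^*; e'^*$

\[
\frac{s; v^*; e^* \hookrightarrow_i s'; v'^*; e'^*}
     {s; v^*; L^k[e^*] \hookrightarrow_i s'; v'^*; L^k[e'^*]}
\qquad
\frac{s; v^*; e^* \hookrightarrow_i s'; v'^*; e'^*}
     {s; v_0^*; \mathsf{local}_n\{i; v^*\}\ e^*\ \mathsf{end} \hookrightarrow_j s'; v_0^*; \mathsf{local}_n\{i; v'^*\}\ e'^*\ \mathsf{end}}
\]

\[
\begin{array}{rcll}
L^0[\mathsf{trap}] & \hookrightarrow & \mathsf{trap} & \text{if } L^0 \neq [\_] \\
(t.\mathsf{const}\ c)\ t.unop & \hookrightarrow & t.\mathsf{const}\ unop_t(c) & \\
(t.\mathsf{const}\ c_1)\ (t.\mathsf{const}\ c_2)\ t.binop & \hookrightarrow & t.\mathsf{const}\ c & \text{if } c = binop_t(c_1, c_2) \\
(t.\mathsf{const}\ c_1)\ (t.\mathsf{const}\ c_2)\ t.binop & \hookrightarrow & \mathsf{trap} & \text{otherwise} \\
(t.\mathsf{const}\ c)\ t.testop & \hookrightarrow & \mathsf{i32.const}\ testop_t(c) & \\
(t.\mathsf{const}\ c_1)\ (t.\mathsf{const}\ c_2)\ t.relop & \hookrightarrow & \mathsf{i32.const}\ relop_t(c_1, c_2) & \\
(t_1.\mathsf{const}\ c)\ t_2.\mathsf{convert}\ t_1\_sx^? & \hookrightarrow & t_2.\mathsf{const}\ c' & \text{if } c' = \mathrm{cvt}^{sx^?}_{t_1,t_2}(c) \\
(t_1.\mathsf{const}\ c)\ t_2.\mathsf{convert}\ t_1\_sx^? & \hookrightarrow & \mathsf{trap} & \text{otherwise} \\
(t_1.\mathsf{const}\ c)\ t_2.\mathsf{reinterpret}\ t_1 & \hookrightarrow & t_2.\mathsf{const}\ \mathrm{const}_{t_2}(\mathrm{bits}_{t_1}(c)) & \\
\mathsf{unreachable} & \hookrightarrow & \mathsf{trap} & \\
\mathsf{nop} & \hookrightarrow & \epsilon & \\
v\ \mathsf{drop} & \hookrightarrow & \epsilon & \\
v_1\ v_2\ (\mathsf{i32.const}\ 0)\ \mathsf{select} & \hookrightarrow & v_2 & \\
v_1\ v_2\ (\mathsf{i32.const}\ k{+}1)\ \mathsf{select} & \hookrightarrow & v_1 & \\
v^n\ \mathsf{block}\ (t_1^n \to t_2^m)\ e^*\ \mathsf{end} & \hookrightarrow & \mathsf{label}_m\{\epsilon\}\ v^n\ e^*\ \mathsf{end} & \\
v^n\ \mathsf{loop}\ (t_1^n \to t_2^m)\ e^*\ \mathsf{end} & \hookrightarrow & \mathsf{label}_n\{\mathsf{loop}\ (t_1^n \to t_2^m)\ e^*\ \mathsf{end}\}\ v^n\ e^*\ \mathsf{end} & \\
(\mathsf{i32.const}\ 0)\ \mathsf{if}\ tf\ e_1^*\ \mathsf{else}\ e_2^*\ \mathsf{end} & \hookrightarrow & \mathsf{block}\ tf\ e_2^*\ \mathsf{end} & \\
(\mathsf{i32.const}\ k{+}1)\ \mathsf{if}\ tf\ e_1^*\ \mathsf{else}\ e_2^*\ \mathsf{end} & \hookrightarrow & \mathsf{block}\ tf\ e_1^*\ \mathsf{end} & \\
\mathsf{label}_n\{e^*\}\ v^*\ \mathsf{end} & \hookrightarrow & v^* & \\
\mathsf{label}_n\{e^*\}\ \mathsf{trap}\ \mathsf{end} & \hookrightarrow & \mathsf{trap} & \\
\mathsf{label}_n\{e^*\}\ L^j[v^n\ (\mathsf{br}\ j)]\ \mathsf{end} & \hookrightarrow & v^n\ e^* & \\
(\mathsf{i32.const}\ 0)\ (\mathsf{br\_if}\ j) & \hookrightarrow & \epsilon & \\
(\mathsf{i32.const}\ k{+}1)\ (\mathsf{br\_if}\ j) & \hookrightarrow & \mathsf{br}\ j & \\
(\mathsf{i32.const}\ k)\ (\mathsf{br\_table}\ j_1^k\ j\ j_2^*) & \hookrightarrow & \mathsf{br}\ j & \\
(\mathsf{i32.const}\ k{+}n)\ (\mathsf{br\_table}\ j_1^k\ j) & \hookrightarrow & \mathsf{br}\ j & \\
s;\ \mathsf{call}\ j & \hookrightarrow_i & \mathsf{call}\ s_{func}(i, j) & \\
s;\ (\mathsf{i32.const}\ j)\ \mathsf{call\_indirect}\ tf & \hookrightarrow_i & \mathsf{call}\ s_{tab}(i, j) & \text{if } s_{tab}(i, j)_{code} = (\mathsf{func}\ tf\ \mathsf{local}\ t^*\ e^*) \\
s;\ (\mathsf{i32.const}\ j)\ \mathsf{call\_indirect}\ tf & \hookrightarrow_i & \mathsf{trap} & \text{otherwise} \\
v^n\ (\mathsf{call}\ cl) & \hookrightarrow & \mathsf{local}_m\{cl_{inst}; v^n\ (t.\mathsf{const}\ 0)^k\}\ \mathsf{block}\ (\epsilon \to t_2^m)\ e^*\ \mathsf{end}\ \mathsf{end} & \text{if } cl_{code} = (\mathsf{func}\ (t_1^n \to t_2^m)\ \mathsf{local}\ t^k\ e^*) \\
\mathsf{local}_n\{i; v_l^*\}\ v^n\ \mathsf{end} & \hookrightarrow & v^n &
\end{array}
\]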
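A few of the reduction rules above translate almost directly into code. The sketch below is a hedged illustration rather than the paper's formalization: operands are given as plain integers instead of t.const instructions, the function names are hypothetical, and unsigned division stands in as one example of a partial binop whose undefined case reduces to trap.

```python
TRAP = "trap"
MASK32 = 0xFFFFFFFF


def step_select(v1, v2, cond):
    """v1 v2 (i32.const c) select: v2 if c is zero, otherwise v1."""
    return v2 if cond == 0 else v1


def step_br_if(cond, j):
    """(i32.const c) (br_if j): empty sequence if c is zero, else br j."""
    return [] if cond == 0 else [("br", j)]


def step_br_table(k, labels, default):
    """(i32.const k) (br_table labels... default).

    In-range indices select from the list; every other index falls
    through to the final (default) label.
    """
    return [("br", labels[k] if k < len(labels) else default)]


def step_i32_div_u(c1, c2):
    """A partial binop: defined results reduce to a constant,
    the undefined case (division by zero) reduces to trap."""
    if c2 & MASK32 == 0:
        return TRAP
    return ("i32.const", (c1 & MASK32) // (c2 & MASK32))
```

With the switch example of Section 2.3, `step_br_table(x, [0, 1], 2)` picks label 0 for case 0, label 1 for case 1, and label 2 for every other value of x.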
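Typing Instructions $\quad C \vdash e^* : tf$

\[
C \vdash t.\mathsf{const}\ c : \epsilon \to t
\qquad
C \vdash t.unop : t \to t
\qquad
C \vdash t.binop : t\ t \to t
\qquad
C \vdash t.testop : t \to \mathsf{i32}
\qquad
C \vdash t.relop : t\ t \to \mathsf{i32}
\]

\[
\frac{t_1 \neq t_2 \qquad sx^? = \epsilon \Leftrightarrow (t_1 = \mathsf{i}n \wedge t_2 = \mathsf{i}n' \wedge |t_1| < |t_2|) \vee (t_1 = \mathsf{f}n \wedge t_2 = \mathsf{f}n')}
     {C \vdash t_1.\mathsf{convert}\ t_2\_sx^? : t_2 \to t_1}
\qquad
\frac{t_1 \neq t_2 \qquad |t_1| = |t_2|}
     {C \vdash t_1.\mathsf{reinterpret}\ t_2 : t_2 \to t_1}
\]

\[
\frac{C_{local}(i) = t}{C \vdash \mathsf{get\_local}\ i : \epsilon \to t}
\qquad
\frac{C_{local}(i) = t}{C \vdash \mathsf{set\_local}\ i : t \to \epsilon}
\qquad
\frac{C_{local}(i) = t}{C \vdash \mathsf{tee\_local}\ i : t \to t}
\]

\[
\frac{C_{global}(i) = \mathsf{mut}^?\ t}{C \vdash \mathsf{get\_global}\ i : \epsilon \to t}
\qquad
\frac{C_{global}(i) = \mathsf{mut}\ t}{C \vdash \mathsf{set\_global}\ i : t \to \epsilon}
\]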
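The instruction typing rules above can be read as an algorithm over an abstract operand stack of types. The following sketch checks straight-line code only; it is an illustration under stated assumptions, not the paper's validator: the tuple encoding, the names `instr_type` and `check`, and the small operator sets are all hypothetical, and the sketch does not verify that an operator is defined for its value type (for example, it would accept an integer-only operator on a float type).

```python
UNOPS = {"clz", "ctz", "popcnt"}
BINOPS = {"add", "sub", "mul"}
TESTOPS = {"eqz"}
RELOPS = {"eq", "ne", "lt_s", "lt_u"}


def instr_type(instr, local_types, global_types):
    """Return (parameter types, result types) for one instruction."""
    op, *imm = instr
    if op in ("get_local", "set_local", "tee_local"):
        t = local_types[imm[0]]
        return {"get_local": ([], [t]),
                "set_local": ([t], []),
                "tee_local": ([t], [t])}[op]
    if op in ("get_global", "set_global"):
        mutable, t = global_types[imm[0]]
        if op == "get_global":
            return ([], [t])
        if not mutable:
            raise TypeError("set_global on an immutable global")
        return ([t], [])
    t, _, name = op.partition(".")
    if name == "const":
        return ([], [t])
    if name in UNOPS:
        return ([t], [t])
    if name in BINOPS:
        return ([t, t], [t])
    if name in TESTOPS:
        return ([t], ["i32"])
    if name in RELOPS:
        return ([t, t], ["i32"])
    raise TypeError(f"unknown instruction: {op}")


def check(code, local_types, global_types=()):
    """Type-check a sequence; return the resulting stack of types."""
    stack = []
    for instr in code:
        params, results = instr_type(instr, local_types, global_types)
        split = len(stack) - len(params)
        if split < 0 or stack[split:] != params:
            raise TypeError(f"{instr[0]} expects {params}, stack is {stack}")
        del stack[split:]
        stack.extend(results)
    return stack
```

A single forward pass suffices because each instruction's type depends only on the context and its immediates, never on runtime values.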
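Typing Modules

\[
\frac{tf = t_1^* \to t_2^* \qquad C,\ \mathsf{local}\ t_1^*\ t^*,\ \mathsf{label}\ (t_2^*),\ \mathsf{return}\ (t_2^*) \vdash e^* : \epsilon \to t_2^*}
     {C \vdash ex^*\ \mathsf{func}\ tf\ \mathsf{local}\ t^*\ e^* : ex^*\ tf}
\qquad
\frac{tg = \mathsf{mut}^?\ t \qquad C \vdash e^* : \epsilon \to t \qquad ex^* = \epsilon \vee tg = t}
     {C \vdash ex^*\ \mathsf{global}\ tg\ e^* : ex^*\ tg}
\]

\[
\frac{(C_{func}(i) = tf)^n}
     {C \vdash ex^*\ \mathsf{table}\ n\ i^n : ex^*\ n}
\qquad
C \vdash ex^*\ \mathsf{memory}\ n : ex^*\ n
\]

\[
C \vdash ex^*\ \mathsf{func}\ tf\ im : ex^*\ tf
\qquad
\frac{tg = t}{C \vdash ex^*\ \mathsf{global}\ tg\ im : ex^*\ tg}
\qquad
C \vdash ex^*\ \mathsf{table}\ n\ im : ex^*\ n
\qquad
C \vdash ex^*\ \mathsf{memory}\ n\ im : ex^*\ n
\]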
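\[
\frac{S = \{\mathsf{inst}\ C^*,\ \mathsf{tab}\ n^*,\ \mathsf{mem}\ m^*\} \qquad (S \vdash inst : C)^* \qquad ((S \vdash cl : tf)^*)^* \qquad (n \le |cl^*|)^* \qquad (m \le |b^*|)^*}
     {\vdash \{\mathsf{inst}\ inst^*,\ \mathsf{tab}\ (cl^*)^*,\ \mathsf{mem}\ (b^*)^*\} : S}
\]

\[
C \vdash \mathsf{trap} : tf
\qquad
\frac{C \vdash e_0^* : t_1^n \to t_2^* \qquad C,\ \mathsf{label}\ (t_1^n) \vdash e^* : \epsilon \to t_2^*}
     {C \vdash \mathsf{label}_n\{e_0^*\}\ e^*\ \mathsf{end} : \epsilon \to t_2^*}
\]

\[
\frac{S \vdash cl : tf}
     {S; C \vdash \mathsf{call}\ cl : tf}
\qquad
\frac{S; (t^n) \vdash_i v^*; e^* : t^n}
     {S; C \vdash \mathsf{local}_n\{i; v^*\}\ e^*\ \mathsf{end} : \epsilon \to t^n}
\]

Figure 4. Store and configuration typing and rules for administrative instructions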
Theorems With the help of these auxiliary judgements we can now formulate the relevant properties:

PROPOSITION 4.1 (Preservation). If $\vdash_i s; v^*; e^* : t^*$ and $s; v^*; e^* \hookrightarrow_i s'; v'^*; e'^*$, then $\vdash_i s'; v'^*; e'^* : t^*$.

PROPOSITION 4.2 (Progress). If $\vdash_i s; v^*; e^* : t^*$, then either $e^* = v'^*$, or $e^* = \mathsf{trap}$, or $s; v^*; e^* \hookrightarrow_i s'; v'^*; e'^*$.

These properties ensure that all valid programs either diverge, trap, or terminate with values of the correct types.

5. Binary Format

WebAssembly is transmitted over the wire as a binary encoding of the abstract syntax presented in Figure 1. For space reasons, and because much of the format is rather straightforward, we only give a brief summary here.

A binary represents a single module and is divided into sections according to the different kinds of entities declared in it, plus a few auxiliary sections. Function types are collected in their own section to allow sharing. Code for function bodies is deferred to a separate section placed after all declarations. This way, a browser engine can minimize page-load latency by starting streaming compilation as soon as function bodies arrive over the wire. It can also parallelize compilation of consecutive function bodies. To aid this further, each body is preceded by its size so that a decoder can skip ahead and parallelize even its decoding.

Instructions are represented with one-byte opcodes (future opcodes may be multi-byte). All integral numbers, including opcode immediates, are encoded in LEB128 format [6]. The binary format strives for overall simplicity and is regular enough to be expressible by a simple grammar.

6. Embedding and Interoperability

WebAssembly is similar to a virtual ISA in that it does not define how programs are loaded into the execution engine or how they perform I/O. This intentional design separation is captured in the notion of embedding a WebAssembly implementation into an execution environment. The embedder defines how modules are loaded, how imports and exports between modules are resolved, provides foreign functions to accomplish I/O and timers, and specifies how WebAssembly traps are handled. In our work the primary use case has been the Web and JavaScript embedding, so these mechanisms are implemented in terms of JavaScript and Web APIs.

JavaScript API In the browser, WebAssembly modules can be loaded, compiled and invoked through a JavaScript API. The rough recipe is to (1) acquire a binary module from a given source, such as network or disk, (2) instantiate it providing the necessary imports, (3) call the desired export functions. Since compilation and instantiation may be slow, they are provided as asynchronous methods whose results are wrapped in promises.

Linking An embedder can instantiate multiple modules and use exports from one as imports to the other. This allows instances to call each other’s functions, share memory, or share function tables. Imported globals can serve as configuration parameters for linking. In the browser, the JavaScript API also allows creating and initializing memories or tables externally, or accessing exported memories and tables. They are represented as objects of dedicated JavaScript classes, and each memory is backed by a standard ArrayBuffer.

Interoperability It is possible to link multiple modules that have been created by different producers. However, as a low-level language, WebAssembly does not provide any built-in object model. It is up to producers to map their data types to numbers or the memory. This design provides maximum flexibility to producers, and unlike previous VMs, does not privilege any specific programming or object model while penalizing others. Though WebAssembly has a programming language shape, it is an abstraction over hardware, not over a programming language.

Interested producers can define common ABIs on top of WebAssembly such that modules can interoperate in heterogeneous applications. This separation of concerns is vital for making WebAssembly universal as a code format.

7. Implementation

A major design goal of WebAssembly has been high performance without sacrificing safety or portability. Throughout
its design process, we have developed independent implementations of WebAssembly in all major browsers to validate and inform the design decisions. This section describes some points of interest of those implementations.
V8 (the JavaScript engine in Google’s Chrome), SpiderMonkey (the JavaScript engine in Mozilla’s Firefox) and JavaScriptCore (the JavaScript engine in WebKit) reuse their optimizing JIT compilers to compile modules ahead-of-time before instantiation. This achieves predictable and high peak performance and avoids the unpredictability of warmup time which has often been a problem for JavaScript.
However, other implementation strategies also make sense. Chakra (the JavaScript engine in Microsoft Edge) instead lazily translates individual functions to an interpreted internal bytecode format upon first execution, and later JIT-compiles the hottest functions. The advantage of this approach is faster startup and potentially lower memory consumption. We expect more strategies to evolve over time.
Validation A key design goal of WebAssembly has been fast validation of code. In the four aforementioned implementations, the same basic strategy of an abstract control stack, an abstract operand stack with types, and a forward program counter is used. Validation proceeds by on-the-fly checking of the incoming bytecodes, with no intermediate representation being constructed. We measured single-threaded validation speed at between 75 MiB/s and 150 MiB/s on a suite of representative benchmarks on a modern workstation. This is approximately fast enough to perform validation at full network speed of 1 Gib/s.
Baseline JIT Compiler Mozilla’s SpiderMonkey engine includes two WebAssembly compilation tiers. The first is a WebAssembly-specific fast baseline JIT that emits machine code in a single pass that is combined with validation. The JIT creates no internal IR during compilation but does track register state and attempts to do simple greedy register allocation in the forward pass. The baseline JIT is designed only for fast startup while the Ion optimizing JIT is compiling the module in parallel in the background. The Ion JIT is also used by SpiderMonkey as its top tier for JavaScript.
Optimizing JIT Compiler V8, SpiderMonkey, JavaScriptCore, and Chakra all include optimizing JITs for their top tier execution of JavaScript and reuse them for maximum peak performance of WebAssembly. Both V8 and SpiderMonkey top-tier JITs use SSA-based intermediate representations. As such, it was important to validate that WebAssembly could be decoded to SSA form in a single linear pass to be fed to these JITs. Although details are beyond the scope of this paper, both V8 and SpiderMonkey implement direct-to-SSA translation in a single pass during validation of WebAssembly bytecode, while Chakra implements a WebAssembly-to-internal-bytecode translation to be fed through their adaptive optimization system. This is greatly helped by the structured control flow constructs of WebAssembly, making the decoding algorithm far simpler and more efficient and avoiding the limitation that many JITs have in that they do not support irreducible control flow.⁸ In the case of V8, decoding targets the TurboFan compiler’s sea of nodes [12] graph-based IR, producing a loosely-ordered graph that is suitable for subsequent optimization and scheduling. Once decoded to an intermediate representation, compilation is then a matter of running the existing compiler backend, including instruction selection, register allocation, and code generation. Because some WebAssembly operations may not be directly available on all platforms, such as 64 bit integers on 32 bit architectures, IR rewriting and lowering might be performed before feeding to the backend of the compiler. Our experience reusing the advanced JITs from four different JavaScript engines has been a resounding success, allowing all engines to achieve high performance in a short time.
Reference Interpreter In addition to production-quality implementations for four major browsers, we implemented a reference interpreter for the entire WebAssembly language. For this we used OCaml [26] due to the ability to write in a high-level stylized way that closely matches the formalization, approximating an “executable specification”. The reference interpreter includes a full binary encoder and decoder, validator, and interpreter, as well as an extensive test suite. It is used to test both production implementations and the formal specification as well as to prototype new features.

7.1 Bounds Checks
By design, all memory accesses in WebAssembly can be guaranteed safe with a single dynamic bounds check. Each instruction t.load a o k with type t, alignment a, static offset o and dynamic address k represents a read of the memory at smem(i, k + o, |t|). That means bytes k + o to k + o + |t| − 1 will be accessed by the instruction and must be in bounds. That amounts to checking k + o + |t| ≤ memsize. In a WebAssembly engine, the memory for an instance will be allocated in a large contiguous range beginning at some (possibly nondeterministic) base in the engine’s process, so the above access amounts to an access of base[k + o].
Code Specialization While base can be stored in a dedicated machine register for quick access, a JIT compiler can be even more aggressive and actually specialize the machine code generated for a module to a specific memory base, embedding the base address as a constant directly into the code, freeing a register. In addition, the JIT can reduce the cost of the bounds check by reorganizing the expression k + o + |t| ≤ memsize to k ≤ memsize − o − |t| and then constant-folding the right hand side.⁹ Although memsize is not necessarily constant (since memory can be grown dynamically) it changes so infrequently that the JIT can embed it in generated machine code, later patching the machine code if the memory size changes. Unlike other speculation techniques, the change of a constant value is controlled enough that deoptimization [22] of the code is not necessary.

⁸ Which is also the case for many JVMs, because irreducible control flow never results from Java-source-generated bytecode.
⁹ This identity holds because we very carefully defined that effective address calculations do not wrap around.
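The bounds check of Section 7.1 and its rearranged, constant-folded form can be illustrated with a minimal Python sketch. It is not how any engine implements the check: a real JIT emits machine code, whereas this sketch models linear memory as a `bytearray` and the specialized access site as a closure. All names (`Trap`, `checked_load`, `specialize_load`) are hypothetical.

```python
# Illustrative model of the Section 7.1 bounds check; every name is hypothetical.

PAGE_SIZE = 64 * 1024  # WebAssembly page size in bytes


class Trap(Exception):
    """Stands in for a WebAssembly trap on an out-of-bounds access."""


def checked_load(memory: bytearray, k: int, o: int, width: int) -> bytes:
    """Naive form: check k + o + |t| <= memsize, then read |t| bytes."""
    memsize = len(memory)
    # Python integers do not wrap, which mirrors the requirement that
    # effective address calculations do not wrap around.
    if k + o + width > memsize:
        raise Trap(f"out-of-bounds access at {k + o}")
    return bytes(memory[k + o : k + o + width])


def specialize_load(memory: bytearray, o: int, width: int):
    """Rearranged form: k <= memsize - o - |t|, right-hand side folded once."""
    limit = len(memory) - o - width  # constant for this access site

    def load(k: int) -> bytes:
        if k > limit:  # one comparison against an embedded constant
            raise Trap(f"out-of-bounds access at {k + o}")
        return bytes(memory[k + o : k + o + width])

    return load


memory = bytearray(PAGE_SIZE)  # one page of linear memory
load_i32 = specialize_load(memory, o=8, width=4)
```

The closure captures `limit` at specialization time, so growing the memory would require calling `specialize_load` again, loosely analogous to patching the generated machine code. Note that `limit` can be negative when `o + width` exceeds the memory size; in Python the comparison still traps for every `k`, but an implementation using unsigned machine arithmetic would presumably need to treat that case separately.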
[Figure 5: relative execution time of the PolyBenchC benchmarks, native is 100%; stacked bars for VM startup, compilation, and validation.]
[Figure 6: per-function code size comparison; series: WebAssembly/asm.js and WebAssembly/native.]
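The validation strategy described earlier in this section (an abstract operand stack with types, a forward program counter, and no intermediate representation) can be sketched for straight-line code as follows. This is a simplified illustration only: the abstract control stack is omitted, immediates are ignored, and the opcode table `SIGNATURES` is a small hypothetical subset chosen for the example.

```python
# Single-pass type checking over an abstract operand stack.
# Hypothetical table: opcode -> (operand types popped, result types pushed).
SIGNATURES = {
    "i32.const": ([], ["i32"]),
    "i64.const": ([], ["i64"]),
    "i32.add": (["i32", "i32"], ["i32"]),
    "i64.add": (["i64", "i64"], ["i64"]),
    "i32.eqz": (["i32"], ["i32"]),
    "i64.eqz": (["i64"], ["i32"]),
}


class ValidationError(Exception):
    """Raised as soon as an instruction fails to type-check."""


def validate(code, result_types):
    """Check straight-line code in one forward pass; build no IR."""
    stack = []  # abstract operand stack: holds types, not values
    for pc, op in enumerate(code):  # forward program counter
        if op not in SIGNATURES:
            raise ValidationError(f"pc {pc}: unknown opcode {op}")
        params, results = SIGNATURES[op]
        # Pop operands last-to-first and compare against expected types.
        for expected in reversed(params):
            if not stack:
                raise ValidationError(f"pc {pc}: operand stack underflow")
            actual = stack.pop()
            if actual != expected:
                raise ValidationError(
                    f"pc {pc}: expected {expected}, found {actual}"
                )
        stack.extend(results)
    if stack != list(result_types):
        raise ValidationError(f"final stack {stack} != {list(result_types)}")
```

Each instruction is inspected once and only the stack of types is kept, which is what makes it plausible to combine such a pass with decoding or baseline code generation.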
Virtual Memory Techniques On 64 bit platforms, the WebAssembly engine can make use of virtual memory techniques to eliminate bounds checks for memory accesses altogether. The engine simply reserves 8 GiB of virtual address space and marks as inaccessible all pages except the valid portion of memory near the beginning. Since WebAssembly memory addresses and offsets are 32 bit integers, by definition an access of base[o + k] cannot be larger than 8 GiB from the beginning of base. Since most 64 bit CPU architectures offer 32 bit arithmetic on general purpose registers that clears the upper 32 bits of the output register, the JIT can simply emit accesses to (base + o)[k] and rely on the hardware protection mechanism to catch out-of-bounds accesses. Moreover, since the memory size is no longer embedded in the generated machine code and the base memory address does not change, no code patching is necessary to grow memory.

7.2 Improving Compile Time
Parallel Compilation Since both V8 and SpiderMonkey implement ahead-of-time compilation, it is a clear performance win to parallelize compilation of WebAssembly modules, dispatching individual functions to different threads. Both V8 and SpiderMonkey achieve a 5-6× improvement in compilation speed with 8 compilation threads.
Code Caching While implementors have spent a lot of resources improving compilation speed of JITs to reduce cold startup time of WebAssembly, we expect that warm startup time will become important as users will likely visit the same Web pages repeatedly. The JavaScript API for IndexedDB [5] now allows JavaScript to manipulate and compile WebAssembly modules and store their compiled representation as an opaque blob in IndexedDB. This allows a JavaScript application to first query IndexedDB for a cached version of its WebAssembly module before downloading and compiling it. This mechanism has already been implemented in V8 and SpiderMonkey and accounts for an order of magnitude startup time improvement.

7.3 Measurements
Figure 5 shows the execution time of the PolyBenchC [7] benchmark suite running on WebAssembly on both V8 and SpiderMonkey normalized to native execution.¹⁰ Times for both engines are shown as stacked bars, and the results show that there are still some execution time differences between them because of different code generators.¹¹ We measured a fixed VM startup time of 18 ms for V8 and 30 ms for SpiderMonkey. These times are included along with compilation times as bars stacked on top of the execution time of each benchmark. Note that the fixed cost of VM startup as a stacked bar also gives a clue to which benchmarks are short-running (startup and compilation are significant). Overall, the results show that WebAssembly is very competitive with native code, with 7 benchmarks within 10 % of native and nearly all of them within 2× of native.
We also measured the execution time of the PolyBenchC benchmarks running on asm.js. On average, WebAssembly is 33.7 % faster than asm.js. Validation in particular is significantly more efficient. For SpiderMonkey, WebAssembly validation takes less than 3 % of the time of asm.js validation. In V8, memory consumption of WebAssembly validation is less than 1 % of that for asm.js validation.
Figure 6 compares code sizes between WebAssembly (generated from asm.js inside V8), minified asm.js, and x86-64 native code. For the WebAssembly to asm.js comparison we use the Unity benchmarks [9], for the WebAssembly to native code comparison the PolyBenchC [7] and SciMark [8] benchmarks. For each function in these benchmarks, a yellow point is plotted at ⟨size_asmjs, size_wasm⟩ and a blue point at ⟨size_x86, size_wasm⟩. Any point below the diagonal represents a function for which WebAssembly is smaller than the corresponding other representation. On average, WebAssembly code is 62.5 % the size of asm.js (median 68.6 %), and 85.3 % of native x86-64 code size (median 78 %). The few cases where WebAssembly is larger than native code are due to C’s pointers to stack locals requiring a shadow stack.

¹⁰ Experiments were performed on a Linux 3.13.0-100 workstation with two 12-core 2.60 GHz Intel Xeon processors (2 hyperthreads per core), 30 MiB shared L3-cache, and 64 GiB RAM. We use Clang 3.8.0-2 to generate native code and Emscripten 1.37.3 to generate both WebAssembly and asm.js code. All results are averaged over 15 runs.
¹¹ V8 is faster on some benchmarks and SpiderMonkey on others. Neither engine is universally faster than the other, but both achieve good results. The difference is most pronounced for short-running programs.
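The 8 GiB reservation used by the virtual memory technique of Section 7.1 follows from a short calculation: the dynamic address and the static offset are both unsigned 32 bit values, so their non-wrapping sum is below 2^33 bytes. The sketch below works through that arithmetic and models page protection as a simple range test. The function names are hypothetical, and the model says nothing about how a particular engine lays out its reservation.

```python
# Worked arithmetic for the 8 GiB reservation; all names are hypothetical.

U32_MAX = 2**32 - 1
GIB = 2**30
RESERVED = 8 * GIB  # size of the reserved virtual address range


def effective_address(k: int, o: int) -> int:
    """Start of an access relative to base, computed without wrapping."""
    assert 0 <= k <= U32_MAX and 0 <= o <= U32_MAX
    return k + o


def access_faults(k: int, o: int, memsize: int) -> bool:
    """Model of hardware protection: only [0, memsize) is accessible."""
    addr = effective_address(k, o)
    # The start address always lies inside the reservation, so an
    # out-of-bounds access hits a protected page rather than other data.
    assert addr < RESERVED
    return addr >= memsize


# The largest possible start address is 2**33 - 2, just under 8 GiB.
worst_case = effective_address(U32_MAX, U32_MAX)
```

This only bounds the start address of an access. An access is |t| bytes wide, so an engine would presumably also need the inaccessible region to cover the few bytes past the largest start address; the text does not go into that detail.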
8. Related Work
The most direct precursors of WebAssembly are (P)NaCl [42, 11, 18] and asm.js [4], which we discussed in Section 1.1.
Efficient memory safety is a hard design constraint of WebAssembly. Previous systems such as CCured [34] and Cyclone [23] have imposed safety at the C language level, which generally requires program changes. Other attempts have enforced it at the C abstract machine level with combinations of static and runtime checks [10, 20, 31], sometimes assisted by hardware [16, 30]. For example, the Secure Virtual Architecture [13] defines an abstract machine based on LLVM bitcode that enforces the SAFECode [17] properties.
Typed intermediate languages carry type information throughout the compilation process. For example, TIL [28] and FLINT [37] pioneered typed ILs for functional languages, allowing higher confidence in compiler correctness and more type-based optimizations. However, typed ILs have a different purpose than a compilation target. They are typically compiler-specific data structures that exist only as an intermediate stage of compilation, not as a storage or execution format.
Typed Assembly languages [29] do serve as a compilation target, usually taking the form of a complex type system imposed on top of an existing assembly language. Compilers that target typed assembly languages must produce well-typed (or proof-carrying [32]) code by preserving types throughout compilation. The modelling of complex types imposes a severe burden on compilers, requiring them to preserve and transform quantified types throughout compilation and avoid optimizations that break the type system.
We investigated reusing another compiler IR which has a binary format. In fact, LLVM bitcode is the binary format used by PNaCl. Disadvantages with LLVM bitcode in particular are that (1) it is not entirely stable, (2) it has undefined behavior which had to be corrected in PNaCl, (3) it was found to be less compact than a stack machine, (4) it essentially requires every consumer to either include LLVM or reimplement a fairly complex LLVM IR decoder/verifier, and (5) the LLVM backend is notoriously slow. Other compiler IRs have similar, sometimes worse, properties. In general, compiler IRs are better suited to optimization and transformation, and not as compact, verifiable code formats.
In comparison to typed intermediate languages, typed assembly languages, and safe “C” machines, WebAssembly radically reduces the scope of responsibility for the VM: it is not required to enforce the type system of the original program at the granularity of individual objects; instead it must only enforce memory safety at the much coarser granularity of a module’s memory. This can be done efficiently with simple bounds checks or virtual memory techniques.
Mu [40] is a low-level “micro virtual machine” designed to be a minimalist set of abstractions over hardware, memory management, and concurrency. It offers an object model complete with typed pointers and automatic memory management, concurrency abstractions such as threads and stacks, as well as an intermediate representation based on LLVM. However, Mu does not enforce memory safety, since it is meant more as a substrate for language implementors to build upon. The safety mechanisms are left up to higher layers of the stack, such as a trusted client language VM on top of Mu. Since the client language VM is trusted, a bug in that layer could allow an incorrect program to read or write memory arbitrarily or exhibit other undefined behavior.
For managed language systems with bytecode as their distribution format, the speed and simplicity of validation is key to good performance and high assurance. Our work was directly informed by experience with stack machines such as the JVM [27] and CIL [33] and their validation algorithms. By designing WebAssembly in lock-step with a formalization we managed to make its semantics drastically simpler. For example, JVM bytecode verification takes more than 150 pages to describe in the current JVM specification, while for WebAssembly it fits on one page (Figure 3). It took a decade of research to hash out the details of correct JVM verification [25], including the discovery of inherent vulnerabilities [15, 19], such as a potential O(n^3) worst case of the iterative dataflow approach that is a consequence of the JVM’s unrestricted gotos and other idiosyncrasies [39] that had to be fixed with the addition of stack maps.
Both the JVM and the CIL, as well as Android Dalvik [3], allow bytecode to create irreducible loops and unbalanced locking structures, features which usually cause optimizing JITs to give up and relegate methods containing those constructs to an interpreter. In contrast, the structured control flow of WebAssembly makes validation and compilation fast and simple and paves the way for structured locking and exception constructs in the future.

9. Future Directions
The initial version of WebAssembly presented here focuses on supporting low-level code, specifically compiled from C/C++. A few important features are still missing for fully comprehensive support of this domain and will be added in future versions, such as zero-cost exceptions, threads, and SIMD instructions. Some of these features are already being prototyped in implementations of WebAssembly.
Beyond that, we intend to evolve WebAssembly further into an attractive target for high-level languages by including relevant primitives like tail calls, stack switching, or coroutines. A highly important goal is to provide access to the advanced and highly tuned garbage collectors that are built into all Web browsers, thus eliminating the main shortcoming relative to JavaScript when compiling to the Web.
Finally, we anticipate that WebAssembly will find a wide range of use cases off the Web, and expect that it will potentially grow additional features necessary to support these.

Acknowledgements
We thank the members of the W3C WebAssembly community group for their many contributions to the design and implementation of WebAssembly, and Conrad Watt for valuable feedback on the formalization.
References
[1] ActiveX controls. https://msdn.microsoft.com/en-us/library/aa751968(v=vs.85).aspx. Accessed: 2016-11-14.
[2] Adobe Shockwave Player. https://get.adobe.com/shockwave/. Accessed: 2016-11-14.
[3] ART and Dalvik. https://source.android.com/devices/tech/dalvik/. Accessed: 2016-11-14.
[4] asm.js. http://asmjs.org. Accessed: 2016-11-08.
[5] Indexed Database API. https://www.w3.org/TR/IndexedDB/. Accessed: 2016-11-08.
[6] LEB128. https://en.wikipedia.org/wiki/LEB128. Accessed: 2016-11-08.
[7] PolyBenchC: the polyhedral benchmark suite. http://web.cs.ucla.edu/~pouchet/software/polybench/. Accessed: 2017-03-14.
[8] Scimark 2.0. http://math.nist.gov/scimark2/. Accessed: 2017-03-15.
[9] Unity benchmarks. http://beta.unity3d.com/jonas/benchmark2015/. Accessed: 2017-03-15.
[10] P. Akritidis, M. Costa, M. Castro, and S. Hand. Baggy bounds checking: An efficient and backwards-compatible defense against out-of-bounds errors. In Proceedings of the 18th Conference on USENIX Security Symposium, SSYM’09, pages 51–66, Berkeley, CA, USA, 2009. USENIX Association.
[11] J. Ansel, P. Marchenko, U. Erlingsson, E. Taylor, B. Chen, D. L. Schuff, D. Sehr, C. L. Biffle, and B. Yee. Language-independent sandboxing of just-in-time compilation and self-modifying code. In Proceedings of the ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI ’11, pages 355–366, New York, NY, USA, 2011. ACM.
[12] C. Click and M. Paleczny. A simple graph-based intermediate representation. SIGPLAN Not., 30(3):35–49, Mar. 1995.
[13] J. Criswell, A. Lenharth, D. Dhurjati, and V. Adve. Secure Virtual Architecture: A safe execution environment for commodity operating systems. SIGOPS Oper. Syst. Rev., 41(6):351–366, Oct. 2007.
[14] N. G. de Bruijn. Lambda calculus notation with nameless dummies: a tool for automatic formula manipulation with application to the Church-Rosser theorem. Indag. Math., 34:381–392, 1972.
[15] D. Dean, E. Felten, and D. Wallach. Java security: from HotJava to Netscape and beyond. In Symposium on Security and Privacy. IEEE Computer Society Press, 1996.
[16] J. Devietti, C. Blundell, M. M. K. Martin, and S. Zdancewic. HardBound: Architectural support for spatial safety of the C programming language. SIGPLAN Not., 43(3):103–114, Mar. 2008.
[17] D. Dhurjati, S. Kowshik, and V. Adve. SAFECode: Enforcing alias analysis for weakly typed languages. SIGPLAN Not., 41(6):144–157, June 2006.
[18] A. Donovan, R. Muth, B. Chen, and D. Sehr. PNaCl: Portable native client executables. Technical report, 2010.
[19] A. Gal, C. W. Probst, and M. Franz. Complexity-based denial of service attacks on mobile-code systems. Technical Report 04-09, School of Information and Computer Science, University of California, Irvine, Irvine, CA, April 2004.
[20] M. Grimmer, R. Schatz, C. Seaton, T. Würthinger, and H. Mössenböck. Memory-safe execution of C on a Java VM. In Proceedings of the 10th ACM Workshop on Programming Languages and Analysis for Security, PLAS’15, pages 16–27, New York, NY, USA, 2015. ACM.
[21] D. Gudeman. Representing type information in dynamically typed languages. Technical Report 93-27, Department of Computer Science, University of Arizona, Phoenix, Arizona, October 1993.
[22] U. Hölzle, C. Chambers, and D. Ungar. Debugging optimized code with dynamic deoptimization. SIGPLAN Not., 27(7):32–43, July 1992.
[23] T. Jim, J. G. Morrisett, D. Grossman, M. W. Hicks, J. Cheney, and Y. Wang. Cyclone: A safe dialect of C. In Proceedings of the USENIX Annual Technical Conference, ATEC ’02, pages 275–288, Berkeley, CA, USA, 2002. USENIX Association.
[24] C. Lattner and V. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In Proceedings of the International Symposium on Code Generation and Optimization, CGO ’04, Palo Alto, California, Mar 2004.
[25] X. Leroy. Java bytecode verification: Algorithms and formalizations. J. Autom. Reason., 30(3-4):235–269, Aug. 2003.
[26] X. Leroy, D. Doligez, A. Frisch, J. Garrigue, D. Rémy, and J. Vouillon. The OCaml system. INRIA, 2016.
[27] T. Lindholm, F. Yellin, G. Bracha, and A. Buckley. The Java Virtual Machine Specification (Java SE 8 Edition). Technical report, Oracle, 2015.
[28] G. Morrisett, D. Tarditi, P. Cheng, C. Stone, R. Harper, and P. Lee. The TIL/ML compiler: Performance and safety through types. In Workshop on Compiler Support for Systems Software, 1996.
[29] G. Morrisett, D. Walker, K. Crary, and N. Glew. From System F to Typed Assembly Language. ACM TOPLAS, 21(3):527–568, May 1999.
[30] S. Nagarakatte, M. M. K. Martin, and S. Zdancewic. WatchdogLite: Hardware-accelerated compiler-based pointer checking. In Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, CGO ’14, pages 175:175–175:184, New York, NY, USA, 2014. ACM.
[31] S. Nagarakatte, J. Zhao, M. M. Martin, and S. Zdancewic. SoftBound: Highly compatible and complete spatial memory safety for C. SIGPLAN Not., 44(6):245–258, June 2009.
[32] G. C. Necula. Proof-carrying code. In Proceedings of the 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’97, pages 106–119, New York, NY, USA, 1997. ACM.
[33] G. C. Necula, S. McPeak, S. P. Rahul, and W. Weimer. CIL: Intermediate language and tools for analysis and transformation of C programs. In Proceedings of the 11th International Conference on Compiler Construction, CC ’02, pages 213–228, London, UK, 2002.
[34] G. C. Necula, S. McPeak, and W. Weimer. CCured: Type-safe retrofitting of legacy code. SIGPLAN Not., 37(1):128–139, Jan. 2002.
[35] B. Pierce. Types and Programming Languages. The MIT Press, Cambridge, Massachusetts, USA, 2002.
[36] G. Plotkin. A structural approach to operational semantics. Journal of Logic and Algebraic Programming, 60-61:17–139, 2004.
[37] Z. Shao. An overview of the FLINT/ML compiler. In Proc. 1997 ACM SIGPLAN Workshop on Types in Compilation (TIC’97), Amsterdam, The Netherlands, June 1997.
[38] Y. Shi, K. Casey, M. A. Ertl, and D. Gregg. Virtual machine showdown: Stack versus registers. ACM Transactions on Architecture and Code Optimizations, 4(4):2:1–2:36, Jan. 2008.
[39] R. F. Stärk and J. Schmid. Java bytecode verification is not possible (extended abstract). In Formal Methods and Tools for Computer Science (Proceedings of Eurocast 2001), pages 232–234, 2001.
[40] K. Wang, Y. Lin, S. M. Blackburn, M. Norrish, and A. L. Hosking. Draining the Swamp: Micro virtual machines as a solid foundation for language development. In 1st Summit on Advances in Programming Languages, volume 32 of SNAPL ’15, pages 321–336, Dagstuhl, Germany, 2015.
[41] A. Wright and M. Felleisen. A syntactic approach to type soundness. Information and Computation, 115, 1994.
[42] B. Yee, D. Sehr, G. Dardyk, B. Chen, R. Muth, T. Ormandy, S. Okasaka, N. Narula, and N. Fullagar. Native Client: A sandbox for portable, untrusted x86 native code. In IEEE Symposium on Security and Privacy, Oakland ’09, IEEE, 3 Park Avenue, 17th Floor, New York, NY 10016, 2009.
[43] A. Zakai. Emscripten: An LLVM-to-JavaScript compiler. In Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA ’11, pages 301–312, New York, NY, USA, 2011. ACM.