eBPF, FPGA
DOI:10.1145/3543668
hXDP: Efficient Software Packet Processing on FPGA NICs
To view the accompanying Technical Perspective, visit doi.acm.org/10.1145/3543844
Before presenting a more detailed description of the hXDP concept, we now give a brief background about XDP.

2.1. XDP primer
XDP allows programmers to inject programs at the NIC driver level, so that such programs are executed before a network packet is passed to the Linux network stack. XDP programs are based on the Linux eBPF technology. eBPF provides an in-kernel virtual machine for the sandboxed execution of small programs within the kernel context. In its current version, the eBPF virtual machine has 11 64b registers: r0 holds the return value from in-kernel functions and programs, r1–r5 are used to store arguments that are passed to in-kernel functions, r6–r9 are registers that are preserved during function calls, and r10 stores the frame pointer to access the stack. The eBPF virtual machine has a well-defined ISA composed of more than 100 fixed-length (64b) instructions. Programmers usually write an eBPF program using the C language with some restrictions, which simplify the static verification of the program.

eBPF programs can also access kernel memory areas called maps, that is, kernel memory locations that essentially resemble tables. For instance, eBPF programs can use maps to implement arrays and hash tables. An eBPF program can interact with a map's locations by means of pointer dereference, for unstructured data access, or by invoking specific helper functions for structured data access, for example, a lookup on a map configured as a hash table. Maps are especially important since they are the only means to keep state across program executions, and to share information with other eBPF programs and with programs running in user space.
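As a concrete illustration of these concepts, below is a minimal XDP program in restricted C. It is not taken from the article; it is a sketch in the style of the Linux kernel XDP samples and assumes the libbpf headers (bpf/bpf_helpers.h, bpf/bpf_endian.h). It counts packets per IPv4 source address in a hash map, using helper functions for structured map access, and then passes every packet to the network stack.

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

/* Hash map shared with user space: key = IPv4 source address, value = packet count. */
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 1024);
    __type(key, __u32);
    __type(value, __u64);
} pkt_count SEC(".maps");

SEC("xdp")
int count_src(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    /* Bounds checks keep the static verifier happy. */
    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;
    if (eth->h_proto != bpf_htons(ETH_P_IP))
        return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;

    __u32 key = ip->saddr;
    __u64 one = 1, *val;

    /* Structured map access through helper functions. */
    val = bpf_map_lookup_elem(&pkt_count, &key);
    if (val)
        __sync_fetch_and_add(val, 1);
    else
        bpf_map_update_elem(&pkt_count, &key, &one, BPF_ANY);

    return XDP_PASS;   /* hand the packet to the kernel network stack */
}

char _license[] SEC("license") = "GPL";

A program like this is typically compiled with clang -O2 -target bpf -c count_src.c -o count_src.o and attached to a device with ip link set dev <dev> xdp obj count_src.o sec xdp.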
2.2. Challenges
To grasp an intuitive understanding of the design challenge involved in supporting XDP on FPGA, we now consider the example of an XDP program that implements a simple stateful firewall for checking the establishment of bi-directional TCP or UDP flows. A C program describing this simple firewall function is compiled to 71 eBPF instructions.

We can build a rough idea of the potential best-case speed of this function running on an FPGA-based eBPF executor, assuming that each eBPF instruction requires 1 clock cycle to be executed, that clock cycles are not spent for any other operation, and that the FPGA has a 156MHz clock rate, which is common in FPGA NICs.32 In such a case, a naive FPGA implementation of a sequential eBPF executor would provide a maximum throughput of 2.8 Million packets per second (Mpps), under optimistic assumptions, for example, assuming no additional overheads. Since the same firewall program running in the Linux kernel on a high-end ×86 CPU core achieves a substantially higher throughput (cf. Section 5), in the lack of other solutions the FPGA-based executor would be 2–3× slower than the CPU core.
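To spell out the back-of-the-envelope bound (our own arithmetic, not stated in the article): with clock frequency f_clk and N_instr eBPF instructions executed per packet at one instruction per cycle,

\[
T_{\max} \;=\; \frac{f_{\text{clk}}}{N_{\text{instr}}} \;=\; \frac{156.25\ \text{MHz}}{71} \;\approx\; 2.2\ \text{Mpps},
\qquad
\frac{156.25\ \text{MHz}}{56} \;\approx\; 2.8\ \text{Mpps}.
\]

Executing all 71 compiled instructions would cap throughput at about 2.2Mpps; the 2.8Mpps upper bound quoted above corresponds to roughly 56 instructions per packet, which is plausible since branches mean that only part of the program runs for any given packet.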
Furthermore, existing solutions to speed up sequential code execution, for example, superscalar architectures, are too expensive in terms of hardware resources to be adopted in this case. In fact, in a superscalar architecture the speed-up is achieved by leveraging instruction-level parallelism at runtime. However, the complexity of the hardware required to do so grows exponentially with the number of instructions being checked for parallel execution. This rules out re-using general-purpose soft-core designs, such as those based on RISC-V.16, 14

2.3. hXDP Overview
hXDP addresses the outlined challenge by taking a software-hardware co-design approach. In particular, hXDP provides both a compiler and the corresponding hardware module. The compiler takes advantage of eBPF ISA optimization opportunities, leveraging hXDP's hardware module features that are introduced to simplify the exploitation of such opportunities. Effectively, we design a new ISA that extends the eBPF ISA, specifically targeting the execution of XDP programs.

The compiler optimizations perform transformations at the eBPF instruction level: remove unnecessary instructions; replace instructions with newly defined, more concise instructions; and parallelize instruction execution. All the optimizations are performed at compile time, moving most of the complexity to the software compiler, thereby reducing the target hardware complexity. Accordingly, the hXDP hardware module implements an infrastructure to run up to 4 instructions in parallel, implementing a Very Long Instruction Word (VLIW) soft processor. The VLIW soft processor does not provide any runtime program optimization, for example, branch prediction or instruction re-ordering. We rely entirely on the compiler to optimize XDP programs for high-performance execution, thereby freeing the hardware module of complex mechanisms that would use more hardware resources.

Ultimately, the hXDP hardware component is deployed as a self-contained IP core module to the FPGA. The module can be interfaced with other processing modules if needed, or just placed as a bump-in-the-wire between the

[Figure 2. An overview of the XDP workflow and architecture, including the contribution of this article. Diagram labels: eBPF program, BCC toolstack, ELF object file, compiler, control program, "this paper contribution".]
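The parallelization step of the hXDP compiler (Section 2.3) can be pictured as a list scheduler that packs independent eBPF instructions into 4-slot VLIW bundles. The following is our own minimal sketch of that idea, not the hXDP compiler: it uses a toy one-source instruction model and a simplified Bernstein-style independence test, and it closes a bundle at every branch, whereas hXDP itself can even schedule several branches in one bundle using the lane-priority rule described in Section 4.2.

#include <stdbool.h>
#include <stdio.h>

#define LANES 4

/* A toy eBPF-like instruction: one destination and one source register
 * (negative register number means "unused"). */
struct insn {
    const char *txt;
    int dst;   /* register written, or -1 */
    int src;   /* register read, or -1    */
    bool branch;
};

/* Two instructions are independent if they have no read/write or
 * write/write conflict on registers (a simplified Bernstein condition). */
static bool independent(const struct insn *a, const struct insn *b)
{
    if (a->dst >= 0 && (a->dst == b->dst || a->dst == b->src))
        return false;
    if (b->dst >= 0 && b->dst == a->src)
        return false;
    return true;
}

int main(void)
{
    /* A short straight-line fragment, in program order. */
    struct insn prog[] = {
        { "r2 = *(u32 *)(r1 + 0)", 2,  1, false },
        { "r3 = *(u32 *)(r1 + 4)", 3,  1, false },
        { "r4 = r2 + r3",          4,  2, false },  /* depends on r2 (and r3) */
        { "r5 = 7",                5, -1, false },
        { "if r4 > r5 goto +5",   -1,  4, true  },
    };
    int n = sizeof(prog) / sizeof(prog[0]);

    /* Greedy, in-order bundling: fill a bundle until a dependence
     * (or, in this toy, a branch) forces a new one. */
    int bundle = 0, used = 0;
    const struct insn *slot[LANES];
    for (int i = 0; i < n; i++) {
        bool fits = used < LANES;
        for (int j = 0; j < used && fits; j++)
            if (!independent(slot[j], &prog[i]))
                fits = false;
        if (!fits) {            /* close the bundle, start a new one */
            bundle++;
            used = 0;
        }
        slot[used++] = &prog[i];
        printf("bundle %d, lane %d: %s\n", bundle, used - 1, prog[i].txt);
        if (prog[i].branch) {   /* this toy schedules nothing after a branch bundle */
            bundle++;
            used = 0;
        }
    }
    return 0;
}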
[Figure: the hXDP hardware design. Diagram labels include: input queue with new packet input frames; packet selector with read/write logic, packet buffer, and 1, 2, 4, 6, 8 byte accesses; instruction memory, instruction fetch/decode logic, and program counter; execution lanes with scratch memory and write buffer; helper functions module and helper bus; maps memory, maps configurator, and structured access to maps; output queue; 32B data bus; start/exit/jump/helper-function-call and packet selection signals.]

The hXDP hardware design includes the following components: the Programmable Input Queue (PIQ); the Active Packet Selector (APS); the Sephirot processing core; the Helper Functions Module (HF); and the Memory Maps Module (MM). All the modules work in the same clock frequency domain. Incoming data is received by the PIQ. The APS reads a new packet from the PIQ into its internal packet buffer. In doing so, the APS provides a byte-aligned access to the packet data through a data bus, which Sephirot uses to selectively read/write the packet content. When the APS makes a packet available to the Sephirot core, the execution of a loaded eBPF program starts. Instructions are entirely executed within Sephirot, using four parallel execution lanes, unless they call a helper function or read/write to maps. In such cases, the corresponding modules are accessed using the helper bus and the data bus, respectively. We detail the architecture's core component, that is, the Sephirot eBPF processor, next.

Sephirot is a VLIW processor with four parallel lanes that execute eBPF instructions. Sephirot is designed as a pipeline of four stages: instruction fetch (IF); instruction decode (ID); instruction execute (IE); and commit. A program is stored in a dedicated instruction memory, from which Sephirot fetches the instructions in order. The processor has another dedicated memory area to implement the program's stack, which is 512B in size, and 11 64b registers stored in the register file. These memory and register locations match one-to-one the eBPF virtual machine specification. Sephirot begins execution when the APS has a new packet ready for processing, and it gives the processor start signal.

On processor start (IF stage), a VLIW instruction is read and the 4 extended eBPF instructions that compose it are statically assigned to their respective execution lanes. In this stage, the operands of the instructions are pre-fetched from the register file. The remaining 3 pipeline stages are performed in parallel by the four execution lanes. During ID, memory locations are pre-fetched, if any of the eBPF instructions is a load, while at the IE stage the relevant subunit is activated, using the related pre-fetched values. The subunits are the Arithmetic and Logic Unit (ALU), the Memory Access Unit, and the Control Unit. The ALU implements all the operations described by the eBPF ISA, with the notable difference that it is capable of performing operations on three operands. Finally, during the commit stage the results of the IE phase are stored back to the register file, or to one of the memory areas. Sephirot terminates execution when it finds an exit instruction, in which case it signals to the APS the packet forwarding decision.
4.2. Pipeline optimizations
We now list a subset of notable architectural optimizations we applied to our design.

Program state self-reset. As we have seen in Section 3, eBPF programs may perform zero-ing of the variables they are going to use. We provide automatic reset of the stack and of the registers at program initialization. This is an inexpensive feature in hardware, which improves security9 and allows us to remove any such zero-ing instruction from the program.
Parallel branching. The presence of branch instructions may cause performance problems with architectures that lack branch prediction, speculative execution, and out-of-order execution. For Sephirot, this forces a serialization of the branch instructions. However, in XDP programs there are often series of branches in close sequence, especially during header parsing. We enabled the parallel execution of such branches, establishing a priority ordering of the Sephirot's lanes. That is, all the branch instructions are executed in parallel by the VLIW's lanes. If more than one branch is taken, the highest-priority one is selected to update the program counter. The compiler takes that into account when scheduling instructions, ordering the branch instructions accordingly.b

b This applies equally to a sequence of if...else or goto statements.

Early processor exit. The processor stops when an exit instruction is executed. The exit instruction is recognized during the IF phase, which allows us to stop the processor pipeline early and save the three remaining clock cycles. This optimization also improves the performance gain obtained by extending the ISA with parameterized exit instructions, as described in Section 3. In fact, XDP programs usually perform a move of a value to r0, to define the forwarding action, before calling an exit. Setting a value to a register always needs to traverse the entire Sephirot pipeline. Instead, with a parameterized exit we remove the need to assign a value to r0, since the value is embedded in a newly defined exit instruction.
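As a mental model of the datapath and of the optimizations above, here is a minimal software sketch of one Sephirot-like execution loop. It is our own illustration, not hXDP's hardware: up to four lanes execute per VLIW bundle, an exit recognized at fetch time stops processing immediately and carries the forwarding decision (as a parameterized exit would), and when several branches in the same bundle are taken, the sketch lets the lowest-numbered lane win, standing in for the article's lane-priority ordering.

#include <stdint.h>
#include <stdio.h>

#define LANES 4

enum op { OP_NOP, OP_ALU_ADD, OP_BRANCH_IF_ZERO, OP_EXIT };

struct slot {
    enum op op;
    int dst, src;      /* register indices (unused fields ignored) */
    int target;        /* branch target bundle index, or exit code */
};

struct bundle { struct slot lane[LANES]; };

static uint64_t reg[11];   /* r0..r10, zeroed before each packet (self-reset) */

/* Execute one program; returns the forwarding decision carried by the exit. */
static int run(const struct bundle *prog, int len)
{
    int pc = 0;
    while (pc < len) {
        const struct bundle *b = &prog[pc];

        /* IF: an exit anywhere in the bundle stops the pipeline early. */
        for (int l = 0; l < LANES; l++)
            if (b->lane[l].op == OP_EXIT)
                return b->lane[l].target;

        /* ID/IE: all four lanes execute "in parallel" (sequentially here). */
        int taken_lane = -1;
        for (int l = 0; l < LANES; l++) {
            const struct slot *s = &b->lane[l];
            switch (s->op) {
            case OP_ALU_ADD:
                reg[s->dst] += reg[s->src];
                break;
            case OP_BRANCH_IF_ZERO:
                /* Record only the first (highest-priority, in this sketch) taken branch. */
                if (reg[s->src] == 0 && taken_lane < 0)
                    taken_lane = l;
                break;
            default:
                break;
            }
        }

        /* Commit: the winning branch, if any, updates the program counter. */
        pc = (taken_lane >= 0) ? b->lane[taken_lane].target : pc + 1;
    }
    return 0; /* fell off the end: default decision in this toy */
}

int main(void)
{
    /* Three-bundle toy program: add, two parallel branches, then exits. */
    struct bundle prog[3] = {
        { .lane = { { OP_ALU_ADD, 2, 3, 0 },
                    { OP_BRANCH_IF_ZERO, 0, 4, 2 },   /* lane 1: higher priority */
                    { OP_BRANCH_IF_ZERO, 0, 5, 1 },
                    { OP_NOP } } },
        { .lane = { { OP_EXIT, 0, 0, 1 }, { OP_NOP }, { OP_NOP }, { OP_NOP } } },
        { .lane = { { OP_EXIT, 0, 0, 2 }, { OP_NOP }, { OP_NOP }, { OP_NOP } } },
    };
    printf("decision: %d\n", run(prog, 3));   /* r4 == 0, so lane 1 jumps to bundle 2 */
    return 0;
}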
4.3. Implementation
We prototyped hXDP using the NetFPGA,32 a board embedding 4 10Gb ports and a Xilinx Virtex7 FPGA. The hXDP implementation uses a frame size of 32B and is clocked at 156.25MHz. Both settings come from the standard configuration of the NetFPGA reference NIC design.

The hXDP FPGA IP core takes 9.91% of the FPGA logic resources, 2.09% of the register resources, and 3.4% of the
[Figure: per-program comparison with legend entries bound-checks, 6B load/store, parametrized-exit, 3-operands, and ×86 JIT (y-axis up to 250), for the tested XDP programs.]

on the effect of the hXDP design choices. Unfortunately, the NFP4000 offers only limited eBPF support, which does not allow us to run a complete evaluation. We further include a comparison of hXDP to other FPGA NIC programming solutions, before concluding the section with a brief discussion of the evaluation results.
Table 1. Tested Linux XDP example programs.

Program            Description
xdp1               parse pkt headers up to IP, and XDP_DROP
xdp2               parse pkt headers up to IP, and XDP_TX
xdp_adjust_tail    receive pkt, modify pkt into ICMP pkt and XDP_TX
router_ipv4        parse pkt headers up to IP, look up in routing table and forward (redirect)
rxq_info (drop)    increment counter and XDP_DROP
rxq_info (tx)      increment counter and XDP_TX
tx_ip_tunnel       parse pkt up to L4, encapsulate and XDP_TX
redirect_map       output pkt from a specified interface (redirect)

Table 2. Programs' number of instructions, ×86 runtime instructions-per-cycle (IPC) and hXDP static IPC mean rates.

Program            # instr.    ×86 IPC    hXDP IPC
xdp1               61          2.20       1.70
xdp2               78          2.19       1.81
xdp_adjust_tail    117         2.37       2.72
router_ipv4        119         2.38       2.38
rxq_info           81          2.81       1.76
tx_ip_tunnel       283         2.24       2.83
simple_firewall    72          2.16       2.66
Katran             268         2.32       2.62
at 800MHz. The server machine is equipped with an Intel Xeon E5-1630 v3 @3.70GHz and an Intel XL710 40GbE NIC, and runs Linux v.5.6.4 with the i40e Intel NIC drivers. During the tests, we use different CPU frequencies, that is, 1.2GHz, 2.1GHz, and 3.7GHz, to cover a larger spectrum of deployment scenarios. In fact, many deployments favor CPUs with lower frequencies and a higher number of cores.15 We use a DPDK packet generator to perform throughput and latency measurements. The packet generator is capable of generating a 40Gbps throughput with any packet size and it is connected back-to-back with the system-under-test, that is, the hXDP prototype running on the NetFPGA, the Netronome SmartNIC, or the Linux server running XDP. Delay measurements are performed using hardware packet timestamping at the traffic generator's NIC and measure the round-trip time. Unless differently stated, all the tests are performed using packets with size 64B belonging to a single network flow. This is a challenging workload for the systems under test.

Applications performance. In Section 2, we mentioned that an optimistic upper-bound for the simple firewall performance would have been 2.8Mpps. When using hXDP with all the compiler and hardware optimizations described in this paper, the same application achieves a throughput of 6.53Mpps, as shown in Figure 7. This is only 12% slower than the same application running on a powerful ×86 CPU core clocked at 3.7GHz, and 55% faster than the same CPU core clocked at 2.1GHz. In terms of latency, hXDP provides about 10× lower packet processing latency, for all packet sizes (see Figure 8). This is the case since hXDP avoids crossing the PCIe bus and has no software-related overheads. We omit latency results for the remaining applications, since they are not significantly different.
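For reference, the relative figures above imply the following absolute throughputs for the ×86 baseline; this is our own arithmetic, not numbers reported in the text:

\[
\frac{6.53\ \text{Mpps}}{1 - 0.12} \approx 7.4\ \text{Mpps at 3.7GHz},
\qquad
\frac{6.53\ \text{Mpps}}{1.55} \approx 4.2\ \text{Mpps at 2.1GHz}.
\]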
While we are unable to run the simple firewall application using the Netronome's eBPF implementation, Figure 8 shows also the forwarding latency of the Netronome NFP4000 (nfp label) when programmed with an XDP program that only performs packet forwarding. Even in this case, we can see that hXDP provides a lower forwarding latency, especially for packets of smaller sizes.

When measuring Katran we find that hXDP is instead 38% slower than the ×86 core at 3.7GHz and only 8% faster than the same core clocked at 2.1GHz. The reason for this relatively worse hXDP performance is the overall program length. Katran's program has many instructions; as such, executors with a very high clock frequency are advantaged, since they can run more instructions per second. However, notice that the clock frequencies of the CPUs deployed in, for example, Facebook's datacenters15 are close to 2.1GHz, favoring many-core deployments in place of high-frequency ones. hXDP clocked at 156MHz is still capable of outperforming a CPU core clocked at that frequency.

Linux examples. We finally measure the performance of the Linux XDP examples listed in Table 1. These applications allow us to better understand the hXDP performance with programs of different types (see Figure 9). We can identify three categories of programs. First, programs that forward packets to the NIC interfaces are faster when running on hXDP. These programs do not pass packets to the host system, and thus, they can live entirely in the NIC. For such programs, hXDP usually performs at least as good as a single ×86 core clocked at 2.1GHz. In fact, processing XDP on the host system incurs the additional PCIe transfer overhead to send the packet back to the NIC. Second, programs that always drop packets are usually faster on ×86, unless the processor has a low frequency, such as 1.2GHz. Here, it should be noted that such programs are rather uncommon, for example, programs used to gather network traffic statistics receiving packets from a network tap. Finally, programs that are long, for example, tx_ip_tunnel with its 283 instructions, are faster on ×86. As we noticed in the case of Katran, with longer programs the hXDP implementation's low clock frequency can become problematic.

5.1.1. Comparison to other FPGA solutions. hXDP provides a more flexible programming model than previous work for FPGA NIC programming. However, in some cases, simpler network functions implemented with hXDP could also be implemented using other programming approaches for FPGA NICs, while keeping functional equivalence. One such example is the simple firewall presented in this article, which is supported also by FlowBlaze.28

Throughput. Leaving aside the cost of reimplementing the function using the FlowBlaze abstraction, we can generally expect hXDP to be slower than FlowBlaze at processing packets. In fact, in the simple firewall case, FlowBlaze can forward about 60Mpps vs. 6.5Mpps of hXDP. The FlowBlaze design is clocked at 156MHz, like hXDP, and its better performance is due to its high level of specialization. FlowBlaze is optimized to process only packet headers, using statically defined functions. This requires loading a new bitstream on the FPGA when the function
[Figure 7. Throughput for real-world applications. hXDP is faster than a high-end CPU core clocked at over 2GHz. Bar chart in Mpps for the simple firewall and Katran; series: ×[email protected], ×[email protected], ×[email protected], hXDP.]

[Figure 8. Packet forwarding latency for different packet sizes. Latency in us vs. packet size in bytes (64, 256, 512, 1518); series: nfp, ×[email protected], hXDP.]
[Figure 9. Performance of the Linux XDP example programs listed in Table 1.]

6. RELATED WORK
NIC programming. AccelNet11 is a match-action offloading engine used in large cloud datacenters to offload virtual
Acknowledgments
The research leading to these results has received funding from the ECSEL Joint Undertaking in collaboration with the European Union's H2020 Framework Programme (H2020/2014–2020) and National Authorities, under grant agreement n. 876967 (Project "BRAINE").

References
1. P4-NetFPGA. https://github.com/NetFPGA/P4-NetFPGA-public/wiki.
2. Bernstein, A.J. Analysis of programs for parallel processing. IEEE Trans. Electron. Comput. EC-15, 5 (1966), 757–763.
3. Bosshart, P., Daly, D., Gibb, G., Izzard, M., McKeown, N., Rexford, J., Schlesinger, C., Talayco, D., Vahdat, A., Varghese, G., Walker, D. P4: Programming protocol-independent packet processors. SIGCOMM Comput. Commun. Rev. 44, 3 (2014), 87–95.
4. Bosshart, P., Gibb, G., Kim, H.-S., Varghese, G., McKeown, N., Izzard, M., Mujica, F., Horowitz, M. Forwarding metamorphosis: Fast programmable match-action processing in hardware for SDN. In Proceedings of the ACM SIGCOMM 2013 Conference, SIGCOMM '13, New York, NY, USA, Association for Computing Machinery, 2013, 99–110.
5. Brunella, M.S., Pontarelli, S., Bonola, M., Bianchi, G. V-PMP: A VLIW packet manipulator processor. In 2018 European Conference on Networks and Communications (EuCNC), IEEE, 2018, 1–9.
6. Caulfield, A.M., Chung, E.S., Putnam, A., Angepat, H., Fowers, J., Haselman, M., Heil, S., Humphrey, M., Kaur, P., Kim, J., Lo, D., Massengill, T., Ovtcharov, K., Papamichael, M., Woods, L., Lanka, S., Chiou, D., Burger, D. A cloud-scale acceleration architecture. In 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2016, 1–13.
7. Chen, T., Moreau, T., Jiang, Z., Zheng, L., Yan, E., Shen, H., Cowan, M., Wang, L., Hu, Y., Ceze, L., Guestrin, C., Krishnamurthy, A. TVM: An automated end-to-end optimizing compiler for deep learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), Carlsbad, CA, USENIX Association, 2018, 578–594.
8. Chiou, D. The Microsoft Catapult project. In 2017 IEEE International Symposium on Workload Characterization (IISWC), IEEE, 2017, 124–124.
9. Dumitru, M.V., Dumitrescu, D., Raiciu, C. Can we exploit buggy P4 programs? In Proceedings of the Symposium on SDN Research, SOSR '20, New York, NY, USA, Association for Computing Machinery, 2020, 62–68.
10. Facebook. Katran source code repository, 2018. https://github.com/facebookincubator/katran.
11. Firestone, D., Putnam, A., Mundkur, S., Chiou, D., Dabagh, A., Andrewartha, M., Angepat, H., Bhanu, V., Caulfield, A., Chung, E., Chandrappa, H.K., Chaturmohta, S., Humphrey, M., Lavier, J., Lam, N., Liu, F., Ovtcharov, K., Padhye, J., Popuri, G., Raindel, S., Sapre, T., Shaw, M., Silva, G., Sivakumar, M., Srivastava, N., Verma, A., Zuhair, Q., Bansal, D., Burger, D., Vaid, K., Maltz, D.A., Greenberg, A. Azure accelerated networking: SmartNICs in the public cloud. In 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18), Renton, WA, USENIX Association, 2018, 51–66.
12. FlowBlaze. Repository with FlowBlaze source code and additional material. http://axbryd.com/FlowBlaze.html.
13. Forencich, A., Snoeren, A.C., Porter, G., Papen, G. Corundum: An open-source 100-Gbps NIC. In 28th IEEE International Symposium on Field-Programmable Custom Computing Machines, 2020.
14. Gautschi, M., Schiavone, P.D., Traber, A., Loi, I., Pullini, A., Rossi, D., Flamand, E., Gürkaynak, F.K., Benini, L. Near-threshold RISC-V core with DSP extensions for scalable IoT endpoint devices. IEEE Trans. Very Large Scale Integr. VLSI Syst. 25, 10 (2017), 2700–2713.
15. Hazelwood, K., Bird, S., Brooks, D., Chintala, S., Diril, U., Dzhulgakov, D., Fawzy, M., Jia, B., Jia, Y., Kalro, A., Law, J., Lee, K., Lu, J., Noordhuis, P., Smelyanskiy, M., Xiong, L., Wang, X. Applied machine learning at Facebook: A datacenter infrastructure perspective. In High Performance Computer Architecture (HPCA), IEEE, 2018.
16. Heinz, C., Lavan, Y., Hofmann, J., Koch, A. A catalog and in-hardware evaluation of open-source drop-in compatible RISC-V softcore processors. In 2019 International Conference on ReConFigurable Computing and FPGAs (ReConFig), IEEE, 2019, 1–8.
17. Hennessy, J.L., Patterson, D.A. A new golden age for computer architecture. Commun. ACM 62, 2 (2019), 48–60.
18. Hohlfeld, O., Krude, J., Reelfs, J.H., Rüth, J., Wehrle, K. Demystifying the performance of XDP BPF. In 2019 IEEE Conference on Network Softwarization (NetSoft), IEEE, 2019, 208–212.
19. Høiland-Jørgensen, T., Brouer, J.D., Borkmann, D., Fastabend, J., Herbert, T., Ahern, D., Miller, D. The express data path: Fast programmable packet processing in the operating system kernel. In Proceedings of the 14th International Conference on Emerging Networking EXperiments and Technologies, CoNEXT '18, New York, NY, USA, Association for Computing Machinery, 2018, 54–66.
20. Intel. 5G wireless. 2020. https://www.intel.com/content/www/us/en/communications/products/programmable/applications/baseband.html.
21. Iseli, C., Sanchez, E. Spyder: A reconfigurable VLIW processor using FPGAs. In [1993] Proceedings IEEE Workshop on FPGAs for Custom Computing Machines, IEEE, 1993, 17–24.
22. Jones, A.K., Hoare, R., Kusic, D., Fazekas, J., Foster, J. An FPGA-based VLIW processor with custom hardware execution. In Proceedings of the 2005 ACM/SIGDA 13th International Symposium on Field-Programmable Gate Arrays, FPGA '05, New York, NY, USA, Association for Computing Machinery, 2005, 107–117.
23. Kaufmann, A., Peter, S., Anderson, T., Krishnamurthy, A. FlexNIC: Rethinking network DMA. In 15th Workshop on Hot Topics in Operating Systems (HotOS XV), Kartause Ittingen, Switzerland, USENIX Association, 2015.
24. Kicinski, J., Viljoen, N. eBPF hardware offload to SmartNICs: cls_bpf and XDP. Proc. Netdev 1, 2016.
25. Michel, O., Bifulco, R., Rétvári, G., Schmid, S. The programmable data plane: Abstractions, architectures, algorithms, and applications. ACM Comput. Surv. 54, 4 (2021).
26. NEC. Building an Open vRAN Ecosystem White Paper. 2020. https://www.nec.com/en/global/solutions/5g/index.html.
27. Netronome. Agilio CX 2x40GbE intelligent server adapter. https://www.netronome.com/media/redactor_files/PB_Agilio_CX_2x40GbE.pdf.
28. Pontarelli, S., Bifulco, R., Bonola, M., Cascone, C., Spaziani, M., Bruschi, V., Sanvito, D., Siracusano, G., Capone, A., Honda, M., Huici, F., Siracusano, G. FlowBlaze: Stateful packet processing in hardware. In 16th USENIX Symposium on Networked Systems Design and Implementation (NSDI 19), Boston, MA, USENIX Association, 2019, 531–548.
29. Sultana, N., Galea, S., Greaves, D., Wojcik, M., Shipton, J., Clegg, R., Mai, L., Bressana, P., Soulé, R., Mortier, R., Costa, P., Pietzuch, P., Crowcroft, J., Moore, A.W., Zilberman, N. Emu: Rapid prototyping of networking services. In 2017 USENIX Annual Technical Conference (USENIX ATC 17), Santa Clara, CA, USENIX Association, 2017, 459–471.
30. Wang, H., Soulé, R., Dang, H.T., Lee, K.S., Shrivastav, V., Foster, N., Weatherspoon, H. P4FPGA: A rapid prototyping framework for P4. In Proceedings of the Symposium on SDN Research, SOSR '17, New York, NY, USA, Association for Computing Machinery, 2017, 122–135.
31. Xilinx. 5G Wireless Solutions Powered by Xilinx. 2020. https://www.xilinx.com/applications/megatrends/5g.html.
32. Zilberman, N., Audzevich, Y., Covington, G.A., Moore, A.W. NetFPGA SUME: Toward 100 Gbps as research commodity. IEEE Micro 34, 5 (2014), 32–41.

Marco Spaziani Brunella and Giacomo Belocchi ({spaziani, belocchi}@axbryd.com), Axbryd/University of Rome Tor Vergata, Rome, Italy.
Salvatore Pontarelli ([email protected]), Axbryd/University of Rome La Sapienza, Rome, Italy.
Marco Bonola ([email protected]), Axbryd/CNIT, Rome, Italy.
Giuseppe Siracusano and Roberto Bifulco ({giuseppe.siracusano, roberto.bifulco}@neclab.eu), NEC Laboratories Europe, Heidelberg, Germany.
Giuseppe Bianchi and Luca Petrucci ({giuseppe.bianchi, luca.petrucci}@uniroma2.it), University of Rome Tor Vergata, Rome, Italy.
Aniello Cammarano ([email protected]), Axbryd, Rome, Italy.
Alessandro Palumbo ([email protected]), University of Rome Tor Vergata, Rome, Italy.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike International 4.0 License.

Watch the authors discuss this work in the exclusive Communications video. https://cacm.acm.org/videos/hxdp