SlideShare a Scribd company logo
12
How does a packet get through the
Network Stack?
(c) Karen Sagovac
Most read
13
Packet Processing
Link Layer
Ingress QoS
Proto Handler
IPv4
IPv6
ARP
IPX
...
Drop
The Feast!
RX Handler
Open vSwitch
Team
Bonding
Bridge
macvlan
macvtap
Packet Socket
ETH_P_ALL
tcpdump
Most read
16
TCP Fast Open
(net.ipv4.tcp_fastopen)
2nd
Req SYN
SYN+ACK
ACK+HTTP GET
Data
2x RTT
SYN+Cookie+HTTP GET
SYN+ACK+Data
2nd
Req
1x RTT
Client Server
SYN
SYN+ACK
ACK+HTTP GET
1st
Req
Data
2x RTT2x RTT
Regular
Client Server
SYN
SYN+ACK+Cookie
ACK+HTTP GET
1st
Req
Data
2x RTT
Fast Open
Most read
Kernel Networking Walkthrough
LinuxCon 2015, Seattle
Thomas Graf
Kernel & Open vSwitch Team
Noiro Networks (Cisco)
Agenda
● Getting packets from/to the NIC
● NAPI, Busy Polling, RSS, RPS, XPS, GRO, TSO
● Packet processing
● RX Handler, IP Processing, TCP Processing, TCP Fast Open
● Queuing from/to userspace
● Socket Buffers, Flow Control, TCP Small Queues
● Q&A
Touring the Network Stack
Expectation Reality
How does a packet get in and out of
the Network Stack?
Receive & Transmit Process
Ring Buffer
DMA
Parse
L2 & IP
Parse
TCP/UDP
Socket Buffer
Task /
Container
read()
Ring Buffer
Construct
IP
Construct
TCP/UDP
Local?
Socket Buffer
Forward
Route?
write()
NIC Network Stack
(Kernel Space)
Process
(User Space)
The 3 ways into the Network Stack
Ring Buffer
Network
Stack
Interrupt Driven
A
Ring Buffer
Network
Stack
NAPI based Polling poll()
B
Ring Buffer Network
Stack
Busy Polling busy_poll()
Task
C
RSS – Receive Side Scaling
● NIC distributes packets across multiple RX queues
allowing for parallel processing.
● Separate IRQ per RX queue, thus selects CPU to run
hardware interrupt handler on.
RX-queue-1
RX-queue-2
RX-queue-3
RX-queue-4
CPU 1
CPU 2
CPU 1
CPU 2
filter
RPS – Receive Packet Steering
● Software filter to select CPU # for processing
● Use it to ...
RX-queue-1
RX-queue-2
RX-queue-3
RX-queue-4
CPU 1
CPU 2
CPU 3
CPU 1
CPU 2
CPU 3
... redo queue - CPU mapping ... distribute single queue to
multiple CPUs
Hardware Offload
● RX/TX Checksumming
● Perform CPU intensive checksumming in
hardware.
● Virtual LAN filtering and tag stripping
● Strip 802.1Q header and store VLAN ID
in network packet meta data.
● Filter out unsubscribed VLANs.
● Segmentation Offload
Generic Receive Offload
(ethtool -K eth0 gro on)
Ring Buffer
Network
Stack
poll()
NAPI based GRO
MTU
GRO
Up to 64K
It's more effective to process 1x64K bytes packet
instead of 40x1500 bytes packets.
Segmentation Offload
(ethtool -K eth0 tso on)
(ethtool -K eth0 gso on)
Ring Buffer
Network
Stack
Generic Segmentation Offload (GSO)
ethtool -K eth0 gso on
MTU
TCP Segmentation Offload (TSO)
ethtool -K eth0 tso on
MTU
Up to 64K
How does a packet get through the
Network Stack?
(c) Karen Sagovac
Packet Processing
Link Layer
Ingress QoS
Proto Handler
IPv4
IPv6
ARP
IPX
...
Drop
The Feast!
RX Handler
Open vSwitch
Team
Bonding
Bridge
macvlan
macvtap
Packet Socket
ETH_P_ALL
tcpdump
IP Processing
IP
Handler Route Lookup
PREROUTING
IPv4
Construction
Route Lookup
Local Output
OUTPUT
POSTROUTINGLink Layer
FORWARD
Forwarding
L4
(TCP, ...)
Local Delivery
INPUT
User
Space
TCP Processing
IP
Socket Filter
Receive TCP
Parse TCP
Lookup Socket
Backlog
socket locked
Receive Socket Buffer
Prequeue
task exists
process context ← softirq
Task
poll()read()
TCP Fast Open
(net.ipv4.tcp_fastopen)
2nd
Req SYN
SYN+ACK
ACK+HTTP GET
Data
2x RTT
SYN+Cookie+HTTP GET
SYN+ACK+Data
2nd
Req
1x RTT
Client Server
SYN
SYN+ACK
ACK+HTTP GET
1st
Req
Data
2x RTT2x RTT
Regular
Client Server
SYN
SYN+ACK+Cookie
ACK+HTTP GET
1st
Req
Data
2x RTT
Fast Open
Memory Accounting & Flow Control
Socket Buffers & Flow Control
(net.ipv4.tcp_{r|w}mem)
ssh
TX Ring Buffer
TCP/IP
Socket Buffer
wmem
overlimit?
Block or EWOULDBLOCK
wmem += packet-size
ssh
RX Ring Buffer
TCP/IP
Socket Buffer
rmem -= packet-size
rmem
overlimit?
Reduce TCP Window
rmem += packet-size
wmem -= packet-size
write()
TCP Small Queues
(net.ipv4.tcp_limit_output_bytes)
ssh
TX Ring Buffer
Driver
TCP/IP
Socket Buffer
write()
Queuing Discipline
torrent
Socket Buffer
write()
TSQ: max 128Kb in flight per socket
Q&A
Contact:
● E-Mail: tgraf@suug.ch
● Twitter: @tgraf__

More Related Content

What's hot (20)

DevConf 2014 Kernel Networking Walkthrough
DevConf 2014   Kernel Networking WalkthroughDevConf 2014   Kernel Networking Walkthrough
DevConf 2014 Kernel Networking Walkthrough
Thomas Graf
 
OpenvSwitch Deep Dive
OpenvSwitch Deep DiveOpenvSwitch Deep Dive
OpenvSwitch Deep Dive
rajdeep
 
Understanding DPDK
Understanding DPDKUnderstanding DPDK
Understanding DPDK
Denys Haryachyy
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
Brendan Gregg
 
The TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux KernelThe TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux Kernel
Divye Kapoor
 
DPDK In Depth
DPDK In DepthDPDK In Depth
DPDK In Depth
Kernel TLV
 
Tutorial: Using GoBGP as an IXP connecting router
Tutorial: Using GoBGP as an IXP connecting routerTutorial: Using GoBGP as an IXP connecting router
Tutorial: Using GoBGP as an IXP connecting router
Shu Sugimoto
 
DPDK KNI interface
DPDK KNI interfaceDPDK KNI interface
DPDK KNI interface
Denys Haryachyy
 
eBPF - Rethinking the Linux Kernel
eBPF - Rethinking the Linux KerneleBPF - Rethinking the Linux Kernel
eBPF - Rethinking the Linux Kernel
Thomas Graf
 
Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!
Ray Jenkins
 
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
Valeriy Kravchuk
 
Overview of Distributed Virtual Router (DVR) in Openstack/Neutron
Overview of Distributed Virtual Router (DVR) in Openstack/NeutronOverview of Distributed Virtual Router (DVR) in Openstack/Neutron
Overview of Distributed Virtual Router (DVR) in Openstack/Neutron
vivekkonnect
 
Staring into the eBPF Abyss
Staring into the eBPF AbyssStaring into the eBPF Abyss
Staring into the eBPF Abyss
Sasha Goldshtein
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast Storage
Kernel TLV
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)
Brendan Gregg
 
Linux kernel tracing
Linux kernel tracingLinux kernel tracing
Linux kernel tracing
Viller Hsiao
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
Kernel TLV
 
Understanding Open vSwitch
Understanding Open vSwitch Understanding Open vSwitch
Understanding Open vSwitch
YongKi Kim
 
Open vSwitch - Stateful Connection Tracking & Stateful NAT
Open vSwitch - Stateful Connection Tracking & Stateful NATOpen vSwitch - Stateful Connection Tracking & Stateful NAT
Open vSwitch - Stateful Connection Tracking & Stateful NAT
Thomas Graf
 
DPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet ProcessingDPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet Processing
Michelle Holley
 
DevConf 2014 Kernel Networking Walkthrough
DevConf 2014   Kernel Networking WalkthroughDevConf 2014   Kernel Networking Walkthrough
DevConf 2014 Kernel Networking Walkthrough
Thomas Graf
 
OpenvSwitch Deep Dive
OpenvSwitch Deep DiveOpenvSwitch Deep Dive
OpenvSwitch Deep Dive
rajdeep
 
BPF: Tracing and more
BPF: Tracing and moreBPF: Tracing and more
BPF: Tracing and more
Brendan Gregg
 
The TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux KernelThe TCP/IP Stack in the Linux Kernel
The TCP/IP Stack in the Linux Kernel
Divye Kapoor
 
Tutorial: Using GoBGP as an IXP connecting router
Tutorial: Using GoBGP as an IXP connecting routerTutorial: Using GoBGP as an IXP connecting router
Tutorial: Using GoBGP as an IXP connecting router
Shu Sugimoto
 
eBPF - Rethinking the Linux Kernel
eBPF - Rethinking the Linux KerneleBPF - Rethinking the Linux Kernel
eBPF - Rethinking the Linux Kernel
Thomas Graf
 
Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!Understanding eBPF in a Hurry!
Understanding eBPF in a Hurry!
Ray Jenkins
 
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
Tracing MariaDB server with bpftrace - MariaDB Server Fest 2021
Valeriy Kravchuk
 
Overview of Distributed Virtual Router (DVR) in Openstack/Neutron
Overview of Distributed Virtual Router (DVR) in Openstack/NeutronOverview of Distributed Virtual Router (DVR) in Openstack/Neutron
Overview of Distributed Virtual Router (DVR) in Openstack/Neutron
vivekkonnect
 
Staring into the eBPF Abyss
Staring into the eBPF AbyssStaring into the eBPF Abyss
Staring into the eBPF Abyss
Sasha Goldshtein
 
The Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast StorageThe Linux Block Layer - Built for Fast Storage
The Linux Block Layer - Built for Fast Storage
Kernel TLV
 
BPF Internals (eBPF)
BPF Internals (eBPF)BPF Internals (eBPF)
BPF Internals (eBPF)
Brendan Gregg
 
Linux kernel tracing
Linux kernel tracingLinux kernel tracing
Linux kernel tracing
Viller Hsiao
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
Kernel TLV
 
Understanding Open vSwitch
Understanding Open vSwitch Understanding Open vSwitch
Understanding Open vSwitch
YongKi Kim
 
Open vSwitch - Stateful Connection Tracking & Stateful NAT
Open vSwitch - Stateful Connection Tracking & Stateful NATOpen vSwitch - Stateful Connection Tracking & Stateful NAT
Open vSwitch - Stateful Connection Tracking & Stateful NAT
Thomas Graf
 
DPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet ProcessingDPDK: Multi Architecture High Performance Packet Processing
DPDK: Multi Architecture High Performance Packet Processing
Michelle Holley
 

Similar to LinuxCon 2015 Linux Kernel Networking Walkthrough (20)

CCIE_RS_Quick_Review_Kit
CCIE_RS_Quick_Review_KitCCIE_RS_Quick_Review_Kit
CCIE_RS_Quick_Review_Kit
Chris S Chen
 
CCIE
CCIE CCIE
CCIE
Lahcene Berkani
 
Cisco -Ccie rs quick_review_kit
Cisco -Ccie rs quick_review_kitCisco -Ccie rs quick_review_kit
Cisco -Ccie rs quick_review_kit
Stoyan Stoyanov
 
Using Mikrotik Switch Features to Improve Your Network
Using Mikrotik Switch Features to Improve Your Network Using Mikrotik Switch Features to Improve Your Network
Using Mikrotik Switch Features to Improve Your Network
GLC Networks
 
Enterprise network design multi layer network and security.pptx
Enterprise network design multi layer network and security.pptxEnterprise network design multi layer network and security.pptx
Enterprise network design multi layer network and security.pptx
bipinbhattarai12
 
5 продвинутых технологий Cisco, которые нужно знать
5 продвинутых технологий Cisco, которые нужно знать5 продвинутых технологий Cisco, которые нужно знать
5 продвинутых технологий Cisco, которые нужно знать
SkillFactory
 
Best practices for catalyst 4500 4000, 5500-5000, and 6500-6000 series switch...
Best practices for catalyst 4500 4000, 5500-5000, and 6500-6000 series switch...Best practices for catalyst 4500 4000, 5500-5000, and 6500-6000 series switch...
Best practices for catalyst 4500 4000, 5500-5000, and 6500-6000 series switch...
abdenour boussioud
 
High performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User GroupHigh performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User Group
HungWei Chiu
 
Switching techniques in networking and uses
Switching techniques in networking and usesSwitching techniques in networking and uses
Switching techniques in networking and uses
lochanraj1
 
linux networking laboratory presentation .pptx
linux networking laboratory presentation .pptxlinux networking laboratory presentation .pptx
linux networking laboratory presentation .pptx
AnuradhaJadiya1
 
Ccna3 mod9-vtp
Ccna3 mod9-vtpCcna3 mod9-vtp
Ccna3 mod9-vtp
Milton Bengui
 
Packet Walk(s) In Kubernetes
Packet Walk(s) In KubernetesPacket Walk(s) In Kubernetes
Packet Walk(s) In Kubernetes
Don Jayakody
 
packet traveling (pre cloud)
packet traveling (pre cloud)packet traveling (pre cloud)
packet traveling (pre cloud)
iman darabi
 
Geep networking stack-linuxkernel
Geep networking stack-linuxkernelGeep networking stack-linuxkernel
Geep networking stack-linuxkernel
Kiran Divekar
 
Hp a5500
Hp a5500Hp a5500
Hp a5500
Michel Hidalgo
 
Huawei_HCNA_Routing_and_Switching.pdf
Huawei_HCNA_Routing_and_Switching.pdfHuawei_HCNA_Routing_and_Switching.pdf
Huawei_HCNA_Routing_and_Switching.pdf
PauloDiniz60
 
12 ethernet-wifi
12 ethernet-wifi12 ethernet-wifi
12 ethernet-wifi
Olivier Bonaventure
 
Mikrotik Bridge Deep Dive
Mikrotik Bridge Deep DiveMikrotik Bridge Deep Dive
Mikrotik Bridge Deep Dive
GLC Networks
 
L2/L3 für Fortgeschrittene - Helle und dunkle Magie im Linux-Netzwerkstack
L2/L3 für Fortgeschrittene - Helle und dunkle Magie im Linux-NetzwerkstackL2/L3 für Fortgeschrittene - Helle und dunkle Magie im Linux-Netzwerkstack
L2/L3 für Fortgeschrittene - Helle und dunkle Magie im Linux-Netzwerkstack
Maximilan Wilhelm
 
introduction to linux kernel tcp/ip ptocotol stack
introduction to linux kernel tcp/ip ptocotol stack introduction to linux kernel tcp/ip ptocotol stack
introduction to linux kernel tcp/ip ptocotol stack
monad bobo
 
CCIE_RS_Quick_Review_Kit
CCIE_RS_Quick_Review_KitCCIE_RS_Quick_Review_Kit
CCIE_RS_Quick_Review_Kit
Chris S Chen
 
Cisco -Ccie rs quick_review_kit
Cisco -Ccie rs quick_review_kitCisco -Ccie rs quick_review_kit
Cisco -Ccie rs quick_review_kit
Stoyan Stoyanov
 
Using Mikrotik Switch Features to Improve Your Network
Using Mikrotik Switch Features to Improve Your Network Using Mikrotik Switch Features to Improve Your Network
Using Mikrotik Switch Features to Improve Your Network
GLC Networks
 
Enterprise network design multi layer network and security.pptx
Enterprise network design multi layer network and security.pptxEnterprise network design multi layer network and security.pptx
Enterprise network design multi layer network and security.pptx
bipinbhattarai12
 
5 продвинутых технологий Cisco, которые нужно знать
5 продвинутых технологий Cisco, которые нужно знать5 продвинутых технологий Cisco, которые нужно знать
5 продвинутых технологий Cisco, которые нужно знать
SkillFactory
 
Best practices for catalyst 4500 4000, 5500-5000, and 6500-6000 series switch...
Best practices for catalyst 4500 4000, 5500-5000, and 6500-6000 series switch...Best practices for catalyst 4500 4000, 5500-5000, and 6500-6000 series switch...
Best practices for catalyst 4500 4000, 5500-5000, and 6500-6000 series switch...
abdenour boussioud
 
High performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User GroupHigh performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User Group
HungWei Chiu
 
Switching techniques in networking and uses
Switching techniques in networking and usesSwitching techniques in networking and uses
Switching techniques in networking and uses
lochanraj1
 
linux networking laboratory presentation .pptx
linux networking laboratory presentation .pptxlinux networking laboratory presentation .pptx
linux networking laboratory presentation .pptx
AnuradhaJadiya1
 
Packet Walk(s) In Kubernetes
Packet Walk(s) In KubernetesPacket Walk(s) In Kubernetes
Packet Walk(s) In Kubernetes
Don Jayakody
 
packet traveling (pre cloud)
packet traveling (pre cloud)packet traveling (pre cloud)
packet traveling (pre cloud)
iman darabi
 
Geep networking stack-linuxkernel
Geep networking stack-linuxkernelGeep networking stack-linuxkernel
Geep networking stack-linuxkernel
Kiran Divekar
 
Huawei_HCNA_Routing_and_Switching.pdf
Huawei_HCNA_Routing_and_Switching.pdfHuawei_HCNA_Routing_and_Switching.pdf
Huawei_HCNA_Routing_and_Switching.pdf
PauloDiniz60
 
Mikrotik Bridge Deep Dive
Mikrotik Bridge Deep DiveMikrotik Bridge Deep Dive
Mikrotik Bridge Deep Dive
GLC Networks
 
L2/L3 für Fortgeschrittene - Helle und dunkle Magie im Linux-Netzwerkstack
L2/L3 für Fortgeschrittene - Helle und dunkle Magie im Linux-NetzwerkstackL2/L3 für Fortgeschrittene - Helle und dunkle Magie im Linux-Netzwerkstack
L2/L3 für Fortgeschrittene - Helle und dunkle Magie im Linux-Netzwerkstack
Maximilan Wilhelm
 
introduction to linux kernel tcp/ip ptocotol stack
introduction to linux kernel tcp/ip ptocotol stack introduction to linux kernel tcp/ip ptocotol stack
introduction to linux kernel tcp/ip ptocotol stack
monad bobo
 
Ad

More from Thomas Graf (16)

BPF & Cilium - Turning Linux into a Microservices-aware Operating System
BPF  & Cilium - Turning Linux into a Microservices-aware Operating SystemBPF  & Cilium - Turning Linux into a Microservices-aware Operating System
BPF & Cilium - Turning Linux into a Microservices-aware Operating System
Thomas Graf
 
Cilium - Bringing the BPF Revolution to Kubernetes Networking and Security
Cilium - Bringing the BPF Revolution to Kubernetes Networking and SecurityCilium - Bringing the BPF Revolution to Kubernetes Networking and Security
Cilium - Bringing the BPF Revolution to Kubernetes Networking and Security
Thomas Graf
 
Accelerating Envoy and Istio with Cilium and the Linux Kernel
Accelerating Envoy and Istio with Cilium and the Linux KernelAccelerating Envoy and Istio with Cilium and the Linux Kernel
Accelerating Envoy and Istio with Cilium and the Linux Kernel
Thomas Graf
 
Cilium - API-aware Networking and Security for Containers based on BPF
Cilium - API-aware Networking and Security for Containers based on BPFCilium - API-aware Networking and Security for Containers based on BPF
Cilium - API-aware Networking and Security for Containers based on BPF
Thomas Graf
 
Cilium - Network security for microservices
Cilium - Network security for microservicesCilium - Network security for microservices
Cilium - Network security for microservices
Thomas Graf
 
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDPDockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
Thomas Graf
 
Linux Native, HTTP Aware Network Security
Linux Native, HTTP Aware Network SecurityLinux Native, HTTP Aware Network Security
Linux Native, HTTP Aware Network Security
Thomas Graf
 
BPF: Next Generation of Programmable Datapath
BPF: Next Generation of Programmable DatapathBPF: Next Generation of Programmable Datapath
BPF: Next Generation of Programmable Datapath
Thomas Graf
 
Cilium - Container Networking with BPF & XDP
Cilium - Container Networking with BPF & XDPCilium - Container Networking with BPF & XDP
Cilium - Container Networking with BPF & XDP
Thomas Graf
 
Cilium - BPF & XDP for containers
Cilium - BPF & XDP for containersCilium - BPF & XDP for containers
Cilium - BPF & XDP for containers
Thomas Graf
 
Cilium - Fast IPv6 Container Networking with BPF and XDP
Cilium - Fast IPv6 Container Networking with BPF and XDPCilium - Fast IPv6 Container Networking with BPF and XDP
Cilium - Fast IPv6 Container Networking with BPF and XDP
Thomas Graf
 
LinuxCon 2015 Stateful NAT with OVS
LinuxCon 2015 Stateful NAT with OVSLinuxCon 2015 Stateful NAT with OVS
LinuxCon 2015 Stateful NAT with OVS
Thomas Graf
 
Taking Security Groups to Ludicrous Speed with OVS (OpenStack Summit 2015)
Taking Security Groups to Ludicrous Speed with OVS (OpenStack Summit 2015)Taking Security Groups to Ludicrous Speed with OVS (OpenStack Summit 2015)
Taking Security Groups to Ludicrous Speed with OVS (OpenStack Summit 2015)
Thomas Graf
 
2015 FOSDEM - OVS Stateful Services
2015 FOSDEM - OVS Stateful Services2015 FOSDEM - OVS Stateful Services
2015 FOSDEM - OVS Stateful Services
Thomas Graf
 
The Next Generation Firewall for Red Hat Enterprise Linux 7 RC
The Next Generation Firewall for Red Hat Enterprise Linux 7 RCThe Next Generation Firewall for Red Hat Enterprise Linux 7 RC
The Next Generation Firewall for Red Hat Enterprise Linux 7 RC
Thomas Graf
 
SDN & NFV Introduction - Open Source Data Center Networking
SDN & NFV Introduction - Open Source Data Center NetworkingSDN & NFV Introduction - Open Source Data Center Networking
SDN & NFV Introduction - Open Source Data Center Networking
Thomas Graf
 
BPF & Cilium - Turning Linux into a Microservices-aware Operating System
BPF  & Cilium - Turning Linux into a Microservices-aware Operating SystemBPF  & Cilium - Turning Linux into a Microservices-aware Operating System
BPF & Cilium - Turning Linux into a Microservices-aware Operating System
Thomas Graf
 
Cilium - Bringing the BPF Revolution to Kubernetes Networking and Security
Cilium - Bringing the BPF Revolution to Kubernetes Networking and SecurityCilium - Bringing the BPF Revolution to Kubernetes Networking and Security
Cilium - Bringing the BPF Revolution to Kubernetes Networking and Security
Thomas Graf
 
Accelerating Envoy and Istio with Cilium and the Linux Kernel
Accelerating Envoy and Istio with Cilium and the Linux KernelAccelerating Envoy and Istio with Cilium and the Linux Kernel
Accelerating Envoy and Istio with Cilium and the Linux Kernel
Thomas Graf
 
Cilium - API-aware Networking and Security for Containers based on BPF
Cilium - API-aware Networking and Security for Containers based on BPFCilium - API-aware Networking and Security for Containers based on BPF
Cilium - API-aware Networking and Security for Containers based on BPF
Thomas Graf
 
Cilium - Network security for microservices
Cilium - Network security for microservicesCilium - Network security for microservices
Cilium - Network security for microservices
Thomas Graf
 
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDPDockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
DockerCon 2017 - Cilium - Network and Application Security with BPF and XDP
Thomas Graf
 
Linux Native, HTTP Aware Network Security
Linux Native, HTTP Aware Network SecurityLinux Native, HTTP Aware Network Security
Linux Native, HTTP Aware Network Security
Thomas Graf
 
BPF: Next Generation of Programmable Datapath
BPF: Next Generation of Programmable DatapathBPF: Next Generation of Programmable Datapath
BPF: Next Generation of Programmable Datapath
Thomas Graf
 
Cilium - Container Networking with BPF & XDP
Cilium - Container Networking with BPF & XDPCilium - Container Networking with BPF & XDP
Cilium - Container Networking with BPF & XDP
Thomas Graf
 
Cilium - BPF & XDP for containers
Cilium - BPF & XDP for containersCilium - BPF & XDP for containers
Cilium - BPF & XDP for containers
Thomas Graf
 
Cilium - Fast IPv6 Container Networking with BPF and XDP
Cilium - Fast IPv6 Container Networking with BPF and XDPCilium - Fast IPv6 Container Networking with BPF and XDP
Cilium - Fast IPv6 Container Networking with BPF and XDP
Thomas Graf
 
LinuxCon 2015 Stateful NAT with OVS
LinuxCon 2015 Stateful NAT with OVSLinuxCon 2015 Stateful NAT with OVS
LinuxCon 2015 Stateful NAT with OVS
Thomas Graf
 
Taking Security Groups to Ludicrous Speed with OVS (OpenStack Summit 2015)
Taking Security Groups to Ludicrous Speed with OVS (OpenStack Summit 2015)Taking Security Groups to Ludicrous Speed with OVS (OpenStack Summit 2015)
Taking Security Groups to Ludicrous Speed with OVS (OpenStack Summit 2015)
Thomas Graf
 
2015 FOSDEM - OVS Stateful Services
2015 FOSDEM - OVS Stateful Services2015 FOSDEM - OVS Stateful Services
2015 FOSDEM - OVS Stateful Services
Thomas Graf
 
The Next Generation Firewall for Red Hat Enterprise Linux 7 RC
The Next Generation Firewall for Red Hat Enterprise Linux 7 RCThe Next Generation Firewall for Red Hat Enterprise Linux 7 RC
The Next Generation Firewall for Red Hat Enterprise Linux 7 RC
Thomas Graf
 
SDN & NFV Introduction - Open Source Data Center Networking
SDN & NFV Introduction - Open Source Data Center NetworkingSDN & NFV Introduction - Open Source Data Center Networking
SDN & NFV Introduction - Open Source Data Center Networking
Thomas Graf
 
Ad

Recently uploaded (20)

European Accessibility Act & Integrated Accessibility Testing
European Accessibility Act & Integrated Accessibility TestingEuropean Accessibility Act & Integrated Accessibility Testing
European Accessibility Act & Integrated Accessibility Testing
Julia Undeutsch
 
GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...
GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...
GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...
James Anderson
 
Let’s Get Slack Certified! 🚀- Slack Community
Let’s Get Slack Certified! 🚀- Slack CommunityLet’s Get Slack Certified! 🚀- Slack Community
Let’s Get Slack Certified! 🚀- Slack Community
SanjeetMishra29
 
Gihbli AI and Geo sitution |use/misuse of Ai Technology
Gihbli AI and Geo sitution |use/misuse of Ai TechnologyGihbli AI and Geo sitution |use/misuse of Ai Technology
Gihbli AI and Geo sitution |use/misuse of Ai Technology
zainkhurram1111
 
The case for on-premises AI
The case for on-premises AIThe case for on-premises AI
The case for on-premises AI
Principled Technologies
 
Securiport - A Border Security Company
Securiport  -  A Border Security CompanySecuriport  -  A Border Security Company
Securiport - A Border Security Company
Securiport
 
Agentic AI - The New Era of Intelligence
Agentic AI - The New Era of IntelligenceAgentic AI - The New Era of Intelligence
Agentic AI - The New Era of Intelligence
Muzammil Shah
 
Cyber Security Legal Framework in Nepal.pptx
Cyber Security Legal Framework in Nepal.pptxCyber Security Legal Framework in Nepal.pptx
Cyber Security Legal Framework in Nepal.pptx
Ghimire B.R.
 
Cyber security cyber security cyber security cyber security cyber security cy...
Cyber security cyber security cyber security cyber security cyber security cy...Cyber security cyber security cyber security cyber security cyber security cy...
Cyber security cyber security cyber security cyber security cyber security cy...
pranavbodhak
 
End-to-end Assurance for SD-WAN & SASE with ThousandEyes
End-to-end Assurance for SD-WAN & SASE with ThousandEyesEnd-to-end Assurance for SD-WAN & SASE with ThousandEyes
End-to-end Assurance for SD-WAN & SASE with ThousandEyes
ThousandEyes
 
AI Emotional Actors: “When Machines Learn to Feel and Perform"
AI Emotional Actors:  “When Machines Learn to Feel and Perform"AI Emotional Actors:  “When Machines Learn to Feel and Perform"
AI Emotional Actors: “When Machines Learn to Feel and Perform"
AkashKumar809858
 
Supercharge Your AI Development with Local LLMs
Supercharge Your AI Development with Local LLMsSupercharge Your AI Development with Local LLMs
Supercharge Your AI Development with Local LLMs
Francesco Corti
 
Evaluation Challenges in Using Generative AI for Science & Technical Content
Evaluation Challenges in Using Generative AI for Science & Technical ContentEvaluation Challenges in Using Generative AI for Science & Technical Content
Evaluation Challenges in Using Generative AI for Science & Technical Content
Paul Groth
 
Co-Constructing Explanations for AI Systems using Provenance
Co-Constructing Explanations for AI Systems using ProvenanceCo-Constructing Explanations for AI Systems using Provenance
Co-Constructing Explanations for AI Systems using Provenance
Paul Groth
 
LSNIF: Locally-Subdivided Neural Intersection Function
LSNIF: Locally-Subdivided Neural Intersection FunctionLSNIF: Locally-Subdivided Neural Intersection Function
LSNIF: Locally-Subdivided Neural Intersection Function
Takahiro Harada
 
AI Trends - Mary Meeker
AI Trends - Mary MeekerAI Trends - Mary Meeker
AI Trends - Mary Meeker
Razin Mustafiz
 
Palo Alto Networks Cybersecurity Foundation
Palo Alto Networks Cybersecurity FoundationPalo Alto Networks Cybersecurity Foundation
Palo Alto Networks Cybersecurity Foundation
VICTOR MAESTRE RAMIREZ
 
Grannie’s Journey to Using Healthcare AI Experiences
Grannie’s Journey to Using Healthcare AI ExperiencesGrannie’s Journey to Using Healthcare AI Experiences
Grannie’s Journey to Using Healthcare AI Experiences
Lauren Parr
 
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...
Aaryan Kansari
 
Cognitive Chasms - A Typology of GenAI Failure Failure Modes
Cognitive Chasms - A Typology of GenAI Failure Failure ModesCognitive Chasms - A Typology of GenAI Failure Failure Modes
Cognitive Chasms - A Typology of GenAI Failure Failure Modes
Dr. Tathagat Varma
 
European Accessibility Act & Integrated Accessibility Testing
European Accessibility Act & Integrated Accessibility TestingEuropean Accessibility Act & Integrated Accessibility Testing
European Accessibility Act & Integrated Accessibility Testing
Julia Undeutsch
 
GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...
GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...
GDG Cloud Southlake #43: Tommy Todd: The Quantum Apocalypse: A Looming Threat...
James Anderson
 
Let’s Get Slack Certified! 🚀- Slack Community
Let’s Get Slack Certified! 🚀- Slack CommunityLet’s Get Slack Certified! 🚀- Slack Community
Let’s Get Slack Certified! 🚀- Slack Community
SanjeetMishra29
 
Gihbli AI and Geo sitution |use/misuse of Ai Technology
Gihbli AI and Geo sitution |use/misuse of Ai TechnologyGihbli AI and Geo sitution |use/misuse of Ai Technology
Gihbli AI and Geo sitution |use/misuse of Ai Technology
zainkhurram1111
 
Securiport - A Border Security Company
Securiport  -  A Border Security CompanySecuriport  -  A Border Security Company
Securiport - A Border Security Company
Securiport
 
Agentic AI - The New Era of Intelligence
Agentic AI - The New Era of IntelligenceAgentic AI - The New Era of Intelligence
Agentic AI - The New Era of Intelligence
Muzammil Shah
 
Cyber Security Legal Framework in Nepal.pptx
Cyber Security Legal Framework in Nepal.pptxCyber Security Legal Framework in Nepal.pptx
Cyber Security Legal Framework in Nepal.pptx
Ghimire B.R.
 
Cyber security cyber security cyber security cyber security cyber security cy...
Cyber security cyber security cyber security cyber security cyber security cy...Cyber security cyber security cyber security cyber security cyber security cy...
Cyber security cyber security cyber security cyber security cyber security cy...
pranavbodhak
 
End-to-end Assurance for SD-WAN & SASE with ThousandEyes
End-to-end Assurance for SD-WAN & SASE with ThousandEyesEnd-to-end Assurance for SD-WAN & SASE with ThousandEyes
End-to-end Assurance for SD-WAN & SASE with ThousandEyes
ThousandEyes
 
AI Emotional Actors: “When Machines Learn to Feel and Perform"
AI Emotional Actors:  “When Machines Learn to Feel and Perform"AI Emotional Actors:  “When Machines Learn to Feel and Perform"
AI Emotional Actors: “When Machines Learn to Feel and Perform"
AkashKumar809858
 
Supercharge Your AI Development with Local LLMs
Supercharge Your AI Development with Local LLMsSupercharge Your AI Development with Local LLMs
Supercharge Your AI Development with Local LLMs
Francesco Corti
 
Evaluation Challenges in Using Generative AI for Science & Technical Content
Evaluation Challenges in Using Generative AI for Science & Technical ContentEvaluation Challenges in Using Generative AI for Science & Technical Content
Evaluation Challenges in Using Generative AI for Science & Technical Content
Paul Groth
 
Co-Constructing Explanations for AI Systems using Provenance
Co-Constructing Explanations for AI Systems using ProvenanceCo-Constructing Explanations for AI Systems using Provenance
Co-Constructing Explanations for AI Systems using Provenance
Paul Groth
 
LSNIF: Locally-Subdivided Neural Intersection Function
LSNIF: Locally-Subdivided Neural Intersection FunctionLSNIF: Locally-Subdivided Neural Intersection Function
LSNIF: Locally-Subdivided Neural Intersection Function
Takahiro Harada
 
AI Trends - Mary Meeker
AI Trends - Mary MeekerAI Trends - Mary Meeker
AI Trends - Mary Meeker
Razin Mustafiz
 
Palo Alto Networks Cybersecurity Foundation
Palo Alto Networks Cybersecurity FoundationPalo Alto Networks Cybersecurity Foundation
Palo Alto Networks Cybersecurity Foundation
VICTOR MAESTRE RAMIREZ
 
Grannie’s Journey to Using Healthcare AI Experiences
Grannie’s Journey to Using Healthcare AI ExperiencesGrannie’s Journey to Using Healthcare AI Experiences
Grannie’s Journey to Using Healthcare AI Experiences
Lauren Parr
 
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...
Agentic AI Explained: The Next Frontier of Autonomous Intelligence & Generati...
Aaryan Kansari
 
Cognitive Chasms - A Typology of GenAI Failure Failure Modes
Cognitive Chasms - A Typology of GenAI Failure Failure ModesCognitive Chasms - A Typology of GenAI Failure Failure Modes
Cognitive Chasms - A Typology of GenAI Failure Failure Modes
Dr. Tathagat Varma
 

LinuxCon 2015 Linux Kernel Networking Walkthrough

  • 1. Kernel Networking Walkthrough LinuxCon 2015, Seattle Thomas Graf Kernel & Open vSwitch Team Noiro Networks (Cisco)
  • 2. Agenda ● Getting packets from/to the NIC ● NAPI, Busy Polling, RSS, RPS, XPS, GRO, TSO ● Packet processing ● RX Handler, IP Processing, TCP Processing, TCP Fast Open ● Queuing from/to userspace ● Socket Buffers, Flow Control, TCP Small Queues ● Q&A
  • 3. Touring the Network Stack Expectation Reality
  • 4. How does a packet get in and out of the Network Stack?
  • 5. Receive & Transmit Process Ring Buffer DMA Parse L2 & IP Parse TCP/UDP Socket Buffer Task / Container read() Ring Buffer Construct IP Construct TCP/UDP Local? Socket Buffer Forward Route? write() NIC Network Stack (Kernel Space) Process (User Space)
  • 6. The 3 ways into the Network Stack Ring Buffer Network Stack Interrupt Driven A Ring Buffer Network Stack NAPI based Polling poll() B Ring Buffer Network Stack Busy Polling busy_poll() Task C
  • 7. RSS – Receive Side Scaling ● NIC distributes packets across multiple RX queues allowing for parallel processing. ● Separate IRQ per RX queue, thus selects CPU to run hardware interrupt handler on. RX-queue-1 RX-queue-2 RX-queue-3 RX-queue-4 CPU 1 CPU 2 CPU 1 CPU 2 filter
  • 8. RPS – Receive Packet Steering ● Software filter to select CPU # for processing ● Use it to ... RX-queue-1 RX-queue-2 RX-queue-3 RX-queue-4 CPU 1 CPU 2 CPU 3 CPU 1 CPU 2 CPU 3 ... redo queue - CPU mapping ... distribute single queue to multiple CPUs
  • 9. Hardware Offload ● RX/TX Checksumming ● Perform CPU intensive checksumming in hardware. ● Virtual LAN filtering and tag stripping ● Strip 802.1Q header and store VLAN ID in network packet meta data. ● Filter out unsubscribed VLANs. ● Segmentation Offload
  • 10. Generic Receive Offload (ethtool -K eth0 gro on) Ring Buffer Network Stack poll() NAPI based GRO MTU GRO Up to 64K It's more effective to process 1x64K bytes packet instead of 40x1500 bytes packets.
  • 11. Segmentation Offload (ethtool -K eth0 tso on) (ethtool -K eth0 gso on) Ring Buffer Network Stack Generic Segmentation Offload (GSO) ethtool -K eth0 gso on MTU TCP Segmentation Offload (TSO) ethtool -K eth0 tso on MTU Up to 64K
  • 12. How does a packet get through the Network Stack? (c) Karen Sagovac
  • 13. Packet Processing Link Layer Ingress QoS Proto Handler IPv4 IPv6 ARP IPX ... Drop The Feast! RX Handler Open vSwitch Team Bonding Bridge macvlan macvtap Packet Socket ETH_P_ALL tcpdump
  • 14. IP Processing IP Handler Route Lookup PREROUTING IPv4 Construction Route Lookup Local Output OUTPUT POSTROUTINGLink Layer FORWARD Forwarding L4 (TCP, ...) Local Delivery INPUT User Space
  • 15. TCP Processing IP Socket Filter Receive TCP Parse TCP Lookup Socket Backlog socket locked Receive Socket Buffer Prequeue task exists process context ← softirq Task poll()read()
  • 16. TCP Fast Open (net.ipv4.tcp_fastopen) 2nd Req SYN SYN+ACK ACK+HTTP GET Data 2x RTT SYN+Cookie+HTTP GET SYN+ACK+Data 2nd Req 1x RTT Client Server SYN SYN+ACK ACK+HTTP GET 1st Req Data 2x RTT2x RTT Regular Client Server SYN SYN+ACK+Cookie ACK+HTTP GET 1st Req Data 2x RTT Fast Open
  • 17. Memory Accounting & Flow Control
  • 18. Socket Buffers & Flow Control (net.ipv4.tcp_{r|w}mem) ssh TX Ring Buffer TCP/IP Socket Buffer wmem overlimit? Block or EWOULDBLOCK wmem += packet-size ssh RX Ring Buffer TCP/IP Socket Buffer rmem -= packet-size rmem overlimit? Reduce TCP Window rmem += packet-size wmem -= packet-size write()
  • 19. TCP Small Queues (net.ipv4.tcp_limit_output_bytes) ssh TX Ring Buffer Driver TCP/IP Socket Buffer write() Queuing Discipline torrent Socket Buffer write() TSQ: max 128Kb in flight per socket