
A Performance Perspective on Web Optimized Protocol Stacks: TCP+TLS+HTTP/2 vs. QUIC

Konrad Wolsing, Jan Rüth, Klaus Wehrle, Oliver Hohlfeld∗
RWTH Aachen University, Germany
{wolsing,rueth,wehrle,hohlfeld}@comsys.rwth-aachen.de
arXiv:1906.07415v1 [cs.NI] 18 Jun 2019

ABSTRACT
Existing performance comparisons of QUIC and TCP compared an optimized QUIC to an unoptimized TCP stack. By neglecting available TCP improvements inherently included in QUIC, comparisons do not shed light on the performance of current web stacks. In this paper, we show that tuning TCP parameters is not negligible and directly yields significant improvements. Nevertheless, QUIC still outperforms even our tuned variant of TCP. This performance advantage is mostly caused by QUIC's reduced-RTT design during connection establishment and, in the case of lossy networks, by its ability to circumvent head-of-line blocking.

CCS CONCEPTS
• Networks → Network measurement;

ACM Reference Format:
Konrad Wolsing, Jan Rüth, Klaus Wehrle, and Oliver Hohlfeld. 2019. A Performance Perspective on Web Optimized Protocol Stacks: TCP+TLS+HTTP/2 vs. QUIC. In ANRW '19: Applied Networking Research Workshop (ANRW '19), July 22, 2019, Montreal, QC, Canada. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3340301.3341123

∗ Is now at Brandenburg University of Technology

ANRW '19, July 22, 2019, Montreal, QC, Canada
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in ANRW '19: Applied Networking Research Workshop (ANRW '19), July 22, 2019, Montreal, QC, Canada, https://doi.org/10.1145/3340301.3341123.

1 INTRODUCTION
The advancement of Web applications and services resulted in an ongoing evolution of the Web protocol stack. Driving reasons are security and privacy or the realization of latency-sensitive Web services. Today, the typical Web stack involves using HTTP/2 over TLS over TCP, making it practically one (ossified) protocol. While parts of the protocols have been designed to account for the others, this protocol stacking still suffers from inefficiencies, e.g., head-of-line blocking. Even though protocol extensions promise higher efficiency (e.g., TLS 1.3 early-data [24] or TCP Fast Open [23]), the ossification around the initial designs challenges their deployment. QUIC [15] (as used in HTTP/3) combines the concepts of TCP, TLS, and HTTP/2, tightly coupled into a new protocol that enables utilizing cross-layer information and evolving without ossification. While it fixes some of TCP's shortcomings, like head-of-line blocking when used with HTTP, its design, in the first place, should enable evolution.

A number of studies showed that QUIC outperforms the classical TCP-based stack [2, 7, 8, 13, 17, 30], that is, by comparing QUIC to an unoptimized TCP-based stack, a limitation that we address in this paper. Current QUIC implementations were specifically designed and parameterized for the Web. In contrast, stock TCP implementations, as in the Linux kernel, are not specialized and are built to perform well on a large set of devices, networks, and workloads. However, we have shown [26] that large content providers fine-tune their TCP stacks (e.g., by tuning the initial window size) to improve content delivery. All studies known to us neglect this fact and indeed compare an out-of-the-box TCP with a highly-tuned QUIC Web stack, and show that the optimized version is superior. Furthermore, they often utilize simple Web performance metrics like page load time (PLT) to reason about the page loading speed, even though it is long known that PLT does not correlate to user-perceived speeds [3, 14, 31].

In this paper, we seek to close this gap by parameterizing TCP similar to QUIC to enable a fair comparison. This includes increasing the initial congestion window, enabling pacing, setting no slow start after idle, and tuning the kernel buffers to match QUIC's defaults. We further enable BBR instead of CUBIC as the congestion control algorithm in one scenario. We show that this previously neglected tuning of TCP impacts its performance. We find that for broadband access, QUIC's RTT-optimized connection establishment indeed increases the loading speed, but otherwise compares to TCP. If optimizations such as TLS 1.3 early-data or TCP Fast Open were deployed, QUIC and TCP would compare well. In lossy networks, QUIC clearly outperforms the current Web stack, which we mainly attribute to its ability to progress streams independently of head-of-line blocking. Our comparison is based on visual Web performance metrics that better correlate to human perception than traditionally used loading times.

To evaluate real-world websites, we extend the Mahimahi framework to utilize the Google QUIC Web stack to perform reproducible comparisons between TCP and QUIC on a large scale of settings. This work does not raise any ethical issues and makes the following contributions:
• We provide the first study that performs an eye-level comparison of TCP+TLS+HTTP/2 and QUIC.
• Our study highlights that QUIC can indeed outperform TCP in a variety of settings, but so does a tuned TCP.
• Tuning TCP closes the gap to QUIC and shows that TCP is still very competitive to QUIC.
• Our study further highlights the immense impact of the choice of congestion control, especially in lossy environments.
• We add QUIC support to Mahimahi to enable reproducible QUIC research. It replays real-world websites in a testbed subject to different protocols and network settings.
Structure. Section 2 examines related work, highlights the evaluation metrics, and introduces the Mahimahi framework. Section 3 explains our testbed, network configuration, and protocol considerations. Section 4 shows the results of the measurement. Finally, Section 5 concludes this paper.

2 RELATED WORK AND BACKGROUND
QUIC is subject to a body of studies [2, 7, 8, 13, 17, 20, 30]; most compare QUIC against some combination of TCP+TLS+HTTP/1.1 or HTTP/2. But to the best of our knowledge, all use stock TCP configurations, comparing a likely unoptimized TCP version to a QUIC version that inherently contains available TCP optimizations. Yu et al. [30] provide the only study on the impact of packet pacing for QUIC as a tuning option. However, no further comparison to TCP is made.
Generally, the related work can be divided into two categories depending on their measurement approach. One body of research [8, 17, 27] measures against websites hosted on public servers utilizing both QUIC and TCP, usually operated by Google. Thus, they do not have any access to the servers, which makes tuning the protocol impossible, and the configurations in use are unknown. The second body [2, 7, 13, 20] uses self-hosted servers, in principle allowing for tuning; however, none of them does so.
One critical difference between TCP and QUIC is their connection establishment, since QUIC by design needs fewer RTTs than traditional TCP+TLS until actual website payload can be exchanged. Cook et al. [8] already take into account that there is a difference between first and repeated connections, which each require one RTT less for both protocols. Nevertheless, QUIC still has a one-RTT advantage in both repeated as well as first connections, and again this fact is not dealt with any further.
Since today's websites consist of various resources hosted by several providers, many connections to different servers are established even for fetching a single website. Many studies consider websites with varying resources but deployed by a single server only [2, 7, 17]. To study realistic websites, the Mahimahi framework [21] was designed to replicate this multi-server nature of current websites in a testbed (see Section 2.2). Nepomuceno et al. [20] perform a study with Mahimahi but find that QUIC is outperformed by TCP, which does not coincide with our and related work. We believe this is due to the use of the Caddy QUIC server, which is known to not (yet) perform very well [19]. Also, they did not configure any bandwidth limitations.

2.1 Web Performance Metrics
We aim to evaluate the performance of the different protocol stacks on a broad set of standard Web performance metrics. Besides network characteristics like goodput or link utilization as measured in [7, 30], Page Load Time (PLT) is the most used metric. But PLT does not always match user-perceived performance [3, 14, 31]; e.g., it includes the loading performance of below-the-fold content that is not displayed and thus not reflected in end-user perception. This is why we decide to focus more closely on state-of-the-art visual metrics that are known to better correlate with human perception. These metrics are derived from video recordings of the page's above-the-fold loading process, as recommended by [5, 9]. Metrics of interest are the time of the First Visual Change (FVC), the Last Visual Change (LVC), and the time the website reaches visual completeness of a desired threshold in percent; in our case, Visual Complete 85 (VC85), which corresponds to the point in time, measured from the navigation start, when the currently rendered website's above-the-fold content matches the final website picture to 85%. Only the navigation start can be used as the start point since visual metrics are derived from video recordings only (see Section 3.2 for how we deal with DNS impacting the measurement). Lastly, we also take into account the Speed Index (SI) [11].

2.2 Website Replay with Mahimahi
Mahimahi [21] is a framework designed to replicate real-world websites with their multi-server structure in a testbed. It uses HTTP traffic recordings that are later replayed. Mahimahi preserves the multi-server nature with the help of virtualized Web servers. Mahimahi is built upon multiple shell commands that can be stacked to create a virtual network. Each shell allows for modifying a single aspect of the traversing network flow, e.g., generating loss or limiting the bandwidth. Mahimahi yields realistic conditions for performance measurements [21]. This way, it enables repeatable and controllable studies with real-world websites.
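To make the visual metrics above concrete, the following sketch approximates VC85 and the Speed Index from sampled visual-completeness values, assuming a simple step-function completeness curve. The function names and the sample curve are illustrative, not the paper's actual measurement tooling:

```python
def speed_index(samples):
    """Approximate the Speed Index: the integral of (1 - visual
    completeness) over time, for sorted (time_ms, completeness)
    samples with completeness in [0, 1]. Between samples the curve
    is held constant, mirroring frame-based video evaluation."""
    si = 0.0
    for (t0, vc), (t1, _) in zip(samples, samples[1:]):
        si += (1.0 - vc) * (t1 - t0)  # area that is still incomplete
    return si

def visual_complete(samples, threshold=0.85):
    """First time the recording is at or above `threshold`,
    e.g., VC85 for threshold=0.85 (simplified: ignores later
    drops in completeness caused by layout shifts)."""
    for t, vc in samples:
        if vc >= threshold:
            return t
    return None

# Hypothetical loading process: nothing rendered until 400 ms,
# 85% complete above-the-fold at 900 ms, fully complete at 1200 ms.
frames = [(0, 0.0), (400, 0.3), (900, 0.85), (1200, 1.0)]
```

For this hypothetical curve, `visual_complete(frames)` is the 900 ms sample and `speed_index(frames)` integrates the remaining incompleteness over the three intervals.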
3 TESTBED SETUP
We will now continue to explain how we design our testbed to perform eye-level comparisons of TCP and QUIC.

Protocol   Description
TCP        Stock TCP (Linux): IW 10, Cubic
TCP+       IW 32, Pacing, Cubic, tuned buffers, no slow start after idle
TCP+BBR    TCP+, but with BBR as congestion control
QUIC       Stock Google QUIC: IW 32, Pacing, Cubic
QUIC+BBR   QUIC, but with BBR as congestion control
Table 1: Protocol configurations used in our tests.

Figure 1: This figure depicts the download size of the replayed websites (blue, Size [MB]) and the number of unique IP addresses that need to be contacted for resources (red, IPs [#]). The replayed websites are: apache.org, pinterest.com, youtube.com, wordpress.com, researchgate.net, w3.org, google.com, ed.gov, gov.uk, msn.com, intel.com, gnu.org, spotify.com, bit.ly, imgur.com, etsy.com, imdb.com, facebook.com, reddit.com, github.com, nytimes.com, telegraph.com, phpbb.com, gravatar.com, statcounter.com, dotdash.com, nature.com, harvard.edu, sciencemag.org, opera.com, joomla.com, vtm.be, sciencedirect.com, academia.edu, demorgen.be, columbia.edu, canvas.be, wikipedia.org.

3.1 Selecting and Recording Websites
Websites. We want to choose websites that replicate a real-world picture of commonly used websites. The goal is to obtain a small set of domains diverse in size, resources, and involved servers. As there is no standard test set of such websites, we use the domain collection from [29] consisting of 40 different websites, from which we had to exclude two: one domain is a private project website and the other failed to record and replay properly. The domains originate from the Alexa [1] and Moz [18] ranking lists and were chosen in a way to obtain a good distribution of page sizes and resource counts [29], see Figure 1. The bars in red suggest the majority of our tested sites use multi-server infrastructures, highlighting the relevance of replicating it with Mahimahi.
Recording. Downloading of the websites was not performed with the tool provided by Mahimahi. Instead, we utilize Mitmproxy with a custom script that dumps the raw HTTP responses of the server to disk. According to the Google QUIC server specification [10], the transfer-encoding header must be removed if its value is chunked. The same holds for the alternate-protocol header for any value. Other than that, the recorded HTTP responses remain unchanged.
In post-processing, a few resource files needed to be downloaded additionally, since, e.g., the header of the github.com website loads a random image from a fixed collection via JavaScript.

3.2 Replaying with Mahimahi
Mahimahi. To support a state-of-the-art QUIC in Mahimahi, we include Google's QUIC server from the Chromium sources utilizing QUIC Version 43. For TLS1.3 and HTTP/2, we replace the Mahimahi default Apache server with NGINX. All NGINX servers forward the requests to a single uWSGI proxy server that provides the previously recorded HTTP responses from main memory. Similarly, the QUIC servers use their built-in feature of loading all responses from a folder into memory. Finally, we create a self-signed certificate authority (CA) and incorporate it into the Chrome browser's list of trusted CAs to circumvent any authentication errors.
Enforcing QUIC or TCP+TLS1.3+H2. We want to be sure that only QUIC or TCP is used. On the one hand, we accomplish this using Chrome flags: to enforce QUIC, we set "--enable-quic --origin-to-force-quic-on=*", and "--disable-quic" for TCP respectively. On the other hand, the QUIC and NGINX servers never run at the same time. In the TCP case, each request is performed over TLS1.3 and HTTP/2. There are no resources that get transmitted unencrypted.
Protocol Tuning. To allow for a fair comparison between TCP and QUIC, we tune the stock TCP stack of a Linux kernel to more closely match QUIC's defaults. This is done by increasing the initial window to 32 segments, enabling pacing, setting no slow start after idle, and tuning the kernel buffers. QUIC by default also uses an initial window of 32 and pacing. Since we expect the employed congestion control algorithm to significantly impact the measured performance, we incorporated one scenario each for TCP and QUIC utilizing BBR [6] instead of CUBIC [12]. An overview of the five protocol configurations is shown in Table 1.
TCP Fast Open [23] and TLS1.3 early-data [24] are two possible options to tune TCP/TLS even further. We decided against both techniques for the following reasons. TLS1.3 early-data was not supported by the Chrome browser at the time of the measurement, and as it is prone to replay attacks it requires idempotency, which further challenges its widespread use. TCP Fast Open is not widely deployed on the Internet today [16, 22]. Moreover, we always measure the website performance with a fresh browser and clean caches; thus QUIC has to perform an extra RTT for connection establishment as well and does not use 0-RTT connections.
Network Settings. For network emulation, the built-in tools from Mahimahi are used. We stack the following three network parameters from server to client with Mahimahi shells. First, a packet gets delayed in either direction, both directions adding up to the desired minimum latency. Second, the link shell implements a drop-tail buffer limiting the throughput per direction. Finally, the loss shell drops packets at random for both directions equally. The loss is configured such that the chance of two packets, e.g., a request and its response, getting transmitted successfully equals 1 − p, with p being the desired loss rate. The implemented values are shown in Table 2.

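One plausible reading of this loss configuration can be sketched numerically: if both directions drop packets independently with rate q, a request/response pair survives with probability (1 − q)², so matching a desired pair-survival probability of 1 − p gives q = 1 − √(1 − p). The snippet below is an illustration of that arithmetic under this assumption, not the authors' actual Mahimahi invocation:

```python
import math

def per_direction_loss(p):
    """Per-direction drop rate q such that a request and its response
    both survive with probability 1 - p, assuming independent losses
    in the two directions: (1 - q)^2 = 1 - p  =>  q = 1 - sqrt(1 - p)."""
    return 1.0 - math.sqrt(1.0 - p)

# Loss rates from Table 2 (DA2GC: 3.3 %, MSS: 6.0 %):
for name, p in [("DA2GC", 0.033), ("MSS", 0.060)]:
    q = per_direction_loss(p)
    print(f"{name}: pair loss p={p:.3f} -> per-direction q={q:.4f}")
```

For the MSS link, a 6 % pair loss corresponds to roughly 3 % loss per direction, noticeably lower than naively dropping 6 % of packets in each direction.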
Network   Uplink       Downlink     Delay    Loss
DSL       5 Mbps       25 Mbps      24 ms    0.0 %
LTE       2.8 Mbps     10.5 Mbps    74 ms    0.0 %
DA2GC     0.468 Mbps   0.468 Mbps   262 ms   3.3 %
MSS       1.89 Mbps    1.89 Mbps    760 ms   6.0 %
Table 2: Network configurations. Queue size is set to 200 ms except for DSL with 12 ms.

Figure 2: Boxplot of server download speeds in our testbed (31 repetitions and no bandwidth limitation), for the QUIC and NGINX servers at file sizes of 2 B, 10 KB, 1 MB, and 10 MB.

Bandwidth and delay values for DSL and LTE are taken from [4]; we assume no additional loss here. The last two networks emulate slow links measured from in-flight WLAN services [25]. Except for the DSL link with 12 ms maximal queueing delay, we assume rather bloated buffers of 200 ms. Thus, our configured delay is the minimum delay, and queuing further adds jitter up to the configured buffer size.
Validation. Before conducting measurements, we validate the implemented testbed regarding the network and protocol parameters, ensuring the correct protocol choice. We found that the Chromium browser's DNS timeout of 5 s significantly distorts a measurement when a DNS packet is lost, and thus moved the DNS server such that no traffic shaping is applied to DNS traffic. Moreover, Figure 2 shows that both server variants yield similar performance for files ≤ 1 MB. This suggests that our results are not biased by the servers' implementations. For this test, we repeated 31 downloads of a single file with the Chromium browser under static network conditions: only 10 ms minimum delay, no loss, and no bandwidth limits. The gap between the NGINX and QUIC servers emerging at a file size of 10 MB is not relevant since our website sizes are much smaller (see Figure 1). Independent resources are even smaller, the largest being 4 MB.

3.3 Performing Measurements
The actual measurements are performed inside a virtual machine equipped with 6 cores and 8 GB of memory running an Arch Linux kernel Version 4.18.16. To measure a single setting consisting of one website, network, and protocol configuration, a Mahimahi replay shell with the described network stack is used. A single setting gets measured over 31 runs to gain statistical significance and at the same time keep the number of runs/videos manageable. We utilize the Browsertime [28] framework to instrument the browser. It records videos of the loading process that we subsequently evaluate for the visual metrics. For each run, Browsertime opens up a fresh Chromium browser Version 70.0.3538.77. In total, this leads to 760 configurations (38 domains, 4 network, and 5 protocol settings). We validated that each run completed successfully by reviewing the video recordings manually.

4 QUIC VS. TCP PERFORMANCE
We evaluate the performance difference with all metrics in the different network settings (across all tested websites) by means of a performance gain. The following equation explains the calculation of the performance gain between a reference protocol, e.g., TCP, and a protocol to compare with, like QUIC. X̄ corresponds to the mean over the 31 runs:

    performance gain (QUIC vs. TCP) = (X̄_QUIC − X̄_TCP) / X̄_TCP

If not stated otherwise, numbers provided in the text are mean performance gains over all websites for SI. Besides comparing means, we also utilize an ANOVA test to tell whether there is a statistically significant difference between the distributions of the 31 runs of two protocols. If the ANOVA test for two settings yields p < 0.01 (significance level), we count the setting with the lower mean as significantly faster; otherwise, no conclusion can be drawn. The results of our measurements are depicted in Figure 3. We show the CDFs of the performance gain for the different metrics, comparing stock TCP to the other protocol stacks. LVC is left out in this figure because, in contrast to PLT, there is no relevant difference visible.
DSL and LTE. For the lossless DSL and LTE scenarios, the protocols separate into two groups, both yielding similar performance gains. TCP+ (DSL: −0.05, LTE: −0.08, vs. TCP) and TCP+BBR (DSL: −0.05, LTE: −0.09, vs. TCP) perform almost indistinguishably, but against TCP there is a noticeable improvement visible throughout all metrics. Similarly, QUIC (DSL: −0.09, LTE: −0.14, vs. TCP+) and QUIC+BBR (DSL: −0.09, LTE: −0.13, vs. TCP+BBR) perform equally but are still quite a bit faster than the two tuned TCP variants. For these two networks, the congestion control choice does not make a significant difference, which is likely due to the small queue. Stock TCP indeed lags behind all other protocols, showing that stock TCP should not be used to compare against QUIC here. QUIC decreases the average SI by 131.3 ms (DSL) and 344.9 ms (LTE) compared to TCP, and still by 87.1 ms (DSL) and 215.9 ms (LTE) against TCP+.
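The per-protocol comparison used throughout this section (mean performance gain plus a one-way ANOVA significance check over the 31 runs per setting) can be sketched in a few lines. The run values below are synthetic, and, to stay dependency-free, the sketch computes only the ANOVA F statistic; the paper's p < 0.01 decision additionally requires the F distribution's tail probability (e.g., via scipy.stats.f_oneway):

```python
from statistics import mean

def performance_gain(runs_cmp, runs_ref):
    """(mean(compared) - mean(reference)) / mean(reference);
    negative values mean the compared protocol is faster."""
    return (mean(runs_cmp) - mean(runs_ref)) / mean(runs_ref)

def anova_f(groups):
    """One-way ANOVA F statistic: between-group mean square
    divided by within-group mean square."""
    all_vals = [x for g in groups for x in g]
    gm = mean(all_vals)
    ss_between = sum(len(g) * (mean(g) - gm) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Synthetic SI values [ms] for one website: 31 runs per protocol.
tcp_plus = [1000 + 10 * (i % 5) for i in range(31)]
quic = [900 + 10 * (i % 5) for i in range(31)]
gain = performance_gain(quic, tcp_plus)  # < 0: QUIC faster here
f_stat = anova_f([tcp_plus, quic])
```

With two groups, the F statistic is then compared against the F(1, 60) distribution to decide significance at the 0.01 level.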
Figure 3: CDF of the performance gain over all websites with TCP as the reference protocol, for the metrics FVC, SI, VC85, and PLT in the DSL, LTE, MSS, and DA2GC networks. If the performance gain is < 0 (left side of a plot), then the compared protocol is faster than TCP.

In a second step, we take a look at the ANOVA test results, focussing on DSL (LTE yields equivalent results). When comparing the runs of TCP+ and QUIC in DSL with PLT as the metric, 30 of the 38 websites yield a significant improvement with QUIC. For the remaining 8 websites, none was significantly faster than TCP+. For SI, even 32 are faster and only 6 show no significant difference. Similar results can be seen when comparing QUIC+BBR with TCP+BBR this way. For TCP+ and TCP in the same scenario with PLT as the metric, 25 websites are faster with TCP+, for 12 there is no significant difference, and only 1 website was significantly slower. Again, when comparing TCP+BBR with TCP+ and similarly QUIC+BBR with QUIC for DSL and LTE throughout all metrics, we find no difference for the majority of the websites. These results line up with the results shown in Figure 3. Moreover, the steep incline of the CDFs for QUIC and TCP+ indicates that the website size or structure seems to have little influence on the achievable gain. Only looking at SI and VC85 do we see a small percentage of measurements where QUIC has a significantly higher gain.
In-Flight WLAN. For the networks MSS and DA2GC, the overall picture is quite similar, meaning QUIC as well as QUIC+BBR are usually faster than TCP+ (MSS: −0.36, DA2GC: −0.14, vs. TCP+) and TCP+BBR (MSS: −0.18, DA2GC: −0.10, vs. TCP+BBR). But there are some important differences: for the MSS link with the highest loss rate (6 %), TCP+BBR operates much better than TCP+ (−0.26 vs. TCP+). Since BBR does not use loss as a congestion signal, it increases its rate regardless of this random loss. This means that in this case, the choice of congestion control has a greater impact on the performance than the protocol choice itself. At the time of the FVC, TCP+BBR is already 2866.2 ms (avg.) quicker than TCP+, but with each later metric the gap widens, so that at PLT, TCP+BBR can keep up the pace even against QUIC and is 11395.4 ms (0.21×) quicker. This shows that TCP with BBR needs some time to catch up, which thus affects the FVC much more than the later PLT. For the QUIC protocols, the picture is similar. At first, QUIC and QUIC+BBR are similarly fast and mostly better than TCP+BBR. But as the loading process commences, QUIC+BBR outperforms QUIC slightly, e.g., a 1828.3 ms better SI (QUIC+BBR vs. QUIC). QUIC with CUBIC is nevertheless reasonably fast, still being a legitimate option to use. The shapes of the performance gain CDFs of QUIC+BBR and TCP+BBR are very similar, especially for PLT, highlighting the influence of the congestion control once again. We believe that QUIC with CUBIC is still competitive due to QUIC's ability to circumvent head-of-line blocking and its large SACK ranges. For the MSS network, QUIC reduces the SI by 8364.8 ms (avg.) compared to TCP+ and by 2091.5 ms when taking both BBR protocols into account.
The last network, DA2GC, also has a high loss rate (3.3 %) but a much lower bandwidth. This is the only scenario where we observe no significant difference for most websites among all TCP configurations, even with the ANOVA test. We also see that in a small fraction of our measurements, stock TCP outperforms QUIC and the tuned TCP variants. Nevertheless, the QUIC variants are again generally significantly faster, with a higher performance gain at the FVC (e.g., −0.14 vs. TCP+) that persists towards the PLT (e.g., −0.16 vs. TCP+). The choice of the congestion control algorithm does not seem to have a significant impact here, likely due to the low bandwidth. Only for PLT do we find QUIC with CUBIC to be slightly superior to QUIC with BBR. There is not a single website where QUIC+BBR yields a significantly faster performance. The SI decreases with QUIC by 2632.5 ms vs. TCP+ and by 1372.5 ms for the BBR variants.
Discussing Metrics. Some of the websites exhibit very poor performance regarding the visual metrics VC85 and SI. We observe this behavior especially for the DA2GC network, with performance gains of up to +1.0 compared to stock TCP (not shown, plots cropped for readability).
ANRW ’19, July 22, 2019, Montreal, QC, Canada Wolsing and Rüth, et al.

Net Website Metric [ms] [RTT]


DSL gnu.org FVC 0.5 0.020
DSL wikipedia.org FVC -8.2 -0.341
DSL gnu.org PLT 1.6 0.066
DSL wikipedia.org PLT -3.1 -0.128
LTE gnu.org FVC 0.6 0.008
LTE wikipedia.org FVC -40 -0.538
LTE gnu.org PLT -30 -0.412
LTE wikipedia.org PLT -13 -0.175
MSS gnu.org FVC -196 -0.258
MSS wikipedia.org FVC -412 -0.542
MSS gnu.org PLT -1100 -1.447
Figure 4: Screenshot during the loading process of the MSS wikipedia.org PLT -529 -0.696
nytimes.com website. Left TCP right QUIC. QUIC in DA2GC gnu.org FVC -130 -0.497
comparison delays a top banner leading to bad scores DA2GC wikipedia.org FVC -1384 -5.283
DA2GC gnu.org PLT 39 0.150
in visual metrics compared to the final website. DA2GC wikipedia.org PLT -1005 -3.834
MSS gnu.org FVC -404 -0.532
MSS wikipedia.org FVC -143 -0.189
MSS gnu.org PLT -477 -0.628
MSS wikipedia.org PLT 451 0.593
impact on some websites that their resources load in different
orders resulting in very distinct rendering sequences. Table 3: Difference between the means over the 31
Figure 4 shows such a scenario exemplary for the ny- runs of QUIC and TCP+ when subtracting one RTT.
times.com website in the DA2GC network. Here TCP reaches Values <0 denote that QUIC was faster. The lower MSS
VC85 after ∼48s whereas QUIC needs ∼124s even though the table compares QUIC+BBR and TCP+BBR.
PLT for QUIC (∼141s) is much faster than for TCP (∼170s).
For TCP the upper part of the website loads comparably early impact, we consider also BBR here. Overall in MSS with BBR,
such that the lower elements are already rendered at their fi- the difference is also below of one RTT and for wikipedia.org
nal positions. In contrast to that QUIC manages to receive the and PLT even TCP+BBR is faster. Instead with DA2GC, the
lower contents first. Later, when the upper banner completes loading, it shifts the whole website down. Therefore, VC85 fails to express this setting given the large shift. Similarly, SI is affected since it integrates visual completeness over time. Thus, which metric to use critically depends on the website, the browser's loading order, and a user's preference for how a website should load.
Protocol Design Impact. Within our testbed, any TCP configuration needs to complete two full RTTs before the actual HTTP request can be sent out to the server: TCP handshake plus TLS setup. In contrast, QUIC requires only one RTT to do so, since the first CHLO gets rejected by the server because the server certificates are unknown to the client. We are interested in whether this 1 RTT difference can explain the remaining performance gap between QUIC and TCP+. However, the complex interactions with multiple servers complicate an analysis: since these connections are interleaved, simply subtracting 1 RTT is not possible. We therefore take a look at two websites served only via a single IP (see Figure 1): wikipedia.org and gnu.org. We subtract one RTT from the FVC, as the earliest metric, and one RTT from the PLT, as the latest completing metric. Table 3 shows the results in the different network settings for TCP+ and QUIC, and additionally for MSS using the BBR variants of both.
For DSL and LTE, the corrected difference is below one RTT, and there are three cases where even TCP+ is slightly faster now. For MSS, in all cases with CUBIC as the congestion control, QUIC is faster, but only by a maximum of 1.4× RTT. Since within this network congestion control has a huge [...] outcome is clearly for QUIC for the wikipedia.org website. Table 3 shows nicely that QUIC's RTT-reducing design clearly improves the performance. TCP Fast Open and TLS 1.3 early-data would close the gap, yet especially Fast Open remains challenging to deploy. Furthermore, having no head-of-line blocking could still be a reason why QUIC is still slightly faster in the majority of the cases, especially when the networks are lossy. We expect further improvements when using 0-RTT connection establishment with QUIC.
5 CONCLUSION
Comparisons between TCP and QUIC have often been biased up until now. In this paper, we extended the Mahimahi framework to support QUIC and performed reproducible performance measurements of 38 websites under different protocol and network scenarios. We show that tuning TCP parameters has a tremendous impact on the results of performance comparisons, which cannot be neglected when comparing TCP and QUIC. Yet, in many settings, QUIC's performance is still superior, but the gap gets narrower. Moreover, we find that QUIC's higher performance is caused mostly by its superior design during connection establishment. We assume that besides the RTT-reducing design, features like no head-of-line blocking increase QUIC's performance, especially in lossy networks. In those lossy networks, we also find that the choice of the congestion control algorithm has a much larger impact than the protocol itself. In our opinion, QUIC is still the preferred protocol for the future Web since it paves the way for continuous evolution.
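The handshake accounting behind this comparison can be made concrete with a small sketch. The RTT value and the helper function below are illustrative assumptions for exposition, not measurements from the paper:

```python
# Illustrative sketch of the handshake accounting discussed above.
# RTT_MS and pre_request_delay_ms are assumptions, not measured data.

def pre_request_delay_ms(rtt_ms: float, handshake_rtts: int) -> float:
    """Delay before the first HTTP request can leave the client."""
    return rtt_ms * handshake_rtts

RTT_MS = 50.0  # assumed path round-trip time

# TCP+: 3-way handshake (1 RTT) + TLS setup (1 RTT) = 2 RTTs.
tcp_plus = pre_request_delay_ms(RTT_MS, 2)

# QUIC without a cached server config: the first CHLO is rejected,
# costing 1 RTT before the request can be sent.
quic = pre_request_delay_ms(RTT_MS, 1)

# The correction applied to FVC and PLT: subtract one RTT from the
# TCP+ metrics so a remaining gap reflects effects beyond the handshake.
corrected_gap = (tcp_plus - RTT_MS) - quic
print(tcp_plus, quic, corrected_gap)  # 100.0 50.0 0.0
```

Under these assumed values the corrected gap vanishes entirely; any residual difference observed in practice must then come from effects beyond connection establishment, such as head-of-line blocking or congestion control.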
A Performance Perspective on Web Optimized Protocol Stacks: TCP+TLS+HTTP/2 vs. QUIC. ANRW '19, July 22, 2019, Montreal, QC, Canada
ACKNOWLEDGMENTS
This work has been funded by the DFG as part of the CRC 1053 MAKI and SPP 1914 REFLEXES.
