Application-Aware Acceleration for Wireless
Data Networks: Design Elements
and Prototype Implementation
Zhenyun Zhuang, Student Member, IEEE, Tae-Young Chang, Student Member, IEEE,
Raghupathy Sivakumar, Senior Member, IEEE, and Aravind Velayutham, Member, IEEE
Abstract—A tremendous amount of research has been done toward improving transport-layer performance over wireless data
networks. The improved transport layer protocols are typically application-unaware. In this paper, we argue that the behavior of
applications can and does dominate the actual performance experienced. More importantly, we show that for practical applications,
application behavior all but completely negates any improvement achievable through better transport layer protocols. In this context,
we motivate an application-aware, but application transparent, solution suite called A3 (Application-Aware Acceleration) that uses a set of design principles realized in an application-specific fashion to overcome the typical behavioral problems of applications. We demonstrate the performance of A3 through both emulations using realistic application traffic traces and an implementation using the NetFilter utility.
Index Terms—Wireless networks, application-aware acceleration.
1 INTRODUCTION
A significant amount of research has been done toward
the development of better transport layer protocols
that can alleviate the problems Transmission Control
Protocol (TCP) exhibits in wireless environments [12],
[16], [17], [26]. Such protocols, and several more, have
novel and unique design components that are indeed
important for tackling the unique characteristics of wireless
environments. However, in this paper, we ask a somewhat
orthogonal question in the very context the above protocols
were designed for: How does the application’s behavior impact
the performance deliverable to wireless users?
Toward answering this question, we explore the impact
of typical wireless characteristics on the performance
experienced by the applications for very popularly used
real-world applications including File Transfer Protocol
(FTP), the Common Internet File System (CIFS) protocol [1],
the Simple Mail Transfer Protocol (SMTP), and the
Hypertext Transfer Protocol (HTTP). Through our experiments, we arrive at a striking result: Except for FTP,
which has a simple application layer behavior, for all other
applications considered, not only is the performance
experienced when using vanilla TCP-NewReno much
worse than for FTP, but the applications see negligible or
no performance enhancements even when they are made to
use the wireless-aware protocols.
We delve deeper into the above observation and identify
several common behavioral characteristics of the applica-
tions that fundamentally limit the performance achievable
when operating over wireless data networks. Such char-
acteristics stem from the design of the applications, which is
typically tailored for operation in substantially higher
quality local area network (LAN) environments. Hence, we
pose the question: if application behavior is a major cause for
performance degradation as observed through the experiments,
what can be done to improve the end-user application performance?
In answering the above question, we present a new solution called Application-Aware Acceleration (A3, pronounced as "A-cube"), which is a middleware that offsets the typical behavioral problems of real-life applications through an effective set of principles and design elements. A3's design has five underlying design principles: transaction prediction, prioritized fetching, redundant and aggressive retransmissions, application-aware encoding, and infinite buffering. The design principles are derived explicitly with the goal of addressing the aforementioned application layer behavioral problems. We present A3 as a platform solution requiring entities at both ends of the end-to-end communication, but also describe a variation of A3, pronounced as "A-cube dot," which is a point solution but is not as effective as A3. One of the keystone aspects of the A3 design is that it is application-aware, but application transparent.
The rest of the paper is organized as follows: Section 2 presents the motivation results for A3. Section 3 presents the key design elements underlying the A3 solution. Section 4
IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 8, NO. X, XXXXXXX 2009 1
. Z. Zhuang is with the College of Computing, Georgia Institute of
Technology, 350333 Georgia Tech Station, Atlanta, GA 30332.
E-mail: zhenyun@cc.gatech.edu.
. T.-Y. Chang is with Xiocom Wireless, 3505 Koger Boulevard, Suite 400,
Duluth, GA 30096. E-mail: tchang@xiocom.com.
. R. Sivakumar is with the School of Electrical and Computer Engineering,
Georgia Institute of Technology, 5164 Centergy, 75 Fifth Street NW,
Atlanta, GA 30308. E-mail: siva@ece.gatech.edu.
. A. Velayutham is with Asankya, Inc., 75 Fifth Street NW, Atlanta, GA
30308. E-mail: vel@asankya.com.
Manuscript received 9 Feb. 2008; revised 19 Nov. 2008; accepted 10 Feb. 2009;
published online 19 Feb. 2009.
For information on obtaining reprints of this article, please send e-mail to:
tmc@computer.org, and reference IEEECS Log Number TMC-2008-02-0045.
Digital Object Identifier no. 10.1109/TMC.2009.52.
1536-1233/09/$25.00 © 2009 IEEE Published by the IEEE CS, CASS, ComSoc, IES, & SPS
describes the realization of A3 for specific applications. Section 5 evaluates A3 and presents a proof-of-concept prototype of A3 using the NetFilter utility. Section 6 discusses related work and Section 7 concludes the paper.
2 MOTIVATION
The focus of this work is entirely on applications that
require reliable and in-sequence packet delivery. In other
words, we consider only applications that are traditionally
developed with the assumption of using the TCP transport
layer protocol.
2.1 Evaluation Model
We now briefly present the setting and methodology
employed for the results presented in the rest of the section.
2.1.1 Applications
For the results presented in this section, we consider four
different applications: FTP, CIFS, SMTP, and HTTP.
. CIFS: Common Internet File System is a platform-
independent network protocol used for sharing files,
printers, and other communication abstractions
between computers. While originally developed by
Microsoft, CIFS is currently an open technology that
is used for all Windows workgroup file sharing, NT
printing, and the Linux Samba server.1
. SMTP: Simple Mail Transfer Protocol is used for the
exchange of e-mails either between mail servers or
between a client and its server. Most e-mail systems
that use the Internet for communication use SMTP.
. HTTP: Hypertext Transfer Protocol is the underlying
protocol used by the World Wide Web (WWW).
2.1.2 Traffic Generator
We use IxChariot [18] to generate accurate application-
specific traffic patterns. IxChariot is a commercial tool for
emulating most real-world applications. It consists of the
IxChariot console (for control), performance end points (for
traffic generation and reception), and IxProfile (for char-
acterizing performance).
2.1.3 Testbed
We use a combination of a real testbed and emulation to
construct the testbed for the results presented in the section.
Since IxChariot is a software tool that generates actual
application traffic, it is hosted on the sender and the receiving
machines as shown in Fig. 10. The path from the sender to the
receiver goes through a node running the Network Simulator
(NS2) [27] in emulation mode. The network emulator is
configured to represent desired topologies including the
different types of wireless technologies. More information on
the testbed is presented in Section 5.
2.1.4 Transport Protocols
Since we consider wireless LANs (WLANs), wireless WANs
(WWANs), and wireless satellite area networks (SATs), we
use transport layer protocols proposed in related literature
for each of these environments. Specifically, we use New-
Reno with Explicit Loss Notification (TCP-ELN) [12], Wide
area Wireless TCP (WTCP) [26], and Satellite Transport
Protocol (STP) [16] as enhanced transport protocols for
WLANs, WWANs, and SATs, respectively.
2.1.5 Parameters
We use average RTT values of 5, 200, and 1,000 ms, average
loss rates of 1, 8, and 3 percent, and average bandwidths of
5, 0.1, and 1 Mbps for WLANs, WWANs, and SATs,
respectively. We simulate wireless channels by introducing
various link parameters to packet-level traffic with NS2
emulation. The default Ethernet LAN MAC protocol is used. The purpose of such a simplified wireless setup is to better examine the impact of application behaviors by isolating the effect of complicated wireless MAC protocols.
We use application-perceived throughput as the key metric
of interest. Each data point is taken as an average of
10 different experimental runs.
2.2 Quantitative Analysis
Fig. 1a presents the performance results for FTP under
varying loss conditions in WLANs, WWANs, and SAT
environments. The tailored protocols uniformly show considerable performance improvements. The results illustrate that the design of enhancement protocols such as TCP-ELN, WTCP, and STP is sufficient to deliver considerable improvements in performance for wireless data networks when using FTP as the application. In the rest
of the section, we discuss the impact of using such protocols
for other applications such as CIFS, SMTP, and HTTP.
Figs. 1b, 1c, and 1d show the performance experienced
by CIFS, SMTP, and HTTP, respectively, under varying loss
conditions for the different wireless environments. It can be
observed that the performance improvements demonstrated by
the enhancement protocols for FTP do not carry over to these three
applications. It can also be observed that the maximum
performance improvement delivered by the enhancement
protocols is less than 5 percent across all scenarios.
While the trend evident from the results discussed above
is that the enhanced wireless transport protocols do not
provide performance improvements for three very popu-
larly used applications, we argue in the rest of the section
that this is not due to any fundamental limitations of the
transport protocols themselves, but due to the specifics of
the behavior of the three applications under consideration.
2.3 Impact of Application Behavior
We now explain the lack of performance improvements
when using enhanced wireless transport protocols with
applications such as CIFS, SMTP, and HTTP. We use the
conceptual application traffic pattern for the three applica-
tions in Fig. 2 for most of our reasonings.
2.3.1 Thin Session Control Messages
All three applications, as observed in Fig. 2, use thin session
control message exchanges before the actual data transfer
occurs, and thin request messages during the actual data
transfer phase as well. We use the term “thin” to refer to the
fact that such messages are almost always contained in a
single packet of maximum segment size (MSS).
1. Samba uses SMB on which CIFS is based.
The observation above has two key consequences as
follows:
. When a loss occurs to a thin message, an entire
round-trip time (RTT) is taken to recover from such a
loss. When the round-trip time is large like in
WWANs and SATs, this can result in considerably
inflating the overall transaction time for the applica-
tions. Note that a loss during the data phase will not
have such an adverse impact, as the recovery from
that loss can be multiplexed with other new data
transmissions whereas for thin message losses, no
other traffic can be sent anyway.
. Most protocols, including TCP, rely on the arrival of
out-of-order packets to infer packet losses, and hence,
trigger loss recovery. In the case of thin messages,
since there are no packets following the lost message,
the only means for loss detection is the expiry of the
retransmission timer. Retransmission timers typically
have coarse minimum values to keep overheads low.
TCP, for example, typically uses a minimum Retrans-
mission Time Out (RTO) value of one second.2
2.3.2 Block-Based Data Fetches
Another characteristic of the applications, especially CIFS
and HTTP, is that although the total amount of data to be
fetched can be large, the data transfer is performed in
blocks, with each block including a “request-response”
exchange. CIFS uses its request-data-block message to send
the block requests, with each request typically requesting
only 16 to 32 KB of data.
Such a block-based fetching of data has two implications
to performance: 1) When the size of the requested data is
smaller than the Bandwidth Delay Product (BDP), there is a
gross underutilization of the available resources. For example, when the SAT network has a BDP of 128 KB and CIFS uses a 16 KB request size, the utilization is only 12.5 percent.
Fig. 2. Application traffic patterns. (a) CIFS. (b) SMTP. (c) HTTP (single connection case).
Fig. 1. Impact of wireless environment characteristics on application throughput. (a) FTP (WLAN, WWAN, and SAT). (b) CIFS (WLAN, WWAN, and
SAT). (c) SMTP (WLAN, WWAN, and SAT). (d) HTTP (WLAN, WWAN, and SAT).
2. While newer Linux releases have lower minimum RTO values, they
still are in the order of several hundred milliseconds.
2) Independent of the size of each requested data block, one
RTT is spent in sending the next request once the current
requested data arrives. When the RTT of the path is large like
in WWANs and SATs, this can inflate the overall transaction
time, and hence, lower throughput performance.
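Both effects can be made concrete with a back-of-the-envelope sketch, using the SAT-like parameters from Section 2.1.5 (the function names and the 1 MB file size are ours, for illustration only):

```python
import math

def utilization(block_kb: float, bdp_kb: float) -> float:
    """With one request-response exchange outstanding at a time, at most
    one block is delivered per round trip, so the pipe is filled for
    only block/BDP of the time (capped at 1.0)."""
    return min(1.0, block_kb / bdp_kb)

def serial_fetch_time(file_kb: float, block_kb: float,
                      bandwidth_kbps: float, rtt_s: float) -> float:
    """Total time when blocks are fetched serially: each block costs one
    RTT for its request plus its own serialization time."""
    blocks = math.ceil(file_kb / block_kb)
    return blocks * (rtt_s + block_kb * 8 / bandwidth_kbps)

# SAT-like path: 1 Mbps, 1,000 ms RTT, BDP of 128 KB, 16 KB requests.
print(utilization(16, 128))                                # 0.125
print(round(serial_fetch_time(1024, 16, 1000, 1.0), 1))    # 1 MB, 16 KB blocks
print(round(serial_fetch_time(1024, 1024, 1000, 1.0), 1))  # one large request
```

With 16 KB blocks, a 1 MB file takes 64 request-response exchanges (about 72 seconds here), while a single large request would finish in roughly 9 seconds on the same path.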
2.3.3 Flow-Control Bottlenecked Operations
Flow control is an important function in communication
that helps in preventing the source from overwhelming the
receiver. In a mobile/wireless setting, flow control can kick
in and prove to be the bottleneck for the connection
progress due to two reasons: 1) If the application on the
mobile device reads slowly or is temporarily halted for
some other reason, the receiver buffer fills up and the
source is eventually frozen till the buffer empties. 2) When
there are losses in the network and the receiver buffer size
is of the same order as the BDP (which is typically true),
flow control can prevent new data transmissions even
when techniques such as fast recovery are used due to
unavailability of buffer space at the receiver. With fast
recovery, the sender inflates the congestion window to compensate for the new ACKs received. However, this inflation may be curbed by the flow-control mechanism if there is no buffer space on the receiver side.
2.3.4 Other Reasons
While the above discussed reasons are behavioral “acts of
commission” by the applications that result in lowered
performance, we now discuss two more reasons that can be
seen as behavioral “acts of omission.” These are techniques
that the applications could have used to address conditions
in a wireless environment, but do not.
Nonprioritization of data. For all three applications
considered, no explicit prioritization of data to be fetched is
performed, and hence, all the data to be fetched are given
equal importance. However, for certain applications prior-
itizing data in a meaningful fashion can have a profound
impact on the performance experienced by the end system
or user. For example, consider the case of HTTP used for
browsing on a small-screen PDA. When a Web page URL
request is issued, HTTP fetches all the data for the Web
page with equal importance. However, the data corre-
sponding to the visible portion of the Web page on the
PDA’s screen are obviously of more importance and will
have a higher impact on the perceived performance by the
end user. Thus, leveraging some means of prioritization
techniques can help deliver better performance to the user.
With such nonprioritization of data, HTTP performance suffers, bounded by the original data size and the low bandwidths of the wireless environment.
Nonuse of data reduction techniques. Finally, another
issue is applications not using knowledge specific to their
content or behavior to employ effective data reduction
techniques. For example, considering the SMTP application,
“email vocabulary” of users has evolved over the last
couple of decades to be very independent of traditional
“writing vocabulary” and “verbal vocabulary” of the users.
Hence, it is an interesting question as to whether SMTP can
use e-mail vocabulary-based techniques to reduce the actual
content transferred between SMTP servers, or an SMTP
server and a client. Not leveraging such aspects proves to be
of more significance in wireless environments, where the
baseline performance is poor to start with.
3 DESIGN
Since we have outlined several behavioral problems with
applications in Section 2, an obvious question to ask is:
“Why not change the applications to address these problems?”
We believe that is indeed one possible solution. Hence, we structure the presentation of the A3 solution into two distinct components: 1) the key design elements or principles that underlie A3 and 2) the actual realization of the design elements for specific applications in the form of an optimization middleware that is application-aware, but application transparent. The design elements generically present strategies to improve application behavior and can be used by application developers to improve performance by incorporating changes to the applications directly. In the rest of this section, we outline the five design principles in the A3 solution.
3.1 Transaction Prediction (TP)
TP is an approach to deterministically predict future
application data requests to the server and issue them ahead
of time. Note that this is different from techniques such as
“opportunistic pre-fetching,” where content is heuristically
fetched to speed up later access but is not guaranteed to be
used.3
In TP, A3 is fully aware of application semantics and knows exactly what data to fetch and that the data will be used. TP will aid in conditions where the BDP is larger than the default application block fetch size and where the RTT is very large. In both cases, the overall throughput will improve
when TP is used. Fig. 3a shows the throughput performance
of CIFS when fetching files of varying sizes in a 100 Mbps
3. We further discuss this issue in Section 7.
Fig. 3. Motivation for TP and RAR. (a) Throughput of FTP and CIFS. (b) Number of requests of CIFS. (c) Throughput of SMTP.
LAN network. It can be seen that the performance is
substantially lower than that of FTP and this is due to the
block-based fetching mechanism described in Section 2.
Fig. 3b shows the number of transactions it takes CIFS to
actually fetch a single file, and it can be observed that the
number of transactions increases linearly with the file size.
Under such conditions, TP will “parallelize” the transac-
tions, and hence, improve throughput performance. Good
examples of applications that will benefit from using TP
include CIFS and HTTP for reasons outlined in Section 2.
3.2 Redundant and Aggressive Retransmissions
(RARs)
RAR is an approach to protect thin session control and
data request messages better from losses. The technique
involves recognizing thin application messages, using a
combination of packet level redundancy, and aggressive
retransmissions to protect such messages. RAR will help
address both issues with thin messages identified in
Section 2. The redundant transmissions reduce the prob-
ability of message losses and the aggressive retransmis-
sions that operate on tight RTT granularity time-outs
reduce the loss recovery time. The key challenge in RAR is
to recognize thin messages in an application-aware
fashion. Note that only thin messages require RAR because
of reasons outlined in Section 2. Regular data messages
should not be subjected to RAR both because their loss
recovery can be masked in the overall transaction time by
performing the recovery simultaneously with other data
packet transmissions, and because the overheads of
performing RAR will become untenable when applied to
large volume messages such as the data. Fig. 3c shows the
throughput performance of SMTP under lossy conditions
in a WWAN setup. The 35 percent drop in throughput for a loss-rate increase from 0 to 7 percent is much steeper than the corresponding 15 percent drop for FTP. Typical applications that
can benefit from RAR include CIFS, SMTP, and HTTP.
3.3 Prioritized Fetching (PF)
PF is an approach to prioritize subsets of data to be fetched as
being more important than others and to fetch the higher
priority data faster than the lower priority data. A simple
approach to achieve the dual-rate fetching is to use default
TCP-like congestion control for the high-priority data, but
use congestion control like in TCP-LP [19] for low-priority
data. An important consideration in PF is to devise a strategy
to prioritize data intelligently and on the fly. Fig. 4a shows
the average transfer sizes per screen as well as the entire Web
page for the top 50 accessed Web pages on the World Wide
Web [2]. It can be seen that nearly 80 percent of the data
(belonging to screens 2 and higher) are not directly impacting
response time experienced by the user, and hence, can be
deprioritized in relation to the data pertaining to the first
screen. Note that the results are for a 1,024 × 768 resolution
laptop screen, and will, in fact, be better for smaller screen
devices such as PDAs. Good examples of applications that
can benefit from PF include HTTP and SMTP.
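The first-screen benefit of PF can be sketched with a toy calculation. The page sizes and link rate below are illustrative, and the non-prioritized case is a crude worst-case approximation in which the whole page precedes completion of screen 1:

```python
def first_screen_delay(screen_kb, bandwidth_kbps, prioritized):
    """Time until the first screen's bytes have arrived.

    prioritized=True: first-screen data are fetched before everything
    else, so only those bytes precede the first paint.
    prioritized=False: worst case where the entire page is interleaved
    ahead of the first screen's completion.
    """
    served_kb = screen_kb[0] if prioritized else sum(screen_kb)
    return served_kb * 8 / bandwidth_kbps

# Page where ~20 percent of the bytes belong to screen 1 (cf. Fig. 4a),
# on a WWAN-like 100 kbps link; sizes in KB per screen are illustrative.
page = [40, 60, 60, 40]
print(first_screen_delay(page, 100, True))    # 3.2 s
print(first_screen_delay(page, 100, False))   # 16.0 s
```

Deprioritizing screens 2 and higher cuts the user-perceived response time by roughly the fraction of bytes that lie below the fold.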
3.4 Infinite Buffering (IB)
IB is an approach that prevents flow control from throttling
the progression of a network connection terminating at the
mobile wireless device. IB prevents flow control from
impacting performance by providing the sender the
impression of an infinite buffer at the receiver. Secondary
storage is used to realize such an infinite buffer, with the
main rationale being that reading from the secondary storage
will be faster than fetching it from the sender over the wireless
network when there is space created in the actual connection buffer
at a later point. With typical hard disk data transfer rates
today being at around 250 Mbps [5], the above-mentioned
rationale is well justified for wireless environments. Note
that the trigger for using IB can be both due to application
reading slowly or temporarily not reading from the
connection buffer, and due to losses on the wireless path.
Figs. 4b and 4c show the throughput performance of SMTP
under both conditions. Note that the ideal scenarios
correspond to an upper bound of the throughput.4
It can
be observed that for both scenarios, the impact of flow
control drastically lowers performance compared to what is
achievable. Due to lack of space, in the rest of the paper, we
focus on IB specifically in the context of the more traditional
trigger for flow control—application reading bottleneck.
Typical applications that can benefit from IB include CIFS,
SMTP, and HTTP—essentially, any application that may
attempt to transfer more than a BDP worth of data.
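A minimal sketch of the IB idea follows, assuming in-order arrivals and a spill file on secondary storage; the class and method names are ours, not from the paper:

```python
import os
import tempfile
from collections import deque

class InfiniteBuffer:
    """Toy receive-side buffer in the spirit of IB: up to mem_limit bytes
    are held in memory, and overflow is spilled to a temporary file, so
    the window advertised to the sender never has to drop to zero.
    In-order delivery is preserved because every segment, whether in
    memory or on disk, occupies one slot in a single arrival-order queue."""

    def __init__(self, mem_limit: int):
        self.mem_limit = mem_limit
        self.mem_bytes = 0
        self.spill = tempfile.TemporaryFile()
        self.queue = deque()  # entries: bytes (in RAM) or (offset, length) on disk

    def write(self, data: bytes) -> None:
        if self.mem_bytes + len(data) <= self.mem_limit:
            self.queue.append(data)
            self.mem_bytes += len(data)
        else:                                   # buffer full: spill to disk
            off = self.spill.seek(0, os.SEEK_END)
            self.spill.write(data)
            self.queue.append((off, len(data)))

    def read(self) -> bytes:
        """Application drain, one segment at a time, in arrival order."""
        if not self.queue:
            return b""
        entry = self.queue.popleft()
        if isinstance(entry, bytes):
            self.mem_bytes -= len(entry)
            return entry
        off, length = entry
        self.spill.seek(off)
        return self.spill.read(length)

buf = InfiniteBuffer(mem_limit=8)
for seg in (b"aaaa", b"bbbb", b"cccc"):   # third segment overflows to disk
    buf.write(seg)
print(buf.read(), buf.read(), buf.read())
```

Reading a spilled segment from local disk (hundreds of Mbps) is far faster than refetching it over the wireless path, which is the rationale stated above.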
3.5 Application-Aware Encoding (AE)
AE is an approach that uses application-specific informa-
tion to better encode or compress data during commu-
nication. Traditional compression tools such as zip operate
on a given content in isolation without any context for the
ZHUANG ET AL.: APPLICATION-AWARE ACCELERATION FOR WIRELESS DATA NETWORKS: DESIGN ELEMENTS AND PROTOTYPE... 5
Fig. 4. Motivation for PF (a) and IB (b, c). (a) Transfer size per screen. (b) Impact of application reading rate. (c) Impact of loss increase.
4. In Fig. 4c, the throughput drop is caused by both flow-control and
congestion-control-related mechanisms, and the flow-control mechanism
contributes significantly.
application corresponding to the content. AE, on the other
hand, explicitly uses this contextual information to achieve
better performance. Note that AE is not a better compression algorithm; rather, it is a better way of identifying data sets that need to be operated on by a given compression algorithm. Table 1 shows the average e-mail
vocabulary characteristics of 10 different graduate students
based on 100 e-mails sent by each person during two
weeks. It is interesting to see the following characteristics
in the results: 1) the e-mail vocabulary size across the
10 people is relatively small—a few thousand words and
2) even a simple encoding involving this knowledge will
result in every word being encoded with only 10 to 12 bits, substantially fewer than the 40 to 48 bits required by standard binary encoding. In Section 5, we
show that such vocabulary-based encoding can consider-
ably outperform other standard compression tools such as
zip as well. Moreover, further benefits can be attained if
more sophisticated compression schemes such as Huffman
encoding are employed instead of a simple binary
encoding. Typical applications that can benefit from using
AE include SMTP and HTTP.
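A minimal sketch of vocabulary-based encoding, assuming both ends share the user's e-mail vocabulary (the names are ours, and a real codec would also need an escape path for out-of-vocabulary words, omitted here):

```python
import math

def code_bits(vocab_size: int) -> int:
    """Bits needed for a fixed-length index into a shared vocabulary."""
    return math.ceil(math.log2(vocab_size))

def encode(message: str, vocab: list) -> list:
    """Send word indices instead of the words themselves; the receiver
    holds the same vocabulary and inverts the mapping."""
    index = {w: i for i, w in enumerate(vocab)}
    return [index[w] for w in message.split()]

# A vocabulary of a few thousand words (cf. Table 1) needs 10-12 bit
# codes, versus 40-48 bits for a typical 5-6 character word in 8-bit text.
assert code_bits(1000) == 10 and code_bits(4000) == 12

vocab = ["meeting", "tomorrow", "the", "at", "noon"]
print(encode("meeting tomorrow at noon", vocab))   # [0, 1, 3, 4]
```

Replacing fixed-length indices with Huffman codes over the same vocabulary would shrink frequent words further, per the observation above.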
4 SOLUTION
4.1 Deployment Model and Architecture
The A3 deployment model is shown in Fig. 5. Since A3 is a platform solution, it requires two entities, one at either end of the communication session, that are A3-aware. At the mobile device, A3 is a software module installed in user space. At the server side, while A3 can be deployed as a software module on all servers, a more elegant solution would be to deploy a packet-processing network appliance that processes all content flowing from the servers to the wide area network. We assume the latter model for our discussions. However, note that A3 can be deployed in either fashion, as it is purely a software solution.
This deployment model will help in any communication between a server behind the A3 server and the mobile device running the A3 module. However, if the mobile device communicates with a non-A3-enabled server, two options exist: 1) as we discuss later in the paper, A3 can be used as a point solution with lesser effectiveness, or 2) the A3 server can be brought closer to the mobile device, perhaps within the wireless network provider's access network. In the rest of the paper, we do not delve into the latter option. However, we do revisit the point-solution mode of operation of A3.
We present an A3 implementation that resides in user space and uses the NetFilter utility in Linux for capturing outgoing and incoming traffic at the mobile device. NetFilter is a Linux-specific packet-filtering framework that provides hooks at multiple points in the Linux kernel. The A3 hooks are registered at the Local-In and Local-Out stages of the NetFilter hook chain. While our discussions are Linux-centric, they can be mapped onto the Windows operating system through the use of the Windows Packet Filtering interface, or wrappers such as PktFilter that are built around the interface. Fig. 6a shows the A3 deployment on the mobile device using NetFilter.
The A3 software architecture is shown in Fig. 6b. Since the design elements in A3 are to a large extent independent of each other, a simple chaining of the elements in an appropriate fashion results in an integrated A3 architecture. The specific order in which the elements are chained in the A3 realization is TP, RAR, PF, IB, and AE. While RAR
TABLE 1
Statistics of 100 E-Mails Sent by 10 Users
Fig. 5. Deployment model.
Fig. 6. A3 deployment model with NetFilter and software architecture. (a) Deployment with NetFilter. (b) Software architecture.
protects the initial session control exchanges and the data
requests, it operates on traffic after TP, given that TP can
generate new requests for data. PF manipulates the priority
with which different requests are served and IB ensures
that data responses are not throttled by flow control.
Finally, AE compresses any data outgoing and decom-
presses any data incoming.
4.2 Application Overviews
Since we describe the actual operations of the mechanisms in A3 in the context of one of the three applications, we now briefly comment on the specific message types involved in typical transactions by those applications. We then refer to the specific message types when describing the operations of A3 subsequently.
Due to lack of space, instead of presenting all message
types again, we refer readers back to Fig. 2 to observe the
message exchanges for the three applications. The labels such as CIFS-x refer to particular message types in CIFS and will be referred to in the A3 realization descriptions that follow.
CIFS, also sometimes known as Server Message Block
(SMB), is a platform-independent protocol for file sharing.
The typical message exchanges in a CIFS session are as
shown in Fig. 2a. Overall, TP manipulates the CIFS-11
message, RAR operates on CIFS-1 through CIFS-11, and IB
aids in CIFS-12.
SMTP is the Internet's standard host-to-host mail transport
protocol and traditionally operates over TCP. The typical
message exchanges in an SMTP session are shown in Fig. 2b.
Overall, RAR operates on SMTP-1 through SMTP-8 and SMTP-12 through SMTP-14, while IB and AE operate on SMTP-9 and SMTP-10.
The HTTP message exchange standard is relatively
simple, and typically consists of the messages shown in
Fig. 2c. A typical HTTP session consists of multiple objects
as well as the main HTML file, and hence, appears as a
sequence of overlapping exchanges of the above format.
Overall, RAR operates on HTTP-1,2,3,5 and PF operates on
HTTP-3,5.
4.3 A3 Realization
In the rest of the section, we take one design element at a
time and walk through the algorithmic details of the
element with respect to a single application. Note that A3 is an application-aware solution, and hence, its operations will be application specific. Since we describe each element in isolation, we assume that the element resides between the application and the network. In an actual usage of A3, the elements will have to be chained as discussed earlier.
4.3.1 Transaction Prediction
Fig. 7a shows the flowchart for the implementation of TP for CIFS at the A3 client. When A3 receives a message from the application, it checks to see if the message is CIFS-9 and, if so, records state for the file transfer in its File-TP-States data structure. It then passes the message through. If the message was a request, TP checks to see if the request is for a locally cached block or for a new block. If the latter, it updates the request for more blocks, stores information about the predicted requests generated in the Predicted-Request-States data structure, and forwards the requests.
In the reverse direction, when data come in from the network, TP checks to see if the data are for a predicted request. If yes, it caches the data in secondary storage and updates its state information; otherwise, it forwards the data to the application.
The number of additional blocks to request is an
interesting design decision. For file transfer scenarios, TP
generates requests asking for the entire file.5
The file size
information can be retrieved from the CIFS-10 message. If
the incoming message is for an earlier retrieved block, TP
retrieves the block from secondary storage and provides it
to the application.
While CIFS servers accept multiple data requests from the
same client simultaneously, it is possible that for some
applications, the server might not be willing to accept multiple data requests simultaneously. In such an event, the A3 server will let only one of the client requests go through to the server at any point in time, and will send the other requests one at a time once the previous requests are served.
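The flowchart logic above can be sketched as follows, assuming a callback-style interface (the class, method, and parameter names are ours, not the paper's; the total block count stands in for the file size learned from the CIFS-10 message):

```python
class TransactionPredictor:
    """Sketch of TP for a block-fetch protocol such as CIFS. On the first
    block request for a file, requests for all remaining blocks are
    issued immediately; data arriving for predicted requests are cached
    and later application requests for them are served locally."""

    def __init__(self, send):
        self.send = send        # forwards a (file, block) request upstream
        self.cache = {}         # (file, block) -> data awaiting the app
        self.predicted = set()  # requests issued ahead of the application

    def on_app_request(self, file, block, total_blocks):
        if (file, block) in self.cache:
            return self.cache.pop((file, block))   # served locally, no RTT
        self.send(file, block)
        for b in range(block + 1, total_blocks):   # predict the rest
            if (file, b) not in self.predicted:
                self.predicted.add((file, b))
                self.send(file, b)
        return None                                # data arrives later

    def on_data(self, file, block, data):
        if (file, block) in self.predicted:
            self.cache[(file, block)] = data       # hold for a future request
            return None
        return data                                # deliver to the app
```

In this sketch, the application's first request triggers requests for every remaining block, so subsequent requests complete without spending a round trip; a full implementation would cache in secondary storage rather than memory, as described above.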
4.3.2 Redundant and Aggressive Retransmissions
Fig. 7b shows the flowchart for the implementation of RAR for CIFS. When A3 receives a message from the application, it checks to see if it is a thin message. A3 performs the check by seeing whether the message is one of the messages between CIFS-1 and CIFS-11; all such messages are interpreted as thin messages.
ZHUANG ET AL.: APPLICATION-AWARE ACCELERATION FOR WIRELESS DATA NETWORKS: DESIGN ELEMENTS AND PROTOTYPE... 7
Fig. 7. TP and RAR (shaded blocks are storage space and timer, white blocks are operations). (a) Transaction prediction. (b) Redundant and
aggressive retransmissions.
5. We discuss this issue further in Section 7.
If the incoming message is not a thin one, A3 lets it through as-is. Otherwise, A3 creates redundant copies of the message, notes the current time, starts a retransmission alarm, and sends out the copies in a staggered fashion. When a response arrives, A3 checks the time stamp of the corresponding request and updates its estimated RTT. A3 then passes the message on to the application. If the alarm expires for a particular thin message, the message is again subjected to redundant transmissions. The A3 server is responsible for filtering out redundant copies of the same message that arrive successfully.
The key issues of interest in the RAR implementation are: 1) How many redundant transmissions are performed? Since packet loss rates in wireless data networks rarely exceed 10 percent, even a redundancy factor of two (two additional copies created) reduces the effective loss rate to about 0.1 percent. Hence, A3 uses a redundancy factor of two. 2) How should the redundant messages be staggered? The answer to this question lies in the specific channel characteristics experienced by the mobile device. At the same time, however, the staggering delay should not exceed the round-trip time of the connection, as otherwise the mechanism would lose its significance by unnecessarily delaying the recovery of losses. Hence, A3 uses a staggering delay of RTT/10 between any two copies of the same message. This ensures that all copies are sent out at the mobile device within 20 percent of the RTT duration.
3) How is the aggressive time-out value determined? Note that while the aggressive time-out mechanism helps when all copies of a message are lost, the total message overhead of such aggressive loss recovery is negligible compared to the overall size of data transferred by the application. Hence, A3 uses a time-out value of RTTavg + ε, where ε is a small guard constant and RTTavg is the average RTT observed so far. This simple setting ensures that the time-out values are tight and, at the same time, that the mechanism adapts to changes in network characteristics.
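The three RAR parameters above reduce to a few lines of arithmetic. The sketch below assumes independent packet losses and an illustrative guard constant ε; it is not the A3 code itself.

```python
def effective_loss_rate(loss_rate, redundancy=2):
    """A message is lost only if the original and all redundant copies are
    lost; with independent losses this is loss_rate ** (redundancy + 1).
    At 10 percent loss and redundancy two: 0.1 ** 3 = 0.1 percent."""
    return loss_rate ** (redundancy + 1)


def stagger_schedule(rtt, redundancy=2):
    """Send times for all copies, spaced RTT/10 apart, so every copy leaves
    the mobile device within 20 percent of the RTT."""
    return [i * rtt / 10 for i in range(redundancy + 1)]


def aggressive_timeout(rtt_avg, epsilon=0.05):
    """Aggressive time-out: the average observed RTT plus a small guard
    constant epsilon (the value here is an assumption)."""
    return rtt_avg + epsilon
```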
4.3.3 Prioritized Fetching
Fig. 8a shows the flowchart for the implementation of PF in the context of HTTP. Once again, the key goal of PF for HTTP is to quickly retrieve the Web objects required for the display of the visible portion of the Web page, at the expense of the objects on the page that are not visible. Unlike the other mechanisms, PF cannot be implemented without some additional interaction with the application itself. Fortunately, browser applications have well-defined interfaces for querying the state of the browser, including the current window focus, scrolling information, etc. Hence, the implementation of PF relies on a separate module called the application state monitor (ASM), akin to a browser plug-in, to coordinate its operations.
When a message comes in from the application, PF checks to see if the message is a request. If it is not, it is let through. Otherwise, PF checks with the ASM to see whether the requested contents are immediately required. The ASM classifies requested objects as being of immediate need (i.e., in the visible portion of the Web page) or as not immediately required. PF then sends out fetch requests immediately for the first category of objects and uses a low-priority fetching mechanism for the remaining objects.
Since A3 is a platform solution, all PFs have to inform the A3 server that certain objects are of low priority through A3-specific piggybacked information. The A3 server then deprioritizes the transmission of those objects in favor of those that are of higher priority. Note that the relative prioritization is used not only between the content of a single end device, but also across end devices, to improve overall system performance. Approaches such as TCP-LP [19] are candidates for the relative prioritization between TCP flows, although A3 currently uses a simple priority queuing scheme within the same TCP flow at the A3 server.
Note that while the ASM might classify objects in a particular fashion, changes in the application state (e.g., scrolling down) will result in a corresponding reprioritization of the objects. Hence, the ASM has the capability of gratuitously informing PF about priority changes. Such changes are immediately communicated to the A3 server through appropriate requests.
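A minimal sketch of the PF request path follows, with the ASM represented as a visibility callback. The queue discipline and method names are illustrative assumptions.

```python
from collections import deque


class PrioritizedFetcher:
    """Sketch of PF: the ASM callback decides whether a requested object is
    in the visible portion of the page; visible objects are fetched at once,
    the rest join a low-priority queue. `asm_is_visible` stands in for the
    browser-side application state monitor."""

    def __init__(self, asm_is_visible):
        self.asm_is_visible = asm_is_visible
        self.low_priority = deque()
        self.fetched = []

    def on_request(self, obj):
        if self.asm_is_visible(obj):
            self.fetched.append(obj)        # fetch immediately
        else:
            self.low_priority.append(obj)   # deferred, deprioritized at server

    def reprioritize(self, obj):
        """Called when the ASM reports a priority change (e.g. scrolling)."""
        if obj in self.low_priority:
            self.low_priority.remove(obj)
            self.fetched.append(obj)

    def drain_low_priority(self):
        """Slowly serviced background queue, drained here in one step."""
        while self.low_priority:
            self.fetched.append(self.low_priority.popleft())
```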
4.3.4 Infinite Buffering
Fig. 8b shows the flowchart for the implementation of IB in the context of SMTP. IB keeps track of the TCP connection status and monitors all ACKs sent out by the TCP connection serving the SMTP application for SMTP-9 and SMTP-10. If the advertised window in an ACK is less than the maximum possible, IB immediately resets the advertised window to the maximum value and appropriately updates its current knowledge of the connection's buffer occupancy and maximum in-sequence ACK information. Hence, IB prevents anything less than the maximum buffer size from being advertised.

8 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 8, NO. X, XXXXXXX 2009
Fig. 8. PF and IB (shaded blocks are storage space and timer, white blocks are operations). (a) Prioritized fetching. (b) Infinite buffering.

When data packets arrive from the network, IB receives the packets and checks whether the connection buffer can accommodate more packets. If so, IB delivers the packets to the application directly. If the disk cache is nonempty, which means that the connection buffer is full, the incoming packet is added directly to the cache; in this case, IB generates a proxy ACK back to the server. Then, whenever the connection buffer has space, packets are retrieved from the disk cache and given to the application until the buffer becomes full again. When the connection sends an ACK for a packet already ACKed by IB, IB suppresses the ACK. When the connection state is torn down for the SMTP application, IB resets its state accordingly.
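The IB data path above can be modeled with a small state machine. The window value, buffer capacity, and return conventions are illustrative assumptions, not the actual A3 implementation.

```python
class InfiniteBuffer:
    """Sketch of IB: always advertise the maximum window regardless of the
    real receive-buffer occupancy, spill overflow into a disk cache, and
    proxy-ACK cached packets back to the sender."""

    MAX_WINDOW = 65535  # illustrative maximum advertised window

    def __init__(self, app_buffer_capacity):
        self.capacity = app_buffer_capacity
        self.app_buffer = []
        self.disk_cache = []

    def rewrite_ack_window(self, advertised):
        # IB prevents anything less than the maximum from being advertised.
        return max(advertised, self.MAX_WINDOW)

    def on_data(self, pkt):
        if not self.disk_cache and len(self.app_buffer) < self.capacity:
            self.app_buffer.append(pkt)   # deliver directly
            return None
        self.disk_cache.append(pkt)       # buffer full: spill to disk cache
        return ("proxy_ack", pkt)         # proxy ACK back to the server

    def on_app_read(self):
        """The application consumed a packet; refill from the disk cache."""
        self.app_buffer.pop(0)
        if self.disk_cache and len(self.app_buffer) < self.capacity:
            self.app_buffer.append(self.disk_cache.pop(0))
```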
4.3.5 Application-Aware Encoding
Fig. 9 shows the flowchart for the implementation of AE for SMTP. When AE receives data (SMTP-9) from the SMTP application, it uses its application vocabulary table to compress the data, marks the message as compressed, and forwards it to the network. The marking informs the A3 server of the need to perform decompression. Similarly, when incoming data arrive for the SMTP server and the data are marked as compressed, AE performs the necessary decompression.
The mechanisms used for the actual creation and manipulation of the vocabulary tables are of importance to AE. In A3, the SMTP vocabulary tables are created and maintained purely on a user-pairwise basis. Not only are the tables created in this fashion, but the data sets over which the vocabulary tables are created are also restricted to this pairwise model. In other words, if A is the sender and B is the receiver, A uses its earlier e-mails to B as the data set on which the A-B vocabulary table is created, and then uses this table for encoding. B, already having the data set (since the e-mails were sent to B), can exactly recreate the table on its side, and hence decode any compressed data. This essentially precludes the need for exchanging tables frequently and also takes advantage of changes in vocabulary that might occur based on the recipient. Though the tables are created on both sides implicitly and are synchronized in most cases, a backup mechanism to explicitly synchronize the tables is also needed. The synchronization action is triggered by a mismatch of the table hashes on the two sides; the hash is sent along with each new e-mail and updated whenever the table changes.
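The pairwise table construction and hash-based synchronization check can be sketched as below. The code assignment and hashing scheme are illustrative assumptions, not the actual A3 vocabulary format.

```python
import hashlib


class VocabularyTable:
    """Sketch of AE's pairwise vocabulary table: both sides build the table
    from the same e-mail history, so a hash over the table doubles as a
    cheap synchronization check."""

    def __init__(self):
        self.word_to_code = {}

    def build(self, past_emails):
        """Deterministic construction from the shared e-mail history, so the
        sender and receiver derive identical tables independently."""
        words = sorted({w for mail in past_emails for w in mail.split()})
        self.word_to_code = {w: i for i, w in enumerate(words)}

    def table_hash(self):
        digest = hashlib.sha256()
        for word, code in sorted(self.word_to_code.items()):
            digest.update(f"{word}:{code};".encode())
        return digest.hexdigest()

    def encode(self, text):
        # Known words become codes; unknown words pass through unchanged.
        return [self.word_to_code.get(w, w) for w in text.split()]
```

Because both endpoints build the table from the same data set, the hashes match without any explicit table exchange; a mismatch would trigger the backup synchronization described above.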
4.4 A3 Point Solution: A3*
While the A3 deployment model assumed so far is a platform model requiring participation by A3-enabled devices at both the client and server ends, in this section we describe how A3 can be used as a point solution, albeit with somewhat limited capabilities. We refer to the point-solution version of A3 as A3*.
Of the five design elements in A3, the only design element for which the platform model is mandatory is the application-aware encoding mechanism. Since compression or encoding is an end-to-end process, A3* cannot be used with AE. However, each of the other four principles can be employed in A3* with minimal changes.
TP involves the generation of predictive data requests and hence can be performed in A3* as long as the application server can accept multiple simultaneous requests; for CIFS and HTTP, the servers do accept simultaneous requests. IB is purely a flow-control avoidance mechanism and can be realized in A3*. RAR involves redundant transmissions of messages, and hence can be implemented in A3* as long as the application servers are capable of filtering duplicate messages. If the application servers are not capable of doing so (e.g., HTTP servers, which would respond to each request), the redundant transmissions have to be performed at the granularity of transport layer segments as opposed to application layer messages, since protocols such as TCP provide redundant packet filtering. Finally, PF can be accomplished in A3* in terms of classifying requests and treating the requests differently. However, the slow fetching of data not required immediately has to be realized through coarser receiver-based mechanisms such as delayed requests, as opposed to the best possible strategy of slowing down responses as in A3.
5 EVALUATION
In this section, we evaluate the performance of A3. The evaluation is performed with application-specific traffic generators, which are modeled on traffic traces generated by the IxChariot emulator and on documented standards for the application protocols. Since each application protocol in the study has multiple software implementations, and different implementations may differ in certain aspects of the protocol standards, we believe that such simulations with abstracted traffic generators can help capture the trend of the performance enhancement delivered by A3.
In addition to the emulation, we also build a proof-of-concept prototype of A3. The prototype implements all five of the A3 design principles and works with the following applications: Secure Copy (SCP), Internet Explorer, Samba, and SendMail. The primary goal of building such a prototype is to prove that the proposed A3 architecture does indeed work with real applications. We also use the prototype to obtain system-level insights into the A3 implementation.
Fig. 9. Application-aware encoding.
5.1 Setup
5.1.1 Emulation
The experimental setup for the emulation is shown in Fig. 10. The setup consists of three desktop machines running the Fedora Core 4 operating system with the Linux 2.6 kernel. All the machines are connected over a 100 Mbps LAN.
An application-emulator (AppEm) module runs on both end machines. The AppEm module is a custom-built user-level module that generates traffic patterns and content for three different application protocols: CIFS, SMTP, and HTTP. The AppEm module generates traffic content based on both real-life input data sets (for e-mail and Web content) and random data sets (for file transfer).6 The traffic patterns shown in Fig. 2 are representative of the traffic patterns generated by AppEm.
The system connecting the two end systems runs the emulators for both A3 (A3-Em) and the wireless network (WNetEm). Both emulators are implemented within the framework of the ns2 simulator, with ns2 running in emulation mode. Running ns2 in its emulation mode allows for the capture and processing of live network traffic: the emulator object in ns2 taps directly into the device driver of the interface cards to capture and inject real packets into the network. All five A3 mechanisms are implemented in the A3-Em module, and each mechanism can be enabled either independently or in tandem with the other mechanisms.
The WNetEm module is used for emulating different
wireless network links representing the WLAN, WWAN,
and SAT environments. The specific characteristics used to
represent wireless network environments are the same as
those presented in Section 2.
The primary metrics monitored are throughput, response time (for HTTP), and the confidence intervals for the throughput and response time. Each data point is the average of 20 simulation runs, and we additionally show the 90 percent confidence intervals. The results of the evaluation study are presented in two stages. We first present the results of the performance evaluation of the A3 principles in isolation. Then, we discuss the combined performance improvements delivered by A3.
5.1.2 Proof-of-Concept Prototype
The prototype runs on a testbed consisting of five PCs connected in a linear topology. The first four PCs run Fedora Core 5 with the 2.6.15 Linux kernel; the fifth PC dual-boots Fedora Core 5 and Windows 2000. All machines are equipped with a 1 GHz CPU and 256 MB of memory. The implementation utilizes the NetFilter [7] utility for the Linux platform. The first PC works as the application server for SMTP (SendMail server), CIFS (Samba server), and SCP (SCP server). The A3 server module and client module are installed on the second and fourth PCs, respectively. The third PC works as a WAN emulator, and the fifth PC runs the e-mail client, smbclient, SCP client, and Internet Explorer.
The prototype implementation makes use of two netfilter libraries: libnfnetlink (version 0.0.16) and libnetfilter_queue (version 0.0.12) [7]. The registered hook point that we use is NF_IP_FORWARD. For all the hooks, the registered target is NF_QUEUE, which queues packets and allows user-space processing. After appropriate processing, the queued packets are passed, dropped, or altered by the modules.
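The user-space processing step can be modeled as a chain of handlers, each returning an NF_QUEUE-style verdict on the queued packet. The handler interface and verdict names below are illustrative abstractions, not the libnetfilter_queue API.

```python
def process_queued_packet(payload, handlers):
    """Run each registered A3 module over a queued packet. Any handler may
    drop the packet, rewrite its payload, or pass it unchanged; the final
    result mirrors NF_QUEUE's pass/drop/alter outcomes."""
    original = payload
    for handler in handlers:
        verdict, payload = handler(payload)
        if verdict == "drop":
            return "drop", None
    return ("alter" if payload != original else "pass"), payload


# Illustrative handlers: a duplicate filter and a compression marker.
def drop_duplicates(seen):
    def handler(payload):
        if payload in seen:
            return "drop", payload   # redundant copy: discard
        seen.add(payload)
        return "pass", payload
    return handler


def mark_compressed(payload):
    # Prefix stands in for the A3 "compressed" marking of the message.
    return "alter", b"AE:" + payload
```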
5.2 Transaction Prediction
We use CIFS as the application traffic for evaluating the performance of Transaction Prediction. The results of the TP evaluation are shown in Fig. 11. The x-axis of each graph shows the size of the transferred file in megabytes and the y-axis the application throughput in megabits per second. The results show several trends:
1. With wireless-aware transport layer protocols (such as ELN, WTCP, and STP), the increase in throughput is negligible. This trend is consistent with the results in Section 2.
2. Transaction Prediction improves CIFS application throughput significantly. In the SAT network, for instance, TP improves CIFS throughput by more than 80 percent when transferring a 10-Mbyte file.
3. The improvement achieved by TP increases with file size. This is because TP is able to eliminate more request-response interactions as the file size grows.
4. TP achieves the highest improvement in the SAT network.
5.2.1 Proof-of-Concept Results
The prototype works with smbclient and a Samba server. The scenario considered in the prototype is that of smbclient requesting files of various sizes from the Samba server. The CIFS implementation works with the SMB protocol running directly above TCP (i.e., by connecting to port 445 of the Samba server) instead of over NetBIOS sessions. The Samba server version is 3.0.23 and the smbclient version is 2.0.7.
One of the nontrivial issues faced while implementing the prototype is TCP sequence manipulation. The issue is caused by the TP acceleration requests generated by A3. SMB sessions use TCP to request and send data; thus, the Samba server always expects TCP packets with correct TCP sequence numbers. An acceleration request has to predict not only the block offset for the SMB session but also the TCP sequence numbers, failing which the Samba server would see a TCP packet with an incorrect TCP sequence number and behave unexpectedly. The prototype implementation addresses this problem by also keeping track of TCP state and using the appropriate TCP sequence numbers. This is an indication that the application awareness of A3 may need to be extended to include transport layer awareness as well. Since some of the principles in A3 are directly transport layer dependent (e.g., infinite buffering), we believe that this extension still falls within the scope of A3.

Fig. 10. Simulation network.
6. While the IxChariot emulator can generate representative traffic traces, it does not allow specific data sets to be used for the content; hence the need for the custom-built emulator.

The proof-of-concept results are shown in Fig. 13a. TP delivers more improvement for larger files, and the throughput improvement achieved when requesting a 5-Mbyte file is up to 500 percent.
5.3 Redundant and Aggressive Retransmissions
We evaluate the effectiveness of RAR using the CIFS application protocol. The results of the RAR evaluation are presented in Fig. 12. The x-axis in the graphs is the requested file size in megabytes and the y-axis is the CIFS application throughput in megabits per second. We observe that RAR delivers better performance than both TCP-NewReno and the tailored transport protocols, delivering up to an 80 percent improvement in throughput for SATs. RAR reduces the chances of experiencing a time-out when a wireless packet loss occurs, and the resulting reduction in TCP time-outs leads to the better performance.
5.3.1 Proof-of-Concept Results
The prototype implementation of RAR includes compo-
nents for performing the following functions: recognizing
session control messages, retransmission control, and
redundancy removal. On the sender side, the retransmis-
sion control component maintains current RTT values, sets
timers, and retransmits the possibly lost messages when
timers expire. The transmission of the redundant messages
is done using raw sockets. On the receiver side, the
redundancy removal component identifies redundant mes-
sages when the retransmission is a false alert, i.e., the
original message being retransmitted was not lost, but the
RAR aggressively performs retransmission.
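The redundancy-removal component essentially reduces to tracking which messages have already been delivered. The explicit message-ID field below is an illustrative assumption; the prototype identifies copies of application control messages rather than carrying a literal ID.

```python
class RedundancyRemover:
    """Sketch of the receiver-side redundancy-removal component: every copy
    of a thin message is recognized as the same message, so only the first
    arrival is delivered and later copies (redundant transmissions or false
    retransmission alerts) are silently filtered."""

    def __init__(self):
        self.seen = set()

    def on_message(self, msg_id, payload):
        if msg_id in self.seen:
            return None          # duplicate copy: filter it out
        self.seen.add(msg_id)
        return payload           # first arrival: deliver to the server
```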
The prototype is built with a SendMail server and an e-mail client. An e-mail of 5.2 KB is sent to the SendMail server over a network with 200 ms RTT, 100 Kbps bandwidth, and varying loss rates. The throughput with and without RAR is shown in Fig. 13b. We observe a 250 percent throughput improvement when the loss rate is 8 percent. More interestingly, the throughput with RAR is not affected much by the loss rate, since RAR effectively hides the losses.
5.4 Infinite Buffering
The effectiveness of IB is evaluated using CIFS traffic, and the results are shown in Fig. 14. The x-axis is the requested file size in megabytes and the y-axis is the application throughput in megabits per second. We can see that: 1) Transferring larger data with IB achieves higher throughput. This is because IB helps most during the actual data transfer phase, and does not help when the amount of data to be transferred is less than a few times the BDP of the network. 2) IB performs much better in the SAT network than in the other two networks, delivering almost a 400 percent improvement in performance. Again, the results are as expected, because IB's benefits are greater when the BDP of the network is higher.
Fig. 13. Prototype results of TP and RAR. (a) TP (Samba server).
(b) RAR (SendMail).
Fig. 12. Simulation results of redundant and aggressive retransmissions (CIFS). (a) WLAN. (b) WWAN. (c) SAT.
Fig. 11. Simulation results of transaction prediction (CIFS). (a) WLAN. (b) WWAN. (c) SAT.
5.4.1 Proof-of-Concept Results
We choose an SCP implementation to build the prototype of IB. The IB component on the client side provides virtual buffers, i.e., local storage, in user space. It stores data on behalf of the data sender (i.e., the SCP server) and supplies the stored data to the data receiver (i.e., the SCP client) whenever it receives TCP ACKs from it.
In the experiments, a 303-Kbyte file is sent from the SCP server to the SCP client over a network with 100 Mbps bandwidth and varying RTTs. The tests are performed to learn the impact of RTT on the performance improvement. The results are shown in Fig. 16a. We see that considerable improvements are achieved by IB. An interesting observation is that IB delivers greater improvements for small RTT values; as the RTT increases, the actual throughput of SCP decreases even with IB enabled.
5.5 Prioritized Fetching
The performance of PF is evaluated with HTTP traffic, and the results are shown in Fig. 15. We consider the top 50 Web sites as representatives of typical Web pages and measure their Web characteristics. We then use the obtained statistics to generate the workload. The x-axis in the graphs is the requested Web page size in kilobytes and the y-axis is the response time in seconds for the initial screen. It can be seen that as a user accesses larger Web pages, the response time difference between default content fetching and PF increases. PF consistently delivers a 15 to 30 percent improvement in response time. PF reduces aggressive traffic volumes by deprioritizing the out-of-sequence fetching of offscreen objects. Note that PF, while improving the response time, does not improve raw throughput performance. In other words, only the effective throughput, as experienced by the end user, increases when using PF.
5.5.1 Proof-of-Concept Results
PF is a client-side solution that does not require any modification at the server side, but does require integration with the application at the client side. In the prototype, we use the WinAPI with Internet Explorer 6.0 on the Windows operating system (running on PC-5).
The PF prototype consists of three main components. The first component is location-based object prioritization. The current prototype initially turns off the display option for multimedia objects by changing the associated registry values. After the initial rendering is completed without downloading multimedia objects, it calculates the locations of all the objects. The second component is priority-based object fetching and displaying. The current prototype uses a basic on-off model, which fetches the high-priority objects first and then fetches the other objects. If the pixel-size information of an object is inconsistent with the definition in the main document file, the prototype performs a reflow process that renders the entire document layout again. The third component is reprioritization. When a user moves the current focus in the application window, PF detects the movement of the window and reprioritizes the objects that are supposed to appear in the newly accessed area.
Fig. 14. Emulation results of infinite buffering (CIFS). (a) WLAN. (b) WWAN. (c) SAT.
Fig. 16. Prototype results of IB and AE. (a) IB (SCP). (b) AE (SendMail).
Fig. 15. Emulation results of prioritized fetching (HTTP). (a) WLAN. (b) WWAN. (c) SAT.
The Web clients are connected to the Internet and access two Web sites: www.amazon.com and www.cnn.com. To highlight the features of PF, we show the results for both the transferred size and the response time for the first screen in Figs. 17a and 17b, respectively. Note that PF reduces the response time by prioritizing data on the Web pages and transferring only the high-priority data first. PF sees about a 30 percent improvement on both of these metrics.
5.6 Application-Aware Encoding
AE is designed primarily to accelerate e-mail delivery using SMTP, and hence we evaluate the effectiveness of AE for SMTP traffic. In the evaluation, e-mails of sizes ranging from 1 to 10 Kbytes (around 120 to 1,200 words) are used. We show the results in Fig. 18, where the x-axis is the e-mail size in kilobytes and the y-axis is the application throughput in megabits per second. Varying degrees of throughput improvement are achieved; in the WWAN, an increase of 80 percent is observed when transferring a 10-Kbyte e-mail. AE achieves the highest improvement in the WWAN due to its relatively low bandwidth.
We also show the effectiveness of AE in terms of compression ratio in Fig. 20. The figure shows the results for 10 persons' e-mails using three compression schemes (WinRAR, WinZip, and AE). We can see that WinRAR and WinZip can compress an e-mail by a factor of 2 to 3, while AE can achieve a compression ratio of about 5.
5.6.1 Proof-of-Concept Results
The prototype of AE maintains a coding table on each side, and the two tables are synchronized in order to provide the encoding and decoding functions. AE monitors the DATA message in the SMTP protocol to locate the e-mail contents. The e-mail content is textual in nature and is expressed using the US-ASCII standard. AE uses extended ASCII codes to provide the encoding, employing a simplified Huffman-style coding mechanism to keep the operational complexity low. The total coding space size is 5,008.
The AE component scans an incoming e-mail, and if a word is contained in the coding table, it is replaced by the corresponding tag. If several consecutive words are covered by the coding table, their codes are concatenated, and the necessary padding is added to the codes to form full bytes. Words that are not covered by the coding table stay unchanged in their ASCII representations. Since the e-mail vocabulary of a user may change with time, AE incorporates a table updating mechanism: AE performs an updating operation every 500 e-mails. To maintain table consistency between the client and the server, a table synchronization mechanism is employed. Since a user's e-mail vocabulary is expected to change slowly, AE performs incremental synchronization rather than copying the entire table.
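The code-concatenation and byte-padding step above can be sketched as follows. Fixed-width 13-bit codes are an illustrative assumption (2^13 = 8192 suffices for a coding space of 5,008 entries); the prototype's actual framing may differ.

```python
def encode_words(words, code_table, code_bits=13):
    """Replace table-covered words by fixed-width codes, concatenating the
    codes of consecutive covered words and padding each run to full bytes;
    words not in the table stay as raw ASCII."""
    out = []
    bits, nbits = 0, 0

    def flush():
        nonlocal bits, nbits
        if nbits:
            pad = (8 - nbits % 8) % 8      # pad the code run to full bytes
            bits <<= pad
            nbits += pad
            out.append(("coded", bits.to_bytes(nbits // 8, "big")))
            bits, nbits = 0, 0

    for word in words:
        if word in code_table:
            bits = (bits << code_bits) | code_table[word]
            nbits += code_bits
        else:
            flush()                        # end of a covered run
            out.append(("raw", word))
    flush()
    return out
```

Two consecutive covered words thus cost 26 bits padded to 4 bytes, versus their full ASCII spellings.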
SendMail is used to build the prototype. Purely text-based e-mails of various sizes are sent from an e-mail client to the SendMail server, and the network is configured with 100 ms RTT and 50 Kbps bandwidth. The results are shown in Fig. 16b. Every data point is the average value over five e-mails of similar sizes. The throughput is improved by 80 percent with AE.
5.7 Integrated Performance Evaluation
We now present the results of the combined effectiveness of all applicable principles for the three application protocols: CIFS, SMTP, and HTTP. We employ RAR, TP, and IB on the CIFS traffic in the emulation setup. For SMTP, the RAR, AE, and IB principles are used. For HTTP, the A3 principles applied are RAR, PF, and IB. As expected, the throughput of the applications (CIFS and SMTP) when using the integrated A3 principles is higher than when any individual principle is employed in isolation, while the response time of HTTP is lower than with any individual principle. The results are shown in Fig. 19, with A3 delivering performance improvements of approximately 70, 110, and 30 percent for CIFS, SMTP, and HTTP, respectively.
6 RELATED WORK
6.1 Wireless-Aware Middleware and Applications
The Wireless Application Protocol (WAP) is a protocol developed to allow efficient transmission of WWW content to handheld wireless devices. The transport layer in WAP consists of the Wireless Transaction Protocol and the Wireless Datagram Protocol, which are designed for use over narrowband bearers in wireless networks and are not compatible with TCP. WAP is highly WWW-centric and does not aim to optimize any of the application behavioral patterns identified earlier in the paper.

Fig. 18. Simulation results of application-aware encoding (SMTP). (a) WLAN. (b) WWAN. (c) SAT.
Fig. 17. Prototype results of prioritized fetching (Internet Explorer). (a) Transferred size. (b) Response time.
Browsers such as Pocket Internet Explorer (PIE) [8] are developed with capabilities that address resource constraints on mobile devices. However, they do not optimize communication performance, which is the focus of A3. The work of Mohomed et al. [20] aims to save bandwidth and power by adapting content based on user semantics and context. The adaptations, however, are exposed to the end applications and users. This is different from the A3 approach, which is application-transparent. The Odyssey project [21] focuses on system support for collaboration between the operating system and individual applications by letting both be aware of the wireless environment and thus adapt their behaviors. In comparison, A3 does not rely on a redesign of the OS or protocol stack for its operation and is totally transparent to both the underlying OS and the applications. The Coda file system [25] is based on the Andrew File System (AFS), but supports disconnected operation for mobile hosts. When the client is connected to the network, it hoards files for later use during disconnected operation. During disconnections, Coda emulates the server, serving files from its local cache. Coda's techniques are specific to file systems and require applications to adopt changed semantics for the data they use.
6.2 Related Design Principles
Some related work in the literature has proposed accelerating applications with various mechanisms [14], [15]. We present a few of them here and identify the differences vis-a-vis A3.
1. TP-related: Czerwinski and Joseph [13] propose to "upload" a client's task to the server side, thus eliminating many of the RTTs required for applications like SMTP. This approach differs from the A3 approach in terms of the application protocols covered and the overall mechanism.
2. RAR-related: Mechanisms like Forward Error Correction (FEC) use error control coding for digital communication systems. A link-layer retransmission approach to improve TCP performance is proposed in [22]. Another work [28] proposes an aggressive retransmission mechanism that encourages legitimate clients to behave more aggressively in order to fight attacks against servers. Compared to these approaches, A3 applies RAR only to control messages in application protocols, retransmitting them when a maintained timer expires. We presented arguments earlier in the paper as to why protecting control message exchanges is a major factor affecting application performance.
3. PF-related: A tremendous amount of work has been done to improve Web-access performance [24], [8], [6], [11]. The work in [20] proposes out-of-order transmission of HTTP objects over UDP, breaking the in-order delivery of an object. However, unlike the A3 framework, it requires the cooperation of both the client and server sides.
4. IB-related: Mechanisms such as those in [23] and the TCP Performance Enhancing Proxy (TCP PEP) [9] are proposed to shield applications from the undesired characteristics of various networks, particularly wireless networks. IB differs from these approaches in that it aims at fully utilizing network resources by removing the buffer-length constraint. IB specifically applies to applications with bulk data transfer, such as FTP, and is meant to counter the impact of flow control. Some works also observe that applications suffer poor performance over high-latency links due to flow control. For example, Rapier and Bennett [23] propose changing the SSH implementation to remove the bottleneck caused by the receive buffer.
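A minimal sketch of the IB idea follows: the receiving end accepts data unconditionally, spilling overflow beyond a memory limit to secondary storage and replaying it in order as the application drains, so the sender never sees a closed window. The class, its single spill file, and its limits are illustrative simplifications, not the prototype's implementation.

```python
import tempfile
from collections import deque

class InfiniteBuffer:
    """IB sketch: accept writes unconditionally so TCP flow control never
    stalls the sender; overflow beyond `mem_limit` bytes is spilled to a
    temporary file (secondary storage) and drained in FIFO order."""
    def __init__(self, mem_limit):
        self.mem_limit = mem_limit
        self.mem = deque()           # in-memory FIFO of chunks (oldest data)
        self.mem_bytes = 0
        self.spill = tempfile.TemporaryFile()
        self.w_off = 0               # bytes written to the spill file
        self.r_off = 0               # bytes read back from the spill file

    def write(self, chunk):
        # Always accepted -- this is what hides flow control from the sender.
        # Once spilling starts, keep spilling until the file is fully drained,
        # so byte order is preserved.
        if self.r_off < self.w_off or self.mem_bytes + len(chunk) > self.mem_limit:
            self.spill.seek(self.w_off)
            self.spill.write(chunk)
            self.w_off += len(chunk)
        else:
            self.mem.append(chunk)
            self.mem_bytes += len(chunk)

    def read(self):
        """Return the next piece of buffered data, or b"" when empty."""
        if self.mem:                 # memory holds the oldest data: drain first
            chunk = self.mem.popleft()
            self.mem_bytes -= len(chunk)
            return chunk
        if self.r_off < self.w_off:  # then replay the spill file in order
            self.spill.seek(self.r_off)
            data = self.spill.read(min(4096, self.w_off - self.r_off))
            self.r_off += len(data)
            return data
        return b""
```

The rationale stated earlier applies here: reading spilled data back from local disk is faster than re-fetching it over the wireless link once the connection buffer frees up.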
5. AE-related: Companies like Converged Access [3] provide application-aware compression solutions, compressing the data of some applications based on priority and the nature of the application. These mechanisms share the property of being application-aware, meaning that only a subset of applications is compressed.
14 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 8, NO. X, XXXXXXX 2009
Fig. 19. Integrated A3 results in WWAN. (a) CIFS. (b) SMTP. (c) HTTP.
Fig. 20. Effectiveness of AE.
However, AE has the additional property of being user-aware, that is, it takes user-specific information into consideration and can thus achieve better performance.
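As a concrete illustration of user-aware encoding, the sketch below builds a per-user codebook from that user's past messages and substitutes short codes for the user's frequent vocabulary, in the spirit of the "e-mail vocabulary" idea motivated earlier. The marker byte, table size, and helper names are our own illustrative choices, not AE's actual wire format.

```python
from collections import Counter

def build_codebook(corpus_words, max_entries=256):
    """Map a user's most frequent vocabulary to short integer codes.
    User-aware: the table is derived from that user's own past messages."""
    common = [w for w, _ in Counter(corpus_words).most_common(max_entries)]
    return {w: i for i, w in enumerate(common)}

def encode(text, codebook):
    # Replace in-vocabulary words with a marker byte plus their code.
    return " ".join(f"\x01{codebook[w]}" if w in codebook else w
                    for w in text.split())

def decode(text, codebook):
    # Invert the codebook and expand marked tokens back to words.
    rev = {i: w for w, i in codebook.items()}
    return " ".join(rev[int(tok[1:])] if tok.startswith("\x01") else tok
                    for tok in text.split())
```

Because the codebook is built from one user's history, frequently recurring phrases shrink to a few bytes for that user even when they would be rare in a generic corpus.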
6.3 Commercial WAN Optimizers
Several companies, such as Riverbed [10] and Juniper [4], sell WAN-optimization and application-acceleration products. However: 1) almost all the commercial solutions are proprietary; 2) the A3 principles such as RAR, IB, AE, and PF are not seen in commercial solutions; and 3) many of the techniques used in commercial solutions, such as bit-level caching and compression, are hardware-based approaches that require large amounts of storage. These properties render the commercial solutions inapplicable to environments where easy deployment is required. A3, in contrast, is a middleware approach requiring only small amounts of storage.
7 CONCLUSIONS AND DISCUSSION
In this paper, we motivate the need for application acceleration in wireless data networks and present the A3 solution, which is application-aware but application transparent. We discuss a few remaining issues in the rest of this section.
7.1 Insights into A3 Principles
In this work, we present a set of five A3 principles. We realize that this set is not an exhaustive set of all possible A3 principles. Hence, we further explore the design space of a general application-acceleration framework. Specifically, we argue that the general A3 framework consists of at least five orthogonal dimensions of principles, namely, Provisioning, Protocol Optimization, Prediction, Compression, and QoS. In this context, RAR and IB belong to the Protocol Optimization dimension, TP belongs to the Prediction dimension, AE belongs to the Compression dimension, and PF belongs to the QoS dimension. More principles, as well as more dimensions, are left as part of our future work.
The principles of RAR, IB, and AE are application independent, meaning that they can be used to accelerate any application, while PF and TP are application specific and can only help certain applications. We believe that such classifications can help gain more insight into the A3 design, so that the A3 principles can be incorporated into the design of new applications.
7.2 TP versus Opportunistic Prefetching
TP is designed to perform deterministic prefetching rather than opportunistic prefetching. Opportunistic prefetching techniques aggressively request data that might be used by the end user in the future. For example, some Web-access products (e.g., Web browsers) prefetch data by requesting Web contents based on hints. Our design goal for TP is deterministic prefetching, since the design would otherwise incur the overhead of requesting unnecessary content.
Ensuring deterministic prefetching is nontrivial. We now present several approaches to this problem and will explore other approaches in future work. One simple approach is to apply TP only to file-transfer operations, where users always request a file in its entirety. A second approach is to make TP fully aware of the application software being accelerated and prefetch only data that are definitely needed. In other words, TP can be designed to be sufficiently intelligent to recognize specific application implementations and avoid unnecessary data fetching. For example, the CIFS protocol has various software implementations; some support range-locking functions while others do not. If TP is aware of these differences, it can act accordingly to ensure deterministic behavior. The downside of this approach, however, is the design overhead required for such intelligence. In a practical deployment, this overhead is affordable only if the benefits gained outweigh its cost. An alternative approach is to relax the strictness of deterministic prefetching by tolerating some degree of opportunistic prefetching. We call the corresponding solution "constrained acceleration." With constrained acceleration, instead of prefetching the entire file, TP prefetches a "chunk," which is larger than a block. Thus, even if some portion of the prefetched chunk is not used, the cost is constrained. The chunk size is defined by an acceleration degree, the design of which requires further work. In our proof-of-concept prototype, we adopt such an approach with a fixed acceleration degree.
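The constrained-acceleration idea above can be sketched as follows, with the chunk size fixed at `degree` consecutive blocks. The `fetch` callback is a hypothetical stand-in for the request path to the origin server, and the single-chunk cache is a simplification of the prototype.

```python
class ChunkPrefetcher:
    """Constrained acceleration sketch: on a block request, prefetch a whole
    chunk of `degree` consecutive blocks; later requests that fall inside the
    chunk are served locally, saving one round trip each. Even if part of the
    chunk goes unused, the wasted transfer is bounded by the chunk size."""
    def __init__(self, fetch, block_size=16 * 1024, degree=4):
        self.fetch = fetch                # fetch(offset, length) -> bytes
        self.block = block_size
        self.chunk = degree * block_size  # acceleration degree fixes chunk size
        self.cache = {}                   # base offset -> one prefetched chunk
        self.rtts = 0                     # network round trips actually taken

    def read_block(self, offset):
        base = (offset // self.chunk) * self.chunk
        if base not in self.cache:
            self.rtts += 1
            # One round trip fetches the whole chunk, not just one block.
            self.cache = {base: self.fetch(base, self.chunk)}
        start = offset - base
        return self.cache[base][start:start + self.block]
```

With an acceleration degree of 4, sequential block reads incur one round trip per four blocks instead of one per block, directly addressing the block-based-fetch penalty identified in the motivation.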
7.3 Preliminary Complexity Analysis
One of the important issues when considering the deployment of a technique is its complexity. A3 can be deployed and realized in multiple ways. For instance, it can be realized in either user space or kernel space, and can be deployed as either a full platform model or a point model (i.e., the A3-dot variant). Different deployment or realization models are associated with different complexity and performance trade-offs.
We now present a preliminary complexity analysis in terms of lines of code, memory usage, and computation overhead. Our prototype implements the A3 framework in user space and is deployed as a platform solution. 1) The prototype is implemented in about 4,500 lines of C code. Specifically, PF and TP each comprise about 1,000 lines of code, and the other elements about 600 to 900 lines each. 2) Memory usage varies across the A3 elements. Specifically, TP uses more memory than the other elements since it needs to temporarily hold the returned data corresponding to the accelerated requests; its memory footprint is thus a function of the acceleration degree and the receiver's consumption rate. IB also stores application data temporarily to compensate for the receiver's TCP buffer, and its memory footprint depends on the receiver buffer size and the receiver's reading rate. AE needs to allocate space for the coding table, so its memory usage is proportional to the table size. RAR and PF use relatively less memory than the other three elements since they maintain little application data and state. 3) In terms of computation overhead, we observe little change in CPU usage when running the prototype. AE uses relatively more CPU since it needs to perform data compression. For PF, CPU usage is higher when the user scrolls up or down, since PF must reprioritize the objects.
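The dependence of TP's memory footprint on the acceleration degree and the receiver's consumption rate can be made concrete with a back-of-the-envelope model. The function and the example numbers are illustrative, not measurements from the prototype.

```python
def tp_memory_bytes(degree, block_size, hold_time_s, consume_rate_bps):
    """Rough peak-buffering model for TP: a prefetched chunk of
    degree * block_size bytes, minus what the receiver drains at
    consume_rate_bps (bits per second) while the chunk is held."""
    chunk = degree * block_size
    drained = int(consume_rate_bps / 8 * hold_time_s)
    return max(chunk - drained, 0)
```

For example, with a 16 KB block size, an acceleration degree of 4, and a receiver consuming at 1 Mbps, holding a chunk for half a second leaves roughly 3 KB resident; a slower consumer or a larger degree raises the peak accordingly.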
ACKNOWLEDGMENTS
An earlier version of this paper appeared at ACM
MobiCom 2006 [29].
REFERENCES
[1] CIFS: A Common Internet File System, http://www.microsoft.com/
mind/1196/cifs.asp, 2009.
[2] Comscore Media Metrix Top 50 Online Property Ranking, http://
www.comscore.com/press/release.asp?press=547, 2009.
[3] Converged Access Wan Optimization, http://www.convergedaccess.
com/, 2008.
[4] Juniper Networks, http://www.juniper.net/, 2009.
[5] Linux Magazine, http://www.linux-magazine.com/issue/15/,
2009.
[6] Minimo, a Small, Simple, Powerful, Innovative Web Browser for Mobile
Devices, http://www.mozilla.org/projects/minimo/, 2009.
[7] Netfilter Project, http://www.netfilter.org/, 2009.
[8] Pocket Internet Explorer, http://www.microsoft.com/windows
mobile/, 2009.
[9] RFC 3135: Performance Enhancing Proxies Intended to Mitigate Link-Related Degradations, http://www.ietf.org/rfc/rfc3135.txt, 2009.
[10] Riverbed Technology, http://www.riverbed.com/, 2009.
[11] T. Armstrong, O. Trescases, C. Amza, and E. de Lara, “Efficient
and Transparent Dynamic Content Updates for Mobile Clients,”
Proc. Fourth Int’l Conf. Mobile Systems, Applications and Services
(MobiSys ’06), pp. 56-68, 2006.
[12] H. Balakrishnan and R. Katz, “Explicit Loss Notification and
Wireless Web Performance,” Proc. IEEE Conf. Global Comm.
(GLOBECOM ’98) Global Internet, Nov. 1998.
[13] S. Czerwinski and A. Joseph, “Using Simple Remote Evaluation to
Enable Efficient Application Protocols in Mobile Environments,”
Proc. First IEEE Int’l Symp. Network Computing and Applications,
2001.
[14] E. de Lara, D. Wallach, and W. Zwaenepoel, “Puppeteer:
Component-Based Adaptation for Mobile Computing (Poster
Session),” SIGOPS Operating Systems Rev., vol. 34, no. 2, p. 40,
2000.
[15] E. de Lara, D.S. Wallach, and W. Zwaenepoel, “Hats: Hierarchical
Adaptive Transmission Scheduling,” Proc. Multimedia Computing
and Networking Conf. (MMCN ’02), 2002.
[16] T. Henderson and R. Katz, “Transport Protocols for Internet-
Compatible Satellite Networks,” IEEE J. Selected Areas in Comm.
(JSAC ’99), vol. 17, no. 2, pp. 345-359, Feb. 1999.
[17] H.-Y. Hsieh, K.-H. Kim, Y. Zhu, and R. Sivakumar, “A Receiver-
Centric Transport Protocol for Mobile Hosts with Heterogeneous
Wireless Interfaces,” Proc. ACM MobiCom, pp. 1-15, 2003.
[18] IXIA, http://www.ixiacom.com/, 2009.
[19] A. Kuzmanovic and E.W. Knightly, “TCP-LP: Low-Priority Service
via End-Point Congestion Control,” IEEE/ACM Trans. Networking,
vol. 14, no. 4, pp. 739-752, Aug. 2006.
[20] I. Mohomed, J.C. Cai, S. Chavoshi, and E. de Lara, “Context-
Aware Interactive Content Adaptation,” Proc. Fourth Int’l Conf.
Mobile Systems, Applications and Services (MobiSys ’06), pp. 42-55,
2006.
[21] B.D. Noble, M. Satyanarayanan, D. Narayanan, J.E. Tilton, J. Flinn,
and K.R. Walker, “Agile Application-Aware Adaptation for
Mobility,” Proc. 16th ACM Symp. Operating System Principles, 1997.
[22] S. Paul, E. Ayanoglu, T.F.L. Porta, K.-W.H. Chen, K.E. Sabnani,
and R.D. Gitlin, “An Asymmetric Protocol for Digital Cellular
Communications,” Proc. IEEE INFOCOM, vol. 3, p. 1053, 1995.
[23] C. Rapier and B. Bennett, “High Speed Bulk Data Transfer Using
the SSH Protocol,” Proc. 15th ACM Mardi Gras Conf. (MG ’08), pp. 1-
7, 2008.
[24] P. Rodriguez, S. Mukherjee, and S. Rangarajan, “Session Level
Techniques for Improving Web Browsing Performance on Wire-
less Links,” Proc. 13th Int’l Conf. World Wide Web (WWW ’04),
pp. 121-130, 2004.
[25] M. Satyanarayanan, J.J. Kistler, P. Kumar, M.E. Okasaki, E.H.
Siegel, and D.C. Steere, “Coda: A Highly Available File System for
a Distributed Workstation Environment,” IEEE Trans. Computers,
vol. 39, no. 4, pp. 447-459, Apr. 1990.
[26] P. Sinha, N. Venkitaraman, R. Sivakumar, and V. Bharghavan,
“WTCP: A Reliable Transport Protocol for Wireless Wide Area
Networks,” Proc. ACM MobiCom, pp. 231-241, 1999.
[27] The Network Simulator—ns-2, http://www.isi.edu/nsnam/ns,
2009.
[28] M. Walfish, H. Balakrishnan, D. Karger, and S. Shenker, “DoS:
Fighting Fire with Fire,” Proc. Fourth ACM Workshop Hot Topics in
Networks (HotNets ’05), 2005.
[29] Z. Zhuang, T.-Y. Chang, R. Sivakumar, and A. Velayutham, “A3: Application-Aware Acceleration for Wireless Data Networks,” Proc. ACM MobiCom, pp. 194-205, 2006.
Zhenyun Zhuang received the BE degree in
information engineering from the Beijing Uni-
versity of Posts and Telecommunications and
the MS degree in computer science from
Tsinghua University, China. He is currently
working toward the PhD degree in the College
of Computing at the Georgia Institute of Tech-
nology. His research interests are wireless
networking, distributed systems, Web-based
systems, and application acceleration techni-
ques. He is a student member of the IEEE.
Tae-Young Chang received the BE degree in
electronic engineering in 1999, the MS degree in
telecommunication system technology in 2001
from Korea University, and the PhD degree from
the School of Electrical and Computer Engineer-
ing at Georgia Institute of Technology in 2008.
He works at Xiocom Wireless and his research
interests are in wireless networks and mobile
computing. He is a student member of the IEEE.
Raghupathy Sivakumar received the BE de-
gree in computer science from Anna University,
India, in 1996, and the MS and PhD degrees in
computer science from the University of Illinois
at Urbana-Champaign in 1998 and 2000, re-
spectively. Currently, he is an associate profes-
sor in the School of Electrical and Computer
Engineering at the Georgia Institute of Technol-
ogy. He leads the GNAN Research Group,
where he and his students do research in the
areas of wireless networking, mobile computing, and computer net-
works. He is a senior member of the IEEE.
Aravind Velayutham received the BE degree
in computer science and engineering from
Anna University, India, in 2002, and the MS
degree in electrical and computer engineering
from Georgia Institute of Technology in 2005.
Currently, he is the director of development at
Asankya, Inc. His research interests are wire-
less networks and mobile computing. He is a
member of the IEEE.
16 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 8, NO. X, XXXXXXX 2009

Application-Aware Acceleration for Wireless Data Networks: Design Elements and Prototype Implementation

  • 1.
    Application-Aware Acceleration forWireless Data Networks: Design Elements and Prototype Implementation Zhenyun Zhuang, Student Member, IEEE, Tae-Young Chang, Student Member, IEEE, Raghupathy Sivakumar, Senior Member, IEEE, and Aravind Velayutham, Member, IEEE Abstract—A tremendous amount of research has been done toward improving transport-layer performance over wireless data networks. The improved transport layer protocols are typically application-unaware. In this paper, we argue that the behavior of applications can and does dominate the actual performance experienced. More importantly, we show that for practical applications, application behavior all but completely negates any improvement achievable through better transport layer protocols. In this context, we motivate an application-aware, but application transparent, solution suite called A3 (application-aware acceleration) that uses a set of design principles realized in an application-specific fashion to overcome the typical behavioral problems of applications. We demonstrate the performance of A3 through both emulations using realistic application traffic traces and implementations using the NetFilter utility. Index Terms—Wireless networks, application-aware acceleration. Ç 1 INTRODUCTION Asignificant amount of research has been done toward the development of better transport layer protocols that can alleviate the problems Transmission Control Protocol (TCP) exhibits in wireless environments [12], [16], [17], [26]. Such protocols, and several more, have novel and unique design components that are indeed important for tackling the unique characteristics of wireless environments. However, in this paper, we ask a somewhat orthogonal question in the very context the above protocols were designed for: How does the application’s behavior impact the performance deliverable to wireless users? 
Toward answering this question, we explore the impact of typical wireless characteristics on the performance experienced by the applications for very popularly used real-world applications including File Transfer Protocol (FTP), the Common Internet File Sharing (CIFS) protocol [1], the Simple Mail Transfer Protocol (SMTP), and the Hypertext Transfer Protocol (HTTP). Through our experi- ments, we arrive at an impactful result: Except for FTP, which has a simple application layer behavior, for all other applications considered, not only is the performance experienced when using vanilla TCP-NewReno much worse than for FTP, but the applications see negligible or no performance enhancements even when they are made to use the wireless-aware protocols. We delve deeper into the above observation and identify several common behavioral characteristics of the applica- tions that fundamentally limit the performance achievable when operating over wireless data networks. Such char- acteristics stem from the design of the applications, which is typically tailored for operation in substantially higher quality local area network (LAN) environments. Hence, we pose the question: if application behavior is a major cause for performance degradation as observed through the experiments, what can be done to improve the end-user application performance? In answering the above question, we present a new solution called Application-Aware Acceleration (A3 , pro- nounced as “A-cube”), which is a middleware that offsets the typical behavioral problems of real-life applications through an effective set of principles and design elements. A3 ’s design has five underlying design principles including transaction prediction, prioritized fetching, redundant and aggressive retransmissions, application-aware encoding, and infinite buffering. The design principles are derived explicitly with the goal of addressing the aforementioned application layer behavioral problems. 
We present A3 as a platform solution requiring entities at both ends of the end- to-end communication, but also describe a variation of A3 called A3 (pronounced as “A-cube dot”), which is a point solution but is not as effective as A3 . One of the keystone aspects of the A3 design is that it is application-aware, but application transparent. The rest of the paper is organized as follows: Section 2 presents the motivation results for A3 . Section 3 presents the key design elements underlying the A3 solution. Section 4 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 8, NO. X, XXXXXXX 2009 1 . Z. Zhuang is with the College of Computing, Georgia Institute of Technology, 350333 Georgia Tech Station, Atlanta, GA 30332. E-mail: [email protected]. . T.-Y. Chang is with Xiocom Wireless, 3505 Koger Boulevard, Suite 400, Duluth, GA 30096. E-mail: [email protected]. . R. Sivakumar is with the School of Electrical and Computer Engineering, Georgia Institute of Technology, 5164 Centergy, 75 Fifth Street NW, Atlanta, GA 30308. E-mail: [email protected]. . A. Velayutham is with Asankya, Inc., 75 Fifth Street NW, Atlanta, GA 30308. E-mail: [email protected]. Manuscript received 9 Feb. 2008; revised 19 Nov. 2008; accepted 10 Feb. 2009; published online 19 Feb. 2009. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number TMC-2008-02-0045. Digital Object Identifier no. 10.1109/TMC.2009.52. 1536-1233/09/$25.00 ß 2009 IEEE Published by the IEEE CS, CASS, ComSoc, IES, SPS
  • 2.
    describes the realizationof A3 for specific applications. Section 5 evaluates A3 and presents a proof-of-concept prototype of A3 using the NetFilter utility. Section 6 discusses related works and Section 7 concludes the paper. 2 MOTIVATION The focus of this work is entirely on applications that require reliable and in-sequence packets delivery. In other words, we consider only applications that are traditionally developed with the assumption of using the TCP transport layer protocol. 2.1 Evaluation Model We now briefly present the setting and methodology employed for the results presented in the rest of the section. 2.1.1 Applications For the results presented in this section, we consider four different applications: FTP, CIFS, SMTP, and HTTP. . CIFS: Common Internet File System is a platform- independent network protocol used for sharing files, printers, and other communication abstractions between computers. While originally developed by Microsoft, CIFS is currently an open technology that is used for all Windows workgroup file sharing, NT printing, and the Linux Samba server.1 . SMTP: Simple Mail Transfer Protocol is used for the exchange of e-mails either between mail servers or between a client and its server. Most e-mail systems that use the Internet for communication use SMTP. . HTTP: Hypertext Transfer Protocol is the underlying protocol used by the World Wide Web (WWW). 2.1.2 Traffic Generator We use IxChariot [18] to generate accurate application- specific traffic patterns. IxChariot is a commercial tool for emulating most real-world applications. It consists of the IxChariot console (for control), performance end points (for traffic generation and reception), and IxProfile (for char- acterizing performance). 2.1.3 Testbed We use a combination of a real testbed and emulation to construct the testbed for the results presented in the section. 
Since IxChariot is a software tool that generates actual application traffic, it is hosted on the sender and the receiving machines as shown in Fig. 10. The path from the sender to the receiver goes through a node running the Network Simulator (NS2) [27] in emulation mode. The network emulator is configured to represent desired topologies including the different types of wireless technologies. More information on the testbed is presented in Section 5. 2.1.4 Transport Protocols Since we consider wireless LANs (WLANs), wireless WANs (WWANs), and wireless satellite area networks (SATs), we use transport layer protocols proposed in related literature for each of these environments. Specifically, we use New- Reno with Explicit Loss Notification (TCP-ELN) [12], Wide area Wireless TCP (WTCP) [26], and Satellite Transport Protocol (STP) [16] as enhanced transport protocols for WLANs, WWANs, and SATs, respectively. 2.1.5 Parameters We use average RTT values of 5, 200, and 1,000 ms, average loss rates of 1, 8, and 3 percent, and average bandwidths of 5, 0.1, and 1 Mbps for WLANs, WWANs, and SATs, respectively. We simulate wireless channels by introducing various link parameters to packet-level traffic with NS2 emulation. The default Ethernet LAN MAC protocol is used. The purpose for such a simplified wireless setup is to examine the impact of application behaviors better by isolating the effect of complicated wireless MAC protocols. We use application-perceived throughput as the key metric of interest. Each data point is taken as an average of 10 different experimental runs. 2.2 Quantitative Analysis Fig. 1a presents the performance results for FTP under varying loss conditions in WLANs, WWANs, and SAT environments. The tailored protocols uniformly show con- siderable performance improvements. 
The results illustrate that the design of the enhancement protocols such as TCP- ELN, WTCP, and STP, is sufficient enough to deliver considerable improvements in performance for wireless data networks, when using FTP as the application. In the rest of the section, we discuss the impact of using such protocols for other applications such as CIFS, SMTP, and HTTP. Figs. 1b, 1c, and 1d show the performance experienced by CIFS, SMTP, and HTTP, respectively, under varying loss conditions for the different wireless environments. It can be observed that the performance improvements demonstrated by the enhancement protocols for FTP do not carry over to these three applications. It can also be observed that the maximum performance improvement delivered by the enhancement protocols is less than 5 percent across all scenarios. While the trend evident from the results discussed above is that the enhanced wireless transport protocols do not provide performance improvements for three very popu- larly used applications, we argue in the rest of the section that this is not due to any fundamental limitations of the transport protocols themselves, but due to the specifics of the behavior of the three applications under consideration. 2.3 Impact of Application Behavior We now explain the lack of performance improvements when using enhanced wireless transport protocols with applications such as CIFS, SMTP, and HTTP. We use the conceptual application traffic pattern for the three applica- tions in Fig. 2 for most of our reasonings. 2.3.1 Thin Session Control Messages All three applications, as observed in Fig. 2, use thin session control message exchanges before the actual data transfer occurs, and thin request messages during the actual data transfer phase as well. We use the term “thin” to refer to the fact that such messages are almost always contained in a single packet of maximum segment size (MSS). 2 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 8, NO. X, XXXXXXX 2009 1. 
Samba uses SMB on which CIFS is based.
  • 3.
    The observation abovehas two key consequences as follows: . When a loss occurs to a thin message, an entire round-trip time (RTT) is taken to recover from such a loss. When the round-trip time is large like in WWANs and SATs, this can result in considerably inflating the overall transaction time for the applica- tions. Note that a loss during the data phase will not have such an adverse impact, as the recovery from that loss can be multiplexed with other new data transmissions whereas for thin message losses, no other traffic can be sent anyway. . Most protocols, including TCP, rely on the arrival of out-of-order packets to infer packet losses, and hence, trigger loss recovery. In the case of thin messages, since there are no packets following the lost message, the only means for loss detection is the expiry of the retransmission timer. Retransmission timers typically have coarse minimum values to keep overheads low. TCP, for example, typically uses a minimum Retrans- mission Time Out (RTO) value of one second.2 2.3.2 Block-Based Data Fetches Another characteristic of the applications, especially CIFS and HTTP, is that although the total amount of data to be fetched can be large, the data transfer is performed in blocks, with each block including a “request-response” exchange. CIFS uses its request-data-block message to send the block requests, with each request typically requesting only 16 to 32 KB of data. Such a block-based fetching of data has two implications to performance: 1) When the size of the requested data is smaller than the Bandwidth Delay Product (BDP), there is a gross underutilization of the available resources. Hence, when the SAT network has a BDP of 128 KB and CIFS uses a 16 KB request size, the utilization is only 12.5 percent. ZHUANG ET AL.: APPLICATION-AWARE ACCELERATION FOR WIRELESS DATA NETWORKS: DESIGN ELEMENTS AND PROTOTYPE... 3 Fig. 2. Application traffic patterns. (a) CIFS. (b) SMTP. (c) HTTP (single connection case). Fig. 1. 
Impact of wireless environment characteristics on application throughput. (a) FTP (WLAN, WWAN, and SAT). (b) CIFS (WLAN, WWAN, and SAT). (c) SMTP (WLAN, WWAN, and SAT). (d) HTTP (WLAN, WWAN, and SAT). 2. While newer Linux releases have lower minimum RTO values, they still are in the order of several hundred milliseconds.
  • 4.
    2) Independent ofthe size of each requested data block, one rtt is spent in sending the next request once the current requested data arrives. When the RTT of the path is large like in WWANs and SATs, this can inflate the overall transaction time, and hence, lower throughput performance. 2.3.3 Flow-Control Bottlenecked Operations Flow control is an important function in communication that helps in preventing the source from overwhelming the receiver. In a mobile/wireless setting, flow control can kick in and prove to be the bottleneck for the connection progress due to two reasons: 1) If the application on the mobile device reads slowly or is temporarily halted for some other reason, the receiver buffer fills up and the source is eventually frozen till the buffer empties. 2) When there are losses in the network and the receiver buffer size is of the same order as the BDP (which is typically true), flow control can prevent new data transmissions even when techniques such as fast recovery are used due to unavailability of buffer space at the receiver. With fast recovery, the sender inflates the congestion window to compensate the new ACKs received. However, this infla- tion may be curbed by the flow-control mechanism if there is no buffer space on the receiver side. 2.3.4 Other Reasons While the above discussed reasons are behavioral “acts of commission” by the applications that result in lowered performance, we now discuss two more reasons that can be seen as behavioral “acts of omission.” These are techniques that the applications could have used to address conditions in a wireless environment, but do not. Nonprioritization of data. For all three applications considered, no explicit prioritization of data to be fetched is performed, and hence, all the data to be fetched are given equal importance. However, for certain applications prior- itizing data in a meaningful fashion can have a profound impact on the performance experienced by the end system or user. 
For example, consider the case of HTTP used for browsing on a small-screen PDA. When a Web page URL request is issued, HTTP fetches all the data for the Web page with equal importance. However, the data corre- sponding to the visible portion of the Web page on the PDA’s screen are obviously of more importance and will have a higher impact on the perceived performance by the end user. Thus, leveraging some means of prioritization techniques can help deliver better performance to the user. With such nonprioritization of data, HTTP suffers perfor- mance as defined by the original data size and the low bandwidths of the wireless environment. Nonuse of data reduction techniques. Finally, another issue is applications not using knowledge specific to their content or behavior to employ effective data reduction techniques. For example, considering the SMTP application, “email vocabulary” of users has evolved over the last couple of decades to be very independent of traditional “writing vocabulary” and “verbal vocabulary” of the users. Hence, it is an interesting question as to whether SMTP can use e-mail vocabulary-based techniques to reduce the actual content transferred between SMTP servers, or an SMTP server and a client. Not leveraging such aspects proves to be of more significance in wireless environments, where the baseline performance is poor to start with. 3 DESIGN Since we have outlined several behavioral problems with applications in Section 2, an obvious question to ask is: “Why not change the applications to address these problems?” We believe that is indeed one possible solution. Hence, we structure the presentation of the A3 solution into two distinct components: 1) the key design elements or principles that underlie A3 and 2) the actual realization of the design elements for specific applications in the form of an optimization middleware that is application-aware, but application transparent. 
The design elements generically present strategies to improve application behavior and can be used by application developers to improve perfor- mance by incorporating changes to the applications directly. In the rest of this section, we outline the design of five principles in the A3 solution. 3.1 Transaction Prediction (TP) TP is an approach to deterministically predict future application data requests to the server and issue them ahead of time. Note that this is different from techniques such as “opportunistic pre-fetching,” where content is heuristically fetched to speed up later access but is not guaranteed to be used.3 In TP, A3 is fully aware of application semantics and knows exactly what data to fetch and that the data will be used. TP will aid in conditions, where the BDP is larger than the default application block fetch size and the RTT is very large. Under both cases, the overall throughput will improve when TP is used. Fig. 3a shows the throughput performance of CIFS when fetching files of varying sizes in a 100 Mbps 4 IEEE TRANSACTIONS ON MOBILE COMPUTING, VOL. 8, NO. X, XXXXXXX 2009 3. We further discuss on this issue in Section 7. Fig. 3. Motivation for TP and RAR. (a) Throughput of FTP and CIFS. (b) Number of requests of CIFS. (c) Throughput of SMTP.
LAN network. It can be seen that the performance is substantially lower than that of FTP, and this is due to the block-based fetching mechanism described in Section 2. Fig. 3b shows the number of transactions it takes CIFS to fetch a single file, and it can be observed that the number of transactions increases linearly with the file size. Under such conditions, TP will "parallelize" the transactions, and hence, improve throughput performance. Good examples of applications that will benefit from using TP include CIFS and HTTP, for reasons outlined in Section 2.

3.2 Redundant and Aggressive Retransmissions (RAR)

RAR is an approach to better protect thin session control and data request messages from losses. The technique involves recognizing thin application messages and using a combination of packet-level redundancy and aggressive retransmissions to protect such messages. RAR addresses both issues with thin messages identified in Section 2: the redundant transmissions reduce the probability of message losses, and the aggressive retransmissions, which operate on tight RTT-granularity time-outs, reduce the loss recovery time. The key challenge in RAR is to recognize thin messages in an application-aware fashion. Note that only thin messages require RAR, for reasons outlined in Section 2. Regular data messages should not be subjected to RAR, both because their loss recovery can be masked in the overall transaction time by performing the recovery simultaneously with other data packet transmissions, and because the overheads of performing RAR become untenable when applied to large-volume messages such as the data. Fig. 3c shows the throughput performance of SMTP under lossy conditions in a WWAN setup. The 35 percent drop in throughput for a loss-rate increase from 0 to 7 percent is much higher than the corresponding 15 percent drop in FTP performance for the same loss-rate increase.
Typical applications that can benefit from RAR include CIFS, SMTP, and HTTP.

3.3 Prioritized Fetching (PF)

PF is an approach to prioritize subsets of data to be fetched as being more important than others, and to fetch the higher priority data faster than the lower priority data. A simple approach to achieve the dual-rate fetching is to use default TCP-like congestion control for the high-priority data, but congestion control like that in TCP-LP [19] for the low-priority data. An important consideration in PF is to devise a strategy to prioritize data intelligently and on the fly. Fig. 4a shows the average transfer sizes per screen as well as for the entire Web page for the top 50 accessed Web pages on the World Wide Web [2]. It can be seen that nearly 80 percent of the data (belonging to screens 2 and higher) do not directly impact the response time experienced by the user, and hence, can be deprioritized in relation to the data pertaining to the first screen. Note that the results are for a 1,024 x 768 resolution laptop screen, and will, in fact, be better for smaller screen devices such as PDAs. Good examples of applications that can benefit from PF include HTTP and SMTP.

3.4 Infinite Buffering (IB)

IB is an approach that prevents flow control from throttling the progression of a network connection terminating at the mobile wireless device. IB prevents flow control from impacting performance by giving the sender the impression of an infinite buffer at the receiver. Secondary storage is used to realize such an infinite buffer, with the main rationale being that reading from secondary storage will be faster than fetching the data from the sender over the wireless network once space is created in the actual connection buffer at a later point. With typical hard disk data transfer rates today being around 250 Mbps [5], this rationale is well justified for wireless environments.
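The disk-rate rationale can be checked with quick arithmetic. The sketch below compares replaying spilled data from local disk against re-fetching it over the wireless link; the WWAN rate is our own illustrative assumption, not a measurement from the paper.

```python
# Back-of-the-envelope check of the IB rationale: draining overflow data from
# secondary storage vs. re-fetching it over the wireless link.

def replay_time_s(data_bits: float, rate_bps: float) -> float:
    """Seconds needed to move `data_bits` at `rate_bps`."""
    return data_bits / rate_bps

if __name__ == "__main__":
    spill_bits = 4 * 8e6        # 4 MB of overflow data held on disk, in bits
    disk_bps = 250e6            # ~250 Mbps hard disk transfer rate [5]
    wwan_bps = 1e6              # assumed ~1 Mbps WWAN link
    print(replay_time_s(spill_bits, disk_bps))   # a fraction of a second
    print(replay_time_s(spill_bits, wwan_bps))   # tens of seconds
```

Even for a multi-megabyte spill, the disk replay takes a small fraction of a second, while re-fetching the same data over a slow wireless link takes tens of seconds.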
Note that IB can be triggered both by the application reading slowly or temporarily not reading from the connection buffer, and by losses on the wireless path. Figs. 4b and 4c show the throughput performance of SMTP under both conditions. Note that the ideal scenarios correspond to an upper bound of the throughput.4 It can be observed that for both scenarios, the impact of flow control drastically lowers performance compared to what is achievable. Due to lack of space, in the rest of the paper, we focus on IB specifically in the context of the more traditional trigger for flow control—the application reading bottleneck. Typical applications that can benefit from IB include CIFS, SMTP, and HTTP—essentially, any application that may attempt to transfer more than a BDP worth of data.

ZHUANG ET AL.: APPLICATION-AWARE ACCELERATION FOR WIRELESS DATA NETWORKS: DESIGN ELEMENTS AND PROTOTYPE... 5

Fig. 4. Motivation for PF (a) and IB (b, c). (a) Transfer size per screen. (b) Impact of application reading rate. (c) Impact of loss increase.

4. In Fig. 4c, the throughput drop is caused by both flow-control and congestion-control-related mechanisms, and the flow-control mechanism contributes significantly.

3.5 Application-Aware Encoding (AE)

AE is an approach that uses application-specific information to better encode or compress data during communication. Traditional compression tools such as zip operate on a given content in isolation, without any context for the
application corresponding to the content. AE, on the other hand, explicitly uses this contextual information to achieve better performance. Note that AE is not a better compression algorithm; rather, it is a better way of identifying the data sets that need to be operated on by a given compression algorithm. Table 1 shows the average e-mail vocabulary characteristics of 10 different graduate students, based on 100 e-mails sent by each person during two weeks. It is interesting to see the following characteristics in the results: 1) the e-mail vocabulary size across the 10 people is relatively small—a few thousand words and 2) even a simple encoding involving this knowledge will result in every word being encoded with only 10 to 12 bits, which is substantially lower than the 40 to 48 bits required with standard binary encoding. In Section 5, we show that such vocabulary-based encoding can considerably outperform standard compression tools such as zip as well. Moreover, further benefits can be attained if more sophisticated compression schemes such as Huffman encoding are employed instead of a simple binary encoding. Typical applications that can benefit from using AE include SMTP and HTTP.

4 SOLUTION

4.1 Deployment Model and Architecture

The A3 deployment model is shown in Fig. 5. Since A3 is a platform solution, it requires two entities at either end of the communication session that are A3-aware. At the mobile device, A3 is a software module that is installed in user space. At the server side, while A3 can be deployed as a software module on all servers, a more elegant solution would be to deploy a packet-processing network appliance that processes all content flowing from the servers to the wide area network. We assume the latter model for our discussions. However, note that A3 can be deployed in either fashion, as it is purely a software solution.
This deployment model will help in any communication between a server behind the A3 server and the mobile device running the A3 module. However, if the mobile device communicates with a non-A3-enabled server, two options exist: 1) as we discuss later in the paper, A3 can be used as a point solution with reduced effectiveness or 2) the A3 server can be brought closer to the mobile device, perhaps within the wireless network provider's access network. In the rest of the paper, we do not delve into the latter option. However, we do revisit the point-solution mode of operation of A3. We present an A3 implementation that resides in user space and uses the NetFilter utility in Linux for capturing outgoing and incoming traffic at the mobile device. NetFilter is a Linux-specific packet capture tool that has hooks at multiple points in the Linux kernel. The A3 hooks are registered at the Local-In and Local-Out stages of the chain of hooks in NetFilter. While our discussions are Linux-centric, they can be mapped onto the Windows operating system through the use of the Windows Packet Filtering interface, or wrappers such as PktFilter that are built around the interface. Fig. 6a shows the A3 deployment on the mobile device using NetFilter.

TABLE 1 Statistics of 100 E-Mails Sent by 10 Users

Fig. 5. Deployment model.

Fig. 6. A3 deployment model with NetFilter and software architecture. (a) Deployment with NetFilter. (b) Software architecture.

The A3 software architecture is shown in Fig. 6b. Since the design elements in A3 are to a large extent independent of each other, a simple chaining of the elements in an appropriate fashion results in an integrated A3 architecture. The specific order in which the elements are chained in the A3 realization is TP, RAR, PF, IB, and AE. While RAR
protects the initial session control exchanges and the data requests, it operates on traffic after TP, given that TP can generate new requests for data. PF manipulates the priority with which different requests are served, and IB ensures that data responses are not throttled by flow control. Finally, AE compresses any outgoing data and decompresses any incoming data.

4.2 Application Overviews

Since we describe the actual operations of the mechanisms in A3 in the context of one of the three applications, we now briefly comment on the specific message types involved in typical transactions by those applications. We then refer to the specific message types when describing the operations of A3 subsequently. Due to lack of space, instead of presenting all message types again, we refer readers back to Fig. 2 to observe the message exchanges for the three applications. Labels such as CIFS-x refer to particular message types in CIFS and will be referred to in the A3 realization descriptions that follow. CIFS, also known as Server Message Block (SMB), is a platform-independent protocol for file sharing. The typical message exchanges in a CIFS session are shown in Fig. 2a. Overall, TP manipulates the CIFS-11 message, RAR operates on CIFS-1 through CIFS-11, and IB aids in CIFS-12. SMTP is the Internet's standard host-to-host mail transport protocol and traditionally operates over TCP. The typical message exchanges in an SMTP session are shown in Fig. 2b. Overall, RAR operates on SMTP-1 through SMTP-8 and SMTP-12 through SMTP-14, while IB and AE operate on SMTP-9 and SMTP-10. The HTTP message exchange standard is relatively simple and typically consists of the messages shown in Fig. 2c. A typical HTTP session consists of multiple objects as well as the main HTML file, and hence, appears as a sequence of overlapping exchanges of the above format. Overall, RAR operates on HTTP-1, 2, 3, and 5, and PF operates on HTTP-3 and 5.
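The chaining of elements described in Section 4.1 (TP, RAR, PF, IB, AE, traversed in reverse for incoming traffic) can be sketched as a simple pipeline; the class names below are ours, not the prototype's.

```python
# Minimal sketch of chaining the five A3 design elements. Outgoing traffic
# passes through the chain in order; incoming traffic passes in reverse.

class Element:
    """Base class for a design element; real elements override these hooks."""
    def outgoing(self, msg):
        return msg
    def incoming(self, msg):
        return msg

class Chain:
    def __init__(self, elements):
        self.elements = elements
    def send(self, msg):
        for e in self.elements:            # application -> network
            msg = e.outgoing(msg)
        return msg
    def receive(self, msg):
        for e in reversed(self.elements):  # network -> application
            msg = e.incoming(msg)
        return msg

class Tag(Element):
    """Demo element that records its name on the message."""
    def __init__(self, name):
        self.name = name
    def outgoing(self, msg):
        return msg + [self.name]
    def incoming(self, msg):
        return msg + [self.name]

chain = Chain([Tag(n) for n in ("TP", "RAR", "PF", "IB", "AE")])
```

Because the elements are largely independent, swapping a real TP or RAR implementation into this chain requires only overriding the two hooks.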
4.3 A3 Realization

In the rest of the section, we take one design element at a time and walk through the algorithmic details of the element with respect to a single application. Note that A3 is an application-aware solution, and hence, its operations are application specific. Since we describe each element in isolation, we assume that the element resides between the application and the network. In an actual usage of A3, the elements will have to be chained as discussed earlier.

4.3.1 Transaction Prediction

Fig. 7a shows the flowchart for the implementation of TP for CIFS at the A3 client. When A3 receives a message from the application, it checks to see if the message is CIFS-9 and records state for the file transfer in its File-TP-States data structure. It then passes the message through. If the message was a request, TP checks to see if the request is for a locally cached block or for a new block. If the latter, it updates the request to ask for more blocks, stores information about the predicted requests generated in the Predicted-Request-States data structure, and forwards the requests. In the reverse direction, when data come in from the network, TP checks to see if the data are for a predicted request. If yes, it caches the data in secondary storage and updates its state information; otherwise, it forwards the data to the application. The number of additional blocks to request is an interesting design decision. For file transfer scenarios, TP generates requests asking for the entire file.5 The file size information can be retrieved from the CIFS-10 message. If the incoming message is for an earlier retrieved block, TP retrieves the block from secondary storage and provides it to the application. While CIFS servers accept multiple simultaneous data requests from the same client, it is possible that for some applications, the server might not be willing to accept multiple data requests simultaneously.
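The TP client path just described can be condensed into the following sketch. The data-structure names follow the text, but the synchronous fetch callable and the block interface are our simplifications, not the prototype's actual code.

```python
# Simplified sketch of the TP client for CIFS: a request for a new block
# triggers predicted requests for the rest of the file; predicted data are
# cached in local storage and served on later requests.

class TPClient:
    def __init__(self, fetch):
        self.fetch = fetch        # callable block_id -> data (the network)
        self.block_cache = {}     # blocks retrieved via predicted requests
        self.predicted = set()    # Predicted-Request-States (simplified)

    def request_block(self, block_id, file_blocks):
        if block_id in self.block_cache:           # earlier retrieved block:
            return self.block_cache.pop(block_id)  # serve from local storage
        for b in file_blocks:                      # new block: predict the
            if b != block_id and b not in self.block_cache:
                self.predicted.add(b)              # remaining requests for
                self.block_cache[b] = self.fetch(b)  # the entire file
        return self.fetch(block_id)
```

In the prototype, the predicted requests would be issued in parallel over the network rather than synchronously as in this sketch.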
In such an event, the A3 server will let only one of the client requests go through to the server at any point in time, and will send the remaining requests one at a time as the previous requests are served.

4.3.2 Redundant and Aggressive Retransmissions

Fig. 7b shows the flowchart for the implementation of RAR for CIFS. When A3 receives a message from the application, it checks to see if it is a thin message. The way A3 performs the check is to see if the message is one of the messages between CIFS-1 and CIFS-11. All such messages are interpreted as thin messages.

Fig. 7. TP and RAR (shaded blocks are storage space and timer, white blocks are operations). (a) Transaction prediction. (b) Redundant and aggressive retransmissions.

5. We further discuss this issue in Section 7.
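The thin-message check reduces to a membership test on the message type; the string constants below are placeholders for the CIFS message identifiers of Fig. 2a.

```python
# Thin-message recognition for CIFS, per the text: everything from CIFS-1
# through CIFS-11 is treated as thin; bulk data responses (CIFS-12) are not.

THIN_CIFS_TYPES = {"CIFS-%d" % i for i in range(1, 12)}  # CIFS-1 .. CIFS-11

def is_thin(msg_type: str) -> bool:
    return msg_type in THIN_CIFS_TYPES
```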
If the incoming message is not a thin one, A3 lets it through as-is. Otherwise, A3 creates redundant copies of the message, records the current time, starts a retransmission alarm, and sends out the copies in a staggered fashion. When a response arrives, A3 checks the time stamp for the corresponding request and updates its estimated RTT. A3 then passes on the message to the application. If the alarm expires for a particular thin message, the message is again subjected to the redundant transmissions. The A3 server is responsible for filtering out redundant copies of the same message that arrive successfully. The key issues of interest in the RAR implementation are: 1) How many redundant transmissions are performed? Since packet loss rates in wireless data networks rarely exceed 10 percent, even a redundancy factor of two (two additional copies created) reduces the effective loss rate to about 0.1 percent. Hence, A3 uses a redundancy factor of two. 2) How should the redundant messages be staggered? The answer to this question lies in the specific channel characteristics experienced by the mobile device. However, at the same time, the staggered delay should not exceed the round-trip time of the connection, as otherwise the mechanism would lose its significance by unnecessarily delaying the recovery of losses. Hence, A3 uses a staggering delay of RTT/10 between any two copies of the same message. This ensures that all copies of a message are sent out at the mobile device within 20 percent of the RTT duration. 3) How is the aggressive time-out value determined? Note that while the aggressive time-out mechanism helps under conditions when all copies of a message are lost, the total message overhead of such an aggressive loss recovery is negligible when compared to the overall size of data transferred by the application. Hence, A3 uses a time-out value of RTTavg + ε, where ε is a small guard constant and RTTavg is the average RTT observed so far.
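These three parameter choices can be written down directly; the guard constant is left as a parameter, since the text does not fix its value.

```python
# The three RAR parameter choices, in code form.

def effective_loss(p: float, redundancy: int) -> float:
    """A message is lost only if all redundancy+1 copies are lost."""
    return p ** (redundancy + 1)

def stagger_delay(rtt: float) -> float:
    """Delay between successive copies of the same message: RTT/10."""
    return rtt / 10.0

def rar_timeout(rtt_avg: float, guard: float) -> float:
    """Aggressive time-out: average observed RTT plus a small guard constant."""
    return rtt_avg + guard

# 10 percent loss with a redundancy factor of two -> about 0.1 percent residual
residual = effective_loss(0.10, 2)
```

With a redundancy factor of two, the two stagger gaps of RTT/10 each mean all copies leave the mobile device within 20 percent of the RTT, matching the text.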
This simple setting ensures that the time-out values are tight, and at the same time, that the mechanism adapts to changes in network characteristics.

4.3.3 Prioritized Fetching

Fig. 8a shows the flowchart for the implementation of PF in the context of HTTP. Once again, the key goal in PF for HTTP is to quickly retrieve the Web objects required for the display of the visible portion of the Web page, at the expense of the objects on the page that are not visible. Unlike the other mechanisms, PF cannot be implemented without some additional interactions with the application itself. Fortunately, browser applications have well-defined interfaces for querying the state of the browser, including the current window focus, scrolling information, etc. Hence, the implementation of PF relies on a separate module called the application state monitor (ASM), akin to a browser plug-in, to coordinate its operations. When a message comes in from the application, PF checks to see if the message is a request. If it is not, it is let through. Otherwise, PF checks with the ASM to see if the requested contents are immediately required. The ASM classifies the objects requested as being of immediate need (i.e., the visible portion of the Web page) or as not immediately required. PF then sends out fetch requests immediately for the first category of objects and uses a low-priority fetching mechanism for the remaining objects. Since A3 is a platform solution, PF has to inform the A3 server that certain objects are of low priority through A3-specific piggybacked information. The A3 server then deprioritizes the transmission of those objects in preference to those of higher priority. Note that the relative prioritization is used not only among the content of a single end device, but also across end devices to improve overall system performance.
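The PF request path can be sketched as below. The ASM interface and the piggybacked low-priority flag are our rendering of the description, not the prototype's actual API.

```python
# PF request handling: requests for immediately needed objects (the visible
# screen) go out at normal priority; the rest carry a low-priority hint that
# the A3 server uses to deprioritize their transmission.

class DemoASM:
    """Stand-in for the application state monitor (browser plug-in)."""
    def __init__(self, visible):
        self.visible = set(visible)
    def immediately_needed(self, obj):
        return obj in self.visible

def handle_message(msg, asm, send_high, send_low):
    if not msg.get("is_request"):
        send_high(msg)                    # non-requests are let through
    elif asm.immediately_needed(msg["obj"]):
        send_high(msg)                    # visible portion: fetch now
    else:
        msg["a3_low_priority"] = True     # piggybacked hint for the A3 server
        send_low(msg)
```

When the ASM later reports a priority change (e.g., the user scrolls down), the same classification would be rerun and the A3 server notified.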
Approaches such as TCP-LP [19] are candidates for the relative prioritization between TCP flows, although A3 currently uses a simple priority queuing scheme within the same TCP flow at the A3 server. Note that while the ASM might classify objects in a particular fashion, changes in the application (e.g., scrolling down) will result in a corresponding reprioritization of the objects. Hence, the ASM has the capability of gratuitously informing PF about priority changes. Such changes are immediately notified to the A3 server through appropriate requests.

4.3.4 Infinite Buffering

Fig. 8. PF and IB (shaded blocks are storage space and timer, white blocks are operations). (a) Prioritized fetching. (b) Infinite buffering.

Fig. 8b shows the flowchart for the implementation of IB in the context of SMTP. IB keeps track of the TCP connection status and monitors all ACKs sent out by the TCP connection serving the SMTP application for SMTP-9 and SMTP-10. If the advertised window in an ACK is less than the maximum possible, IB immediately resets the advertised window to the maximum value and appropriately
updates its current knowledge of the connection's buffer occupancy and maximum in-sequence ACK information. Hence, IB prevents anything less than the maximum buffer size from being advertised. When data packets arrive from the network, IB receives the packets and checks to see if the connection buffer can accommodate more packets. If so, IB delivers the packets to the application directly. If the disk cache is nonempty, which means that the connection buffer is full, the incoming packet is added directly to the cache. In this case, IB generates a proxy ACK back to the server. Then, whenever the connection buffer has space in it, packets are retrieved from the disk cache and given to the application until the buffer becomes full again. When the connection sends an ACK for a packet already ACKed by IB, IB suppresses the ACK. When the connection state is torn down for the SMTP application, IB resets its state accordingly.

4.3.5 Application-Aware Encoding

Fig. 9 shows the flowchart for the implementation of AE for SMTP. When AE receives data (SMTP-9) from the SMTP application, it uses its application vocabulary table to compress the data, marks the message as compressed, and forwards it to the network. The marking is done to inform the A3 server about the need to perform decompression. Similarly, when incoming data arrive for the SMTP server and the data are marked as compressed, AE performs the necessary decompression. The mechanisms used for the actual creation and manipulation of the vocabulary tables are of importance to AE. In A3, the SMTP vocabulary tables are created and maintained purely on a user-pairwise basis. Not only are the tables created in this fashion, but the data sets over which the vocabulary tables are created are also restricted to this pairwise model.
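The pairwise table mechanics can be sketched as follows; the table construction, code assignment, and hash are our rendering, since the paper does not specify them exactly. Both sides derive the same table from the e-mails already exchanged, and a table hash detects the rare mismatch.

```python
import hashlib
import math

# Per-pair vocabulary table for AE: each word in the pair's e-mail history
# maps to a fixed-length code of ceil(log2(|vocab|)) bits.

def build_table(past_emails):
    vocab = sorted({w for mail in past_emails for w in mail.split()})
    return {w: i for i, w in enumerate(vocab)}

def bits_per_word(table):
    return max(1, math.ceil(math.log2(len(table))))

def table_hash(table):
    """Digest used to check that sender and receiver tables agree."""
    return hashlib.sha256(",".join(sorted(table)).encode()).hexdigest()

history = ["hi bob please see the draft", "the draft looks fine bob"]
sender = build_table(history)     # built by A from e-mails sent to B
receiver = build_table(history)   # B holds the same e-mails, same table
in_sync = table_hash(sender) == table_hash(receiver)
```

For a vocabulary of a few thousand words, `bits_per_word` comes out to 10 to 12 bits, matching the Table 1 arithmetic of Section 3.5.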
In other words, if A is the sender and B is the receiver, A uses its earlier e-mails to B as the data set over which the A-B vocabulary table is created, and then uses this table for encoding. B, already having the data set (since the e-mails were sent to B), can exactly recreate the table on its side, and hence, decode any compressed data. This essentially eliminates the need for exchanging tables frequently and also takes advantage of changes in vocabulary that might occur based on the recipient. Though the tables are created on both sides implicitly and are synchronized in most cases, a backup mechanism to explicitly synchronize the tables is also needed. The synchronization action is triggered by a mismatch of the table hashes on the two sides; the hash is sent along with each new e-mail and updated when the table changes.

4.4 A3 Point Solution—A3

While the A3 deployment model assumed so far is a platform model requiring participation by A3-enabled devices at both the client and server ends, in this section, we describe how A3 can be used as a point solution, albeit with somewhat limited capabilities. We refer to the point-solution version of A3 as A3. Of the five design elements in A3, the only design element for which the platform model is mandatory is the application-aware encoding mechanism. Since compression or encoding is an end-to-end process, A3 cannot be used with AE. However, each of the other four principles can be employed with minimal changes in A3. TP involves the generation of predictive data requests, and hence, can be performed in A3 as long as the application server can accept multiple simultaneous requests. For CIFS and HTTP, the servers do accept simultaneous requests. IB is purely a flow-control avoidance mechanism and can be realized in A3. RAR involves redundant transmissions of messages, and hence, can be implemented in A3 as long as application servers are capable of filtering duplicate messages.
If the application servers are not capable of doing so (e.g., HTTP servers, which would respond to each request), the redundant transmissions have to be performed at the granularity of transport layer segments as opposed to application layer messages, since protocols such as TCP provide redundant packet filtering. Finally, PF can be accomplished in A3 in terms of classifying requests and treating the requests differently. However, the slow fetching of data not required immediately has to be realized through coarser receiver-based mechanisms such as delayed requests, as opposed to the best possible strategy of slowing down responses as in A3.

5 EVALUATION

In this section, we evaluate the performance of A3. The evaluation is performed with application-specific traffic generators, which are modeled based on traffic traces generated by the IxChariot emulator and on documented standards for the application protocols. Since each application protocol in the study has various software implementations, and different implementations may differ in certain aspects of the protocol standards, we believe that such simulations with abstracted traffic generators can help capture the trend of the performance enhancement delivered by A3. In addition to the emulation, we also build a proof-of-concept prototype of A3. The prototype implements all five of the A3 design principles and works with the following applications: Secure Copy (SCP), Internet Explorer, Samba, and SendMail. The primary goal of building such a prototype is to prove that the proposed A3 architecture does indeed work with real applications. We also use the prototype to obtain system-level insights into the A3 implementation.

Fig. 9. Application-aware encoding.
5.1 Setup

5.1.1 Emulation

The experimental setup for the emulation is shown in Fig. 10. The setup consists of three desktop machines running the Fedora Core 4 operating system with the Linux 2.6 kernel. All the machines are connected using a 100 Mbps LAN. An application-emulator (AppEm) module runs on both end machines. The AppEm module is a custom-built user-level module that generates traffic patterns and content for three different application protocols: CIFS, SMTP, and HTTP. The AppEm module generates traffic content based on both real-life input data sets (for e-mail and Web content) and random data sets (file transfer).6 The traffic patterns shown in Fig. 2 are representative of the traffic patterns generated by AppEm. The system connecting the two end systems runs the emulators for both A3 (A3-Em) and the wireless network (WNetEm). Both emulators are implemented within the framework of the ns2 simulator, with ns2 running in its emulation mode. Running ns2 in its emulation mode allows for the capture and processing of live network traffic. The emulator object in ns2 taps directly into the device driver of the interface cards to capture and inject real packets into the network. All five A3 mechanisms are implemented in the A3-Em module, and each mechanism can be enabled either independently or in tandem with the other mechanisms. The WNetEm module is used for emulating different wireless network links representing the WLAN, WWAN, and SAT environments. The specific characteristics used to represent the wireless network environments are the same as those presented in Section 2. The primary metrics monitored are throughput, response time (for HTTP), and confidence intervals for the throughput and response time. Each data point is the average of 20 simulation runs, and in addition, we show the 90 percent confidence intervals. The results of the evaluation study are presented in two stages.
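The 90 percent confidence intervals can be computed from the repeated runs in the standard way; the paper does not state its exact method, so the normal approximation with z = 1.645 below is an assumption on our part.

```python
import math
import statistics

def ci90(samples):
    """Mean and 90 percent confidence half-width over repeated runs
    (normal approximation, z = 1.645)."""
    n = len(samples)
    mean = statistics.mean(samples)
    half = 1.645 * statistics.stdev(samples) / math.sqrt(n)
    return mean, half

throughputs = [4.1, 3.9, 4.3, 4.0, 4.2]   # made-up run results, in Mbps
mean, half = ci90(throughputs)             # plot as mean +/- half
```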
We first present the results of the performance evaluation of the A3 principles in isolation. Then, we discuss the combined performance improvements delivered by A3.

5.1.2 Proof-of-Concept Prototype

The prototype runs on a testbed consisting of five PCs connected in a linear topology. The first four PCs run Fedora Core 5 with the 2.6.15 Linux kernel. The fifth PC dual-boots Fedora Core 5 and Windows 2000. All machines are equipped with a 1 GHz CPU and 256 MB of memory. The implementation utilizes the NetFilter [7] utility for the Linux platform. The first PC works as the application server for SMTP (SendMail server), CIFS (Samba server), and SCP (SCP server). The A3 server module and client module are installed on the second and fourth PCs, respectively. The third PC works as a WAN emulator, and the fifth PC has the e-mail client, smbclient, SCP client, and Internet Explorer running on it. The prototype implementation makes use of two NetFilter libraries: libnfnetlink (version 0.0.16) and libnetfilter_queue (version 0.0.12) [7]. The registered hook point that we use is NF_IP_FORWARD. For all the hooks, the registered target is NF_QUEUE, which queues packets and allows user-space processing. After appropriate processing, the queued packets are passed, dropped, or altered by the modules.

5.2 Transaction Prediction

We use CIFS as the application traffic for evaluating the performance of Transaction Prediction. The results of the TP evaluation are shown in Fig. 11. The x-axis of each graph shows the size of the transferred file in megabytes and the y-axis the application throughput in megabits per second. The results show several trends:

1. Using wireless-aware transport layer protocols (such as ELN, WTCP, and STP), the increase in throughput is negligible. This trend is consistent with the results in Section 2.

2. Transaction Prediction improves CIFS application throughput significantly.
In the SAT network, for instance, TP improves CIFS throughput by more than 80 percent when transferring a 10-Mbyte file.

3. The improvement achieved by TP increases with the file size. This is because TP eliminates more request-response interactions as the file size increases.

4. TP achieves the highest improvement in the SAT network.

5.2.1 Proof-of-Concept Results

Fig. 10. Simulation network.

6. While the IxChariot emulator can generate representative traffic traces, it does not allow for specific data sets to be used for the content, and hence the need for the custom-built emulator.

The prototype works with an smbclient and a Samba server. The scenario considered in the prototype is that of the smbclient requesting files of various sizes from the Samba server. The CIFS implementations work with the SMB protocol running directly above TCP (i.e., by connecting to port 445 of the Samba server) instead of over NetBIOS sessions. The Samba server version is 3.0.23 and the smbclient version is 2.0.7. One of the nontrivial issues faced while implementing the prototype is TCP sequence manipulation. The issue is caused by the TP acceleration requests generated by A3. SMB sessions use TCP to request and send data; thus, the Samba server always expects TCP packets with correct TCP sequence numbers. An acceleration request has to predict not only the block offset for the SMB session but also the TCP sequence numbers, failing which the Samba server would see a TCP packet with an incorrect TCP sequence number and behave unexpectedly. The prototype implementation addresses this problem by also keeping track of the TCP state and using the appropriate TCP sequence numbers. This is an indication of the application awareness
of A3 potentially needing to be extended to include transport layer awareness as well. Since some of the principles in A3 are directly transport layer dependent (e.g., infinite buffering), we believe that this extension still falls within the scope of A3. The proof-of-concept results are shown in Fig. 13a. TP delivers more improvement for larger files, and the throughput improvement achieved when requesting a 5-Mbyte file is up to 500 percent.

5.3 Redundant and Aggressive Retransmissions

We evaluate the effectiveness of RAR using the CIFS application protocol. The results of the RAR evaluation are presented in Fig. 12. The x-axis in the graphs is the requested file size in megabytes and the y-axis is the CIFS application throughput in megabits per second. We observe that RAR delivers better performance than both TCP-NewReno and the tailored transport protocols, delivering up to an 80 percent improvement in throughput performance for SATs. RAR reduces the chances of experiencing a time-out when a wireless packet loss occurs, and this reduction of TCP time-outs leads to better performance.

5.3.1 Proof-of-Concept Results

The prototype implementation of RAR includes components for performing the following functions: recognizing session control messages, retransmission control, and redundancy removal. On the sender side, the retransmission control component maintains current RTT values, sets timers, and retransmits possibly lost messages when the timers expire. The transmission of the redundant messages is done using raw sockets. On the receiver side, the redundancy removal component identifies redundant messages when a retransmission is a false alarm, i.e., the original message being retransmitted was not lost, but RAR aggressively retransmitted it anyway. The prototype is built with a SendMail server and an e-mail client.
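The redundancy removal component reduces to duplicate suppression keyed on a message identifier; the identifier abstraction is ours, since the prototype operates on raw packets.

```python
# Receiver-side redundancy removal for RAR: the first copy of a message is
# delivered; later copies (redundant transmissions, or aggressive
# retransmissions that were false alarms) are dropped.

class RedundancyFilter:
    def __init__(self):
        self.seen = set()

    def accept(self, msg_id) -> bool:
        if msg_id in self.seen:
            return False          # duplicate: suppress
        self.seen.add(msg_id)
        return True               # first arrival: deliver
```

A production version would also age entries out of `seen` once a message's transaction completes, to bound memory use.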
An e-mail of 5.2 KB is sent to the SendMail server over a network with 200 ms RTT, 100 Kbps bandwidth, and varying loss rates. Throughput with and without RAR is shown in Fig. 13b. We observe a 250 percent throughput improvement when the loss rate is 8 percent. More interestingly, the throughput with RAR is not affected much by the loss rate, since RAR effectively hides the losses.

Fig. 11. Simulation results of transaction prediction (CIFS). (a) WLAN. (b) WWAN. (c) SAT.
Fig. 12. Simulation results of redundant and aggressive retransmissions (CIFS). (a) WLAN. (b) WWAN. (c) SAT.
Fig. 13. Prototype results of TP and RAR. (a) TP (Samba server). (b) RAR (SendMail).

5.4 Infinite Buffering
The effectiveness of IB is evaluated using CIFS traffic and the results are shown in Fig. 14. The x-axis is the requested file size in megabytes and the y-axis is the application throughput in megabits per second. We can see that: 1) Transferring larger amounts of data with IB achieves higher throughput. This is because IB helps most during the actual data transfer phase, and does not help when the amount of data to be transferred is less than a few times the BDP of the network. 2) IB performs much better in the SAT network than in the other two networks, delivering almost a 400 percent improvement in performance. Again, the results are as expected because IB's benefits are higher when the BDP of the network is higher.
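The infinite-buffering idea, in which an intermediary absorbs the sender's data into a local "virtual" buffer so the sender is never throttled by the receiver's small TCP window, and then drains the buffer at the receiver's pace, can be sketched as follows. This is a simplified illustration only; the class and method names are ours, not the prototype's.

```python
from collections import deque

class VirtualBuffer:
    """Proxy-side 'infinite' buffer: accept data from the sender at full
    rate, and release it to the receiver only as fast as the receiver's
    advertised window allows."""

    def __init__(self):
        self.chunks = deque()
        self.buffered = 0

    def absorb(self, data: bytes) -> None:
        # Never refuse data: the sender sees an effectively unlimited window.
        self.chunks.append(data)
        self.buffered += len(data)

    def drain(self, receiver_window: int) -> bytes:
        """Hand at most receiver_window bytes to the receiver, e.g., each
        time the receiver's TCP ACKs open its window."""
        out = bytearray()
        while self.chunks and len(out) < receiver_window:
            chunk = self.chunks[0]
            take = min(len(chunk), receiver_window - len(out))
            out += chunk[:take]
            if take == len(chunk):
                self.chunks.popleft()
            else:
                self.chunks[0] = chunk[take:]
        self.buffered -= len(out)
        return bytes(out)
```

The benefit is largest when the bandwidth-delay product exceeds the receiver's buffer, which is why the SAT network gains the most in the results above.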
5.4.1 Proof-of-Concept Results
We choose an SCP implementation to build the prototype of IB. The IB component on the client side provides virtual buffers, i.e., local storage, in user space. It stores data on behalf of the data sender (i.e., the SCP server) and supplies the stored data to the data receiver (i.e., the SCP client) whenever it receives TCP ACKs from it.

In the experiments, a 303-Kbyte file is sent from the SCP server to the SCP client over a network with 100 Mbps bandwidth and varying RTTs. The tests are performed to study the impact of RTT on the performance improvement. The results are shown in Fig. 16a. We see that considerable improvements are achieved by IB. An interesting observation is that IB delivers larger improvements at small RTT values: as the RTT increases, the actual throughput of SCP decreases even with IB enabled.

5.5 Prioritized Fetching
The performance of PF is evaluated with HTTP traffic and the results are shown in Fig. 15. We consider the Top 50 Web sites as representative of typical Web pages, measure their Web characteristics, and use the obtained statistics to generate the workload. The x-axis in the graphs is the requested Web page size in kilobytes and the y-axis is the response time in seconds for the initial screen. The figure shows that as a user accesses larger Web pages, the response-time difference between default content fetching and PF increases. PF consistently delivers a 15 to 30 percent improvement in response time. PF reduces aggressive traffic volumes by deprioritizing the out-of-sequence fetching of offscreen objects. Note that PF, while improving the response time, does not improve raw throughput performance. In other words, only the effective throughput, as experienced by the end user, increases when using PF.
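The prioritization just described, fetch the objects visible on the initial screen first and deprioritize offscreen ones, can be sketched as follows. This is an illustrative on-off model under the assumption that each object's page location is already known; the names and the viewport parameters are ours.

```python
from typing import NamedTuple, List

class PageObject(NamedTuple):
    url: str
    top: int       # object's vertical position on the page (pixels)
    height: int

def fetch_order(objects: List[PageObject], viewport_top: int,
                viewport_height: int) -> List[str]:
    """On-off prioritized fetching: objects intersecting the visible
    screen are fetched first, in page order; offscreen objects follow."""
    def visible(obj):
        return (obj.top < viewport_top + viewport_height and
                obj.top + obj.height > viewport_top)
    onscreen = [o for o in objects if visible(o)]
    offscreen = [o for o in objects if not visible(o)]
    return ([o.url for o in sorted(onscreen, key=lambda o: o.top)] +
            [o.url for o in sorted(offscreen, key=lambda o: o.top)])

def reprioritize(objects: List[PageObject], pending: List[str],
                 new_viewport_top: int, viewport_height: int) -> List[str]:
    """When the user scrolls, recompute priorities for objects
    that have not been fetched yet."""
    remaining = [o for o in objects if o.url in set(pending)]
    return fetch_order(remaining, new_viewport_top, viewport_height)
```

Because only the order of fetches changes, raw throughput is unchanged; what improves is how soon the visible portion of the page completes.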
5.5.1 Proof-of-Concept Results
PF is a client-side solution: it requires no modification at the server side, but requires integration with the application at the client side. In the prototype, we use the WinAPI with Internet Explorer 6.0 on the Windows operating system (running on PC-5).

The PF prototype consists of three main components. The first is location-based object prioritization. The current prototype initially turns off the display option for multimedia objects by changing the associated registry values. After the initial rendering is completed without downloading multimedia objects, it calculates the locations of all the objects. The second is priority-based object fetching and displaying. The current prototype uses a basic on-off model, which fetches the high-priority objects first and then fetches the other objects. If the pixel-size information of an object is inconsistent with the definition in the main document file, the prototype performs a reflow that renders the entire document layout again. The third is reprioritization. When a user moves the current focus in the application window, PF detects the movement of the window and reprioritizes the objects that are supposed to appear in the newly accessed area.

Fig. 14. Emulation results of infinite buffering (CIFS). (a) WLAN. (b) WWAN. (c) SAT.
Fig. 15. Emulation results of prioritized fetching (HTTP). (a) WLAN. (b) WWAN. (c) SAT.
Fig. 16. Prototype results of IB and AE. (a) IB (SCP). (b) AE (SendMail).
The Web clients are connected to the Internet and access two Web sites: www.amazon.com and www.cnn.com. To highlight the features of PF, we show the results for both transferred size and response time for the first screen in Figs. 17a and 17b, respectively. Note that PF reduces the response time by prioritizing data on the Web pages and transferring only the high-priority data first. PF sees about 30 percent improvement on both metrics.

5.6 Application-Aware Encoding
AE is designed primarily to accelerate e-mail delivery using SMTP, and hence, we evaluate the effectiveness of AE for SMTP traffic. In the evaluation, e-mails of sizes ranging from 1 to 10 Kbytes (around 120 to 1,200 words) are used. We show the results in Fig. 18, where the x-axis is the e-mail size in kilobytes and the y-axis is the application throughput in megabits per second. Varying degrees of throughput improvement are achieved; in the WWAN, an increase of 80 percent is observed when transferring a 10-Kbyte e-mail. AE achieves the highest improvement in the WWAN due to its relatively low bandwidth.

We also show the effectiveness of AE in terms of compression ratio in Fig. 20, which presents the results for 10 persons' e-mails under three compression schemes (WinRAR, WinZip, and AE). WinRAR and WinZip can compress an e-mail by a factor of 2 to 3, while AE achieves a compression ratio of about 5.

5.6.1 Proof-of-Concept Results
The prototype of AE maintains a coding table on each side, and the two tables are synchronized in order to provide the encoding and decoding functions. AE monitors the DATA message in the SMTP protocol to locate the e-mail contents. The e-mail content is textual in nature and is expressed using the US-ASCII standard. AE uses the Extended ASCII codes to provide encoding. We employ a simplified Huffman-style coding mechanism to keep the operational complexity low. The total coding space size is 5,008.
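As a toy illustration of such table-driven encoding, the sketch below replaces table words with short escape-tagged codes. It deliberately simplifies the prototype: the table is a hardcoded handful of words rather than 5,008 user-derived entries, and fixed two-character tags stand in for the Huffman-style variable-length codes and bit padding.

```python
# Toy coding table shared (and kept synchronized) by client and server.
# The prototype builds its table from the user's own e-mail vocabulary;
# here a few words are hardcoded purely for illustration.
CODING_TABLE = {"meeting": 0, "schedule": 1, "tomorrow": 2, "regards": 3}
REVERSE_TABLE = {v: k for k, v in CODING_TABLE.items()}
ESC = "\x1b"   # escape marker introducing a coded word

def encode(text: str) -> str:
    """Replace each word found in the coding table with ESC + its code;
    uncovered words stay unchanged in their ASCII representation."""
    out = []
    for word in text.split(" "):
        if word in CODING_TABLE:
            out.append(ESC + chr(CODING_TABLE[word]))
        else:
            out.append(word)
    return " ".join(out)

def decode(text: str) -> str:
    """Invert encode() using the synchronized reverse table."""
    out = []
    for token in text.split(" "):
        if token.startswith(ESC):
            out.append(REVERSE_TABLE[ord(token[1])])
        else:
            out.append(token)
    return " ".join(out)
```

Even this crude scheme shortens messages whose vocabulary matches the table, which is the same effect the user-specific coding table exploits.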
The AE component scans an incoming e-mail, and if a word is contained in the coding table, it is replaced by the corresponding tag. If several consecutive words are covered by the coding table, their codes are concatenated, and the necessary padding is added to form full bytes. Words that are not covered by the coding table remain unchanged in their ASCII representations.

Since the e-mail vocabulary of a user may change with time, AE incorporates a table-updating mechanism, performing an update after every 500 e-mails. To maintain table consistency between the client and the server, a table synchronization mechanism is employed. Since a user's e-mail vocabulary is expected to change slowly, AE performs incremental synchronization rather than copying the entire table.

SendMail is used to build the prototype. Purely text-based e-mails of various sizes are sent from an e-mail client to the SendMail server, and the network is configured with 100 ms RTT and 50 Kbps bandwidth. The results are shown in Fig. 16b. Every data point is the average value over five e-mails of similar sizes. The throughput is improved by 80 percent with AE.

5.7 Integrated Performance Evaluation
We now present the combined effectiveness of all applicable principles for the three application protocols, CIFS, SMTP, and HTTP. We employ RAR, TP, and IB on the CIFS traffic in the emulation setup. For SMTP, the RAR, AE, and IB principles are used. For HTTP, the A3 principles applied are RAR, PF, and IB. As expected, the throughput of the applications (CIFS and SMTP) when using the integrated A3 principles is higher than when any individual principle is employed in isolation, while the response time of HTTP is lower than with any individual principle. The results are shown in Fig. 19, with A3 delivering performance improvements of approximately 70, 110, and 30 percent for CIFS, SMTP, and HTTP, respectively.
Fig. 17. Prototype results of prioritized fetching (Internet Explorer). (a) Transferred size. (b) Response time.
Fig. 18. Simulation results of application-aware encoding (SMTP). (a) WLAN. (b) WWAN. (c) SAT.

6 RELATED WORK

6.1 Wireless-Aware Middleware and Applications
The Wireless Application Protocol (WAP) is a protocol developed to allow efficient transmission of WWW content to handheld wireless devices. The transport layer protocols in WAP consist of the Wireless Transaction Protocol and the
Wireless Datagram Protocol, which are designed for use over narrowband bearers in wireless networks and are not compatible with TCP. WAP is highly WWW-centric and does not aim to optimize any of the application behavioral patterns identified earlier in the paper.

Browsers such as Pocket Internet Explorer (PIE) [8] are developed with capabilities that can address resource constraints on mobile devices. However, they do not optimize communication performance, which is the focus of A3.

The work of Mohomed et al. [20] aims to save bandwidth and power by adapting content based on user semantics and contexts. The adaptations, however, are exposed to the end applications and users. This is different from the A3 approach, which is application-transparent.

The Odyssey project [21] focuses on system support for collaboration between the operating system and individual applications by letting both be aware of the wireless environment and thus adapt their behaviors. Comparatively, A3 does not rely on a redesign of the OS or protocol stack for its operation and is totally transparent both to the underlying OS and to the applications.

The Coda file system [25] is based on the Andrew File System (AFS), but supports disconnected operations for mobile hosts. When the client is connected to the network, it hoards files for later use during disconnected operations. During disconnections, Coda emulates the server, serving files from its local cache. Coda's techniques are specific to file systems and require applications to adopt changed semantics for the data that they use.

6.2 Related Design Principles
Several related mechanisms have been proposed in the literature to accelerate applications [14], [15]. We present a few of them here and identify the differences vis-a-vis A3.

1. TP-related: Czerwinski and Joseph [13] propose to "upload" clients' tasks to the server side, thus eliminating many RTTs required by applications like SMTP.
This approach differs from the A3 approach in the application protocols covered and in the overall mechanism.

2. RAR-related: Mechanisms like Forward Error Correction (FEC) use error control coding for digital communication systems. A link-layer retransmission approach to improve TCP performance is proposed in [22]. Another work [28] proposes an aggressive retransmission mechanism that encourages legitimate clients to behave more aggressively in order to fight attacks against servers. Compared to these approaches, A3 applies RAR only to control messages in application protocols, retransmitting a control message when a maintained timer expires. We argued earlier in the paper that protecting control message exchanges is a major factor affecting application performance.

3. PF-related: A tremendous amount of work has been done to improve Web-access performance [24], [8], [6], [11]. The work in [20] proposes out-of-order transmission of HTTP objects above UDP, breaking the in-order delivery of an object. However, unlike the A3 framework, it requires the cooperation of both the client and the server side.

4. IB-related: Mechanisms such as those in [23] and TCP Performance Enhancing Proxies (TCP PEPs) [9] are proposed to shield the undesired characteristics of various networks, particularly wireless networks. IB differs from these approaches in that it aims at fully utilizing the network resources by removing the buffer length constraint. IB specifically applies to applications with bulk data transfer, such as FTP, and is meant to counter the impact of flow control. Some works also observe that applications suffer from poor performance over high-latency links due to flow control; for example, Rapier and Bennett [23] propose changing the SSH implementation to remove the bottleneck caused by the receive buffer.

5.
AE-related: Companies like Converged Access [3] provide application-aware compression solutions by compressing the data of some applications based on priority and application nature. These mechanisms share the property of being application-aware, meaning that only a subset of applications will be compressed.

Fig. 19. Integrated A3 results in WWAN. (a) CIFS. (b) SMTP. (c) HTTP.
Fig. 20. Effectiveness of AE.
However, AE has the additional property of being user-aware, that is, it takes user-specific information into consideration and can thus achieve better performance.

6.3 Commercial WAN Optimizers
Several companies, such as Riverbed [10] and Juniper [4], sell WAN-optimization application-acceleration products. However: 1) almost all the commercial solutions are proprietary; 2) A3 principles such as RAR, IB, AE, and PF are not seen in commercial solutions; and 3) many of the techniques used in commercial solutions, such as bit-level caching and compression, are hardware-based approaches and require large amounts of storage. These properties render the commercial solutions inapplicable for environments where easy deployment is required. A3 is a middleware approach requiring small amounts of storage.

7 CONCLUSIONS AND DISCUSSION
In this paper, we motivate the need for application acceleration for wireless data networks and present the A3 solution, which is application-aware but application-transparent. We discuss a few remaining issues in the rest of this section.

7.1 Insights into A3 Principles
In this work, we present a set of five A3 principles. We realize that this set is not an exhaustive set of all A3 principles. Hence, we further explore the design space of a general application acceleration framework. Specifically, we argue that the general A3 framework consists of at least five orthogonal dimensions of principles, namely Provisioning, Protocol Optimization, Prediction, Compression, and QoS. In this context, RAR and IB belong to the Protocol Optimization dimension, TP belongs to the Prediction dimension, AE belongs to the Compression dimension, and PF belongs to the QoS dimension. More principles, as well as more dimensions, are left as part of our future work. The principles of RAR, IB, and AE are application independent, meaning that they can be used to accelerate any application, while PF and TP are application specific and can only help certain applications.
We believe that such classifications can help gain more insight into the A3 design so that the A3 principles can be incorporated into the design of new applications.

7.2 TP versus Opportunistic Prefetching
TP is designed to perform deterministic prefetching rather than opportunistic prefetching. Opportunistic prefetching techniques aggressively request data that might be used by the end user in the future. For example, some Web-access products (e.g., Web browsers) prefetch data by requesting Web content based on hints. Our design goal for TP is deterministic prefetching, since opportunistic prefetching incurs the overhead of requesting unnecessary content.

Ensuring deterministic prefetching is nontrivial. We now present several approaches to this problem and will explore others in future work. One simple approach is to apply TP only to file transfer operations, where users always request a file in its entirety. A second approach is to let TP be fully aware of the application software being accelerated and prefetch only data that are definitely needed. In other words, TP can be designed to be sufficiently intelligent to recognize specific application implementations and avoid unnecessary data fetching. For example, the CIFS protocol has various software implementations; some support range-locking functions while others do not. If TP is aware of these differences, it can act accordingly to ensure deterministic behavior. The downside of this approach is the design overhead required for such intelligence: in practical deployment, the overhead is affordable only if the benefits gained are larger than its cost. An alternative approach is to relax the strictness of deterministic prefetching by tolerating some degree of opportunistic prefetching.
The corresponding solution is "constrained acceleration." With constrained acceleration, instead of prefetching the entire file, TP prefetches a "chunk," which is larger than a block. Thus, even if some portion of the prefetched chunk is not used, the cost is constrained. The chunk size is defined by an acceleration degree, the design of which requires further work. In our proof-of-concept prototype, we adopted such an approach with a fixed acceleration degree.

7.3 Preliminary Complexity Analysis
One of the important issues when considering the deployment of a technique is its complexity. A3 can be deployed and realized in multiple ways. For instance, it can be realized in either user space or kernel space, and can be deployed as either a full platform model or a point model (i.e., A3). Different deployment or realization models are associated with different degrees of complexity and performance trade-off. We now perform a preliminary complexity analysis in terms of lines of code, memory usage, and computation overhead. Our prototype implements the A3 framework in user space and is deployed as a platform solution. 1) The prototype is implemented in about 4.5 K lines of C code. Specifically, PF and TP each have about 1,000 lines of code, and the other elements each have about 600 to 900 lines. 2) The memory usage varies across the A3 elements. TP uses more memory than the other elements since it needs to temporarily hold the returned data corresponding to the accelerated requests; its memory footprint is a function of the acceleration degree and the receiver's consumption rate. IB also stores application data temporarily to compensate for the receiver's TCP buffer, and its memory footprint depends on the receiver buffer size and the receiver's reading rate. AE needs to allocate space to store the coding table, so its memory usage is proportional to the table size.
RAR and PF use relatively less memory than the other three elements since they maintain little application data and state. 3) In terms of computation overhead, we observe little change in CPU usage when running the prototype. AE uses relatively more CPU since it performs data compression. For PF, the CPU usage is higher when the user scrolls up or down, since PF then needs to reprioritize the objects.
ACKNOWLEDGMENTS
An earlier version of this paper appeared at ACM MobiCom 2006 [29].

REFERENCES
[1] CIFS: A Common Internet File System, http://www.microsoft.com/mind/1196/cifs.asp, 2009.
[2] Comscore Media Metrix Top 50 Online Property Ranking, http://www.comscore.com/press/release.asp?press=547, 2009.
[3] Converged Access WAN Optimization, http://www.convergedaccess.com/, 2008.
[4] Juniper Networks, http://www.juniper.net/, 2009.
[5] Linux Magazine, http://www.linux-magazine.com/issue/15/, 2009.
[6] Minimo, a Small, Simple, Powerful, Innovative Web Browser for Mobile Devices, http://www.mozilla.org/projects/minimo/, 2009.
[7] Netfilter Project, http://www.netfilter.org/, 2009.
[8] Pocket Internet Explorer, http://www.microsoft.com/windowsmobile/, 2009.
[9] RFC 3135: Performance Enhancing Proxies Intended to Mitigate Link-Related Degradations, http://www.ietf.org/rfc/rfc3135.txt, 2009.
[10] Riverbed Technology, http://www.riverbed.com/, 2009.
[11] T. Armstrong, O. Trescases, C. Amza, and E. de Lara, "Efficient and Transparent Dynamic Content Updates for Mobile Clients," Proc. Fourth Int'l Conf. Mobile Systems, Applications and Services (MobiSys '06), pp. 56-68, 2006.
[12] H. Balakrishnan and R. Katz, "Explicit Loss Notification and Wireless Web Performance," Proc. IEEE Conf. Global Comm. (GLOBECOM '98) Global Internet, Nov. 1998.
[13] S. Czerwinski and A. Joseph, "Using Simple Remote Evaluation to Enable Efficient Application Protocols in Mobile Environments," Proc. First IEEE Int'l Symp. Network Computing and Applications, 2001.
[14] E. de Lara, D. Wallach, and W.
Zwaenepoel, "Puppeteer: Component-Based Adaptation for Mobile Computing (Poster Session)," SIGOPS Operating Systems Rev., vol. 34, no. 2, p. 40, 2000.
[15] E. de Lara, D.S. Wallach, and W. Zwaenepoel, "HATS: Hierarchical Adaptive Transmission Scheduling," Proc. Multimedia Computing and Networking Conf. (MMCN '02), 2002.
[16] T. Henderson and R. Katz, "Transport Protocols for Internet-Compatible Satellite Networks," IEEE J. Selected Areas in Comm. (JSAC '99), vol. 17, no. 2, pp. 345-359, Feb. 1999.
[17] H.-Y. Hsieh, K.-H. Kim, Y. Zhu, and R. Sivakumar, "A Receiver-Centric Transport Protocol for Mobile Hosts with Heterogeneous Wireless Interfaces," Proc. ACM MobiCom, pp. 1-15, 2003.
[18] IXIA, http://www.ixiacom.com/, 2009.
[19] A. Kuzmanovic and E.W. Knightly, "TCP-LP: Low-Priority Service via End-Point Congestion Control," IEEE/ACM Trans. Networking, vol. 14, no. 4, pp. 739-752, Aug. 2006.
[20] I. Mohomed, J.C. Cai, S. Chavoshi, and E. de Lara, "Context-Aware Interactive Content Adaptation," Proc. Fourth Int'l Conf. Mobile Systems, Applications and Services (MobiSys '06), pp. 42-55, 2006.
[21] B.D. Noble, M. Satyanarayanan, D. Narayanan, J.E. Tilton, J. Flinn, and K.R. Walker, "Agile Application-Aware Adaptation for Mobility," Proc. 16th ACM Symp. Operating System Principles, 1997.
[22] S. Paul, E. Ayanoglu, T.F.L. Porta, K.-W.H. Chen, K.E. Sabnani, and R.D. Gitlin, "An Asymmetric Protocol for Digital Cellular Communications," Proc. IEEE INFOCOM, vol. 3, p. 1053, 1995.
[23] C. Rapier and B. Bennett, "High Speed Bulk Data Transfer Using the SSH Protocol," Proc. 15th ACM Mardi Gras Conf. (MG '08), pp. 1-7, 2008.
[24] P. Rodriguez, S. Mukherjee, and S. Rangarajan, "Session Level Techniques for Improving Web Browsing Performance on Wireless Links," Proc. 13th Int'l Conf. World Wide Web (WWW '04), pp. 121-130, 2004.
[25] M. Satyanarayanan, J.J. Kistler, P. Kumar, M.E. Okasaki, E.H. Siegel, and D.C.
Steere, "Coda: A Highly Available File System for a Distributed Workstation Environment," IEEE Trans. Computers, vol. 39, no. 4, pp. 447-459, Apr. 1990.
[26] P. Sinha, N. Venkitaraman, R. Sivakumar, and V. Bharghavan, "WTCP: A Reliable Transport Protocol for Wireless Wide Area Networks," Proc. ACM MobiCom, pp. 231-241, 1999.
[27] The Network Simulator ns-2, http://www.isi.edu/nsnam/ns, 2009.
[28] M. Walfish, H. Balakrishnan, D. Karger, and S. Shenker, "DoS: Fighting Fire with Fire," Proc. Fourth ACM Workshop Hot Topics in Networks (HotNets '05), 2005.
[29] Z. Zhuang, T.-Y. Chang, R. Sivakumar, and A. Velayutham, "A3: Application-Aware Acceleration for Wireless Data Networks," Proc. ACM MobiCom, pp. 194-205, 2006.

Zhenyun Zhuang received the BE degree in information engineering from the Beijing University of Posts and Telecommunications and the MS degree in computer science from Tsinghua University, China. He is currently working toward the PhD degree in the College of Computing at the Georgia Institute of Technology. His research interests are wireless networking, distributed systems, Web-based systems, and application acceleration techniques. He is a student member of the IEEE.

Tae-Young Chang received the BE degree in electronic engineering in 1999 and the MS degree in telecommunication system technology in 2001 from Korea University, and the PhD degree from the School of Electrical and Computer Engineering at the Georgia Institute of Technology in 2008. He works at Xiocom Wireless and his research interests are in wireless networks and mobile computing. He is a student member of the IEEE.

Raghupathy Sivakumar received the BE degree in computer science from Anna University, India, in 1996, and the MS and PhD degrees in computer science from the University of Illinois at Urbana-Champaign in 1998 and 2000, respectively.
Currently, he is an associate professor in the School of Electrical and Computer Engineering at the Georgia Institute of Technology. He leads the GNAN Research Group, where he and his students do research in the areas of wireless networking, mobile computing, and computer networks. He is a senior member of the IEEE.

Aravind Velayutham received the BE degree in computer science and engineering from Anna University, India, in 2002, and the MS degree in electrical and computer engineering from the Georgia Institute of Technology in 2005. Currently, he is the director of development at Asankya, Inc. His research interests are wireless networks and mobile computing. He is a member of the IEEE.