Locality Characteristics of Web Streams Revisited
Aniket Mahanti, Anirban Mahanti, and Carey Williamson
University of Calgary, Canada
International Symposium on Performance Evaluation of Computer and Telecommunication Systems, Philadelphia, 2005
Introduction
• Motivation
• ADF framework
• Objectives
Introduction
• Web caching proxies are an effective means of reducing network traffic
• Web caches are widely deployed by ISPs
• Caches improve performance by exploiting workload characteristics such as locality of reference
• Workload characterisation of locality structure can provide insight into the design and performance of the Web
Motivation
• Locality characteristics can be used by caching policies when deciding whether to evict or retain documents in the cache
• Most prior Web caching work focused on analysing Web streams in isolation
• [Fonseca et al. 2005] proposed a “system level” view, called the ADF framework, for analysing transformations of a Web reference stream
ADF Framework [Fonseca et al. 2005]
[Figure: ADF transformations of request streams along the path Client, Proxy Cache, End Server. A: Aggregation, D: Disaggregation, F: Filtering]
Reference: R. Fonseca et al. (2005), Locality in a Web of Streams, Communications of the ACM, 48(1):82-88.
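The three ADF operators can be read as functions on request streams. The sketch below is illustrative only: the request record `(client, server, doc)`, the round-robin ordering used for aggregation, and the "infinite cache" used for filtering (a request misses only on the first reference to a document) are all hypothetical simplifications, not the models used in the study.

```python
from collections import defaultdict
from itertools import chain, zip_longest

# Hypothetical request record: (client, server, doc)


def aggregate_round_robin(*streams):
    """A: merge several streams, taking one request from each in turn."""
    gap = object()  # placeholder for streams that have run out
    merged = chain.from_iterable(zip_longest(*streams, fillvalue=gap))
    return [req for req in merged if req is not gap]


def filter_infinite_cache(stream):
    """F: an assumed infinite cache forwards only first references (misses)."""
    seen = set()
    misses = []
    for req in stream:
        doc = req[2]
        if doc not in seen:
            seen.add(doc)
            misses.append(req)
    return misses


def disaggregate_by_server(stream):
    """D: split one stream into per-server substreams, preserving order."""
    parts = defaultdict(list)
    for req in stream:
        parts[req[1]].append(req)
    return dict(parts)


# Clients -> (A) proxy -> (F) cache -> (D) end servers
c1 = [("c1", "s1", "a"), ("c1", "s2", "b"), ("c1", "s1", "a")]
c2 = [("c2", "s1", "a"), ("c2", "s2", "c")]
per_server = disaggregate_by_server(filter_infinite_cache(aggregate_round_robin(c1, c2)))
print(per_server)
```

Composing the operators in the order A, F, D mirrors the client-to-server path in the figure; any other cache model can be swapped in for `filter_infinite_cache`.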
Research Objectives
• Study locality properties in Web request streams using the ADF framework:
– What impact do locality characteristics have on caching performance?
– What are the locality characteristics of Web request streams after the aggregation of filtered streams?
Background
• Flow of requests
• Locality of reference
Flow of Requests
[Figure: flow of requests from Clients through Proxy Caches to Servers]
Image reproduced from: R. Fonseca et al. (2003), Locality in a Web of Streams, Technical Report, Department of Computer Science, Boston University.
Locality of Reference
• Popularity: some objects are simply referenced more often than others
…….XABXXCXDXXXEFXX…….
• Temporal locality: references to an object are correlated in time, so they tend to occur in clusters
…….AAHIJAAAUOLYPJKAA…….
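The two example streams can be checked directly. In this minimal sketch each character is treated as one object (an assumption made only for illustration): popularity shows up as a high reference count, while temporal locality shows up as many short gaps between successive references.

```python
from collections import Counter


def popularity(stream):
    """Reference count per object, most popular first."""
    return Counter(stream).most_common()


def inter_reference_gaps(stream, obj):
    """Number of intervening requests between successive references to obj."""
    positions = [i for i, x in enumerate(stream) if x == obj]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


popular_stream = "XABXXCXDXXXEFXX"
temporal_stream = "AAHIJAAAUOLYPJKAA"
print(popularity(popular_stream)[0])               # ('X', 9)
print(inter_reference_gaps(temporal_stream, "A"))  # [0, 3, 0, 0, 7, 0]
```

X accounts for 9 of the 15 requests in the first stream, and most references to A in the second stream follow the previous A immediately.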
Metrics Used
Performance Metrics
• Document hit ratio: percentage of total requests satisfied by the Web proxy cache
• Byte hit ratio: percentage of the total byte volume of data satisfied by the Web proxy cache
Locality Metrics
• Cumulative reference measure: fraction of total requests accounted for by the top 10% most popular documents
• Inter-reference measure: probability that a document is referenced again within M intervening requests (e.g., M = 1000)
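The four metrics can be computed from a request log with a few lines of code. This is a sketch under stated assumptions: the log format `(doc, size_bytes, was_hit)` is hypothetical, "top 10%" is rounded up to at least one document, and the inter-reference measure is taken as the fraction of all references whose next reference to the same document comes within M intervening requests. The exact definitions used in the study may differ.

```python
from collections import Counter


def document_hit_ratio(log):
    """log: list of (doc, size_bytes, was_hit). Percentage of requests that hit."""
    return 100.0 * sum(1 for _, _, hit in log if hit) / len(log)


def byte_hit_ratio(log):
    """Percentage of requested bytes that were served from the cache."""
    total = sum(size for _, size, _ in log)
    return 100.0 * sum(size for _, size, hit in log if hit) / total


def cumulative_reference_measure(stream, top_percent=10):
    """Fraction of all requests that go to the top_percent most popular documents."""
    counts = sorted(Counter(stream).values(), reverse=True)
    k = max(1, (len(counts) * top_percent + 99) // 100)  # assumed: round up
    return sum(counts[:k]) / len(stream)


def inter_reference_measure(stream, m=1000):
    """Assumed definition: fraction of references followed by another reference
    to the same document with at most m intervening requests."""
    next_pos = {}  # doc -> position of its next reference (scanning backwards)
    close = 0
    for i in range(len(stream) - 1, -1, -1):
        doc = stream[i]
        j = next_pos.get(doc)
        if j is not None and j - i - 1 <= m:
            close += 1
        next_pos[doc] = i
    return close / len(stream)


log = [("a", 100, False), ("a", 100, True), ("b", 300, False), ("a", 100, True)]
print(document_hit_ratio(log), byte_hit_ratio(log))
print(cumulative_reference_measure("XABXXCXDXXXEFXX"))
print(inter_reference_measure("AAHIJAAAUOLYPJKAA", m=1))
```

Scanning the stream backwards keeps the inter-reference measure linear in the stream length, which matters for traces with 1.5 million requests.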
Filtering Model
• Model description
• Simulation results
Filtering Model
[Figure: an input stream from Web Clients passes through a filtering Web proxy server; the filtered stream (misses) continues to the Web Servers]
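The filtering model can be sketched as a cache that forwards only its misses. This version assumes an LRU cache with a byte capacity and requests given as hypothetical `(doc, size)` pairs; objects larger than the whole cache are forwarded but never stored.

```python
from collections import OrderedDict


def lru_filter(requests, capacity_bytes):
    """Pass (doc, size) requests through an LRU cache and return the misses,
    i.e. the filtered stream forwarded towards the Web servers."""
    cache = OrderedDict()  # doc -> size, least recently used first
    used = 0
    misses = []
    for doc, size in requests:
        if doc in cache:
            cache.move_to_end(doc)  # hit: refresh recency, nothing forwarded
            continue
        misses.append((doc, size))
        if size > capacity_bytes:
            continue  # larger than the whole cache: forward but do not store
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)  # evict the LRU object
            used -= evicted_size
        cache[doc] = size
        used += size
    return misses


requests = [("a", 40), ("b", 40), ("a", 40), ("c", 40), ("b", 40), ("a", 40)]
print(lru_filter(requests, capacity_bytes=100))
```

The short-range re-reference to `a` is absorbed by the cache, which illustrates how filtering strips temporal locality from the stream that reaches the servers.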
Workload and System Parameters
• WebTraff: synthetic Web proxy workloads
• Two traces differing only in temporal locality
– Trace1 (weak) and Trace2 (strong)
• Trace characteristics:
– 1.5 million requests
– 495,000 unique documents
– 14 GB total bytes of Web content
• Cache replacement policies: LRU, LFU-Aging, GDS, RAND, FIFO
• Cache size: 1 MB – 16 GB
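Of the listed policies, GDS is the least self-explanatory. The sketch below follows GreedyDual-Size as it is usually described: each cached object has a priority H = L + cost/size, the object with the lowest H is evicted, and the inflation value L is raised to the victim's H. The uniform cost of 1 per object (often called GDS(1)) is an assumption; the slides do not state which cost model was simulated.

```python
class GreedyDualSize:
    """Minimal GreedyDual-Size cache sketch with a byte capacity."""

    def __init__(self, capacity_bytes, cost=lambda size: 1.0):
        self.capacity = capacity_bytes
        self.cost = cost          # assumed cost function; 1.0 gives GDS(1)
        self.inflation = 0.0      # L: priority of the most recent victim
        self.h = {}               # doc -> priority H
        self.size = {}            # doc -> size in bytes
        self.used = 0

    def _priority(self, size):
        return self.inflation + self.cost(size) / size

    def request(self, doc, size):
        """Return True on a hit, False on a miss (the object is cached if it fits)."""
        if doc in self.h:
            self.h[doc] = self._priority(self.size[doc])  # refresh H on a hit
            return True
        if size > self.capacity:
            return False  # can never be cached
        while self.used + size > self.capacity:
            victim = min(self.h, key=self.h.get)  # O(n) scan; a heap scales better
            self.inflation = self.h.pop(victim)
            self.used -= self.size.pop(victim)
        self.h[doc] = self._priority(size)
        self.size[doc] = size
        self.used += size
        return False


cache = GreedyDualSize(capacity_bytes=100)
trace = [("a", 50), ("b", 10), ("c", 50), ("b", 10), ("a", 50)]
print([cache.request(doc, size) for doc, size in trace])
```

With uniform cost, small objects get high priorities and tend to stay cached, so eviction is driven largely by object size rather than by recency alone.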
Caching Performance (1 of 3)
[Figure: document hit ratio, Trace1 (weak temporal locality)]
Caching Performance (2 of 3)
[Figure: document hit ratio, Trace2 (strong temporal locality)]
Caching Performance (3 of 3)
[Figure: document hit ratio; panels: Trace1 (weak temporal locality), Trace2 (strong temporal locality)]
Popularity Characteristics
[Figure: cumulative reference measure for the filtered request stream (after the cache); panels: Trace1 (weak temporal locality), Trace2 (strong temporal locality)]
Temporal Locality Characteristics
[Figure: inter-reference measure for the filtered request stream (after the cache); panels: Trace1 (weak temporal locality), Trace2 (strong temporal locality)]
Aggregation Model
• Model description
• Simulation results
Aggregation Model
[Figure: input streams from Web Clients are filtered by child Web proxy servers; the filtered streams are aggregated into one stream at the parent Web proxy server, which sits between the child proxies and the Web Servers]
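The slides do not say how the child proxies' filtered streams are interleaved at the parent. One natural assumption, used in this sketch, is to merge them by request timestamp; each stream is a hypothetical list of `(timestamp, child_id, doc)` records already sorted by time, and `heapq.merge` performs the ordered merge lazily.

```python
import heapq


def aggregate(*filtered_streams):
    """Merge per-child miss streams into the single stream seen by the parent.
    Each stream holds (timestamp, child_id, doc) records sorted by timestamp."""
    return list(heapq.merge(*filtered_streams, key=lambda rec: rec[0]))


child1 = [(1.0, 1, "a"), (4.0, 1, "b")]
child2 = [(2.0, 2, "c"), (3.0, 2, "a")]
print([doc for _, _, doc in aggregate(child1, child2)])
```

Document `a` reaches the parent from both children; how often this happens is exactly what the degree of overlap between child workloads controls.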
System Model and Parameters
• Two-level hierarchical Web proxy configuration
• Aggregated streams from N = 1, 2, 4, 8 child proxies
• Caching policy: LRU at child proxies
• Cache size: 1 MB – 256 MB
• Degree of overlap: no overlap, partial overlap
Popularity Characteristics
[Figure: cumulative reference measure (strong temporal locality); panels: No Overlap, Partial Overlap]
Temporal Locality Characteristics
[Figure: inter-reference measure (strong temporal locality); panels: No Overlap, Partial Overlap]
No Overlap: Temporal Locality
• Temporal locality decreases with increasing N
• The effect is consistent across cache sizes and degrees of temporal locality
• Design of the no-overlap scenario (N = 2):
Child Proxy 1: 1A1, 1U1, 1U2, …, 1U50, 1A1
Child Proxy 2: 2A1, 2U1, 2U2, …, 2U50, 2A1
Aggregated filtered stream: 1A1, 2A1, 1U1, 2U1, …, 1U50, 2U50, 1A1, 2A1
• The aggregated stream has about twice as many intervening documents between successive references to 1A1 (and likewise to 2A1)
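The no-overlap example can be generalised to the N values simulated. This sketch assumes, as the slide's example does, that aggregation interleaves the child streams one request at a time; the stream generator and helper names are hypothetical.

```python
from itertools import chain


def child_stream(proxy, n_unique=50):
    """No-overlap child stream from the example: pA1, pU1, ..., pU50, pA1."""
    unique = [f"{proxy}U{i}" for i in range(1, n_unique + 1)]
    return [f"{proxy}A1"] + unique + [f"{proxy}A1"]


def interleave(streams):
    """Round-robin aggregation of equal-length streams, as in the example."""
    return list(chain.from_iterable(zip(*streams)))


def intervening(stream, doc):
    """Number of requests between the first two references to doc."""
    first = stream.index(doc)
    second = stream.index(doc, first + 1)
    return second - first - 1


for n in (1, 2, 4, 8):
    aggregated = interleave([child_stream(p) for p in range(1, n + 1)])
    print(n, intervening(aggregated, "1A1"))  # 50, 101, 203, 407
```

The gap grows as 51N - 1, so with no shared documents every re-reference is pushed further apart as more child streams are aggregated.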
Partial Overlap: Temporal Locality
• Temporal locality increases with increasing N
• This is due to the 50% overlap among all traces
N = 4
Child Proxy 1: A, B, 1U1, …, 1U50, A, B
Child Proxy 2: A, B, 2U1, …, 2U50, A, B
Child Proxy 3: A, B, 3U1, …, 3U50, A, B
Child Proxy 4: A, B, 4U1, …, 4U50, A, B
Aggregated filtered stream: A, A, A, A, B, B, B, B, 1U1, 2U1, 3U1, 4U1, …, 1U50, 2U50, 3U50, 4U50, A, A, A, A, B, B, B, B
• References to A from the other proxies are clustered together
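The clustering effect can be measured on the partial-overlap example. This sketch again assumes one-at-a-time interleaving of the child streams, and counts the fraction of references to A that are re-referenced within a hypothetical window of m = 10 intervening requests.

```python
from itertools import chain


def child_stream(proxy, n_unique=50):
    """Partial-overlap child stream from the example: A, B, pU1, ..., pU50, A, B."""
    unique = [f"{proxy}U{i}" for i in range(1, n_unique + 1)]
    return ["A", "B"] + unique + ["A", "B"]


def interleave(streams):
    """Round-robin aggregation of equal-length streams, as in the example."""
    return list(chain.from_iterable(zip(*streams)))


def reref_within(stream, doc, m):
    """Fraction of references to doc followed by another within m intervening requests."""
    pos = [i for i, x in enumerate(stream) if x == doc]
    close = sum(1 for a, b in zip(pos, pos[1:]) if b - a - 1 <= m)
    return close / len(pos)


for n in (1, 2, 4, 8):
    aggregated = interleave([child_stream(p) for p in range(1, n + 1)])
    print(n, reref_within(aggregated, "A", m=10))  # 0.0, 0.5, 0.75, 0.875
```

In a single child stream A is never re-referenced within 10 requests, while after aggregation a fraction (N - 1)/N of its references are, because the shared document arrives from every child at nearly the same point.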
Conclusions
• Caching policies should exploit both the temporal correlation and the popularity of documents
• LRU and FIFO exploit temporal locality
• GDS is insensitive to changes in temporal locality
• The structural change in temporal locality for aggregated streams depends on the degree of overlap in the workloads
• These results imply limited benefits from using caching hierarchies