
Graph Stream Algorithms: A Survey ∗

Andrew McGregor†
University of Massachusetts
[email protected]

∗ Database Principles Column. Column editor: Pablo Barceló. Department of Computer Science, University of Chile, Santiago, Chile. E-mail: [email protected]

† Support. The author is supported in part by NSF awards CCF-0953754, IIS-1251110, and CCF-1320719 and a Google Research Award.

ABSTRACT

Over the last decade, there has been considerable interest in designing algorithms for processing massive graphs in the data stream model. The original motivation was two-fold: a) in many applications, the dynamic graphs that arise are too large to be stored in the main memory of a single machine and b) considering graph problems yields new insights into the complexity of stream computation. However, the techniques developed in this area are now finding applications in other areas including data structures for dynamic graphs, approximation algorithms, and distributed and parallel computation. We survey the state-of-the-art results; identify general techniques; and highlight some simple algorithms that illustrate basic ideas.

1. INTRODUCTION

Massive graphs arise in any application where there is data about both basic entities and the relationships between these entities, e.g., web-pages and hyperlinks; neurons and synapses; papers and citations; IP addresses and network flows; people and their friendships. Graphs have also become the de facto standard for representing many types of highly-structured data. However, analyzing these graphs via classical algorithms can be challenging given the sheer size of the graphs. For example, both the web graph and models of the human brain would use around 10^10 nodes and IPv6 supports 2^128 possible addresses.

One approach to handling such graphs is to process them in the data stream model where the input is defined by a stream of data. For example, the stream could consist of the edges of the graph. Algorithms in this model must process the input stream in the order it arrives while using only a limited amount of memory. These constraints capture various challenges that arise when processing massive data sets, e.g., monitoring network traffic in real time or ensuring I/O efficiency when processing data that does not fit in main memory. Related questions that arise include how to trade off size and accuracy when constructing data summaries and how to quickly update these summaries. Techniques that have been developed to reduce the space use have also been useful in reducing communication in distributed systems. The model also has deep connections with a variety of areas in theoretical computer science including communication complexity, metric embeddings, compressed sensing, and approximation algorithms.

The data stream model has become increasingly popular over the last twenty years although the focus of much of the early work was on processing numerical data such as estimating quantiles, heavy hitters, or the number of distinct elements in the stream. The earliest work to explicitly consider graph problems was the influential paper by Henzinger et al. [36] which considered problems related to following paths in directed graphs and connectivity. Most of the work on graph streams has occurred in the last decade and focuses on the semi-streaming model [27, 52]. In this model the data stream algorithm is permitted O(n polylog n) space where n is the number of nodes in the graph. This is because most problems are provably intractable if the available space is sub-linear in n, whereas many problems become feasible once there is memory roughly proportional to the number of nodes in the graph.

In this document we will survey the results known for processing graph streams. In doing so there are numerous goals including identifying the state-of-the-art results for a variety of popular problems and identifying general algorithmic techniques. It will also be natural to discuss some important summary data structures for graphs, such as spanners and sparsifiers. Throughout, we will present various simple algorithms that illustrate basic ideas and would be suitable for teaching in an undergraduate or graduate classroom setting.

Notation. Throughout this document we will use n and m to denote the number of nodes and edges in the graph under consideration. For any natural number k, we use [k] to denote the set {1, 2, . . . , k}. We write a = b ± c to denote b − c ≤ a ≤ b + c. Many of the algorithms are randomized and we refer to events occurring with high probability if the probability of the event is at least 1 − 1/poly(n). We use Õ(·) to indicate that logarithmic factors have been omitted.
Problem             | Insert-Only                              | Insert-Delete                                   | Sliding Window (width w)
Connectivity        | Deterministic [27]                       | Randomized [5]                                  | Deterministic [22]
Bipartiteness       | Deterministic [27]                       | Randomized [5]                                  | Deterministic [22]
Cut Sparsifier      | Deterministic [2, 8]                     | Randomized [6, 31]                              | Randomized [22]
Spectral Sparsifier | Deterministic [8, 46]                    | Randomized, Õ(n^{5/3}) space [7]                | Randomized, Õ(n^{5/3}) space [22]
(2t−1)-Spanners     | O(n^{1+1/t}) space [11, 23]              | Only multiple-pass results known [6]            | O(√(w n^{1+1/t})) space [22]
Min. Spanning Tree  | Exact [27]                               | (1+ε)-approx. [5]; exact in O(log n) passes [5] | (1+ε)-approx. [22]
Unweighted Matching | 2-approx. [27]; 1.58 lower bound [42]    | Only multiple-pass results known [3, 4]         | (3+ε)-approx. [22]
Weighted Matching   | 4.911-approx. [25]                       | Only multiple-pass results known [3, 4]         | 9.027-approx. [22]

Table 1: Single-Pass, Semi-Streaming Results: Algorithms use O(n polylog n) space unless noted otherwise. Results for approximating the frequency of subgraphs are discussed in Section 2.3.

2. INSERT-ONLY STREAMS

In this section, we consider streams consisting of a sequence of unordered pairs e = {u, v} where u, v ∈ [n]. Such a stream,

S = ⟨e1, e2, . . . , em⟩

naturally defines an undirected graph G = (V, E) where V = [n] and E = {e1, . . . , em}. See Figure 1. For simplicity, we will assume that all stream elements are distinct and therefore the resulting graph is not a multigraph.¹ We will also consider weighted graphs where now each element of the stream, (e, w(e)), defines both an edge of the graph and its weight.

¹ Although many of the algorithms discussed immediately extend to the multigraph setting. Other problems such as estimating the number of triangles or distinct paths of length two require new ideas when edges have multiplicity [21, 38].

Figure 1: The graph on four nodes defined by the stream S = ⟨{1, 2}, {2, 3}, {1, 3}, {3, 4}⟩. It will be used to illustrate various definitions in later sections.

2.1 Connectivity, Trees, and Spanners

One of the early motivations for considering the semi-streaming model is that Θ̃(n) space is necessary and sufficient to determine whether a graph is connected. The sufficiency follows from the following simple algorithm that constructs a spanning forest: we maintain a set of edges H and add the next edge in the stream {u, v} to H if there is currently no path from u to v in H.

Spanners. A simple extension of the above algorithm also allows us to approximate the distance between any two nodes by constructing a spanner.

DEFINITION 1 (SPANNER). Given a graph G, we say that a subgraph H is an α-spanner for G if for all u, v ∈ V,

dG(u, v) ≤ dH(u, v) ≤ α · dG(u, v) ,

where dG(·, ·) and dH(·, ·) are lengths of the shortest paths in G and H respectively.

While the connectivity algorithm only added an edge if it did not complete a cycle, the algorithm for constructing a spanner will add an edge if it does not complete a short cycle.

Algorithm 1: Spanner
1  H ← ∅;
2  for each {u, v} ∈ S do
3      If dH(u, v) > α then H ← H ∪ {{u, v}};
4  return H
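The following Python sketch illustrates Algorithm 1; the helper for computing bounded distances and all names are our own and are not from the survey. Taking alpha = float('inf') recovers the spanning-forest algorithm described above, since an edge is then added exactly when its endpoints are not yet connected.

from collections import deque, defaultdict

def bounded_dist(adj, u, v, limit):
    """Length of a shortest u-v path in H if it is at most `limit`, else None."""
    if u == v:
        return 0
    seen = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if seen[x] == limit:
            continue                      # do not explore beyond the distance bound
        for y in adj[x]:
            if y not in seen:
                seen[y] = seen[x] + 1
                if y == v:
                    return seen[y]
                queue.append(y)
    return None

def spanner(stream, alpha):
    """Algorithm 1: keep an edge only if the current distance in H exceeds alpha."""
    adj = defaultdict(set)                # adjacency lists of the stored subgraph H
    H = []
    for u, v in stream:
        if bounded_dist(adj, u, v, alpha) is None:
            H.append((u, v))
            adj[u].add(v)
            adj[v].add(u)
    return H

print(spanner([(1, 2), (2, 3), (1, 3), (3, 4)], alpha=3))   # [(1, 2), (2, 3), (3, 4)]

On the stream of Figure 1 the edge {1, 3} is dropped because a path of length 2 ≤ α already exists in H, exactly as in the discussion above.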
The fact that the resulting graph is an α-spanner follows because for each edge (u, v) ∈ G \ H, there must have already been a path of length at most α in H. Hence, for any path in G of length d, including a shortest path, there is a corresponding path of length at most αd in H. The algorithm needs to store at most O(n^{1+1/t}) edges when α = 2t − 1 for integral t. This follows because the shortest cycle in H has length at least 2t + 1 and any such graph has at most O(n^{1+1/t}) edges [15]. A naive implementation of the above algorithm would be slow and more recent work has focused on developing faster algorithms [11, 23]. Other work [24] has considered constructing (α, β)-spanners where H is required to satisfy

dG(u, v) ≤ dH(u, v) ≤ α · dG(u, v) + β .

Minimum Spanning Tree. Another generalization of the basic connectivity algorithm is to maintain a minimum spanning tree (or spanning forest if the graph is not connected).

Algorithm 2: Minimum Spanning Tree
1  H ← ∅;
2  for each {u, v} ∈ S do
3      H ← H ∪ {(u, v)};
4      If H includes a cycle, remove the largest weight edge in the cycle from H.
5  return H

By using the appropriate data structures, the above algorithm can be implemented such that each update takes O(log n) time [60].
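Below is a naive Python sketch of Algorithm 2, assuming the stream yields weighted edges (u, v, w); it walks the stored forest on every update, so it does not achieve the O(log n) update time of [60], and the helper names are illustrative only.

from collections import defaultdict

def streaming_mst(stream):
    """Algorithm 2 sketch: maintain a minimum spanning forest H of the edges seen so far."""
    adj = defaultdict(dict)                     # adj[u][v] = weight of edge {u, v} in H

    def forest_path(u, v):
        """Unique u-v path in the forest H, as a list of edges, or None."""
        stack, parent = [u], {u: None}
        while stack:
            x = stack.pop()
            if x == v:
                edges, node = [], v
                while parent[node] is not None:
                    edges.append((parent[node], node))
                    node = parent[node]
                return edges
            for y in adj[x]:
                if y not in parent:
                    parent[y] = x
                    stack.append(y)
        return None

    for u, v, w in stream:
        cycle_path = forest_path(u, v)
        if cycle_path is None:                  # no cycle would be created: keep the edge
            adj[u][v] = adj[v][u] = w
            continue
        a, b = max(cycle_path, key=lambda e: adj[e[0]][e[1]])
        if adj[a][b] > w:                       # new edge is lighter than the heaviest
            del adj[a][b], adj[b][a]            # edge on the cycle it would close
            adj[u][v] = adj[v][u] = w
        # otherwise the new edge is itself the heaviest edge in the cycle and is dropped

    return {(min(u, v), max(u, v), w) for u in adj for v, w in adj[u].items()}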
2.2 Graph Sparsification

We next consider constructing graph sparsifiers in the data stream model. Rather than just determining whether a graph is connected, these sparsifiers will allow us to estimate a richer set of connectivity properties such as the size of all cuts in the graph. We will be interested in different types of sparsifier. First, Benczúr and Karger [14] introduced the notion of cut sparsification.

DEFINITION 2 (CUT SPARSIFICATION). We say that a weighted subgraph H is a (1 + ε) cut sparsification of a graph G if

λ_A(H) = (1 ± ε) λ_A(G) , for all A ⊂ V ,   (1)

where λ_A(G) and λ_A(H) are the weights of the cut (A, V \ A) in G and H respectively.

Spielman and Teng [59] introduced the more general notion of spectral sparsification based on approximating the Laplacian of a graph.

DEFINITION 3 (LAPLACIAN). The Laplacian of an undirected weighted graph H = (V, E, w) is a matrix L_H ∈ R^{n×n} where

L_H(i, j) = Σ_{{i,k}∈E} w(i, k)  if i = j,  and  L_H(i, j) = −w(i, j)  otherwise,

and w(i, j) is the weight of the edge between nodes i and j. If there is no such edge, let w(i, j) = 0.

DEFINITION 4 (SPECTRAL SPARSIFICATION). We say that a weighted subgraph H is a (1 + ε) spectral sparsification of a graph G if,

x^T L_H x = (1 ± ε) x^T L_G x , for all x ∈ R^n ,   (2)

where L_G and L_H are the Laplacians of H and G.

Note that if we replace "for all x ∈ R^n" in Equation 2 by "for all x ∈ {0, 1}^n" then we recover Equation 1. Hence, given a spectral sparsification of G, we can approximate the weight of all cuts in G. We can also approximate other "spectral properties" of G including the eigenvalues (via the Courant-Fischer Theorem), the effective resistances in the analogous electrical network, and various properties of random walks. Obviously, any graph G has a spectral sparsifier since G is a spectral sparsifier of itself. What is surprising is that there exists a (1 + ε) spectral sparsifier with at most O(ε^{-2} n) edges [12].

A Simple "Merge and Reduce" Approach. Not only do small spectral sparsifiers exist but they can also be constructed in the semi-streaming model [2, 46]. In this section, we present a simple algorithm that demonstrates the useful "merge and reduce" framework that has been useful for other data stream problems [8].

The following algorithm uses, as a black box, any existing algorithm that returns a (1 + γ) spectral sparsifier. Let A be such an algorithm and let size(γ) be an upper bound on the number of edges in the resulting sparsifier. As mentioned above, we may assume that size(γ) = O(γ^{-2} n). We will also use the following easily verifiable properties of a spectral sparsifier:

• Mergeable: Suppose H1 and H2 are α spectral sparsifiers of two graphs G1 and G2 on the same set of nodes. Then H1 ∪ H2 is an α spectral sparsifier of G1 ∪ G2.

• Composable: If H3 is an α spectral sparsifier for H2 and H2 is a β spectral sparsifier for H1 then H3 is an αβ spectral sparsifier for H1.

The algorithm is based on a hierarchical partitioning of the stream. First we partition the input stream of edges into t = m/size(γ) segments of length size(γ). For simplicity assume that t is a power of two. Let G_i^0 be the graph corresponding to the i-th segment of edges. For j ∈ {1, . . . , log₂ t} and i ∈ {1, . . . , t/2^j}, define

G_i^j = G_{2i−1}^{j−1} ∪ G_{2i}^{j−1} .

For example, if t = 4, we have:

G_1^1 = G_1^0 ∪ G_2^0 ,  G_2^1 = G_3^0 ∪ G_4^0 ,
G_1^2 = G_1^1 ∪ G_2^1 = G_1^0 ∪ G_2^0 ∪ G_3^0 ∪ G_4^0 = G .

For each G_i^j, we define a weighted subgraph H_i^j using the sparsification algorithm A as follows:

H_i^0 = G_i^0  and  H_i^j = A(H_{2i−1}^{j−1} ∪ H_{2i}^{j−1})  for j > 0 .

It follows from the mergeable and composable properties that H_1^{log₂ t} is a (1 + γ)^{log₂ t} sparsifier of G. If we set γ = ε/(2 log₂ t) then this is a (1 + ε) sparsifier. Furthermore, it is possible to compute H_1^{log₂ t} while only storing at most

2 size(γ) log₂ t = O(ε^{-2} n log³ n)

edges at any given time. This is because, as soon as we have constructed H_i^j, we can forget H_{2i−1}^{j−1} and H_{2i}^{j−1}. Hence, at any given time we will only need to store H_i^j for at most two values of i for each j.
for at most two values of i for each j.
returns a random non-zero element from x. Let X ∈
2.3 Counting Subgraphs {1, 2, 3} be determined by picking a random non-zero
element of v and returning the associated value of this
Another problem that has received significant atten-
element. Let Y = 1 if X = 3 and Y = 0 otherwise.
tion is counting the number of triangles, T3 , in a graph.
Note that E [Y ] = T3 /F0 . By an application of the
This is closely related to the transitivity coefficient, the
Chernoff bound, the mean of Õ(−2 (mn/t)) indepen-
fraction of paths of length two that form a triangle, and
dent copies of Y equals (1 ± )T3 /F0 with high proba-
the clustering coefficient, i.e.,
bility. Multiplying this by an approximation of F0 yields
1 X T3 (v) a good estimate of T3 . Note that an earlier algorithm
n v deg(v)
 using similar space was presented by Buriol et al. [18]
2
but the above algorithm has the advantage that it is also
where T3 (v) is the number of triangles that include the applicable in the setting (discussed in a later section)
node v. Both statistics play an important role in the where edges can be inserted and deleted.
analysis of social networks. Unfortunately, it can be
shown that determining whether a graph is triangle-free Extensions and Other Approaches. Pavan et al. [54]
requires Ω(n2 ) space even with a constant number of developed the approach of Buriol et al. such that the de-
passes and more generally, Ω(m/T3 ) space is required pendence on n in the space used became a dependence
for any constant approximation [17]. Hence, research on the maximum degree in the graph, and a tighter anal-
has focused on designing algorithms whose space will ysis is possible. Pagh and Tsourakakis [53] presented
depend on a given lower bound t ≤ T3 . an algorithm based on randomly coloring the nodes and
counting the number of monochromatic triangles exactly.
Vector-Based Approach. A number of approaches for Algorithms have also been developed for the multi-pass
estimating the number of triangles have been based on model including a two-pass algorithm using Õ(m/t1/3 )
reducing the problem to a problem about vectors. Con- space [17] and an O(log n)-pass semi-streaming algo-
sider a vector x indexed by subsets T of [n] of size three. rithm [13]. Kutzkov and Pagh [49] and Jha et al. [37]
Each T represents a triple of nodes and the entry corre- also designed algorithms for estimating the clustering
sponding to T is defined to be, and transitivity coefficients directly.
xT = |{e ∈ S : e ⊂ T }| . Extending an approach used by Jowhari and Ghodsi
[39], another line of work [39, 41, 50] makes clever use
For example, in the stream S corresponding to the of complex-valued hash functions for counting longer
graph in Figure 1, the entries of the vector are: cycles and other subgraphs. Lower bounds for find-
ing short cycles were proved by Feigenbaum et al. [28].
x{1,2,3} = 3, x{1,2,4} = 1, x{1,3,4} = 2, x{2,3,4} = 2 .
Other related work includes approximating the size of
Note that the number of triangles T3 in G equals the cliques [35], independent sets [34], and dense compo-
number entries of x that equal 3. Bar-Yossef et al. [10] nents [9].
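The identity of Lemma 2.1 can be checked directly on the Figure 1 example with the short Python sketch below. It builds the triple-indexed vector x explicitly and uses exact frequency moments; in the streaming algorithm the moments would only be (1 + γ)-approximated in small space, so this is purely an illustration of the vector-based reduction.

from itertools import combinations

def triangle_count_via_moments(n, stream):
    """Build x_T = |{e in S : e subset of T}|, compute F0, F1, F2 exactly,
    and apply Lemma 2.1: T3 = F0 - 1.5*F1 + 0.5*F2."""
    x = {T: 0 for T in combinations(range(1, n + 1), 3)}
    for u, v in stream:
        for w in range(1, n + 1):
            if w != u and w != v:
                x[tuple(sorted((u, v, w)))] += 1
    f0 = sum(1 for c in x.values() if c != 0)
    f1 = sum(x.values())
    f2 = sum(c * c for c in x.values())
    return f0 - 1.5 * f1 + 0.5 * f2

# For the stream of Figure 1: F0 = 4, F1 = 8, F2 = 18, giving T3 = 1 (the triangle {1,2,3}).
print(triangle_count_via_moments(4, [(1, 2), (2, 3), (1, 3), (3, 4)]))   # prints 1.0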
Extensions and Other Approaches. Pavan et al. [54] developed the approach of Buriol et al. such that the dependence on n in the space used became a dependence on the maximum degree in the graph, and a tighter analysis is possible. Pagh and Tsourakakis [53] presented an algorithm based on randomly coloring the nodes and counting the number of monochromatic triangles exactly. Algorithms have also been developed for the multi-pass model including a two-pass algorithm using Õ(m/t^{1/3}) space [17] and an O(log n)-pass semi-streaming algorithm [13]. Kutzkov and Pagh [49] and Jha et al. [37] also designed algorithms for estimating the clustering and transitivity coefficients directly.

Extending an approach used by Jowhari and Ghodsi [39], another line of work [39, 41, 50] makes clever use of complex-valued hash functions for counting longer cycles and other subgraphs. Lower bounds for finding short cycles were proved by Feigenbaum et al. [28]. Other related work includes approximating the size of cliques [35], independent sets [34], and dense components [9].

2.4 Matchings

A matching in a graph G = (V, E) is a subset of edges M ⊆ E such that no two edges in M share an endpoint. Well-studied problems include computing the matching of maximum cardinality or maximum total weight.

Greedy Single-Pass Algorithms. A simple algorithm that returns a 2-approximation for the unweighted problem is the following greedy algorithm.

Algorithm 3: Greedy Matching
1  M ← ∅;
2  for each e ∈ S do
3      If M ∪ {e} is a matching, M ← M ∪ {e};
4  return M

The fact that the algorithm returns a 2-approximation follows from the fact that for every edge {u, v} in a maximum cardinality matching, M must include an edge with at least one of u or v as an endpoint. At present this is the best approximation known for the problem! The strongest known lower bound is e/(e − 1) ≈ 1.58 which also applies when edges are grouped by endpoint [30, 42]. Konrad et al. [47] considered a relaxation of the problem where the edges arrive in a random order and, in this setting, they designed an algorithm that achieved a 1.98-approximation in expectation.

The greedy algorithm can easily be generalized to the weighted case as follows [27, 51]. Rather than only adding an edge if there are no "conflicting" edges, we also add the edge if its weight is at least some factor larger than the weight of the (at most two) conflicting edges and then remove these conflicting edges.

Algorithm 4: Greedy Weighted Matching
1  M ← ∅;
2  for each e ∈ S do
3      Let C = {e′ ∈ M : e′ ∩ e ≠ ∅};
4      If w(e)/w(C) ≥ (1 + γ) then M ← (M ∪ {e}) \ C;
5  return M
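A minimal Python sketch of Algorithm 4 follows, assuming the stream yields weighted edges (u, v, w); the data structure and names are our own. With unit weights and any γ ∈ (0, 1) an edge is added exactly when neither endpoint is matched, so the same code also behaves as Algorithm 3.

def greedy_weighted_matching(stream, gamma):
    """Algorithm 4 sketch: keep edge e if w(e) >= (1 + gamma) * w(C),
    where C is the set of conflicting edges currently in M."""
    match = {}                                     # node -> (edge, weight) for edges in M
    for u, v, w in stream:
        conflicts = []
        for x in (u, v):
            if x in match and match[x] not in conflicts:
                conflicts.append(match[x])
        w_c = sum(cw for _, cw in conflicts)
        if w >= (1 + gamma) * w_c:                 # w_c = 0 means there is no conflict
            for (cu, cv), _ in conflicts:          # remove the conflicting edges
                match.pop(cu, None)
                match.pop(cv, None)
            match[u] = ((u, v), w)
            match[v] = ((u, v), w)
    return {edge for edge, _ in match.values()}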
It would be reasonable to ask why we shouldn't add e if w(e) ≥ w(C), i.e., set γ = 0. However, consider what would happen if the stream consisted of edges

{1, 2}, {2, 3}, {3, 4}, . . . , {n − 1, n}

arriving in that order where the weight of edge {i, i + 1} is 1 + iε for some small value ε > 0. The above algorithm would return the last edge with a total weight of 1 + (n − 1)ε whereas the optimal solution has weight

1 + (n − 1)ε + 1 + (n − 3)ε + 1 + (n − 5)ε + . . . > (n − 1)/2 ,

and hence decreasing ε makes the approximation factor arbitrarily large.

Roughly speaking, the problem with setting γ = 0 is that the weight of the "trail" of edges that are inserted into M but subsequently removed can be much larger than the weight of the final edges in M. By setting γ > 0, we ensure the weights in this trail are geometrically increasing. Specifically, let T_e = C1 ∪ C2 ∪ . . . where C1 is the set of edges removed when e was added to M and C_{i+1} is the set of edges removed when an edge in C_i was added to M. Then, it is easy to show that for any edge e, the total weight of edges in T_e satisfies,

w(T_e) ≤ w(e)/γ .

By a careful charging scheme [51], the weight of the optimal solution can be bounded in terms of the weight of the final edges and the trails:

w(OPT) ≤ (1 + γ) Σ_{e∈M} (w(T_e) + 2w(e)) .

Substituting in the bound on w(T_e) and optimizing over γ yields a 5.828-approximation. The analysis can be extended to sub-modular maximization problems [20].

The above algorithm is optimal among deterministic algorithms that only remember the edges of a valid matching at any time [61]. However, after a sequence of results [25, 26, 62] it is now known how to achieve a 4.91-approximation.

Multiple-Pass Algorithms. The above algorithm can be extended to a multiple-pass algorithm that achieves a (2 + ε)-approximation for weighted matchings. We simply set γ = O(ε) and take O(ε^{-3}) passes over the data where, at the start of a pass, M is initiated to the matching returned at the end of the previous pass.

Guruswami and Onak showed that finding the size of the maximum cardinality matching exactly given p passes requires n^{1+Ω(1/p)}/p^{O(1)} space. No exact algorithm is known with these parameters but it is possible to find an arbitrarily good approximation. Ahn and Guha [3, 4] showed that a (1 + ε)-approximation is possible using O(n^{1+1/p}) space and O(p/ε) passes. They also show a similar result for weighted matching if the graph is bipartite. Their results are based on adapting linear programming techniques and a careful analysis of the intrinsic adaptivity of primal-dual algorithms. In the node arrival setting where edges are grouped by endpoint, Kapralov presented an algorithm that achieved a 1/(1 − 1/√(2πp) + o(1/p))-approximation ratio given p passes. This is achieved by a fractional load balancing approach.

2.5 Random Walks

A random walk in an unweighted graph from a node u ∈ V is a random sequence of nodes v0, v1, v2, . . . where v0 = u and v_i is a random node from the set Γ(v_{i−1}), the neighbors of v_{i−1}. For any fixed positive integer t, we can consider the distribution of v_t ∈ V. Call this distribution μ_t(u).

In this section, we present a semi-streaming algorithm by Das Sarma et al. [56] that returns a sample from μ_t(u). Note that it is trivial to sample from μ_t(u) with t passes; in the i-th pass we randomly select v_i from the neighbors of the node v_{i−1} determined in the previous pass. Das Sarma et al. show that it is possible to reduce the number of passes to O(√t). They also present algorithms that use less space at the expense of increasing the number of passes.

Algorithm. As noted above, it is trivial to perform length t walks in t passes. The main idea of the algorithm is to build up a length t walk by "stitching" together short walks of length √t. Each of these short walks can be constructed in parallel in √t passes and O(n log n) space. However, we will need to be careful to ensure that all the steps of the final walk are independent. Specifically, the algorithm starts as follows:

1. Let T(v) be a node sampled from μ_{√t}(v).

2. Let v = T^k(u) = T(. . . T(T(u)) . . .) where k is maximal such that the nodes in

   U = {u, T(u), T²(u), . . . , T^{k−1}(u)}

   are all distinct and k ≤ √t.

The reason that we insist that the nodes in U are distinct is because otherwise the next steps of the random walk will not be independent of the previous steps. So far we have generated a sample v from μ_ℓ(u) where ℓ = k√t. We then enter the following loop:

3. While ℓ ≤ t:

   (a) If v ∉ U, let v ← T(v), ℓ ← √t + ℓ, U ← U ∪ {v}.

   (b) Otherwise, sample √t edges with replacement incident on each node in U. Find the maximal path from v such that on the i-th visit to node x, we take the i-th edge that was sampled for node x. The path terminates either when a node in U is visited more than √t times or we reach a node that is not in U. Reset v to be the final node of this path and increase ℓ by the length of the path. If we complete the length t random walk during this process we may terminate at this point and return the current node.

4. Perform the remaining O(√t) steps of the walk using the trivial algorithm.

Analysis. First note that |U| is never larger than √t because |U| is only incremented when ℓ increases by at least √t and we know that ℓ ≤ t. The total space required to store the vertices T is Õ(n). When we sample √t edges incident on each node in U, this requires Õ(|U|√t) = Õ(t) space. Hence, the total space is Õ(n + t). For the number of passes, note that when we need to take a pass to sample edges incident on U, we make O(√t) hops of progress because either we reach a node with an unused short walk or the walk uses Ω(√t) sampled edges. Hence, including the O(√t) passes used at the start and end of the algorithm, the total number of passes is O(√t).

3. GRAPH SKETCHES

In this section, we consider dynamic graph streams where edges can be both added and removed. The input is a sequence

S = ⟨a1, a2, . . .⟩  where  a_i = (e_i, Δ_i)

where e_i encodes an undirected edge as before and Δ_i ∈ {−1, 1}. The multiplicity of an edge e is defined as f_e = Σ_{i: e_i = e} Δ_i. For simplicity, we restrict our attention to the case where f_e ∈ {0, 1} for all edges e.

Linear Sketches. An important type of data stream algorithm is the linear sketch. Such algorithms maintain a random linear projection, or "sketch", of the input. We want to be able to a) infer relevant properties of the input from the sketch and b) maintain the sketch in small space. The second property follows from the linearity of the sketch if the dimensionality of the projection is small. Specifically, suppose

f ∈ {0, 1}^(n choose 2)

is the vector with entries equalling the current values of f_e and let A(f) ∈ R^d be the sketch of this vector where we call d the dimensionality of the sketch. Then, when (e, Δ) arrives we can simply update A(f) as follows:

A(f) ← A(f) + Δ · A(i_e)

where i_e is the vector whose only non-zero entry is a "1" in the position corresponding to e. Hence, it suffices to store the current sketch and any random bits needed to compute the projection. The main challenge is therefore to design low dimensional sketches.
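The update rule A(f) ← A(f) + Δ · A(i_e) can be illustrated with a toy Python sketch. The ±1 projection used here is only a stand-in chosen for brevity; real graph sketches use more structured projections (e.g. ℓ0-sampling sketches), so treat this as an illustration of linearity, not of any particular construction from the survey.

import random

class LinearSketch:
    """Toy d-dimensional linear sketch of the edge-indicator vector f. The
    projection matrix has pseudo-random +/-1 entries derived from (row, edge),
    so it is never stored explicitly."""

    def __init__(self, d, seed=0):
        self.d = d
        self.seed = seed
        self.values = [0] * d

    def _entry(self, row, edge):
        rng = random.Random(hash((self.seed, row, edge)))
        return 1 if rng.random() < 0.5 else -1

    def update(self, edge, delta):
        """Process a stream element (e, delta) with delta in {-1, +1}."""
        edge = tuple(sorted(edge))
        for row in range(self.d):
            self.values[row] += delta * self._entry(row, edge)

# Linearity in action: inserting and then deleting an edge restores the sketch.
sk = LinearSketch(d=8)
sk.update((1, 2), +1)
sk.update((1, 2), -1)
assert sk.values == [0] * 8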
Homomorphic Sketches. Many of the graph sketches that have been designed so far are built up from sketches of the rows of the adjacency matrix for the graph G. Specifically, let f^v ∈ {0, 1}^{n−1} be the vector f restricted to coordinates that involve node v. Then, the sketches are formed by concatenating sketches of each f^v, i.e.,

A(f) = A1(f^{v1}) ◦ A2(f^{v2}) ◦ . . . ◦ An(f^{vn}) .

Note that the random projections for different A_i need not be independent but that these sketches can still be updated as before.

The algorithms discussed in subsequent sections all fit the following template. First, we consider a basic algorithm for the graph problem in question. Second, we design sketches A_i such that it is possible to emulate the basic algorithm given only the projections A_i(f^{v_i}). The challenge is to ensure that the sketches are homomorphic with respect to the operations of the basic algorithm, i.e., for each operation on the original graph, there is a corresponding operation on the sketches.

3.1 Connectivity

We start with a simple algorithm for finding a spanning forest of a graph and then show how to emulate this algorithm via sketches.

Basic Non-Sketch Algorithm. The algorithm is based on the following simple O(log n) stage process. In the first stage, we find an arbitrary incident edge for each node. We then collapse each of the resulting connected components into a "supernode". In each subsequent stage, we find an edge from every supernode to another supernode (if one exists) and collapse the connected components into new supernodes. It is not hard to argue that this process terminates after O(log n) stages and that the set of edges used to connect supernodes in the different stages includes a spanning forest of the graph. From this we can obviously deduce whether the graph is connected.

Emulation via Sketches. There are two main steps to constructing the sketches for the connectivity algorithm:

1. An Appropriate Graph Representation. For each node v_i ∈ V, define a vector a^i ∈ {−1, 0, 1}^(n choose 2):

   a^i_{j,k} = 1 if i = j < k and {v_j, v_k} ∈ E;  a^i_{j,k} = −1 if j < k = i and {v_j, v_k} ∈ E;  and a^i_{j,k} = 0 otherwise.

   These vectors then have the useful property that for any subset of nodes {v_i}_{i∈S}, the non-zero entries of Σ_{i∈S} a^i correspond exactly to the edges across the cut (S, V \ S).

2. ℓ0-Sampling via Linear Sketches. As mentioned earlier, the goal of ℓ0-sampling is to take a non-zero vector x ∈ R^d and return a sample j where

   Pr_r[sample equals j] = 1/F0(x) if x_j ≠ 0, and 0 if x_j = 0,

   where F0(x) is the number of non-zero entries of x. A useful feature of existing work [40] on ℓ0 sampling is that it can be performed via linear projections, i.e., for any string r there exists M_r ∈ R^{k×d} such that the sample can be reconstructed from M_r x. For the process to be successful with constant probability, k = O(log² n) suffices. Consequently, given M_r x and M_r y we have enough information to determine a random sample from the set {i : x_i + y_i ≠ 0} since

   M_r(x + y) = M_r x + M_r y .

For example, for the graph in Figure 1, we have

a1 = (  1   1   0   0   0   0 )
a2 = ( −1   0   0   1   0   0 )
a3 = (  0  −1   0  −1   0   1 )
a4 = (  0   0   0   0   0  −1 )

where the entries correspond to the sets {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4} in that order. Note that the non-zero entries of

a1 + a2 = ( 0   1   0   1   0   0 )

correspond to {1, 3} and {2, 3} which are exactly the edges across the cut ({1, 2}, {3, 4}).

The resulting algorithm for connectivity is relatively simple but makes use of linearity in an essential way:

1. In a single pass, compute the sketches: Choose t = O(log n) random strings r1, . . . , rt and construct the ℓ0-sampling projections M_{r_j} a^i for i ∈ [n], j ∈ [t]. Then,

   A_i(f^{v_i}) = M_{r_1} a^i ◦ M_{r_2} a^i ◦ . . . ◦ M_{r_t} a^i .

2. In post-processing, emulate the original algorithm:

   (a) Let V̂ = V be the initial set of "supernodes".

   (b) For j = 1, . . . , t: for each supernode S ∈ V̂, use Σ_{i∈S} M_{r_j} a^i = M_{r_j}(Σ_{i∈S} a^i) to sample an edge between S and another supernode. Collapse the connected supernodes to form a new set of supernodes.

Since each sketch A_i has dimension O(polylog n) and there are n such sketches to be computed, the final connectivity algorithm uses O(n · polylog n) space.
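The key cut property of the vectors a^i used above is easy to verify in code. The following Python sketch rebuilds the Figure 1 example (the function name and data layout are ours); summing the vectors of S = {1, 2} leaves non-zero entries exactly on the cut edges, which is what the ℓ0-sampler is applied to.

from itertools import combinations

def node_vectors(n, edges):
    """Build the vectors a^i of Section 3.1, with coordinates indexed by the
    node pairs {j, k}, j < k, in sorted order."""
    pairs = list(combinations(range(1, n + 1), 2))
    index = {p: t for t, p in enumerate(pairs)}
    a = {i: [0] * len(pairs) for i in range(1, n + 1)}
    for u, v in edges:
        j, k = min(u, v), max(u, v)
        a[j][index[(j, k)]] = 1        # i = j < k
        a[k][index[(j, k)]] = -1       # j < k = i
    return pairs, a

pairs, a = node_vectors(4, [(1, 2), (2, 3), (1, 3), (3, 4)])
total = [a[1][t] + a[2][t] for t in range(len(pairs))]
print([pairs[t] for t, val in enumerate(total) if val != 0])   # [(1, 3), (2, 3)]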
Extensions and Further Work. Note that the above algorithm has O(polylog n) update time but a connectivity query may take Ω(n) time. This was addressed in subsequent work by Kapron et al. [44].

An easy corollary of the above result is that it is also possible to test whether a graph is bipartite. This follows by running the connectivity algorithm on both G and the bipartite double cover of G. The bipartite double cover of a graph is formed by making two copies u1, u2 of every node u of G and adding edges {u1, v2}, {u2, v1} for every edge {u, v} of G. It can be shown that G is bipartite iff the number of connected components in the double cover is exactly twice the number of connected components in G.

3.2 k-Connectivity

We next present an extension to test k-connectivity, i.e., determining whether every cut in the graph includes at least k edges. This algorithm builds upon ideas in the previous section and exploits the linearity of the sketches to an even greater extent.

Basic Non-Sketch Algorithm. The starting point for the algorithm is the following basic k phase algorithm:

1. For i = 1 to k: Let F_i be a spanning forest of (V, E \ ∪_{j=1}^{i−1} F_j).

2. Then (V, F1 ∪ F2 ∪ . . . ∪ Fk) is k-edge-connected iff G = (V, E) is at least k-edge-connected.

The correctness is easy to show by arguing that for any cut, every F_i contains an edge across this cut or F1 ∪ . . . ∪ F_{i−1} already contains all the edges across the cut. Hence, if F1 ∪ F2 ∪ . . . ∪ Fk does not contain all the edges across the cut, it includes at least k of them. We call a set of edges with this property a k-skeleton.

Emulation via Sketches. As with the connectivity algorithm, we compute the entire set of sketches and then emulate the algorithm on the compressed form. The important observation is that if we have computed a sketch A(G), but subsequently need the sketch A(G − F) for some set of edges F we have discovered, then this can be computed as A(G − F) = A(G) − A(F).

1. In a single pass, compute the sketches: Let A^1(G), A^2(G), . . . , A^k(G) be k independent sketches for finding a spanning forest.

2. In post-processing, emulate the original algorithm: For i ∈ [k], construct a spanning forest F_i of (V, E \ (F1 ∪ . . . ∪ F_{i−1})) using

   A^i(G − F1 − F2 − . . . − F_{i−1}) = A^i(G) − Σ_{j=1}^{i−1} A^i(F_j) .

Since computing each spanning forest sketch used O(n · polylog n) space, the total space used by the algorithm for k-connectivity is O(k · n · polylog n).
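The basic non-sketch k phase algorithm is short enough to state directly in Python. The sketch below stores the edges explicitly and peels off spanning forests with a small union-find, assuming the nodes are labelled 1, . . . , n; it only illustrates the k-skeleton construction, not the sketch-based emulation described above.

def k_skeleton(edges, k, n):
    """Return F_1 u ... u F_k, where F_i is a spanning forest of the edges not
    already used. The result is k-edge-connected iff the input graph is."""
    remaining = set(edges)
    skeleton = []
    for _ in range(k):
        parent = {v: v for v in range(1, n + 1)}

        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        forest = []
        for (u, v) in sorted(remaining):
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                forest.append((u, v))
        skeleton.extend(forest)
        remaining -= set(forest)
    return skeleton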
3.3 Min-Cut and Sparsification

In this section we revisit graph sparsification in the context of dynamic graphs. To do this we will need to discuss the offline algorithms for sparsification in more detail.

Sparsification via Sampling. The results in this section are based on the following generic sampling algorithm:

Algorithm 5: Generic Sparsification Algorithm
1  Sample each edge e with probability p_e;
2  Weight each sampled edge e by 1/p_e;

It is obvious that the size of any cut is preserved in expectation by the above process. However, if p_e is sufficiently large it can be shown that a range of properties, including the size of cuts, are approximately preserved with high probability. In particular, Karger [45] showed that for some constant c1, if

p_e ≥ q := min(1, c1 λ^{-1} ε^{-2} log n)

where λ is the size of the minimum cut of the graph then the resulting graph is a cut sparsifier with high probability. Fung et al. [29] strengthened this to show that the sampling probability need only scale with λ_e^{-1} where λ_e is the size of the minimum cut that separates the endpoints of e. Specifically, they showed that for some constant c2, if

p_e ≥ min(1, c2 λ_e^{-1} ε^{-2} log² n)

then the resulting graph is a cut sparsifier with high probability. Spielman and Srivastava [58] showed² that the resulting graph is a spectral sparsifier if

p_e ≥ min(1, c3 r_e ε^{-2} log n)

for some constant c3 where r_e is the effective resistance of e. The effective resistance of an edge {u, v} is the voltage difference that would need to be applied between u and v for 1 amp to flow between u and v in the electrical network formed by replacing each edge by a 1 ohm resistor. The effective resistance r_e is a more nuanced quantity than λ_e in the sense that λ_e only depends on the number of edge-disjoint paths between the endpoints of e whereas the lengths of these paths are also relevant when calculating the effective resistance r_e. However, the two quantities are related by the following inequality [7],

λ_e^{-1} ≤ r_e = O(n^{2/3} λ_e^{-1}) .   (3)

² Note that their result is actually proved for a slightly different sampling with replacement procedure.
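Algorithm 5 itself is a two-liner; the subtlety is only in choosing p_e. The Python sketch below pairs it with Karger's uniform choice of probability; the constant c1, the toy graph, and the value plugged in for the minimum cut λ are placeholders chosen purely for illustration.

import math
import random

def sample_sparsifier(edges, p_fn, rng=random.Random(0)):
    """Algorithm 5: keep each edge e with probability p_e and weight it by 1/p_e,
    so every cut is preserved in expectation."""
    H = []
    for e in edges:
        p = min(1.0, p_fn(e))
        if rng.random() < p:
            H.append((e, 1.0 / p))
    return H

def karger_probability(lam, eps, n, c1=1.0):
    # p_e = min(1, c1 * eps^-2 * log(n) / lambda); with this choice every cut is
    # preserved up to a (1 +/- eps) factor with high probability.
    return c1 * math.log(n) / (eps ** 2 * lam)

edges = [(u, v) for u in range(1, 20) for v in range(u + 1, 20)]   # small complete graph
H = sample_sparsifier(edges, lambda e: karger_probability(lam=18, eps=0.5, n=19))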
Minimum Cut. As a warm-up, we show how to estimate the minimum cut λ of a dynamic graph [6]. To do this we use the algorithm for constructing k-skeletons described in the previous section in conjunction with Karger's sampling result. In addition to computing a skeleton on the entire graph, we also construct skeletons for subsampled graphs. Specifically, let G_i be the graph formed from G by including each edge with probability 1/2^i and let

H_i = skeleton_k(G_i) ,

be a k-skeleton of G_i where k = 3 c1 ε^{-2} log n. Then, for

j = min{i : mincut(H_i) < k} ,

we claim that

2^j mincut(H_j) = (1 ± ε) λ .   (4)

For i ≤ ⌊log₂ 1/q⌋, Karger's result implies that all cuts are approximately preserved and, in particular,

2^i · mincut(H_i) = (1 ± ε) mincut(G_i) .

However, for i = ⌊log₂ 1/q⌋,

E[mincut(H_i)] ≤ 2^{-i} λ ≤ 2qλ ≤ 2 c1 ε^{-2} log n

and hence, by an application of the Chernoff bound, we have that mincut(H_i) < k with high probability. Hence, j ≤ ⌊log₂ 1/q⌋ with high probability and Equation 4 follows.

Sparsification. To construct a sparsifier, the basic idea is to sample edges with probability q_e = min{1, t/λ_e} for some value of t. If t = Θ(ε^{-2} log² n) then the resulting graph is a combinatorial sparsifier by appealing to the aforementioned result of Fung et al. [29]. If t = Θ(ε^{-2} n^{2/3} log n) then the resulting graph can be shown to be a spectral sparsifier by combining Equation 3 with the aforementioned sampling result of Spielman and Srivastava [58]. In this section, we briefly outline how to perform such sampling. We refer the reader to Ahn et al. [6, 7] for details regarding independence issues and how to reweight the edges.

The challenge is that we do not know the values of λ_e ahead of time. To get around this we take a very similar approach to that used above for estimating the minimum cut. Specifically, let G_i be defined as above and let H_i = skeleton_{3t}(G_i). For simplicity, we assume λ_e ≥ t. We claim that

Pr[e ∈ H0 ∪ H1 ∪ . . . ∪ H_{2 log n}] ≥ t/λ_e .

This follows because the above probability is at least Pr[e ∈ H_j] for j = ⌊log(λ_e/t)⌋. But the expected size of the minimum cut separating e = {u, v} in H_j is at most 2t and, appealing to the Chernoff bound, it has size at most 3t with high probability. Hence,

Pr[e ∈ H_j] ≈ Pr[e ∈ G_j]

since H_j was a 3t-skeleton. The claim follows since Pr[e ∈ G_j] ≥ t/λ_e.

4. SLIDING WINDOW

In this section, we consider processing graphs in the sliding window model. In this model we consider an infinite stream of edges ⟨e1, e2, . . .⟩ but at time t we only consider the graph whose edge set consists of the last w edges,

W = {e_{t−w+1}, . . . , e_t} .

We call these the active edges and we will consider the case where w ≥ n. The results in this section were proved by Crouch et al. [22]. Note that some of the sampling-based algorithms for counting small subgraphs are also applicable in this model.

4.1 Connectivity

We first consider testing whether the graph is k-edge-connected for a given k ∈ {1, 2, 3, . . .}. Note that k = 1 corresponds to testing connectivity. To do this, it is sufficient to maintain a set of edges F ⊆ {e1, e2, . . . , et} along with the time-of-arrival toa(e) for each e ∈ F such that for any cut, F contains the most recent k edges across the cut (or all the edges across the cut if there are fewer than k of them). Then, we can easily tell whether the graph of active edges is k-connected by checking whether F would be k-connected once we remove all edges e ∈ F where toa(e) ≤ t − w. This follows because if there are k or more edges among the last w edges across a cut, F will include the k most recent of these edges.

The following simple algorithm maintains the set

F = F1 ∪ F2 ∪ . . . ∪ Fk

where the F_i are disjoint and each is acyclic. We add the new edge e to F1. If it completes a cycle, we remove the oldest edge in this cycle and add that edge to F2. If we now have a cycle in F2, we remove the oldest edge in this cycle and add that edge to F3. And so forth.

Therefore, it is possible to test k-connectivity in the sliding window model using O(kn log n) space. Furthermore, by reducing other problems to k-connectivity, as discussed in the previous sections, this also implies the existence of algorithms for testing bipartiteness and constructing sparsifiers.
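A Python sketch of the cascading update just described follows; the class layout is ours, and answering a query (dropping edges with toa(e) ≤ t − w and checking k-connectivity of what remains) is omitted for brevity.

from collections import defaultdict

class SlidingWindowKConnectivity:
    """Maintain F = F_1 u ... u F_k, each F_i acyclic, so that every cut keeps
    its k most recent edges."""

    def __init__(self, k):
        self.forests = [defaultdict(dict) for _ in range(k)]   # adj[u][v] = time of arrival

    def _path(self, adj, u, v):
        """Unique u-v path in a forest, as a list of edges, or None."""
        stack, parent = [u], {u: None}
        while stack:
            x = stack.pop()
            if x == v:
                out, node = [], v
                while parent[node] is not None:
                    out.append((parent[node], node))
                    node = parent[node]
                return out
            for y in adj[x]:
                if y not in parent:
                    parent[y] = x
                    stack.append(y)
        return None

    def add(self, u, v, toa):
        edge, t = (u, v), toa
        for adj in self.forests:
            path = self._path(adj, edge[0], edge[1])
            adj[edge[0]][edge[1]] = adj[edge[1]][edge[0]] = t
            if path is None:
                return
            # a cycle was created: evict its oldest edge and push it to the next forest
            oldest = min(path + [edge], key=lambda e: adj[e[0]][e[1]])
            t = adj[oldest[0]][oldest[1]]
            del adj[oldest[0]][oldest[1]], adj[oldest[1]][oldest[0]]
            edge = oldest
        # an edge evicted from F_k is discarded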
4.2 Matchings

We next consider the problem of finding large matchings in the sliding-window model. We will focus on the unweighted case and describe a (3 + ε)-approximation. It is also possible to get a 9.027-approximation for the weighted case by combining this algorithm with a randomized rounding technique by Epstein et al. [25].

Algorithm. The approach for estimating the size of the maximum cardinality matching is based on the "smooth histograms" technique of Braverman and Ostrovsky [16].
The algorithm maintains maximal matchings over various "buckets" B1, . . . , Bk where each bucket comprises the edges in some suffix of the stream that have arrived so far. The buckets will always satisfy

B1 ⊇ W ⊋ B2 ⊋ · · · ⊋ Bk   (5)

where W is the set of active edges. Equation 5 implies that

m(B1) ≥ m(W) ≥ m(B2) ≥ . . . ≥ m(Bk) ,

where m(·) denotes the size of the maximum matching on a sequence of edges.

Within each bucket B, we construct a greedy matching M̂(B) whose size we denote by m̂(B). There is potentially a bucket for each of the w suffixes and keeping a matching for each suffix would use too much space. To reduce the space usage, whenever two non-adjacent buckets have greedy matchings whose matching size is within a factor of 1 − β where β = ε/4, we will delete the intermediate buckets. Specifically, when a new edge e arrives, we update the buckets and matchings as follows:

Algorithm 6: Procedure for Updating Buckets
1  Create a new empty bucket B_{k+1};
2  Add e to each M̂(B_i) if possible;
3  for i = 1, . . . , k − 2 do
4      Find the largest j > i such that m̂(B_j) ≥ (1 − β) m̂(B_i); Discard intermediate buckets and renumber;
5  If B2 = W, discard B1. Renumber the buckets;

Analysis. We will prove the invariant that for any i < k,

m̂(B_{i+1}) ≥ m(B_i)/(3 + ε)

or |B_i| = |B_{i+1}| + 1 or both. If |B_i| ≠ |B_{i+1}| + 1, then we must have deleted some bucket B such that B_i ⊋ B ⊋ B_{i+1}. For this to have happened it must have been the case that m̂(B_{i+1}) ≥ (1 − β) m̂(B_i) at the time. The next lemma shows that the optimal matching on the sequence of edges starting with B_i is not significantly larger than the greedy matching we find if we only start with B_{i+1}.

LEMMA 4.1. For any sequence of edges C,

m(B_i C) ≤ (3 + 2β/(1 − β)) m̂(B_{i+1} C) ,

where B_i C is the concatenation of B_i and C.

And hence we currently satisfy:

m(B_i) ≤ (3 + 2β/(1 − β)) m̂(B_{i+1}) ≤ (3 + ε) m̂(B_{i+1}) .

Therefore, either W = B1 and m̂(B1) is a 2-approximation for m(W), or we have

m(B1) ≥ m(W) ≥ m(B2) ≥ m̂(B2) ≥ m(B1)/(3 + ε)

and thus m̂(B2) is a (3 + ε)-approximation of m(W).

The fact that the algorithm does not use too much space follows from the way that the algorithm deletes buckets. Specifically, we ensure that for all i ≤ k − 2 we have m̂(B_{i+2}) < (1 − β) m̂(B_i). Since the maximum matching has size at most n, this ensures that the number of buckets is O(ε^{-1} log n). Hence, the total number of bits used to maintain all k greedy matchings is O(ε^{-1} n log² n).
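The bucket maintenance of Algorithm 6 can be prototyped in a few lines of Python. The sketch below keeps, for each bucket, only its greedy matching size and matched vertices, and performs the pruning of steps 3-4; the window bookkeeping of step 5 (discarding B1 once B2 = W) is omitted, and the names are ours.

class Bucket:
    """Greedy matching over a suffix of the stream, started at position `start`."""
    def __init__(self, start):
        self.start = start
        self.matched = set()             # matched vertices
        self.size = 0                    # m_hat(B): size of the greedy matching

    def offer(self, u, v):
        if u not in self.matched and v not in self.matched:
            self.matched.update((u, v))
            self.size += 1

def process_edge(buckets, index, u, v, beta):
    """One update: open a new bucket, feed the edge to every greedy matching,
    then discard buckets sandwiched between two buckets whose matching sizes
    are within a (1 - beta) factor."""
    buckets.append(Bucket(index))
    for b in buckets:
        b.offer(u, v)
    i = 0
    while i < len(buckets) - 2:
        close = [j for j in range(i + 1, len(buckets))
                 if buckets[j].size >= (1 - beta) * buckets[i].size]
        if close:
            j = max(close)
            del buckets[i + 1:j]         # keep B_i and B_j, drop the buckets in between
        i += 1
    return buckets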
5. CONCLUSIONS AND DIRECTIONS

There is now a large body of work on the design and analysis of algorithms for processing graphs in the data stream model. Problems that have received considerable attention include estimating connectivity properties, approximating graph distances, finding approximate matchings, and counting the frequency of sub-graphs. The resulting algorithms combine existing data stream techniques with ideas from approximation algorithms and graph theory. By both identifying the state-of-the-art results and illustrating some of the techniques behind these results, it is hoped that this survey will be useful to both researchers that may want to use existing algorithms and to those that want to develop new algorithms for different problems.

There are numerous possible directions for future research. Naturally, it would be interesting to improve existing results. For example, does there exist a semi-streaming algorithm for constructing a spectral sparsifier when there are both edge insertions and deletions? What is the optimal approximation ratio for estimating the size and weight of maximum matchings? Other specific questions can be found at the wiki, sublinear.info.

More general, open-ended research directions include:

1. Directed Graphs. Relatively little is known about processing directed graphs and yet many natural graphs are directed. For example, it is known that any semi-streaming algorithm testing s-t connectivity requires Ω(log n) passes [33] but is this number of passes sufficient? If we could estimate the size of flows in directed graphs, this could lead to better algorithms for approximating the size of bipartite matchings.

2. Communication Complexity. The recent results on graph sketching imply surprisingly efficient communication protocols; if the rows of an adjacency matrix are partitioned between n players, then the connectivity properties of the graph can be inferred from a O(polylog n) bit message from each player. In contrast, if the partition of the entries is arbitrary, the players need to send Ω̃(n) on average [55]. What other graph problems can be solved using only short messages? What if each player also knows the neighbors of the neighbors of a node? From a different perspective, establishing reductions from communication complexity problems is a popular approach for proving lower bounds in the data stream model. But less is known about graph stream lower bounds because it is often harder to decompose graph problems into multiple simpler "independent" problems and use existing communication complexity techniques.

3. Stream Ordering. The analysis of stream algorithms has traditionally been "doubly worst case" in the sense that the contents of the stream and the ordering of the stream are both chosen adversarially. If we relax the second assumption and assume that the stream is ordered randomly (or that the stream is stochastically generated), can we design algorithms that take advantage of this? Some recent work is already considering this direction [19, 43, 47]. Alternatively, it may be interesting to further explore the complexity of various graph problems under specific edge orderings, e.g., sorted-by-weight, grouped-by-endpoint, or orderings tailored to the problem at hand [57].

4. More or Less Space. Research focusing on the semi-streaming model has been very fruitful and many interesting techniques have been developed that have had applications beyond stream computation. However, the model itself is not suited to process sparse graphs where m = Õ(n). While many basic problems require Ω(n) space, this does not preclude smaller-space algorithms if we may make assumptions about the input or only need to "property test" the input [32], i.e., we just need to distinguish graphs with a given property from graphs that are "far" from having the property. Do such algorithms exist? Alternatively, what if we are permitted more than Õ(n polylog n) space? Various lower bounds, such as those for approximate unweighted matching, are very sensitive to the exact amount of space available and nothing is known if we may use O(n^{1.1}) space for example.

Acknowledgements. Thanks to Graham Cormode, Sagar Kale, Hoa Vu, and an anonymous reviewer for numerous helpful comments.

6. REFERENCES

[1] K. J. Ahn. Analyzing massive graphs in the semi-streaming model. PhD thesis, University of Pennsylvania, Philadelphia, Pennsylvania, Jan. 2013.
[2] K. J. Ahn and S. Guha. Graph sparsification in the semi-streaming model. In International Colloquium on Automata, Languages and Programming, pages 328–338, 2009.
[3] K. J. Ahn and S. Guha. Access to data and number of iterations: Dual primal algorithms for maximum matching under resource constraints. CoRR, abs/1307.4359, 2013.
[4] K. J. Ahn and S. Guha. Linear programming in the semi-streaming model with application to the maximum matching problem. Inf. Comput., 222:59–79, 2013.
[5] K. J. Ahn, S. Guha, and A. McGregor. Analyzing graph structure via linear measurements. In ACM-SIAM Symposium on Discrete Algorithms, pages 459–467, 2012.
[6] K. J. Ahn, S. Guha, and A. McGregor. Graph sketches: sparsification, spanners, and subgraphs. In ACM Symposium on Principles of Database Systems, pages 5–14, 2012.
[7] K. J. Ahn, S. Guha, and A. McGregor. Spectral sparsification of dynamic graph streams. In International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, 2013.
[8] M. Badoiu, A. Sidiropoulos, and V. Vaikuntanathan. Computing s-t min-cuts in a semi-streaming model. Manuscript.
[9] B. Bahmani, R. Kumar, and S. Vassilvitskii. Densest subgraph in streaming and mapreduce. PVLDB, 5(5):454–465, 2012.
[10] Z. Bar-Yossef, R. Kumar, and D. Sivakumar. Reductions in streaming algorithms, with an application to counting triangles in graphs. In ACM-SIAM Symposium on Discrete Algorithms, pages 623–632, 2002.
[11] S. Baswana. Streaming algorithm for graph spanners - single pass and constant processing time per edge. Inf. Process. Lett., 106(3):110–114, 2008.
[12] J. D. Batson, D. A. Spielman, and N. Srivastava. Twice-ramanujan sparsifiers. SIAM J. Comput., 41(6):1704–1721, 2012.
[13] L. Becchetti, P. Boldi, C. Castillo, and A. Gionis. Efficient algorithms for large-scale local triangle counting. TKDD, 4(3), 2010.
[14] A. A. Benczúr and D. R. Karger. Approximating s-t minimum cuts in Õ(n²) time. In ACM Symposium on Theory of Computing, pages 47–55, 1996.
[15] B. Bollobás. Extremal Graph Theory. Academic Press, New York, 1978.
[16] V. Braverman and R. Ostrovsky. Smooth histograms for sliding windows. In IEEE Symposium on Foundations of Computer Science, pages 283–293, 2007.
[17] V. Braverman, R. Ostrovsky, and D. Vilenchik. How hard is counting triangles in the streaming model? In International Colloquium on Automata, Languages and Programming, pages 244–254, 2013.
[18] L. S. Buriol, G. Frahling, S. Leonardi, A. Marchetti-Spaccamela, and C. Sohler. Counting triangles in data streams. In ACM Symposium on Principles of Database Systems, pages 253–262, 2006.
[19] A. Chakrabarti, G. Cormode, and A. McGregor. Robust lower bounds for communication and stream computation. In ACM Symposium on Theory of Computing, pages 641–650, 2008.
[20] A. Chakrabarti and S. Kale. Submodular maximization meets streaming: Matchings, matroids, and more. CoRR, arXiv:1309.2038, 2013.
[21] G. Cormode and S. Muthukrishnan. Space efficient mining of multigraph streams. In ACM Symposium on Principles of Database Systems, pages 271–282, 2005.
[22] M. S. Crouch, A. McGregor, and D. Stubbs. Dynamic graphs in the sliding-window model. In European Symposium on Algorithms, pages 337–348, 2013.
[23] M. Elkin. Streaming and fully dynamic centralized algorithms for constructing and maintaining sparse spanners. ACM Transactions on Algorithms, 7(2):20, 2011.
[24] M. Elkin and J. Zhang. Efficient algorithms for constructing (1 + ε, β)-spanners in the distributed and streaming models. Distributed Computing, 18(5):375–385, 2006.
[25] L. Epstein, A. Levin, J. Mestre, and D. Segev. Improved approximation guarantees for weighted matching in the semi-streaming model. SIAM J. Discrete Math., 25(3):1251–1265, 2011.
[26] L. Epstein, A. Levin, D. Segev, and O. Weimann. Improved bounds for online preemptive matching. In STACS, pages 389–399, 2013.
[27] J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. On graph problems in a semi-streaming model. Theoretical Computer Science, 348(2-3):207–216, 2005.
[28] J. Feigenbaum, S. Kannan, A. McGregor, S. Suri, and J. Zhang. Graph distances in the data-stream model. SIAM Journal on Computing, 38(5):1709–1727, 2008.
[29] W. S. Fung, R. Hariharan, N. J. A. Harvey, and D. Panigrahi. A general framework for graph sparsification. In ACM Symposium on Theory of Computing, pages 71–80, 2011.
[30] A. Goel, M. Kapralov, and S. Khanna. On the communication and streaming complexity of maximum bipartite matching. In ACM-SIAM Symposium on Discrete Algorithms, pages 468–485, 2012.
[31] A. Goel, M. Kapralov, and I. Post. Single pass sparsification in the streaming model with edge deletions. CoRR, abs/1203.4900, 2012.
[32] O. Goldreich. Introduction to testing graph properties. In O. Goldreich, editor, Studies in Complexity and Cryptography, volume 6650 of Lecture Notes in Computer Science, pages 470–506. Springer, 2011.
[33] V. Guruswami and K. Onak. Superlinear lower bounds for multipass graph processing. In IEEE Conference on Computational Complexity, pages 287–298, 2013.
[34] B. V. Halldórsson, M. M. Halldórsson, E. Losievskaja, and M. Szegedy. Streaming algorithms for independent sets. In International Colloquium on Automata, Languages and Programming, pages 641–652, 2010.
[35] M. M. Halldórsson, X. Sun, M. Szegedy, and C. Wang. Streaming and communication complexity of clique approximation. In International Colloquium on Automata, Languages and Programming, pages 449–460, 2012.
[36] M. R. Henzinger, P. Raghavan, and S. Rajagopalan. Computing on data streams. External memory algorithms, pages 107–118, 1999.
[37] M. Jha, C. Seshadhri, and A. Pinar. A space efficient streaming algorithm for triangle counting using the birthday paradox. In KDD, pages 589–597, 2013.
[38] M. Jha, C. Seshadhri, and A. Pinar. When a graph is not so simple: Counting triangles in multigraph streams. CoRR, arXiv:1310.7665, 2013.
[39] H. Jowhari and M. Ghodsi. New streaming algorithms for counting triangles in graphs. In COCOON, pages 710–716, 2005.
[40] H. Jowhari, M. Saglam, and G. Tardos. Tight bounds for lp samplers, finding duplicates in streams, and related problems. In ACM Symposium on Principles of Database Systems, pages 49–58, 2011.
[41] D. M. Kane, K. Mehlhorn, T. Sauerwald, and H. Sun. Counting arbitrary subgraphs in data streams. In International Colloquium on Automata, Languages and Programming, pages 598–609, 2012.
[42] M. Kapralov. Better bounds for matchings in the streaming model. In ACM-SIAM Symposium on Discrete Algorithms, pages 1679–1697, 2013.
[43] M. Kapralov, S. Khanna, and M. Sudan. Approximating matching size from random streams. In ACM-SIAM Symposium on Discrete Algorithms, 2014.
[44] B. M. Kapron, V. King, and B. Mountjoy. Dynamic graph connectivity in polylogarithmic worst case time. In ACM-SIAM Symposium on Discrete Algorithms, pages 1131–1142, 2013.
[45] D. R. Karger. Random sampling in cut, flow, and network design problems. In ACM Symposium on Theory of Computing, pages 648–657, 1994.
[46] J. A. Kelner and A. Levin. Spectral sparsification in the semi-streaming setting. Theory Comput. Syst., 53(2):243–262, 2013.
[47] C. Konrad, F. Magniez, and C. Mathieu. Maximum matching in semi-streaming with few passes. In APPROX-RANDOM, pages 231–242, 2012.
[48] C. Konrad and A. Rosén. Approximating semi-matchings in streaming and in two-party communication. In International Colloquium on Automata, Languages and Programming, pages 637–649, 2013.
[49] K. Kutzkov and R. Pagh. On the streaming complexity of computing local clustering coefficients. In WSDM, pages 677–686, 2013.
[50] M. Manjunath, K. Mehlhorn, K. Panagiotou, and H. Sun. Approximate counting of cycles in streams. In European Symposium on Algorithms, pages 677–688, 2011.
[51] A. McGregor. Finding graph matchings in data streams. In APPROX-RANDOM, pages 170–181, 2005.
[52] S. Muthukrishnan. Data Streams: Algorithms and Applications. Now Publishers, 2006.
[53] R. Pagh and C. E. Tsourakakis. Colorful triangle counting and a mapreduce implementation. Inf. Process. Lett., 112(7):277–281, 2012.
[54] A. Pavan, K. Tangwongsan, S. Tirthapura, and K.-L. Wu. Counting and sampling triangles from a graph stream. In International Conference on Very Large Data Bases, 2013.
[55] J. M. Phillips, E. Verbin, and Q. Zhang. Lower bounds for number-in-hand multiparty communication complexity, made easy. In ACM-SIAM Symposium on Discrete Algorithms, pages 486–501, 2012.
[56] A. D. Sarma, S. Gollapudi, and R. Panigrahy. Estimating pagerank on graph streams. J. ACM, 58(3):13, 2011.
[57] A. D. Sarma, R. J. Lipton, and D. Nanongkai. Best-order streaming model. Theor. Comput. Sci., 412(23):2544–2555, 2011.
[58] D. A. Spielman and N. Srivastava. Graph sparsification by effective resistances. SIAM J. Comput., 40(6):1913–1926, 2011.
[59] D. A. Spielman and S.-H. Teng. Spectral sparsification of graphs. SIAM J. Comput., 40(4):981–1025, 2011.
[60] R. Tarjan. Data Structures and Network Algorithms. SIAM, Philadelphia, 1983.
[61] A. B. Varadaraja. Buyback problem - approximate matroid intersection with cancellation costs. In International Colloquium on Automata, Languages and Programming, pages 379–390, 2011.
[62] M. Zelke. Weighted matching in the semi-streaming model. Algorithmica, 62(1-2):1–20, 2012.
