A -Approximation of Maximum Matching
in Sublinear Time
Abstract
We study the problem of estimating the size of a maximum matching in sublinear time. The problem has been studied extensively in the literature and various algorithms and lower bounds are known for it. Our result is a -approximation algorithm with a running time of .
All previous algorithms either provide only a marginal improvement (e.g., ) over the -approximation that arises from estimating a maximal matching, or have a running time that is nearly . Our approach is also arguably much simpler than other algorithms beating -approximation.
1 Introduction
Given a graph , a matching is a set of edges with no common endpoints, and the maximum matching problem asks for finding a largest such subset. Matching is a fundamental combinatorial optimization problem, and a benchmark for new algorithmic techniques in all major computational models. It also has a wide range of applications such as ad allocation, social network recommendations, and information retrieval, among others. Given that many of these applications need to handle large volumes of data, the study of sublinear time algorithms for estimating the maximum matching size has received considerable attention over the past two decades.111It is impossible to find the edges of any constant-approximate matching in time sublinear with respect to the size of the input [PR07]. A sublinear time algorithm is not allowed to read the entire graph, which would take time where ; instead it is provided oracle access to the input graph and must run in time. There are two main oracles for graph problems considered in the literature, which we also consider in this work.
-
β’
Adjacency list oracle. Here, the algorithm can query , where and , and the oracle reports the -th neighbor of the vertex in its adjacency list, or NULL if is larger than the number of βs neighbors.
-
β’
Adjacency matrix oracle. Here, the algorithm can query , where , and the oracle reports whether there exists an edge between and .
Earlier results on estimating the size of the maximum matching in sublinear time mostly focused on graphs with bounded degree , starting with the pioneering work of Parnas and Ron [PR07], and later works of [NO08, YYI09, RTVX11, ARVX12, LRY17, Gha22]. However, for these do not lead to sublinear time algorithms. Thus, later works focused on general graphs with arbitrary maximum degree and managed to obtain sublinear time algorithms for them [CKK20, KMNFT20, Beh21, BRRS23, BKS23b, BKS23a]. In particular, the state of the art results can be categorized into two regimes:
- β’
-
β’
Algorithms whose approximation factor is . The state of the art result in this category is the work of [BRRS23], whose running time is for an approximation factor of . However, the best approximation factor one can get using their trade-off is only 222More specifically, for , they get an algorithm with approximation factor of that runs in time . Indeed, they mention: We do not expect our techniques to lead to a better than, say, -approximation in time.
Thus, all known algorithms either give only a marginal improvement over the -approximation arising from estimating the size of a maximal matching (which can be done even in timeΒ [Beh21]), or have a running time that is nearly . Making a significant improvement on both fronts simultaneously has remained an elusive open question.
Our results.
In this work, we show how to beat the factor in time that is strongly sublinear. Specifically, we present an algorithm that runs in time and achieves an approximation factor of for estimating the size of the maximum matching.
It is worth noting that our algorithm is much simpler, both in terms of implementation and analysis, compared to the prior works that beat -approximationΒ [BRRS23].
Theorem 1.1.
There exists an algorithm that, given access to the adjacency list of a graph, estimates the size of the maximum matching with a multiplicative approximation factor of and runs in time with high probability.
Theorem 1.2.
There exists an algorithm that, given access to the adjacency matrix of a graph, estimates the size of the maximum matching with a multiplicative-additive approximation factor of and runs in time with high probability.
Moreover, our algorithm can be employed as a subroutine for Theorem 2 of [Beh23] to obtain an improved approximation ratio in the dynamic setting. More precisely, that result requires a subroutine for estimating the maximum matching size in time, for which it uses the -approximation ofΒ [Beh21]. Our algorithm can be used instead, resulting in a very slight improvement to the overall approximation guarantee forΒ [Beh23].
We note that the framework ofΒ [BKS23a] can also be used to obtain a similar result. Their algorithm, which performs a single iteration to find a constant fraction of augmenting paths of length three on top of a maximal matching, likewise yields a better-than-2 approximate matching with running time. However, the trade-off in this approach is worse in terms of both the approximation ratio and the running time.
Related work.
On the lower bound front, Parnas and Ron [PR07] demonstrated that any algorithm getting a constant approximation of the maximum matching size needs at least time. More recently, the work of [BRR23b] established that any algorithm providing a -multiplicative-additive approximation requires at least time. For sparse graphs, a lower bound of was shown for any additive approximation [BRR23a]. For dense graphs, [BRR24] showed a lower bound of for the runtime of algorithms achieving such additive approximations.
Paper organization.
In SectionΒ 2, we provide an overview of the challenges encountered while designing our algorithm and the techniques used to address them. We first develop an algorithm for bipartite graphs with a multiplicative-additive error in SectionΒ 4, avoiding additional challenges that arise from general graphs, trying to obtain multiplicative error (in the adjacency list model), or working with the adjacency matrix. In SectionΒ 5, we extend our algorithm to handle general graphs. In SectionΒ 6, we demonstrate how to achieve a multiplicative approximation guarantee. Finally, in SectionΒ 7, we present a simple reduction showing that our algorithm also works in the adjacency matrix model with a multiplicative-additive error.
2 Technical Overview
In this section, we provide an overview of the techniques used in this paper to design our algorithm. We begin with the two-pass semi-streaming algorithm of Konrad and NaiduΒ [KN21] for bipartite graphs. In the first pass, the algorithm constructs a maximal matching . In the second pass, it constructs a maximal -matching between vertices matched in and those unmatched in . More specifically, each vertex in has a capacity of , while each vertex in has a capacity of , where and is a large constant. The idea is that if is far from maximum, the -matching will contain many length-3 augmenting paths that can be used to augment . This algorithm obtains a -approximation.
Our goal is to develop a sublinear-time algorithm by translating this semi-streaming two-pass algorithm to the sublinear time model. When trying to do so, several challenges arise. In this section we describe them step by step, and show how to overcome them.
Challenge (1): constructing a maximal matching in sublinear time is not possible.
In fact, finding all edges of any constant-factor approximation of the maximum matching is impossible in sublinear time due to [PR07]. Dynamic algorithms for maximum matching [Beh23, BKSW23] use the following approach: they maintain a maximal matching and then apply the sublinear-time random greedy maximal matching (RGMM) algorithm of BehnezhadΒ [Beh21] to estimate the size of the maximal -matching. In our setting, we cannot afford to explicitly construct . However, we can obtain oracle access to using the sublinear-time RGMM algorithm of [Beh21]. More specifically, we can query whether a vertex is matched in or not in time. Therefore, a possible solution to address the first challenge is to design two nested oracles: the outer oracle attempts to build a maximal -matching, whereas the inner oracle checks the status of vertices (matched or not in ) to correctly filter edges and assign capacities to each vertex.
Challenge (2): two nested oracles require time.
The algorithm of [Beh21] runs in time, where denotes the average degree of the graph . Additionally, for the outer oracle, it requires time (i.e., queries to the inner oracle). Unfortunately, it is possible for both and to be as large as . For example, consider a graph with a vertex set , where . The edges within form a complete bipartite graph, while there is an -regular graph between and . After running the RGMM algorithm, most edges in the maximal matching belong to , and most vertices in are unmatched. Consequently, we have .
To address this issue, we sparsify the original graph while manually constructing a matchingΒ . In a preprocessing step, starting from an empty , for each unmatched vertex in the graph, we sample neighbors uniformly at random. If an unmatched neighbor exists, we match the two vertices and add this edge to . Using this preprocessing step, we show that after spending time, the induced subgraph of vertices that remain unmatched in has a maximum degree of with high probability. Moreover, since we explicitly materialize , we are able to check if any vertex is matched in in time, eliminating the need for costly oracle calls. Note that need not be maximal in ; therefore we next extend it to a maximal matching.
Let be a maximal matching in obtained by running the sublinear time RGMM algorithm of [Beh21]. Now is a maximal matching for . Inspired by the two-pass semi-streaming algorithm of [KN21], we attempt to augment the maximal matching in two possible ways (see also FigureΒ 1):
-
1.
Augment using a -matching between and . The algorithm then outputs the size of the augmented matching, plus the size of the previously constructed matching .
-
2.
Augment using a -matching between and . The algorithm then outputs the size of the augmented matching.
The key intuition here (think of the case when yields only a -approximation) is that either is sufficiently large, making larger than the approximation guarantee, or itself is large enough so that meets our approximation guarantee (note that is the approximation guarantee of [KN21]). Augmenting using a -matching is easier since we have explicit access to and only need to run a single RGMM oracle to estimate the size of the -matching. Our first estimate, which requires finding a -matching between and , is more challenging since we do not have explicit access to . To avoid the running time of the two nested oracles, we make crucial use of the aforementioned property that the subgraph of vertices unmatched in has low induced degree (at most ); this is the reason why we only try to augment rather than . We will discuss this in the next paragraphs.
Challenge (3): the algorithm does not have access to the adjacency list of .
After the sparsification step, the average degree of is at most . Hence, if the algorithm had access to the adjacency list of , it could run the nested oracles in time by executing two RGMM algorithms: inner oracle for computing and outer oracle for the -matching to augment . But, since the nested oracles may visit up to vertices, and retrieving the full adjacency list of a vertex in requires time, it seems that the overall running time of the algorithm could still be as high as .
Here, we leverage two key properties of the RGMM algorithm to refine the runtime analysis. The first property is that at each step, the algorithm requires only a random neighbor of the currently visited vertex. Intuitively, if a vertex has degree in , in expectation it takes samples from the adjacency list of the original graph to encounter a vertex from . Thus, if all vertices in had degree , one could easily argue that the running time of the algorithm is . However, vertex degrees can vary, and for a vertex with a constant degree, we would need samples from the adjacency list of to find a single neighbor in . To address this challenge, we utilize another property of the RGMM algorithm, recently proven by [MRTV25]. Informally, this result shows that during oracle calls for RGMM, a vertex is visited proportionally to its degree, implying that low-degree vertices are visited only a small number of times.
Challenge (4): outer oracle creates biased inner oracle queries.
The final main challenge we discuss here is that the simple bound, which we informally proved in the previous paragraph, relies on the tacit assumption that the inner oracle queries generated by the outer oracle correspond to uniformly random calls to the inner oracle. Indeed, the running time of the algorithm of [Beh21] is analyzed for a uniformly random query vertex; however, there may exist a vertex in the graph for which calling the inner oracle takes significantly more than time. Consequently, if all outer oracle calls end up querying , the running time could be significantly worse than . To overcome this issue, we use the result of [MRTV25] along with the fact that the maximum degree of is . We show that for any vertex , the outer oracle queries the inner oracle for at most times in expectation. This enables us to formally prove that the total running time of the algorithm remains at most .
General graphs and the adjacency matrix model.
There are additional challenges when dealing with general graphs as opposed to bipartite graphs, such as the fact that the sizes of the maximal matching and the -matching alone are insufficient to achieve a good approximation ratio. For general graphs, our algorithm estimates the maximum matching in the union of the maximal matching and the -matching, which requires using the -approximate local computation algorithm (LCA) by [LRY17] on the subgraph formed by this union, to which we only have oracle access. We encourage readers to refer to SectionΒ 5 for more details about the techniques used there.
Additionally, for more information on the extension of the algorithm that operates in the adjacency matrix model, we recommend readers to check SectionΒ 7.
3 Preliminaries
Given a graph , we use to denote its set of vertices and use to refer to its set of edges. For a vertex , we use to denote the degree of the vertex, i.e., the number of edges with one endpoint equal to . We use to denote the maximum degree over all vertices in the graph, and to denote the average degree of the graph. Further, we use to denote the size of the maximum matching in .
Given a graph , and a subset of vertices , is defined to be the induced subgraph consisting of all edges with both endpoints in . Further, given disjoint subsets of vertices, is defined to be the bipartite subgraph of consisting of all edges between and .
Given a matching in , an augmenting path is a simple path starting and ending at different vertices, such that the first and the last vertices are unmatched in , and the edges of the path alternate between not belonging to and belonging to .
Given a vector of integer capacities of dimension , a -matching in is a multi-set of edges in such that each vertex appears no more than times as an endpoint of an edge in .
Given a graph and a permutation over its edges, is used to refer to the unique matching obtained by the following process. We let be initialized as empty, and consider the edges of one by one according to the permutation . We add an edge to the matching if none of its endpoints are already matched in . A random greedy maximal matching, i.e., is the matching obtained by picking a permutation uniformly at random and outputting .
Proposition 3.1 ([Beh21]).
There exists an algorithm that, given adjacency list access to a graphΒ of average degree , for a random vertex and a random permutation , determines if is matched in in expected time.
Given the problem of maximizing a function defined over a domain , with optimal value , an -multiplicative-additive approximation of is a solution such that .
4 Algorithm for Bipartite Graphs
We begin by describing our algorithm for bipartite graphs. We focus on implementing an algorithm with a multiplicative-additive approximation guarantee. Also, we assume that we have access to the adjacency list of the graph. These assumptions will help us avoid certain complications and challenges that arise when working with general graphs, the adjacency matrix model, or when trying to obtain a multiplicative approximation guarantee. To lift these assumptions, we can leverage strong tools and methods from the literature, which, with slight modifications, can be applied here. This section contains the main novelties of our approach and proofs. Our algorithm for bipartite graphs can be seen as a translation and implementation of a two-pass streaming algorithm, which we discuss in the next subsection.
4.1 Two-Pass Streaming Algorithm for Bipartite Graphs
Our starting point is the two-pass streaming algorithm which is described in AlgorithmΒ 1. This algorithm, or its variations, has appeared in previous works on designing streaming or dynamic algorithms for maximum matching [KN21, BKSW23, Beh23]. In words, the first pass of the algorithm only finds a maximal matching . In the second pass, the algorithm finds a maximal -matching in , where is the set of vertices matched by . The capacities of vertices in and in for the -matching are and , respectively. Moreover, in the second pass of the algorithm, when an edge arrives in the stream, we add multiple copies of the edge to the subgraph , as long as doing so does not violate the capacity constraints.
Intuitively, the algorithm tries to find length-3 augmenting paths using the -matching that it finds in the second pass. The following lemma shows the approximation guarantee of AlgorithmΒ 1.
Lemma 4.1 (Lemma 3.3 inΒ [BKSW23]).
For any , the output of AlgorithmΒ 1 is a -approximation for maximum matching of .
4.2 Sublinear Time Implementation of AlgorithmΒ 1
In this section, we demonstrate how to implement a modification of AlgorithmΒ 1 in the sublinear time model, as outlined in SectionΒ 2.
Sparsification:
In order to be able to use two levels of recursive oracle calls, we need to sparsify the graph. We first sample edges and construct a maximal matching on the sampled edges to sparsify the induced subgraph of unmatched vertices. This sparsification step is formalized in AlgorithmΒ 2. In LemmaΒ 4.3, we show that if we sample enough edges, then vertices that remain unmatched after this phase have an induced degree of . This step is very similar to the algorithm ofΒ [CKK20, Appendix A] for approximating a maximal matching.
Claim 4.2.
AlgorithmΒ 2 runs in time.
Proof.
For each vertex in the graph, the algorithm samples vertices from the adjacency list of the vertex if it is not matched by the time the algorithm processes the vertex in Line 3. Thus, the total running time is at most . β
Lemma 4.3.
With high probability, we have .
Proof.
We will show that for every , the probability that and the degree of in is larger than is at most . The lemma then follows by a union bound over all .
Consider the moment before is processed. Assume that at this time, still and the degree of in is larger than . Then, each of the samples has a probability of at least to be one of the unmatched neighbors, in which case will become matched. Thus the probability that remains unmatched after it is processed is at most
β
Augmenting using nested oracles:
Now we are ready to present our sublinear algorithm. After sparsifying the graph by finding a partially maximal matching , we try to augment in two different ways that we have outlined in SectionΒ 2 and which are formalized in AlgorithmΒ 3. See also FigureΒ 1.
For simplicity, we pretend that throughout the paper. Since is an arbitrarily large constant, using instead of leads to an arbitrarily small error in the calculations. We also note that a maximal -matching can be viewed as maximal matching if we duplicate vertices multiple times.
First, we try to augment the matching by designing a maximal matching oracle for vertices and then another oracle for finding a -matching between the vertices newly matched using the new oracle and unmatched vertices. Let be the maximal matching of that can be obtained by the oracle. We try to augment it with a -matching .
However, it is also possible that is small compared to , which implies that in the previous case, the -matching does not help to find many augmenting paths, as the size of the maximal matching that we try to augment is too small. To account for this case, the algorithm also finds a -matching between and .
Note that because the algorithm finds the initial matching explicitly, checking whether a vertex belongs to or not can be done in time.
Implementation details of the algorithm:
There are some technical details in the implementation of the algorithm that are not included in the pseudocode:
-
β’
Access to the adjacency list of an induced subgraph: Both in AlgorithmΒ 4 and AlgorithmΒ 5, we run the algorithm of PropositionΒ 3.1 for some induced subgraph of (for example, line 5 of AlgorithmΒ 5). However, PropositionΒ 3.1 works with access to the adjacency list of the input graph. To address this issue, we leverage an important property of the algorithm in PropositionΒ 3.1, namely that it only needs to find a random neighbor of a given vertex at each step of its execution. Now, whenever the algorithm requires a random neighbor of vertex in a subgraph , it queries random neighbors in the original graph until it finds one that belongs to . This increases the running time of the algorithm, as it may take time to locate a valid neighbor in , which we will formally bound in our runtime analysis.
-
β’
Nested oracles in line 8 of AlgorithmΒ 4: Unlike , we do not explicitly construct the maximal matching in AlgorithmΒ 4. Moreover, the edges of the subgraph connect vertices matched by with those that remain unmatched in either or . Hence, to verify whether an edge belongs to , we need to determine whether its endpoints are matched or unmatched in by accessing the algorithm of PropositionΒ 3.1. This again increases the algorithmβs runtime, which we will also formally bound in our runtime analysis.
4.3 Analysis of the Approximation Ratio
The following lemma, an analogue of ObservationΒ 3.1 inΒ [BKSW23], substantiates the soundness of the estimates and produced in AlgorithmΒ 3.
Lemma 4.4.
Let , , and be as in the description of AlgorithmΒ 3. Then
-
β’
,
-
β’
.
Proof.
Since is bipartite, by integrality of the bipartite matching polytope, it is enough to exhibit a fractional matching of the appropriate value . For case 1, we set
Note that , and are pairwise disjoint, and has no edge to . Therefore it is easy to verify that the degree constraints for are satisfied. For case 2, we similarly set
β
The following lemma states the -approximation guarantee of the βmaximal matching plus -matchingβ approach obtained in prior work, for both bipartite and general graphs. We will invoke it for appropriate subgraphs of to obtain our guarantee.
Lemma 4.5.
Let be a graph, be any maximal matching in , and be a maximal -matching in for vertex capacities for vertices in and for vertices in , where and . Then:
-
β’
for bipartite , we have ,
-
β’
for general , if is a random greedy maximal -matching, we still have .
Proof.
The first statement is the same as LemmaΒ 4.1, and shown as Lemma 3.3 inΒ [BKSW23]. The second statement is shown as Claim 5.5 in [ABR24]. β
The following lemma is the crux of our approximation ratio analysis.
Lemma 4.6.
In a bipartite graph , let , , and be as in the description of AlgorithmΒ 3. Then
Proof.
Fix a maximum matching in , and partition its edges as follows (see FigureΒ 2):
-
β’
are those with both endpoints in ,
-
β’
are those with one endpoint in and the other in ,
-
β’
are those with both endpoints in ,
-
β’
are those with one endpoint in and the other in ,
-
β’
are those with one endpoint in and the other in .
Since is maximal, cannot have edges with no endpoints in . We thus have
(1) |
Also, by maximality of and simple counting,
(2) |
We will use LemmaΒ 4.5 to analyze both cases in AlgorithmΒ 3. For case 1, we can use ; in this graph, is a maximal matching, and is a -matching as in the statement of LemmaΒ 4.5 (see FigureΒ 2), thus we have
since is a matching in . With 2 this gives
(3) |
For case 2, we can instead use ; in this graph, is a maximal matching, and is a -matching as in the statement of LemmaΒ 4.5 (see FigureΒ 2), thus
(4) |
since is a matching in . Using 3 andΒ 4, we can bound the left-hand side of this lemmaβs statement as
We bound the maximum by a weighted average with weights and for some to be determined ():
We want to hold for some (the approximation ratio) as large as possible. For to hold, we need to satisfy:
By solving for we get and . β
Lemma 4.7.
Let be the output of AlgorithmΒ 3. With high probability, it holds that
The proof of LemmaΒ 4.7 is routine.
Proof.
By LemmaΒ 4.6, we have
(5) |
Let , , and be as defined in AlgorithmΒ 4 and AlgorithmΒ 5. By the definition of , , and we have
Thus,
Then, using Chernoff bound, we obtain
Therefore, with a probability of ,
By repeating the same argument for and , we get
Plugging 5 in the bounds obtained above implies
On the other hand, we have
which completes the proof. β
4.4 Running Time Analysis
For the analysis of the running time, we use a crucial property of random greedy maximal matching algorithm that was proved recently in [MRTV25].
Proposition 4.8 (Lemma 5.14 of [MRTV25]).
Let be the expected number of times that the oracle queries an adjacent edge of if we start the oracle calls from a random vertex, for a random permutation over the edges of the graph when running random greedy maximal matching. It holds that .
Proposition 4.9 (Corollary 5.15 of [MRTV25]).
Let be the expected time needed to return a random neighbor of vertex . Then, the expected time to run the random greedy maximal matching oracle for a random vertex and a random permutation in graph is .
Lemma 4.10.
AlgorithmΒ 4 runs in time in expectation.
Proof.
Without loss of generality, we assume that ; otherwise, AlgorithmΒ 3 does not need to execute AlgorithmΒ 4, as the estimate from AlgorithmΒ 5 suffices.
The proof consists of two parts: first we show that line 6 of the algorithm can be implemented in expected time, and then we prove that line 8 of the algorithm can be implemented in expected time.
For the first part, let and let be a vertex in . In the adjacency list of in the original graph, which contains elements, only of them are neighbors in . Thus, to find a neighbor in , we need to randomly sample elements from βs adjacency list in expectation. Consequently, using PropositionΒ 4.9, the expected time required to execute the random greedy maximal matching oracle for a randomly chosen vertex and permutation in is
In the algorithm, we choose a permutation and run the random greedy maximal matching for randomly chosen vertices. By linearity of expectation, the expected running time of the first part is .
We prove the second part in a few steps. As a first step, for simplicity, assume that we have access to the adjacency list of and there is no need to run nested oracles. Then, using PropositionΒ 3.1, the expected running time of line 8 is bounded by for random permutations and random vertices.
Now, suppose that we have access to the adjacency list of instead of . By PropositionΒ 4.9, the total expected runtime of the outer (-matching) oracle is
where is the expected time needed to find a random neighbor of in . To find such a neighbor, we sample neighbors of in and check whether is matched in using the inner oracle. The expected number of these checks until a vertex with is found is . The expected cost of each check is , where is the runtime of invoking the inner oracle to check if is matched in . Putting this together, the expected total runtime is
where is the expected time needed to find a random neighbor of in . Under our assumption of having access to the adjacency list of , and thus we can finish bounding with .
Now we lift the assumption of having access to the adjacency list of . This means that, similarly as in the first part, we need to sample vertices from the adjacency list of in expectation (checking if a vertex is matched in is done in time). Then the total bound becomes
Also increases by , which increases the total runtime only by
β
Lemma 4.11.
AlgorithmΒ 5 runs in time in expectation.
Proof.
It is enough to show that it takes time to run the algorithm of PropositionΒ 3.1 (the -matching oracle) for each sampled vertex and permutation, as we have . We repeat the same arguments as in the proof ofΒ LemmaΒ 4.10.
Let be a vertex in the graph . Note that in the adjacency list of in the original graph, which contains elements, only of the elements are neighbors in . Thus, we need to randomly sample elements of the adjacency list of to find a neighbor in . Recall that we can check whether a vertex is matched in in time. Therefore, using PropositionΒ 4.9, the expected time needed to run the random greedy maximal matching for a random vertex and permutation in is
β
Lemma 4.12.
AlgorithmΒ 3 runs in time with high probability.
Proof.
By ClaimΒ 4.2, the sparsification step takes time. Also, by LemmasΒ 4.10 andΒ 4.11, AlgorithmsΒ 4 andΒ 5 run in and expected time, respectively. To establish a high probability bound on the time complexity, we execute instances of the algorithm in parallel and halt as soon as the first one completes. By applying Markovβs inequality, we deduce that each individual instance terminates within time with constant probability. As a result, with high probability, at least one of these instances finishes within time. This concludes the proof. β
Now we are ready to prove the final theorem of this section.
Theorem 4.13.
There exists an algorithm that, given access to the adjacency list of a bipartite graph, estimates the size of the maximum matching with a multiplicative-additive approximation factor of and runs in time with high probability.
Proof.
By LemmaΒ 4.7, AlgorithmΒ 3 achieves multiplicative-additive approximation of . Moreover, the running time of AlgorithmΒ 3 is with high probability by LemmaΒ 4.12. β
5 Algorithm for General Graphs
The following is an analogue of LemmaΒ 4.6 for general graphs.
Lemma 5.1.
In a general graph , let , , and be as in the description of AlgorithmΒ 3. Then
Proof.
The proof proceeds as that of LemmaΒ 4.6, with the following modifications:
-
β’
we invoke the second rather than the first part of LemmaΒ 4.5 to obtain analogues of inequalities (3) and (4), i.e., instead of lower-bounding , we lower-bound , and instead of lower-bounding , we lower-bound ,
-
β’
the factor on the left-hand side of the statement can be folded into the term in at the end of the proof, for small enough .
β
In this section, we show how to extend our algorithm to work for general graphs. The main difference between bipartite and general graphs is that the estimate based on the sizes of , , , and is insufficient to achieve a 0.5109 approximation guarantee. In LemmaΒ 5.1, we show that we can achieve this approximation ratio by estimating and . More formally, to produce and in AlgorithmΒ 3, we use and , respectively. Also, AlgorithmΒ 4 and AlgorithmΒ 5 output and , respectively.
In both AlgorithmΒ 4 and AlgorithmΒ 5 we have access to oracles that can return whether a vertex is matched in , , or for a fixed permutation . These oracles can also be used to return the edge of the matching if the vertex is matched, which is a corollary of PropositionΒ 3.1 since the algorithm of PropositionΒ 3.1 can be also used to return the edge of the matching.
Lemma 5.2.
For a vertex , there exists an algorithm that returns the edges of in and in expected time.
Proof.
The proof follows from LemmaΒ 4.10 and the fact that the algorithm in PropositionΒ 3.1 can return the edge of the matched vertex in the same running time. β
Lemma 5.3.
For a vertex , there exists an algorithm that returns the edges of in and in expected time.
Proof.
If is matched in , we can return the edge in time since we have access to this maximal matching. The rest of the proof follows from LemmaΒ 4.11 and the fact that the algorithm in PropositionΒ 3.1 can return the edge of the matched vertex in the same running time. β
Local computation algorithms (LCA) is a model of computation, also motivated by large data sets, in which the algorithm is not expected to produce the entire output at once. Instead, the algorithm is queried for parts of the output, and must produce a consistent and approximately optimal output. We use the following local computation algorithm (LCA) by [LRY17] to design our algorithm.
Proposition 5.4 ([LRY17]).
There exists a -approximate local computation algorithm for maximum matching of graph in time with access to the adjacency list of .
Now we prove the main technical part of this section that can be used to estimate and .
Lemma 5.5.
Let be a subgraph of graph . Suppose that is the union of a constant number of random greedy maximal matching on different subsets of vertices. Also, we have oracle access to edges of random greedy maximal matching. We can query a vertex to obtain the matching edge of vertex in expected time. Moreover, the maximum degree of is constant. Then, there exists a -approximate algorithm that estimates the size of the maximum matching of in expected time.
Proof.
We run the algorithm of PropositionΒ 5.4. Each time the algorithm visits a new vertex , we first query all the constant number of oracles for random greedy maximal matching to get the adjacency list of in . If the algorithm in PropositionΒ 5.4 made uniform queries to the oracle, then we could conclude the proof since and are constant. However, note that queries to the oracles are not independent and we have an expected time guarantee on the running time of the oracles. So it is possible that the way that the algorithm of PropositionΒ 5.4 works might result in making biased queries to oracles.
Let be the set of vertices the algorithm of PropositionΒ 5.4 visits when we run the algorithm for vertex and we use permutation for random greedy maximal matchings of the subgraph . Let be the running time of the oracle for vertex and permutation . Also, let be the running time of the algorithm of PropositionΒ 5.4 on vertex using permutation for random greedy maximal matchings. Hence, we have
Further, the algorithm in PropositionΒ 5.4 explores the local neighborhood of a vertex to answer each query. Therefore, if two vertices and are at a distance greater than , the algorithm does not visit when answering a query for . Thus,
where the first inequality follows by the fact that each vertex has at most neighbors in within a distance of , and the last inequality because and are constants. Therefore, even with the biased queries, we can implement PropositionΒ 5.4 on subgraph in time, which completes the proof. β
Lemma 5.6.
There exists an algorithm that outputs a -approximate estimation of the value of in expected time.
Proof.
The proof follows by plugging LemmaΒ 5.3 into LemmaΒ 5.5 and running the algorithm for samples. β
Lemma 5.7.
There exists an algorithm that outputs a -approximate estimation of the value of in expected time.
Proof.
The proof follows by plugging LemmaΒ 5.2 into LemmaΒ 5.5 and running the algorithm for samples. β
Theorem 5.8.
There exists an algorithm that, given access to the adjacency list of a graph, estimates the size of the maximum matching with a multiplicative-additive approximation factor of and runs in time with high probability.
Proof.
By LemmaΒ 5.1 and a Chernoff bound similar to LemmaΒ 4.7, we achieve a multiplicative-additive approximation of . Moreover, the expected running time is by ClaimΒ 4.2, LemmaΒ 5.6, and LemmaΒ 5.7. To establish a high probability bound on the time complexity, we execute instances of the algorithm in parallel and halt as soon as the first one completes. By applying Markovβs inequality, we deduce that each individual instance terminates within time with constant probability. As a result, with high probability, at least one of these instances finishes within time. This concludes the proof. β
6 Multiplicative Approximation
In this section, we show that we can achieve a multiplicative approximation guarantee by slightly increasing the number of samples in AlgorithmΒ 4 and AlgorithmΒ 5. First, we prove a simple lower bound for the size of the maximum matching of a graph based on its maximum and average degree.
Claim 6.1.
For any graph , it holds that .
Proof.
Any graph with a maximum degree of can be colored greedily using colors. Furthermore, the edges of each color create a matching. Thus, we have . Combining with , we can conclude the proof. β
The goal is to obtain a multiplicative approximation guarantee of . It is important to note that if any of , , , or is not a constant fraction of the others, it can be omitted from the equation in the statement of LemmaΒ 4.6 without affecting the approximation by more than a function of . Thus, without loss of generality, we can assume that , , , and . Consequently, using ClaimΒ 6.1 and as an application of Chernoff bound, we can use samples in AlgorithmΒ 4 and AlgorithmΒ 5 to obtain multiplicative estimation of , , and .
By LemmaΒ 4.10, AlgorithmΒ 4 runs in time when we have samples. By increasing the number of samples to , the running time of AlgorithmΒ 4 increases to . Moreover, by LemmaΒ 4.11, AlgorithmΒ 5 runs in time when we have samples. By increasing the number of samples to , the running time of AlgorithmΒ 5 increases to . Therefore, the total running time of the algorithm is within .
Finally, we can obtain the degree of each vertex in the graph using binary search. Therefore, we can assume that we have access to and by spending time. Thus we get:
See 1.1
7 Algorithm with Access to the Adjacency Matrix
In this section, we use a simple reduction to show that with a small modification, our algorithm can be adapted to the setting where we have access to the graphβs adjacency matrix. A slightly different version of this kind of reduction appeared in previous works on sublinear time algorithms for maximum matching [Beh21, BRRS23].
It is important to note that obtaining a constant-factor multiplicative approximation is impossible when the algorithm only has access to the adjacency matrix of the graph. This is because if the graph is guaranteed to contain either a single edge or be completely empty, any algorithm would require adjacency matrix queries to distinguish between these two cases. Consequently, we allow the algorithm to have an additive error of in addition to the multiplicative approximation ratio.
We build an auxiliary graph with the following vertex set and edge set:
-
β’
Vertex set: contains disjoint sets of vertices , and . Each is a copy of the vertices of the original graph. Each contains vertices.
-
β’
Edge set: For each vertex , the -th neighbor of is the -th vertex in if , and otherwise it is the -th vertex in . Similarly, for each vertex , the -th neighbor of is -th vertex in if , and otherwise it is the -th vertex in . Also, each is connected to all vertices of . As a result, the degree of each vertex in is 1, the degree of each vertex in is , and the degree of each vertex in is . Therefore, we have .
Because of the way we constructed the graph, it is not hard to see that we can find the -th neighbor of the adjacency list of each vertex in using only a single query to the adjacency matrix of .
Observation 7.1.
For each vertex in graph , the -th neighbor of can be found using at most a single adjacency matrix query in .
Proof.
If , we can simply certify this since the degree of each vertex in is determined by the vertex set to which belongs. If , then the only neighbor of is in . If , we make an adjacency matrix query for , and based on the result, the -th neighbor is either vertex in or in . A similar procedure applies for when . If for , we return the -th vertex in . β
Modification to the algorithm:
Since the graph contains vertices, we cannot afford to apply the sparsification step to all vertices. However, vertices in have degree 1. Therefore, we apply the sparsification step only to vertices in and . Since we have , we can apply the sparsification for these sets in time. We first iterate over the vertices in and apply the sparsification step, and then we apply it to the vertices in . This ordering ensures that most vertices in get matched to vertices in in this step, which is desirable for our application.
Claim 7.2.
After the sparsification step, each vertex in is matched by with high probability. Moreover, at most vertices in are matched to vertices in with high probability.
Proof.
Note that we first process the vertices in . At the time we process a vertex , it has at least neighbors due to its neighbors in . Since, after sparsification, each vertex must have a degree of with high probability by LemmaΒ 4.3, all vertices in will be matched with high probability.
Additionally, at the time we process a vertex , it has unmatched neighbors in . On the other hand, it has at most neighbors in the rest of the graph. Thus, with a probability of at most , it gets matched to a vertex outside . Therefore, in expectation, at most vertices in are matched to vertices in . Using the Chernoff bound, we can conclude that at most vertices in are matched to vertices in with high probability. β
Equipped with this reduction, we can now simply run the rest of the algorithm for vertices in . The only difference is that we exclude the edges of that lie between and in the estimation. Additionally, in the final estimation, the algorithm returns the previous estimate minus , accounting for the vertices in that are not matched within , which introduces an additional additive error. Thus we obtain:
See 1.2
References
- [ABR24] Amir Azarmehr, Soheil Behnezhad, and Mohammad Roghani. Fully dynamic matching: -approximation in polylog update time. In DavidΒ P. Woodruff, editor, Proceedings of the 2024 ACM-SIAM Symposium on Discrete Algorithms, SODA 2024, Alexandria, VA, USA, January 7-10, 2024, pages 3040β3061. SIAM, 2024.
- [ARVX12] Noga Alon, Ronitt Rubinfeld, Shai Vardi, and Ning Xie. Space-Efficient Local Computation Algorithms. In Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2012, Kyoto, Japan, January 17-19, 2012, pages 1132β1139, 2012.
- [Beh21] Soheil Behnezhad. Time-optimal sublinear algorithms for matching and vertex cover. In 62nd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2021, Denver, CO, USA, February 7-10, 2022, pages 873β884. IEEE, 2021.
- [Beh23] Soheil Behnezhad. Dynamic algorithms for maximum matching size. In Nikhil Bansal and Viswanath Nagarajan, editors, Proceedings of the 2023 ACM-SIAM Symposium on Discrete Algorithms, SODA 2023, Florence, Italy, January 22-25, 2023, pages 129β162. SIAM, 2023.
- [BKS23a] Sayan Bhattacharya, Peter Kiss, and Thatchaphol Saranurak. Dynamic -approximate matching size in truly sublinear update time. In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 1563β1588. IEEE, 2023.
- [BKS23b] Sayan Bhattacharya, Peter Kiss, and Thatchaphol Saranurak. Sublinear algorithms for (1.5+)-approximate matching. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, pages 254β266, 2023.
- [BKSW23] Sayan Bhattacharya, Peter Kiss, Thatchaphol Saranurak, and David Wajc. Dynamic matching with better-than-2 approximation in polylogarithmic update time. In Nikhil Bansal and Viswanath Nagarajan, editors, Proceedings of the 2023 ACM-SIAM Symposium on Discrete Algorithms, SODA 2023, Florence, Italy, January 22-25, 2023, pages 100β128. SIAM, 2023.
- [BRR23a] Soheil Behnezhad, Mohammad Roghani, and Aviad Rubinstein. Local computation algorithms for maximum matching: New lower bounds. In 2023 IEEE 64th Annual Symposium on Foundations of Computer Science (FOCS), pages 2322β2335. IEEE, 2023.
- [BRR23b] Soheil Behnezhad, Mohammad Roghani, and Aviad Rubinstein. Sublinear time algorithms and complexity of approximate maximum matching. In Proceedings of the 55th Annual ACM Symposium on Theory of Computing, pages 267β280, 2023.
- [BRR24] Soheil Behnezhad, Mohammad Roghani, and Aviad Rubinstein. Approximating maximum matching requires almost quadratic time. In Proceedings of the 56th Annual ACM Symposium on Theory of Computing, pages 444β454, 2024.
- [BRRS23] Soheil Behnezhad, Mohammad Roghani, Aviad Rubinstein, and Amin Saberi. Beating greedy matching in sublinear time. In Nikhil Bansal and Viswanath Nagarajan, editors, Proceedings of the 2023 ACM-SIAM Symposium on Discrete Algorithms, SODA 2023, Florence, Italy, January 22-25, 2023, pages 3900β3945. SIAM, 2023.
- [CKK20] YuΒ Chen, Sampath Kannan, and Sanjeev Khanna. Sublinear algorithms and lower bounds for metric tsp cost estimation. arXiv preprint arXiv:2006.05490, 2020.
- [Gha22] Mohsen Ghaffari. Local computation of maximal independent set. In 63rd IEEE Annual Symposium on Foundations of Computer Science, FOCS 2022, Denver, CO, USA, October 31 - November 3, 2022, pages 438β449, 2022.
- [KMNFT20] Michael Kapralov, Slobodan MitroviΔ, Ashkan Norouzi-Fard, and Jakab Tardos. Space efficient approximation to maximum matching size from uniform edge samples. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1753β1772. SIAM, 2020.
- [KN21] Christian Konrad and KheeranΒ K. Naidu. On two-pass streaming algorithms for maximum bipartite matching. In Mary Wootters and Laura Sanita, editors, Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2021, August 16-18, 2021, University of Washington, Seattle, Washington, USA (Virtual Conference), volume 207 of LIPIcs, pages 19:1β19:18, 2021.
- [LRY17] Reut Levi, Ronitt Rubinfeld, and Anak Yodpinyanee. Local computation algorithms for graphs of non-constant degrees. Algorithmica, 77(4):971β994, 2017.
- [MRTV25] Sepideh Mahabadi, Mohammad Roghani, Jakub Tarnawski, and Ali Vakilian. Sublinear Metric Steiner Tree via Improved Bounds for Set Cover. In Raghu Meka, editor, 16th Innovations in Theoretical Computer Science Conference (ITCS 2025), volume 325 of Leibniz International Proceedings in Informatics (LIPIcs), pages 74:1β74:24, Dagstuhl, Germany, 2025. Schloss Dagstuhl β Leibniz-Zentrum fΓΌr Informatik.
- [NO08] HuyΒ N Nguyen and Krzysztof Onak. Constant-time approximation algorithms via local improvements. In 2008 49th annual IEEE symposium on foundations of computer science, pages 327β336. IEEE, 2008.
- [PR07] Michal Parnas and Dana Ron. Approximating the Minimum Vertex Cover in Sublinear Time and a Connection to Distributed Algorithms. Theor. Comput. Sci., 381(1-3):183β196, 2007.
- [RTVX11] Ronitt Rubinfeld, Gil Tamir, Shai Vardi, and Ning Xie. Fast local computation algorithms. In Innovations in Computer Science - ICS 2011, Tsinghua University, Beijing, China, January 7-9, 2011. Proceedings, pages 223β238, 2011.
- [YYI09] Yuichi Yoshida, Masaki Yamamoto, and Hiro Ito. An improved constant-time approximation algorithm for maximum matchings. In Michael Mitzenmacher, editor, Proceedings of the 41st Annual ACM Symposium on Theory of Computing, STOC 2009, Bethesda, MD, USA, May 31 - June 2, 2009, pages 225β234. ACM, 2009.