
suffice to compute an MCM.
Surprisingly, the best approximate MWM algorithms do
not achieve anything close to the accuracy and efficiency of
the decades old approximate MCM algorithms [25], [38].
On real weighted graphs the Gabow-Tarjan algorithm [16]
gives a (1 − n
−Θ(1)
)-MWM in O(m
√
n log
3/2
n) time,
simply by retaining the O(log n) high order bits in each
edge weight, treating them as polynomial size integers. It
is well known that the greedy algorithm—iteratively choose
the maximum weight edge not incident to previously chosen
edges—produces a
1
2
-MWM. A straightforward implementa-
tion of this algorithm takes O(m log n) time. Preis [43], [5],
reviving the δ-MWM problem from its long slumber, gave
a
1
2
-MWM algorithm running in linear time. Vinkemeier
and Hougardy [50] and Pettie and Sanders [42] proposed
several (
2
3
−)-MWM algorithms (see also [37]) running in
O(m log
−1
) time; each is based on iteratively improving
a matching by identifying sets of short weight-augmenting
paths and cycles. No linear time algorithms with approxi-
mation ratio better than
2
3
are known.
New Results: Our main result is the first (1−)-MWM
algorithm for arbitrary weighted graphs whose running
time is nearly linear. In particular, we show that such a
matching can be found in O(m
−2
log
3
n) time. We also
present a faster algorithm that finds (
3
4
−)-MWMs in time
O(m log n log
−1
), which improves the accuracy of (
2
3
−)-
MWM algorithms [50], [42] though is a logarithmic factor
slower. (In independent work, Hanke and Hougardy [21],
[20] obtained a similar (
3
4
−)-MWM algorithm, as well as
a (
4
5
− )-MWM algorithm running in O(m log
2
n log
−1
)
time.)
Technical Challenges: After one surveys the state-of-
the-art in exact and approximate MWM algorithms, one
finds two natural avenues to a (1 − )-MWM algorithm.
The first is to extend the
2
3
-MWM algorithms [42], [50]
in a straightforward way, by looking for weight-augmenting
paths and cycles with length on the order of 1/. Due to the
haphazard way edges are placed in the current matching it is
very difficult to find long augmentations, augmenting cycles
in particular. We are able to improve the approximation ratio
of [42], [50] to
3
4
− by better handling augmenting 6-cycles.
However, new obstacles arise that prevent us from going
further. A more principled approach to the problem is to
start with Gabow and Tarjan’s [16] scaling algorithm, which
solves a version of the problem at each of log N scales, each
of which consists of O(
√
n) iterations. The goal of each
scale is not to compute a matching per se, but to compute
a hierarchy of nested blossoms and a set of nearly tight
dual variables on the vertices and blossoms. Is it necessary
to perform all O(
√
n) iterations at one scale? The answer,
unfortunately, seems to be yes. Performing
˜
O(
−1
) iterations
before prematurely jumping to the next scale leaves us with
essentially useless dual variables. This difficulty arises in the
absence of blossoms and is exacerbated by their presence.
We develop a new scaling technique that does not require
that dual variables be carried from one scale to the next.
This technique is fit for finding approximate-MWMs but not
exact ones. At each scale we apply a new and simple (1−)-
MWM algorithm for small integer weights, which follows
the standard primal-dual approach to the problem.
Some Applications of Approximate Matching: In input-
queued switches packets are routed across a “switch fabric”
from input to output ports. In each cycle one partial per-
mutation can be realized. Existing algorithms for choosing
these matchings, such as iSLIP [35] and PIM [2], guarantee
1
2
-MCMs and it has been shown [36], [17] that (approxi-
mate) maximum weight matchings, where edge weights are
based on queue-length, have good throughput guarantees.
(Of course, computing exact MWMs is unrealistic in this
application.)
Approximate MWM algorithms are a key component
in several clustering libraries.
2
These clustering algorithms
are used, for example, to partition a graph across many
parallel processors so as to minimize the communication
cost in certain algorithms [28], [29], [24]. Maximum weight
matching algorithms are used as a heuristic preprocessing
step in several sparse linear system solvers [40], [6], [47],
[19]. The goal is to permute the rows/columns to maximize
the weight on or near the main diagonal.
Definitions and Conventions: The input is a graph G =
(V, E, w) where |V | = n, |E| = m, and w : E → R. A
matching M is a set of vertex-disjoint edges. Vertices not
incident to an M edge are free. We use v
0
to refer to the mate
of a matched vertex v, that is, if (v, v
0
) ∈ M. An alternating
path (or cycle) is one whose edges alternate between M
and E\M. An alternating path/cycle P is augmenting if
M ⊕ P is a matching, where ⊕ is symmetric difference,
and w(M ⊕ P ) > w(M), where w(M ) =
P
e∈M
w(e).
If weights are not discussed, an augmenting path is one
whose ends are free vertices. Since we only seek (1 − )
approximate solutions, we can afford to scale and round edge
weights to small integers. Henceforth, w : E → {1, . . . , N },
where N ≤ n
2
. In Section II N is regarded as being a much
smaller number.
Overview: In Section II we give a (1 −)-MWM algo-
rithm whose running time depends linearly on the maximum
edge weight. In Section III we present a scaling algorithm
for (1 −)-MWM running in O(m log
3
n) time, for fixed .
Section IV gives a (3/4 − )-MWM algorithm running in
O(m log n) time.
II. APPROXIMATE MWMS FOR SMALL WEIGHTS
In this section we describe a simple algorithm for finding
(1−)-MWMs in graphs with integer weights in {1, . . . , N}
2
The clustering libraries METIS [27], PARTY [44], PT-SCOTCH [41]
CHACO [23], and JOSTLE [51] all use some approximate matching
routine. PARTY, for example, builds a hierarchical clustering by iteratively
finding and contracting approximate MWMs. Preis’s algorithm [43] was
specifically motivated by this application.