Finding Association Rules That Trade Support Optimally Against Confidence
Tobias Scheffer
University of Magdeburg, FIN/IWS, PO Box 4120, 39016 Magdeburg, Germany
SemanticEdge, Kaiserin-Augusta-Allee 10-11, 10553 Berlin, Germany
[email protected]
1 Introduction
Association rules (e.g., [1,5,2]) express regularities between sets of data items
in a database. [Beer and TV magazine ⇒ chips] is an example of an association
rule and expresses that, in a particular store, all customers who buy beer and a
TV magazine are also likely to buy chips. In contrast to classifiers, association
rules do not make a prediction for all database records. When a customer does
not buy beer and a magazine, then our example rule does not conjecture that
he will not buy chips either. The number of database records for which a rule
does predict the proper value of an attribute is called the support of that rule.
Association rules may not be perfectly accurate. The fraction of database
records for which the rule conjectures a correct attribute value, relative to the
fraction of records for which it makes any prediction, is called the confidence.
Note that the confidence is the relative frequency of a correct prediction on the
data that is used for training. We expect the confidence (or accuracy) on unseen
data to lie below that on average, in particular, when the support is small.
When deciding which rules to return, association rule algorithms need to
take both confidence and support into account. Of course, we can find any
number of rules with perfect confidence but a support of only one or very few
records. On the other hand, we can construct very general rules with large sup-
port but low confidence. The Apriori algorithm [2] possesses confidence and
support thresholds and returns all rules which lie above these bounds. However,
2 Preliminaries
Let D be a database consisting of one table over binary attributes a1 , . . . , ak ,
called items. In general, D has been generated by discretizing the attributes
of a relation of an original database D′. For instance, when D′ contains an
attribute income, then D may contain binary attributes 0 ≤ income ≤ 20k,
20k < income ≤ 40k, and so on. A database record r ⊆ {a1 , . . . , ak } is the set of
attributes that take value one in a particular row of the table D.
A database record r satisfies an item set x ⊆ {a1 , . . . , ak } if x ⊆ r. The
support s(x) of an item set x is the number of records in D which satisfy x.
Often, the fraction s(x)/|D| of records in D that satisfy x is called the support
of x. But since the database D is constant, these terms are equivalent.
An association rule [x ⇒ y] with x, y ⊆ {a1 , . . . , ak }, y ≠ ∅, and x ∩ y = ∅
expresses a relationship between an item set x and a nonempty item set y. The
intuitive semantic of the rule is that all records which satisfy x are predicted to
also satisfy y. The confidence of the rule with respect to the (training) database
D is ĉ([x ⇒ y]) = s(x ∪ y)/s(x) – that is, the ratio of correct predictions over all
records for which a prediction is made.
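To make these definitions concrete, the following minimal Python sketch (not part of the original paper; the toy records and item names are illustrative) computes the support of an item set and the confidence of a rule over a database represented as a list of records, each record being the set of items that take value one.

```python
# Minimal sketch: support s(x) and confidence c^([x => y]) = s(x u y) / s(x),
# with the database represented as a list of records (sets of items with value 1).

def support(database, item_set):
    """Number of records r that satisfy the item set x, i.e. x is a subset of r."""
    return sum(1 for r in database if item_set <= r)

def confidence(database, body, head):
    """Ratio of correct predictions over all records for which a prediction is made."""
    s_body = support(database, body)
    if s_body == 0:
        return None          # the rule never makes a prediction
    return support(database, body | head) / s_body

# Illustrative toy database over the items of the introduction's example.
D = [{"beer", "tv_magazine", "chips"},
     {"beer", "tv_magazine"},
     {"beer", "chips"},
     {"chips"}]

print(support(D, {"beer", "tv_magazine"}))                 # 2
print(confidence(D, {"beer", "tv_magazine"}, {"chips"}))   # 0.5
```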
The confidence is measured with respect to the database D that is used for
training. Often, a user will assume that the resulting association rules provide
information on the process that generated the database which will be valid in
future, too. But the confidence on the training data is only an estimate of the
rules’ accuracy in the future, and since we search the space of association rules
to maximize the confidence, the estimate is optimistically biased. We define the
predictive accuracy c([x ⇒ y]) of a rule as the probability of a correct prediction
with respect to the process underlying the database.
Definition 1. Let D be a database whose records r are generated by a static
process P , and let [x ⇒ y] be an association rule. The predictive accuracy
c([x ⇒ y]) = P r[r satisfies y|r satisfies x] is the conditional probability of y ⊆ r
given that x ⊆ r when the distribution of r is governed by P .
The confidence ĉ([x ⇒ y]) is the relative frequency corresponding to the probability
c([x ⇒ y]), measured on the given database D. We now pose the n most accurate
association rules problem.
Definition 2. Given a database D (defined as above) and a set of database
items a1 through ak , find n rules h1 , . . . , hn ∈ {[x ⇒ y] | x, y ⊆ {a1 , . . . , ak }; y ≠
∅; x ∩ y = ∅} which maximize the expected predictive accuracy c([x ⇒ y]).
We formulate the problem such that the algorithm needs to return a fixed
number of best association rules rather than all rules the utility of which ex-
ceeds a given threshold. We think that this setting is more appropriate in many
situations because a threshold may not be easy to specify and a user may not be
satisfied with either an empty or an outrageously large set of rules.
We have now found a solution that quantifies E(c([x ⇒ y])|ĉ([x ⇒ y]), s(x)),
the exact expected predictive accuracy of an association rule [x ⇒ y] with given
confidence ĉ and body support s(x). Equation 6 thus quantifies just how strongly
the confidence of a rule has to be corrected, given the support of that rule. Note
that the solution depends on the prior π(c) which is the histogram of accuracies
of all association rules over the given items for the given database.
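Equation 6 itself is not reproduced in this excerpt; assuming it takes the usual Bayesian form E(c | ĉ, s) = Σ_c c · P(ĉ | c, s) π(c) / Σ_c P(ĉ | c, s) π(c) with a binomial likelihood and a discretized prior (an assumption of this illustration, not a statement of the paper's exact formula), the correction can be sketched as follows; the prior values below are made up.

```python
# Sketch of the expected predictive accuracy E(c | c^, s) for a rule with
# confidence c^ and body support s, under a discretized prior pi(c).
# Assumed form: E(c | c^, s) = sum_c c * P(c^ | c, s) * pi(c) / sum_c P(c^ | c, s) * pi(c),
# with the binomial likelihood P(c^ | c, s) = C(s, k) * c^k * (1-c)^(s-k), k = c^ * s.
from math import comb

def expected_accuracy(conf, s, prior):
    """prior: dict mapping discretized accuracy values c to their probability pi(c)."""
    k = round(conf * s)                      # observed number of correct predictions
    num = den = 0.0
    for c, p in prior.items():
        likelihood = comb(s, k) * (c ** k) * ((1 - c) ** (s - k))
        num += c * likelihood * p
        den += likelihood * p
    return num / den if den > 0 else 0.0

# Made-up prior: most rules are only moderately accurate, few are near-perfect.
prior = {0.3: 0.2, 0.5: 0.5, 0.7: 0.2, 0.95: 0.1}
print(expected_accuracy(1.0, 3, prior))    # confidence 1, tiny support: strongly corrected down
print(expected_accuracy(1.0, 80, prior))   # same confidence, large support: little correction
```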
One way of treating such priors is to assume a certain standard distribution.
Under a set of assumptions on the process that generated the database, π(c)
can be shown to be governed by a certain binomial distribution [9]. However,
empirical studies (see Sect. 5 and Fig. 2a) show that the shape of the prior can
deviate strongly from this binomial distribution. Reasonably accurate estimates
can be obtained by following a Markov Chain Monte Carlo [4] approach to
estimating the prior, using the available database (see Sect. 4). For an extended
discussion of the complexity of estimating this distribution, see [9,6].
Fig. 1. Contributions of support s(x) and confidence ĉ([x ⇒ y]) to predictive accuracy
c([x ⇒ y]) of rule [x ⇒ y]
5. Let X0 = {∅}; Let X1 = {{a1 }, . . . , {ak }} be all item sets with one single element.
6. For i = 1 . . . k − 1 While (i = 1 or Xi−1 ≠ ∅).
   a) If i > 1 Then determine the set of candidate item sets of length i as Xi =
      {x ∪ x′ | x, x′ ∈ Xi−1 , |x ∪ x′ | = i}. Generation of Xi can be optimized by
      considering only item sets x and x′ ∈ Xi−1 that differ only in the element
      with highest item index. Eliminate double occurrences of item sets in Xi .
   b) Run a database pass and determine the support of the generated item sets.
      Eliminate item sets with support less than τ from Xi .
   c) For all x ∈ Xi Call RuleGen(x).
   d) If best has been changed, Then Increase τ to be the smallest number such
      that E(c|1, τ ) > E(c(best[n]) | ĉ(best[n]), s(best[n])) (refer to Equation 6). If τ >
      database size, Then Exit.
   e) If τ has been increased in the last step, Then eliminate all item sets from Xi
      which have support below τ .
7. Output best[1] . . . best[n], the list of the n best association rules.
10. Let γ be the smallest number such that E(c | γ/s(x), s(x)) >
    E(c(best[n]) | ĉ(best[n]), s(best[n])).
11. For i = 1 . . . k With ai ∉ x Do (for all items not in x)
    a) If i = 1 Then Let Y1 = {{ai } | ai ∉ x} (item sets with one element not in x).
    b) Else Let Yi = {y ∪ y′ | y, y′ ∈ Yi−1 , |y ∪ y′ | = i} analogous to the generation of
       candidates in step 6a.
    c) For all y ∈ Yi Do
       i. Measure the support s(x ∪ y). If s(x ∪ y) ≤ γ, Then eliminate y from Yi
          and Continue the for loop with the next y.
       ii. Equation 6 gives the predictive accuracy E(c([x ⇒ y]) | s(x ∪ y)/s(x), s(x)).
       iii. If the predictive accuracy is among the n best found so far (recorded
            in best), Then update best, remove rules in best that are subsumed
            by other rules, and Increase γ to be the smallest number such that
            E(c | γ/s(x), s(x)) ≥ E(c(best[n]) | ĉ(best[n]), s(best[n])).
12. If any subsumed rule has been erased in step 11(c)iii, Then recur from step 10.
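The candidate joins in steps 6a and 11b follow the familiar Apriori pattern: two item sets of size i − 1 that differ only in their highest-indexed element are merged into one candidate of size i. A minimal sketch, representing item sets as sorted tuples of item indices (an assumption of this illustration, not the paper's data structure):

```python
# Sketch of the candidate join of steps 6a/11b: merge two (i-1)-item sets that
# differ only in their last (highest-index) item into an i-item candidate,
# eliminating duplicates. Item sets are sorted tuples of item indices here.

def generate_candidates(prev_level):
    """prev_level: item sets of size i-1 (sorted tuples); returns candidates of size i."""
    prev = sorted(set(prev_level))
    candidates = set()
    for a in range(len(prev)):
        for b in range(a + 1, len(prev)):
            x, y = prev[a], prev[b]
            if x[:-1] == y[:-1]:                      # differ only in the last element
                candidates.add(tuple(sorted(set(x) | set(y))))
    return sorted(candidates)

print(generate_candidates([(1,), (2,), (3,)]))        # [(1, 2), (1, 3), (2, 3)]
print(generate_candidates([(1, 2), (1, 3), (2, 3)]))  # [(1, 2, 3)]
```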
and, given that length, draw a fixed number of rules. We determine the items
and the split into body and head by drawing at random (Step 3). We have now
drawn equally many rules for each size while the uniform distribution requires
us to prefer long rules
as there are many more long rules than there are short
ones. There are (k choose i) item sets of size i over k database items, and given i
items, there are 2^i − 1 distinct association rules (each item can be located on the
left or right hand side of the rule but the right hand side must be nonempty).
Hence, Equation 7 gives the probability that exactly i items occur in a rule which
is drawn at random under uniform distribution from the space of all association
rules over k items.

    P[i items] = (k choose i)(2^i − 1) / Σ_{j=1}^{k} (k choose j)(2^j − 1)        (7)
In step 4 we apply a Markov Chain Monte Carlo style correction to the prior by
weighting each prior for rule length i by the probability of a rule length of i.
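The counting argument behind Equation 7 translates directly into code; the following sketch (the function name is illustrative) computes P[i items] and could be used to weight, as in step 4, the per-length priors by the probability of that rule length.

```python
# Sketch of Equation 7: probability that a rule drawn uniformly from all
# association rules over k items contains exactly i items. There are C(k, i)
# item sets of size i, and each yields 2^i - 1 rules (every item goes into the
# body or the head, and the head must be nonempty).
from math import comb

def rule_length_probability(i, k):
    numerator = comb(k, i) * (2 ** i - 1)
    denominator = sum(comb(k, j) * (2 ** j - 1) for j in range(1, k + 1))
    return numerator / denominator

k = 10
distribution = [rule_length_probability(i, k) for i in range(1, k + 1)]
print(distribution)       # most of the probability mass lies on the longer rules
print(sum(distribution))  # 1.0 (up to floating-point rounding)
```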
Generating All Rules over Given Body x. In step 10, we introduce a new
accuracy threshold γ which quantifies the confidence that a rule with support
s(x) needs in order to be among the n best ones. We then start enumerating all
possible heads y, taking into account in step 11 that body and head must be
disjoint and generating candidates in step 11(b) analogous to step 6a. In step
11(c)i we calculate the support of x ∪ y for all heads y. When a rule lies among
the best ones so far, we update best. We will not bother with rules that have
a predictive accuracy below the accuracy of best[n], so we increase γ. In step
11(c)iii, we delete rules from best which are subsumed by other rules. This may
result in the unfortunate situation that rules which we dropped from best earlier
now belong to the n best rules again. So in step 12 we have to check this and
recur from step 10 if necessary.
Theorem 1. We can decide whether a rule subsumes another rule by two simple
subset tests: [x ⇒ y] |= [x′ ⇒ y′] ⇔ x ⊆ x′ ∧ y ⊇ y′. Moreover, if [x ⇒ y] is
supported by a database D, and [x ⇒ y] |= [x′ ⇒ y′], then this database also
supports [x′ ⇒ y′].
Proofs of Theorems 1 and 2 are left for the full paper. Theorem 1 says that
[x ⇒ y] subsumes [x′ ⇒ y′] if and only if x is a subset of x′ (weaker precondition)
and y is a superset of y′ (y predicts more attribute values than y′). We can then
delete [x′ ⇒ y′] because Theorem 1 says that from a more general rule we can
infer that all subsumed rules must be satisfied, too. In order to assure that the n
rules provided to the user are not redundant specializations of each other,
we test for subsumption in step 11(c)iii by performing the two subset tests that
imply subsumption according to Theorem 1.
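The two subset tests translate directly into code. A minimal sketch, with rules represented as (body, head) pairs of Python sets; the concrete example rules are illustrative and only loosely modeled on the items of Table 2:

```python
# Sketch of the subsumption test of Theorem 1:
# [x => y] subsumes [x' => y'] iff x is a subset of x' and y is a superset of y'.

def subsumes(rule, other):
    """rule = (x, y), other = (x2, y2); bodies and heads are sets of items."""
    (x, y), (x2, y2) = rule, other
    return x <= x2 and y >= y2

general  = ({"Location=market 4"}, {"PanelID=9", "ProductGroup=84"})
specific = ({"Location=market 4", "Type=0"}, {"PanelID=9"})

print(subsumes(general, specific))   # True: weaker body, stronger head
print(subsumes(specific, general))   # False
```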
Table 2. (Top) five best association rules when subsumed rules are removed; (bottom)
five best rules when subsumed rules are not removed
[ ⇒ PanelID=9 ProductGroup=84 ]
E(c|ĉ = 1, s = 10000) = 1
[ Location=market 4 ⇒ PanelID=9, ProductGroup=84, Container=nonreuseable ]
E(c|ĉ = 1, s = 1410) = 1
[ Location=market 6 ⇒ PanelID=9, ProductGroup=84, Container=nonreuseable ]
E(c|ĉ = 1, s = 1193) = 1
[ Location=market 1 ⇒ PanelID=9, ProductGroup=84, Container=nonreuseable ]
E(c|ĉ = 1, s = 1025) = 1
[ Manufacturer=producer 18 ⇒ PanelID=9, ProductGroup=84, Type=0, Container=nonreuseable ]
E(c|ĉ = 1, s = 1804) = 1
5 Experiments
For our experiments, we used a database of 14,000 fruit juice purchase transac-
tions, and the mailing campaign data used for the KDD cup 1998. Each trans-
action of the fruit juice database is described by 29 real-valued and string-valued
attributes which specify properties of the purchased juice as well as attributes
of the customer (e.g., age and job). By binarizing the attributes and considering
only a subset of the binary attributes, we varied the number of items during the
experiments. For instance, we transformed the attribute “ContainerSize” into
five binary attributes, “ContainerSize ≤ 0.3”, “0.3 < ContainerSize ≤ 0.5”, etc.
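The discretization described above can be sketched as follows; the attribute name and the first two interval boundaries mirror the ContainerSize example, while the remaining boundaries and the helper function are illustrative and not the original experimental code.

```python
# Sketch of binarizing a real-valued attribute into interval items, as in the
# ContainerSize example: every record receives exactly one of the binary items.

def binarize(value, name, boundaries):
    """boundaries: sorted upper interval bounds, e.g. [0.3, 0.5, 0.7, 1.0]."""
    lower = None
    for upper in boundaries:
        if (lower is None or lower < value) and value <= upper:
            return f"{name} <= {upper}" if lower is None else f"{lower} < {name} <= {upper}"
        lower = upper
    return f"{name} > {boundaries[-1]}"

bounds = [0.3, 0.5, 0.7, 1.0]                       # first two bounds as in the text
print(binarize(0.25, "ContainerSize", bounds))      # 'ContainerSize <= 0.3'
print(binarize(0.4,  "ContainerSize", bounds))      # '0.3 < ContainerSize <= 0.5'
print(binarize(1.5,  "ContainerSize", bounds))      # 'ContainerSize > 1.0'
```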
Figure 2a shows the prior π(c) as estimated by the algorithm in step 3 for
several numbers of items. Figure 1 shows the predictive accuracy for this prior,
depending on the confidence and the body support. Table 2 (top) shows the five
best association rules found for the fruit juice problem by the PredictiveApriori
algorithm. The rules say that all transactions are performed under PanelID 9
and refer to product group 84 (fruit juice purchases). Apparently, markets 1, 4,
and 6 only sell non-reuseable bottles (in contrast to the refillable bottles sold by
most German supermarkets). Producer 18 does not sell refillable bottles either.
In order to compare the performance of Apriori and PredictiveApriori, we
need to find a uniform measure that is independent of implementation details.
For Apriori, we count how many association rules have to be compared against
the minconf threshold (this number is independent of the actual minconf thresh-
old). We can determine this number from the item sets without actually enumer-
ating all rules. For PredictiveApriori, we measure for how many rules we need
to determine the predictive accuracy by evaluating Equation 6.
Fig. 2. (a) Confidence prior π for various numbers of items (fraction of rules over
confidence, curves for 10, 20, and 30 items). (b) Number of rules that PredictiveApriori
has to consider, dependent on the number n of desired solutions (curves for 20, 30,
and 40 items)
Fig. 4. Number of rules that PredictiveApriori and Apriori need to consider, depending
on the number of items (in case of Apriori also depending on minsup); each panel plots
the number of rules considered against minsup
the accuracy of all rules over supersets of a given item set. Very large parts of
the search space can thus be excluded. A similar idea is realized in Midos [12].
Many optimizations of the Apriori algorithm have been proposed which have
helped this algorithm gain its huge practical relevance. These include the Apri-
oriTid approach for minimizing the number of database passes [2], and sampling
approaches for estimating the support of item sets [2,11]. In particular, efficient
search for frequent itemsets has been addressed intensely and successfully [7,
3,14]. Many of these improvements can, and should, be applied to the Predic-
tiveApriori algorithm as well.
References
1. R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of
items in large databases. In ACM SIGMOD Conference on Management of Data,
pages 207–216, 1993.
2. R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. Verkamo. Fast discovery
of association rules. In Advances in Knowledge Discovery and Data Mining, 1996.
3. S. Brin, R. Motwani, J. Ullman, and S. Tsur. Dynamic itemset counting and
implication rules for market basket data. In Proceedings of the ACM SIGMOD
Conference on Management of Data, 1997.
4. W. Gilks, S. Richardson, and D. Spiegelhalter, editors. Markov Chain Monte Carlo
in Practice. Chapman & Hall, 1995.
5. M. Klemettinen, H. Mannila, P. Ronkainen, H. Toivonen, and A. I. Verkamo. Find-
ing interesting rules from large sets of discovered association rules. In Proceedings
of the Third International Conference on Information and Knowledge Management, 1994.
6. J. Langford and D. McAllester. Computable shell decomposition bounds. In Pro-
ceedings of the International Conference on Computational Learning Theory, 2000.
7. D. Lin and Z. Kedem. Pincer search: a new algorithm for discovering the maximum
frequent set. In Proceedings of the International Conference on Extending Database
Technology, 1998.
8. T. Scheffer. Error Estimation and Model Selection. Infix Publisher, Sankt Au-
gustin, 1999.
9. T. Scheffer. Average-case analysis of classification algorithms for boolean functions
and decision trees. In Proceedings of the International Conference on Algorithmic
Learning Theory, 2000.
10. T. Scheffer. Nonparametric regularization of decision trees. In Proceedings of the
European Conference on Machine Learning, 2000.
11. T. Scheffer and S. Wrobel. A sequential sampling algorithm for a general class
of utility functions. In Proceedings of the International Conference on Knowledge
Discovery and Data Mining, 2000.
12. S. Wrobel. Inductive logic programming for knowledge discovery in databases. In
Sašo Džeroski and Nada Lavrač, editors, Relational Data Mining, 2001.
13. M. Zaki. Generating non-redundant association rules. In Proceedings of the Inter-
national Conference on Knowledge Discovery and Data Mining, 2000.
14. M. Zaki and C. Hsiao. Charm: an efficient algorithm for closed association rule
mining. Technical Report 99-10, Rensselaer Polytechnic Institute, 1999.