Associative Learning
Content
1. Introduction
2. Association Rule Learning
3. Apriori Algorithm
4. Proposed Work
Introduction
Data mining is the analysis of large quantities of data to extract interesting
patterns, such as:
groups of data records (cluster analysis)
unusual records (anomaly detection)
dependencies (association rules)
Association rule mining, first proposed in [2], is a popular and
well-researched data mining method for discovering interesting relations
between variables in large databases.
Association Rule Learning
The problem of association rule mining [2] is defined as follows:
Let I = {i1, i2, ..., in} be a set of n attributes called items.
Let D = {t1, t2, ..., tm} be a set of transactions called the database.
Each transaction t in D has a unique transaction ID and contains a subset of
the items in I.
A rule is defined as an implication of the form X ⇒ Y, where X, Y ⊆ I
and X ∩ Y = Ø.
An example rule for a supermarket could be
{butter, bread} ⇒ {milk}: if butter and bread are bought, then
customers also buy milk.
Constraints
The best-known constraints are minimum thresholds on support and
confidence [3].
The support of an item-set X, written supp(X), is defined as the number of
transactions in the data set that contain the item-set.
The confidence of a rule is defined as conf(X ⇒ Y) = supp(X ∪ Y) / supp(X).
Association rule generation [16, 17] can be split into two steps:
i) First, apply a user-defined minimum support to the database to find
all the frequent item-sets.
ii) Second, use these frequent item-sets and the user-defined minimum
confidence to form the rules.
To find the frequent item-sets we use the Apriori algorithm [4], [5].
An Example

Transaction ID  milk  bread  butter  beer
1               1     1      0       0
2               0     0      1       0
3               0     0      0       1
4               1     1      1       0
5               0     1      0       0

Supp(milk) = 2/5, Supp(bread) = 3/5, Supp(butter) = 2/5, Supp(beer) = 1/5
Rule: {milk, bread} ⇒ {butter} has a confidence of
supp(milk, bread, butter) / supp(milk, bread) = 1/2 = 50%
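These definitions translate directly into code. The following is a minimal Python sketch (the helper names supp and conf are our own, not from any library) evaluating the table above:

```python
# The supermarket table above, encoded as item sets.
transactions = [
    {"milk", "bread"},            # transaction 1
    {"butter"},                   # transaction 2
    {"beer"},                     # transaction 3
    {"milk", "bread", "butter"},  # transaction 4
    {"bread"},                    # transaction 5
]

def supp(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def conf(x, y):
    """Confidence of the rule X => Y, i.e. supp(X u Y) / supp(X)."""
    return supp(set(x) | set(y)) / supp(set(x))

print(supp({"milk"}))                       # 0.4, i.e. 2/5
print(conf({"milk", "bread"}, {"butter"}))  # 0.5, i.e. 50%
```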
Applications
Market Analysis
Telecommunication
Credit Cards/ Banking Services
Medical Treatments
Basketball-Game Analysis
Apriori Algorithm
Apriori [11] is a classic algorithm for finding the frequent item-sets over
transactional databases.
It proceeds by identifying the frequent individual items in the database and
extending them to larger and larger item-sets, as long as those item-sets
appear sufficiently often in the database, i.e. satisfy the minimum support.
• Frequent item-set property:
Any subset of a frequent item-set is frequent.
The algorithm is divided into two parts:
Generating the candidate item-sets
Generating the large (frequent) item-sets
Apriori Algorithm Contd.
Lk: set of frequent item-sets of size k (with minimum support)
Ck: set of candidate item-sets of size k (potentially frequent item-sets)
L1 = {frequent item-sets of size 1};
for (k = 1; Lk != ∅; k++) do
    Ck+1 = candidates generated from Lk;
    for each transaction t in the database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with minimum support;
return ∪k Lk;
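As an illustration, here is a minimal, self-contained Python sketch of this level-wise loop (an assumed reference implementation, not the original thesis code; the candidate join-and-prune step is inlined here and detailed on a later slide):

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Level-wise mining of frequent item-sets (Lk from Ck), returning a map
    from each frequent item-set (frozenset) to its support count."""
    transactions = [set(t) for t in transactions]
    # L1: frequent 1-item-sets.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 1
    while frequent:
        # Join step: pairs of frequent k-item-sets whose union has k+1 items,
        # pruned by the Apriori property (every k-subset must be frequent).
        sets = list(frequent)
        candidates = set()
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                union = sets[i] | sets[j]
                if len(union) == k + 1 and all(
                        frozenset(sub) in frequent
                        for sub in combinations(union, k)):
                    candidates.add(union)
        # Count each surviving candidate against every transaction.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result
```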
How it Works
Database D (minimum support = 2):

TID  Items
T1   1 3 4
T2   2 3 5
T3   1 2 3 5
T4   2 5

Scan D to count the candidate 1-item-sets C1:

itemset  sup.
{1}      2
{2}      3
{3}      3
{4}      1
{5}      3

Prune to the frequent 1-item-sets L1:

itemset  sup.
{1}      2
{2}      3
{3}      3
{5}      3

Generate the candidate 2-item-sets C2 from L1:

itemset
{1 2}
{1 3}
{1 5}
{2 3}
{2 5}
{3 5}

Scan D to count C2:

itemset  sup
{1 2}    1
{1 3}    2
{1 5}    1
{2 3}    2
{2 5}    3
{3 5}    2

Prune to L2:

itemset  sup
{1 3}    2
{2 3}    2
{2 5}    3
{3 5}    2

Generate C3 from L2, scan D, and prune to L3:

itemset  sup
{2 3 5}  2
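As a quick check, running the apriori sketch above on database D reproduces this trace:

```python
db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]   # database D above
frequent = apriori(db, min_support=2)
print(frequent[frozenset({2, 3, 5})])  # 2 -- the single frequent 3-item-set L3
```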
Generation of Candidates
Input: Li-1: set of frequent item-sets of size i-1
Output: Ci: set of candidate item-sets of size i
Ci = empty set;
for each item-set J in Li-1 do
    for each item-set K in Li-1 such that K ≠ J do
        if i-2 of the elements in J and K are equal then
            if all subsets of {K ∪ J} are in Li-1 then
                Ci = Ci ∪ {K ∪ J};
return Ci;
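A Python rendering of this join-and-prune step might look as follows (a sketch; item-sets are represented as frozensets, and the function name is ours):

```python
from itertools import combinations

def generate_candidates(prev_frequent, i):
    """Build Ci from Li-1: join pairs of (i-1)-item-sets sharing i-2 items,
    then prune any union with an (i-1)-subset missing from Li-1."""
    prev = set(prev_frequent)               # frozensets of size i-1
    candidates = set()
    for j_set in prev:
        for k_set in prev:
            if j_set != k_set and len(j_set & k_set) == i - 2:
                union = j_set | k_set       # has exactly i elements
                if all(frozenset(s) in prev
                       for s in combinations(union, i - 1)):
                    candidates.add(union)
    return candidates
```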
Example of Finding Candidates
Say L3 consists of the item-sets {abc, abd, acd, ace, bcd}.
To generate C4 from L3:
abcd from abc and abd
acde from acd and ace
Pruning the candidate set:
acde is removed because ade is not in L3.
Hence C4 contains only {abcd}.
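Feeding this L3 to the generate_candidates sketch above confirms the result:

```python
L3 = {frozenset(s) for s in ("abc", "abd", "acd", "ace", "bcd")}
print(generate_candidates(L3, 4))
# {frozenset({'a', 'b', 'c', 'd'})} -- acde is pruned since ade is not in L3
```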
Discovering Rules
for each frequent item-set I do
    for each rule C ⇒ I−C do
        if (support(I) / support(C) >= min_conf) then [ as {C ∪ (I−C)} = I ]
            output the rule C ⇒ I−C, with confidence = support(I) / support(C)
            and support = support(I)
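A sketch of this rule-generation loop in Python, consuming the output of the apriori sketch above (min_conf given as a fraction):

```python
from itertools import combinations

def discover_rules(frequent, min_conf):
    """Yield (C, I-C, confidence) for every rule meeting min_conf; `frequent`
    maps frozensets to support counts, e.g. the output of apriori above."""
    for itemset, support_i in frequent.items():
        for r in range(1, len(itemset)):
            for c in map(frozenset, combinations(itemset, r)):
                # support(C) is always available: every subset of a frequent
                # item-set is itself frequent (Apriori property).
                confidence = support_i / frequent[c]
                if confidence >= min_conf:
                    yield c, itemset - c, confidence
```

On database D with min_conf = 0.7 this emits, for example, the rule {2, 3} ⇒ {5} with confidence 2/2 = 100%, matching the example on the next slide.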
Example of Discovering Rules
Database D (as before):

TID  Items
T1   1 3 4
T2   2 3 5
T3   1 2 3 5
T4   2 5

Consider the 3-item-set {I2, I3, I5}, whose support is 2:
{I2, I3} ⇒ I5, confidence = 2/2 = 100%
{I2, I5} ⇒ I3, confidence = 2/3 = 67%
{I3, I5} ⇒ I2, confidence = 2/2 = 100%
I2 ⇒ {I3, I5}, confidence = 2/3 = 67%
I3 ⇒ {I2, I5}, confidence = 2/3 = 67%
I5 ⇒ {I2, I3}, confidence = 2/3 = 67%
Advantages:
i) Apriori is very useful when the data size is huge, as it uses a level-wise
search to find the frequent item-sets.
ii) Apriori uses breadth-first search to count candidate item-sets efficiently.
Disadvantages:
i) The Apriori algorithm needs to scan the entire database on every pass.
ii) The computational complexity increases as the number and size of the
candidates increase.
Proposed Work
1. Modified Search Algorithm
2. Modified Association Rule Generation for Classification of Data
Modified Search Algorithm
1. Add a tag field to each transaction in the database.
Format: if the transaction is <T1>, it is modified into <T1, tag>.
2. The tag contains the first, middle, and last instance of
the transaction.
3. Example: for a transaction <I4, I5, I6, I9, I11, I12>,
the tag field will be <I4, I6, I12>.
Modified Search Algorithm Contd.
Step 1: Create a TAG field for each transaction in the dataset. The TAG field
contains 3 values: <starting value, middle value, end value>.
Step 2: For each item to search in the dataset, first check whether the item is
greater than or equal to the starting value and less than or equal to the end value.
Step 3: If the value does not satisfy the condition in Step 2, do not search that
particular transaction. If it satisfies both conditions in Step 2, go to Step 4.
Step 4: Check whether the item to be searched matches the middle element. If
it matches, go to Step 6; if it does not match, go to Step 5.
Step 5: Calculate the difference of the item to be searched from the starting,
middle, and end values. Choose the least of these three differences to reduce the
search range and go to Step 4; if the difference from any element is 0, go to Step 6.
Step 6: Increase the count by 1 for that particular item, having found it in the
transaction. (A code sketch of these steps follows below.)
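The slides leave the exact range-narrowing rule of Step 5 open; the sketch below (our own function names) approximates it with a three-probe search that checks the start, middle, and end of the current range on each round for a zero difference, and reproduces the two-iteration trace in the example that follows:

```python
def make_tag(transaction):
    """Step 1: tag a sorted transaction with <first, middle, last> instance."""
    return (transaction[0], transaction[len(transaction) // 2], transaction[-1])

def tagged_search(t, item):
    """Steps 4-5 for one sorted transaction: probe the start, middle, and end
    of the current range; a zero difference means the item is found, otherwise
    keep the half of the range on the item's side."""
    lo, hi = 0, len(t) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if item in (t[lo], t[mid], t[hi]):   # difference 0 from some probe
            return True
        if not (t[lo] < item < t[hi]):       # item cannot lie in this range
            return False
        if item < t[mid]:
            lo, hi = lo + 1, mid - 1         # all three probes are excluded
        else:
            lo, hi = mid + 1, hi - 1
    return False

def count_item(tagged_db, item):
    """Steps 2-3 and 6: skip transactions whose tag excludes the item, and
    count the transactions that do contain it."""
    return sum(tag[0] <= item <= tag[2] and tagged_search(t, item)
               for t, tag in tagged_db)

data = [10, 11, 12, 21, 22, 31, 33, 37, 39, 41, 45, 46, 49, 51, 54, 57, 61,
        67, 69, 71, 78, 79, 81, 101, 103, 105, 107, 109, 111, 127]
print(tagged_search(data, 51))  # True, found in 2 probe rounds
```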
Example:
We randomly take 30 numbers for the example:
(10, 11, 12, 21, 22, 31, 33, 37, 39, 41, 45, 46, 49, 51, 54, 57, 61, 67, 69, 71,
78, 79, 81, 101, 103, 105, 107, 109, 111, 127)
We need to find 51 among these data.
1st iteration: the middle element is 54. Since 51 < 54, the range would be
10-51, but we also calculate the differences: |51 − 10| = 41 and |51 − 54| = 3.
From the differences we can see that the item (51) is much closer to 54 than to
10, so the range can be narrowed to 33-51, since the item can be at most at the
middle position of the range 10-51.
Example:
2nd iteration: the middle element of the narrowed range is 45. Since 51 > 45,
the range would be 46-51, but again we calculate the differences: the difference
of the item (51) from 45 is 6, and from 51 (the end value) it is 0. So the
search ends, and the counter for the item is increased by 1.
So in only 2 iterations we can find the data we were looking for.
Example:
Comparison with binary search:
(10, 11, 12, 21, 22, 31, 33, 37, 39, 41, 45, 46, 49, 51, 54, 57, 61, 67, 69, 71,
78, 79, 81, 101, 103, 105, 107, 109, 111, 127)
For binary search we have the following iterations:
1st iteration: compare 51 with 54; 51 < 54, so search in the range 10 to 51
2nd iteration: compare 51 with 33; 51 > 33, so search in the range 37 to 51
3rd iteration: compare 51 with 45; 51 > 45, so search in the range 46 to 51
4th iteration: compare 51 with 49; 51 > 49, so search in the range 51 to 51
5th iteration: compare 51 with 51; 51 = 51, search ends, data found
Conclusion:
From the comparison it is clear that our proposed search algorithm can find the
desired data in fewer iterations, and hence in less time.
Modified Association Rule Generation for Classification of Data
Issues: a) a minimal number of rules
b) a maximum amount of data classified correctly
Example:

I1  I2  I3  I4  DECISION
1   2   3   4   1
1   2   6   7   1
1   3   5   8   2
2   5   6   9   2
1   2   3   6   3

For item value 1 there are 3 decisions: 1, 2 and 3. We calculate
count(1,1), count(1,2) and count(1,3), and
support(1) = max(count(1,1), count(1,2), count(1,3)).
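A sketch of this support calculation over the table above (we assume "item value 1" refers to attribute I1; the names are illustrative):

```python
from collections import Counter

rows = [  # (I1, I2, I3, I4, decision), copied from the table above
    (1, 2, 3, 4, 1),
    (1, 2, 6, 7, 1),
    (1, 3, 5, 8, 2),
    (2, 5, 6, 9, 2),
    (1, 2, 3, 6, 3),
]

def decision_support(value, attr=0):
    """count(value, d) for every decision d among rows where the chosen
    attribute equals `value`, and support(value) = max of those counts."""
    counts = Counter(r[-1] for r in rows if r[attr] == value)
    return counts, max(counts.values(), default=0)

counts, support = decision_support(1)
print(dict(counts))  # {1: 2, 2: 1, 3: 1}
print(support)       # 2 = max(count(1,1), count(1,2), count(1,3))
```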
Modified Association Rule Generation for Classification of Data
Algorithm:
Step 1: Let k = 1.
Step 2: Generate frequent item-sets of length 1 (go to Step 11 for the support calculation).
Step 3: Repeat until no new frequent item-sets are identified:
(i) Generate length-(k+1) candidate item-sets from length-k frequent
item-sets.
(ii) Prune candidate item-sets containing subsets of length k that are
infrequent.
(iii) Count the support of each candidate by scanning the DB (go to
Step 11).
(iv) Eliminate candidates that are infrequent, leaving only those that
are frequent.
Step 11: For each item in the dataset, calculate the number of times the item
is present in the whole data-set together with its corresponding decision
values (for example I2 ⇒ D1, I2 ⇒ D2, or I2 ⇒ D3).
Step 12: Find the maximum of the calculated supports for each item.
Step 13: Return the support for the item.
Experimental Results
Algorithms compared: Decision Table, One-R, PART, and the proposed algorithm.
We use the IRIS data-set from the UCI Machine Learning Repository.
Total number of instances: 148
Classes available: 3 (Iris Setosa (A), Iris Versicolour (B), Iris Virginica (C))
We first classify this data-set with the existing algorithms using the
Weka tool.
Conclusion
Comparative study:

Classification              Decision Table  One-R  PART  Proposed Method
Correctly classified        134             136    134   138
Incorrectly classified      13              11     13    10
Total instances classified  147             147    147   148

From this comparative study we can say that our proposed algorithm classifies
the data-set more accurately than the existing algorithms.
Future Scope
In future we will try to optimize the searching technique for the Apriori
algorithm.
We will also try to optimize the generated rule set to contain fewer rules.
References
1. Piatetsky-Shapiro, G. (1991). Discovery, analysis, and presentation of strong rules. In
Piatetsky-Shapiro, G. and Frawley, W. J. (eds.), Knowledge Discovery in Databases,
AAAI/MIT Press, Cambridge, MA.
2. Agrawal, R.; Imieliński, T.; Swami, A. (1993). "Mining association rules between sets
of items in large databases". Proceedings of the 1993 ACM SIGMOD International
Conference on Management of Data (SIGMOD '93), pp. 207-216.
3. Liu, B., Hsu, W., Ma, Y. (1998). Integrating Classification and Association Rule Mining.
American Association for Artificial Intelligence.
4. Agrawal, R., Faloutsos, C. and Swami, A. N. (1994). Efficient similarity search in sequence
databases.
5. Lomet, D. (ed.), Proceedings of the 4th International Conference on Foundations of Data
Organization and Algorithms (FODO), Chicago, Illinois, pp. 69-84. Springer-Verlag.
6. www.en.wikipedia.org/wiki/Binary_search_algorithm
7. Press, W. H.; Flannery, B. P.; Teukolsky, S. A.; Vetterling, W. T. (1988). Numerical
Recipes in C: The Art of Scientific Computing, Cambridge University Press, pp. 98-99.
8. Hipp, J., Güntzer, U., and Nakhaeizadeh, G. (2000). Algorithms for association rule mining:
a general survey and comparison. SIGKDD Explorations Newsletter 2(1) (June 2000), 58-64.
9. Pingping, W., Cuiru, W., Baoyi, W., Zhenxing, Z. "Data Mining Technology and Its
Application in University Education System". Computer Engineering, June 2003, pp. 87-89.
10. Taorong, Q., Xiaoming, B., Liping, Z. "An Apriori algorithm based on granular computing
and its application in library management systems". Control & Automation, 2006, pp. 218-221.
References Contd.
11. Agrawal, R. and Srikant, R. "Fast Algorithms for Mining Association Rules". In Proc.
VLDB 1994, pp. 487-499.
12. Chai, S., Jia, Y. and Yang, C. "The research of improved Apriori algorithm for mining
association rules". Service Systems and Service Management, 2007 International
Conference on. IEEE, 2007.
13. Kumar, K. Saravana, and R. Manicka Chezian. "A Survey on Association Rule Mining
using Apriori Algorithm". International Journal of Computer Applications 45(5) (2012):
47-50.
14. Saggar, M., Agrawal, A. K., and Lad, A. (2004, October). "Optimization of association
rule mining using improved genetic algorithms". In Systems, Man and Cybernetics, 2004
IEEE International Conference on (Vol. 4, pp. 3725-3729). IEEE.
15. Christian, A. J., and Martin, G. P. (2010, November). "Optimization of association rules
with genetic algorithms". In Chilean Computer Science Society (SCCC), 2010 XXIX
International Conference of the (pp. 193-197). IEEE.
16. Hipp, J., Güntzer, U., and Nakhaeizadeh, G. (2000). "Algorithms for association rule
mining: a general survey and comparison". ACM SIGKDD Explorations Newsletter, 2(1),
58-64.
17. Mitra, S., and Acharya, T. (2003). Data Mining: Multimedia, Soft Computing, and
Bioinformatics. Wiley-Interscience, pp. 7-8.
Thank You