SlideShare a Scribd company logo
International Journal on Computational Sciences & Applications (IJCSA) Vol.4, No.4, August 2014 
Mining Sequential Patterns for interval based 
events by applying multiple constraints 
M. Kalaivany and V. Uma 
Department of Computer Science, Pondicherry University, Pondicherry 
ABSTRACT 
Sequential pattern mining finds the frequent subsequences or patterns from the given sequences. 
TPrefixSpan algorithm finds the relevant frequent patterns from the given sequential patterns formed using 
interval based events. In our proposed work, we add multiple constraints like item, length and aggregate to 
the interval based TPrefixSpan algorithm. By adding these constraints the efficiency and effectiveness of 
the algorithm improves. The proposed constraint based algorithm CTPrefixSpan has been applied to 
synthetic medical dataset. The algorithm can be applied for stock market analysis, DNA sequences analysis 
etc. 
KEYWORDS 
Sequential patterns, temporal patterns, Constraints, Interval based events. 
1.INTRODUCTION 
Data mining is useful in various domains such as market analysis, decision support, fraud 
detection, business management and so on. Sequential pattern mining is an approach to extract 
information from input sequences [1]. Various methods have been proposed for mining temporal 
patterns in sequence databases such as mining repetitive patterns, trends and sequential patterns. 
Sequential Pattern Mining is a popular technique which consists of finding subsequences 
appearing frequently in a set of sequence. However, knowing that a sequence appear frequently is 
not sufficient for making predictions. Sequential pattern mining approaches are classified as 
Apriori [2] or generate and test approach, pattern growth or divide-and-conquer approach. The 
Apriori and AprioriAll algorithms are based on apriori property and use the generate join 
procedure to form the candidate sequence. It identifies frequent item set in the database and 
extends it to a larger item set as those item set appears sufficiently in the database. Some of the 
widely used apriori based algorithms are GSP [3], SPADE [4] and SPAM [5]. 
Pattern growth algorithms allow the frequent item set discovery without candidate item set 
generation. They first build the data structure called FP-tree. Frequent Pattern tree consists of 
nodes corresponding to items and counters. This tree reads only one transaction at a time and 
maps it to a path. Then it extracts the frequent item set directly from the FP-tree. Some of the 
widely used pattern growth Algorithms are PrefixSpan [6] and FreeSpan [7]. A projection based 
pattern-growth method is used in PrefixSpan (Prefix-projected Sequential pattern mining) 
algorithm for mining sequential patterns. Its major idea is that, instead of projecting sequence 
databases by considering all the possible occurrences of frequent sub sequences, the projection is 
done on frequent prefix which results in higher efficiency of the algorithm in terms of processing 
time. The TPrefixSpan [8] algorithm is developed to mine the temporal patterns from interval- 
DOI:10.5121/ijcsa.2014.4406 59
International Journal on Computational Sciences & Applications (IJCSA) Vol.4, No.4, August 2014 
based events. Interval-based events are defined as the pair of time values associated with each 
event. Temporal pattern mining includes various tasks like temporal pattern analysis and 
comparison, temporal classification, temporal association rules, temporal clustering and temporal 
prediction [9]. TPrefixSpan algorithm deals with mining frequent temporal patterns from interval 
based events in a given sequence database. This paper discusses about adding multiple constraints 
with TPrefixSpan algorithm which increases the performance of TPrefixSpan algorithm and 
reduce the computational time. Constraints based PrefixSpan algorithm discovers sequential 
patterns which are frequent and also satisfy aggregate, length, and item constraints. In this work, 
Constraint based algorithm CTPrefixSpan is proposed to discover frequent temporal patterns 
considering item, length and aggregate constraints. 
60 
2. Literature survey 
In [4] R.Srikant and R.Agarwal proposed an algorithm called “GSP” (Generalized Sequential 
Pattern) algorithm which is an apriori based approach for mining frequent patterns. SPAM: 
SPAM (Sequential Pattern Mining) [5] uses a vertical bitmap data structure representation of 
database which is similar to the given id-list of SPADE. It integrates the concept of GSP [4], 
SPADE [6] and FREESPAN [7] algorithms. FreeSpan [7] is the frequent pattern projected 
sequential pattern mining algorithm which mines frequent sequences, finds the frequent patterns 
and uses the projected database to find the growth of subsequent fragments. Pie et al. [6] 
proposed a projection based sequential pattern mining algorithm called PrefixSpan algorithm 
(Prefix-projected Sequential Pattern mining), which mines the frequent patterns from a given data 
sequence. In [8] Wu-Chen proposed a non-ambiguous temporal pattern for interval-based events. 
TPrefixSpan algorithm was proposed to mine the new kind of temporal pattern from interval 
based events. Irfan Khan [10] included multiple constraints with the existing PrefixSpan 
algorithm to improve the efficiency of PrefixSpan algorithm. Similarly, to improve the efficiency 
of TPrefixSpan algorithm constraints are included and CTPrefixSpan algorithm is proposed in 
this work. 
3. Existing system 
Interval-based events are represented by two end points. The end point represent the start time 
and end time of the intervals. Projection method for interval-based events is different from that of 
point-based events. This form of representation is used in TPrefixSpan algorithm. TPrefixSpan 
algorithm is slower than PrefixSpan algorithm as representing the interval based events requires 
larger space. 
Table 1. Example Sequence Database
International Journal on Computational Sciences & Applications (IJCSA) Vol.4, No.4, August 2014 
In TPrefixSpan algorithm, initially the length of the sequential pattern is found. Then the set of 
frequent temporal 1-patterns is found from a given sequence database. These steps are repeated 
until the all frequent patterns are found. All frequent patterns from the sequences are generated. In 
TPrefixSpan algorithm “Temporal Prefix”, and “Temporal Postfix” projected databases are to be 
found and the definitions are given below. The following 2 definitions are based on the 
definitions in [8]. 
61 
3.1 Definition: (Temporal prefix) 
Suppose we have two temporal sequences  = (p1op1 p2op2… opn-1pn) and  = (p1op1 p2op2 
…opm-1pm), where m  n. Let pxdenote the last event starting point in . Then,  is called a 
temporal prefix of  if and only if 1)  is contained in ,2) the events match, and 3) match in 
operators exist i.e.opi = opi (for 1  i  x-1). 
3.2 Definition: (Temporal postfix) 
Let  be the projection of  with respect to . Then temporal sequence  is called the temporal 
postfix of  with respect to temporal prefix  and is denoted as  =  / . If  contains only one 
event starting point then we set  = . 
4. Proposed system 
To improve the efficiency of the existing TPrefixSpan algorithm, we need to add more constraints 
like length, item and aggregate on the output of frequent patterns. Multiple constraints based 
sequential pattern mining extracts the sequential patterns which are of user’s interest. 
By adding multiple constraints in interval based events the run time of finding frequent patterns 
from the given sequences will increases. We can use this constraints based algorithm in finding 
relevant sequences from the given medical data set and also to find the particular stock details in 
finance data set. It decreases the patterns from relevant frequent patterns according to the input 
sequences given. For any particular stock it gives the relevant patterns with respect to user’s 
interest. 
By adding multiple constraints in interval based events the run time of finding frequent patterns 
from the given sequences will increases. We can use this constraints based algorithm in finding 
relevant sequences from the given medical data set and also to find the particular stock details in 
finance data set. It decreases the patterns from relevant frequent patterns according to the input 
sequences given. For any particular stock it gives the relevant patterns with respect to user’s 
interest. 
Let us discuss how to generate patterns from given constraints by using our given algorithm. 
4.1. Requirements: 
4.1.1 To find the length and minimum support: From the given sequence database find the 
length and set minimum support initially as 0.5. 
4.1.2 Apply TPrefixSpan algorithm: To mine the frequent patterns we are applying existing 
TPrefixSpan algorithm proposed by Wu and Chen [8]. TPrefixSpan algorithm is the pattern 
mining algorithm to find maximum number of frequent patterns from interval based events.
International Journal on Computational Sciences  Applications (IJCSA) Vol.4, No.4, August 2014 
4.1.3 Constraints added: After applying TPrefixSpan algorithm the multiple constraints like 
item, length and aggregate constraints are added to refine the result according to user’s interest. 
The various constraints added in our paper are explained below: 
Item Constraints: An item constraint is defined as the subset of items that should or should not 
be present in the patterns [10]. For example,when mining sequential patterns over a stock market, 
a user may be interested in the patterns that have details about the day high price. The patterns 
that contain details about the day high are considered as interesting patterns. This enhances the 
effectiveness of the algorithm by providing patterns according to user’s interest. 
Length Constraints: A length constraint specifies the requirement on the length of the patterns, 
where the length can be either the number of occurrences of items or the number of transactions 
[10]. For example, a user may want to find longer patterns (i.e., at least 4 prices) in stock analysis. 
Such a requirement can be expressed by a length constraint and thus interesting patterns can be 
identified. 
Aggregate Constraint: An aggregate constraint is the constraint on an aggregate of items in a 
pattern, where the aggregate function can be sum, average, max, min, standard deviation [10]. 
For example, a user may want to find the frequent sequential patterns where the average number 
of cost related to the prices is over 3. 
62 
By applying TPrefixSpan algorithm the frequent patterns mined are shown in Figure 2. 
Figure 2. Frequent patterns mined using TPrefixSpan algorithm 
After applying Length constraint the resulting frequent patterns are shown in Figure 3.
International Journal on Computational Sciences  Applications (IJCSA) Vol.4, No.4, August 2014 
63 
Figure 3. Frequent patterns after applying Length Constraint 
After applying item constraint the resulting frequent patterns are shown in Figure 4. 
Figure 4. Frequent patterns mined after applying item constraints 
The proposed CTPrefixSpan algorithm is given in Figure 5. 
Algorithm: CTPrefixSpan () 
Input: Temporal Sequence Database S. 
Output: Frequent patterns, Interesting patterns satisfying user specified constraints 
Begin 
Step 1: Identify frequent temporal 1-patterns in S. 
Considering the frequent temporal 1-pattern generated identify the projected database. 
Step 2: Generate frequent patterns with frequent 1-pattern as prefix. 
Identify the projected database. 
Repeat Step 1 and 2 until all the frequent temporal patterns are mined. 
Get the constraint type from the user. 
If constraint specified is item constraint then
International Journal on Computational Sciences  Applications (IJCSA) Vol.4, No.4, August 2014 
64 
Identify the patterns that has the item (event) specified. 
Else if constraint specified is length constraint then 
Identify the patterns with length= length specified 
Else if constraint specified is aggregate constraint then 
Identify the patterns that matches the aggregate specified 
End if 
Generate interesting patterns that satisfy the constraints 
End 
Figure 5. Pseudo-code for CTPrefixspan 
. Result Analysis 
Our CTPrefixSpan algorithm improves the efficiency and effectiveness than other sequential 
pattern mining algorithms. The runtime and Precision of both TPrefixSpan and CTPrefixSpan can 
be analysed by using medical or finance datasets. Though our algorithm increases the runtime, the 
precision rate of the proposed CTPrefixSpan algorithm is almost equal to 1. 
5.1 Performance Evaluation 
To evaluate the performance of our CTPrefixSpan, we implemented the TPrefixSpan algorithm 
for comparison. Both these algorithms were implemented in .NET language and tested on an Intel 
core windows 7 system using Micrososft Visual Studio package. The runtime of these two 
algorithms are compared with different length of output patterns. Finally, we calculated the 
Precision rate of TPrefixSpan and CTPrefixSpan algorithms with different data sets. Finally, it 
was found that our algorithm increases the runtime. But the precision rate is always nearest to 
value 1. 
5.2 Runtime Comparisons 
Figure 6. Comparison of RunTime 
Figure 6 explains the run time with different number of patterns. The run time increases with 
different lengths like L1, L2, and L3 for CTPrefixSpan. But for existing TPrefixspan algorithm 
runtime is low than our CTPrefixSpan.
International Journal on Computational Sciences  Applications (IJCSA) Vol.4, No.4, August 2014 
65 
Figure 7. Precision Rate Comparison 
Precision = Relevant patterns  Retrieved patterns 
Retrieved patterns 
The precision value is calculated by using relevant patterns and retrieved patterns with respect to 
TPrefixSpan and CTPrefixSpan algorithms. After calculating the precision it is found that our 
CTPrefixSpan algorithm has a precision rate of 1. Finally it can be concluded that after applying 
constraints with TPrefixSpan the runtime may increase but the precision rate of the proposed 
algorithm is high. 
6. CONCLUSIONS 
TPrefixSpan algorithm finds the relevant frequent patterns from the given data sequences 
representing interval-based events. It is the most efficient interval based algorithms among the 
sequential pattern mining algorithms. Adding multiple constraints like item and length to the 
mined patterns increases the scalability, accuracy and hence increases the efficiency of the 
proposed CTPrefixSpan algorithm. 
REFERENCES 
[1] R. Agrawal and R. Srikant, “Mining sequential patterns”, International Conference of Data 
Engineering (ICDE ’95), 1995. 
[2] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules”, Proc. 1994 International 
Conference of Very Large Data Bases (VLDB ’94), pp. 487-499, Sept. 1994. 
[3] R. Srikant and R. Agrawal, “Mining sequential patterns: Generalizations and performance 
improvements”, Proceedings of the 5th International Conference Extending Database Technology, 
1057, pp. 3-17, 1996. 
[4] M. Zaki, “SPADE: An efficient algorithm for mining frequent sequences”, Machine Learning, 2001. 
[5] J.Ayres, J. Flannick, J. Gehrke, and T. Yiu, “Sequential pattern mining using a bitmap representation”, 
Proc.OfInternational Conference on Knowledge Discovery and Data Mining, 2002. 
[6] J. Pei, J. Han, B. Mortazavi-Asi, H. Pino, PrefixSpan: Mining SequentialPatterns Efficiently by 
Prefix- Projected Pattern Growth, ICDE'01, 2001. 
[7] J. Han, G. Dong, B. Mortazavi-Asl, Q. Chen, U. Dayal and M.-C. Hsu,“FreeSpan: Frequent pattern-projected 
sequential pattern mining”, Proc. 2000 International Conference of Knowledge Discovery 
and Data Mining (KDD’00), pp. 355-359, 2000. 
[8] Shin-Yi Wu and Yen-Liang Chen, “Mining Non-ambiguous Temporal Patterns for Interval-Based 
Events”, IEEE Transactions on Knowledge and Data Engineering, Vol.19, No.6, June 2007. 
[9] S.Kalaivany, M. Gomathi, R.Sethukkarasi, “Mining Temporal Patterns for Interval-Based and Point- 
Based Events”, International Journal of Computational Engineering Research (IJCER), Vol 3, Issue 4, 
April 2013.
International Journal on Computational Sciences  Applications (IJCSA) Vol.4, No.4, August 2014 
[10] Irfan Khan. “PrefixSpan Algorithm Based on Multiple Constraints for Mining Sequential Patterns”, 
International Journal of Computer Science and Management Research, Vol. 1, Issue 5, December 
2012. 
66 
Authors 
M. Kalaivany is a student of M.Tech (Network and Internet Engineering) in the 
Department of Computer Science, School of Engineering and Technology, Pondicherry 
University, India. She was awarded the Anna University for B.Tech degree in Computer 
Science. She finished her M. Tech in the field of Network and Internet Engineering. She 
has published 1 research paper in international journals and in the proceedings of various 
international conferences. 
V.Uma is working as Assistant Professor in the Department of Computer Science, School 
of Engineering and Technology, Pondicherry University, India. She was awarded the 
Pondicherry University gold medal for M.Tech degree in Distributed Computing 
Systems. She is pursuing her Ph.D in the field of Temporal knowledge representation. 
Her research interest includes Data mining, Semantic web. She has published 15 research 
papers in various international journals and in the proceedings of various international 
conferences.
Ad

Recommended

Ad03301810188
Ad03301810188
ijceronline
 
A novel algorithm for mining closed sequential patterns
A novel algorithm for mining closed sequential patterns
IJDKP
 
J0945761
J0945761
IOSR Journals
 
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Data Mining: Concepts and Techniques_ Chapter 6: Mining Frequent Patterns, ...
Salah Amean
 
Ijariie1129
Ijariie1129
IJARIIE JOURNAL
 
OS_Ch8
OS_Ch8
Supriya Shrivastava
 
A comprehensive study of major techniques of multi level frequent pattern min...
A comprehensive study of major techniques of multi level frequent pattern min...
eSAT Publishing House
 
Ijtra130516
Ijtra130516
International Journal of Technical Research & Application
 
Ice
Ice
ponnibala
 
final
final
Erick Miller
 
An Efficient and Scalable UP-Growth Algorithm with Optimized Threshold (min_u...
An Efficient and Scalable UP-Growth Algorithm with Optimized Threshold (min_u...
IRJET Journal
 
Ej36829834
Ej36829834
IJERA Editor
 
A New Extraction Optimization Approach to Frequent 2 Item sets
A New Extraction Optimization Approach to Frequent 2 Item sets
ijcsa
 
Methods for handling deadlock
Methods for handling deadlock
sangrampatil81
 
Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and Correlations
Justin Cletus
 
Ijcatr04051004
Ijcatr04051004
Editor IJCATR
 
Anomaly detection
Anomaly detection
철 김
 
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
IJCSEA Journal
 
Improving data compression ratio by the use of optimality of lzw & adaptive h...
Improving data compression ratio by the use of optimality of lzw & adaptive h...
ijitjournal
 
Max Entropy
Max Entropy
jianingy
 
A Framework for Self-Tuning Optimization Algorithm
A Framework for Self-Tuning Optimization Algorithm
Xin-She Yang
 
JAVA 2013 IEEE DATAMINING PROJECT Region based foldings in process discovery
JAVA 2013 IEEE DATAMINING PROJECT Region based foldings in process discovery
IEEEGLOBALSOFTTECHNOLOGIES
 
An improved teaching learning
An improved teaching learning
csandit
 
Mining frequent patterns association
Mining frequent patterns association
DeepaR42
 
B0950814
B0950814
IOSR Journals
 
Discovering Frequent Patterns with New Mining Procedure
Discovering Frequent Patterns with New Mining Procedure
IOSR Journals
 
AN ENHANCED EDGE ADAPTIVE STEGANOGRAPHY APPROACH USING THRESHOLD VALUE FOR RE...
AN ENHANCED EDGE ADAPTIVE STEGANOGRAPHY APPROACH USING THRESHOLD VALUE FOR RE...
ijcsa
 
5.3 mining sequential patterns
5.3 mining sequential patterns
Krish_ver2
 
Politique communale liégeoise en matière de radiation des registres de la pop...
Politique communale liégeoise en matière de radiation des registres de la pop...
Michel Péters
 
Omaha Hilton
Omaha Hilton
srohr
 

More Related Content

What's hot (18)

Ice
Ice
ponnibala
 
final
final
Erick Miller
 
An Efficient and Scalable UP-Growth Algorithm with Optimized Threshold (min_u...
An Efficient and Scalable UP-Growth Algorithm with Optimized Threshold (min_u...
IRJET Journal
 
Ej36829834
Ej36829834
IJERA Editor
 
A New Extraction Optimization Approach to Frequent 2 Item sets
A New Extraction Optimization Approach to Frequent 2 Item sets
ijcsa
 
Methods for handling deadlock
Methods for handling deadlock
sangrampatil81
 
Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and Correlations
Justin Cletus
 
Ijcatr04051004
Ijcatr04051004
Editor IJCATR
 
Anomaly detection
Anomaly detection
철 김
 
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
IJCSEA Journal
 
Improving data compression ratio by the use of optimality of lzw & adaptive h...
Improving data compression ratio by the use of optimality of lzw & adaptive h...
ijitjournal
 
Max Entropy
Max Entropy
jianingy
 
A Framework for Self-Tuning Optimization Algorithm
A Framework for Self-Tuning Optimization Algorithm
Xin-She Yang
 
JAVA 2013 IEEE DATAMINING PROJECT Region based foldings in process discovery
JAVA 2013 IEEE DATAMINING PROJECT Region based foldings in process discovery
IEEEGLOBALSOFTTECHNOLOGIES
 
An improved teaching learning
An improved teaching learning
csandit
 
Mining frequent patterns association
Mining frequent patterns association
DeepaR42
 
B0950814
B0950814
IOSR Journals
 
Discovering Frequent Patterns with New Mining Procedure
Discovering Frequent Patterns with New Mining Procedure
IOSR Journals
 
An Efficient and Scalable UP-Growth Algorithm with Optimized Threshold (min_u...
An Efficient and Scalable UP-Growth Algorithm with Optimized Threshold (min_u...
IRJET Journal
 
A New Extraction Optimization Approach to Frequent 2 Item sets
A New Extraction Optimization Approach to Frequent 2 Item sets
ijcsa
 
Methods for handling deadlock
Methods for handling deadlock
sangrampatil81
 
Mining Frequent Patterns, Association and Correlations
Mining Frequent Patterns, Association and Correlations
Justin Cletus
 
Anomaly detection
Anomaly detection
철 김
 
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
AN ALGORITHM FOR OPTIMIZED SEARCHING USING NON-OVERLAPPING ITERATIVE NEIGHBOR...
IJCSEA Journal
 
Improving data compression ratio by the use of optimality of lzw & adaptive h...
Improving data compression ratio by the use of optimality of lzw & adaptive h...
ijitjournal
 
Max Entropy
Max Entropy
jianingy
 
A Framework for Self-Tuning Optimization Algorithm
A Framework for Self-Tuning Optimization Algorithm
Xin-She Yang
 
JAVA 2013 IEEE DATAMINING PROJECT Region based foldings in process discovery
JAVA 2013 IEEE DATAMINING PROJECT Region based foldings in process discovery
IEEEGLOBALSOFTTECHNOLOGIES
 
An improved teaching learning
An improved teaching learning
csandit
 
Mining frequent patterns association
Mining frequent patterns association
DeepaR42
 
Discovering Frequent Patterns with New Mining Procedure
Discovering Frequent Patterns with New Mining Procedure
IOSR Journals
 

Viewers also liked (15)

AN ENHANCED EDGE ADAPTIVE STEGANOGRAPHY APPROACH USING THRESHOLD VALUE FOR RE...
AN ENHANCED EDGE ADAPTIVE STEGANOGRAPHY APPROACH USING THRESHOLD VALUE FOR RE...
ijcsa
 
5.3 mining sequential patterns
5.3 mining sequential patterns
Krish_ver2
 
Politique communale liégeoise en matière de radiation des registres de la pop...
Politique communale liégeoise en matière de radiation des registres de la pop...
Michel Péters
 
Omaha Hilton
Omaha Hilton
srohr
 
Omni Orlando
Omni Orlando
srohr
 
Dart 2
Dart 2
srohr
 
alat tradisi
alat tradisi
izyan bakri
 
Urban Reserve
Urban Reserve
srohr
 
Edwarsiella
Edwarsiella
Eduardo Francisco Arce Calderón
 
The Boathouse
The Boathouse
Tom Aikins
 
Process
Process
zafirahmed
 
Galleria Village
Galleria Village
srohr
 
Dart 1
Dart 1
srohr
 
Salman Profile
Salman Profile
Carachi
 
MULTIPROCESSOR SCHEDULING AND PERFORMANCE EVALUATION USING ELITIST NON DOMINA...
MULTIPROCESSOR SCHEDULING AND PERFORMANCE EVALUATION USING ELITIST NON DOMINA...
ijcsa
 
AN ENHANCED EDGE ADAPTIVE STEGANOGRAPHY APPROACH USING THRESHOLD VALUE FOR RE...
AN ENHANCED EDGE ADAPTIVE STEGANOGRAPHY APPROACH USING THRESHOLD VALUE FOR RE...
ijcsa
 
5.3 mining sequential patterns
5.3 mining sequential patterns
Krish_ver2
 
Politique communale liégeoise en matière de radiation des registres de la pop...
Politique communale liégeoise en matière de radiation des registres de la pop...
Michel Péters
 
Omaha Hilton
Omaha Hilton
srohr
 
Omni Orlando
Omni Orlando
srohr
 
Dart 2
Dart 2
srohr
 
Urban Reserve
Urban Reserve
srohr
 
Galleria Village
Galleria Village
srohr
 
Dart 1
Dart 1
srohr
 
Salman Profile
Salman Profile
Carachi
 
MULTIPROCESSOR SCHEDULING AND PERFORMANCE EVALUATION USING ELITIST NON DOMINA...
MULTIPROCESSOR SCHEDULING AND PERFORMANCE EVALUATION USING ELITIST NON DOMINA...
ijcsa
 
Ad

Similar to Mining sequential patterns for interval based (20)

C034011016
C034011016
ijceronline
 
An efficient algorithm for sequence generation in data mining
An efficient algorithm for sequence generation in data mining
ijcisjournal
 
A survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incremental
Alexander Decker
 
A survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incremental
Alexander Decker
 
The study on mining temporal patterns and related applications in dynamic soc...
The study on mining temporal patterns and related applications in dynamic soc...
Thanh Hieu
 
A Survey of Sequential Rule Mining Techniques
A Survey of Sequential Rule Mining Techniques
ijsrd.com
 
Progressive Mining of Sequential Patterns Based on Single Constraint
Progressive Mining of Sequential Patterns Based on Single Constraint
TELKOMNIKA JOURNAL
 
Agrhwoowheh3hwjoeorhehehwjeoeoeooekekekekkekee
Agrhwoowheh3hwjoeorhehehwjeoeoeooekekekekkekee
jasminealisha635
 
lecture13.pdfhejejejejekkeejejejejejejejej
lecture13.pdfhejejejejekkeejejejejejejejej
jasminealisha635
 
Sequential Pattern Tree Mining
Sequential Pattern Tree Mining
IOSR Journals
 
Review Over Sequential Rule Mining
Review Over Sequential Rule Mining
ijsrd.com
 
Mining closed sequential patterns in large sequence databases
Mining closed sequential patterns in large sequence databases
IJDMS
 
H0964752
H0964752
IOSR Journals
 
Identification of User Aware Rare Sequential Pattern in Document Stream An Ov...
Identification of User Aware Rare Sequential Pattern in Document Stream An Ov...
ijtsrd
 
20IT501_DWDM_U5.ppt
20IT501_DWDM_U5.ppt
Premkumar R
 
Sequential Pattern Mining Methods: A Snap Shot
Sequential Pattern Mining Methods: A Snap Shot
IOSR Journals
 
Ijsrdv1 i2039
Ijsrdv1 i2039
ijsrd.com
 
Temporal Pattern Mining
Temporal Pattern Mining
Prakhar Dhama
 
Fast Sequential Rule Mining
Fast Sequential Rule Mining
ijsrd.com
 
20IT501_DWDM_PPT_Unit_V.ppt
20IT501_DWDM_PPT_Unit_V.ppt
PalaniKumarR2
 
An efficient algorithm for sequence generation in data mining
An efficient algorithm for sequence generation in data mining
ijcisjournal
 
A survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incremental
Alexander Decker
 
A survey paper on sequence pattern mining with incremental
A survey paper on sequence pattern mining with incremental
Alexander Decker
 
The study on mining temporal patterns and related applications in dynamic soc...
The study on mining temporal patterns and related applications in dynamic soc...
Thanh Hieu
 
A Survey of Sequential Rule Mining Techniques
A Survey of Sequential Rule Mining Techniques
ijsrd.com
 
Progressive Mining of Sequential Patterns Based on Single Constraint
Progressive Mining of Sequential Patterns Based on Single Constraint
TELKOMNIKA JOURNAL
 
Agrhwoowheh3hwjoeorhehehwjeoeoeooekekekekkekee
Agrhwoowheh3hwjoeorhehehwjeoeoeooekekekekkekee
jasminealisha635
 
lecture13.pdfhejejejejekkeejejejejejejejej
lecture13.pdfhejejejejekkeejejejejejejejej
jasminealisha635
 
Sequential Pattern Tree Mining
Sequential Pattern Tree Mining
IOSR Journals
 
Review Over Sequential Rule Mining
Review Over Sequential Rule Mining
ijsrd.com
 
Mining closed sequential patterns in large sequence databases
Mining closed sequential patterns in large sequence databases
IJDMS
 
Identification of User Aware Rare Sequential Pattern in Document Stream An Ov...
Identification of User Aware Rare Sequential Pattern in Document Stream An Ov...
ijtsrd
 
20IT501_DWDM_U5.ppt
20IT501_DWDM_U5.ppt
Premkumar R
 
Sequential Pattern Mining Methods: A Snap Shot
Sequential Pattern Mining Methods: A Snap Shot
IOSR Journals
 
Ijsrdv1 i2039
Ijsrdv1 i2039
ijsrd.com
 
Temporal Pattern Mining
Temporal Pattern Mining
Prakhar Dhama
 
Fast Sequential Rule Mining
Fast Sequential Rule Mining
ijsrd.com
 
20IT501_DWDM_PPT_Unit_V.ppt
20IT501_DWDM_PPT_Unit_V.ppt
PalaniKumarR2
 
Ad

Mining sequential patterns for interval based

  • 1. International Journal on Computational Sciences & Applications (IJCSA) Vol.4, No.4, August 2014 Mining Sequential Patterns for interval based events by applying multiple constraints M. Kalaivany and V. Uma Department of Computer Science, Pondicherry University, Pondicherry ABSTRACT Sequential pattern mining finds the frequent subsequences or patterns from the given sequences. TPrefixSpan algorithm finds the relevant frequent patterns from the given sequential patterns formed using interval based events. In our proposed work, we add multiple constraints like item, length and aggregate to the interval based TPrefixSpan algorithm. By adding these constraints the efficiency and effectiveness of the algorithm improves. The proposed constraint based algorithm CTPrefixSpan has been applied to synthetic medical dataset. The algorithm can be applied for stock market analysis, DNA sequences analysis etc. KEYWORDS Sequential patterns, temporal patterns, Constraints, Interval based events. 1.INTRODUCTION Data mining is useful in various domains such as market analysis, decision support, fraud detection, business management and so on. Sequential pattern mining is an approach to extract information from input sequences [1]. Various methods have been proposed for mining temporal patterns in sequence databases such as mining repetitive patterns, trends and sequential patterns. Sequential Pattern Mining is a popular technique which consists of finding subsequences appearing frequently in a set of sequence. However, knowing that a sequence appear frequently is not sufficient for making predictions. Sequential pattern mining approaches are classified as Apriori [2] or generate and test approach, pattern growth or divide-and-conquer approach. The Apriori and AprioriAll algorithms are based on apriori property and use the generate join procedure to form the candidate sequence. It identifies frequent item set in the database and extends it to a larger item set as those item set appears sufficiently in the database. Some of the widely used apriori based algorithms are GSP [3], SPADE [4] and SPAM [5]. Pattern growth algorithms allow the frequent item set discovery without candidate item set generation. They first build the data structure called FP-tree. Frequent Pattern tree consists of nodes corresponding to items and counters. This tree reads only one transaction at a time and maps it to a path. Then it extracts the frequent item set directly from the FP-tree. Some of the widely used pattern growth Algorithms are PrefixSpan [6] and FreeSpan [7]. A projection based pattern-growth method is used in PrefixSpan (Prefix-projected Sequential pattern mining) algorithm for mining sequential patterns. Its major idea is that, instead of projecting sequence databases by considering all the possible occurrences of frequent sub sequences, the projection is done on frequent prefix which results in higher efficiency of the algorithm in terms of processing time. The TPrefixSpan [8] algorithm is developed to mine the temporal patterns from interval- DOI:10.5121/ijcsa.2014.4406 59
  • 2. International Journal on Computational Sciences & Applications (IJCSA) Vol.4, No.4, August 2014 based events. Interval-based events are defined as the pair of time values associated with each event. Temporal pattern mining includes various tasks like temporal pattern analysis and comparison, temporal classification, temporal association rules, temporal clustering and temporal prediction [9]. TPrefixSpan algorithm deals with mining frequent temporal patterns from interval based events in a given sequence database. This paper discusses about adding multiple constraints with TPrefixSpan algorithm which increases the performance of TPrefixSpan algorithm and reduce the computational time. Constraints based PrefixSpan algorithm discovers sequential patterns which are frequent and also satisfy aggregate, length, and item constraints. In this work, Constraint based algorithm CTPrefixSpan is proposed to discover frequent temporal patterns considering item, length and aggregate constraints. 60 2. Literature survey In [4] R.Srikant and R.Agarwal proposed an algorithm called “GSP” (Generalized Sequential Pattern) algorithm which is an apriori based approach for mining frequent patterns. SPAM: SPAM (Sequential Pattern Mining) [5] uses a vertical bitmap data structure representation of database which is similar to the given id-list of SPADE. It integrates the concept of GSP [4], SPADE [6] and FREESPAN [7] algorithms. FreeSpan [7] is the frequent pattern projected sequential pattern mining algorithm which mines frequent sequences, finds the frequent patterns and uses the projected database to find the growth of subsequent fragments. Pie et al. [6] proposed a projection based sequential pattern mining algorithm called PrefixSpan algorithm (Prefix-projected Sequential Pattern mining), which mines the frequent patterns from a given data sequence. In [8] Wu-Chen proposed a non-ambiguous temporal pattern for interval-based events. TPrefixSpan algorithm was proposed to mine the new kind of temporal pattern from interval based events. Irfan Khan [10] included multiple constraints with the existing PrefixSpan algorithm to improve the efficiency of PrefixSpan algorithm. Similarly, to improve the efficiency of TPrefixSpan algorithm constraints are included and CTPrefixSpan algorithm is proposed in this work. 3. Existing system Interval-based events are represented by two end points. The end point represent the start time and end time of the intervals. Projection method for interval-based events is different from that of point-based events. This form of representation is used in TPrefixSpan algorithm. TPrefixSpan algorithm is slower than PrefixSpan algorithm as representing the interval based events requires larger space. Table 1. Example Sequence Database
  • 3. International Journal on Computational Sciences & Applications (IJCSA) Vol.4, No.4, August 2014 In TPrefixSpan algorithm, initially the length of the sequential pattern is found. Then the set of frequent temporal 1-patterns is found from a given sequence database. These steps are repeated until the all frequent patterns are found. All frequent patterns from the sequences are generated. In TPrefixSpan algorithm “Temporal Prefix”, and “Temporal Postfix” projected databases are to be found and the definitions are given below. The following 2 definitions are based on the definitions in [8]. 61 3.1 Definition: (Temporal prefix) Suppose we have two temporal sequences = (p1op1 p2op2… opn-1pn) and = (p1op1 p2op2 …opm-1pm), where m n. Let pxdenote the last event starting point in . Then, is called a temporal prefix of if and only if 1) is contained in ,2) the events match, and 3) match in operators exist i.e.opi = opi (for 1 i x-1). 3.2 Definition: (Temporal postfix) Let be the projection of with respect to . Then temporal sequence is called the temporal postfix of with respect to temporal prefix and is denoted as = / . If contains only one event starting point then we set = . 4. Proposed system To improve the efficiency of the existing TPrefixSpan algorithm, we need to add more constraints like length, item and aggregate on the output of frequent patterns. Multiple constraints based sequential pattern mining extracts the sequential patterns which are of user’s interest. By adding multiple constraints in interval based events the run time of finding frequent patterns from the given sequences will increases. We can use this constraints based algorithm in finding relevant sequences from the given medical data set and also to find the particular stock details in finance data set. It decreases the patterns from relevant frequent patterns according to the input sequences given. For any particular stock it gives the relevant patterns with respect to user’s interest. By adding multiple constraints in interval based events the run time of finding frequent patterns from the given sequences will increases. We can use this constraints based algorithm in finding relevant sequences from the given medical data set and also to find the particular stock details in finance data set. It decreases the patterns from relevant frequent patterns according to the input sequences given. For any particular stock it gives the relevant patterns with respect to user’s interest. Let us discuss how to generate patterns from given constraints by using our given algorithm. 4.1. Requirements: 4.1.1 To find the length and minimum support: From the given sequence database find the length and set minimum support initially as 0.5. 4.1.2 Apply TPrefixSpan algorithm: To mine the frequent patterns we are applying existing TPrefixSpan algorithm proposed by Wu and Chen [8]. TPrefixSpan algorithm is the pattern mining algorithm to find maximum number of frequent patterns from interval based events.
  • 4. International Journal on Computational Sciences Applications (IJCSA) Vol.4, No.4, August 2014 4.1.3 Constraints added: After applying TPrefixSpan algorithm the multiple constraints like item, length and aggregate constraints are added to refine the result according to user’s interest. The various constraints added in our paper are explained below: Item Constraints: An item constraint is defined as the subset of items that should or should not be present in the patterns [10]. For example,when mining sequential patterns over a stock market, a user may be interested in the patterns that have details about the day high price. The patterns that contain details about the day high are considered as interesting patterns. This enhances the effectiveness of the algorithm by providing patterns according to user’s interest. Length Constraints: A length constraint specifies the requirement on the length of the patterns, where the length can be either the number of occurrences of items or the number of transactions [10]. For example, a user may want to find longer patterns (i.e., at least 4 prices) in stock analysis. Such a requirement can be expressed by a length constraint and thus interesting patterns can be identified. Aggregate Constraint: An aggregate constraint is the constraint on an aggregate of items in a pattern, where the aggregate function can be sum, average, max, min, standard deviation [10]. For example, a user may want to find the frequent sequential patterns where the average number of cost related to the prices is over 3. 62 By applying TPrefixSpan algorithm the frequent patterns mined are shown in Figure 2. Figure 2. Frequent patterns mined using TPrefixSpan algorithm After applying Length constraint the resulting frequent patterns are shown in Figure 3.
  • 5. International Journal on Computational Sciences Applications (IJCSA) Vol.4, No.4, August 2014 63 Figure 3. Frequent patterns after applying Length Constraint After applying item constraint the resulting frequent patterns are shown in Figure 4. Figure 4. Frequent patterns mined after applying item constraints The proposed CTPrefixSpan algorithm is given in Figure 5. Algorithm: CTPrefixSpan () Input: Temporal Sequence Database S. Output: Frequent patterns, Interesting patterns satisfying user specified constraints Begin Step 1: Identify frequent temporal 1-patterns in S. Considering the frequent temporal 1-pattern generated identify the projected database. Step 2: Generate frequent patterns with frequent 1-pattern as prefix. Identify the projected database. Repeat Step 1 and 2 until all the frequent temporal patterns are mined. Get the constraint type from the user. If constraint specified is item constraint then
  • 6. International Journal on Computational Sciences Applications (IJCSA) Vol.4, No.4, August 2014 64 Identify the patterns that has the item (event) specified. Else if constraint specified is length constraint then Identify the patterns with length= length specified Else if constraint specified is aggregate constraint then Identify the patterns that matches the aggregate specified End if Generate interesting patterns that satisfy the constraints End Figure 5. Pseudo-code for CTPrefixspan . Result Analysis Our CTPrefixSpan algorithm improves the efficiency and effectiveness than other sequential pattern mining algorithms. The runtime and Precision of both TPrefixSpan and CTPrefixSpan can be analysed by using medical or finance datasets. Though our algorithm increases the runtime, the precision rate of the proposed CTPrefixSpan algorithm is almost equal to 1. 5.1 Performance Evaluation To evaluate the performance of our CTPrefixSpan, we implemented the TPrefixSpan algorithm for comparison. Both these algorithms were implemented in .NET language and tested on an Intel core windows 7 system using Micrososft Visual Studio package. The runtime of these two algorithms are compared with different length of output patterns. Finally, we calculated the Precision rate of TPrefixSpan and CTPrefixSpan algorithms with different data sets. Finally, it was found that our algorithm increases the runtime. But the precision rate is always nearest to value 1. 5.2 Runtime Comparisons Figure 6. Comparison of RunTime Figure 6 explains the run time with different number of patterns. The run time increases with different lengths like L1, L2, and L3 for CTPrefixSpan. But for existing TPrefixspan algorithm runtime is low than our CTPrefixSpan.
  • 7. International Journal on Computational Sciences Applications (IJCSA) Vol.4, No.4, August 2014 65 Figure 7. Precision Rate Comparison Precision = Relevant patterns Retrieved patterns Retrieved patterns The precision value is calculated by using relevant patterns and retrieved patterns with respect to TPrefixSpan and CTPrefixSpan algorithms. After calculating the precision it is found that our CTPrefixSpan algorithm has a precision rate of 1. Finally it can be concluded that after applying constraints with TPrefixSpan the runtime may increase but the precision rate of the proposed algorithm is high. 6. CONCLUSIONS TPrefixSpan algorithm finds the relevant frequent patterns from the given data sequences representing interval-based events. It is the most efficient interval based algorithms among the sequential pattern mining algorithms. Adding multiple constraints like item and length to the mined patterns increases the scalability, accuracy and hence increases the efficiency of the proposed CTPrefixSpan algorithm. REFERENCES [1] R. Agrawal and R. Srikant, “Mining sequential patterns”, International Conference of Data Engineering (ICDE ’95), 1995. [2] R. Agrawal and R. Srikant, “Fast Algorithms for Mining Association Rules”, Proc. 1994 International Conference of Very Large Data Bases (VLDB ’94), pp. 487-499, Sept. 1994. [3] R. Srikant and R. Agrawal, “Mining sequential patterns: Generalizations and performance improvements”, Proceedings of the 5th International Conference Extending Database Technology, 1057, pp. 3-17, 1996. [4] M. Zaki, “SPADE: An efficient algorithm for mining frequent sequences”, Machine Learning, 2001. [5] J.Ayres, J. Flannick, J. Gehrke, and T. Yiu, “Sequential pattern mining using a bitmap representation”, Proc.OfInternational Conference on Knowledge Discovery and Data Mining, 2002. [6] J. Pei, J. Han, B. Mortazavi-Asi, H. Pino, PrefixSpan: Mining SequentialPatterns Efficiently by Prefix- Projected Pattern Growth, ICDE'01, 2001. [7] J. Han, G. Dong, B. Mortazavi-Asl, Q. Chen, U. Dayal and M.-C. Hsu,“FreeSpan: Frequent pattern-projected sequential pattern mining”, Proc. 2000 International Conference of Knowledge Discovery and Data Mining (KDD’00), pp. 355-359, 2000. [8] Shin-Yi Wu and Yen-Liang Chen, “Mining Non-ambiguous Temporal Patterns for Interval-Based Events”, IEEE Transactions on Knowledge and Data Engineering, Vol.19, No.6, June 2007. [9] S.Kalaivany, M. Gomathi, R.Sethukkarasi, “Mining Temporal Patterns for Interval-Based and Point- Based Events”, International Journal of Computational Engineering Research (IJCER), Vol 3, Issue 4, April 2013.
  • 8. International Journal on Computational Sciences Applications (IJCSA) Vol.4, No.4, August 2014 [10] Irfan Khan. “PrefixSpan Algorithm Based on Multiple Constraints for Mining Sequential Patterns”, International Journal of Computer Science and Management Research, Vol. 1, Issue 5, December 2012. 66 Authors M. Kalaivany is a student of M.Tech (Network and Internet Engineering) in the Department of Computer Science, School of Engineering and Technology, Pondicherry University, India. She was awarded the Anna University for B.Tech degree in Computer Science. She finished her M. Tech in the field of Network and Internet Engineering. She has published 1 research paper in international journals and in the proceedings of various international conferences. V.Uma is working as Assistant Professor in the Department of Computer Science, School of Engineering and Technology, Pondicherry University, India. She was awarded the Pondicherry University gold medal for M.Tech degree in Distributed Computing Systems. She is pursuing her Ph.D in the field of Temporal knowledge representation. Her research interest includes Data mining, Semantic web. She has published 15 research papers in various international journals and in the proceedings of various international conferences.