A
TECHNICAL SEMINAR
ON
WEB CLUSTERING ENGINE
Presented By :
K. Shiva Kumar,
16d35a0507
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
INDEX
Introduction
Why web clustering engine
Advantages of cluster hierarchy
Issues in implementation of Clusters
Architecture
Conclusion
3/23/2019 2
Introduction
Web Clustering Engine
Clustering is the act of grouping similar objects into sets.
A web clustering engine groups the results returned for a query into labeled clusters.
Why web clustering engine
Conventional search engines are not very effective for ambiguous queries.
The results they return for such a query mix the different meanings together in a single ranked list, so irrelevant items appear among the relevant ones.
Advantages of cluster hierarchy
It provides shortcuts to the items that relate to the same meaning.
It allows better topic understanding.
It favors systematic exploration of search results.
Issues in Implementation of clusters
Short input data description.
Meaningful labels.
Selection of similarity measure.
Grouping of objects into clusters.
Overlapping clusters.
Unknown number of clusters.
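The choice of similarity measure listed above can be illustrated with a minimal sketch. It assumes cosine similarity over raw term counts, which is one common option and not necessarily the measure used by any particular engine; the function name and the sample snippets are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine similarity between two snippets using raw term counts (an assumed weighting)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Dot product over the terms the two snippets share.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty snippet is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Hypothetical snippets for the ambiguous query "jaguar".
sim_same_sense = cosine_similarity("jaguar car dealer", "jaguar car price")
sim_other_sense = cosine_similarity("jaguar car dealer", "jaguar cat habitat")
print(sim_same_sense > sim_other_sense)  # True
```

Because snippets are short, two results about the same meaning may share only one or two terms, which is why the short input description and the similarity measure are listed as separate issues.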
Architecture
Search Results Acquisition
Provides input for the rest of the system.
Delivers 50 to 500 results per query.
Results come from public search engines such as Google and Yahoo.
Preprocessing of Search Results
Convert the search results into “features”.
Steps:
Language Identification
Tokenization
Stemming
Feature Selection
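The preprocessing steps above can be sketched as follows. This is a minimal illustration and not a production pipeline: language identification is skipped (the input is assumed to be English), `naive_stem` is a hypothetical suffix stripper standing in for a real stemmer such as Porter's, and feature selection is reduced to dropping a small illustrative stopword list.

```python
import re

# Illustrative stopword list (an assumption, far from exhaustive).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "to", "is", "are"}

def tokenize(text):
    """Tokenization: lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def naive_stem(token):
    """Stemming stand-in: strip one common suffix if at least 3 characters remain."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def extract_features(snippet):
    """Feature selection: drop stopwords, then keep the stems of the remaining tokens."""
    return [naive_stem(t) for t in tokenize(snippet) if t not in STOPWORDS]

features = extract_features("Clustering engines group the search results")
print(features)  # ['cluster', 'engin', 'group', 'search', 'result']
```

The truncated stem "engin" shows why stems are used as internal features only and are usually not shown to the user as cluster labels.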
Cluster Construction and Labeling
The preprocessed search results are the input to the clustering algorithm.
A data-centric clustering algorithm forms the clusters first and labels them afterwards.
Each created cluster should be aptly labeled.
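A cluster-then-label flow of this kind can be sketched as below. This is only an assumed stand-in, not the algorithm of any engine named in these slides: it uses a single-pass "leader" clustering with Jaccard similarity, and labels each cluster with its most frequent non-query term. The function names, the threshold of 0.3, and the sample snippets are all hypothetical.

```python
from collections import Counter

def jaccard(a, b):
    """Jaccard similarity between two term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_snippets(snippets, query, threshold=0.3):
    """Assign each snippet to the most similar cluster leader, or start a new cluster."""
    query_terms = set(query.lower().split())
    clusters = []
    for snippet in snippets:
        # Query terms occur in every result, so they are ignored when comparing.
        terms = set(snippet.lower().split()) - query_terms
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(terms, cluster["leader"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(snippet)
            best["counts"].update(terms)
        else:
            clusters.append({"leader": terms, "members": [snippet], "counts": Counter(terms)})
    # Labeling happens after clustering: pick the most frequent term in each cluster.
    for cluster in clusters:
        counts = cluster["counts"]
        cluster["label"] = counts.most_common(1)[0][0] if counts else "other"
    return clusters

results = ["jaguar car dealer", "jaguar car price", "jaguar cat habitat", "jaguar cat species"]
clusters = cluster_snippets(results, query="jaguar")
for c in clusters:
    print(c["label"], c["members"])
```

With these four sample snippets the sketch separates the "car" results from the "cat" results. Note that each snippet lands in exactly one cluster here, so the overlapping and label-quality issues raised earlier are not addressed by this simple approach.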
How to Represent a Feature/Text
Vector Space Model (VSM).
A document d is represented in the VSM as a vector [w_t0, w_t1, …, w_tn], where w_ti is the weight of term t_i in d.
Example:
d -> "polly had a dog and the dog had polly"
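As a worked version of this example, the sketch below builds a vector for d. It assumes the simplest weighting, raw term frequency, with the vocabulary sorted alphabetically; the slides do not say which weighting is intended, and the function name is hypothetical.

```python
from collections import Counter

def term_frequency_vector(text, vocabulary=None):
    """Return (vocabulary, vector), where vector[i] is the count of vocabulary[i] in text."""
    counts = Counter(text.lower().split())
    if vocabulary is None:
        vocabulary = sorted(counts)  # assumed ordering: alphabetical
    return vocabulary, [counts[t] for t in vocabulary]

vocab, vec = term_frequency_vector("polly had a dog and the dog had polly")
print(vocab)  # ['a', 'and', 'dog', 'had', 'polly', 'the']
print(vec)    # [1, 1, 2, 2, 2, 1]
```

Passing the same `vocabulary` for every document keeps all vectors in the same dimensions, which is what allows them to be compared with a similarity measure.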
Visualization
One prominent approach is based on hierarchical folders.
Clusty, CREDO, Lingo3G: hierarchical folder visualization approach.
Grokker: nesting and zooming approach.
KarTOO: graph-based interface.
Conclusion
A number of advances are still needed: better cluster labels, a more coherent cluster structure, performance evaluation studies, and advanced visualization techniques. Only then can web clustering engines fulfill the promise of being the PageRank of the future.
References
http://clusty.com
http://credo.fub.it
http://www.google.com
http://credino.demi.uniud.it
Thank You…