This technical seminar discusses web clustering engines. It introduces clustering and explains why web clustering engines are useful for ambiguous queries by grouping similar search results. The document outlines some advantages of cluster hierarchies, like providing shortcuts between related topics. It also discusses challenges in implementing clusters, such as determining meaningful labels and similarity measures. An overview is given of the architecture of a web clustering engine, including acquiring search results, preprocessing them, constructing clusters, and visualizing the results.
A TECHNICAL SEMINAR ON WEB CLUSTERING ENGINES

Presented By: K. Shiva Kumar (16d35a0507)

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


INDEX
• Introduction
• Why Web Clustering Engines?
• Advantages of Cluster Hierarchies
• Issues in Implementing Clusters
• Architecture
• Conclusion

3/23/2019 2
Introduction
• Web clustering engine: a search tool that groups the results returned for a query into clusters of similar results.
• Clustering is the act of grouping similar objects into sets.

Why Web Clustering Engines?
• Conventional search engines are not very effective for ambiguous queries.
• For such a query, a conventional engine returns one ranked list in which results about the different meanings are mixed together, so many of the items are irrelevant to what the user meant.

Advantages of Cluster Hierarchies
• It provides shortcuts to the items that relate to the same meaning.
• It allows a better understanding of the topics in the results.
• It favors systematic exploration of the search results.
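A cluster hierarchy can be pictured as a tree of labeled clusters. The minimal Python sketch below is only an illustration: the `Cluster` class, its `find` and `all_results` methods, and the "jaguar" example data are hypothetical. Looking up one label gives a shortcut to every result under that topic, including the results held in its sub-clusters.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical node of a cluster hierarchy: a label, result ids, and sub-clusters."""
    label: str
    results: list[int] = field(default_factory=list)
    children: list[Cluster] = field(default_factory=list)

    def all_results(self) -> set[int]:
        """Collect the result ids of this cluster and all of its descendants."""
        found = set(self.results)
        for child in self.children:
            found |= child.all_results()
        return found

    def find(self, label: str) -> Cluster | None:
        """Depth-first search for the first cluster with this label (a topic shortcut)."""
        if self.label == label:
            return self
        for child in self.children:
            hit = child.find(label)
            if hit is not None:
                return hit
        return None


# Illustrative hierarchy for the ambiguous query "jaguar" (made-up result ids).
root = Cluster("jaguar", children=[
    Cluster("Cars", results=[0, 3], children=[Cluster("Models", results=[5])]),
    Cluster("Animal", results=[1, 2]),
])
print(sorted(root.find("Cars").all_results()))  # [0, 3, 5]
```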

Issues in Implementing Clusters
• Short input data descriptions (often only a title and a snippet per result).
• Meaningful cluster labels.
• Selection of a similarity measure.
• Grouping of objects into clusters.
• Overlapping clusters.
• Unknown number of clusters.
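Two of these issues, the unknown number of clusters and overlapping, can be illustrated with a simple single-pass "leader" clustering over Jaccard similarity. This is a sketch under stated assumptions, not the algorithm of any particular engine: the whitespace tokenization, the 0.2 `threshold` and the snippets are all made up for illustration.

```python
from __future__ import annotations


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def leader_clustering(snippets: list[str], threshold: float = 0.2) -> list[list[int]]:
    """Single-pass leader clustering (illustrative, hypothetical threshold).

    - No fixed number of clusters: a snippet not similar enough to any
      existing leader founds a new cluster.
    - Overlap is allowed: a snippet joins every cluster whose leader it resembles.
    """
    token_sets = [set(s.lower().split()) for s in snippets]
    leaders: list[int] = []         # index of the snippet that founded each cluster
    clusters: list[list[int]] = []  # member indices of each cluster
    for i, tokens in enumerate(token_sets):
        matched = False
        for c, leader in enumerate(leaders):
            if jaccard(tokens, token_sets[leader]) >= threshold:
                clusters[c].append(i)
                matched = True
        if not matched:
            leaders.append(i)
            clusters.append([i])
    return clusters


# Made-up snippets for the ambiguous query "jaguar".
snippets = [
    "jaguar car dealer new models",
    "jaguar big cat habitat rainforest",
    "jaguar car models price",
    "big cat jaguar diet rainforest",
    "jaguar car big cat logo",
]
print(leader_clustering(snippets))  # [[0, 2, 4], [1, 3, 4]]
```

Snippet 4 lands in both clusters, which is exactly the overlapping case; raising the threshold would instead produce more, smaller clusters.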

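Architecture
A web clustering engine works in four stages: search results acquisition, preprocessing of search results, cluster construction and labeling, and visualization.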

Search Results Acquisition
• Provides the input for the rest of the system.
• Delivers 50 to 500 results per query.
• Results come from public search engines such as Google and Yahoo.
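A sketch of this stage, assuming a hypothetical `SearchSource` interface. No real Google or Yahoo API is called: `StubSource` returns placeholder results so the example runs offline, and `acquire` simply keeps the request within the 50 to 500 range given above.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol

MIN_RESULTS, MAX_RESULTS = 50, 500  # range quoted on the slide


@dataclass
class SearchResult:
    title: str
    url: str
    snippet: str


class SearchSource(Protocol):
    """Hypothetical interface a wrapper around a public search engine would implement."""
    def search(self, query: str, count: int) -> list[SearchResult]: ...


class StubSource:
    """Offline stand-in that returns placeholder results; not a real engine API."""
    def search(self, query: str, count: int) -> list[SearchResult]:
        return [
            SearchResult(f"{query} result {i}", f"https://example.com/{i}", f"snippet {i} about {query}")
            for i in range(count)
        ]


def acquire(source: SearchSource, query: str, requested: int = 100) -> list[SearchResult]:
    """Fetch results for `query`, clamping the request to the 50-500 range."""
    count = max(MIN_RESULTS, min(MAX_RESULTS, requested))
    return source.search(query, count)[:count]


results = acquire(StubSource(), "jaguar", requested=1000)
print(len(results))  # 500
```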

Preprocessing of Search Results
• Converts the search results into "features".
• Steps:
  - Language identification
  - Tokenization
  - Stemming
  - Feature selection
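The four steps can be sketched in Python as follows. Every function here is a hypothetical toy: `identify_language` just counts English versus Italian stopwords, and `stem` is a crude suffix stripper, not a real stemmer such as Porter's. Feature selection keeps the terms that appear in at least `min_df` results.

```python
from __future__ import annotations

import re
from collections import Counter

EN_STOP = {"the", "a", "an", "and", "of", "in", "for", "is", "to", "with"}
IT_STOP = {"il", "la", "e", "di", "che", "per", "un", "una", "del", "con"}


def identify_language(text: str) -> str:
    """Crude stand-in for language identification: compare stopword hits."""
    words = text.lower().split()
    en = sum(w in EN_STOP for w in words)
    it = sum(w in IT_STOP for w in words)
    return "en" if en >= it else "it"


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(token: str) -> str:
    """Toy suffix stripper (NOT the Porter stemmer): drop one common English suffix."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def select_features(docs: list[str], min_df: int = 2) -> list[str]:
    """Keep stemmed, non-stopword terms that occur in at least `min_df` documents."""
    df: Counter = Counter()
    for doc in docs:
        df.update({stem(t) for t in tokenize(doc) if t not in EN_STOP})
    return sorted(term for term, n in df.items() if n >= min_df)


docs = [
    "Jaguar cars and dealers",
    "New jaguar car models",
    "The jaguar is a big cat",
    "Big cats in the rainforest",
]
print(select_features(docs))  # ['big', 'car', 'cat', 'jaguar']
```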

Cluster Construction and Labeling
• The preprocessed search results are input to the clustering algorithm.
• A data-centric clustering algorithm groups the results.
• Each cluster created should be given an apt, descriptive label.
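A simple labeling heuristic, assuming the clusters are already built and each result has been reduced to a set of features: take the most frequent terms in the cluster, but skip terms that occur in every result (typically the query words), since they cannot tell clusters apart. `label_cluster` and the sample feature sets are hypothetical; real engines use more elaborate labeling.

```python
from __future__ import annotations

from collections import Counter


def label_cluster(member_terms: list[set[str]], all_terms: list[set[str]], k: int = 2) -> str:
    """Label a cluster with its k most common terms (document frequency inside
    the cluster), skipping terms present in every result of the collection."""
    everywhere = set.intersection(*all_terms) if all_terms else set()
    counts = Counter(t for terms in member_terms for t in terms if t not in everywhere)
    # Descending frequency first, then alphabetical order for deterministic ties.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return " ".join(term for term, _ in ranked[:k])


# Hypothetical feature sets of four results for the query "jaguar".
features = [
    {"jaguar", "car", "dealer", "model"},
    {"jaguar", "car", "model", "new"},
    {"jaguar", "big", "cat"},
    {"jaguar", "big", "cat", "rainforest"},
]
print(label_cluster(features[:2], features))  # car model
print(label_cluster(features[2:], features))  # big cat
```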

How to Represent a Feature/Text
Vector Space Model (VSM).
Document d is represented in the VSM as a vector [w_t0, w_t1, …, w_tn], where w_ti is the weight of term t_i in d.
• Example: d = "polly had a dog and the dog had polly"
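The example can be worked through using raw term counts as the weights w_t (one common choice; tf-idf is another). The sketch below assumes a vocabulary made of the distinct words of d in alphabetical order; `tf_vector` and `cosine` are illustrative helpers, the latter computing cosine similarity, a common similarity measure for VSM vectors.

```python
from __future__ import annotations

import math
from collections import Counter


def tf_vector(text: str, vocabulary: list[str]) -> list[int]:
    """Raw term counts over a fixed vocabulary; words outside it are ignored."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


d = "polly had a dog and the dog had polly"
vocabulary = sorted(set(d.split()))
print(vocabulary)               # ['a', 'and', 'dog', 'had', 'polly', 'the']
print(tf_vector(d, vocabulary))  # [1, 1, 2, 2, 2, 1]

# "cat" is not in the vocabulary, so it contributes nothing to the vector.
other = tf_vector("polly had a cat", vocabulary)
print(round(cosine(tf_vector(d, vocabulary), other), 3))  # 0.745
```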

Visualization
One prominent approach is based on hierarchical folders.
• Clusty, CREDO, Lingo3G: hierarchical folder visualization.
• Grokker: nesting and zooming approach.
• KartOO: graph-based interface.
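Hierarchical folder visualization can be approximated in a text interface by printing the cluster tree as indented folders. The nested-dictionary format and the `render_folders` function are assumptions made for this sketch, not how Clusty, CREDO or Lingo3G actually render their folders.

```python
from __future__ import annotations


def render_folders(tree: dict, indent: int = 0) -> list[str]:
    """Render a nested {label: (result_count, children)} tree as indented folder lines."""
    lines: list[str] = []
    for label, (count, children) in tree.items():
        lines.append("  " * indent + f"[+] {label} ({count})")
        lines.extend(render_folders(children, indent + 1))
    return lines


# Hypothetical cluster tree for the query "jaguar".
tree = {
    "Cars": (3, {"Models": (1, {}), "Dealers": (1, {})}),
    "Animal": (2, {}),
}
print("\n".join(render_folders(tree)))
```

Each sub-cluster is indented one level deeper than its parent, which mirrors the expandable folders of a hierarchical interface.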

Conclusion
Further advances are needed in cluster labeling, the coherence of the cluster structure, performance evaluation studies and advanced visualization techniques. Only then will web clustering engines fully deliver on the promise of being the PageRank of the future.

Thank You…
