
Web Mining Techniques and Methodologies

The document summarizes previous work done in the field of web usage mining and proposes a new methodology. It begins with an introduction to web mining and its categories. Notable contributions are reviewed, including methods for data preprocessing, clickstream analysis, conceptual hierarchies, dynamic clustering of sessions, and sequential pattern mining. The proposed methodology aims to improve existing techniques by considering both client-side and server-side log data to obtain more effective user navigational patterns. The expected outcome is a more accurate technique for discovering and analyzing web log data.

Uploaded by

a100rabh
Copyright
© Attribution Non-Commercial (BY-NC)

Contents
1) Introduction
2) Brief review of the work done in the related field
3) Noteworthy contributions
4) Proposed methodology
5) Expected outcome
6) References

Introduction

The World Wide Web is today's most popular interactive medium for distributing information. Data on the web grows day by day and is huge, diverse and dynamic. These properties make it difficult for analysts to analyze the data and retrieve useful information from it.

In 1996, Etzioni coined the term web mining for the area of data mining that deals with the extraction of interesting knowledge from the World Wide Web. It is the application of data mining techniques to find interesting and potentially useful knowledge in web data. Data mining mainly deals with structured data organized in a database, while web mining deals with structured and/or unstructured data. Web mining is normally expected to use the hyperlink structure of the web, the web log data, or both in the mining process.

Kosala and Blockeel, who performed research in the area of web mining, suggested three web mining categories depending on the kind of data to be mined: mining for information, mining the web link structure, and mining user navigation patterns.

Mining for information focuses on developing techniques that assist a user in finding documents that meet a certain criterion; this is web content mining. Web content mining refers to the discovery of useful information from web contents including text, image, audio, video etc.

Mining the link structure aims at developing techniques that take advantage of the collective judgement of web page quality available in the form of hyperlinks; this is web structure mining. Web structure mining tries to discover the model underlying the link structure of the web. The model is based on the topology of hyperlinks, with or without a description of the links.

Finally, mining for user navigation patterns focuses on techniques that study user behavior when navigating the web; this is web usage mining. Web usage mining refers to the discovery of user access patterns from web servers.
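Server access logs are the most common source of such access patterns. The following is a minimal sketch of reading them, assuming the logs use the Common Log Format; the regular expression, the function names `parse_line` and `page_popularity`, and the sample lines are illustrative assumptions, not part of any tool reviewed here. It extracts the IP address, time, and URL from each record and counts successful visits per page.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed input: Common Log Format lines, e.g.
# host ident authuser [date] "request" status bytes
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def parse_line(line):
    """Return a dict with ip, time, url and status, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    return {
        "ip": match.group("ip"),
        "time": datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        "url": match.group("url"),
        "status": int(match.group("status")),
    }


def page_popularity(lines):
    """Count successful (2xx) requests per URL; malformed lines are skipped."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None and 200 <= record["status"] < 300:
            counts[record["url"]] += 1
    return counts


# Hypothetical sample records, invented for illustration.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2000:13:57:12 -0700] "GET /products.html HTTP/1.0" 200 512',
    '10.0.0.3 - - [10/Oct/2000:13:58:40 -0700] "GET /missing.html HTTP/1.0" 404 209',
]

if __name__ == "__main__":
    for url, visits in page_popularity(SAMPLE_LOG).most_common():
        print(url, visits)
```

Filtering on the status code is one simple form of data cleaning; real logs would usually also need requests for images, style sheets, and crawler traffic removed.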
Web usage data include data from web server access logs, proxy server logs, browser logs, user profiles, registration data, user sessions and transactions, cookies, user queries, bookmark data, mouse clicks and scrolls, or any other data resulting from interaction. The aim of web usage mining is to obtain information that may assist web site reorganization or site adaptation to better suit the user. Many web log analysis tools are available to mine data from the log records of a web site. A log record contains plenty of useful information such as URL, IP address, time and so on. Analyzing logs helps organizations find more potential customers and measure page popularity (the number of times a page has been visited). This can help in reorganizing the web site for fast and easy customer access, improving links and navigation, attracting more advertisement capital through intelligent adverts, turning viewers into customers through better site architecture, and monitoring the efficiency of the web site.

Brief review of the work done in the field

Most data used for mining is collected from web servers, clients, proxy servers, or server databases, all of which produce noisy data. Because web mining is sensitive to noise, data cleaning methods are necessary. Jaideep Srivastava and R. Cooley categorized data preprocessing into subtasks and noted that the final outcome of preprocessing should be data that allows identification of a particular user's browsing pattern in the form of page views, sessions, and click streams. Click streams are of particular interest because they allow reconstruction of user navigational patterns.

Markov models have been extensively used to model web users' navigation behaviour on web sites. Jianhan Zhu and Jun Hong proposed a clustering algorithm called citation cluster to cluster conceptually related pages. The clustering results are used to construct a conceptual hierarchy of the web site. Markov model based link prediction is integrated with the hierarchy to assist users' navigation on the web site.
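To make the link between sessions, click streams, and Markov models concrete, here is a minimal sketch of a first-order Markov model built from sessions. It is not the citation cluster method; the 30-minute session timeout, the record layout `(user, timestamp_seconds, page)`, and all function names are assumptions chosen for illustration.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; an assumed heuristic, not taken from the reviewed work


def build_sessions(records, timeout=SESSION_TIMEOUT):
    """Group (user, timestamp_seconds, page) records into per-user click streams.

    A new session starts whenever the gap between two consecutive
    requests of the same user exceeds the timeout.
    """
    by_user = defaultdict(list)
    for user, ts, page in records:
        by_user[user].append((ts, page))

    sessions = []
    for user in sorted(by_user):
        visits = sorted(by_user[user])
        current = [visits[0][1]]
        for (prev_ts, _), (ts, page) in zip(visits, visits[1:]):
            if ts - prev_ts > timeout:
                sessions.append(current)
                current = []
            current.append(page)
        sessions.append(current)
    return sessions


def transition_probabilities(sessions):
    """Estimate first-order probabilities P(next page | current page)."""
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for src, dst in zip(session, session[1:]):
            counts[src][dst] += 1
    probs = {}
    for src, dsts in counts.items():
        total = sum(dsts.values())
        probs[src] = {dst: n / total for dst, n in dsts.items()}
    return probs


def predict_next(probs, page):
    """Return the most probable next page, or None if the page has no observed successor."""
    if page not in probs:
        return None
    return max(probs[page], key=probs[page].get)


# Hypothetical cleaned records, invented for illustration.
RECORDS = [
    ("u1", 0, "/home"), ("u1", 60, "/products"), ("u1", 120, "/cart"),
    ("u1", 5000, "/home"), ("u1", 5030, "/products"),
    ("u2", 10, "/home"), ("u2", 50, "/about"),
]

if __name__ == "__main__":
    sessions = build_sessions(RECORDS)
    model = transition_probabilities(sessions)
    print(sessions)
    print(predict_next(model, "/home"))
```

A first-order model only looks at the current page; higher-order variants such as N-gram models condition on longer histories at the cost of more states.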

In the previous six years, collections of user navigation sessions have been represented by many models, such as the hypertext probabilistic grammar, the N-Gram model, and the dynamic clustering based Markov model. The web access pattern tree (WAP-tree) stores highly compressed access sequences, and mining frequent access sequences based on the WAP-tree needs to scan the transaction database twice. However, repeatedly producing conditional WAP-trees in the algorithm reduces its efficiency to a certain degree. Considering this shortcoming of the WAP-tree, combined with the need to mine maximal access sequences, TAN Xiaoqiu and YAO Min improved the WAP-tree and introduced a restrained sub-tree structure to solve the problem that a mass of conditional WAP-trees is built in the traditional algorithm.

Many researchers have developed web usage mining (WUM) algorithms utilizing web log records in order to discover useful knowledge to be used in supporting business applications and decision making. The quality of WUM in knowledge discovery, however, depends on the algorithm as well as on the data. Research by Yu-Hui Tao and Tzung-Pei Hong explores a new data source called intentional browsing data (IBD) for potentially improving the effectiveness of WUM applications. IBD is a category of online browsing actions such as "copy", "scroll", or "save as" that is not recorded in web log files. Consequently, the research aims to build a basic understanding of IBD which will lead to its easy adoption in WUM research and practice.

Recently, a number of web usage mining algorithms have been proposed to mine user navigation behaviour. The partitioning method was one of the earliest clustering methods to be used in web usage mining. Web based recommender systems are very helpful in directing users to the target pages in particular web sites. Web usage mining recommender systems have been proposed to predict users' intentions and their navigation behaviour.
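The compression idea behind tree-based storage of access sequences can be illustrated with a count-annotated prefix tree. This is a deliberately simplified sketch: it omits the header links and the conditional-tree mining of the WAP-tree algorithm, and it is not the improved structure of TAN Xiaoqiu and YAO Min. The class and function names and the sample sequences are assumptions for illustration.

```python
class Node:
    """One node of a prefix tree over page-access sequences."""

    def __init__(self, page):
        self.page = page
        self.count = 0       # number of sequences passing through this node
        self.children = {}   # page label -> child Node


def build_prefix_tree(sequences):
    """Insert every access sequence; shared prefixes share nodes."""
    root = Node(None)
    for sequence in sequences:
        node = root
        for page in sequence:
            if page not in node.children:
                node.children[page] = Node(page)
            node = node.children[page]
            node.count += 1
    return root


def prefix_support(root, prefix):
    """Number of stored sequences that start with the given prefix."""
    node = root
    for page in prefix:
        if page not in node.children:
            return 0
        node = node.children[page]
    return node.count


def node_count(root):
    """Total number of nodes below the root, a rough measure of tree size."""
    return sum(1 + node_count(child) for child in root.children.values())


# Hypothetical access sequences, invented for illustration.
SEQUENCES = [
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["a", "c"],
    ["b", "c"],
]

if __name__ == "__main__":
    tree = build_prefix_tree(SEQUENCES)
    print(node_count(tree), "nodes for", sum(len(s) for s in SEQUENCES), "page accesses")
    print(prefix_support(tree, ["a", "b"]))
```

Because sequences with a common prefix share nodes, the tree can hold fewer nodes than there are page accesses in the log, which is the sense in which the stored sequences are compressed.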
Semantic knowledge about the underlying domain can be taken into account to improve the quality of the recommendations. Integrating the semantic web and web usage mining can achieve the best recommendations in huge, dynamic web sites.

Users' future movements and intentions are predicted from their clickstream data. Mehrdad Jalali and Norwati Mustapha developed a model for online prediction through a web usage mining system and proposed an approach for classifying user navigation patterns to predict users' future intentions. The approach uses the longest common subsequence algorithm to classify current user activities and predict the user's next movement.

The quality of recommendations in current systems for predicting a user's future requests in a particular web site is below satisfaction. To provide online prediction effectively, M. Jalali and N. Mustapha developed a recommendation system called WebPUM, an online prediction using web usage mining system, and proposed a novel approach for classifying user navigation patterns to predict users' future intentions. The approach is based on a new graph partitioning algorithm to model user navigation patterns in the navigation pattern mining phase. Furthermore, the longest common subsequence algorithm is used for classifying current user activities to predict the user's next movement.
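As a rough illustration of classification by longest common subsequence (LCS), the sketch below matches a current session against a set of known navigation patterns and picks the pattern with the longest LCS. It is not the WebPUM implementation: the pattern names, the sample pages, and the rule of simply choosing the highest score are assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def classify(session, patterns):
    """Return (pattern_name, score) for the pattern sharing the longest LCS with the session.

    Ties go to the pattern listed first.
    """
    best_name, best_score = None, -1
    for name, pattern in patterns.items():
        score = lcs_length(session, pattern)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical navigation patterns, as might come out of a pattern mining phase.
PATTERNS = {
    "shopping": ["/home", "/products", "/cart", "/checkout"],
    "support": ["/home", "/help", "/contact"],
}

CURRENT_SESSION = ["/home", "/search", "/products", "/cart"]

if __name__ == "__main__":
    print(classify(CURRENT_SESSION, PATTERNS))
```

LCS tolerates pages the user visits in between (here `/search`), which is why it suits noisy click streams better than exact prefix matching.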

Noteworthy contributions in the related field

Jaideep Srivastava and R. Cooley: The methods developed are statistical analysis and association rules, used for personalization and site modification.
Jianhan Zhu: A clustering algorithm called citation cluster is developed to construct a conceptual hierarchy of the web site.
Borges and M. Levene: A dynamic clustering based method is developed for representing a collection of user web navigation sessions.
TAN Xiaoqiu, YAO Min: Developed an improved WAP-tree for sequential pattern mining.
Yu-Hui Tao, Tzung-Pei Hong: Developed a taxonomy of browsed data for decision support.
Mehdi Hosseini: Developed a web based recommender system for predicting users' intentions and their navigation behaviour.
Mehrdad Jalali: Developed a longest common subsequence algorithm and WebPUM for predicting the user's near-future movement.

Proposed methodology

In the field of web usage mining, a significant amount of research has provided various techniques for the discovery and analysis of web log data. Most of the proposed techniques consider either client-side or server-side log data, which restricts the system. To achieve more effective and efficient results, the system should consider both client-side and server-side log data. The proposed methodology would follow this procedure:

1) Finding the shortcomings of the proposed methodology by comparing the previous systems with the proposed system.
2) After finding the shortcomings, the system will enter the design phase, where the priority is to remedy the shortcomings found during the first step.
3) Analysis of the designed system on the basis of the proposed methodology.
4) Proving the proposed methodology after the analysis.
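One way to picture the idea of using both log sources is to merge them into a single timeline per session. The sketch below is only an assumption about how such a merge could look; the proposal does not specify it. It assumes that both sources share a session identifier and have synchronized clocks, and the field names `session`, `time`, and `event` are hypothetical.

```python
from collections import defaultdict


def merge_logs(server_events, client_events):
    """Combine server-side and client-side events into one time-ordered list per session.

    Each event is a dict with the assumed keys "session", "time" (seconds) and "event".
    The result maps a session id to a list of (time, source, event) tuples.
    """
    timeline = defaultdict(list)
    for source, events in (("server", server_events), ("client", client_events)):
        for entry in events:
            timeline[entry["session"]].append((entry["time"], source, entry["event"]))
    return {session: sorted(entries) for session, entries in timeline.items()}


# Hypothetical events, invented for illustration.
SERVER_EVENTS = [
    {"session": "s1", "time": 0, "event": "GET /home"},
    {"session": "s1", "time": 40, "event": "GET /products"},
]
CLIENT_EVENTS = [
    {"session": "s1", "time": 12, "event": "scroll"},
    {"session": "s1", "time": 25, "event": "copy"},
]

if __name__ == "__main__":
    for session, entries in merge_logs(SERVER_EVENTS, CLIENT_EVENTS).items():
        print(session)
        for entry in entries:
            print("  ", entry)
```

The merged timeline shows what the user did on a page between two server requests, which is information that neither log source provides alone.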

Expected outcome

By means of the proposed methodology, our main goal is to improve an existing technique by enriching its input, considering both client-side and server-side logs, so that the user navigational patterns obtained are more effective and accurate.
