
CO5 notes

The document discusses various data mining models, including spatial, temporal, multimedia, and text mining, highlighting their definitions, techniques, applications, and tools. Spatial data mining focuses on extracting knowledge from geospatial data, while temporal data mining analyzes patterns over time. Multimedia and text mining involve processing and analyzing multimedia content and unstructured text data, respectively, to uncover insights and relationships.

Uploaded by geethakanna13
Copyright © All Rights Reserved

CO5: Execute various data mining models using tools

Multidimensional analysis: Spatial data mining – Temporal data mining – Multimedia data mining – Text mining – Web mining – Tools: R programming and Python.

Spatial Data Mining


Spatial data mining is a specialized subfield of data mining that deals with extracting knowledge from
spatial data.
Spatial data refers to data that is associated with a particular location or geography.
Examples of spatial data include maps, satellite images, GPS data, and other geospatial information.
Spatial data mining involves analyzing and discovering patterns, relationships, and trends in this data
to gain insights and make informed decisions.
The use of spatial data mining has become increasingly important in various fields, such as logistics,
environmental science, urban planning, transportation, and public health. By analyzing spatial data,
researchers and data mining professionals can identify correlations, predict future events, and make
informed decisions that can have a significant impact.
For instance, a transportation company can optimize its delivery routes for faster and more efficient
deliveries using spatial data mining techniques. They can analyze their delivery data along with other
spatial data, such as traffic flow, road network, and weather patterns, to identify the most efficient routes
for each delivery.
Types of Spatial Data
Different types of spatial data are used in spatial data mining. These include point data, line data, and
polygon data.
• Point Data
Point data represents a single location or a set of locations on a map. Each point is
defined by its x and y coordinates, representing its position in the geographic space.
Point data is commonly used to represent geographic features such as cities, landmarks,
or specific locations of interest. Examples of point data in transportation include
delivery locations, bus stops, or railway stations.
• Line Data
Line data represents a linear feature, such as a road, a river, or a pipeline, on a map.
Each line is defined by an ordered set of vertices: its start and end points plus any
intermediate points where it changes direction. Line data is commonly used to represent transportation networks, such as roads,
highways, or railways. Line data is also used in other areas, such as hydrology, geology,
or ecology, to represent streams, faults, or animal migration routes.
• Polygon Data
Polygon data represents a closed shape or an area on a map. Each polygon is defined
by a set of vertices that connect to form a closed boundary. Polygon data is commonly
used to represent administrative boundaries, land use, or demographic data. In
transportation, polygon data can be used to represent areas of interest, such as delivery
zones or traffic zones.
In summary, point data represents a single location, line data represents a linear feature, and polygon
data represents an area or a closed shape.
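To make the three geometry types concrete, here is a minimal Python sketch that stores a point as an (x, y) tuple, a line as an ordered list of vertices, and a polygon as a ring of vertices, then computes a line length, a polygon area (shoelace formula), and a point-in-polygon test (ray casting). The names `depot`, `road`, and `zone` and their coordinates are hypothetical, and the sketch assumes planar coordinates; real latitude/longitude data would first need a projection or a geodesic distance, which GIS libraries such as GeoPandas take care of.

```python
import math

Point = tuple[float, float]  # planar (x, y) coordinates: an assumption of this sketch

def line_length(vertices: list[Point]) -> float:
    """Length of a polyline: sum of the straight segments between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))

def polygon_area(ring: list[Point]) -> float:
    """Shoelace formula; the last vertex is implicitly joined back to the first."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def point_in_polygon(p: Point, ring: list[Point]) -> bool:
    """Ray casting: an odd number of edge crossings to the right of p means p is inside."""
    x, y = p
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical sample geometries
depot: Point = (1.0, 1.0)                                # point data
road = [(0.0, 0.0), (3.0, 4.0), (6.0, 4.0)]              # line data
zone = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]  # polygon data

print(line_length(road))              # 8.0
print(polygon_area(zone))             # 12.0
print(point_in_polygon(depot, zone))  # True
```

In a transportation setting, the same three operations answer questions such as "how long is this route?", "how large is this delivery zone?", and "does this delivery location fall inside the zone?".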
Applications of Spatial Data Mining
The following are some of the applications of spatial data mining:
Urban Planning
Spatial Data Mining is used by urban planners to analyze and improve urban dynamics. It can be used
to guide urban growth, improve transportation systems, and refine land-use decisions.
Public Health
Spatial Data Mining plays an important role in public health research. It is used to identify disease
clusters, track the spread of infections, and optimize the allocation of healthcare resources.
Transportation
Spatial Data Mining can be used to identify traffic patterns, prevent congestion, manage the
transportation network, and optimize transportation routes.
Environmental Management
Spatial Data Mining also contributes to environmental management by detecting changes in the
environment, identifying land at risk, conserving water and biodiversity, and monitoring natural
resources.
Crime Analysis
Spatial Data Mining can be used to identify crime hotspots, understand crime patterns, and develop
appropriate prevention strategies that improve public safety.
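A very simple way to look for crime hotspots is to overlay a regular grid and count incidents per cell. The sketch below assumes planar coordinates; the cell size, the threshold, and the incident data are illustrative values only, and more rigorous approaches include kernel density estimation and the Getis-Ord Gi* statistic.

```python
from collections import Counter

def grid_hotspots(points, cell_size, min_count):
    """Count (x, y) points per square grid cell; keep cells with at least min_count points."""
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)
    return {cell: n for cell, n in counts.items() if n >= min_count}

# Hypothetical incident locations in planar coordinates (e.g. kilometres)
incidents = [(0.2, 0.3), (0.4, 0.1), (0.9, 0.8), (0.5, 0.5),
             (2.1, 2.2), (3.7, 0.4), (2.4, 2.9)]

print(grid_hotspots(incidents, cell_size=1.0, min_count=3))  # {(0, 0): 4}
```

The result depends heavily on the cell size: cells that are too large blur distinct hotspots together, while cells that are too small leave every cell with only one or two incidents.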
Temporal data mining:
Temporal data mining refers to the process of extracting non-trivial, implicit, and potentially useful
information from large sets of temporal data. Temporal data are sequences of a primary data type,
generally numerical values, and temporal data mining deals with gathering useful knowledge from such data.
The objective of temporal data mining is to find temporal patterns, unexpected trends, or other hidden
relations in large sequential data by using a combination of techniques from machine learning,
statistics, and database technologies. Such sequential data may be a sequence of nominal symbols from
an alphabet, referred to as a temporal sequence, or a sequence of continuous real-valued elements,
called a time series.
Temporal data mining is composed of three major tasks: the representation of temporal data, the
definition of similarity measures, and the mining operations themselves.
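Similarity measures matter because two time series with the same shape may be slightly shifted in time. The sketch below compares plain Euclidean distance with dynamic time warping (DTW), one widely used elastic similarity measure; DTW is chosen here for illustration, and the two sample series are hypothetical.

```python
import math

def euclidean(a, b):
    """Point-by-point distance; assumes equal-length series aligned in time."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(a, b):
    """Dynamic time warping distance with |a_i - b_j| as the local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b[j-1] is reused
                                     cost[i][j - 1],      # b advances, a[i-1] is reused
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]

s1 = [0, 1, 2, 3, 2, 1, 0]
s2 = [0, 0, 1, 2, 3, 2, 1]  # same peak, delayed by one time step

print(euclidean(s1, s2))     # about 2.449 (the square root of 6)
print(dtw_distance(s1, s2))  # 1.0
```

Euclidean distance penalizes the one-step delay at almost every position, while DTW stretches the time axis to line the peaks up; the remaining cost of 1.0 comes only from the differing final values.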
Temporal data mining includes processing time series, that is, sequences of values of the same
attribute measured at a series of time points. Pattern matching on such data, which searches for
specific patterns of interest, has attracted considerable interest in recent years.
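Pattern matching on a time series can be sketched as sliding a query pattern along the series and keeping the closest window. This minimal version z-normalizes each window so that only the shape matters, not the level or scale; the series and the pattern are made up, and practical systems use indexing or lower bounds to avoid this brute-force scan.

```python
import statistics

def znorm(xs):
    """Rescale to mean 0 and standard deviation 1 so only the shape is compared."""
    mu = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    if sd == 0:
        return [0.0] * len(xs)  # a flat window has no shape to compare
    return [(x - mu) / sd for x in xs]

def best_match(series, pattern):
    """Brute-force scan: return (start index, distance) of the window closest to pattern."""
    q = znorm(pattern)
    w = len(pattern)
    best_start, best_dist = None, float("inf")
    for start in range(len(series) - w + 1):
        window = znorm(series[start:start + w])
        dist = sum((x - y) ** 2 for x, y in zip(window, q)) ** 0.5
        if dist < best_dist:
            best_start, best_dist = start, dist
    return best_start, best_dist

# Hypothetical sensor readings and a "rise then fall" query pattern
series = [5, 5, 5, 6, 8, 6, 5, 5, 9, 9, 9]
pattern = [0, 1, 3, 1]

start, dist = best_match(series, pattern)
print(start)  # 2
```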
Temporal data mining can also exploit the efficient data storage, fast processing, and fast retrieval
methods that have been developed for temporal databases.
Temporal data mining is a single step in the process of knowledge discovery in temporal databases:
any algorithm that enumerates temporal patterns from, or fits models to, temporal data is a temporal
data mining algorithm.
Temporal data mining is concerned with analyzing temporal data and discovering temporal patterns
and regularities in sets of temporal information. It also allows computer-driven, automatic
exploration of the data. The main tasks in temporal data mining are as follows:
• Data characterization and comparison
• Clustering analysis
• Classification
• Association rules
• Pattern analysis
• Prediction and trend analysis
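For the prediction and trend analysis task, two of the simplest tools are a moving average (to smooth short-term noise) and a least-squares trend line (to estimate the direction of change). The sketch below applies both to a hypothetical monthly sales series; the window size of 3 and the naive one-step forecast are illustrative choices, not a recommended forecasting method.

```python
def moving_average(xs, window):
    """Simple moving average; returns len(xs) - window + 1 smoothed values."""
    return [sum(xs[i:i + window]) / window for i in range(len(xs) - window + 1)]

def linear_trend(xs):
    """Least-squares slope and intercept of xs against time steps 0, 1, 2, ..."""
    n = len(xs)
    t_mean = (n - 1) / 2
    x_mean = sum(xs) / n
    num = sum((t - t_mean) * (x - x_mean) for t, x in enumerate(xs))
    den = sum((t - t_mean) ** 2 for t in range(n))
    slope = num / den
    return slope, x_mean - slope * t_mean

sales = [10, 12, 11, 13, 15, 14, 16, 18]  # hypothetical monthly sales

smoothed = moving_average(sales, 3)
slope, intercept = linear_trend(sales)
forecast = intercept + slope * len(sales)  # naive forecast for the next month

print(smoothed)         # [11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
print(round(slope, 3))  # 1.036
```

The smoothed series removes the month-to-month dips, and the positive slope summarizes the upward trend as roughly one extra unit of sales per month.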
Temporal data mining has led to a new way of interacting with a temporal database and specifying
queries at a much more abstract level than, say, a temporal structured query language permits. It also
facilitates data exploration for problems that are hard to explore manually because of their many
variables and high dimensionality.
The basic goal of temporal classification is to predict temporally related fields in a temporal database
based on other fields. The problem, in general, is cast as determining the most likely value of the
temporal variable being predicted, given the other fields, training data in which the target variable is
given for each observation, and a set of assumptions representing prior knowledge of the problem.
Temporal classification techniques are associated with the complex problem of density estimation.
Spatial Data Mining vs. Temporal Data Mining

| Basis | Spatial Data Mining | Temporal Data Mining |
|---|---|---|
| Definition | Process of extracting information from spatial data | Process of extracting temporal relationships and patterns |
| Data Characteristics | Spatial information or coordinates | Temporal information or timestamps |
| Techniques | Spatial association rules, spatial regression analysis, clustering, etc. | Time series analysis, temporal association mining |
| Tools | Python libraries like GeoPandas and scikit-learn, and packages like sp and raster in R | MATLAB, R, Python, etc. |
| Applications | Urban planning, transportation, public health, crime analysis, etc. | Forecasting, anomaly detection |
| Challenges | Spatial autocorrelation, scale, and image resolution | Temporal dependencies and handling irregular data |

Multimedia data mining:
Multimedia mining is a subfield of data mining that is used to find interesting information or implicit
knowledge from multimedia databases. Mining in multimedia is also referred to as automatic annotation or
annotation mining. Mining multimedia data typically involves two or more data types, such as text and
video, or text, video, and audio.

Multimedia data mining is an interdisciplinary field that integrates image processing and understanding,
computer vision, data mining, and pattern recognition. Multimedia data mining discovers interesting
patterns from multimedia databases that store and manage large collections of multimedia objects,
including image data, video data, audio data, sequence data and hypertext data containing text, text
markups, and linkages. Issues in multimedia data mining include content-based retrieval and similarity
search, generalization and multidimensional analysis. Multimedia data cubes contain additional
dimensions and measures for multimedia information.

The framework that manages different types of multimedia data, which are stored, delivered, and used in
different ways, is known as a multimedia database management system. There are three classes of
multimedia databases: static, dynamic, and dimensional media. The contents of a multimedia database
management system are as follows:

• Media data: The actual data representing an object.


• Media format data: Information such as sampling rate, resolution, encoding scheme etc., about
the format of the media data after it goes through the acquisition, processing and encoding
phase.
• Media keyword data: Keywords describing the generation of the data. It is also known
as content descriptive data. Example: date, time, and place of recording.
• Media feature data: Content dependent data such as the distribution of colours, kinds of texture
and different shapes present in data.
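Media feature data such as the colour distribution can support content-based retrieval and similarity search. The sketch below assumes each image has already been decoded into a list of RGB pixels; it builds a coarse colour histogram and ranks database images by histogram intersection with a query image. The tiny "images", their names, and the choice of 4 bins per channel are hypothetical.

```python
from collections import Counter

def color_histogram(pixels, bins_per_channel=4):
    """Quantize each RGB channel (0-255) into bins; return normalized bin frequencies."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {bin_: c / total for bin_, c in counts.items()}

def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical colour distributions."""
    return sum(min(h1.get(k, 0.0), h2.get(k, 0.0)) for k in set(h1) | set(h2))

# Hypothetical tiny "images" given as lists of RGB pixels
sunset = [(250, 120, 30)] * 6 + [(40, 20, 90)] * 2
beach = [(245, 110, 20)] * 4 + [(30, 140, 230)] * 4
sunset2 = [(255, 100, 10)] * 6 + [(50, 10, 80)] * 2

query = color_histogram(sunset)
database = {"beach": beach, "sunset2": sunset2}
ranking = sorted(database,
                 key=lambda name: histogram_intersection(query, color_histogram(database[name])),
                 reverse=True)
print(ranking)  # ['sunset2', 'beach']
```

Colour histograms ignore where colours appear in the image, so in practice they are combined with texture and shape features, the other kinds of media feature data listed above.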
Types of Multimedia Applications
Types of multimedia applications based on data management characteristics are:
1. Repository applications: A large amount of multimedia data and metadata (media format
data, media keyword data, media feature data) is stored for retrieval purposes, e.g., repositories of
satellite images, engineering drawings, and radiology scans.
2. Presentation applications: These involve delivering multimedia data subject to temporal
constraints. Optimal viewing or listening requires the DBMS to deliver data at a certain rate,
offering quality of service above a certain threshold. Here, data is processed as it is delivered.
Examples: annotation of video and audio data, real-time editing analysis.
3. Collaborative work applications: These involve executing a complex task by merging drawings and
change notifications using multimedia information. Example: an intelligent healthcare network.

Uses of Multimedia Data Mining:


Multimedia mining evaluates huge amounts of multimedia data to mine patterns on the basis of
statistical relationships. Here are the different uses of multimedia mining.
Digital Library: The place where all digital data is maintained and stored is termed a digital library.
To store every piece or type of data, files must be converted into various information formats, such as
images, text, audio, and video. Data mining techniques are very important when converting files or data
into multimedia files and then storing them in the digital library.
Medical Analysis: Multimedia data mining is essential for the analysis of medical images. Various
multimedia mining techniques are used to carry out image classification, which helps in analyzing,
identifying, and auto-localizing X-rays, ECG reports, MRI scans, 3D CT scans, brain tumor reports, and
much more. This has made analyzing medical reports quicker and easier.
Traffic Video Sequences: Multimedia mining is also used to discover knowledge or information in traffic
video sequences that was previously unidentified. With the help of multimedia mining, detailed analysis
is possible based on traffic flow, queue temporal relations, and the identification of vehicles at
intersections. This gives a commercial approach to monitoring traffic regularly.
Media Making and Broadcasting: Media making and broadcasting include TV channels and radio
stations. These broadcasting companies also use multimedia data mining to create and monitor the
content they broadcast and to search for effective content that makes their approaches competitive and
well organized. This also improves the quality of the data.
Customer Vision: Multimedia mining is also helpful in gathering the opinions, complaints, preferences,
and satisfaction levels of customers for any service or product. It helps not only in collecting the
data but also in storing, managing, and analyzing it to improve the product or service in the future.
For this purpose, audio data is collected through call centers, where executives gather data by calling
customers.
Surveillance System: A surveillance system collects, summarizes, and analyzes video, audiovisual, and
audio data to obtain information from particular areas such as multinational companies, banks,
government organizations, shopping malls, agricultural areas, highways, forests, etc. Multimedia mining
helps automate this process. It is also important for security purposes, which makes it an ideal choice
for private companies, the police, and the military.
Apart from the uses mentioned above, multimedia data mining is also popularly used in Intelligent
Content Service (ICS). ICS is a smart way of storing, recognizing, and managing data, together with
other software services, that helps enhance the relationship between computing systems and information
workers. The entire process is carried out by sensing the content, understanding the requests of the
user, and recognizing the data or content. Hence, multimedia mining helps in effective and advanced
content mining of the given data.

Text mining:
Text mining, also known as text data mining, is the process of transforming unstructured text
into a structured format to identify meaningful patterns and new insights. You can use text
mining to analyze vast collections of textual materials to capture key concepts, trends and
hidden relationships.
By applying advanced analytical techniques, such as Naïve Bayes, Support Vector Machines
(SVM), and other machine learning algorithms, including deep learning, companies are able to
explore and discover hidden relationships within their unstructured data.
Text is one of the most common data types within databases. Depending on the database, this
data can be organized as:
Structured data: This data is standardized into a tabular format with numerous rows and
columns, making it easier to store and process for analysis and machine learning algorithms.
Structured data can include inputs such as names, addresses, and phone numbers.
Unstructured data: This data does not have a predefined data format. It can include text from
sources like social media or product reviews, or rich media formats like video and audio files.
Semi-structured data: As the name suggests, this data is a blend between structured and
unstructured data formats. While it has some organization, it doesn’t have enough structure to
meet the requirements of a relational database. Examples of semi-structured data include XML,
JSON and HTML files.
Text mining vs. text analytics
• The terms text mining and text analytics are largely synonymous in conversation, but they
can have more nuanced meanings. Text mining and text analysis identify textual patterns and
trends within unstructured data through the use of machine learning, statistics, and linguistics.
By transforming the data into a more structured format through text mining and text analysis,
more quantitative insights can be found through text analytics. Data visualization techniques can
then be harnessed to communicate findings to wider audiences.
Text mining techniques
The process of text mining comprises several activities that enable you to deduce information
from unstructured text data. Before you can apply different text mining techniques, you must
start with text preprocessing, which is the practice of cleaning and transforming text data into
a usable format. This practice is a core aspect of natural language processing (NLP) and it
usually involves the use of techniques such as language identification, tokenization, part-of-
speech tagging, chunking, and syntax parsing to format data appropriately for analysis. When
text preprocessing is complete, you can apply text mining algorithms to derive insights from
the data. Some of these common text mining techniques include:
Information retrieval
Information retrieval (IR) returns relevant information or documents based on a pre-defined
set of queries or phrases. IR systems utilize algorithms to track user behaviors and identify
relevant data. Information retrieval is commonly used in library catalogue systems and popular
search engines, like Google. Some common IR sub-tasks include:
• Tokenization: This is the process of breaking long-form text into sentences and
words called “tokens”. These are then used in models, like bag-of-words, for text
clustering and document-matching tasks.
• Stemming: This refers to the process of separating the prefixes and suffixes from words
to derive the root word form and meaning. This technique improves information retrieval by
reducing the size of indexing files.
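Tokenization and stemming can be sketched in a few lines of Python. The regular-expression tokenizer and the suffix-stripping stemmer below are deliberately simplistic assumptions for illustration (a real system would use an established algorithm such as the Porter stemmer), and the sample sentence is made up. Counting the resulting stems gives a bag-of-words representation of the document.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (runs of letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def naive_stem(token):
    """Toy stemmer: strip one common English suffix if a stem of 3+ letters remains."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

doc = "Mining texts: the miners mined and are mining text collections."
tokens = tokenize(doc)
stems = [naive_stem(t) for t in tokens]
bag_of_words = Counter(stems)  # term -> frequency, ignoring word order

print(tokens[:4])                   # ['mining', 'texts', 'the', 'miners']
print(bag_of_words.most_common(2))  # [('min', 3), ('text', 2)]
```

The crude rules reduce "mining" and "mined" to the same stem "min" but leave "miners" as "miner", which shows how much the quality of stemming depends on the rules used.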
Natural language processing (NLP)
Natural language processing, which evolved from computational linguistics, uses methods
from various disciplines, such as computer science, artificial intelligence, linguistics, and data
science, to enable computers to understand human language in both written and verbal forms.
By analyzing sentence structure and grammar, NLP sub-tasks allow computers to “read”.
Common sub-tasks include:
Summarization: This technique provides a synopsis of long pieces of text to create a concise,
coherent summary of a document’s main points.
Part-of-Speech (PoS) tagging: This technique assigns a tag to every token in a document based
on its part of speech—that is, denoting nouns, verbs, adjectives, and so on. This step enables
semantic analysis on unstructured text.
Text categorization: This task, which is also known as text classification, is responsible for
analyzing text documents and classifying them based on predefined topics or categories. This
sub-task is particularly helpful when categorizing synonyms and abbreviations.
Sentiment analysis: This task detects positive or negative sentiment from internal or external
data sources, allowing you to track changes in customer attitudes over time. It is commonly
used to provide information about perceptions of brands, products, and services. These insights
can propel businesses to connect with customers and improve processes and user experiences.
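Text categorization and sentiment analysis are often framed as supervised classification. The sketch below implements a small multinomial Naïve Bayes classifier (one of the techniques named earlier in this section) with add-one smoothing and uses it to label sentiment. The class name `NaiveBayesText`, the four training reviews, and their labels are hypothetical, and a real model would need far more training data.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesText:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            score = math.log(n / n_docs)  # log prior of the class
            for w in tokenize(doc):
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((counts[w] + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

train = ["great product, works well", "love it, excellent quality",
         "terrible, broke after a day", "awful support and poor quality"]
labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesText().fit(train, labels)
print(model.predict("excellent product"))    # positive
print(model.predict("poor quality, broke"))  # negative
```

Working in log probabilities avoids numerical underflow when many small word probabilities are multiplied, and the add-one smoothing keeps a single unseen word from driving a class probability to zero.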
Information extraction
Information extraction (IE) surfaces the relevant pieces of data when searching various
documents. It also focuses on extracting structured information from free text and storing these
entities, attributes, and relationship information in a database. Common information extraction
sub-tasks include:
Feature selection, or attribute selection, is the process of selecting the important features
(dimensions) that contribute the most to the output of a predictive analytics model.
Feature extraction is the process of deriving a smaller set of new features from the original ones
to improve the accuracy of a classification task. This is particularly important for dimensionality
reduction.
Named-entity recognition (NER) also known as entity identification or entity extraction, aims
to find and categorize specific entities in text, such as names or locations. For example, NER
identifies “California” as a location and “Mary” as a woman’s name.
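As one common way of weighting text features before selecting them, the sketch below scores terms with TF-IDF (term frequency times inverse document frequency) and keeps the k highest-scoring terms. TF-IDF and the top-k rule are illustrative choices rather than the only options, and the three maintenance-style documents are made up.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

def tfidf(docs):
    """One {term: weight} dict per document, with tf = raw count and idf = log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))  # document frequency
    return [{t: c * math.log(n / df[t]) for t, c in Counter(toks).items()}
            for toks in tokenized]

def select_features(weights, k):
    """Keep the k terms with the highest tf-idf weight summed over all documents."""
    totals = Counter()
    for w in weights:
        totals.update(w)
    return [term for term, _ in totals.most_common(k)]

# Hypothetical maintenance notes
docs = ["the engine overheated on the highway",
        "the engine failed to start",
        "the brake pads wore out on the highway"]

weights = tfidf(docs)
print(weights[0]["the"])  # 0.0, because "the" appears in every document
print(select_features(weights, 3))
```

Terms that occur in every document get an IDF of zero and drop out, while terms concentrated in a few documents rise to the top; stop-word removal is usually applied as well, since short function words that happen to be rare can still score highly.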
Text mining applications:
Text analytics software has impacted the way that many industries work, allowing them to
improve product user experiences as well as make faster and better business decisions. Some
use cases include:
Customer service: There are various ways in which we solicit customer feedback from our
users. When combined with text analytics tools, feedback systems, such as chatbots, customer
surveys, NPS (net promoter scores), online reviews, support tickets, and social media profiles,
enable companies to improve their customer experience quickly. Text mining and sentiment
analysis can provide a mechanism for companies to prioritize key pain points for their
customers, allowing businesses to respond to urgent issues in real time and increase customer
satisfaction.
Risk management: Text mining also has applications in risk management, where it can provide
insights around industry trends and financial markets by monitoring shifts in sentiment and by
extracting information from analyst reports and whitepapers. This is particularly valuable to
banking institutions, as this data provides more confidence when considering business
investments across various sectors.
Maintenance: Text mining provides a rich and complete picture of the operation and
functionality of products and machinery. Over time, text mining automates decision making by
revealing patterns that correlate with problems and preventive and reactive maintenance
procedures. Text analytics helps maintenance professionals unearth the root cause of challenges
and failures faster.
Healthcare: Text mining techniques have become increasingly valuable to researchers in the
biomedical field, particularly for clustering information. Manual investigation of medical
research can be costly and time-consuming; text mining provides a way to automate the extraction
of valuable information from medical literature.
Spam filtering: Spam frequently serves as an entry point for hackers to infect computer systems
with malware. Text mining can provide a method to filter and exclude these e-mails from
inboxes, improving the overall user experience and minimizing the risk of cyber-attacks to end
users.
Web Mining:
Web mining is the process of extracting valuable information from the vast data available on
the World Wide Web. The internet is an enormous repository of information, and web mining
techniques allow organizations to leverage this data for various purposes, such as marketing,
customer relationship management, and business intelligence. This section covers what web mining
is, its main categories, its applications, the process of web mining, and how web mining differs
from data mining.
What is Web Mining?
Web mining refers to the process of discovering and extracting useful information from a large
amount of data available on the World Wide Web. It involves applying various data mining
techniques to web data to identify patterns, trends, and relationships. Web mining is a
multidisciplinary field that combines techniques from data mining, machine learning, artificial
intelligence, statistics, and information retrieval.
One example of web mining is to analyze website traffic and user behavior. By analyzing
clickstream data and other user interactions with a website, organizations can gain insights into
how users navigate their site, what content is most popular, and where users are dropping off.
This information can be used to optimize website design and improve user experience.
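The clickstream analysis described above can be sketched with simple counting: page views show which content is most popular, and the share of a page's views that end a session (its exit rate) hints at where users drop off. The sessions below are hypothetical and assume the raw logs have already been grouped by visitor.

```python
from collections import Counter

# Hypothetical sessions: each is the ordered list of pages one visitor viewed
sessions = [
    ["/home", "/products", "/cart", "/checkout"],
    ["/home", "/products", "/products/42"],
    ["/home", "/blog"],
    ["/products", "/cart"],
]

page_views = Counter(page for s in sessions for page in s)
exit_pages = Counter(s[-1] for s in sessions)  # last page viewed in each session

# Exit rate: fraction of a page's views that were the final page of a session
exit_rate = {page: exit_pages[page] / views for page, views in page_views.items()}

print(page_views.most_common(2))  # [('/home', 3), ('/products', 3)]
print(exit_rate["/cart"])         # 0.5
```

Here half of the sessions that reached the cart ended there without a checkout, which is the kind of drop-off point a site owner would investigate.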
Web mining is broadly classified into three categories based on the type of data being analyzed
and the techniques used for analysis, as shown below -
• Web Content Mining -
Web content mining is the process of extracting useful information from web pages, including
text, images, and multimedia content. This involves techniques such as text mining, natural
language processing, and image analysis. Web content mining can be used to extract structured
and unstructured data from web pages, including product descriptions, reviews, and user-
generated content. The extracted information can be used for various purposes, such as
sentiment analysis, product recommendation, and opinion mining.
• Web Structure Mining -
Web structure mining focuses on analyzing the web structure and the relationships between
web pages. This includes analyzing links between pages, identifying communities of pages,
and detecting patterns in website design. Web structure mining techniques are used to improve
search engine results, identify authoritative pages, and detect web spam.
• Web Usage Mining -
Web usage mining involves analyzing user behavior on the web, including clickstream data,
search queries, and other interactions with web pages. Web usage mining can help identify user
preferences, behavior patterns, and trends. This information can be used to personalize content,
improve website design, and target advertising. Web usage mining can also be used for security
purposes, such as detecting fraud and identifying potential security threats.
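Identifying authoritative pages in web structure mining is often done with link-analysis algorithms such as PageRank. The sketch below is a minimal power-iteration version on a hypothetical four-page link graph; the damping factor of 0.85 and the fixed 50 iterations are conventional illustrative settings, not tuned values.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank. links maps each page to the list of pages it links to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no out-links): spread its rank evenly over all pages
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: A links to B and C, B and D link to C, C links back to A
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # C
```

Page C ends up most authoritative because three pages link to it, while D, which no page links to, keeps only the small baseline share contributed by the damping term.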
Applications of Web Mining
Web mining has numerous applications in various fields, including business, marketing, e-commerce,
education, healthcare, and more. Some common applications of web mining include -
Marketing and Advertising -
Web mining is used to analyze consumer behavior, identify trends, and personalize marketing
campaigns. This includes targeted advertising, product recommendation, and customer
segmentation.
Business Intelligence -
Web mining is used to extract valuable insights from web data, including competitor analysis,
market trends, and customer preferences.
E-commerce -
Web mining is used to analyze user behavior on e-commerce websites, including purchase
history, search queries, and clickstream data. This information can be used to optimize website
design, personalize product recommendations, and improve customer experience.
Fraud Detection -
Web mining is used to detect fraudulent activities, such as credit card fraud, identity theft, and
online scams. This includes analyzing user behavior patterns, detecting anomalies, and
identifying potential security threats.
Social Network Analysis -
Web mining is used to analyze social media data and identify social networks, communities,
and influencers. This information can be used to understand social dynamics, sentiment
analysis, and targeted advertising.
Process of Web Mining
The process of web mining typically involves the following steps -
Data collection -
Web data is collected from various sources, including web pages, databases, and APIs.
Data pre-processing -
The collected data is pre-processed to remove irrelevant information, such as advertisements
and duplicate content.
Data integration -
The pre-processed data is integrated and transformed into a structured format for analysis.
Pattern discovery -
Web mining techniques are applied to identify patterns, trends, and relationships.
Evaluation -
The discovered patterns are evaluated to determine their significance and usefulness.
Visualization -
The analysis results are visualized through graphs, charts, and other visualizations.
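The steps above can be sketched end to end on a tiny, made-up web server log. The log format, the 30-minute session timeout, and the rule for filtering image requests are all assumptions chosen for illustration; step 4 here simply counts page-to-page transitions as the discovered pattern.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical raw log: "<visitor id> <ISO timestamp> <requested path>" per line
raw_log = """\
u1 2024-05-01T10:00:00 /home
u1 2024-05-01T10:00:00 /home
u1 2024-05-01T10:02:00 /logo.png
u1 2024-05-01T10:03:00 /products
u2 2024-05-01T10:05:00 /home
u1 2024-05-01T11:30:00 /home
u2 2024-05-01T10:06:00 /products
"""
SESSION_GAP = timedelta(minutes=30)  # assumed inactivity timeout between sessions

# 1. Data collection: parse each line into (visitor, timestamp, path)
records = []
for line in raw_log.splitlines():
    visitor, stamp, path = line.split()
    records.append((visitor, datetime.fromisoformat(stamp), path))

# 2. Pre-processing: drop exact duplicates and image requests, then sort by visitor and time
records = sorted({r for r in records if not r[2].endswith((".png", ".jpg", ".gif"))})

# 3. Integration: group each visitor's requests into sessions
sessions, current, last_seen = [], {}, {}
for visitor, when, path in records:
    if visitor not in current or when - last_seen[visitor] > SESSION_GAP:
        current[visitor] = []
        sessions.append(current[visitor])
    current[visitor].append(path)
    last_seen[visitor] = when

# 4. Pattern discovery: count page-to-page transitions within sessions
transitions = Counter((a, b) for s in sessions for a, b in zip(s, s[1:]))

# 5-6. Evaluation and visualization, reduced here to a ranked text summary
for (a, b), n in transitions.most_common():
    print(f"{a} -> {b}: {n}")  # /home -> /products: 2
```

Visitor u1's return at 11:30 falls more than 30 minutes after the previous request, so it starts a new session; choosing this timeout is itself a design decision that changes which patterns are discovered.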

Difference Between Data Mining and Web Mining

| Parameter | Data Mining | Web Mining |
|---|---|---|
| Definition | The process of discovering patterns in large datasets | The process of discovering patterns in web data |
| Data Source | Databases, data warehouses, and other data repositories | Web pages, weblogs, social media, and other web-related data sources |
| Data Characteristics | Structured, semi-structured, and unstructured data | Mostly unstructured data |
| Techniques | Clustering, classification, association rules, regression, etc. | Text mining, natural language processing, image analysis, link analysis, etc. |
| Applications | Marketing, finance, healthcare, etc. | E-commerce, social media, search engines, etc. |
| Challenges | Data quality, scalability, and privacy concerns | Data heterogeneity, ambiguity, and the dynamic nature of the web |
