CO5 notes
Aspect | Spatial data mining | Temporal data mining
Definition | Process of extracting information from spatial data | Process of extracting temporal relationships and patterns
Techniques | Spatial Association Rules, Spatial Regression Analysis, Clustering, etc. | Time Series Analysis, Temporal Association Mining
Tools | Python libraries like GeoPandas and scikit-learn, and packages like sp and raster in R | MATLAB, R, Python, etc.
Applications | Urban Planning, Transportation, Public Health, Crime Analysis, etc. | Forecasting, Anomaly Detection
Challenges | Spatial Autocorrelation, Scale and Image Resolution | Temporal Dependencies and Handling Irregular Data
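As a rough illustration of the anomaly detection application listed above, the sketch below flags points in a time series that deviate strongly from the values just before them. The function name, the window size, the threshold, and the sample readings are all assumptions made for this example; real temporal mining would also need to handle trends, seasonality, and irregular sampling.

```python
from statistics import mean, stdev

def rolling_zscore_anomalies(series, window=5, threshold=3.0):
    """Return indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as a deviation.
            if series[i] != mu:
                anomalies.append(i)
            continue
        if abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies

# Hypothetical sensor readings with one obvious spike at index 7.
readings = [10, 11, 10, 12, 11, 10, 11, 50, 11, 10, 12]
anomaly_indices = rolling_zscore_anomalies(readings)
print(anomaly_indices)
```

Each point is compared only with earlier values, which reflects the temporal dependency named in the challenges row.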
Multimedia data mining:
Multimedia mining is a subfield of data mining that is used to find interesting information or implicit
knowledge in multimedia databases. Mining in multimedia is also referred to as automatic annotation or
annotation mining. Mining multimedia data involves two or more data types, such as text and video, or
text, video, and audio.
Multimedia data mining is an interdisciplinary field that integrates image processing and understanding,
computer vision, data mining, and pattern recognition. Multimedia data mining discovers interesting
patterns from multimedia databases that store and manage large collections of multimedia objects,
including image data, video data, audio data, sequence data and hypertext data containing text, text
markups, and linkages. Issues in multimedia data mining include content-based retrieval and similarity
search, generalization and multidimensional analysis. Multimedia data cubes contain additional
dimensions and measures for multimedia information.
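Content-based retrieval and similarity search can be pictured as comparing feature vectors extracted from each media object. The sketch below is a minimal example that assumes each image has already been reduced to a three-bin colour histogram; the file names, histogram values, and function names are hypothetical, and cosine similarity is only one of several possible similarity measures.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def most_similar(query, database, top_k=2):
    """Return the names of the top_k objects most similar to the query."""
    scored = [(cosine_similarity(query, feats), name)
              for name, feats in database.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]

# Hypothetical colour histograms (red, green, blue proportions).
image_features = {
    "sunset.jpg": [0.7, 0.2, 0.1],
    "forest.jpg": [0.1, 0.8, 0.1],
    "ocean.jpg": [0.1, 0.2, 0.7],
    "desert.jpg": [0.6, 0.3, 0.1],
}
query_histogram = [0.65, 0.25, 0.10]
matches = most_similar(query_histogram, image_features)
print(matches)
```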
The framework that manages different types of multimedia data stored, delivered, and utilized in
different ways is known as a multimedia database management system. There are three classes of
multimedia databases: static, dynamic, and dimensional media. The content of the Multimedia Database
management system is as follows:
Text mining:
Text mining, also known as text data mining, is the process of transforming unstructured text
into a structured format to identify meaningful patterns and new insights. You can use text
mining to analyze vast collections of textual materials to capture key concepts, trends and
hidden relationships.
By applying advanced analytical techniques, such as Naïve Bayes, Support Vector Machines
(SVM), and deep learning algorithms, companies are able to explore and discover hidden
relationships within their unstructured data.
Text is one of the most common data types within databases. Depending on the database, this
data can be organized as:
Structured data: This data is standardized into a tabular format with numerous rows and
columns, making it easier to store and process for analysis and machine learning algorithms.
Structured data can include inputs such as names, addresses, and phone numbers.
Unstructured data: This data does not have a predefined data format. It can include text from
sources like social media or product reviews, or rich media formats like video and audio files.
Semi-structured data: As the name suggests, this data is a blend between structured and
unstructured data formats. While it has some organization, it doesn’t have enough structure to
meet the requirements of a relational database. Examples of semi-structured data include XML,
JSON and HTML files.
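The difference between semi-structured and structured data can be seen by flattening JSON records into fixed columns. The following is a small sketch, assuming hypothetical customer records and a hypothetical column list; fields missing from a record become None, and fields outside the chosen columns are dropped.

```python
import json

# Hypothetical semi-structured records: the fields vary from record to record.
raw_records = """
[
  {"name": "Asha", "phone": "555-0101", "review": "Fast delivery, great product"},
  {"name": "Ben", "review": "Package arrived damaged", "tags": ["shipping"]}
]
"""

# Hypothetical target schema for the structured (tabular) form.
columns = ["name", "phone", "review"]

def to_rows(json_text, columns):
    """Project each JSON object onto a fixed list of columns."""
    records = json.loads(json_text)
    return [[record.get(col) for col in columns] for record in records]

rows = to_rows(raw_records, columns)
for row in rows:
    print(row)
```

The free-text review column is still unstructured content, which is where text mining techniques apply.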
Text mining vs. text analytics
The terms text mining and text analytics are largely synonymous in conversation, but they can
carry a more nuanced meaning. Text mining and text analysis identify textual patterns and
trends within unstructured data through the use of machine learning, statistics, and linguistics.
Once text mining and text analysis have transformed the data into a more structured format,
text analytics can derive more quantitative insights from it. Data visualization techniques can
then be harnessed to communicate findings to wider audiences.
Text mining techniques
The process of text mining comprises several activities that enable you to deduce information
from unstructured text data. Before you can apply different text mining techniques, you must
start with text preprocessing, which is the practice of cleaning and transforming text data into
a usable format. This practice is a core aspect of natural language processing (NLP) and it
usually involves the use of techniques such as language identification, tokenization, part-of-
speech tagging, chunking, and syntax parsing to format data appropriately for analysis. When
text preprocessing is complete, you can apply text mining algorithms to derive insights from
the data. Some of these common text mining techniques include:
Information retrieval
Information retrieval (IR) returns relevant information or documents based on a pre-defined
set of queries or phrases. IR systems utilize algorithms to track user behaviors and identify
relevant data. Information retrieval is commonly used in library catalogue systems and popular
search engines, like Google. Some common IR sub-tasks include:
• Tokenization: This is the process of breaking long-form text into sentences and
words called “tokens”. These are then used in models, such as bag-of-words, for text
clustering and document matching tasks.
• Stemming: This refers to the process of stripping prefixes and suffixes from words
to derive the root word form and meaning. This technique improves information retrieval by
reducing the size of index files.
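The two sub-tasks above can be sketched together in a toy retrieval example. This is an illustration only: the suffix list is deliberately tiny and is not a real stemming algorithm such as Porter's, and the documents, query, and function names are hypothetical. Documents are scored by how often the stemmed query terms occur in their bag-of-words.

```python
import re
from collections import Counter

# Deliberately tiny, illustrative suffix list (not a real stemmer).
SUFFIXES = ("ing", "ed", "es", "s")

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def bag_of_words(text):
    """Count stemmed tokens, ignoring word order."""
    return Counter(stem(t) for t in tokenize(text))

def rank(query, documents):
    """Order document ids by the total count of query stems they contain."""
    query_terms = bag_of_words(query)
    scores = {}
    for doc_id, text in documents.items():
        bag = bag_of_words(text)
        scores[doc_id] = sum(bag[term] for term in query_terms)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical mini-collection.
documents = {
    "d1": "Mining text reveals hidden patterns.",
    "d2": "Search engines index web pages.",
    "d3": "Indexing documents speeds up searching.",
}
ranking = rank("indexed searches", documents)
print(ranking)
```

Because "indexed", "index", and "indexing" all reduce to the same stem, one index entry serves all three forms, which is how stemming shrinks the index.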
Natural language processing (NLP)
Natural language processing, which evolved from computational linguistics, uses methods
from various disciplines, such as computer science, artificial intelligence, linguistics, and data
science, to enable computers to understand human language in both written and verbal forms.
By analyzing sentence structure and grammar, NLP sub-tasks allow computers to “read”.
Common sub-tasks include:
Summarization: This technique provides a synopsis of long pieces of text to create a concise,
coherent summary of a document’s main points.
Part-of-Speech (PoS) tagging: This technique assigns a tag to every token in a document based
on its part of speech—that is, denoting nouns, verbs, adjectives, and so on. This step enables
semantic analysis on unstructured text.
Text categorization: This task, which is also known as text classification, is responsible for
analyzing text documents and classifying them based on predefined topics or categories. This
sub-task is particularly helpful when categorizing synonyms and abbreviations.
Sentiment analysis: This task detects positive or negative sentiment from internal or external
data sources, allowing you to track changes in customer attitudes over time. It is commonly
used to provide information about perceptions of brands, products, and services. These insights
can propel businesses to connect with customers and improve processes and user experiences.
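Text categorization and sentiment analysis are often introduced with a Naïve Bayes classifier, one of the techniques named earlier in these notes. The sketch below is a minimal multinomial Naïve Bayes with add-one (Laplace) smoothing written in plain Python. The four training sentences, their labels, and the function names are made up for illustration; a realistic model would need far more data.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(labelled_docs):
    """Collect per-class word counts, class document counts, and the vocabulary."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    vocabulary = set()
    for text, label in labelled_docs:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocabulary.update(tokens)
    return word_counts, doc_counts, vocabulary

def classify(text, model):
    """Return the class with the highest log-probability for the text."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # class prior
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            # Add-one smoothing so unseen words do not zero out the score.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical labelled customer comments.
training_data = [
    ("great product, fast delivery", "positive"),
    ("love the friendly support", "positive"),
    ("terrible quality, slow delivery", "negative"),
    ("support was rude and slow", "negative"),
]
model = train(training_data)
prediction = classify("fast and friendly support", model)
print(prediction)
```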
Information extraction
Information extraction (IE) surfaces the relevant pieces of data when searching various
documents. It also focuses on extracting structured information from free text and storing these
entities, attributes, and relationship information in a database. Common information extraction
sub-tasks include:
Feature selection, or attribute selection, is the process of selecting the important features
(dimensions) that contribute the most to the output of a predictive analytics model.
Feature extraction is the process of deriving a smaller set of new features from the original
data to improve the accuracy of a classification task. This is particularly important for
dimensionality reduction.
Named-entity recognition (NER), also known as entity identification or entity extraction, aims
to find and categorize specific entities in text, such as names or locations. For example, NER
identifies “California” as a location and “Mary” as a woman’s name.
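A very simple way to picture NER is a dictionary (gazetteer) lookup over capitalized words. The sketch below assumes two small hypothetical gazetteers and a hypothetical function name; practical NER systems rely on statistical or neural models rather than fixed lists, so this only illustrates the input and output of the task.

```python
import re

# Hypothetical gazetteers: known entity names grouped by type.
GAZETTEER = {
    "LOCATION": {"California", "Paris"},
    "PERSON": {"Mary", "John"},
}

def recognize_entities(text):
    """Return (word, label, start_offset) for capitalized words found in a gazetteer."""
    entities = []
    for match in re.finditer(r"\b[A-Z][a-z]+\b", text):
        word = match.group()
        for label, names in GAZETTEER.items():
            if word in names:
                entities.append((word, label, match.start()))
    return entities

entities = recognize_entities("Mary moved to California last year.")
for word, label, offset in entities:
    print(word, label, offset)
```

The (entity, type, position) tuples are the kind of structured output that information extraction then stores in a database.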
Text mining applications:
Text analytics software has impacted the way that many industries work, allowing them to
improve product user experiences as well as make faster and better business decisions. Some
use cases include:
Customer service: Companies solicit feedback from their customers in various ways.
When combined with text analytics tools, feedback systems, such as chatbots, customer
surveys, NPS (net-promoter scores), online reviews, support tickets, and social media profiles,
enable companies to improve their customer experience with speed. Text mining and sentiment
analysis can provide a mechanism for companies to prioritize key pain points for their
customers, allowing businesses to respond to urgent issues in real-time and increase customer
satisfaction.
Risk management: Text mining also has applications in risk management, where it can provide
insights around industry trends and financial markets by monitoring shifts in sentiment and by
extracting information from analyst reports and whitepapers. This is particularly valuable to
banking institutions as this data provides more confidence when considering business
investments across various sectors.
Maintenance: Text mining provides a rich and complete picture of the operation and
functionality of products and machinery. Over time, text mining automates decision making by
revealing patterns that correlate with problems and preventive and reactive maintenance
procedures. Text analytics helps maintenance professionals unearth the root cause of challenges
and failures faster.
Healthcare: Text mining techniques have been increasingly valuable to researchers in the
biomedical field, particularly for clustering information. Manual investigation of medical
research can be costly and time-consuming; text mining provides an automation method for
extracting valuable information from medical literature.
Spam filtering: Spam frequently serves as an entry point for hackers to infect computer systems
with malware. Text mining can provide a method to filter and exclude these e-mails from
inboxes, improving the overall user experience and minimizing the risk of cyber-attacks to end
users.
Web Mining:
Web mining is the process of extracting valuable information from the vast data available on
the World Wide Web. The internet is an enormous repository of information, and web mining
techniques allow organizations to leverage this data for various purposes, such as marketing,
customer relationship management, and business intelligence. This section covers what web
mining is, the web mining process, its applications, and how web mining differs from data
mining.
What is Web Mining?
Web mining refers to the process of discovering and extracting useful information from a large
amount of data available on the World Wide Web. It involves applying various data mining
techniques to web data to identify patterns, trends, and relationships. Web mining is a
multidisciplinary field that combines techniques from data mining, machine learning, artificial
intelligence, statistics, and information retrieval.
One example of web mining is to analyze website traffic and user behavior. By analyzing
clickstream data and other user interactions with a website, organizations can gain insights into
how users navigate their site, what content is most popular, and where users are dropping off.
This information can be used to optimize website design and improve user experience.
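The clickstream example can be sketched with two simple counts: how often each page is viewed, and on which page each session ends (a rough proxy for where users drop off). The session ids, page paths, and function names below are hypothetical, and the events are assumed to be in time order within each session.

```python
from collections import Counter

# Hypothetical clickstream: (session id, page), in time order per session.
clickstream = [
    ("s1", "/home"), ("s1", "/products"), ("s1", "/cart"),
    ("s2", "/home"), ("s2", "/products"),
    ("s3", "/home"), ("s3", "/products"), ("s3", "/cart"), ("s3", "/checkout"),
]

def page_views(events):
    """Count how many times each page was viewed."""
    return Counter(page for _, page in events)

def exit_pages(events):
    """Count the last page seen in each session."""
    last_page = {}
    for session, page in events:
        last_page[session] = page  # later events overwrite earlier ones
    return Counter(last_page.values())

views = page_views(clickstream)
exits = exit_pages(clickstream)
print(views.most_common())
print(exits.most_common())
```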
Web mining is broadly classified into three categories based on the type of data being analyzed
and the techniques used for analysis, as shown below -
• Web Content Mining -
Web content mining is the process of extracting useful information from web pages, including
text, images, and multimedia content. This involves techniques such as text mining, natural
language processing, and image analysis. Web content mining can be used to extract structured
and unstructured data from web pages, including product descriptions, reviews, and user-
generated content. The extracted information can be used for various purposes, such as
sentiment analysis, product recommendation, and opinion mining.
• Web Structure Mining -
Web structure mining focuses on analyzing the web structure and the relationships between
web pages. This includes analyzing links between pages, identifying communities of pages,
and detecting patterns in website design. Web structure mining techniques are used to improve
search engine results, identify authoritative pages, and detect web spam.
• Web Usage Mining -
Web usage mining involves analyzing user behavior on the web, including clickstream data,
search queries, and other interactions with web pages. Web usage mining can help identify user
preferences, behavior patterns, and trends. This information can be used to personalize content,
improve website design, and target advertising. Web usage mining can also be used for security
purposes, such as detecting fraud and identifying potential security threats.
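For web structure mining, a well-known link analysis technique for identifying authoritative pages is PageRank. The sketch below is a simplified power-iteration version over a hypothetical four-page link graph; the damping factor of 0.85 and the fixed iteration count are common textbook choices, not values taken from these notes.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over a dict mapping each page to its outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page with no outgoing links: spread its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank

# Hypothetical link graph: page -> pages it links to.
web_graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web_graph)
top_page = max(ranks, key=ranks.get)
print(top_page)
```

Page C comes out on top because it is linked from three other pages, which matches the idea that links act as votes for authority.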
Applications of Web Mining
Web mining has numerous applications in various fields, including business, marketing, e-
commerce, education, healthcare, and more. Some common applications of web mining include
-
Marketing and Advertising -
Web mining is used to analyze consumer behavior, identify trends, and personalize marketing
campaigns. This includes targeted advertising, product recommendation, and customer
segmentation.
Business Intelligence -
Web mining is used to extract valuable insights from web data, including competitor analysis,
market trends, and customer preferences.
E-commerce -
Web mining is used to analyze user behavior on e-commerce websites, including purchase
history, search queries, and clickstream data. This information can be used to optimize website
design, personalize product recommendations, and improve customer experience.
Fraud Detection -
Web mining is used to detect fraudulent activities, such as credit card fraud, identity theft, and
online scams. This includes analyzing user behavior patterns, detecting anomalies, and
identifying potential security threats.
Social Network Analysis -
Web mining is used to analyze social media data and identify social networks, communities,
and influencers. This information can be used to understand social dynamics, perform
sentiment analysis, and target advertising.
Process of Web Mining
The process of web mining typically involves the following steps -
Data collection -
Web data is collected from various sources, including web pages, databases, and APIs.
Data pre-processing -
The collected data is pre-processed to remove irrelevant information, such as advertisements
and duplicate content.
Data integration -
The pre-processed data is integrated and transformed into a structured format for analysis.
Pattern discovery -
Web mining techniques are applied to identify patterns, trends, and relationships.
Evaluation -
The discovered patterns are evaluated to determine their significance and usefulness.
Visualization -
The analysis results are visualized through graphs, charts, and other visualizations.
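The steps above can be sketched end to end on a toy scale. The example below assumes three hypothetical HTML pages already held in memory (no network access), uses Python's standard html.parser module to strip markup, and treats "terms that appear in at least two distinct pages" as the pattern of interest. The page contents, the class and variable names, and the min_support threshold are illustrative assumptions.

```python
import re
from collections import Counter
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)

def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())

# 1. Data collection: hypothetical pages (a real system would crawl or call APIs).
pages = [
    "<html><body><h1>Phone review</h1><p>Great battery and great camera.</p></body></html>",
    "<html><body><h1>Phone review</h1><p>Great battery and great camera.</p></body></html>",
    "<html><body><script>track()</script><p>Battery drains fast.</p></body></html>",
]

# 2. Data pre-processing: strip markup and scripts, drop duplicate content.
texts = list(dict.fromkeys(extract_text(page) for page in pages))

# 3. Data integration: one token list per remaining document.
token_lists = [re.findall(r"[a-z]+", text.lower()) for text in texts]

# 4. Pattern discovery: in how many documents does each term occur?
doc_frequency = Counter(term for tokens in token_lists for term in set(tokens))

# 5. Evaluation: keep only terms that reach the minimum support.
min_support = 2
frequent_terms = sorted(t for t, c in doc_frequency.items() if c >= min_support)

# 6. Visualization: a plain-text bar per frequent term.
for term in frequent_terms:
    print(f"{term:10s} {'#' * doc_frequency[term]}")
```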
Aspect | Data mining | Web mining
Data Source | Databases, data warehouses, and other data repositories | Web pages, weblogs, social media, and other web-related data sources
Applications | Marketing, finance, healthcare, etc. | E-commerce, social media, search engines, etc.
Challenges | Data quality, scalability, and privacy concerns | Data heterogeneity, ambiguity, and dynamic nature of the web