Type of Data

Types of data

Uploaded by

Jaimin Sathavara

Chapter 2 “Fundamentals of Business Analytics”

Types of Digital Data RN Prasad and Seema Acharya

• Today, data is undoubtedly an invaluable asset of any enterprise
(big or small). Even though professionals work with data all the
time, the understanding, management and analysis of data from
heterogeneous sources remains a serious challenge.
• In this lecture, the various formats of digital data (structured,
semi-structured and unstructured data), data storage mechanisms,
data access methods, management of data, the process of
extracting desired information from data, challenges posed by
various formats of data, etc. will be explained.
• Data growth has accelerated exponentially since the advent
of the computer and the Internet.
Digital Data

In fact, the computer and Internet duo has imparted the digital form to data.
Digital data can be classified into three forms:
– Unstructured
– Semi-structured
– Structured

• Usually, data is in the unstructured format which makes extracting
information from it difficult.
• According to Merrill Lynch, 80–90% of business data is either unstructured
or semi-structured.
• Gartner also estimates that unstructured data constitutes 80% of the whole
enterprise data.
Formats of Digital Data

[Figure: percentage distribution of the three forms of data]

Data Forms Defined
Unstructured data: This is data which does not conform to a
data model or is not in a form which can be used easily by a
computer program. About 80–90% of an organization's data is in
this format; for example, memos, chat rooms, PowerPoint
presentations, images, videos, letters, research reports, white papers,
the body of an email, etc.
Semi-structured data: This is data which does not conform to a
data model but has some structure. However, it is not in a form
which can be used easily by a computer program; for example,
emails, XML, markup languages like HTML, etc. Metadata for this
data is available but is not sufficient.
Structured data: This is data which is in an organized form
(e.g., in rows and columns) and can be easily used by a computer
program. Relationships exist between entities of data, such as
classes and their objects. Data stored in databases is an example of
structured data.
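To make the three definitions concrete, here is a minimal Python sketch that holds the same contact detail in each form and shows how a program would get at it. The email address, the tag names, and the column names are invented for illustration and are not part of the source material.

```python
import re
import xml.etree.ElementTree as ET

# Unstructured: free-form text; a program must guess where the facts are.
unstructured = "Please reach Patrick Wood at pwood@example.com for the report."

# Semi-structured: tags describe the data, but no fixed schema is enforced.
semi_structured = """
<contact>
  <name>Patrick Wood</name>
  <email>pwood@example.com</email>
</contact>
"""

# Structured: fixed fields in a fixed order, like a row in a table.
columns = ("first_name", "last_name", "email")
structured = [("Patrick", "Wood", "pwood@example.com")]

# Unstructured text needs (fragile) pattern matching to recover the email.
email_from_text = re.search(r"[\w.]+@[\w.]+\w", unstructured).group(0)

# Semi-structured data can be navigated by tag name.
email_from_xml = ET.fromstring(semi_structured.strip()).findtext("email")

# Structured data is addressed directly by column.
email_from_row = structured[0][columns.index("email")]
```

All three lookups yield the same address, but only the structured lookup works without any knowledge of how the text happens to be phrased or tagged.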
Unstructured Data
Unstructured Data – Getting to Know
• Dr. Ben, Dr. Stanley, and Dr. Mark work at the medical facility of “GoodLife”. Over
the past few days, Dr. Ben and Dr. Stanley had been exchanging long emails about a
particular case of gastro-intestinal problem. Dr. Stanley has chanced upon a particular
combination of drugs that has cured gastro-intestinal disorders in his patients. He has
written an email about this combination of drugs to Dr. Ben.
• Dr. Mark has a patient in the “GoodLife” emergency unit with a case quite similar to the
gastro-intestinal disorder whose cure Dr. Stanley has chanced upon. Dr. Mark has already
tried regular drugs but with no positive results so far. He quickly searches the
organization's database for answers, but with no luck. The information he wants is tucked
away in the email conversation between two other “GoodLife” doctors, Dr. Ben and Dr.
Stanley. Dr. Mark would have accessed the solution with a few mouse clicks had the
storage and analysis of unstructured data been undertaken by “GoodLife”.
• As is the case at “GoodLife”, 80-85% of data in any organization is unstructured and
is growing at an alarming rate. An enormous amount of knowledge is buried in this data. In the
above scenario, Dr. Stanley's email to Dr. Ben had not been successfully updated into the medical
system because it was in an unstructured format.
• Unstructured data, thus, is data which cannot be stored in the form of rows and columns as
in a database and does not conform to any data model, i.e. it is difficult to determine the
meaning of the data. It does not follow any rules or semantics. It can be of any type and
is hence unpredictable.
Characteristics of Unstructured Data

Unstructured data:
• Does not conform to any data model
• Cannot be stored in form of rows and columns as in a database
• Has no easily identifiable structure
• Is not in any particular format or sequence
• Does not follow any rule or semantics
• Is not easily usable by a program
Where does Unstructured Data Come from?

Unstructured data comes from:
• Web pages
• Memos
• Videos (MPEG, etc.)
• Images (JPEG, GIF, etc.)
• Body of an e-mail
• Word document
• PowerPoint presentations
• Chats
• Reports
• Whitepapers
• Surveys
Where does Unstructured Data Come from?
Broadly speaking, anything in a non-database form is unstructured
data.

It can be classified into two broad categories:

• Bitmap objects : For example, image, video, or audio files.
• Textual objects : For example, Microsoft Word documents,
emails, or Microsoft Excel spread-sheets.

Refer to figure in the previous slide - Let us take the above example
of the email communication between Dr. Ben and Dr. Stanley. Even
though email messages like the ones exchanged by Dr. Ben and Dr.
Stanley are organized in databases such as Microsoft Exchange or
Lotus Notes, the body of the email is essentially raw data, i.e. free form
text without any structure.
• A lot of unstructured data is also noisy text such as chats, emails and
SMS texts.
• The language of noisy text differs significantly from the standard
form of language.
A Myth Demystified

• Web pages are said to be unstructured data even though they
are defined by HTML, a markup language which has a rich
structure.
• HTML is used solely for rendering and presentation.
• The tagged elements do not capture the meaning of the data
that the HTML page contains. This makes it difficult to
automatically process the information in the HTML page.
• Another characteristic that makes web pages unstructured data
is that they usually carry links and references to external
unstructured content such as images, XML files, etc.
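The point that HTML tags describe presentation rather than meaning can be illustrated with a small Python sketch using the standard-library `html.parser` module. The HTML fragment (a drug name and a number in a table) and the `TagTextCollector` class are hypothetical examples, not taken from the source.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collects (enclosing tag, text) pairs from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags
        self.pairs = []   # (innermost tag, text) found so far

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.pairs.append((self.stack[-1], text))


# Hypothetical page fragment: a drug name in bold and a number in a cell.
html = "<table><tr><td><b>Aspirin</b></td><td>75</td></tr></table>"
collector = TagTextCollector()
collector.feed(html)
collector.close()

# The tags only say "bold" and "table cell"; nothing says that "Aspirin"
# is a drug or that "75" is a dose, a price, or a quantity.
pairs = collector.pairs
```

A program can recover every piece of text and its enclosing tag, yet still has no way of knowing what the values mean, which is why such pages are treated as unstructured.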
How to Manage Unstructured Data?
Let us look at a few generic tasks to be performed to enable storage and search of unstructured data:
Indexing: Let us go back to our understanding of the Relational Database Management
System (RDBMS). In this system, data is indexed to enable faster search and retrieval. An index is
defined on the basis of some value in the data; it is nothing but an identifier that represents a larger
record in the data set. In the absence of an index, the whole data set/document has to be scanned to
retrieve the desired information. In the case of unstructured data too, indexing helps in searching
and retrieval. The unstructured data is indexed based on text or some other attribute, e.g. file name.
Indexing unstructured data is difficult because this data neither has any predefined attributes
nor follows any pattern or naming convention. Text can be indexed based on a text string, but
in the case of non-text files, e.g. audio/video, indexing depends on file names. This becomes a
hindrance when naming conventions are not followed.
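Indexing text by the strings it contains can be sketched as a small inverted index in Python. This is one common way to index text, shown here under the assumption that a simple word-level index is enough; the sample emails and the function names `build_inverted_index` and `search` are invented for illustration.

```python
import re
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical email bodies keyed by a made-up message id.
emails = {
    "mail-1": "A new drug combination cured the gastro-intestinal disorder.",
    "mail-2": "Meeting moved to Friday.",
}
index = build_inverted_index(emails)
hits = search(index, "drug combination")
```

With the index in place, a query touches only the entries for its words instead of scanning every document. Audio or video files have no words to index this way, which is why they fall back on file names.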
Tags/Metadata: Using metadata, data in a document, etc. can be tagged. This enables search and
retrieval. But in unstructured data, this is difficult as little or no metadata is available. The structure of
the data has to be determined, which is very difficult as the data itself has no particular format and
comes from more than one source.
Classification/Taxonomy: Taxonomy is classifying data on the basis of the relationships that exist
between data. Data can be arranged in groups and placed in hierarchies based on the taxonomy
prevalent in an organization. However, classifying unstructured data is difficult as identifying
relationships between data is not an easy task. In the absence of any structure, metadata or schema,
identifying accurate relationships and classifying is not easy. Since the data is unstructured, naming
conventions or standards are not consistent across an organization, thus making it difficult to classify
data.
CAS (Content Addressable Storage): It stores data based on their metadata. It assigns a
unique name to every object stored in it. The object is retrieved based on its content and not its
location. It is used extensively to store emails, etc.
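The idea of retrieving an object by its content rather than its location can be sketched in Python. This sketch assumes the unique name is a SHA-256 hash of the content, which is a common approach but is not stated in the source; the `ContentAddressableStore` class and the sample email body are hypothetical.

```python
import hashlib


class ContentAddressableStore:
    """Toy in-memory store where an object's name is derived from its content."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        # Assumption: the unique name is the SHA-256 digest of the content.
        address = hashlib.sha256(content).hexdigest()
        self._objects[address] = content
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address]

    def __len__(self):
        return len(self._objects)


store = ContentAddressableStore()
body = b"Email body: try the new drug combination."
addr1 = store.put(body)
addr2 = store.put(body)  # same content, so same address and no second copy
```

Because the address depends only on the content, storing the same email twice yields one object, and the caller never needs to know where it is physically kept.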
How to Store Unstructured Data?

Challenges faced:
• Storage space: The sheer volume of unstructured data and its unprecedented growth make it difficult to store. Audios, videos, images, etc. occupy a huge amount of storage space.
• Scalability: Scalability becomes an issue with increase in unstructured data.
• Retrieve information: Retrieving and recovering unstructured data are cumbersome.
• Security: Ensuring security is difficult due to varied sources of data (e.g. e-mail, web pages).
• Update and delete: Updating, deleting, etc. are not easy due to the unstructured form.
• Indexing and searching: Indexing becomes difficult with increase in data. Searching is difficult for non-text data.
How to Store Unstructured Data?
Possible solutions:
• Change formats: Unstructured data may be converted to formats which are easily managed, stored and searched. For example, IBM is working on providing a solution which converts audio, video, etc. to text.
• New hardware: Create hardware which supports unstructured data, either to complement the existing storage devices or to stand alone for unstructured data.
• RDBMS/BLOBs: Store in relational databases which support BLOBs (Binary Large Objects).
• XML: Store in XML, which tries to give some structure to unstructured data by using tags and elements.
• CAS: Organize files based on their metadata.
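The RDBMS/BLOBs option can be sketched with Python's built-in `sqlite3` module, which supports a BLOB column type. The table name `attachments`, the file name, and the placeholder bytes are assumptions made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE attachments ("
    "id INTEGER PRIMARY KEY, file_name TEXT, content BLOB)"
)

# Placeholder bytes standing in for a real image file.
image_bytes = bytes([0x89, 0x50, 0x4E, 0x47]) + b"...placeholder payload..."

# Python bytes objects are stored as BLOB values.
conn.execute(
    "INSERT INTO attachments (file_name, content) VALUES (?, ?)",
    ("scan.png", image_bytes),
)
conn.commit()

row = conn.execute(
    "SELECT file_name, content FROM attachments WHERE file_name = ?",
    ("scan.png",),
).fetchone()
stored_name, stored_bytes = row
```

The database can find the object by its file name, but the BLOB itself stays opaque: the content cannot be searched with ordinary SQL conditions, which matches the searching challenge noted for non-text data.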

How to Extract Information from Unstructured
Data?
Challenges faced:
• Interpretation: Unstructured data is not easily interpreted by conventional search algorithms.
• Tags: As the data grows it is not possible to put tags manually.
• Indexing: Designing algorithms to understand the meaning of the document and then tag or index them accordingly is difficult.
• Deriving meaning: Computer programs cannot automatically derive meaning/structure from unstructured data.
• File formats: Increasing number of file formats make it difficult to interpret data.
• Classification/Taxonomy: Different naming conventions followed across the organization make it difficult to classify data.
How to Extract Information from Unstructured
Data?
Possible solutions:
• Tags: Unstructured data can be stored in a virtual repository and be automatically tagged. For example, Documentum provides this type of solution.
• Text mining: Text mining tools help in grouping and classifying unstructured data and analyze by considering grammar, context, synonyms, etc.
• Application platforms: Application platforms like XOLAP help extract information from e-mail and XML-based documents.
• Classification/Taxonomy: Taxonomies within the organization can be managed automatically to organize data in hierarchical structures.
• Naming conventions/standards: Following naming conventions or standards across an organization can greatly improve storage and retrieval.
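Automatic tagging against an organizational taxonomy can be sketched, in a very reduced form, as keyword matching in Python. Real text mining tools also consider grammar, context, and synonyms, which this sketch does not attempt; the `TAXONOMY` categories, keywords, and the `auto_tag` function are hypothetical.

```python
import re

# Hypothetical two-level taxonomy: category path -> keywords that signal it.
TAXONOMY = {
    "clinical/gastroenterology": {"gastro", "intestinal", "stomach"},
    "clinical/pharmacy": {"drug", "drugs", "dose"},
    "admin/scheduling": {"meeting", "appointment"},
}


def auto_tag(text):
    """Return the sorted taxonomy categories whose keywords occur in text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(
        category
        for category, keywords in TAXONOMY.items()
        if words & keywords
    )


tags = auto_tag("This combination of drugs cured a gastro-intestinal disorder.")
```

Once documents carry tags like these, they can be grouped into the hierarchy implied by the category paths and retrieved by category instead of by file name.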
UIMA
• UIMA (Unstructured Information Management Architecture) is an open
source platform from IBM which integrates different kinds of analysis
engines to provide a complete solution for knowledge discovery from
unstructured data.
• In UIMA, the analysis engines enable integration and analysis of unstructured
information and bridge the gap between structured and unstructured data.
• UIMA stores information in a structured format. The structured resources
can be mined, searched, and put to other uses. The information obtained
from structured sources is also used for subsequent analysis of unstructured
data.
• Various analysis engines analyze unstructured data in different ways such
as:
– Breaking up of documents into separate words.
– Grouping and classifying according to taxonomy.
– Detecting parts of speech, grammar, and synonyms.
– Detecting events and times.
– Detecting relationships between various elements.
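The idea of chaining several analysis engines over one document can be sketched in plain Python. This is not the UIMA API; it only mimics the pattern of engines that each add structured annotations to a shared document. The function names, the annotation keys, and the sample sentence are assumptions for illustration.

```python
import re


def tokenize(doc):
    """Engine 1: break the document into separate words."""
    doc["tokens"] = re.findall(r"[A-Za-z']+", doc["text"])
    return doc


def detect_times(doc):
    """Engine 2: detect clock times written as H:MM or HH:MM."""
    doc["times"] = re.findall(r"\b\d{1,2}:\d{2}\b", doc["text"])
    return doc


def run_pipeline(text, engines):
    """Pass one document through each engine in order."""
    doc = {"text": text}
    for engine in engines:
        doc = engine(doc)
    return doc


result = run_pipeline(
    "Patient admitted at 09:30 and discharged at 17:45.",
    [tokenize, detect_times],
)
```

Each engine leaves the original text untouched and adds its findings as structured fields, so the output can be searched or mined like structured data.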
Answer a Quick Question

Ask the participants of the learning program to state some more examples of
Unstructured data
Do it Exercise

Search, think and write about two best practices for managing the growth of
unstructured data
Semi-structured Data
• Semi-structured data does not conform to any data model, i.e. it is difficult to
determine the meaning of the data, and it cannot be stored in rows and columns as
in a database. However, semi-structured data has tags and markers which help to group
data and describe how data is stored. These give some metadata, but not enough
for management and automation of data.

• Similar entities in the data are grouped and organized in a hierarchy. The
attributes or the properties within a group may or may not be the same. For
example, two addresses may or may not contain the same number of properties, as
in:
Address 1
<house number><street name><area name><city>
Address 2
<house number><street name><city>
• For example, an e-mail follows a standard format:
To: <Name>
From: <Name>
Subject: <Text>
CC: <Name>
Body: <Text, Graphics, Images etc. >
• The tags give us some metadata, but the body of the e-mail has no format,
nor does it convey the meaning of the data it contains.
• There is a very fine line between unstructured and semi-structured data.
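The split between tagged headers and a free-form body can be seen by parsing a message with Python's standard `email` package. The addresses and message text are invented for this sketch.

```python
from email import message_from_string

# Hypothetical message following the To/From/Subject/CC/Body layout.
raw = (
    "To: ben@example.com\n"
    "From: stanley@example.com\n"
    "Subject: Drug combination\n"
    "Cc: mark@example.com\n"
    "\n"
    "Hi Ben, the combination we discussed worked well for my patient.\n"
)

msg = message_from_string(raw)

# Header fields are tagged, so they can be read by name.
headers = {name: msg[name] for name in ("To", "From", "Subject", "Cc")}

# The body is returned as one undifferentiated string.
body = msg.get_payload()
```

The headers behave like metadata that a program can query, while finding the useful fact inside the body still requires the text analysis techniques used for unstructured data.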
What is Semi-structured Data?
Semi-structured data:
• Does not conform to a data model but contains tags & elements (metadata)
• Cannot be stored in form of rows and columns as in a database
• Similar entities are grouped
• Attributes in a group may not be the same
• The tags and elements describe how data is stored
• Metadata is not sufficient
Where does Semi-structured Data Come from?

Semi-structured data comes from:
• E-mail
• XML
• TCP/IP packets
• Zipped files
• Binary executables
• Mark-up languages
• Integration of data from heterogeneous sources
How to Manage Semi-structured Data?

Some ways in which semi-structured data is managed and stored:

• Schemas
– Describe the structure and content of data to some extent
– Assign meaning to data, hence allowing automatic search and indexing
• Graph-based data models
– Contain data on the leaves of the graph. Also known as ‘schema less’
– Used for data exchange among heterogeneous sources
• XML
– Models the data using tags and elements
– Schemas are not tightly coupled to data
How to Store Semi-structured Data?

Challenges faced:
• Storage cost: Storing data with their schemas increases cost.
• RDBMS: Semi-structured data cannot be stored in existing RDBMS as data cannot be mapped into tables directly.
• Irregular and partial structure: Some data elements may have extra information while others none at all.
• Implicit structure: In many cases the structure is implicit. Interpreting relationships and correlations is very difficult.
• Evolving schemas: Schemas keep changing with requirements, making it difficult to capture them in a database.
• Distinction between schema and data: A vague distinction between schema and data exists at times, making it difficult to capture data.
How to Store Semi-structured Data?

Possible solutions:
• XML: XML allows tags and attributes to be defined to store data. Data can be stored in a hierarchical/nested structure.
• RDBMS: Semi-structured data can be stored in a relational database by mapping the data to a relational schema which is then mapped to a table.
• Special purpose DBMS: Databases which are specifically designed to store semi-structured data.
• OEM (Object Exchange Model): Data can be stored and exchanged in the form of a graph where entities are represented as objects which are the vertices in a graph.
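The RDBMS option, mapping semi-structured records to a relational schema, can be sketched with Python's `sqlite3` module. This sketch assumes the simplest possible mapping: the schema is the union of all attributes seen, and missing attributes become NULL. The address values and the table name are invented; the attribute names follow the two address examples given earlier in the lecture.

```python
import sqlite3

# Two address records with different sets of attributes.
addresses = [
    {"house_number": "12", "street_name": "Park Road",
     "area_name": "Westside", "city": "Pune"},
    {"house_number": "7", "street_name": "Hill Street", "city": "Mysore"},
]

# Relational schema = union of all attributes, in first-seen order.
columns = []
for record in addresses:
    for key in record:
        if key not in columns:
            columns.append(key)

conn = sqlite3.connect(":memory:")
# Column names come from the trusted literal above, so string building is safe here.
conn.execute(
    "CREATE TABLE address ({})".format(", ".join(c + " TEXT" for c in columns))
)

placeholders = ", ".join("?" for _ in columns)
for record in addresses:
    # Attributes a record lacks are stored as NULL (None in Python).
    conn.execute(
        "INSERT INTO address VALUES ({})".format(placeholders),
        [record.get(c) for c in columns],
    )

rows = conn.execute("SELECT * FROM address ORDER BY rowid").fetchall()
```

The second address ends up with a NULL in the area column. With many irregular records such a table becomes sparse, which is one reason the mapping is described as indirect and costly.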
How to Extract Information from Semi-structured Data?

Challenges faced:
• Flat files: Semi-structured data is usually stored in flat files which are difficult to index and search.
• Heterogeneous sources: Data comes from varied sources which is difficult to tag and search.
• Incomplete/irregular structure: Extracting structure when there is none and interpreting the relations existing in the structure which is present is a difficult task.
How to Extract Information from Semi-structured Data?

Possible solutions:
• Indexing: Indexing data in a graph-based model enables quick search.
• OEM: Allows data to be stored in a graph-based data model which is easier to index and search.
• XML: Allows data to be arranged in a hierarchical or tree-like structure which enables indexing and searching.
• Mining tools: Various mining tools are available which search data based on graphs, schemas, structure, etc.
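Indexing a graph-based model can be sketched in Python with a much simplified structure in the spirit of OEM: each object has an id, a label, and either an atomic value (a leaf) or a list of child object ids. This is an illustrative assumption, not a faithful OEM implementation; the sample objects and function names are invented.

```python
from collections import defaultdict

# id -> (label, value); value is an atomic leaf or a list of child ids.
objects = {
    1: ("contacts", [2, 5]),
    2: ("person", [3, 4]),
    3: ("name", "Patrick Wood"),
    4: ("email", "pwood@example.com"),
    5: ("person", [6]),          # this person has no email: irregular structure
    6: ("name", "Mark Taylor"),
}


def build_label_index(objs):
    """Map each label to the ids of the objects carrying it."""
    index = defaultdict(list)
    for oid, (label, _value) in objs.items():
        index[label].append(oid)
    return index


def leaf_values(objs, index, label):
    """Return the atomic values of all leaf objects with the given label."""
    return [
        objs[oid][1]
        for oid in index.get(label, [])
        if not isinstance(objs[oid][1], list)
    ]


label_index = build_label_index(objects)
names = leaf_values(objects, label_index, "name")
```

The data sits on the leaves, and the label index allows a lookup such as "all names" without walking the whole graph, even though the two person objects do not share the same attributes.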
XML – A Solution for Semi-structured Data Management

XML: Extensible Markup Language

• What is XML? An open-standard markup language written in plain text. It is hardware and software independent.
• Does what? Designed to store and transport data over the Internet.
• How? It allows data to be stored in a hierarchical/nested structure. It allows the user to define tags to store the data.
XML – A Solution for Semi-structured Data Management

XML has no predefined tags

<message>
<to> XYZ </to>
<from> ABC </from>
<subject> Greetings </subject>
<body> Hello! How are you? </body>
</message>

The words in the <> (angle brackets) are user-defined tags.

XML is known as self-describing, as data can exist without a schema and a
schema can be added later.
A schema can be described in a DTD or in XML Schema (XSD).
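Because the tags describe the data, a program can read the message above without any schema. A minimal Python sketch using the standard `xml.etree.ElementTree` module follows; the variable names are the only additions.

```python
import xml.etree.ElementTree as ET

# The message from the slide, unchanged.
xml_text = """<message>
<to> XYZ </to>
<from> ABC </from>
<subject> Greetings </subject>
<body> Hello! How are you? </body>
</message>"""

root = ET.fromstring(xml_text)

# Each child element becomes a (tag, text) entry; no schema is consulted.
fields = {child.tag: child.text.strip() for child in root}
```

The parser discovers the field names from the document itself, which is the sense in which XML is called self-describing.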
Answer a Quick Question

What is your take on this….

A Web Page is unstructured. If yes, why?

Structured Data
• Structured data is organized in semantic chunks
(entities)
• Similar entities are grouped together (relations or
classes)
• Entities in the same group have the same
descriptions (attributes)
• Descriptions for all entities in a group (schema):
– have the same defined format
– have a predefined length
– are all present
– follow the same order
What Is Structured Data?

Structured data:
• Conforms to a data model
• Data is stored in form of rows and columns (e.g., relational database)
• Similar entities are grouped
• Attributes in a group are the same
• Data resides in fixed fields within a record or file
• Definition, format & meaning of data is explicitly known
Where does Structured Data Come from?

Structured data comes from:
• Databases (e.g., Access)
• Spreadsheets
• SQL
• OLTP systems
Structured Data: Everything in its Place

• Fully described datasets
• Clearly defined categories and sub-categories
• Data neatly placed in rows and columns
• Data that goes into the records is regulated by a well-defined structure
• Indexing can be easily done either by the DBMS itself or manually
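These properties can be sketched with Python's `sqlite3` module: a table whose structure regulates what goes into each record, plus an index created on a column. The table, column, and index names and the sample rows are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A well-defined structure: every record must supply all three fields.
conn.execute(
    """CREATE TABLE contact (
           first_name TEXT NOT NULL,
           last_name  TEXT NOT NULL,
           email      TEXT NOT NULL
       )"""
)

# An index created explicitly on a non-key attribute.
conn.execute("CREATE INDEX idx_contact_last_name ON contact (last_name)")

conn.executemany(
    "INSERT INTO contact VALUES (?, ?, ?)",
    [
        ("Patrick", "Wood", "pwood@example.com"),
        ("Mark", "Taylor", "mtaylor@example.com"),
    ],
)

emails = conn.execute(
    "SELECT email FROM contact WHERE last_name = ?", ("Taylor",)
).fetchall()

# A record that does not fit the structure is rejected by the DBMS.
try:
    conn.execute("INSERT INTO contact (first_name) VALUES (?)", ("Alex",))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The incomplete record is refused rather than stored, which is how the structure regulates the data that goes into the records.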

Structured Data

Semi-structured:
Name | E-mail
Patrick Wood | ptw@[Link], [Link]@[Link]
First name: Mark, Last name: Taylor | MarkT@[Link]
Alex Bourdoo | AlexBourdoo@[Link].a[Link]

Structured:
First Name | Last Name | E-mail Id | Alternate E-mail Id
Patrick | Wood | ptw@[Link] | [Link]@ym[Link]
Mark | Taylor | MarkT@dcs.[Link] |
Alex | Bourdoo | AlexBourdoo@[Link].a[Link] |
Ease with Structured Data-Storage

Ease with structured data:
• Storage: Data types, both defined and user defined, help with the storage of structured data.
• Scalability: Scalability is not generally an issue with increase in data.
• Security
• Update and delete: Updating, deleting, etc. is easy due to structured form.
Ease with Structured Data-Retrieval

Ease with structured data:
• Retrieve information: A well-defined structure helps in easy retrieval of data.
• Indexing and searching: Data can be indexed based not only on a text string but other attributes as well. This enables streamlined search.
• Mining data: Structured data can be easily mined and knowledge can be extracted from it.
• BI operations: BI works extremely well with structured data. Hence data mining, warehousing, etc. can be easily undertaken.
Do it Exercise

Think and write about an instance where data was presented to you in
Unstructured, semi-structured and structured data format
Summary please…

Ask a few participants of the learning program to summarize the lecture.
