
FUNDAMENTALS OF BIG DATA

AND BUSINESS INTELLIGENCE

CHAPTER ONE - INTRODUCTION TO BIG DATA


CONTENTS

 What is Big Data?
 Business Intelligence
 Different Types of Data
 Characteristics of Big Data
 Handling and Processing Big Data
 Challenges of Big Data
WHAT IS BIG DATA?

 Big Data is a field dedicated to the analysis, processing, and
storage of large collections of data that frequently originate from
disparate sources.
 Big Data solutions and practices are typically required when
traditional data analysis, processing and storage technologies and
techniques are insufficient.
 The management and analysis of large datasets has been a long-
standing problem, from the labor-intensive approaches of early
census efforts to the actuarial science behind the calculation of
insurance premiums.
 Big Data science has evolved from these roots.

 The analysis of Big Data datasets is an interdisciplinary endeavor
that blends mathematics, statistics, computer science and subject
matter expertise.
 Data within Big Data environments generally accumulates within
the enterprise via applications, sensors and external sources.
 Data processed by a Big Data solution can be used by enterprise
applications directly or can be fed into a data warehouse to enrich
existing data there.

 The results obtained through the processing of Big Data can lead
to a wide range of insights and benefits, such as:
 operational optimization
 identification of new markets
 accurate predictions
 fault and fraud detection
 more detailed records
 improved decision-making
 scientific discoveries

 As a starting point, several fundamental concepts and terms need
to be defined and understood.
 Datasets : collections or groups of related data are generally
referred to as datasets.
 Data Analysis : data analysis is the process of examining data to
find facts, relationships, patterns, insights and/or trends.
 The overall goal of data analysis is to support better decision
making.
 A simple data analysis example is the analysis of ice cream sales
data in order to determine how the number of ice cream cones
sold is related to the daily temperature.
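The ice cream example can be made concrete in a few lines of Python. The figures below are invented for illustration, and the correlation is computed with the standard library only:

```python
import statistics

# Hypothetical daily observations: temperature (°C) and cones sold
temperatures = [18, 21, 25, 28, 30, 33, 35]
cones_sold = [120, 135, 160, 210, 240, 290, 310]

# Pearson correlation: values near +1 mean sales rise with temperature
n = len(temperatures)
mean_t = statistics.mean(temperatures)
mean_c = statistics.mean(cones_sold)
cov = sum((t - mean_t) * (c - mean_c)
          for t, c in zip(temperatures, cones_sold)) / n
r = cov / (statistics.pstdev(temperatures) * statistics.pstdev(cones_sold))
print(f"correlation between temperature and sales: {r:.2f}")
```

A value close to +1 would support the intuition that hotter days drive higher cone sales; a real analysis would of course use far more data.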

 Data Analytics : Data analytics is a broader term that
encompasses data analysis.
 Data analytics is a discipline that includes the management of the
complete data lifecycle, which encompasses collecting, cleansing,
organizing, storing, analyzing and governing data.
 The term includes the development of analysis methods, scientific
techniques and automated tools.

 There are four general categories of analytics that are
distinguished by the results they produce:
 descriptive analytics
 diagnostic analytics
 predictive analytics
 prescriptive analytics
 The different analytics types leverage different techniques and
analysis algorithms.

 This implies that there may be varying data, storage and
processing requirements to facilitate the delivery of multiple types
of analytic results.
 The generation of high value analytic results increases the
complexity and cost of the analytic environment.

 Descriptive Analytics
 Descriptive analytics are carried out to answer questions about
events that have already occurred.
 This form of analytics contextualizes data to generate information.
 Sample questions can include:
 What was the sales volume over the past 12 months?
 What is the monthly commission earned by each sales agent?
 It is estimated that 80% of generated analytics results are
descriptive in nature.
 The reports are generally static in nature and display historical
data that is presented in the form of data grids or charts.
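A descriptive report such as "sales volume over the past 12 months" is typically a simple aggregation over historical records. A minimal sketch in Python, with the transaction data and the 5% commission rate invented for illustration:

```python
from collections import defaultdict

# Hypothetical transaction log: (month, sales agent, amount)
transactions = [
    ("Jan", "Alice", 1200.0), ("Jan", "Bob", 800.0),
    ("Feb", "Alice", 950.0),  ("Feb", "Bob", 1100.0),
]

# Sales volume per month: a descriptive, backward-looking summary
monthly_volume = defaultdict(float)
for month, agent, amount in transactions:
    monthly_volume[month] += amount

# Commission earned by each agent, assuming a flat 5% rate
commission = defaultdict(float)
for month, agent, amount in transactions:
    commission[agent] += amount * 0.05

print(dict(monthly_volume))  # {'Jan': 2000.0, 'Feb': 2050.0}
print(dict(commission))      # {'Alice': 107.5, 'Bob': 95.0}
```

In practice these results would be rendered as the static grids or charts described above.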

 Diagnostic Analytics
 Diagnostic analytics aim to determine the cause of a phenomenon
that occurred in the past using questions that focus on the reason
behind the event.
 The goal of this type of analytics is to determine what information
is related to the phenomenon in order to enable answering
questions that seek to determine why something has occurred.
 Such questions include:
 Why were Q2 sales less than Q1 sales?
 Why was there an increase in patient re-admission rates over the past
three months?

 Predictive Analytics
 Predictive analytics are carried out in an attempt to determine the
outcome of an event that might occur in the future.
 With predictive analytics, information is enhanced with meaning to
generate knowledge that conveys how that information is related.
 Questions are usually formulated using a what-if rationale, such as
the following:
 What are the chances that a customer will default on a loan if
they have missed a monthly payment?
 What will be the patient survival rate if Drug B is administered
instead of Drug A?
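A predictive question such as loan-default likelihood is commonly answered with a classification model. The sketch below scores one "what-if" scenario with a toy logistic model; the feature names and coefficients are invented for illustration, not learned from any real dataset:

```python
import math

def default_probability(missed_payments: int, debt_to_income: float) -> float:
    """Toy logistic model: P(default) from two hypothetical features."""
    # Invented coefficients; a real model would estimate these from data
    z = -3.0 + 1.2 * missed_payments + 2.5 * debt_to_income
    return 1.0 / (1.0 + math.exp(-z))

# What are the chances of default if the customer missed one payment?
p = default_probability(missed_payments=1, debt_to_income=0.4)
print(f"estimated default probability: {p:.2f}")
```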

 Prescriptive Analytics
 Prescriptive analytics build upon the results of predictive analytics
by prescribing actions that should be taken.
 The focus is not only on which prescribed option is best to follow,
but why.
 In other words, prescriptive analytics provide results that can be
reasoned about because they embed elements of situational
understanding.
 Sample questions may include:
 Among three drugs, which one provides the best results?
 When is the best time to trade a particular stock?
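A prescriptive question like "which of three drugs provides the best results?" can be framed as choosing the option with the best predicted outcome, together with the reason for the choice. A minimal sketch; the outcome figures and the utility weights are assumptions made for illustration:

```python
# Hypothetical predicted outcomes per option
options = {
    "Drug A": {"recovery_rate": 0.72, "side_effect_risk": 0.10},
    "Drug B": {"recovery_rate": 0.81, "side_effect_risk": 0.15},
    "Drug C": {"recovery_rate": 0.78, "side_effect_risk": 0.05},
}

def score(stats):
    # Simple utility: reward recovery, penalize side effects (weight assumed)
    return stats["recovery_rate"] - 0.5 * stats["side_effect_risk"]

best = max(options, key=lambda name: score(options[name]))
# Prescriptive output: not just the choice, but the reasoning behind it
print(f"recommended: {best} (score {score(options[best]):.3f})")
```

Embedding the scoring rule is what makes the result "reasonable about": the recommendation can be traced back to the situational trade-off it encodes.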
BUSINESS INTELLIGENCE

 BI enables an organization to gain insight into the performance of
an enterprise by analyzing data generated by its business
processes and information systems.
 The results of the analysis can be used by management to steer
the business in an effort to correct detected issues or otherwise
enhance organizational performance.
 BI applies analytics to large amounts of data across the enterprise,
which has typically been consolidated into an enterprise data
warehouse to run analytical queries.
DIFFERENT TYPES OF DATA

 The data processed by Big Data solutions can be human-generated
or machine-generated, although it is ultimately the responsibility
of machines to generate the analytic results.
 Human-generated data is the result of human interaction with
systems, such as online services and digital devices.
 Machine-generated data is generated by software programs and
hardware devices in response to real-world events.
 An example of machine-generated data would be information
conveyed from the numerous sensors in a cellphone that may be
reporting information, including position and cell tower signal
strength.

 Structured data
 Structured data conforms to a data model or schema and is often
stored in tabular form.
 It is used to capture relationships between different entities and is
therefore most often stored in a relational database.
 Structured data is frequently generated by enterprise applications.
 Due to the abundance of tools and databases that natively support
structured data, it rarely requires special consideration in regards
to processing or storage.
 Examples of this type of data include banking transactions,
invoices, and customer records.

 Unstructured Data
 Data that does not conform to a data model or data schema is known
as unstructured data.
 It is estimated that unstructured data makes up 80% of the data
within any given enterprise.
 Unstructured data has a faster growth rate than structured data.
 This form of data is either textual or binary and often conveyed via
files that are self-contained and non-relational.
 A text file may contain the contents of various tweets or blog
postings.
 Binary files are often media files that contain image, audio or video
data.

 Semi-structured
 Semi-structured data has a defined level of structure and
consistency, but is not relational in nature.
 Instead, semi-structured data is hierarchical or graph-based.
 This kind of data is commonly stored in files that contain text.
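JSON and XML are common carriers of semi-structured data: the hierarchy is explicit and self-describing, but records need not share a rigid schema. In the example below, two customer records coexist even though they carry different fields:

```python
import json

# Semi-structured: hierarchical, self-describing, but no fixed
# relational schema - records may differ in which fields they carry
raw = """
[
  {"id": 1, "name": "Ada", "orders": [{"sku": "X1", "qty": 2}]},
  {"id": 2, "name": "Lin", "email": "lin@example.com"}
]
"""
customers = json.loads(raw)
for c in customers:
    # Fields that may be absent are accessed defensively
    print(c["name"], c.get("email", "<no email>"))
```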
CHARACTERISTICS OF BIG DATA

 For a dataset to be considered Big Data, it must possess one or
more characteristics that require accommodation in the solution
design and architecture of the analytic environment.

 Volume
 The anticipated volume of data that is processed by Big Data
solutions is substantial and ever-growing.
 High data volumes impose distinct data storage and processing
demands, as well as additional data preparation, curation and
management processes.
 Typical data sources that are responsible for generating high data
volumes can include:
 online transactions, such as point-of-sale and banking
 sensors, such as GPS sensors, smart meters and telematics
 social media, such as Facebook and Twitter

 Velocity
 In Big Data environments, data can arrive at fast speeds, and
enormous datasets can accumulate within very short periods of
time.
 From an enterprise’s point of view, the velocity of data translates
into the amount of time it takes for the data to be processed once
it enters the enterprise’s perimeter.
 Coping with the fast inflow of data requires the enterprise to
design highly elastic and available data processing solutions and
corresponding data storage capabilities.

 Variety
 Data variety refers to the multiple formats and types of data that
need to be supported by Big Data solutions.
 Data variety brings challenges for enterprises in terms of data
integration, transformation, processing, and storage.

 Veracity
 Veracity refers to the quality or fidelity of data.
 Data that enters Big Data environments needs to be assessed for
quality, which can lead to data processing activities to resolve
invalid data and remove noise.
 In relation to veracity, data can be part of the signal or noise of a
dataset.
 Data with a high signal-to-noise ratio has more veracity than data
with a lower ratio.
 Data that is acquired in a controlled manner, for example via online
customer registrations, usually contains less noise than data
acquired via uncontrolled sources, such as blog postings.
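In practice, assessing veracity often means validating incoming records and discarding or flagging noise before analysis. A minimal sketch of such a cleansing pass; the field names and validity rules are assumptions for illustration:

```python
# Hypothetical incoming records; some carry noise (missing/invalid fields)
records = [
    {"customer_id": 101, "age": 34, "email": "a@example.com"},
    {"customer_id": None, "age": -5, "email": ""},          # noise
    {"customer_id": 103, "age": 52, "email": "c@example.com"},
]

def is_valid(rec):
    """Simple veracity check: required fields present and plausible."""
    return (
        rec["customer_id"] is not None
        and 0 <= rec["age"] <= 120
        and "@" in rec["email"]
    )

signal = [r for r in records if is_valid(r)]
noise_ratio = 1 - len(signal) / len(records)
print(f"kept {len(signal)} of {len(records)} records (noise: {noise_ratio:.0%})")
```

Data from controlled sources would pass such checks at a higher rate than data from uncontrolled sources, which is exactly the signal-to-noise distinction made above.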

 Value
 Value is defined as the usefulness of data for an enterprise.
 The value characteristic is intuitively related to the veracity
characteristic in that the higher the data fidelity, the more value it
holds for the business.
 The longer it takes for data to be turned into meaningful
information, the less value it has for a business.
HANDLING AND PROCESSING BIG DATA

 Big data management is the systematic organization,
administration, and governance of massive amounts of data.
 The process includes management of both structured and
unstructured data.
 The primary objective is to ensure the data is of high quality and
accessible for business intelligence and big data analytics
applications.
 Such data may span several terabytes or even petabytes, saved in
a broad range of file formats.
 Effective data management enables the organization to find
valuable information with ease irrespective of how large or
unstructured the data is.

 Here are some ways to handle big data effectively:
 Outline your goal
 The first tick on the checklist when it comes to handling Big Data
is knowing which data to gather and which data actually needs to
be collected.
 To do this, one has to set clearly defined goals.
 Failing to do so will lead to gathering large amounts of data that
are not aligned with the business's ongoing requirements.

 Secure the data
 The next step in managing Big Data is to ensure the relevant data
collected is secured with a broad range of measures.
 To ensure the data is both accessible and secure, it must be
protected by firewalls, spam filtering, malware scanning and
removal, and, most importantly, team permission controls.
 It is wise not to take data management lightly, since securing
organizational data is the highest priority in Big Data
management.

 Keep the data protected
 A database is susceptible not only to threats from human actions
and synthetic anomalies, but also to damage from natural
elements such as heat, humidity, and extreme cold, all of which
can easily corrupt data.
 Organizations have to safeguard databases against adverse
environmental conditions that could corrupt data.
 It is essential to create and maintain an up-to-date backup of the
database elsewhere, in addition to implementing safety
features.

 Data has to be interlinked
 Since organizational databases are bound to be accessed through
a number of channels, it is not advisable to use disconnected
software tools for each required solution.
 In essence, all organizational data must be able to talk to each
other.
 A cloud storage solution is often the best answer to the data
interlinking problem.

 Know the data you need to capture
 Organizations are required to know which data has to be collected
and when.
 Adapt to new challenges
 One of the most important aspects of Big Data management is
keeping up with the latest trends in the field.
 Being flexible and open to new trends and technologies will go a
long way in giving you an edge over the competition.
CHALLENGES OF BIG DATA

 The volume of data is already enormous and increasing every
day.
 The velocity of its generation and growth is increasing.
 The variety of data being generated is also expanding, and
organizations' capability to capture and process this data is
limited.
 Current technology, architecture, management and analysis
approaches are unable to cope with the flood of data, and
organizations will need to change the way they think about, plan,
govern, manage, process and report on data to realize the
potential of big data.

 Data storage and analysis
 The size of data is increasing rapidly through various means such
as mobile devices, aerial sensing technologies, and remote
sensing.
 Some useful data may be deleted because there is no free space to
store such huge volumes.
 Therefore, the first challenge of big data analysis is storage
media and higher input/output speeds.
 In such cases, data accessibility must be the top priority for
knowledge discovery and representation.

 As datasets keep growing, the scale of data mining tasks has
increased significantly.
 This is another big challenge for big data.
 When dealing with large datasets, techniques such as data
reduction, data selection, and feature selection are used.
 Hadoop and MapReduce make it possible to collect large amounts
of semi-structured and unstructured data in a reasonable amount
of time.
 The key engineering challenge is how to effectively analyze this
data to obtain better knowledge.

 Computational complexities and knowledge discovery
 Knowledge discovery and representation is a prime issue in big
data.
 There are several tools for knowledge discovery and
representation, such as fuzzy sets, rough sets, and soft sets.
 Since the size of big data keeps increasing exponentially, the
available tools may not be efficient enough to process this data
to obtain meaningful information.

 Information security
 In big data analysis, massive amounts of data are correlated,
analyzed, and mined for meaningful patterns.
 All organizations have different policies to safeguard their
sensitive information.
 Preserving sensitive information is a major issue in big data
analysis.
 There is a huge security risk associated with big data.

• Data Volume: Managing and Storing Massive Amounts of Data
 Challenge: The most apparent challenge with Big Data is the
sheer volume of data being generated.
 This vast amount of data requires advanced storage infrastructure,
which can be costly and complex to maintain.
 Solution: Adopting scalable cloud storage solutions, such
as Amazon S3, Google Cloud Storage, or Microsoft
Azure, can help manage large volumes of data.

• Data Variety: Handling Diverse Data Types
• Challenge: Big Data encompasses a wide variety of data types,
including structured data (e.g., databases), semi-structured data
(e.g., XML, JSON), and unstructured data (e.g., text, images,
videos).
• The diversity of data types can make it difficult to integrate,
analyze, and extract meaningful insights.
• Solution: To address the challenge of data variety, organizations
can employ data integration platforms and tools like Apache Nifi,
Talend, or Informatica.

• Data Velocity: Processing Data in Real-Time
• Challenge: The speed at which data is generated and needs to be
processed is another significant challenge.
• For instance, IoT devices, social media platforms, and financial
markets produce data streams that require real-time or near-real-
time processing.
• Delays in processing can lead to missed opportunities and
inefficiencies.
• Solution: To handle high-velocity data, organizations can
implement real-time data processing frameworks such as Apache
Kafka, Apache Flink, or Apache Storm.
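Frameworks like Kafka, Flink, and Storm provide these capabilities at scale; the core idea of near-real-time processing can be illustrated with a sliding-window aggregate over a simulated stream. The snippet below is pure Python and stands in for a real stream consumer; the sensor readings are invented:

```python
from collections import deque

def rolling_average(stream, window_size=3):
    """Emit the average of the last `window_size` readings as each arrives."""
    window = deque(maxlen=window_size)  # old readings drop off automatically
    for reading in stream:
        window.append(reading)
        yield sum(window) / len(window)

# Simulated sensor stream; a real deployment would consume from a broker
sensor_stream = [10.0, 12.0, 11.0, 50.0, 13.0]
averages = list(rolling_average(sensor_stream))
print(averages)  # the spike at 50.0 shows up, but smoothed by the window
```

Because each reading is processed as it arrives rather than after the whole batch lands, results are available with bounded delay, which is the property high-velocity workloads need.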

• Data Veracity: Ensuring Data Quality and Accuracy
• Challenge: With Big Data, ensuring the quality, accuracy, and
reliability of data (referred to as data veracity) becomes
increasingly difficult. Inaccurate or low-quality data can lead to
misleading insights and poor decision-making.
• Data veracity issues can arise from various sources, including data
entry errors, inconsistencies, and incomplete data.
• Solution: Implementing robust data governance frameworks is
crucial for maintaining data veracity.
• Tools like Trifacta, Talend Data Quality, and Apache Griffin can help
automate and streamline data quality management processes.

• Data Security and Privacy: Protecting Sensitive Information
• Challenge: As organizations collect and store more data, they
face increasing risks related to data security and privacy.
 Solution: To mitigate security and privacy risks, organizations
must adopt comprehensive data protection strategies.
 This includes implementing encryption, access controls, and
regular security audits.
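One small, concrete piece of such a strategy is never storing credentials in plain text. The sketch below uses only Python's standard library to salt-and-hash a password with PBKDF2; it is a fragment of a protection strategy, not a complete one:

```python
import hashlib
import secrets

def hash_password(password: str):
    """Return (salt, derived key); store these, never the raw password."""
    salt = secrets.token_bytes(16)  # random salt defeats precomputed tables
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, key

def verify_password(password: str, salt: bytes, key: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    # Constant-time comparison avoids timing side channels
    return secrets.compare_digest(candidate, key)

salt, key = hash_password("s3cret!")
print(verify_password("s3cret!", salt, key))  # True
print(verify_password("wrong", salt, key))    # False
```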

• Data Integration: Combining Data from Multiple Sources
• Challenge: Integrating data from various sources, especially
when dealing with legacy systems, can be a daunting task.
• Data silos, where data is stored in separate systems without easy
access, further complicate the integration process, leading to
inefficiencies and incomplete analysis.
• Solution: Data integration platforms like Apache Camel, MuleSoft,
and IBM DataStage can help streamline the process of integrating
data from multiple sources.
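Platforms like these automate the mechanics, but the core operation is joining records from separate silos on a shared key. A minimal sketch in pure Python; the source names and fields are invented for illustration:

```python
# Records from two hypothetical silos, keyed by customer id
crm = {101: {"name": "Ada"}, 102: {"name": "Lin"}}
billing = {101: {"balance": 250.0}, 103: {"balance": 40.0}}

# Full outer join: keep every customer seen in either system
merged = {}
for cid in crm.keys() | billing.keys():
    merged[cid] = {**crm.get(cid, {}), **billing.get(cid, {})}

print(merged[101])  # fields from both systems combined
print(merged[103])  # present in billing only
```

Customers known to only one system still appear in the result, which is precisely what breaking down a data silo means: no record is lost because of where it happened to be stored.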
