Big Data

What Is Big Data?


•Big data refers to extremely large and diverse collections of structured,
unstructured, and semi-structured data that continue to grow exponentially over
time. These datasets are so large and complex that traditional data
management systems cannot store, process, or analyze them.
•Big data analytics refers to the systematic collection, processing, storage, and
analysis of large amounts of data and complex data sets to extract valuable
insights. It allows for the uncovering of trends, patterns and correlations in large
amounts of raw data to help analysts make data-informed decisions.
•This process allows organizations to leverage the exponentially
growing data generated from diverse sources, including internet-
of-things (IoT) sensors, social media, financial transactions and
smart devices to derive actionable intelligence through advanced
analytic techniques.
• Big data has only gotten bigger as recent technological breakthroughs have
significantly reduced the cost of storage and compute, making it easier and less
expensive to store more data than ever before.
• The most important characteristics of big data can be
summarized as the 5Vs, which are as follows:
1. Volume
2. Variety
3. Velocity
4. Veracity
5. Value
Volume: With big data, we have to process high volumes of low-density,
unstructured data. For some organizations, this might be tens of terabytes of data.
For others, it may be hundreds of petabytes.
• Big data technologies and cloud-based storage solutions enable
organizations to store and manage these vast data sets cost-
effectively, protecting valuable data from being discarded due to
storage limitations.
Variety: In terms of data variety, the types of data we can use have
also rapidly expanded. While in the past, relational data in the form of
data tables composed of numbers was mainly used for analysis, many
forms of data sets including text data, image data, audio data, video
data, and so on are now used for this purpose.
• This variety demands flexible data management systems to handle and
integrate disparate data types for comprehensive analysis to derive
meaning. NoSQL databases, data lakes and schema-on-read
technologies provide the necessary flexibility to accommodate the
diverse nature of big data.
Velocity: Velocity is the fast rate at which data is received and
(perhaps) acted on. In terms of data generation speed, various types
of data are being generated in real time at a speed that is
incomparable to the past.
• The velocity at which data flows into organizations requires robust
processing capabilities to capture, process and deliver accurate
analysis in near real-time. Stream processing frameworks and in-
memory data processing are designed to handle these rapid data
streams and balance supply with demand.
Veracity: How truthful is our data—and how much can we rely on it?
The idea of veracity in data is tied to other functional concepts, such
as data quality and data integrity. Ultimately, these concepts overlap and
guide the organization toward a data repository that delivers high-quality,
accurate, and reliable data to power insights and decisions.
• Techniques and tools for data cleaning, validation and verification
are integral to ensuring the integrity of big data, enabling
organizations to make better decisions based on reliable information.
• Value: Lastly, big data analytics is significant only when value can actually be
created from the data.
• In South Korea, many companies introduced big data analysis techniques into
business without any clear purpose. For this reason, many big data-based business
projects have failed.
• The value that big data analysis can bring is maximized only when it is preceded by
a clear and concrete recognition of problems that cannot be solved with existing data
and analysis tools.
• Big data analytics aims to extract actionable insights that offer
tangible value. This involves turning vast data sets into meaningful
information that can inform strategic decisions, uncover new
opportunities and drive innovation.
• Advanced analytics, machine learning and AI are key to unlocking
the value contained within big data, transforming raw data into
strategic assets.
Types of big data
Structured Data
•Structured data refers to highly organized information that is easily
searchable and typically stored in relational databases or spreadsheets.
It adheres to a rigid schema, meaning each data element is clearly
defined and accessible in a fixed field within a record or file.
Examples of structured data include:
Customer names and addresses in a customer relationship
management (CRM) system;
Transactional data in financial records, such as sales figures and
account balances;
Employee data in human resources databases, including job
titles and salaries.
• Structured data's main advantage is its simplicity for entry,
search and analysis, often using straightforward database
queries like SQL. However, the rapidly expanding universe of
big data means that structured data represents a relatively
small portion of the total data available to organizations.
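As a minimal sketch of this simplicity (using Python's built-in sqlite3 module; the customers table and its rows are hypothetical, invented for illustration):

    import sqlite3

    # Build a tiny in-memory relational table; schema and rows are hypothetical.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT, balance REAL)"
    )
    conn.executemany(
        "INSERT INTO customers (name, city, balance) VALUES (?, ?, ?)",
        [("Alice", "Seoul", 120.0), ("Bob", "Busan", 75.5), ("Carol", "Seoul", 310.2)],
    )

    # Because every value sits in a fixed, clearly defined column, filtering
    # and aggregation need only a straightforward SQL query.
    for row in conn.execute(
        "SELECT city, COUNT(*), AVG(balance) FROM customers GROUP BY city"
    ):
        print(row)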
Unstructured Data
• Unstructured data lacks a pre-defined data model, making it more difficult to
collect, process and analyze. It comprises the majority of data generated
today, and includes formats such as:
Textual content from documents, emails and social media posts;
Multimedia content, including images, audio files and videos;
Data from IoT devices, which can include a mix of sensor data, log files and
time-series data.
Structured data versus unstructured data
• The primary challenge with unstructured data is its complexity and
lack of uniformity, requiring more sophisticated methods for indexing,
searching and analyzing. NLP, machine learning and advanced
analytics platforms are often employed to extract meaningful insights
from unstructured data.
Semi-structured data
• Semi-structured data occupies the middle ground between structured
and unstructured data. While it does not reside in a relational
database, it contains tags or other markers to separate semantic
elements and enforce hierarchies of records and fields within the
data.
Examples include:
JSON (JavaScript Object Notation) and XML (eXtensible Markup
Language) files, which are commonly used for web data interchange;
Email, where the data has a standardized format (e.g., headers,
subject, body) but the content within each section is unstructured;
NoSQL databases, which can store and manage semi-structured data more efficiently
than traditional relational databases.
• Semi-structured data is more flexible than structured data but easier to analyze
than unstructured data, providing a balance that is particularly useful in web
applications and data integration tasks.
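A small sketch of handling semi-structured data in Python (the record and its field names are hypothetical): the tags mark semantic elements and hierarchy, but because no rigid schema is enforced, the code must tolerate missing fields.

    import json

    # A hypothetical semi-structured record: keys act as tags separating
    # semantic elements; "profile" is nested and may be absent entirely.
    raw = '''
    {
      "user": "user_123",
      "events": [
        {"type": "view", "item": "A12"},
        {"type": "purchase", "item": "A12", "price": 9.99}
      ],
      "profile": {"city": "Seoul"}
    }
    '''

    record = json.loads(raw)

    # Navigate the hierarchy defensively; .get() tolerates missing markers,
    # which is typical when working with semi-structured data.
    city = record.get("profile", {}).get("city", "unknown")
    purchases = [e for e in record["events"] if e["type"] == "purchase"]
    print(city, purchases)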
* A relational database is a type of database that stores and provides
access to data points that are related to one another.
Relational databases are based on the relational model, an intuitive,
straightforward way of representing data in tables. In a relational
database, each row in the table is a record with a unique ID called the
key. The columns of the table hold attributes of the data, and each record
usually has a value for each attribute, making it easy to establish the
relationships among data points.
How Big Data Works
•Big data works by providing insights that shine a light on new opportunities and
business models. Getting started involves several key actions:
Collect data: The first step involves gathering data, which can be a
mix of structured and unstructured forms from myriad sources like
cloud, mobile applications and IoT sensors. This step is where
organizations need to bring in the data, integrate it, and make sure it’s formatted
and available in a form that our business analysts can get started with.
Process data: After being collected, data must be systematically
organized, extracted, transformed and then loaded into a storage
system to ensure accurate analytical outcomes. Processing involves
converting raw data into a format that is usable for analysis, which
might involve aggregating data from different sources, converting data
types or organizing data into structured formats.
• Given the exponential growth of available data, this stage can be
challenging.
• Processing strategies may vary between batch processing, which
handles large data volumes over extended periods and stream
processing, which deals with smaller real-time data batches.
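The contrast can be sketched in a few lines of Python (a toy illustration with hypothetical sensor readings; production systems would use frameworks such as Apache Spark for batch work or Apache Flink and Kafka Streams for streaming):

    from statistics import mean

    readings = [21.3, 22.1, 19.8, 23.4, 20.7, 22.9]  # hypothetical sensor values

    # Batch processing: the whole dataset is available up front and is
    # processed in one pass.
    print("batch mean:", mean(readings))

    # Stream processing: records arrive one at a time and a running
    # aggregate is updated immediately, in near real time.
    count, total = 0, 0.0
    for value in readings:  # stands in for an unbounded stream
        count += 1
        total += value
        print(f"running mean after {count} readings: {total / count:.2f}")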
Big data also requires storage. Our storage solution can be in the cloud, on-premises,
or both. We can store our data in any form we want and bring our desired processing
requirements and necessary process engines to those data sets on an on-demand
basis.
• Many people choose their storage solution according to where their data is currently
residing.
• Data lakes are gradually gaining popularity because they support our current compute
requirements and enable us to spin up resources as needed.
Clean data: Regardless of size, data must be cleaned to ensure quality
and relevance.
• Cleaning data involves formatting it correctly, removing duplicates and
eliminating irrelevant entries. Clean data prevents the corruption of output
and safeguards reliability and accuracy.
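A minimal sketch of this step with pandas, assuming a hypothetical DataFrame of raw sales records (the column names and cleaning rules are invented for illustration):

    import pandas as pd

    # Hypothetical raw records with typical defects: a duplicate row,
    # inconsistent formatting, a missing value, and an irrelevant test entry.
    df = pd.DataFrame({
        "order_id": [1, 1, 2, 3, 4],
        "amount":   ["10.5", "10.5", "99.0", None, "-1"],
        "region":   ["north", "north", "SOUTH", "south", "test"],
    })

    df = df.drop_duplicates()                   # remove duplicate rows
    df["amount"] = pd.to_numeric(df["amount"])  # format the field correctly
    df["region"] = df["region"].str.lower()     # normalize category labels
    df = df[df["region"] != "test"]             # eliminate irrelevant entries
    df = df.dropna(subset=["amount"])           # drop incomplete records
    df = df[df["amount"] > 0]                   # drop invalid values

    print(df)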
Analyze data: Our investment in big data pays off when we analyze and act on our
data. A visual analysis of our varied data sets gives us new clarity.
• We will explore the data further to make new discoveries.
• We can share our findings with others, or put our data to work for our organization. We can
build data models with machine learning and artificial intelligence.
• Advanced analytics, such as data mining, predictive analytics, machine
learning and deep learning, are employed to sift through the processed
and cleaned data.
• These methods allow users to discover patterns, relationships and trends
within the data, providing a solid foundation for informed decision-
making.
Four main data analysis methods
These are the four methods of data analysis at work within big data:
Descriptive analytics: The "what happened" stage of data analysis.
Here, the focus is on summarizing and describing past data to
understand its basic characteristics.
Diagnostic analytics: The “why it happened” stage. By delving deeper
into the data, diagnostic analysis identifies the root causes of the
patterns and trends observed in descriptive analytics.
Predictive analytics: The “what will happen” stage. It uses historical
data, statistical modeling and machine learning to forecast trends.
Prescriptive analytics: Describes the “what to do” stage, which goes
beyond prediction to provide recommendations for optimizing future
actions based on insights derived from all previous stages.
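The four stages can be caricatured in a few lines of Python (the monthly sales figures are hypothetical, and the prescriptive rule at the end is deliberately simplistic):

    import numpy as np

    sales = np.array([100, 110, 125, 130, 150, 165], dtype=float)  # hypothetical

    # Descriptive: "what happened" -- summarize past data.
    print("mean:", sales.mean(), "max:", sales.max())

    # Diagnostic: "why it happened" -- e.g., inspect month-over-month changes
    # to locate where growth accelerated.
    print("month-over-month change:", np.diff(sales))

    # Predictive: "what will happen" -- fit a simple trend line to historical
    # data and forecast the next month.
    months = np.arange(len(sales))
    slope, intercept = np.polyfit(months, sales, 1)
    forecast = slope * len(sales) + intercept
    print("forecast for next month:", round(forecast, 1))

    # Prescriptive: "what to do" -- turn the prediction into a recommendation.
    print("recommendation:", "increase stock" if forecast > sales[-1] else "hold stock")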
Development phase of data analytics
• The purpose of Descriptive analytics, as well as Diagnostic
analytics, is to understand and explain the analysis target itself.
o To meet this purpose, we aim to effectively extract significant information
about the analysis target from a sea of data, or big data.
• For example, this purpose is met in corporate marketing when a company
establishes a marketing strategy by classifying similar types of
consumers based on the structured and unstructured data of numerous
customers.
• In other words, individual customers can be classified into clusters, and
promotional activities specialized for each group can be developed (see the
clustering sketch below).
• Also, data analysis conducted in this context discovers common factors that
capture the characteristics of the analysis target in big data consisting of
hundreds of thousands of variables.
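A minimal clustering sketch with scikit-learn, assuming hypothetical customer features (annual spend and monthly visits) and an arbitrary choice of three segments; a real analysis would select the features and tune the number of clusters:

    import numpy as np
    from sklearn.cluster import KMeans

    # Hypothetical customer features: [annual spend, visits per month].
    customers = np.array([
        [200, 1], [220, 2], [1500, 8], [1600, 9], [800, 4], [750, 5],
    ])

    # Group similar customers into clusters (k = 3 is an assumption).
    model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(customers)

    # Each cluster can then receive promotional activities tailored to it.
    for customer, label in zip(customers, model.labels_):
        print(customer, "-> segment", label)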
o The purpose of Predictive analytics, as well as Prescriptive
analytics, is to predict what will happen in the future based on the available data.
The subject of prediction may differ from field to field.
• For example, in the medical field, it can be used to predict in advance whether a
specific tumor is malignant or benign based on the patient’s MRI image
data.
• In the economic field, the future changes of major economic indicators (e.g., GDP
growth rate, stock price index, exchange rate) can be subjects for prediction.
• In the banking sector, big data can be used to predict whether a borrower will
default in the future (see the sketch after these examples).
• Factors that can predict mechanical failures may be deeply buried in structured data—
think the year, make, and model of equipment—as well as in unstructured data that
covers millions of log entries, sensor data, error messages, and engine temperature
readings. By analyzing these indications of potential issues before problems happen,
organizations can deploy maintenance more cost effectively and maximize parts and
equipment uptime.
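As one concrete sketch of such prediction (entirely hypothetical features and labels, using a plain scikit-learn logistic regression rather than any specific method named above), a classifier can be fit on past borrower records and then used to score a new applicant:

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical historical records: [income (10k units), debt ratio], and
    # whether the borrower eventually defaulted (1) or not (0).
    X = np.array([[5.0, 0.80], [6.2, 0.70], [9.5, 0.20], [12.0, 0.10],
                  [4.1, 0.90], [11.3, 0.30], [7.8, 0.50], [3.9, 0.85]])
    y = np.array([1, 1, 0, 0, 1, 0, 0, 1])

    # Fit a simple classifier on past data ...
    model = LogisticRegression().fit(X, y)

    # ... then estimate the default probability for a new applicant.
    applicant = np.array([[6.0, 0.60]])
    print("default probability:", model.predict_proba(applicant)[0, 1])

The same pattern applies to the other examples above: tumor images, economic indicators, or sensor logs replace the borrower features, and the label is the outcome to be predicted.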
Big Data Benefits
Following are the benefits organizations can realize once they see success with big
data analytics:
Better insights: When organizations have more data, they’re able to derive
better insights. A larger pool of data uncovers previously hidden connections and
expands potentially missed perspectives.
•All of this gives organizations a more comprehensive understanding of
the how and why of things, particularly when automation allows for faster, easier
processing of big data.
Real-time intelligence: One of the standout advantages of big
data analytics is the capacity to provide real-time intelligence.
Real-time insight allows businesses to make quick decisions,
respond to market changes instantaneously and identify and act
on opportunities as they arise.
Decision-making: With better insights, organizations can make data-driven decisions
with more reliable projections and predictions. A deeper understanding equips
leaders and decision-makers with the information needed to strategize effectively,
enhancing business decision-making in supply chain management, e-commerce,
operations and overall strategic direction.
Cost savings: Big data analytics drives cost savings by identifying
business process efficiencies and optimizations.
• Organizations can pinpoint wasteful expenditures by analyzing large
datasets, streamlining operations and enhancing productivity.
• Moreover, predictive analytics can forecast future trends, allowing
companies to allocate resources more efficiently and avoid costly
missteps.
Better customer engagement: Understanding customer needs,
behaviors and sentiments is crucial for successful engagement and
big data analytics provides the tools to achieve this understanding.
• Big data allows organizations to build customer profiles through a combination of
customer preferences, sales data, industry demographic data, and related data such as
social media activity and marketing campaign engagement.
• Before automation and analytics, this type of personalization was impossible due to its
sheer scope; with big data, this level of granularity improves engagement and enhances
the customer experience.
Improved operational efficiency: Every department of an organization can benefit
from data on an operational level for tasks such as detecting process anomalies,
identifying patterns for maintenance and resource use, and highlighting hidden drivers
of human error. Whether technical problems or staff performance issues, big data
produces insights about how an organization operates—and how it can improve.
Optimized risk management strategies: Big data analytics enhances
an organization's ability to manage risk by providing the tools to
identify, assess and address threats in real time. Predictive analytics
can foresee potential dangers before they materialize, allowing
companies to devise preemptive strategies.
• In the case of big data analysis for prediction, the accuracy of prediction
becomes the most important concern, and methods to increase the predictive
power are closely related to artificial intelligence/machine learning. In other
words, new data is accumulated in real time, and learning continues to occur to
improve the accuracy of prediction. This is a major difference
between traditional data analysis and big data analysis.
• In big data analysis, explaining past data well is not in itself the goal; the
most important issue is how accurately the future can be predicted based on
currently available data.
• For example, the core goal of a coronavirus-related big data analysis can be
summarized as predicting in advance whether a patient’s case will progress to a
severe or fatal outcome, given the patient’s information, which makes it easier for
us to prepare an appropriate treatment or intervention in advance.
Big Data Analytics-Related Technology Solutions
• Compared to past data, big data is overwhelmingly large in terms of size and
diversity, so there is a limit to handling it with traditional analysis tools.
• Therefore, in the process of big data analysis, there is no choice but to form
an ecosystem where technologies in various fields interact with each other.
• The general flow of big data analysis can be expressed as shown in the
following figure, and detailed technologies play a role in each step.
Flow of big data analytics
1. Data collection: This refers to the act of actively collecting data on the target of
analysis. IoT devices and web-crawling algorithms are used to collect online data.
2. Data storage: It is no exaggeration to say that the development of Hadoop has
made big data analysis easier. Hadoop is a Java-based open-source framework that can
process large amounts of data at a low cost and has established itself as a standard
platform for big data processing.
• Hadoop is a free solution that stores and processes large amounts of data by
integrating multiple computers into one cluster, distributing both storage and
computation across thousands of nodes.
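The classic illustration of this model is a word count run with Hadoop Streaming, where the mapper and reducer are small scripts that read standard input and write standard output; a minimal Python sketch follows (the file names are conventional, not prescribed):

    #!/usr/bin/env python3
    # mapper.py: emit "word<TAB>1" for every word read from stdin.
    import sys

    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

    #!/usr/bin/env python3
    # reducer.py: Hadoop sorts mapper output by key, so identical words arrive
    # consecutively; sum the counts for each word.
    import sys

    current, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = word, 0
        count += int(value)
    if current is not None:
        print(f"{current}\t{count}")

The same pair can be tested locally without a cluster: cat input.txt | python3 mapper.py | sort | python3 reducer.py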
3. Analysis and visualization: Splunk, a big data analysis solution that competes with
Hadoop, provides access to machine data (from servers, networks, IoT equipment, and
other sources) through a web-based interface.
• It is a representative solution that provides a real-time distributed computing
platform that can collect, store, analyze, visualize applications, and so on.
• It can collect and process even unstructured data, including text and voice,
regardless of the format or volume of the data, and it enables end-to-end
big data processing without complicated coding or the help of external solutions.
Big Data Application Business Cases
• The following are some business applications of big data:
Amazon Inc.: Amazon Inc., a world-class e-commerce company, provides a cloud-
based solution (AWS) that enables big data analysis for other companies, while also
providing effective inventory management and data analysis of its customers.
• Amazon can be said to be an iconic example of a company that increased sales using
big data analytics. The data Amazon collects is not limited to the data left behind by
customers online.
• For example, at Amazon Go, an unmanned store operated by Amazon, customers
visit the store in person to purchase products, and the sensors and cameras installed in
the store allow Amazon to observe how customers behave in a physical store.
• Data about customer actions (e.g., which products are carefully examined) are
also collected and used to analyze customers’ purchasing behavior. Such
information is combined with various other information about customers (race, age,
gender, etc.) to establish a meaningful marketing strategy.
• Purchase patterns of online customers are also subject to collection and analysis.
• In addition, for e-commerce companies, efficient inventory
management has a great influence on profitability. Based on the analysis of
data on users’ purchasing behavior, Amazon clears inventory through discount
promotions and distributes inventory products geographically and efficiently by
predicting purchasing patterns.
Starbucks: Starbucks, which started out as a small coffee shop in Seattle, has
grown into a global coffee chain company over the past 30 years. There may be
many factors for Starbucks’s success, but big data is evaluated as contributing to
Starbucks’s growth.
• In the case of coffee stores, their geographic location has a great influence on the sales of
the store, and Starbucks is known to have made a thorough data-based decision on the
business district in the area when entering a new area.
• In other words, it is known that decisions about the optimal location are made based on
the analysis of various data on the relevant commercial area, such as the floating
population, traffic volume, and population of the corresponding area.
• When Starbucks enters a new commercial district, the store location is selected by
estimating the impact on other Starbucks stores in the vicinity. In addition to selecting the
location of the store, new menu development and menu recommendations are made
based on data on customers collected from Starbucks-only applications.
Netflix: Netflix is a global company that supplies various media content online. In
the early days, it sold and rented DVDs by mail, but it later entered the online
streaming market and transformed into a data-based media content
streaming company. Netflix’s own movies and dramas, as well as various other
movies and drama shows, are provided via streaming.
• Operating online made it possible for Netflix to collect data on its users, which
ultimately enabled the “recommendation function” based on customer behavior data
that is the core technology of Internet TV.
• Because so many people consume media content through Netflix, it can acquire
fine-grained data on what genre of content people consume at what time. Based on
this data, recommending optimized media content to customers and keeping users
subscribed is how Netflix continuously generates revenue.
• The recommendation algorithm that Netflix uses is built on big data such as the
viewing data accumulated from Netflix users around the world, their evaluations of
media content, and their subscription patterns.
Bank of America: Bank of America is a representative company that uses big data
across all aspects of its business operations. Bank of America’s marketing
strategy shifted its focus to “event-based marketing” based on user data analysis.
• No matter what channel (branch visit, online banking, etc.) the customer uses to
contact the bank, Bank of America focuses on the financial products (home mortgage
loans, credit cards, etc.) that match the customer’s profile and that the customer is
likely to purchase. Therefore, when a customer visits a branch and consults with a
bank employee, information about the financial product most attractive to that
customer is automatically transmitted, and the bank employee can conduct
marketing activities based on this information.
• The main feature of this customer-tailored product targeting is that it
remains consistent no matter which channel the customer uses.
• In terms of risk management, the use of big data has significantly reduced the
tangible and intangible costs of banks.
IBM Watson Health: The medical industry is regarded as a representative field in
which the potential of big data can be maximized. The use of big data is expected
to promote hospital operation efficiency, accurate diagnosis, and reduction of
medical costs. IBM’s Watson Health is a company that provides a representative
data analysis platform service in the medical field.
• The Watson Health system provides various options and guidelines for the cause of
the disease and the appropriate treatment method for a specific patient based on
the vast amount of data related to a specific disease (information on other patients
in the past, medical papers, medical textbooks, pharmaceutical information).
• The Watson Health system is an artificial intelligence-based system that was
trained with input from doctors at actual medical institutions.
• If a patient is diagnosed with cancer, the treating doctor simply registers the
patient’s medical information in the system and, based on the vast amount
of data on the disease accumulated by Watson Health, the doctor is provided with
an appropriate treatment method for the patient.
• Watson Health can be said to be a big data analysis service with very high
accessibility because it can be operated on doctors’ smartphones or tablets.
• Financial companies regularly calculate the default probability of borrowers. Instead of
estimating the risk of a loan portfolio with an externally provided default-probability
model, as in the past, a dedicated computing platform based on parallel computing was
introduced.
• This reduced the time to calculate the risk of a portfolio of more than 100 million
loan accounts from 96 hours to 1 hour. This shift allows banks to achieve higher
efficiencies in terms of risk management and enables faster decision-making.
RegTech: RegTech is a compound word combining Regulation and Technology. It
utilizes information and communication technology (ICT) to streamline regulatory
compliance, compliance monitoring, and the management of operational risk
requirements imposed on companies (particularly financial companies) by
regulatory authorities.
• It can be defined as a set of data-driven services that enable firms to achieve
regulatory compliance more easily.
• Quantex, for example, uses artificial intelligence to support financial companies’
anti-money-laundering operations, drawing on the wide range of customer data
(including data on the networks between customers) that those companies hold.
• It provides a service that reflects customers’ various market transaction records
in real-time analysis.
The Future of Big Data
• It is emphasized that data collected from various sources will play a key role in
creating value for the future economic system. In other words, it is not an
exaggeration to see that the fourth industrial revolution is a kind of data revolution.
• In many business areas, big data analysis is expected to play a leading role in the
development of new business models and services, as well as a supporting role in
strategy formation and decision-making. Many think it will play
an important role in bridging the gap between unmet consumer demand
and the supply capacity of suppliers, a problem that existing business models
cannot solve.
• Until now, big data analysis has been widely used in the marketing domain through
consumer behavior analysis based on the digital footprints of users formed on
interconnected networks. However, a large amount of data does not
always guarantee good results, and conversely, a small amount of data does not
necessarily lead to bad results.
• The added value that big data analysis can bring may vary depending on the nature
of the problem to be solved in individual business sites and the goals pursued.
• Therefore, it is necessary to adopt big data analysis according to the nature of the
business and the characteristics of the application field.
• Besides, some worry about the displacement of humans from the labor market due to
the big data revolution.
• In other words, human labor could be rapidly replaced in some occupations by the
automation of processes or office work through artificial intelligence. For
example, many companies have introduced “chatbots,” a trend in which
computer algorithms play the customer service role.
• Despite this trend, it will be difficult to replace all areas of the business scene with
robots or artificial intelligence-based algorithms.
• This is because humans will ultimately be responsible for designing the
algorithms’ infrastructure, and for interpreting the results of data analysis
and gaining insights from it.
• Therefore, to successfully prepare for the era of the 4th Industrial Revolution,
it is necessary to have the ability to collect, process, and analyze various types
of data, as well as have insight into the business environment and changes in
consumer behavior.
