
Big Data Analytics_Drivers

Uploaded by

baibhav2811
Copyright
© All Rights Reserved

Big Data Analytics

Big Data Applications: Transforming Industries with Data-Driven Insights
Big Data refers to the massive volumes of structured, semi-structured, and unstructured data that organizations collect daily. When processed and analyzed, this data unlocks powerful insights across industries. Below are key applications of Big Data in real-world scenarios.
1. Healthcare
Applications:
•Predictive Analytics: Identifying disease outbreaks (e.g.,
tracking infection spread).
•Personalized Medicine: Genomic data analysis for tailored
treatments.
•Electronic Health Record (EHR) Optimization: Reducing medical
errors by analyzing patient records.
•Drug Discovery: AI-driven analysis of clinical trial data.
Example:
IBM Watson Health uses Big Data to analyze medical research.
2. Finance & Banking
Applications:
•Fraud Detection: Machine learning models detect unusual
transactions in real time.
•Risk Management: Predictive analytics for credit scoring.
•Algorithmic Trading: Analyzing market trends for high-
frequency trading.
•Customer Insights: Personalized banking
recommendations.
Example:
PayPal uses Big Data to prevent fraudulent transactions,
saving millions annually.
3. Retail & E-Commerce
Applications:
•Recommendation Engines: Amazon & Netflix suggest
products/movies based on user behavior.
•Inventory Optimization: Predictive analytics for stock
management.
•Dynamic Pricing: Real-time price adjustments (e.g., Uber
surge pricing).
•Customer Sentiment Analysis: Social media monitoring for
brand perception.
Example:
Walmart analyzes 2.5 petabytes of customer data hourly to
optimize supply chains.
4. Manufacturing & IoT
Applications:
•Predictive Maintenance: Sensors detect machine failures
before they happen.
•Supply Chain Optimization: Real-time tracking of goods
via RFID/sensors.
•Smart Factories: AI-driven automation in Industry 4.0.
Example:
Tesla uses Big Data from vehicle sensors to improve self-
driving algorithms.
5. Telecommunications
Applications:
•Network Optimization: Analyzing call drops and
bandwidth usage.
•Customer Churn Prediction: Identifying users likely
to switch providers.
•5G Deployment: Managing massive data traffic
efficiently.
Example:
AT&T processes 500+ TB of data daily to enhance
network performance.
6. Transportation & Logistics
Applications:
•Route Optimization: GPS + traffic data for efficient
deliveries.
•Autonomous Vehicles: Real-time sensor data
processing.
•Fleet Management: Predictive analytics for fuel
efficiency.
Example:
Uber uses Big Data to calculate ETAs and optimize
driver routes.
7. Government & Smart Cities
Applications:
•Crime Prediction: AI models analyze crime patterns
(e.g., PredPol).
•Traffic Management: Real-time analysis of
congestion data.
•Disaster Response: Social media & satellite data for
emergency management.
Example:
Singapore’s "Smart Nation" initiative uses Big Data for
urban planning.
8. Energy & Utilities
Applications:
•Smart Grids: Real-time electricity demand
forecasting.
•Oil & Gas Exploration: Analyzing seismic data for
drilling.
•Renewable Energy: Optimizing wind/solar farm
efficiency.
Example:
GE’s Predix platform analyzes sensor data from
turbines to prevent failures.
9. Media & Entertainment
Applications:
•Content Personalization: Spotify’s music
recommendations.
•Audience Analytics: Predicting box office success.
•Piracy Detection: Identifying illegal streaming.
Example:
Disney+ uses Big Data to recommend shows based
on viewing history.
10. Agriculture (AgTech)
Applications:
•Precision Farming: Drones & sensors monitor crop
health.
•Livestock Monitoring: Wearables track animal
health.
•Weather Prediction: AI models forecast
droughts/floods.
Example:
John Deere’s FarmSight uses Big Data to optimize
harvests.
Categorization of Data
In statistics and data analysis, data can be classified into four main
types. These classifications help determine the appropriate statistical
methods, visualizations, and machine learning techniques to apply.

• Nominal
• Ordinal
• Interval
• Ratio
1. Nominal Data (Categorical, No Order)
•Definition: Categories with no inherent order or
ranking.
•Key Properties:
• Used for labeling variables.
• Arithmetic has no meaning (only the mode is
meaningful).
•Examples:
• Gender (Male, Female, Non-binary)
• Colors (Red, Blue, Green)
• Country names (USA, Japan, Germany)
Analysis Methods:
•Frequency counts (bar charts, pie charts).
•Chi-square tests (checking relationships between
categories).
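A minimal sketch of the nominal-data methods listed above, in plain Python: frequency counts, the mode, and the Pearson chi-square statistic for a contingency table. The function names and the 2×2 table of counts are hypothetical illustrations. In practice SciPy's `scipy.stats.chi2_contingency` also returns a p-value (note that it applies Yates' continuity correction to 2×2 tables by default, so its statistic can differ from this uncorrected one).

```python
from collections import Counter


def frequency_table(values):
    """Count how often each category occurs (the input to a bar or pie chart)."""
    return Counter(values)


def mode(values):
    """Most frequent category: the only valid 'average' for nominal data."""
    return Counter(values).most_common(1)[0][0]


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table.

    table[i][j] is the observed count for row category i and column category j.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column categories were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


colors = ["Red", "Blue", "Red", "Green", "Red"]
print(frequency_table(colors), mode(colors))

# Hypothetical counts: rows = two regions, columns = two preferred colors.
stat, dof = chi_square_independence([[10, 20], [20, 10]])
print(round(stat, 2), dof)
```

For this hypothetical table the statistic is about 6.67 with 1 degree of freedom, which exceeds the 5% critical value of 3.84, so independence between the two nominal variables would be rejected.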
2. Ordinal Data (Ordered Categories, Unequal Intervals)
•Definition: Categories with a meaningful order but unknown
differences between them.
•Key Properties:
• Can be ranked, but arithmetic operations (mean, subtraction)
are invalid.
• Median & mode are meaningful, but mean is misleading.
•Examples:
• Education level (High School < Bachelor’s < Master’s < PhD)
• Customer ratings (Poor < Fair < Good < Excellent)
• Economic class (Low < Middle < High income)

Analysis Methods:
•Non-parametric tests (Mann-Whitney U, Kruskal-Wallis).
•Spearman’s rank correlation (measures ordinal relationships).
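The sketch below, assuming plain Python and made-up survey data, shows the two ordinal ideas above: the median is taken over category ranks (not numeric values), and Spearman's rho is computed as the Pearson correlation of ranks, with tied values sharing their average rank. `median_low` is used so the result is always an actual category rather than a point between two categories. All names here are hypothetical.

```python
from statistics import median_low

# Ordered categories from the customer-ratings example (Poor < Fair < Good < Excellent).
RATING_ORDER = ["Poor", "Fair", "Good", "Excellent"]


def median_category(values, order):
    """Median of ordinal labels: map each label to its rank, take the middle rank."""
    ranks = [order.index(v) for v in values]
    return order[median_low(ranks)]


def average_ranks(xs):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # 0-based positions i..j -> mean 1-based rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)  # undefined if either variable is constant


print(median_category(["Poor", "Good", "Good", "Excellent", "Fair"], RATING_ORDER))  # Good

# Hypothetical survey: education level (0 = High School ... 3 = PhD) vs. rating rank.
education = [0, 1, 1, 2, 3, 3]
rating = [RATING_ORDER.index(r) for r in ["Poor", "Fair", "Good", "Good", "Excellent", "Good"]]
print(round(spearman_rho(education, rating), 3))
```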
3. Interval Data (Ordered + Fixed Intervals, No True
Zero)
•Definition: Numeric data with consistent intervals but
no absolute zero.
•Key Properties:
• Differences between values are meaningful, but ratios
are not.
• Negative values are possible.
•Examples:
• Temperature (°C or °F) → 0°C doesn’t mean "no
temperature."
• Calendar years (2020, 2021, 2022) → the calendar's zero point is an arbitrary convention.

Analysis Methods:
•Mean, standard deviation.
•T-tests, ANOVA (parametric tests).
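To see why ratios are not meaningful on an interval scale, this sketch converts the same two Celsius readings to Fahrenheit and Kelvin. The ratio changes with the choice of zero point, while differences and the mean transform consistently, which is why mean, standard deviation, t-tests, and ANOVA remain valid. The readings are hypothetical.

```python
from statistics import mean


def c_to_f(c):
    """Celsius -> Fahrenheit: an affine map (new zero point and new unit)."""
    return c * 9 / 5 + 32


def c_to_k(c):
    """Celsius -> Kelvin: shifts to an absolute zero, giving a ratio scale."""
    return c + 273.15


a, b = 20.0, 10.0                  # two readings in °C
ratio_c = a / b                    # 2.0, which wrongly suggests "twice as hot"
ratio_f = c_to_f(a) / c_to_f(b)    # 68 / 50 = 1.36, so the "ratio" depends on the scale
ratio_k = c_to_k(a) / c_to_k(b)    # about 1.035, the physically meaningful ratio

# Differences are meaningful: they change only by the unit factor 9/5.
diff_c = a - b                     # 10.0
diff_f = c_to_f(a) - c_to_f(b)     # 18.0

# The mean converts consistently as well (both sides equal 59.0 °F here).
readings_c = [10.0, 15.0, 20.0]
mean_f_direct = mean(c_to_f(t) for t in readings_c)
mean_f_converted = c_to_f(mean(readings_c))
print(ratio_c, ratio_f, round(ratio_k, 3), diff_c, diff_f, mean_f_direct, mean_f_converted)
```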
4. Ratio Data (Ordered + Fixed Intervals + True Zero)
•Definition: Numeric data with a true zero point, allowing
ratio comparisons.
•Key Properties:
• All arithmetic operations (+, −, ×, ÷) are valid.
• Cannot have negative values.
•Examples:
• Height, Weight → 0 kg means "no weight."
• Sales revenue → $0 means "no sales."
• Age → 0 years means birth.

Analysis Methods:
•All statistical methods apply (mean, median, regression).
•Geometric mean, coefficient of variation.
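Because ratio data has a true zero, multiplicative summaries such as the geometric mean and the coefficient of variation (CV) become meaningful. A short sketch using Python's `statistics` module; the growth factors and heights are hypothetical.

```python
from statistics import geometric_mean, mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation relative to the mean (unitless).

    Only meaningful on a ratio scale: with an arbitrary zero (e.g. °C),
    shifting the zero changes the mean and therefore the CV.
    """
    return stdev(values) / mean(values)


# Hypothetical year-over-year revenue growth factors (1.10 = +10%).
growth = [1.10, 1.20, 0.90]
avg_growth = geometric_mean(growth)  # constant factor giving the same total growth

heights_cm = [150.0, 160.0, 170.0]
cv = coefficient_of_variation(heights_cm)  # 10 / 160 = 0.0625
print(f"average growth factor: {avg_growth:.4f}, CV of heights: {cv:.4f}")
```

The geometric mean is the right "average" for growth factors because applying it three times reproduces the overall growth 1.10 × 1.20 × 0.90.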
BIG DATA BUSINESS DRIVERS

Current Business Problems Provide Opportunities for Organizations to Become More Analytical & Data Driven
1. Desire to Optimize Business Operations
•What it means: Companies use Big Data to streamline
processes, reduce costs, and maximize efficiency.
•Examples:
• Sales:
Analyzing customer purchase patterns to boost revenue.
• Pricing:
Dynamic pricing models (e.g., Uber surge pricing, airline
ticket adjustments).
• Profitability:
Identifying high-margin products or services.
• Efficiency:
2. Desire to Identify Business Risk
•What it means: Big Data helps detect and mitigate risks
proactively.
•Examples:
• Customer Churn:
Predicting which customers might leave (e.g., telecom
companies offering retention discounts).
• Fraud:
Real-time detection of suspicious transactions (e.g., credit
card fraud alerts).
• Default:
Assessing loan repayment risks (e.g., banks using credit
scoring models).
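Real-time fraud detection usually relies on trained machine-learning models; as a much simpler stand-in, this sketch flags transactions whose amount lies far from a customer's historical mean (a z-score rule). The 3-standard-deviation threshold, the function name, and the sample amounts are assumptions for illustration only.

```python
from statistics import mean, stdev


def flag_unusual_transactions(history, new_amounts, threshold=3.0):
    """Return (amount, z-score) for new amounts more than `threshold` standard
    deviations away from the customer's historical mean amount."""
    mu = mean(history)
    sigma = stdev(history)  # needs at least two historical transactions
    if sigma == 0:
        # No variation in history: any different amount is treated as unusual.
        return [(amt, float("inf")) for amt in new_amounts if amt != mu]
    flagged = []
    for amt in new_amounts:
        z = (amt - mu) / sigma
        if abs(z) > threshold:
            flagged.append((amt, round(z, 2)))
    return flagged


# Hypothetical card history for one customer, then two incoming transactions.
history = [42.0, 38.5, 51.0, 45.0, 40.0, 47.5]
print(flag_unusual_transactions(history, [44.0, 950.0]))  # only 950.0 is flagged
```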
3. Predict New Business Opportunities
•What it means: Leveraging data to uncover growth
avenues.
•Examples:
• Upsell/Cross-sell: Recommending
complementary products (e.g., Amazon’s
“Frequently bought together”).
• Best New Customer Prospects: Targeted
marketing using demographic/behavioral data
(e.g., LinkedIn ad targeting).
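A "frequently bought together" list can be approximated by counting how often two items appear in the same basket. This is a minimal co-occurrence sketch, not Amazon's actual method; the basket data and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_purchase_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    pairs = Counter()
    for basket in baskets:
        # Deduplicate and sort so ("a", "b") and ("b", "a") count as one pair.
        for a, b in combinations(sorted(set(basket)), 2):
            pairs[(a, b)] += 1
    return pairs


def frequently_bought_with(item, baskets, top_n=3):
    """Items most often purchased in the same basket as `item`."""
    scores = Counter()
    for (a, b), count in co_purchase_counts(baskets).items():
        if a == item:
            scores[b] += count
        elif b == item:
            scores[a] += count
    return [other for other, _ in scores.most_common(top_n)]


baskets = [
    ["camera", "sd_card", "tripod"],
    ["camera", "sd_card"],
    ["camera", "bag"],
    ["sd_card", "card_reader"],
]
print(frequently_bought_with("camera", baskets))  # "sd_card" ranks first
```

Raw counts favor popular items; production systems often normalize them (for example with association-rule measures such as confidence or lift), but the counting step is the same.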
4. Comply with Laws or Regulatory Requirements
•What it means: Using Big Data to meet legal and industry
standards.
•Examples:
• Anti-Money Laundering (AML): Banks tracking
unusual transaction patterns.
• Fraud Prevention: Healthcare providers detecting
false insurance claims.
• Fair Lending: Ensuring unbiased loan approvals via
algorithmic audits.
• Basel II: Financial institutions calculating risk-weighted
assets.
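One AML pattern banks track is "structuring": splitting cash deposits so each stays just below a reporting threshold. The sketch flags such accounts with a simple rule. The default threshold of 10,000 mirrors the US cash-reporting threshold; the 10% margin, the minimum count of three, the function name, and the deposits are hypothetical parameters, not regulatory values.

```python
from collections import defaultdict


def flag_possible_structuring(deposits, reporting_threshold=10_000,
                              margin=0.10, min_count=3):
    """Return accounts with at least `min_count` cash deposits falling in
    [threshold * (1 - margin), threshold) during the review window.

    deposits: iterable of (account_id, amount) pairs.
    """
    lower = reporting_threshold * (1 - margin)
    near_threshold = defaultdict(int)
    for account, amount in deposits:
        if lower <= amount < reporting_threshold:
            near_threshold[account] += 1
    return sorted(acct for acct, n in near_threshold.items() if n >= min_count)


# Hypothetical one-week window of cash deposits.
deposits = [
    ("ACC-1", 9_500), ("ACC-1", 9_800), ("ACC-1", 9_900),
    ("ACC-2", 12_000), ("ACC-2", 300),
    ("ACC-3", 9_700), ("ACC-3", 4_000),
]
print(flag_possible_structuring(deposits))  # ['ACC-1']
```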
Why This Matters for Businesses

•Competitive Edge: Companies like Netflix and Walmart use these drivers to
outperform rivals.
•Cost Savings: Reducing fraud or inefficiencies directly impacts the bottom line.
•Innovation: Data-driven insights fuel new products/services (e.g., Tesla’s
autonomous driving).
Big Data Mart

A Big Data Mart is a specialized subset of a data warehouse designed for a specific business function, department, or subject area. Unlike traditional data marts (which rely on structured data), a Big Data Mart incorporates large-scale, multi-structured data (structured, semi-structured, and unstructured) from sources like IoT, social media, logs, and transactional systems.

It enables faster, more focused analytics for business units (e.g., marketing, finance, operations) by providing curated, high-performance access to relevant data.
Big Data Mart Types

Big Data Marts can be classified based on their data sourcing strategy,
architectural design, and business use cases.

1. Dependent Big Data Mart

A dependent data mart is built directly from an existing enterprise data warehouse (EDW) or data lake, ensuring consistency with the organization's central data repository.

Key Characteristics:

✔ Source: Pulls data from a centralized EDW or data lake.
✔ Governance: Follows the same data definitions, schemas, and
security policies as the EDW.
✔ Use Case: When business units need department-specific
analytics without compromising data integrity.
Advantages:

✅ Consistency – Data aligns with the enterprise warehouse.
✅ Lower Redundancy – Avoids duplicate data storage.
✅ Easier Maintenance – Changes in the EDW are automatically reflected in the mart.

Disadvantages:

❌ Slower Deployment – Requires coordination with central IT.
❌ Less Flexibility – Must adhere to the EDW’s structure.

Example:

•A sales team extracts customer transaction data from the corporate EDW to
build a sales performance mart.
•A bank’s risk management team creates a fraud detection mart from
the central data warehouse.
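A dependent mart can be as simple as a view defined over warehouse tables. This sketch uses Python's built-in `sqlite3` as a stand-in for the EDW (in a Big Data stack the equivalent would more likely be a Hive or Spark SQL view); the table, column names, and rows are hypothetical. Because the mart is a view, changes in the EDW table appear in the mart on the next query, which is the "easier maintenance" property listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for the enterprise data warehouse fact table (hypothetical schema).
conn.execute("""CREATE TABLE edw_transactions (
    txn_id INTEGER PRIMARY KEY,
    region TEXT,
    sales_rep TEXT,
    amount REAL)""")
conn.executemany(
    "INSERT INTO edw_transactions (region, sales_rep, amount) VALUES (?, ?, ?)",
    [("North", "Asha", 1200.0), ("North", "Asha", 800.0),
     ("South", "Ben", 500.0), ("South", "Chen", 1500.0)],
)

# Dependent mart: derived only from the EDW, so it inherits its definitions.
conn.execute("""CREATE VIEW sales_performance_mart AS
    SELECT region, sales_rep, COUNT(*) AS deals, SUM(amount) AS revenue
    FROM edw_transactions
    GROUP BY region, sales_rep""")

rows = conn.execute(
    "SELECT * FROM sales_performance_mart ORDER BY revenue DESC").fetchall()
print(rows)
```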
2. Independent Big Data Mart
An independent data mart is created without relying on
a central data warehouse, often using direct data
feeds from operational systems or external sources.

Key Characteristics:
✔ Source: Built standalone, often from departmental
databases, cloud apps, or external APIs.
✔ Governance: Managed independently, leading to
potential silos.
✔ Use Case: When a business unit needs quick,
autonomous analytics without enterprise-wide
dependencies.
Advantages:
✅ Fast Deployment – No dependency on EDW.
✅ Flexibility – Can use custom schemas and unstructured
data.
✅ Cost-Effective – No need for large-scale EDW integration.

Disadvantages:
❌ Data Silos – May not align with enterprise data.
❌ Redundancy – Same data may exist in multiple marts.
❌ Governance Challenges – Harder to enforce compliance.

Example:
•A marketing team builds a campaign analytics mart using
Google Ads, Facebook, and CRM data.
•A healthcare research team creates a patient outcomes
mart from EHR and IoT wearable data.
3. Hybrid Big Data Mart
A hybrid data mart combines data from both the EDW
and external sources, offering a balance between central
governance and departmental flexibility.

Key Characteristics:
✔ Source: Mix of EDW data + external datasets (e.g.,
market trends, social media, IoT).
✔ Governance: Partially controlled by central IT but allows
custom integrations.
✔ Use Case: When business units need enriched
analytics beyond what the EDW provides.
Advantages:
✅ Best of Both Worlds – Combines enterprise data with external
insights.
✅ Enhanced Analytics – Enables 360-degree views (e.g., customer
behavior + market trends).
✅ Scalable – Can grow with business needs.

Disadvantages:
❌ Complex Integration – Requires ETL/ELT pipelines for merging
data.
❌ Higher Maintenance – Needs coordination between central and
local teams.

Example:
•A retailer’s pricing team combines internal sales
data with competitor pricing feeds (from web scraping).
•A financial services firm merges transaction records (from
EDW) with credit bureau data for risk modeling.
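For the hybrid case, the same approach extends to an external feed. In this sketch a governed sales table (standing in for the EDW) is joined with a competitor-price feed delivered as JSON (standing in for web-scraped data). All table names, SKUs, and prices are hypothetical, and `sqlite3` is again only a stand-in for the real mart platform. A LEFT JOIN keeps every EDW row even when the external feed has no matching price.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Governed internal data (hypothetical EDW extract).
conn.execute("CREATE TABLE edw_sales (sku TEXT PRIMARY KEY, our_price REAL, units_sold INTEGER)")
conn.executemany("INSERT INTO edw_sales VALUES (?, ?, ?)",
                 [("SKU-1", 19.99, 340), ("SKU-2", 4.50, 1200), ("SKU-3", 89.00, 75)])

# External, semi-structured feed (hypothetical output of a scraping job).
competitor_feed = json.loads("""[
  {"sku": "SKU-1", "competitor_price": 17.99},
  {"sku": "SKU-3", "competitor_price": 94.00}
]""")
conn.execute("CREATE TABLE ext_competitor_prices (sku TEXT PRIMARY KEY, competitor_price REAL)")
conn.executemany("INSERT INTO ext_competitor_prices VALUES (:sku, :competitor_price)",
                 competitor_feed)

# Hybrid mart: EDW data enriched with the external feed.
conn.execute("""CREATE VIEW pricing_mart AS
    SELECT s.sku, s.our_price, s.units_sold, c.competitor_price,
           ROUND(s.our_price - c.competitor_price, 2) AS price_gap
    FROM edw_sales AS s
    LEFT JOIN ext_competitor_prices AS c ON c.sku = s.sku""")

rows = conn.execute("SELECT * FROM pricing_mart ORDER BY sku").fetchall()
for row in rows:
    print(row)
```

SKU-2 appears with `None` for the competitor columns, which makes gaps in the external coverage visible instead of silently dropping internal products.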
What Is a Data Lake?

A data lake is a centralized repository that stores raw structured, semi-structured, and unstructured data in its native format (without a predefined schema). It allows organizations to store massive volumes of diverse data (such as text, images, logs, IoT streams, and database exports) for future processing and analysis.
Key Characteristics of a Data Lake

| Feature | Description |
|---|---|
| Schema-on-Read | Data is stored as-is; structure is applied only when it is read or analyzed. |
| Scalable Storage | Built on low-cost systems such as Hadoop (HDFS), Amazon S3, or Azure Blob Storage. |
| Multi-Format | Supports CSV, JSON, images, videos, logs, etc. |
| Cost-Effective | Cheaper than traditional databases: storage is typically decoupled from compute, so you pay mainly for inexpensive storage and add compute only when data is processed. |
| Flexible Analytics | Enables SQL queries, ML, real-time processing, and batch analytics. |
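Schema-on-read can be illustrated with JSON Lines: raw records are stored untouched, and each reader applies its own schema when it reads them. In this sketch the events, the `SCHEMA` dictionary, and `read_with_schema` are hypothetical. Rows that are malformed or do not fit this reader's schema are skipped by the reader but remain in the raw store, where a different reader (for example one that also wants `humidity`) could apply a different schema.

```python
import json

# Raw zone: records land as-is (JSON Lines); nothing is validated at write time.
raw_events = """\
{"device": "sensor-7", "temp_c": "21.5", "ts": "2024-05-01T10:00:00"}
{"device": "sensor-7", "temp_c": 22.1, "ts": "2024-05-01T10:05:00", "humidity": 40}
{"device": "sensor-9", "ts": "2024-05-01T10:05:00"}
not valid json
"""

# Hypothetical schema chosen by this particular reader: field name -> cast function.
SCHEMA = {"device": str, "temp_c": float}


def read_with_schema(lines, schema):
    """Apply a schema at read time: keep only schema fields, cast them,
    and skip rows that cannot be parsed or do not fit."""
    for line in lines.splitlines():
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed raw data stays in the lake; this reader ignores it
        try:
            yield {field: cast(record[field]) for field, cast in schema.items()}
        except (KeyError, TypeError, ValueError):
            continue  # missing or uncastable field for this schema


for row in read_with_schema(raw_events, SCHEMA):
    print(row)
```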
