DSBA unit 3

Big Data refers to vast datasets that require advanced technologies for analysis, characterized by the 5 Vs: Volume, Velocity, Variety, Veracity, and Value. It originates from various sources including social media, IoT devices, and enterprise systems. The Data Analytic Lifecycle outlines a structured approach to transforming raw data into actionable insights through six phases: Discovery, Data Preparation, Model Planning, Model Building, Communicate Results, and Operationalize.

Uploaded by Sameer Inamdar
Copyright © All Rights Reserved

DSBA

UNIT-1
Introduction to Big Data and Data Analytic Lifecycle

• Big Data refers to extremely large datasets that may be analyzed computationally to reveal patterns, trends, and associations.
• The Data Analytic Lifecycle is a systematic process that guides data professionals through the transformation of raw data into actionable insights.
• This presentation explores Big Data, its sources, and the stages of the analytic lifecycle.

What is Big Data?

• Definition: Big Data is a term used to describe massive volumes of structured and unstructured data.
• Such data requires advanced technologies and methods for storage, processing, and analysis.
• 5 Vs of Big Data:
  - Volume: Petabytes to exabytes of data
  - Velocity: Data generated and arriving at high speed, often requiring real-time or near-real-time processing
  - Variety: Text, images, video, audio, sensor data, etc.
  - Veracity: Trustworthiness and quality of data
  - Value: Potential for actionable insights

Sources of Big Data

• Social Media: User-generated content, likes, shares, comments
• IoT Devices: Smart sensors, connected cars, health trackers
• Mobile Apps: Location data, usage patterns, user behavior
• Enterprise Systems: Transactional data from ERP and CRM systems
• Public Repositories: Weather data, census data, government portals
• Multimedia Content: Videos, images, and audio files from various platforms

Introduction to the Data Analytic Lifecycle

• The lifecycle provides a roadmap for analytics projects.
• It helps align data science work with business goals.
• Stages: Discovery, Data Preparation, Model Planning, Model Building, Communicate Results, Operationalize

Phase 1 - Discovery

• Understand the business domain and objectives.
• Identify data sources and define the analytics problem.
• Conduct stakeholder interviews and assess project feasibility.
• Estimate timelines, risks, and success metrics.

Phase 2 - Data Preparation

• Data Collection: Aggregate data from relevant internal and external sources.
• Data Cleaning: Address missing data, remove duplicates, and fix inconsistencies.
• Data Transformation: Normalize, categorize, and encode data as needed.
• Feature Engineering: Create new variables to improve model performance.

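The cleaning, transformation, and feature-engineering steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a prescribed method: the records, the field names (`city`, `age`, `income`), median imputation for missing ages, min-max scaling, one-hot encoding, and the derived `income_per_age` feature are all hypothetical choices made for the example. In practice a library such as pandas would usually handle these steps.

```python
from statistics import median

# Hypothetical raw records, as if aggregated from two sources (Data Collection).
RAW = [
    {"id": 1, "city": "Pune", "age": 34, "income": 52000.0},
    {"id": 2, "city": "pune ", "age": None, "income": 61000.0},
    {"id": 2, "city": "pune ", "age": None, "income": 61000.0},  # exact duplicate
    {"id": 3, "city": "Mumbai", "age": 45, "income": 88000.0},
    {"id": 4, "city": "MUMBAI", "age": 29, "income": 47000.0},
]


def clean(records):
    """Data Cleaning: drop exact duplicates, fix text, impute missing ages."""
    seen, rows = set(), []
    for r in records:
        key = tuple(sorted(r.items()))  # whole record as a hashable key
        if key not in seen:
            seen.add(key)
            rows.append(dict(r))  # copy so the input list is not mutated
    for r in rows:
        r["city"] = r["city"].strip().title()  # "pune " -> "Pune", "MUMBAI" -> "Mumbai"
    fill = median(r["age"] for r in rows if r["age"] is not None)
    for r in rows:
        if r["age"] is None:
            r["age"] = fill  # median imputation, one of several possible strategies
    return rows


def min_max(values):
    """Data Transformation: rescale numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def one_hot(values):
    """Data Transformation: encode a categorical column as 0/1 indicator columns."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]


def prepare(records):
    """Run cleaning, transformation, and (hypothetical) feature engineering."""
    rows = clean(records)
    scaled = min_max([r["income"] for r in rows])
    categories, city_vectors = one_hot([r["city"] for r in rows])
    features = []
    for r, inc, vec in zip(rows, scaled, city_vectors):
        feat = {"id": r["id"], "income_scaled": inc}
        feat.update({f"city_{c}": v for c, v in zip(categories, vec)})
        # Feature Engineering: a hypothetical derived variable.
        feat["income_per_age"] = r["income"] / r["age"]
        features.append(feat)
    return features


for row in prepare(RAW):
    print(row)
```

For simplicity this sketch computes the imputation median and the scaling range on the whole dataset; in a real project those statistics would normally be computed on the training split only, so that no information from validation or test data leaks into training.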
Phase 3 - Model Planning

• Understand the structure of the data using statistical techniques.
• Select modeling techniques: regression, clustering, classification, etc.
• Decide on tools and environments: Jupyter, RStudio, Spark, etc.
• Develop a data partition strategy: training, validation, and test sets.

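A data partition strategy can be as simple as a seeded shuffle followed by cuts at fixed fractions. The sketch below assumes a 70/15/15 train/validation/test split and a fixed random seed for reproducibility; the function name `train_val_test_split` and its default fractions are hypothetical choices, not values taken from these slides.

```python
import random


def train_val_test_split(items, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle a copy of `items` and cut it into train, validation, and test lists."""
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG: same seed gives the same split
    n = len(shuffled)
    n_test = round(n * test_frac)
    n_val = round(n * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]  # everything left over goes to training
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))  # 70 15 15
```

A purely random split assumes the rows are independent. For time-ordered data a chronological split is usually safer, and for imbalanced classes a stratified split keeps the label proportions similar across the three sets.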
Phase 4 - Model Building

• Apply the selected algorithms to build predictive or descriptive models.
• Train models using historical or labeled data.
• Evaluate performance using suitable metrics, e.g., accuracy, precision, and recall for classification models.
• Use cross-validation and hyperparameter tuning to refine models.

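The evaluation metrics and the fold splitting behind k-fold cross-validation are simple enough to write out directly. The sketch below assumes binary labels with `1` as the positive class; the function names are hypothetical, and libraries such as scikit-learn provide tested equivalents (for example `sklearn.metrics.precision_score` and `sklearn.model_selection.KFold`).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, and true negatives."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif t == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for one binary classification result."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # share of predicted positives that are correct
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # share of actual positives that were found
    }


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) for k contiguous folds over range(n).

    Each index lands in exactly one validation fold. Shuffle the rows
    beforehand if they are stored in a meaningful order.
    """
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder over the first folds
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1, 1, 0]
print(classification_metrics(y_true, y_pred))
# {'accuracy': 0.625, 'precision': 0.6, 'recall': 0.75}
```

In cross-validation, a model is trained on each `train_idx` set and scored on the matching `val_idx` set, and the k scores are averaged. Hyperparameter tuning repeats this for each candidate setting and keeps the best-scoring one.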
Phase 5 - Communicate Results

• Translate technical findings into business-relevant insights.
• Create compelling visualizations using tools like Tableau or Power BI.
• Generate reports and dashboards tailored to stakeholders.
• Support data-driven decision-making with actionable recommendations.

Phase 6 - Operationalize

• Deploy models to production environments using APIs or batch processes.
• Establish monitoring for accuracy, drift, and performance.
• Plan for model retraining and updates.
• Ensure governance, compliance, and data security in deployment.

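Drift monitoring can start by comparing the distribution of an input feature (or of model scores) in production against the distribution seen at training time. The sketch below uses the Population Stability Index (PSI), one common choice among several (another is the Kolmogorov-Smirnov test). The equal-width binning, the `eps` floor for empty bins, the function names, and the 0.25 alert threshold (a widely quoted rule of thumb) are assumptions, not requirements.

```python
import math


def population_stability_index(baseline, live, bins=10, eps=1e-6):
    """PSI = sum((a - e) * ln(a / e)) over bins, where e and a are the
    baseline and live shares of observations falling in each bin."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # guard against a constant baseline

    def shares(sample):
        counts = [0] * bins
        for x in sample:
            i = int((x - lo) / width)
            counts[min(max(i, 0), bins - 1)] += 1  # clamp out-of-range values to edge bins
        return [max(c / len(sample), eps) for c in counts]  # floor avoids log(0)

    e, a = shares(baseline), shares(live)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_alert(baseline, live, threshold=0.25):
    """Return (psi, alert), where alert is True once psi reaches the threshold."""
    psi = population_stability_index(baseline, live)
    return psi, psi >= threshold


baseline = [i / 100 for i in range(100)]    # hypothetical training-time feature values
live = [0.5 + i / 200 for i in range(100)]  # hypothetical production values, shifted upward
print(drift_alert(baseline, baseline))      # (0.0, False)
print(drift_alert(baseline, live))
```

PSI only detects shifts in input or score distributions. Confirming an actual loss of accuracy still requires comparing predictions with true labels once those become available, which can then trigger the retraining plan listed in this phase.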
Summary

• Big Data encompasses large, complex data that requires specialized tools.
• Sources range from social media to IoT to enterprise systems.
• The Data Analytic Lifecycle provides structure to analytics projects.
• Each phase contributes to extracting maximum value from data.
