Introduction to Big Data Analytics - Notes
1.1 Introduction
1. Characteristics of Big Data (the five Vs):
- Volume: Massive amount of data generated every second.
- Velocity: Speed at which new data is created and processed.
- Variety: Different types of data - structured, unstructured, semi-structured.
- Veracity: Trustworthiness and quality of data.
- Value: Useful insights that can be derived from the data.
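The Variety characteristic above can be illustrated with a small sketch. The sample records, field names, and text below are invented for illustration. The sketch only shows how the three kinds of data differ in shape, not how a production system would store them.

```python
import csv
import io
import json

# Structured: fixed columns, fits naturally into a relational table.
# (Hypothetical sample data.)
structured = list(csv.DictReader(io.StringIO("id,amount\n1,250\n2,400\n")))

# Semi-structured: self-describing records whose fields may differ.
semi_structured = [
    json.loads('{"id": 1, "tags": ["new", "online"]}'),
    json.loads('{"id": 2, "address": {"city": "Pune"}}'),
]

# Unstructured: free text with no predefined schema.
unstructured = "Customer called to say the delivery arrived late but the product was fine."

# Structured data can be aggregated directly by column name.
total_amount = sum(int(row["amount"]) for row in structured)

# Semi-structured data needs its fields discovered record by record.
fields_seen = sorted({key for record in semi_structured for key in record})

# Unstructured data needs processing (here, naive word splitting) before analysis.
word_total = len(unstructured.split())

print(total_amount, fields_seen, word_total)
```

Each kind of data needs a different amount of preparation before analysis, which is one reason Variety is treated as a challenge in its own right.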
2. Evolution of Big Data:
- Earlier, data volumes were small and manageable with traditional tools.
- Now, due to the internet, IoT, and digitalization, huge volumes of data are generated.
- Big Data technologies emerged to handle this growth effectively.
3. Definition of Big Data:
- Big Data refers to datasets that are so large and complex that traditional data-management systems cannot store or process them efficiently.
4. Challenges with Big Data:
- Storing large volumes.
- Processing speed.
- Data security and privacy.
- Managing unstructured data (like videos, images, etc.).
5. What is Big Data?
- A term used for datasets that are huge and require advanced tools for storage and processing.
6. Why Big Data?
- Helps in better decision-making.
- Predicts customer behavior.
- Improves business strategies.
- Used in healthcare, banking, marketing, etc.
1.2 Introduction to Big Data Analytics
1. What is Big Data Analytics?
- It is the process of analyzing large datasets to discover patterns, trends, and insights.
2. Classification of Analytics:
- Descriptive Analytics: Understand past data (e.g., reports).
- Diagnostic Analytics: Find reasons for past outcomes.
- Predictive Analytics: Forecast future events.
- Prescriptive Analytics: Suggest actions to achieve desired outcomes.
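The four types of analytics above can be sketched on one tiny dataset. The sales figures, the promotion flags, the stock level, and the reorder rule are all hypothetical, and a straight-line trend stands in for a real predictive model. This is a minimal illustration, not a recommended method.

```python
from statistics import mean

# Hypothetical monthly sales (units sold) and whether a promotion ran.
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 120, 90, 140, 150]
promo = [False, False, False, False, True, True]

# Descriptive: summarize what happened.
average_sales = mean(sales)

# Diagnostic: look for a reason by comparing months with and without a promotion.
promo_avg = mean([s for s, p in zip(sales, promo) if p])
no_promo_avg = mean([s for s, p in zip(sales, promo) if not p])

# Predictive: fit a least-squares straight line and extrapolate to month 7.
x_bar, y_bar = mean(months), mean(sales)
slope = (
    sum((x - x_bar) * (y - y_bar) for x, y in zip(months, sales))
    / sum((x - x_bar) ** 2 for x in months)
)
intercept = y_bar - slope * x_bar
forecast_month_7 = slope * 7 + intercept

# Prescriptive: turn the forecast into a suggested action (assumed rule).
stock_on_hand = 120
action = "reorder stock" if forecast_month_7 > stock_on_hand else "hold"

print(round(average_sales, 1), promo_avg, no_promo_avg)
print(round(forecast_month_7, 1), action)
```

Each step builds on the previous one: the summary describes the past, the comparison suggests a cause, the trend line forecasts, and the final rule recommends an action.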
3. Why is Big Data Analytics Important?
- Enhances business efficiency.
- Provides customer insights.
- Detects fraud and risks.
- Optimizes processes.
1.3 Data Science and Big Data Environment
1. What is Data Science?
- A field that uses statistics, machine learning, and algorithms to analyze and interpret data.
2. Responsibilities of a Data Scientist:
- Collect and clean data.
- Analyze data patterns.
- Build models for predictions.
- Communicate results to stakeholders.
3. Terminologies in Big Data:
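- Hadoop: Open-source framework for distributed storage (HDFS) and processing of big data across clusters of machines.
- Spark: Fast, in-memory distributed data processing engine.
- MapReduce: Programming model that processes large data in parallel through a map phase and a reduce phase.
- NoSQL: Family of non-relational databases (key-value, document, column-family, graph) suited to unstructured and semi-structured data.
- Data Lake: Central repository for storing raw data in its native format.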
- ETL: Process of extracting, transforming, and loading data into a system.
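The MapReduce model listed above can be sketched as a word count in plain Python. This runs on a single machine and does not use Hadoop's API. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample documents are assumptions made for illustration. In a real cluster, the map and reduce steps would run in parallel on different nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over all documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(pairs))


# Hypothetical input: each string stands in for one input split.
documents = ["big data needs big tools", "data tools for big data"]
counts = word_count(documents)
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can be distributed across machines, which is what makes the model suitable for large data.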