BA THEORY

The document provides an overview of data science, data analysis, and analytics, outlining key concepts such as data types, big data characteristics, and various analytics classifications. It also discusses data preparation techniques, visualization methods, and regression models, emphasizing their applications in business and challenges faced in data analytics. Additionally, it covers textual data analysis, its significance, and methods for extracting insights from unstructured text.

Uploaded by

Sanjana

UNIT 1

1. Data and Data Science

• Data is the foundation of all analytical processes. It represents raw facts that are
collected for reference or analysis. It can be:
o Structured (organized in tables, e.g., Excel, databases),
o Unstructured (text, images, audio), or
o Semi-structured (like XML, JSON).
• Data Science is the process of analyzing data to extract meaningful insights. It
combines:
o Mathematics & Statistics (for modeling),
o Programming (to handle and analyze data), and
o Domain Knowledge (to make insights actionable).

2. Data Analysis vs. Data Analytics

• Data Analysis is more about examining datasets to discover trends, patterns, or
summaries.
o It’s mostly retrospective – tells what happened.
o Example: Analyzing last month's sales to find top-selling products.
• Data Analytics includes analysis but goes further – it uses advanced techniques to
predict future outcomes and suggest actions.
o Involves predictive and prescriptive techniques.
o Example: Forecasting next month’s sales using machine learning models.

3. Classification of Analytics

1. Descriptive Analytics – Summarizes historical data to understand trends.

o E.g., Monthly revenue reports.
2. Diagnostic Analytics – Explores data to find the reason behind outcomes.
o E.g., Why sales dropped in a region.
3. Predictive Analytics – Uses historical data to forecast future outcomes.
o E.g., Predicting customer churn.
4. Prescriptive Analytics – Suggests actions based on data-driven predictions.
o E.g., Recommending marketing strategies to boost retention.

4. Applications of Analytics in Business

Analytics helps companies:

• Make informed decisions (e.g., which product to launch),
• Improve efficiency (e.g., reduce delivery time),
• Understand customers better (e.g., personalized offers), and
• Gain a competitive edge.

Some examples:

• Marketing: Targeted advertising using customer data.

• Finance: Predicting loan defaults.
• Operations: Automating supply chain management.

5. Types of Data

Type | Explanation | Example
Nominal | Categorical with no order | Eye color, Country, Gender
Ordinal | Ordered categories | Customer satisfaction: Poor, Fair, Good
Scale (Interval/Ratio) | Numeric data with measurable difference | Temperature (Interval), Income (Ratio)

Understanding data types is essential for selecting the right statistical or analytical technique.

6. Big Data and Its Characteristics

Big Data refers to datasets that are too large or complex to be processed efficiently with
traditional tools. Typical sources include social media, transaction logs, and sensors.

Key Characteristics (5 Vs):

• Volume – Massive data size.

• Velocity – High speed of data generation.
• Variety – Structured, semi-structured, unstructured.
• Veracity – Accuracy and trustworthiness.
• Value – Potential to extract insights.

7. Applications of Big Data

• Healthcare: Monitoring patient health in real-time using wearables.

• Retail: Tailoring offers using purchase history.
• Banking: Real-time fraud detection.
• Agriculture: Predicting crop yield using climate data.
• Government: Analyzing traffic patterns for smart cities.
8. Challenges in Data Analytics

Despite its benefits, analytics faces several challenges:

1. Data Quality: Incomplete or incorrect data leads to bad insights.

2. Integration: Combining data from various sources (e.g., CRM, ERP).
3. Security and Privacy: Ensuring compliance with laws like GDPR.
4. Scalability: Managing growing data efficiently.
5. Talent Shortage: Need for skilled data professionals.

UNIT 2
1. Data Preparation and Cleaning

This is the first step in data analysis: removing errors and formatting the data so it can be analyzed.

• Examples: fixing typos, converting text to numbers, removing extra spaces, handling
missing values.

2. Sort and Filter

• Sort: Arranges data in ascending/descending order (e.g., sort sales from highest to
lowest).
• Filter: Displays only data that meets specific conditions (e.g., sales in one region).
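
The two operations above can be sketched in a few lines of Python. This is only an illustration: the `sales` records and their field names are hypothetical, and a spreadsheet's Sort and Filter buttons do the same job interactively.

```python
# Hypothetical sales records used only for illustration.
sales = [
    {"region": "North", "amount": 12000},
    {"region": "South", "amount": 8000},
    {"region": "North", "amount": 15000},
    {"region": "East", "amount": 9500},
]

# Sort: arrange rows from highest to lowest sale amount.
sorted_sales = sorted(sales, key=lambda row: row["amount"], reverse=True)

# Filter: keep only the rows that meet a condition (here, one region).
north_sales = [row for row in sales if row["region"] == "North"]

print([row["amount"] for row in sorted_sales])
print(north_sales)
```

Sorting changes only the order of rows, while filtering changes which rows are shown; neither alters the underlying values.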

3. Conditional Formatting

Applies visual formatting rules to highlight data automatically.

• Example: Highlight sales below ₹10,000 in red.

4. Text to Column
Splits text from one column into multiple columns using a delimiter (like commas or spaces).

• Example: Splitting "John,Smith" into First Name and Last Name.

5. Removing Duplicates

Eliminates repeated rows or values to ensure data uniqueness.

• Useful in cleaning customer lists or transaction records.
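
As a rough sketch of what duplicate removal does, the Python snippet below drops repeated rows while keeping the first occurrence of each. The customer list is made up for the example.

```python
# Hypothetical customer list with repeated rows (ID, name).
customers = [
    ("C001", "Asha"),
    ("C002", "Ravi"),
    ("C001", "Asha"),
    ("C003", "Meera"),
    ("C002", "Ravi"),
]

# dict.fromkeys keeps the first occurrence of each row and preserves order.
unique_customers = list(dict.fromkeys(customers))

print(unique_customers)
```

Here a row counts as a duplicate only if every field matches; deciding which columns define "the same record" is a judgment call that depends on the dataset.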

6. Data Validation

Sets rules for what data can be entered in a cell.

• Example: Restricting values to dates only or dropdown menus for departments.

7. Identifying Outliers in the Data

Outliers are data points that differ significantly from others.

• Detected using charts (box plots) or statistical methods (Z-score, IQR).

• Important because they can skew results.
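
The two statistical methods named above can be sketched as follows. The data values, the Z-score thresholds, and the 1.5 × IQR fence are illustrative assumptions (1.5 is the conventional box-plot rule), and quartiles are computed with Python's default `statistics.quantiles` method, which may differ slightly from Excel or R.

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Return values whose distance from the mean exceeds `threshold` SDs."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return [v for v in values if abs(v - mean) / sd > threshold]

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical data with one obviously extreme value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 13, 95]

print(zscore_outliers(data))                 # threshold of 3 SDs
print(zscore_outliers(data, threshold=2.5))  # looser threshold
print(iqr_outliers(data))
```

In this small sample the extreme value inflates the standard deviation so much that its own Z-score stays below 3, so the strict threshold flags nothing. The IQR rule relies on quartiles, which are far less affected by a single extreme point.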

8. Covariance and Correlation Matrix

• Covariance shows how two variables change together (direction).

• Correlation shows the strength and direction of a linear relationship between
variables (range: -1 to 1).
• Correlation Matrix is a table showing correlations between multiple variables.
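
A minimal sketch of these three ideas in plain Python follows. The column names and values are hypothetical; in practice a library call such as a Pandas correlation method or Excel's Data Analysis add-in would produce the same table.

```python
import math

def covariance(xs, ys):
    """Sample covariance: average co-movement around the means (n - 1 divisor)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson r: covariance rescaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

def correlation_matrix(columns):
    """Correlation of every column with every other column."""
    names = list(columns)
    return {a: {b: correlation(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical columns used only for illustration.
columns = {
    "ads": [1, 2, 3, 4, 5],
    "sales": [2, 4, 6, 8, 10],
    "returns": [5, 4, 3, 2, 1],
}

matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

The covariance of `ads` and `sales` depends on the units of both columns, whereas the correlation is unit-free, which is why the matrix is built from correlations.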

9. Moving Averages

Smooths out short-term fluctuations to reveal trends over time.

• Used in forecasting and time-series analysis.

• Example: 3-month moving average of sales.
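
The 3-month example can be sketched as a trailing (simple) moving average. The monthly figures below are invented for illustration.

```python
def moving_average(values, window=3):
    """Trailing simple moving average; the first window-1 periods have no value."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]

# Hypothetical monthly sales.
monthly_sales = [100, 120, 110, 130, 150, 140]

print(moving_average(monthly_sales, window=3))
```

Each output value averages the current month and the two before it, so six months of data yield four smoothed points. A larger window smooths more but reacts more slowly to real changes.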
10. Finding the Missing Value from Data

• Methods: Replace with mean/median, forward fill, or use statistical models.

• Important for maintaining dataset completeness and avoiding errors in analysis.
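
A small sketch of the first two methods, using `None` to mark a missing entry. The `readings` list is hypothetical, and model-based imputation is not shown.

```python
import statistics

def fill_with_statistic(values, method="mean"):
    """Replace missing entries (None) with the mean or median of observed values."""
    observed = [v for v in values if v is not None]
    if method == "mean":
        fill = statistics.mean(observed)
    else:
        fill = statistics.median(observed)
    return [fill if v is None else v for v in values]

def forward_fill(values):
    """Carry the last observed value forward; leading gaps stay missing."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

# Hypothetical series with gaps.
readings = [10, None, 14, None, None, 18]

print(fill_with_statistic(readings, "mean"))
print(fill_with_statistic(readings, "median"))
print(forward_fill(readings))
```

Mean or median replacement suits data with no natural order, while forward fill assumes the rows are ordered in time and that the last known value is a reasonable stand-in.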

11. Summarisation

Reduces large datasets into meaningful summaries (totals, averages, counts).

• Tools: Excel functions (SUM, AVERAGE), group-by in Python/Pandas, or SQL
aggregation.

12. Visualisation Techniques

Helps in understanding and communicating patterns in data.

Chart Type | Use Case
Scatter Plot | Shows relationship between two variables (e.g., sales vs. ads)
Line Chart | Displays trends over time (e.g., monthly revenue)
Histogram | Shows distribution of data (e.g., age of customers)

13. Pivot Tables

Summarise data dynamically by rows and columns.

• Example: Total sales by product and region.
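
Conceptually, a pivot table is a grouped summary laid out in two dimensions. The sketch below reproduces the "total sales by product and region" example in plain Python; the transaction rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical transactions: (product, region, amount).
rows = [
    ("Pen", "North", 100),
    ("Pen", "South", 150),
    ("Book", "North", 200),
    ("Pen", "North", 50),
    ("Book", "South", 300),
]

# Rows of the pivot are products, columns are regions, cells are summed amounts.
pivot = defaultdict(lambda: defaultdict(int))
for product, region, amount in rows:
    pivot[product][region] += amount

regions = sorted({region for _, region, _ in rows})
print("Product", *regions)
for product in sorted(pivot):
    print(product, *(pivot[product][region] for region in regions))
```

Swapping the roles of product and region, or replacing the sum with a count or average, corresponds to rearranging fields in a spreadsheet pivot table.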

14. Pivot Charts

Graphs based on pivot table data.

• Automatically update when pivot table changes.

15. Interactive Dashboards

Combines charts, slicers, and KPIs into a single view for real-time insights.

• Created in Excel, Power BI, or Tableau.

• Example: A sales dashboard with region-wise and monthly performance.
UNIT 3
UNIT 4
1. Importing Data File

You can import data into your analysis tool from:

• CSV (Comma-Separated Values): Most common, works in Excel, R, Python, etc.

o In R: read.csv("file.csv")
o In Python (Pandas): pd.read_csv("file.csv")
• Excel (.xlsx):
o In R: readxl::read_excel("file.xlsx")
o In Python: pd.read_excel("file.xlsx")
• Other formats: JSON, SQL, text files.

2. Data Visualisation Using Charts

Visualization helps to explore and present data effectively. Here are the common types:

Chart Type | Purpose / Use Case
Histogram | Shows frequency distribution of a single variable. Useful for understanding data distribution.
Bar Chart | Compares categories (e.g., sales by product).
Box Plot | Displays spread and outliers of numerical data using quartiles.
Line Graph | Shows trends over time (e.g., monthly revenue).
Scatter Plot | Displays relationship between two continuous variables (e.g., height vs. weight).
Pie Chart (less used in data science) | Shows parts of a whole; better alternatives usually exist.

3. Data Description – Descriptive Statistics

A. Measures of Central Tendency

These show the center of a data set:

• Mean: Average value.

• Median: Middle value when sorted.
• Mode: Most frequent value.

B. Measures of Dispersion

These describe how spread out the data is:

• Range: Max – Min.

• Variance: Average squared deviation from the mean.
• Standard Deviation (SD): Square root of variance; measures the typical distance of values from the mean, in the same units as the data.
• Interquartile Range (IQR): Spread of the middle 50% of the data (Q3 − Q1); used in box plots.
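
The measures in parts A and B can be computed with Python's standard `statistics` module, as sketched below on a made-up list of scores. The population variance (`pvariance`) matches the "average squared deviation" definition; a sample variance would divide by n − 1 instead. Quartiles use the module's default method, so other tools may report slightly different values.

```python
import statistics

# Hypothetical scores used only for illustration.
scores = [4, 8, 6, 5, 3, 8, 9, 5, 8]

mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

value_range = max(scores) - min(scores)
variance = statistics.pvariance(scores)   # average squared deviation from the mean
std_dev = statistics.pstdev(scores)       # square root of the variance
q1, _, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

print("mean:", round(mean, 2), "median:", median, "mode:", mode)
print("range:", value_range, "variance:", round(variance, 2))
print("SD:", round(std_dev, 2), "IQR:", iqr)
```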

4. Relationship Between Variables

Understanding how two variables move together:

Measure | Explanation
Covariance | Indicates direction of the linear relationship (positive/negative), but not strength.
Correlation (r) | Ranges from -1 to +1. Shows strength and direction of a linear relationship.
Coefficient of Determination (R²) | Proportion of variance in one variable explained by another (0 to 1). Higher R² = stronger fit in regression models.

Tip: Correlation is standardized, while covariance is not.

UNIT 5
📈 1. Simple Linear Regression Model

• Models the relationship between two variables: one independent (X) and one dependent
(Y).
• Equation:

Y = a + bX + ε
o a = intercept
o b = slope (regression coefficient)
o ε = error term

Goal: Predict Y based on the value of X (e.g., predicting sales based on advertising).
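
As a sketch of how a and b are obtained, the snippet below fits the line by ordinary least squares, the usual estimation method for this model, and reports R². The advertising and sales figures are invented, and the function names are illustrative rather than taken from any library.

```python
def fit_line(xs, ys):
    """Ordinary least squares estimates of intercept a and slope b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def r_squared(xs, ys, a, b):
    """Share of the variation in Y explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

# Hypothetical data: advertising spend (X) and sales (Y).
ads = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

a, b = fit_line(ads, sales)
print("intercept:", round(a, 2), "slope:", round(b, 2))
print("R squared:", round(r_squared(ads, sales, a, b), 2))
print("predicted sales at X = 6:", round(a + b * 6, 2))
```

The error term ε does not appear in the prediction: it represents the scatter of actual points around the fitted line.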

📊 2. Confidence and Prediction Intervals

• Confidence Interval: Range likely to contain the mean value of Y for a given X.
o Example: “We are 95% confident the average sales at ₹10k ads is between ₹20k and ₹25k.”
• Prediction Interval: Range likely to contain a single new observation of Y for a given X. It is
always wider, because it adds the variability of an individual outcome to the uncertainty in the mean.
o Example: Predicting next month's sales for ₹10k ads.
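
For simple linear regression, the standard textbook formulas make the difference between the two intervals explicit. The symbols x_0 (the X value of interest), n (sample size), s (residual standard error), S_xx, and the t critical value are notation introduced here for illustration; a and b are the fitted intercept and slope.

```latex
\hat{y}_0 = a + b x_0, \qquad
s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}, \qquad
S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2

\text{Confidence interval (mean of } Y \text{ at } x_0\text{):} \quad
\hat{y}_0 \pm t_{\alpha/2,\, n-2} \; s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}

\text{Prediction interval (new } Y \text{ at } x_0\text{):} \quad
\hat{y}_0 \pm t_{\alpha/2,\, n-2} \; s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}
```

The only difference is the extra 1 under the square root, which accounts for the error term of the individual observation. Both intervals widen as x_0 moves away from the mean of X.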

🧮 3. Multiple Linear Regression

• Extends simple regression to include more than one independent variable.

• Equation:

Y = a + b₁X₁ + b₂X₂ + ... + bₙXₙ + ε

• Example: Predicting house price based on size, location, and number of bedrooms.

📌 4. Interpretation of Regression Coefficients

• Each coefficient (b₁, b₂, etc.) shows the effect of that variable on Y, holding other variables
constant.
• Example: In housing data, if b₁ = 5000 for size, each extra sq. ft. is associated with an average price increase of ₹5000, with location and number of bedrooms held constant.

⚠️ 5. Heteroscedasticity

• When the variance of errors is not constant across values of X.

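• Problem: Violates the constant-variance assumption of regression. Coefficient estimates remain unbiased, but their standard errors become unreliable, so significance tests and confidence intervals can mislead.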
• Detected using residual plots; corrected using transformations or robust standard errors.

🔀 6. Multicollinearity

• When independent variables are highly correlated with each other.

• Problem: Makes it hard to interpret coefficients; inflates standard errors.
• Detected using VIF (Variance Inflation Factor); fixed by removing or combining variables.
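
For predictor j, VIF is defined as 1 / (1 − R²ⱼ), where R²ⱼ comes from regressing that predictor on all the others. The sketch below covers only the special case of exactly two predictors, where R²ⱼ equals the squared correlation between them. The house size and bedroom data are hypothetical, and the often-quoted cut-offs of 5 or 10 are rules of thumb, not strict tests.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)

# Hypothetical predictors: house size and number of bedrooms move together.
size_sq_m = [50, 60, 70, 80, 90, 100]
bedrooms = [1, 2, 2, 3, 3, 4]

print("correlation:", round(pearson_r(size_sq_m, bedrooms), 3))
print("VIF:", round(vif_two_predictors(size_sq_m, bedrooms), 2))
```

A VIF of 1 means a predictor is uncorrelated with the others; the value here is far above common rule-of-thumb limits, which suggests dropping or combining one of the two variables.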
📚 7. Basics of Textual Data Analysis

• Text data (like reviews, tweets, emails) is unstructured and requires preprocessing before
analysis.
• Common steps: Tokenization, cleaning, removing stop words, stemming.
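
The common steps listed above can be sketched in plain Python. Everything here is a simplification for illustration: the stop-word list is a tiny hand-picked set, and `crude_stem` is a toy suffix stripper, not the Porter stemmer or any library routine.

```python
import re

# Tiny illustrative stop-word list; real lists contain hundreds of words.
STOP_WORDS = {"the", "is", "and", "a", "was", "but", "very"}

def tokenize(text):
    """Cleaning + tokenization: lowercase the text and keep alphabetic words only."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stop_words(tokens):
    """Drop very common words that carry little meaning on their own."""
    return [t for t in tokens if t not in STOP_WORDS]

def crude_stem(word):
    """Toy stemmer: strip one common suffix if at least 3 letters remain."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

review = "The delivery was fast, and the packaging is great! Loved the products."

tokens = remove_stop_words(tokenize(review))
stems = [crude_stem(t) for t in tokens]

print(tokens)
print(stems)
```

Stems such as "packag" or "lov" are not dictionary words; stemming only aims to map related word forms to the same key so they can be counted together.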

✅ Significance and Applications of Textual Analysis

• Significance: Helps extract insights from non-numeric data.

• Applications:
o Sentiment analysis (e.g., customer feedback)
o Topic detection (e.g., trending topics on Twitter)
o Spam filtering, chatbot training, etc.

🧠 8. Challenges in Textual Data Analysis

• Noise in text: slang, abbreviations, misspellings.

• Context understanding: The same word can carry different meanings depending on context (e.g., sarcasm or words with multiple senses).
• Language diversity: Multilingual or regional variations.

🧪 9. Introduction to Textual Analysis Using R

• Popular packages:
o tm (Text Mining),
o tidytext (tidy text processing),
o textclean,
o syuzhet (for sentiment analysis)

Example:

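library(tidytext)
library(dplyr)

# Load the stop-word list that ships with tidytext
data("stop_words")

# my_data and text_column are placeholders for your data frame and its text column.
# Split the text into one word per row, then drop the stop words.
tokens <- my_data %>%
  unnest_tokens(word, text_column) %>%
  anti_join(stop_words, by = "word")
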
🧰 10. Methods and Techniques of Textual Analysis
Method | Description
Text Mining | Extracting patterns from large text data (keywords, frequency, word clouds)
Categorization | Classifying text into predefined categories (e.g., spam vs. non-spam)
Sentiment Analysis | Detecting positive, negative, or neutral emotion in text
