Data Analysis

Uploaded by

Nermine Limeme

Wejden zormati

Supervised Learning
In supervised learning, the algorithm is provided with labeled training data, and the goal is to
learn a mapping from inputs (features) to outputs (labels or target values).
Rule of thumb: if the task is to predict a known output, you are doing supervised learning.
Characteristics:
● Labeled Data: The training data includes both the input data and the correct output. This
allows the algorithm to learn from the provided examples.
● Prediction Goal: The primary objective is to make accurate predictions for new, unseen
data based on what the model has learned from the training data.
● Feedback Loop: The algorithm can adjust itself based on the error of its predictions on
the training data.
Example: you can predict whether a telecom customer is planning to cancel their line and switch providers, based on their information (chart analysis). You can also plan a menu based on customers' previous receipts.
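A minimal sketch of the supervised idea, assuming a toy churn dataset with two hypothetical features (monthly bill and number of support calls) and a 1-nearest-neighbour rule as the learner. The data and feature names are invented for illustration and are not taken from the course. The labels in the training data are what make this supervised: a new customer receives the label of the most similar known customer.

```python
import math

# Hypothetical labeled training data: (monthly_bill, support_calls) -> churned?
training_data = [
    ((20.0, 0), False),
    ((25.0, 1), False),
    ((80.0, 5), True),
    ((95.0, 7), True),
]

def predict_churn(features):
    """1-nearest-neighbour rule: return the label of the closest training example."""
    _, label = min(training_data, key=lambda item: math.dist(item[0], features))
    return label

print(predict_churn((90.0, 6)))  # nearest known customer churned
print(predict_churn((22.0, 0)))  # nearest known customer stayed
```

In practice the features would be rescaled first, since here the monthly bill dominates the distance.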
Unsupervised Learning
In unsupervised learning, the algorithm is given data without explicit instructions on what to do
with it. The system tries to learn the patterns and the structure from the data without any labeled
responses to guide the learning process.
Characteristics:
● Unlabeled Data: The training data doesn't have any labels or target values.
● Discovery Goal: The primary objective is to identify patterns, relationships, or structures
in the data.
● No Feedback: There's no feedback loop based on prediction error, as there are no
"correct" answers.
Sampling:
- Random selection + growing sample size
Probability sampling techniques (there is a sampling frame)
Probability sampling is a technique wherein every member of the population has a known,
non-zero chance of being selected.
● Simple Random Sampling (SRS): Every individual has an equal probability of being
selected.
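A minimal sketch of simple random sampling, assuming a hypothetical frame of 100 numbered customers. Python's `random.sample` draws without replacement, so each unit of the frame has the same probability n/N of being selected.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame; each has probability n/N."""
    rng = random.Random(seed)   # seeded generator so the draw can be reproduced
    return rng.sample(frame, n)

frame = [f"customer_{i}" for i in range(1, 101)]  # hypothetical frame, N = 100
sample = simple_random_sample(frame, n=10, seed=42)
print(sample)
```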
Non-probability sampling (there is no sampling frame; the sample must be big enough)
The Questionnaire:

● Definition: A questionnaire is a research instrument consisting of a series of questions. Its main objective is to gather information from respondents.
● Advantages: Common advantages include cost-effectiveness, ease of distribution, and the ability to collect data from a large number of respondents in a relatively short time.

Generalities about Sampling:

This section covers the foundational concepts of sampling: the distinction between sampling and a full census, the scenarios where each is most applicable, and the justification for sampling, i.e., why researchers often prefer surveying a sample to collecting data from the whole population.

Probability Sampling Techniques:

This section presents the standard probability sampling methods, with the advantages, disadvantages, use cases, and potential pitfalls of each.

Some of the methods mentioned include:

● Simple Random Sampling (SRS)

● Systematic Sampling: after a random start, select every k-th individual of the frame, with step k = N/n (N = population size, n = sample size).
● Sampling with Probability Proportional to the Unit's Size: units with a larger size have a greater chance of being included in the sample.
● Stratified Random Sampling: The population is subdivided into strata (relatively homogeneous groups) which are mutually exclusive. From each stratum the same proportion of individuals is drawn, so the sampling rate is the same in all strata.
● Disproportionate Stratified Sampling: The only difference between proportionate and disproportionate stratified random sampling is their sampling fractions. With disproportionate sampling, the different strata have different sampling fractions.
● Cluster Sampling: the population is divided into clusters; a random sample of clusters is drawn and all units in the selected clusters are surveyed.
● Multistage Sampling: Similar to cluster sampling, except that in this case a sample is taken from each selected cluster. There are at least two stages. The first stage selects large clusters (primary units). In the second stage, within each selected cluster, the units (secondary units) that will be part of the sample are selected.
● … is a sampling method that ensures the representativeness of a sample.
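Two of the methods above can be sketched in a few lines, assuming a hypothetical frame of 100 units and two hypothetical strata ("urban" and "rural"). The systematic sketch rounds the step k = N/n down to an integer when N is not a multiple of n; the stratified sketch applies the same sampling rate in every stratum (proportionate allocation).

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select every k-th unit after a random start, with step k = N // n."""
    k = len(frame) // n                          # sampling step K = N / n, rounded down
    start = random.Random(seed).randrange(k)     # random start between 0 and k - 1
    return frame[start::k][:n]

def proportionate_stratified_sample(strata, rate, seed=None):
    """Draw a simple random sample with the same sampling rate in every stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(units, round(len(units) * rate))
        for name, units in strata.items()
    }

frame = list(range(1, 101))
print(systematic_sample(frame, n=10, seed=1))   # 10 units, spaced 10 apart

strata = {"urban": list(range(60)), "rural": list(range(60, 100))}
stratified = proportionate_stratified_sample(strata, rate=0.1, seed=1)
print({name: len(units) for name, units in stratified.items()})
```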

Non-Probability Methods:

Unlike probability sampling techniques, where each member of the population has a known, non-zero chance of being selected, non-probability sampling does not give every member a known chance of selection. Methods include Judgmental Sampling, the Quota Method (the interviewer selects respondents so that each category is represented according to its importance in the population), and Route Sampling (the interviewer follows set directions and locations). These methods are used in specific situations, for example when no sampling frame is available.
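A minimal sketch of the quota method, assuming hypothetical quotas by gender (3 women and 2 men, for example in a population that is 60% women). Unlike the probability methods, there is no random draw: respondents are accepted in the order the interviewer meets them until each quota is filled.

```python
def quota_sample(respondents, quotas):
    """Accept respondents in the order they are met until every quota is filled.

    respondents: iterable of (respondent_id, category) pairs, in meeting order.
    quotas: dict mapping each category to the number of respondents wanted.
    """
    remaining = dict(quotas)           # copy so the caller's quotas are untouched
    selected = []
    for respondent_id, category in respondents:
        if remaining.get(category, 0) > 0:
            selected.append(respondent_id)
            remaining[category] -= 1
        if not any(remaining.values()):
            break                      # all quotas filled: stop interviewing
    return selected

# Hypothetical quotas for n = 5 in a population that is 60% women, 40% men.
quotas = {"F": 3, "M": 2}
met = [(1, "M"), (2, "M"), (3, "M"), (4, "F"), (5, "F"), (6, "M"), (7, "F"), (8, "F")]
selected = quota_sample(met, quotas)
print(selected)
```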
Sample Adjustment:

It's important in sampling to ensure the sample accurately represents the broader population. Samples can be adjusted in several ways: according to quantitative variables, by weighting, or by other methods that restore representativeness.


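● Adjustment according to a quantitative variable:
$$\hat{Y}_q = \bar{y}\,\frac{\mu}{\bar{x}}$$
- $\hat{Y}_q$ = the adjusted estimate of the average turnover
- $\bar{y}$ = the average turnover in the sample
- $\mu$ = the average number (known population mean of the auxiliary variable)
- $\bar{x}$ = the sample mean of that auxiliary variable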
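The adjustment can be checked numerically. A minimal sketch, assuming the auxiliary variable is a hypothetical number of employees per firm whose population average μ is known; all figures are invented.

```python
def ratio_adjusted_mean(y_sample, x_sample, mu):
    """Ratio adjustment on an auxiliary variable x with known population mean mu:
    Y_q = y_bar * mu / x_bar."""
    y_bar = sum(y_sample) / len(y_sample)   # sample average turnover
    x_bar = sum(x_sample) / len(x_sample)   # sample mean of the auxiliary variable
    return y_bar * mu / x_bar

turnover = [100, 200, 300]   # hypothetical turnover of 3 sampled firms
employees = [10, 20, 30]     # hypothetical auxiliary variable (employees per firm)
mu_employees = 15            # hypothetical known population average
print(ratio_adjusted_mean(turnover, employees, mu_employees))  # 150.0
```

Here the sample over-represents large firms (x̄ = 20 against μ = 15), so the raw average turnover of 200 is scaled down to 150.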

Adjustment
By deleting: to match the characteristics of the reference population, we can randomly select some observations from the over-represented categories of the sample and delete them.
By bootstrapping: to match the characteristics of the reference population, we can create a new bootstrap sample (drawn with replacement) from the available sample.
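Both adjustments can be sketched as follows, assuming a hypothetical sample in which women are under-represented (30 of 100) while the reference population is 50% women. Deletion keeps a random subset of each category; the bootstrap redraws units with replacement within each category until the target proportions are reached. The helper names are illustrative, not from the course.

```python
import random
from operator import itemgetter

def group_by_category(sample, key):
    """Split the sample into lists of observations, one per category."""
    groups = {}
    for obs in sample:
        groups.setdefault(key(obs), []).append(obs)
    return groups

def adjust_by_deleting(sample, key, target_shares, seed=None):
    """Randomly delete observations so each category's share matches the
    reference population, keeping the largest possible sample."""
    rng = random.Random(seed)
    groups = group_by_category(sample, key)
    # Largest total size for which no category needs more units than it has.
    total = min(len(groups.get(c, [])) / share for c, share in target_shares.items())
    adjusted = []
    for c, share in target_shares.items():
        adjusted += rng.sample(groups.get(c, []), round(total * share))
    return adjusted

def adjust_by_bootstrap(sample, key, target_shares, size, seed=None):
    """Build a new sample of `size` units by drawing with replacement inside
    each category, in the proportions of the reference population."""
    rng = random.Random(seed)
    groups = group_by_category(sample, key)
    adjusted = []
    for c, share in target_shares.items():
        adjusted += rng.choices(groups[c], k=round(size * share))
    return adjusted

sample = [("F", i) for i in range(30)] + [("M", i) for i in range(70)]
population_shares = {"F": 0.5, "M": 0.5}
by_gender = itemgetter(0)

deleted = adjust_by_deleting(sample, by_gender, population_shares, seed=0)
boot = adjust_by_bootstrap(sample, by_gender, population_shares, size=100, seed=0)
print(len(deleted), len(boot))
```

Deletion shrinks the sample (here to 60 units), while the bootstrap keeps the desired size at the cost of repeating some observations.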
Chapter 2: Bivariate Analysis
Imagine you're at a dance party. You're not just dancing alone; you have a partner. How
you move, whether you're in sync, and the space between you two can tell a lot about
your partnership. That's bivariate analysis - the dance of two variables!

1️⃣ Statistical Characters: Dependence and Interdependence

● Dependency Relationships: Think of it as a one-way street: how one thing (like age) affects another (like brand preference).
● Interdependence Relationships: Now, it's a roundabout where both variables
affect each other, like a back-and-forth dialogue.

2️⃣ Pearson Correlation Analysis

● This is about the dance rhythm! If you and your partner (two variables) move together in perfect sync, that's a strong correlation.
● Pearson's r measures the strength and direction of the linear relationship; it always lies between -1 and +1:
○ Close to +1: near-perfect sync in the same direction (strong positive correlation)
○ Close to -1: near-perfect sync but in opposite directions (strong negative correlation)
○ Around 0: you two are doing your own thing (no linear correlation)
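A minimal sketch of how r is computed, r = cov(x, y) / (s_x · s_y), on hypothetical toy series. Since Python 3.10 the standard library also offers `statistics.correlation`, which computes the same coefficient.

```python
import math

def pearson_r(x, y):
    """Pearson's linear correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)   # the 1/n factors cancel out

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))        # perfect positive sync
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))        # perfect opposite sync
print(pearson_r([1, 2, 3, 4, 5], [2, 3, 5, 3, 2]))  # rises then falls
```

The last pair shows why r = 0 means "no linear correlation" rather than "no relationship": y clearly depends on x, but the up-then-down pattern cancels out.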

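Hypothesis testing for a risk level α

● The tests below follow the same logic: state a null hypothesis H0 (for example, "the two variables are independent"), compute a test statistic and its p-value, and reject H0 when the p-value is below α, the accepted risk of rejecting H0 when it is actually true (often 5%).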

3️⃣ Cross Table Analysis and Chi-Square Test

● Imagine you're trying to find out if wearing sunglasses at the party (one variable)
is related to dance style (another variable). The Chi-Square test helps you see if
the variables are busting moves independently or in a pattern.
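A minimal sketch of the chi-square test of independence on a hypothetical 2×2 cross table (sunglasses yes/no × two dance styles; the counts are invented). The statistic sums (observed − expected)² / expected, where the expected counts assume independence. A 2×2 table has 1 degree of freedom, and for that case the p-value can be computed with the standard library's `math.erfc`. No continuity (Yates) correction is applied here.

```python
import math

def chi_square_statistic(table):
    """Chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # expected count under independence
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

def p_value_one_dof(chi2):
    """P(X > chi2) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

#          salsa  hip-hop
table = [[30, 10],   # wears sunglasses
         [20, 40]]   # no sunglasses
chi2, dof = chi_square_statistic(table)
p = p_value_one_dof(chi2)
alpha = 0.05
print(round(chi2, 2), dof, p < alpha)   # reject independence if p < alpha
```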
4️⃣ Comparison Test of Two Averages

● This is like having a dance-off between two groups to see if they're really
different. For example, do groups with different DJs (categories) really have
different energy levels (means)?
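A minimal sketch of the comparison of two means using Welch's t statistic, a variant that does not assume equal variances; the course may use the pooled Student version instead. The energy scores for the two DJ groups are hypothetical. The statistic is then compared with the t-table value at risk level α for the computed degrees of freedom.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom for H0: mean_a = mean_b."""
    va = variance(sample_a) / len(sample_a)   # squared standard error of mean_a
    vb = variance(sample_b) / len(sample_b)   # squared standard error of mean_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1))
    return t, dof

dj_a = [7, 8, 9, 8, 8]   # hypothetical energy scores, group with DJ A
dj_b = [5, 6, 7, 6, 6]   # hypothetical energy scores, group with DJ B
t, dof = welch_t(dj_a, dj_b)
print(round(t, 2), round(dof, 1))
```

Here t ≈ 4.47 with 8 degrees of freedom. Since |t| exceeds the two-sided 5% critical value of about 2.306 for 8 df, the difference in means would be judged significant at α = 5%.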

5️⃣ Analysis of Variance (ANOVA)

● Now, it's a multi-group dance-off! ANOVA checks if there are real differences in
the performance (means) of more than two dance crews (groups/categories).
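A minimal sketch of one-way ANOVA: the F statistic compares the variance between group means with the variance within groups. The three crews' scores are hypothetical; F is then compared with the F-table value at risk level α for (k − 1, n − k) degrees of freedom, where k is the number of groups and n the total number of observations.

```python
from statistics import mean

def one_way_anova(*groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f_stat, (k - 1, n - k)

crew_a = [6, 7, 8]      # hypothetical performance scores
crew_b = [9, 10, 11]
crew_c = [12, 13, 14]
f_stat, dfs = one_way_anova(crew_a, crew_b, crew_c)
print(f_stat, dfs)
```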

🎯 In Summary
Bivariate analysis is like being a dance judge. You're watching the interaction of two
dancers (variables) and deciding:
● Are they in sync or not? (Correlation)
● Is their performance related to other factors like dance style or the song playing?
(Chi-Square Test)
● How do different pairs or groups compare? (t-Tests and ANOVA)

🔍 We use different 'judging criteria' (statistical tests) based on what we're trying to find
out. And that, dear students, is the art of understanding the dance floor of data through
Bivariate Analysis! 🕺💃
Chapter 3: Normalized Principal Components Analysis

Initial Data
Imagine you're an artist with a vast canvas made up of tiny squares. Each square can be
filled with colors representing different data points (like happiness, wealth, population, etc.),
and each row of squares is a detailed portrait of a country.

In technical terms, you have a matrix where columns represent quantitative variables, and
rows are statistical units.
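A minimal NumPy sketch of this starting point, assuming a small hypothetical matrix (5 countries × 3 variables; the values are invented). "Normalized" PCA first centers and reduces each column, which makes Z'Z / n equal to the correlation matrix of the variables.

```python
import numpy as np

# Hypothetical data matrix: rows = statistical units (countries),
# columns = quantitative variables (happiness score, wealth index, population in millions).
X = np.array([
    [7.2, 45.0, 10.5],
    [6.1, 30.0, 60.2],
    [5.4, 12.0, 95.0],
    [7.8, 52.0, 5.3],
    [6.6, 38.0, 33.1],
])

# Center and reduce each column (mean 0, standard deviation 1) so that
# variables measured in different units carry the same weight.
Z = (X - X.mean(axis=0)) / X.std(axis=0)   # std with divisor n (NumPy's default ddof=0)

# With this scaling, Z'Z / n is exactly the correlation matrix of the variables.
R = Z.T @ Z / len(Z)
print(np.round(R, 3))
```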

Technical Objectives

As a treasure hunter, you have a detailed map and your goal is to:
● Simplify the map by finding the most straightforward path.
● Group similar landmarks together and identify unique ones that stand out.
● Understand the relationships between different paths and landmarks.

In data analysis, this means reducing data dimensions, clustering similar data points, and analyzing variable relationships.
Case Studies and Methodologies

Imagine you're now an architect studying various building designs. You use special lenses
(mathematical tools and projections) to view the buildings from different angles,
understanding their structure more clearly.

These slides cover specific case studies (such as analyzing consumer perceptions) and the technical methods (such as orthogonal projections) used for the analysis.
Individuals Scatter Plot Analysis

Back to being an artist, but this time, you're critiquing art (data points). You realize that
certain colors (variables) are more dominant and help in understanding the overall painting
better.

This slide explains how certain variables give a better understanding of data dispersion, and why visual representation matters in data analysis.
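A minimal sketch of where the individuals' points come from in normalized PCA, assuming the same kind of hypothetical data matrix (rows = countries, columns = variables). The correlation matrix is diagonalized, the axes are sorted by decreasing eigenvalue, and each standardized row is projected onto them. The sign of each axis is arbitrary, so a plot may come out mirrored.

```python
import numpy as np

def pca_individuals(X):
    """Normalized PCA: diagonalize the correlation matrix and project the
    standardized individuals onto the principal axes."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    eigenvalues, eigenvectors = np.linalg.eigh(Z.T @ Z / len(Z))  # ascending order
    order = np.argsort(eigenvalues)[::-1]        # largest variance first
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    coordinates = Z @ eigenvectors               # row i = individual i, column k = axis k
    return eigenvalues, coordinates

X = np.array([
    [7.2, 45.0, 10.5],
    [6.1, 30.0, 60.2],
    [5.4, 12.0, 95.0],
    [7.8, 52.0, 5.3],
    [6.6, 38.0, 33.1],
])
eigenvalues, coordinates = pca_individuals(X)
print(np.round(coordinates[:, :2], 2))   # points of the first factorial plane
```

The variance of the individuals' coordinates on axis k equals the eigenvalue λk, which is why the first axes show the widest dispersion.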

Variable Scatter Plot Analysis

Now, you're a photographer, adjusting your camera lens to get the perfect shot of different
birds (data points) in a forest (data set). The angle and focus (mathematical calculations
and projections) are crucial to understanding each bird's importance and role in the forest's
ecosystem.

These slides discuss how each point of the plot (here, each variable) is analyzed in relation to the others, highlighting the technical process behind it.
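A minimal sketch of the variable coordinates in normalized PCA, assuming a hypothetical data matrix. Each variable is placed at its correlations with the principal components, which equal √λk · ujk (ujk being the j-th entry of the k-th eigenvector). Because these correlations are bounded by 1, the variables fall inside the unit "correlation circle".

```python
import numpy as np

def pca_variable_coordinates(X):
    """Correlation between variable j and component k, equal to sqrt(lambda_k) * u_jk.
    Also returns the component scores so the identity can be checked."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    eigenvalues, eigenvectors = np.linalg.eigh(Z.T @ Z / len(Z))
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # clip guards against tiny negative eigenvalues caused by rounding
    var_coords = eigenvectors * np.sqrt(np.clip(eigenvalues, 0, None))
    return var_coords, Z @ eigenvectors

X = np.array([
    [7.2, 45.0, 10.5],
    [6.1, 30.0, 60.2],
    [5.4, 12.0, 95.0],
    [7.8, 52.0, 5.3],
    [6.6, 38.0, 33.1],
])
var_coords, components = pca_variable_coordinates(X)
print(np.round(var_coords[:, :2], 2))   # row j = variable j on the first two axes
```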
Understanding Variability

Imagine you're an explorer looking at different paths in a forest. Each path is unique, leading
to various destinations (outcomes). Your goal is to understand which paths (variables) have
the most influence on your journey (the system).

In technical terms, this is about understanding which variables contribute most to the
variability in your data. By identifying these, you can simplify complex multi-dimensional
data into more manageable forms, focusing on the most critical factors.
Dimension Reduction Techniques

Now, think of yourself as a gardener. Your garden is vast, filled with numerous types of
plants (variables). However, you need to make it simpler for visitors to tour. So, you decide
to organize the plants by certain similarities, grouping them (dimension reduction) and
creating specific paths (principal components) highlighting the most distinctive and
interesting plants.
In data analysis, this is what dimension reduction techniques do. They reduce the number of
variables to consider, making data interpretation more straightforward without significantly
losing important information.

Identifying Outliers

As a teacher in a classroom, you notice that while most students perform similarly, a few
perform exceptionally well or need extra help. These students stand out (outliers), and
understanding them can offer insights into what makes exceptional cases different and
perhaps what influences overall performance.

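This rate defines the explanatory power of the first k axes (or factors): it is the share of total variance accounted for by these k axes. In normalized PCA, with eigenvalues λ1 ≥ λ2 ≥ … ≥ λp, it equals (λ1 + … + λk) / p, because the eigenvalues sum to the number of variables p. However, its interpretation must take into account the number of variables and the number of individuals.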

In your data, outliers can provide valuable insights. They might indicate errors, or they might
reveal areas for innovation or improvement.
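The explanatory-power rate of the first k axes discussed in this section can be computed with a short sketch, assuming a hypothetical data matrix. It uses the fact that, in normalized PCA, the p eigenvalues of the correlation matrix sum to p.

```python
import numpy as np

def explained_rate(X, k):
    """Share of the total variance carried by the first k axes of a normalized PCA:
    (lambda_1 + ... + lambda_k) / p."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    eigenvalues = np.sort(np.linalg.eigvalsh(Z.T @ Z / len(Z)))[::-1]
    return eigenvalues[:k].sum() / X.shape[1]

X = np.array([
    [7.2, 45.0, 10.5],
    [6.1, 30.0, 60.2],
    [5.4, 12.0, 95.0],
    [7.8, 52.0, 5.3],
    [6.6, 38.0, 33.1],
])
for k in range(1, X.shape[1] + 1):
    print(k, round(explained_rate(X, k), 3))
```

The rate grows with k and reaches 1 when all p axes are kept.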

Correlation Between Variables

Imagine being at a dance where pairs of dancers move in sync. When two dancers
(variables) move closely together, they are highly correlated. When one moves
independently of the other, they are less correlated or uncorrelated.

Understanding the relationships between variables helps in predicting how changes in one
variable might affect another, which is crucial in planning and decision-making processes.

Slide 18: Visualization Techniques

You're a tourist using a map of a complex subway system. The map simplifies information,
helping you understand how to navigate the various routes. Similarly, data visualization
techniques allow complex data to become more understandable and accessible, aiding in
identifying patterns, trends, and insights that might be less obvious in a tabular format.

Slide 19: Interpretation and Decision Making

Back in the village, after your thorough investigation and research, you gather the villagers
(stakeholders) to discuss your findings. You interpret the data, highlight important insights,
and guide decision-making for the village’s future. This stage is crucial because the way
data is interpreted directly influences the decisions made and the actions taken.

Slide 20: Conclusion and Next Steps

As the sun sets, you conclude your adventure. You recap the journey, what you've learned,
and suggest the next steps. Maybe more paths need exploring, or perhaps the village will
start a new tradition based on your findings.

Similarly, you conclude your presentation by summarizing key points, reflecting on the
project’s significance, and possibly suggesting future research areas or strategies based on
your findings.
REVISE PCA (CHAPTER 3) FOR THE MIDTERM?
