Module 4.1.1 Data Management
- Practice of organizing and maintaining data processes to meet ongoing information lifecycle needs
Measures of central tendency

1. Mean
- The sum of all values in a dataset divided by the total number of values
pros: Can be used with both discrete and continuous data
cons: It is susceptible to the influence of outliers

1.1 Arithmetic Mean (population mean)
- Average of a complete set of data (population)
1.2 Sample Mean
- The average of a subset of data taken from a larger population (sample)
1.3 Weighted Mean
- An average computed by giving different weights to some of the individual values
Legend:
x = repeating value
w = number of occurrences (the weight)
x̄ (x with a line on top) = mean

2. Median
- Middle score for a set of data that has been arranged in order of magnitude
- Formula (location of median): position of the median = (n + 1)/2
Legend:
n = total number of data values in the sample
Note:
* if n is odd, the median is the middle value
* if n is even, the median is the average of the 2 middle values

3. Mode
- The most frequent score in the data (what value pops up the most)

Type of variable (based on level of measurement) | Best measure
nominal | mode
ordinal | median
interval/ratio (not skewed) | mean
interval/ratio (skewed) | median

Legend:
Nominal = data can be categorized
Ordinal = can be categorized and ranked
Interval = can be categorized, ranked, and evenly spaced
Ratio = can be categorized, ranked, evenly spaced, and has a natural zero
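The three measures (and the weighted mean) can be sketched with Python's standard-library statistics module; the dataset here is invented for illustration:

```python
import statistics

data = [4, 8, 6, 5, 3, 2, 8, 9, 4, 8]

# Mean: sum of all values divided by the number of values
mean = statistics.mean(data)        # sum(data) / len(data)

# Median: middle value of the sorted data (position (n+1)/2)
median = statistics.median(data)

# Mode: the value that occurs most often
mode = statistics.mode(data)

print(mean, median, mode)  # 5.7 5.5 8

# Weighted mean: each value contributes in proportion to its weight w
values  = [90, 80, 70]
weights = [3, 2, 1]   # e.g., units per subject (invented)
weighted_mean = sum(x * w for x, w in zip(values, weights)) / sum(weights)
print(weighted_mean)  # 500 / 6 = 83.33...
```

Note that the mode (8) is untouched by the outlier-prone mean, which matches the pros/cons above.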
Examples w/ explanation:
nominal: Ethnicity
- Can't be ranked
ordinal: Top 5 Olympic medalists
- Does not tell you how close or far they are in terms of number of wins
interval: Temperature in Celsius
- There are equal intervals of one degree, but the zero point is not a true zero since the measurement can reach negative degrees Celsius
ratio: Height
- It is non-negative and has a true zero

Legend:
xi = each individual value
μ = population mean
x̄ = sample mean
N = number of values in the population
n = number of values in the sample

Measures of variation
- Gives information on the spread or variability of the data values
- Although 2 datasets could have the same center, their variation could be very different

Steps on how to calculate (variance):
1. Calculate the mean
2. Subtract the mean from each value
3. Square each deviation
4. Add up all the squared deviations
5. Divide by the number of values
   5.1 if population, divide by N as is
   5.2 for sample, divide by n - 1
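The five steps, written out explicitly rather than calling statistics.pvariance / statistics.variance (the dataset is invented):

```python
def variance(data, sample=False):
    """Variance via the steps: mean -> deviations -> squares -> sum -> divide."""
    mean = sum(data) / len(data)                      # step 1
    squared_devs = [(x - mean) ** 2 for x in data]    # steps 2-3
    total = sum(squared_devs)                         # step 4
    # step 5: divide by N for a population, by n - 1 for a sample
    divisor = len(data) - 1 if sample else len(data)
    return total / divisor

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data))               # population variance: 4.0
print(variance(data, sample=True))  # sample variance: 32/7 ≈ 4.571
```

The n - 1 divisor (Bessel's correction) makes the sample variance slightly larger, compensating for estimating the mean from the same sample.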
1. Range
- Used for a quick, obvious comparison of data sets
- Formula: Range = highest value - lowest value
- Considered the weakest measure of spread
Note: heavily influenced by extreme values (outliers) and only compares 2 values

2. Variance (σ²)
- A measure of how far a set of data are dispersed from the mean
- It is non-negative since each term in the variance is squared
- All units are squared, ex: a set of weights in kg will be given in kg squared

3. Standard deviation (σ)
- Measures the deviation of data from its mean
- Step 6 of the calculation: take the square root of the variance

4. Coefficient of variation (CV)
- This kind of measure allows two or more distributions measured in the same or different units to be compared
- Shows how big the standard deviation is compared to the mean
- Formula: CV = σ / x̄ × 100%
Legend:
σ = standard deviation
x̄ = mean
Note: the lower the CV, the lesser the dispersion of the data values

5. Interquartile Range (IQR)
- Defines the difference between the third and the first quartile
- Formula: IQR = upper quartile - lower quartile = Q3 - Q1
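Range, standard deviation, and CV side by side, using the statistics module (the weights are invented example data):

```python
import statistics

data = [60, 62, 65, 70, 73]  # e.g., weights in kg

rng = max(data) - min(data)      # range: highest value - lowest value
sigma = statistics.pstdev(data)  # population standard deviation
mean = statistics.mean(data)
cv = sigma / mean * 100          # CV expressed as a percentage

print(rng)           # 13
print(round(cv, 1))  # ≈ 7.4 (a small CV: low dispersion relative to the mean)
```

Because CV is unitless (kg divided by kg), it can be compared against a CV computed on, say, heights in cm, which the raw standard deviations cannot.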
How to calculate the quartiles:
1. Arrange the data (smallest to largest)
2. Find Q2 (the median)
3. Find Q1 (median of the lower half of the data)
4. Find Q3 (median of the upper half of the data)
Note:
Q1 = 25% of the data falls below this value
Q2 = 50% of the data falls below this value
Q3 = 75% of the data falls below this value
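The median-of-halves procedure above can be sketched directly; conventions differ on whether the middle value joins the halves when n is odd, and this sketch excludes it (an assumption, since the notes don't specify):

```python
import statistics

def quartiles(data):
    """Q1, Q2, Q3 via the median-of-halves method."""
    s = sorted(data)                  # step 1: arrange the data
    n = len(s)
    q2 = statistics.median(s)         # step 2: the median
    lower = s[:n // 2]                # lower half (middle value excluded if n is odd)
    upper = s[(n + 1) // 2:]          # upper half
    q1 = statistics.median(lower)     # step 3
    q3 = statistics.median(upper)     # step 4
    return q1, q2, q3

q1, q2, q3 = quartiles([1, 3, 4, 7, 8, 9, 11])
print(q1, q2, q3, q3 - q1)  # 3 7 9 and IQR = 6
```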
Correlation (Pearson)
- Measure describing the way two variables vary together
- Statistic that measures the strength and direction of a linear relationship between two quantitative variables
- r represents the correlation coefficient

Size of Correlation | Interpretation
.90 to 1 (-.90 to -1) | Very high positive (negative) correlation
.70 to .90 (-.70 to -.90) | High positive (negative) correlation
.50 to .70 (-.50 to -.70) | Moderate positive (negative) correlation
.30 to .50 (-.30 to -.50) | Low positive (negative) correlation
.00 to .30 (.00 to -.30) | Negligible correlation

Module 4.4 Simple Regression
- A statistical tool used to quantify the relationship between a single independent variable and a single dependent variable, based on observations that have been carried out in the past
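Pearson's r can be computed from deviations about the two means; the study-hours data below is invented to illustrate the interpretation table:

```python
def pearson_r(x, y):
    """r = Σ(xi - x̄)(yi - ȳ) / sqrt(Σ(xi - x̄)² · Σ(yi - ȳ)²)"""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

hours  = [1, 2, 3, 4, 5]        # invented: hours studied
scores = [52, 60, 63, 70, 79]   # invented: exam scores
r = pearson_r(hours, scores)
print(round(r, 2))  # 0.99 -> very high positive correlation
```

On Python 3.10+ the same value is available as statistics.correlation(hours, scores).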
Formula for regression: y = a + bx
Legend:
y = dependent variable
x = independent variable
a = intercept (value of y when x = 0)
b = slope (change in y for every 1 unit increase in x)
----------------------------------------------------
Formula for slope b:
b = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²)
n = number of data pairs
----------------------------------------------------
Formula for intercept a:
a = ȳ - b·x̄
x̄ and ȳ (x and y with a line on top) represent the means of x and y
*Means must be calculated first before you can get the intercept
----------------------------------------------------
Interpretation of Regression
“For every 1 extra unit of the independent variable (x), the dependent variable (y) will increase by (b). If the independent variable (x) stays at 0, the predicted value will be (a).”

Hypothesis Testing
- A method of making statistical decisions using experimental data
- Used to test assumptions (claims) about a population parameter based on sample data

Alternative Hypothesis (H₁ or Ha)
- Represents what you aim to support or prove
- Indicates the presence of an effect, difference, or relationship
- Always contains inequality ( ≠, >, < )
- Purpose: proposed if H₀ is rejected
Examples:
● H₁: μ ≠ 50 (the mean is not 50)
● H₁: p < 0.7 (the proportion is less than 70%)
Note: Direction depends on the research question
● One-tailed: H₁ uses < or >
● Two-tailed: H₁ uses ≠

Steps in Hypothesis Testing
1. State H₀ and H₁
2. Choose a significance level (α, usually 0.05)
3. Collect and analyze sample data
4. Compute test statistic (e.g., z, t)
5. Compare with critical value or use p-value
6. Make a decision:
   ○ If p-value ≤ α → Reject H₀
   ○ If p-value > α → Fail to reject H₀
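The slope and intercept formulas for simple regression can be sketched as follows (the x/y pairs are invented, chosen to lie exactly on a line so the result is easy to check):

```python
def fit_line(x, y):
    """Least-squares slope b and intercept a for y = a + bx."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * (sum_x / n)  # a = ȳ - b·x̄ (means computed first)
    return a, b

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]   # exactly y = 1 + 2x
a, b = fit_line(x, y)
print(a, b)  # 1.0 2.0
```

Reading the result with the interpretation above: every extra unit of x raises the prediction by b = 2, and at x = 0 the predicted value is a = 1.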
Null Hypothesis (H₀)
- Represents the default or status quo assumption
- Assumes no effect, no difference, or no relationship between variables
- Always contains equality ( =, ≥, ≤ )
- Purpose: to test whether there's enough evidence against it
Examples:
● H₀: μ = 50 (the population mean is 50)
● H₀: p ≥ 0.7 (the population proportion is at least 70%)
Note: If evidence is strong, we reject H₀

Type I and Type II Errors
- Types of incorrect decisions that may occur in hypothesis testing
- Related to the truth or falsity of the null hypothesis (H₀) and what decision is made based on the data

Type I Error (α)
- Occurs when we reject the null hypothesis (H₀) even though it is actually true
- Also called a false positive
- We detect an effect that isn’t really there
Example:
- A person is diagnosed with a disease (reject H₀) but is actually healthy (H₀ is true)
Controlled by: Significance level (α)
– Common value: α = 0.05

Type II Error (β)
- Occurs when we fail to reject the null hypothesis (H₀) even though it is actually false
- Also called a false negative
- We fail to detect an effect that is really there
Example:
- A person is told they’re healthy (fail to reject H₀) but actually has the disease (H₀ is false)
Related to: Power of the test
– Power = 1 - β

Error Summary Table
Decision Made:
– Reject H₀ → may lead to a Type I Error (α) if H₀ is true
– Fail to Reject H₀ → may lead to a Type II Error (β) if H₀ is false
Legend:
– H₀ = Null Hypothesis
– α = Probability of Type I Error
– β = Probability of Type II Error
– Power = Probability of detecting a true effect (1 − β)
Note:
– Lowering α reduces Type I errors but increases the chance of Type II errors
– Increasing sample size helps reduce both errors

Binomial Distribution
– Used when there are repeated trials, each with two possible outcomes: success or failure
– The probability of success stays the same for each trial
– Each trial is independent (one doesn’t affect the others)
– Think: “yes or no” situations, repeated several times
Example:
– Flipping a coin 10 times and counting how many heads
– Guessing on a multiple-choice quiz with 5 questions
Conditions to use binomial:
– Fixed number of trials (n)
– Two outcomes: success/failure
– Constant probability (p)
– Trials are independent
Formula:
P(X = k) = C(n, k) · p^k · (1 − p)^(n − k), where C(n, k) = n! / (k!(n − k)!)

Poisson Distribution
– Used to count how often an event happens over time or space
– You don’t know the number of trials, but you know the average rate
– Events happen randomly and independently
– Best for rare events
Example:
– Number of emails received in an hour
– Cars arriving at a toll booth in a minute
– Number of errors in a book
Conditions to use Poisson:
– Events occur randomly and independently
– Happen at a constant average rate (λ)
– Based on time, area, volume, or distance

Z-Score
– A Z-score tells you how far a data point is from the mean, in terms of standard deviations
– Helps standardize different data values for comparison
– A positive Z-score means the value is above the mean
– A negative Z-score means the value is below the mean
– Formula: z = (x − μ) / σ
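A small sketch of the z-score calculation, following the four calculation steps given in these notes (the dataset is invented):

```python
import statistics

data = [50, 60, 70, 80, 90]
mu = statistics.mean(data)       # step 1: find the mean (70)
sigma = statistics.pstdev(data)  # step 2: find the standard deviation

def z_score(x):
    # steps 3-4: subtract the mean, then divide by the standard deviation
    return (x - mu) / sigma

print(z_score(90))  # positive: the value is above the mean
print(z_score(50))  # negative: the value is below the mean
```

Because both 90 and 50 sit the same distance from the mean, their z-scores are equal in size and opposite in sign.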
Steps to Calculate Z-score:
1. Find the mean of the data
2. Find the standard deviation
3. Subtract the mean from the data point
4. Divide the result by the standard deviation

Hypergeometric Distribution
– Used when sampling without replacement from a group
– The probability of success changes with each draw
– The trials are dependent on each other
– Useful when the population is small and known
Example:
– Drawing 5 cards from a deck without replacement and counting the number of kings
– Selecting students from a class and counting how many are seniors
– Picking colored balls from a bag and not putting them back
Conditions to use hypergeometric:
– Population is finite (N)
– Known number of successes in the population (K)
– Drawing a sample of size n without replacement
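The three distributions can be sketched with the standard library's math module (math.comb needs Python 3.8+). The notes only give the binomial formula explicitly; the Poisson and hypergeometric pmfs below are the standard ones, added here for completeness:

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k): k successes in n independent trials, success prob p each."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k): k events given an average rate lam (λ)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k successes when drawing n items without replacement
    from a population of N that contains K successes."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

# Flipping a fair coin 10 times: probability of exactly 5 heads
print(binomial_pmf(5, 10, 0.5))     # 0.24609375

# Averaging 4 emails per hour: probability of exactly 2 in an hour
print(round(poisson_pmf(2, 4), 4))  # 0.1465

# Drawing 5 cards from 52 (4 kings): probability of exactly 1 king
print(round(hypergeom_pmf(1, 52, 4, 5), 4))  # ≈ 0.2995
```

Note how the hypergeometric pmf draws from the population directly (counts of successes and failures), reflecting that each draw changes the remaining probabilities, while the binomial keeps p fixed.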