Maths

Uploaded by koraseeka midhun
Copyright © All Rights Reserved

MACHINE LEARNING STATISTICS

What is Statistics?
Statistics is the study of the collection, analysis, interpretation, presentation, and
organization of data. Put more simply, statistics is the science of analysing data. Sir Ronald
Aylmer Fisher is often called the father of modern statistics.

What is Statistics Used for?

• Statistics is used in all kinds of science and business applications.
• Statistics gives us more accurate knowledge, which helps us make better decisions.
• Statistics can focus on making predictions about what will happen in the future. It can
also focus on explaining how different things are connected.

NOTE:
Good statistical explanations are also useful for predictions.

Typical Steps of Statistical Methods

• Gathering data
• Describing and visualizing data
• Making conclusions
How is Statistics Used?
Statistics can be used to explain things in a precise way. You can use it to understand and
make conclusions about the group that you want to know more about. This group is called
the population.
A population could be many different kinds of groups. It could be:
• All of the people in a country
• All the businesses in an industry
• All the customers of a business
• All people who play football and are older than 45
Statistics - Data Types
There are two main types of data: Qualitative (or 'categorical') and quantitative (or
'numerical'). These main types also have different sub-types depending on their
measurement level.
Qualitative Data
Information about something that can be sorted into different categories that can't be
described directly by numbers.
Examples:
Brands, Nationality, Professions

Quantitative Data
Information about something that is described by numbers.
Examples:
Income, Age, Height

Types of Statistics
There are two main types of statistics:
• Descriptive Statistics
• Inferential Statistics
Descriptive Statistics
The information (data) from your sample or population can be visualized with graphs or
summarized by numbers. This will show key information in a simpler way than just looking at
raw data. It can help us understand how the data is distributed. Graphs can visually show
the data distribution.
Examples of graphs include:
• Histograms
• Pie charts
• Bar graphs
• Box plots

Descriptive statistics is broken down into central tendency and variability.

Central tendency is about measures of the centre:
• The Mean (the average value)
• The Median (the midpoint value)
• The Mode (the most common value)

The Mean
x̄ = ∑x / n
The mean value is the average of all values.
This data set contains 11 values:
7, 8, 8, 9, 9, 9, 10, 11, 14, 14, 15

To find the mean value, add all values and divide by the number of values.
The mean value is:
(7+8+8+9+9+9+10+11+14+14+15)/11 = 114/11 ≈ 10.3636
The mean is the sum divided by the count.

Example Code:
<!DOCTYPE html>
<html>
<body>
<h2>JavaScript Machine Learning</h2>
<p>Calculate the mean (average) value.</p>
<div id="demo"></div>
<script>
let mean = (7+8+8+9+9+9+10+11+14+14+15)/11;
document.getElementById("demo").innerHTML = mean;
</script>
</body>
</html>

The Median: Middle value in an ordered data set


A list of speed values:
99,86,87,88,111,86,103,87,94,78,77,85,86

The median is the value in the middle after the values are sorted:
77,78,85,86,86,86,87,87,88,94,99,103,111
There are 13 values, so the median is the 7th sorted value: 87.
Example Code:
<!DOCTYPE html>
<html>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjs/9.3.0/math.js"></script>
<body>
<h1>Machine Learning</h1>
<p>The Median is the mid point value:</p>
<div id="demo"></div>
<script>
const speed = [99,86,87,88,111,86,103,87,94,78,77,85,86];
let median = math.median(speed);
document.getElementById("demo").innerHTML = median;
</script>
</body>
</html>

If there is an even number of values, there are two numbers in the middle, and the median is
their sum divided by two:
77,78,85,86,86,86,87,87,88,94,99,103
(86 + 87) / 2 = 86.5
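As a cross-check of both median rules, here is a minimal sketch in Python (the document's other example language), assuming Python's standard `statistics` module instead of math.js. With 13 values it returns the 7th sorted value; with 12 values it averages the 6th and 7th.

```python
import statistics

# Odd number of values (13): the median is the 7th value after sorting
speed = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]
# Even number of values (12): the median is the mean of the 6th and 7th sorted values
speed_even = [77, 78, 85, 86, 86, 86, 87, 87, 88, 94, 99, 103]

print(statistics.median(speed))       # 87
print(statistics.median(speed_even))  # 86.5
```

`statistics.median` sorts the data itself, so the lists do not need to be sorted beforehand.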

The Mode
The mode is the value that appears the most times:
99,86,87,88,111,86,103,87,94,78,77,85,86
Here the mode is 86, which appears three times.

Example Code:
<!DOCTYPE html>
<html>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjs/9.3.0/math.js"></script>
<body>
<h1>Machine Learning</h1>
<p>The Mode is the most common value:</p>
<div id="demo"></div>
<script>
const speed = [99,86,87,88,111,86,103,87,94,78,77,85,86];
let mode = math.mode(speed);
document.getElementById("demo").innerHTML = mode;
</script>
</body>
</html>
Mode = value with the highest frequency
For example: {2, 3, 4, 2, 4, 6, 4, 7, 7, 4, 2, 4}
4 is the most frequent value in this data set (it appears five times).
Thus, the mode is 4.
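A small Python sketch, using the standard `collections.Counter` and `statistics.mode` (an illustrative choice, not the only way), that counts how often each value in the example set occurs and picks the most frequent one:

```python
from collections import Counter
import statistics

data = [2, 3, 4, 2, 4, 6, 4, 7, 7, 4, 2, 4]

# Count how often each value occurs
counts = Counter(data)
print(counts.most_common(1))  # [(4, 5)]: the value 4 appears 5 times
print(statistics.mode(data))  # 4
```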

STATISTICAL VARIABILITY (SPREAD)


Descriptive statistics is broken down into central tendency and variability.
Variability uses these measures:
• Min and Max (Range)
• Variance
• Standard Deviation (Dispersion)

1. Range
The range is a measure of how spread apart the values in a sample or data set are.
Range = Maximum value – Minimum value
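A minimal Python sketch of the range formula, assuming the speed values from the median example as the data set:

```python
speed = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 77, 85, 86]

# Range = maximum value - minimum value
value_range = max(speed) - min(speed)
print(value_range)  # 111 - 77 = 34
```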

The Variance
• In statistics, the variance is the average of the squared differences from the mean
value.
• In other words, the variance describes how far a set of numbers is spread out from
the mean (average) value.
• The mean value was described above.
We will first use the data set with 10 observations to give an example of how we can
calculate the variance:

NOTE: Variance is often represented by the symbol sigma squared: σ²

Step 1 to Calculate the Variance: Find the Mean


We want to find the variance of Average_Pulse.
1. Find the mean:
(80+85+90+95+100+105+110+115+120+125) / 10 = 102.5
The mean is 102.5
Step 2: For Each Value - Find the Difference From the Mean
Find the difference from the mean for each value:
80 - 102.5 = -22.5
85 - 102.5 = -17.5
90 - 102.5 = -12.5
95 - 102.5 = -7.5
100 - 102.5 = -2.5
105 - 102.5 = 2.5
110 - 102.5 = 7.5
115 - 102.5 = 12.5
120 - 102.5 = 17.5
125 - 102.5 = 22.5
Step 3: For Each Difference - Find the Square Value
Find the square value for each difference:
(-22.5)^2 = 506.25
(-17.5)^2 = 306.25
(-12.5)^2 = 156.25
(-7.5)^2 = 56.25
(-2.5)^2 = 6.25
2.5^2 = 6.25
7.5^2 = 56.25
12.5^2 = 156.25
17.5^2 = 306.25
22.5^2 = 506.25
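NOTE: We square the differences so that negative and positive differences do not cancel out.
The plain differences from the mean always add up to zero, so squaring is needed to measure
the total spread.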

Step 4: The Variance is the Average Number of These Squared Values


Sum the squared values and find the average:
(506.25 + 306.25 + 156.25 + 56.25 + 6.25 + 6.25 + 56.25 + 156.25 + 306.25 + 506.25) /
10 = 206.25
The variance is 206.25.
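The four steps above can be sketched in plain Python as follows; `pulse` is a hypothetical list name holding the 10 Average_Pulse values from the walkthrough. Dividing by n gives the population variance, which matches the default of NumPy's `np.var` (`ddof=0`); dividing by n - 1 instead would give the sample variance.

```python
# Average_Pulse values from the 10-observation example
pulse = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]

# Step 1: find the mean
mean = sum(pulse) / len(pulse)

# Step 2: difference from the mean for each value
diffs = [x - mean for x in pulse]

# Step 3: square each difference
squares = [d ** 2 for d in diffs]

# Step 4: the variance is the average of the squared differences
variance = sum(squares) / len(pulse)

print(mean)      # 102.5
print(variance)  # 206.25
```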

Use Python to Find the Variance of health_data


We can use the var() function from NumPy to find the variance (remember that we now use
the first data set with 10 observations):
Example:
import pandas as pd
import numpy as np

# Read the data set (all columns are assumed to be numeric)
health_data = pd.read_csv("HealthData.csv", header=0, sep=",")
# Population variance (divides by n) of each column
var = np.var(health_data, axis=0)
print(var)

Use Python to Find the Variance of Full Data Set


Here we calculate the variance for each column for the full data set:
Example:
import pandas as pd
import numpy as np

full_health_data = pd.read_csv("FHealthData.csv", header=0, sep=",")
# Population variance (divides by n) of each column
var = np.var(full_health_data, axis=0)
print(var)

Example 2:
This data set contains 11 values: 7, 8, 8, 9, 9, 9, 10, 11, 14, 14, 15

Example:
<html>
<body>
<h1>Machine Learning</h1>
<p>Calculate the Variance.</p>
<div id="demo"></div>
<script>
// Calculate the Mean (m)
let m = (7+8+8+9+9+9+10+11+14+14+15)/11;
// Calculate the Sum of Squares (ss)
let ss = (7-m)**2 + (8-m)**2 + (8-m)**2 + (9-m)**2 + (9-m)**2 + (9-m)**2 +
  (10-m)**2 + (11-m)**2 + (14-m)**2 + (14-m)**2 + (15-m)**2;
// Calculate the Variance (population variance: divide by the count, 11)
let variance = ss / 11;
// Display the Variance
document.getElementById("demo").innerHTML = variance;
</script>
</body>
</html>
Standard Deviation
• Standard deviation is a measure of how spread out numbers are.
• The symbol is σ (Greek letter sigma).
• It is the square root of the variance: σ = √(σ²).
• Deviation is a measure of distance.
• It shows roughly how far, on average, the values are from the mean (the middle).
A mathematical function will have difficulty predicting precise values if the observations
are spread out. Standard deviation is a measure of uncertainty. A low standard deviation
means that most of the numbers are close to the mean (average) value. A high standard
deviation means that the values are spread out over a wider range.
NOTE: Standard Deviation is often represented by the symbol Sigma: σ
Example:
import pandas as pd
import numpy as np

full_health_data = pd.read_csv("FHealthData.csv", header=0, sep=",")
# Population standard deviation (divides by n) of each column
std = np.std(full_health_data, axis=0)
print(std)
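The Python example reads a CSV file; as a self-contained check of σ = √variance, this sketch reuses the 10 Average_Pulse values from the variance walkthrough (variance 206.25), assuming the population definition used there.

```python
import math
import statistics

pulse = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]

variance = statistics.pvariance(pulse)  # population variance: 206.25
std = math.sqrt(variance)               # standard deviation = square root of the variance

print(round(std, 4))                                # 14.3614
print(math.isclose(std, statistics.pstdev(pulse)))  # True
```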

Example Code in JavaScript:
<!DOCTYPE html>
<html>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjs/9.3.0/math.js"></script>
<body>
<h2>JavaScript Machine Learning</h2>
<p>Calculate the Standard Deviation</p>
<div id="demo"></div>
<script>
// Calculate the Standard Deviation
const values = [7,8,8,9,9,9,10,11,14,14,15];
let std = math.std(values, "uncorrected");
document.getElementById("demo").innerHTML = std;
</script>
</body>
</html>
Coefficient of Variation
The coefficient of variation is used to get an idea of how large the standard deviation is
relative to the mean.

Mathematically, the coefficient of variation is defined as:

Coefficient of Variation = Standard Deviation / Mean
We can compute this in Python with the following code:
Example:
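import pandas as pd
import numpy as np

full_health_data = pd.read_csv("FHealthData.csv", header=0, sep=",")
# axis=0 on both calls: divide each column's standard deviation by that column's mean
cv = np.std(full_health_data, axis=0) / np.mean(full_health_data, axis=0)
print(cv)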
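A self-contained sketch of the same formula on the 10 Average_Pulse values from the variance walkthrough (an assumption for illustration; the code above uses the full CSV file). It uses the population standard deviation, consistent with the default of `np.std`.

```python
import statistics

pulse = [80, 85, 90, 95, 100, 105, 110, 115, 120, 125]

# Coefficient of variation = standard deviation / mean
cv = statistics.pstdev(pulse) / statistics.mean(pulse)
print(round(cv, 3))  # 0.14: the standard deviation is about 14% of the mean
```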

Outliers
Outliers are values that lie far "outside" the other values, such as 300 here:
99,86,87,88,111,86,103,87,94,78,300,85,86
Outliers can change the mean a lot. Sometimes we leave them out (they might be errors),
or we use the median or the mode instead.

Example:
<!DOCTYPE html>
<html>
<script src="https://cdnjs.cloudflare.com/ajax/libs/mathjs/9.3.0/math.js"></script>
<body>
<h1>Machine Learning</h1>
<p>Calculate the mean (average) value.</p>
<div id="demo"></div>
<script>
const values = [99,86,87,88,111,86,103,87,94,78,300,85,86];
let mean = math.mean(values);
document.getElementById("demo").innerHTML = mean;
</script>
</body>
</html>
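To illustrate why the median is often preferred when outliers are present, this Python sketch compares the mean and the median with and without the value 300 (removed by hand here purely for illustration, not by an outlier-detection rule):

```python
import statistics

values = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78, 300, 85, 86]
without_outlier = [v for v in values if v != 300]  # drop the suspected outlier

print(round(statistics.mean(values), 2))           # 106.92 (with the outlier)
print(round(statistics.mean(without_outlier), 2))  # 90.83 (without it)
print(statistics.median(values), statistics.median(without_outlier))  # 87 87.0
```

The outlier pulls the mean up by about 16, while the median stays at 87.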

Inferential Statistics
Inferential statistics are methods for quantifying properties of a population from a small
sample: you take data from a sample and make a prediction about the whole population.
For example, you can stand in a shop and ask a sample of 100 people if they like chocolate.
If 91 of them say yes, you could use inferential statistics to predict that about 91% of all
shoppers like chocolate.
Incredible Chocolate Facts
• Nine out of ten people love chocolate.
• 50% of the US population cannot live without chocolate every day.
• You use Inferential Statistics to predict whole domains from small samples
of data.

Hypothesis Testing
It is a method to check whether a claim about a population is supported by the data. More
precisely, it checks how likely the observed sample data would be if the hypothesis were true.
There are different types of hypothesis testing:
• A single group
• Comparing one group to another
• Comparing the same group before and after a change

Examples of claims or questions that can be checked with hypothesis testing:

• 90% of Australians are left-handed
• Is the average weight of dogs more than 40 kg?
• Do doctors make more money than lawyers?
Types of Inferential Statistics
Several types of inferential statistics are widely used today, including:
• One-sample test of difference / one-sample hypothesis test
• Confidence interval
• Contingency tables and the chi-square statistic
• T-test or ANOVA
• Pearson correlation
• Bivariate regression
• Multivariate regression
NUMBER THEORY
Number theory is a branch of mathematics that deals with the properties and relationships of
numbers. Its roots go back to ancient Mesopotamia around 1800 BC: the Babylonian clay tablet
Plimpton 322 lists Pythagorean triples.
Number System
A number system is a way of representing numbers on the number line using a set of symbols
and rules.

Types of Numbers:
PROBABILITY
Probability means possibility. It is a branch of mathematics that deals with the occurrence
of random events. Its value is expressed as a number from zero to one. Blaise Pascal is often
called the father of probability. Probability theory starts with basic concepts such as random
experiments, sample spaces, events, and the probability of events.

Terms in Probability

Probability of an Event
The probability of an event is a measure of the likelihood that the event will occur, expressed
as a number between 0 and 1. An event with a probability of 1 is certain to happen, while an
event with a probability of 0 is certain not to happen.
• Probability is about how likely something is to occur, or how likely something is true.
• Mathematically, probability is a number between 0 and 1.
• 0 indicates impossibility and 1 indicates certainty.
Different Types of Events in Probability
Equally Likely Events
Equally likely events are those whose probabilities of happening are equal. For example,
when we flip a fair coin, getting a head and getting a tail are equally likely.
Exhaustive Events
A set of events is exhaustive when together they cover the whole sample space, so at least
one of them must occur in every experiment.
Mutually Exclusive Events
Events that are mutually exclusive cannot occur at the same time. For instance, the weather
cannot be both hot and chilly at the same place and time.

Probability Formula
P(E) = Number of favourable outcomes / Total number of outcomes
where P(E) denotes the probability of an event E.
Probability Tree Diagram
A probability tree diagram is a graphical representation of the possible outcomes of a
sequence of events, with each branch labelled with its probability. It helps us see all the
possible outcomes of an event, which of them can occur, and how likely each one is.

Throwing Dice
When throwing a die, there are 6 possible outcomes: 1, 2, 3, 4, 5, and 6.

The probability of throwing 3 fours at the same time is

$\left(\frac{1}{6}\right)^3 = \frac{1}{216}$ (the probability of landing on 4, to the power of 3):

Example:
<!DOCTYPE html>
<html>
<body>
<h1>Machine Learning</h1>
<p>The possibility of throwing 3 fours at the same time is:</p>
<div id="demo"></div>
<script>
let p = Math.pow(1/6, 3);
document.getElementById("demo").innerHTML = p;
</script>
</body>
</html>
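The probability of throwing three alike (all three dice showing the same number) is 6 times
larger, because the dice can all land on 1, all on 2, ..., or all on 6:
(Lands on 1) + (Lands on 2) + ... + (Lands on 6) = $6 \times \left(\frac{1}{6}\right)^3 = \frac{1}{36}$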

Example:
<!DOCTYPE html>
<html>
<body>
<h1>Machine Learning</h1>
<p>The possibility of throwing 3 equal dices is:</p>
<div id="demo"></div>
<script>
let p = Math.pow(1/6, 3) * 6;
document.getElementById("demo").innerHTML = p;
</script>
</body>
</html>
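The dice probabilities above can be checked by brute force. This sketch (an illustrative approach using `itertools.product` and exact `fractions.Fraction` arithmetic, assuming fair dice) lists all 216 equally likely outcomes of three dice and counts the favourable ones:

```python
from fractions import Fraction
from itertools import product

# All 6 * 6 * 6 = 216 equally likely outcomes of throwing three dice
outcomes = list(product(range(1, 7), repeat=3))

three_fours = sum(1 for o in outcomes if o == (4, 4, 4))
three_alike = sum(1 for a, b, c in outcomes if a == b == c)

p_fours = Fraction(three_fours, len(outcomes))
p_alike = Fraction(three_alike, len(outcomes))
print(p_fours, float(p_fours))  # 1/216, about 0.00463
print(p_alike, float(p_alike))  # 1/36, about 0.0278
```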

6 Balls

• I have 6 balls in a bag: 3 are red, 2 are green, and 1 is blue.
• Blindfolded, what is the probability that I pick a green one?
• The number of ways it can happen is 2 (there are 2 greens).
• The number of outcomes is 6 (there are 6 balls).
• Probability = Ways / Outcomes
• The probability that I pick a green one is 2 out of 6: 2/6 ≈ 0.333333.
• The probability is written P(green) ≈ 0.333333.
Choosing a King

• The probability of choosing a king from a deck of cards is 4 in 52.
• The number of ways it can happen is 4 (there are 4 kings).
• The number of outcomes is 52 (there are 52 cards).
• Probability = Ways / Outcomes
• The probability is 4 out of 52: 4/52 ≈ 0.076923.
• The probability is written P(king) ≈ 0.076923.
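Both examples apply the same rule, Probability = Ways / Outcomes. A minimal sketch, where `probability` is a hypothetical helper name and `fractions.Fraction` keeps the result exact:

```python
from fractions import Fraction

def probability(ways: int, outcomes: int) -> Fraction:
    """P(E) = number of favourable outcomes / total number of outcomes."""
    return Fraction(ways, outcomes)

p_green = probability(2, 6)  # 2 green balls out of 6
p_king = probability(4, 52)  # 4 kings out of 52 cards

print(p_green, round(float(p_green), 6))  # 1/3 0.333333
print(p_king, round(float(p_king), 6))    # 1/13 0.076923
```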
Types of Probability
1. Theoretical Probability: It is based on reasoning about how likely an outcome is, without
performing an experiment (for example, P(head) = 1/2 for a fair coin).
2. Experimental Probability: It is based on the results of an actual experiment.
3. Axiomatic Probability: A set of rules (axioms) that apply to all types of probability is
established in axiomatic probability.
Probability Theorems

Bayes’ Theorem on Conditional Probability


Bayes' theorem (also known as Bayes' Rule or Bayes' Law) is used to determine the
conditional probability of event A given that event B has already occurred.
Bayes' theorem describes the probability of an event based on the occurrence of other,
related events: it connects P(A|B) with the reverse conditional probability P(B|A).
For example, let us assume you have two bags of marbles in various colours. Bag A has
three red and two blue marbles, while bag B has one red and four blue marbles. You pick
one of the bags at random, then you pick a marble at random from that bag. Given that you
selected a red marble, what is the probability that you selected bag A? Such probability is
called conditional probability.
Bayes Theorem Formula
$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)}$$
Where:
P(A|B) is the probability that event A happens given that B has happened.
P(B|A) is the probability that event B happens given that A has happened.
P(A) is the probability of event A occurring.
P(B) is the probability of event B occurring (it must be greater than 0).
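Applied to the marble example, take event A = "bag A was chosen" and event B = "a red marble was drawn", and assume each bag is chosen with probability 1/2. P(B) then comes from adding both ways of drawing a red marble (the law of total probability). A minimal sketch with exact fractions:

```python
from fractions import Fraction

# Each bag is picked with probability 1/2
p_bag_a = Fraction(1, 2)
p_bag_b = Fraction(1, 2)
p_red_given_a = Fraction(3, 5)  # bag A: 3 red, 2 blue
p_red_given_b = Fraction(1, 5)  # bag B: 1 red, 4 blue

# Total probability of drawing a red marble
p_red = p_red_given_a * p_bag_a + p_red_given_b * p_bag_b

# Bayes' theorem: P(bag A | red) = P(red | bag A) * P(bag A) / P(red)
p_a_given_red = p_red_given_a * p_bag_a / p_red
print(p_red, p_a_given_red)  # 2/5 3/4
```

So, given a red marble, there is a 3/4 probability that it came from bag A.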

1. Probability of Tossing Coin


Now let us look at coin tossing to understand probability better.
A) Tossing a Coin
A single coin, when flipped, has two possible outcomes: a head or a tail. Applying the
definition of probability, we can find the probability of getting a head or a tail.
The total number of possible outcomes = 2

Sample Space = {H, T} H: Head, T: Tail


P(H) = Number of Heads/ Total Number of outcomes = 1/2
P(T) = Number of Tails/ Total Number of outcomes = 1/2

B) Tossing Two Coins


There are a total of four possible results when tossing two coins. We can calculate the
probability of two heads or two tails using the formula.
The probability of getting two tails can be calculated as:

Total number of outcomes = 4


Sample Space = {(H,H),(H,T),(T,H),(T,T)}
P(2T) = P(0H) = Number of outcomes with two tails/ Total number of outcomes = 1/4
P(1H) = P(1T) = Number of outcomes with one head/ Total number of outcomes = 2/4 =
1/2

C) Probability of Tossing Three Coins


The total number of outcomes when tossing three coins is $2^3 = 8$. For these outcomes, we can
find the probability of various cases such as getting one tail, two tails, three tails, or no
tails; the probabilities for the number of heads can be calculated in the same way.
Total number of outcomes = $2^3 = 8$

Sample space = {(H, H, H), (H, H, T),(H, T, H), (T, H, H), (T, T, H), (T, H, T), (H, T, T), (T, T,
T)}.
P(3T) = P(0 H) = Number of outcomes with three tails/ Total Number of outcomes = 1/8
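The three-coin sample space can also be generated instead of written out by hand. A sketch using `itertools.product` (`p_tails` is a hypothetical helper name), assuming fair coins, that counts outcomes by the number of tails:

```python
from fractions import Fraction
from itertools import product

# Sample space for three coins: 2 ** 3 = 8 outcomes
sample_space = list(product("HT", repeat=3))

def p_tails(k):
    """Probability of getting exactly k tails with three fair coins."""
    favourable = sum(1 for outcome in sample_space if outcome.count("T") == k)
    return Fraction(favourable, len(sample_space))

for k in range(4):
    print(k, p_tails(k))  # 0 1/8, 1 3/8, 2 3/8, 3 1/8
```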

2. Probability of Rolling Dice


Various games use dice to decide the movements of the players. A die has six possible
outcomes. Some games are played using two dice. Now let us calculate the outcomes and their
probabilities for one die and two dice respectively.

Rolling One Die


The number of outcomes is 6 when a die is rolled and the sample space is = {1,2,3,4,5,6}.
Let us now discuss some cases.
P(Even Number) = Number of outcomes in which an even number occurs / Total outcomes = 3/6
= 1/2
P(Prime Number) = Number of prime number outcomes (2, 3, 5) / Total outcomes = 3/6 = 1/2

Rolling Two Dice


The total number of outcomes when two dice are rolled is $6^2 = 36$.
Let us check a few cases:
Probability of getting a doublet (the same number on both dice) = 6/36 = 1/6
Probability of getting a 3 on at least one die = 11/36
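A brute-force check of the one-die and two-dice results, assuming fair dice so that every outcome is equally likely:

```python
from fractions import Fraction
from itertools import product

one_die = range(1, 7)
p_even = Fraction(sum(1 for x in one_die if x % 2 == 0), 6)
p_prime = Fraction(sum(1 for x in one_die if x in (2, 3, 5)), 6)

two_dice = list(product(one_die, repeat=2))  # 6 ** 2 = 36 outcomes
p_doublet = Fraction(sum(1 for a, b in two_dice if a == b), len(two_dice))
p_three = Fraction(sum(1 for a, b in two_dice if 3 in (a, b)), len(two_dice))

print(p_even, p_prime, p_doublet, p_three)  # 1/2 1/2 1/6 11/36
```

The 11 outcomes with at least one 3 are the 36 outcomes minus the 5 × 5 = 25 outcomes with no 3 on either die.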

3. Probability of Cards
A standard deck has 52 playing cards, divided into four suits (clubs, diamonds, hearts, and
spades) with 13 cards in each suit. The symbols for the cards are listed below.
Probability Examples and Solutions
Here are some probability problems with their solutions.
Problem 1: There are 8 balls in a container: 4 are red, 1 is yellow, and 3 are blue. What is
the probability of picking a yellow ball?

Solution:
The probability is equal to the number of yellow balls in the container divided by the total
number of balls in the container, i.e. 1/8.

Problem 2: A die is rolled. What is the probability that an even number is obtained?
Solution:
When a fair six-sided die is rolled, there are six possible outcomes: 1, 2, 3, 4, 5, or 6.
Out of these, half are even (2, 4, 6) and half are odd (1, 3, 5). Therefore, the probability of
getting an even number is:
P(even) = number of even outcomes / total number of outcomes
P(even) = 3 / 6
P(even) = ½

Problem 3: A bag contains 4 white, 5 red, and 6 blue balls. Three balls are drawn
at random from the bag. What is the probability that all of them are red?
Solution:
Let S be the sample space.
Then, n(S) = number of ways of drawing 3 balls out of 15
= $^{15}C_3$ = 455
Let E = event of getting all 3 red balls.
n(E) = $^{5}C_3$ = 10
P(E) = n(E)/n(S) = 10/455 = 2/91.
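The counts in Problem 3 can be checked with Python's `math.comb` (available since Python 3.8), which computes nCr:

```python
from fractions import Fraction
from math import comb

n_s = comb(15, 3)  # ways to draw 3 of the 15 balls: 455
n_e = comb(5, 3)   # ways to draw 3 of the 5 red balls: 10
print(n_s, n_e, Fraction(n_e, n_s))  # 455 10 2/91
```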

Problem 4: In a class there are 10 girls and 15 boys. If 3 children are selected at random,
what is the probability that 1 girl and 2 boys are selected?

Solution:
Let S be the sample space.
Then, n(S) = number of ways of selecting 3 children out of 25
= $^{25}C_3$
= 2300.
Let E = event of selecting 1 girl and 2 boys.
n(E) = $^{10}C_1 \times {}^{15}C_2$ = 10 × 105 = 1050
P(E) = n(E)/n(S) = 1050/2300 = 21/46.
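The counts in Problem 4 can be checked with Python's `math.comb` (Python 3.8+): one girl is chosen from 10 and, independently, two boys from 15, so the two counts multiply.

```python
from fractions import Fraction
from math import comb

n_s = comb(25, 3)                # choose 3 of the 25 children: 2300
n_e = comb(10, 1) * comb(15, 2)  # 1 of 10 girls and 2 of 15 boys: 10 * 105 = 1050
print(n_s, n_e, Fraction(n_e, n_s))  # 2300 1050 21/46
```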

What is the Probability of an Impossible Event?


The probability of an impossible event is zero.
What is a Probability Density Function?
A Probability Density Function (PDF) describes the distribution of a continuous random
variable: the probability that the variable falls within a particular range of values is the
area under the PDF over that range.
What is a Probability Mass Function?
A Probability Mass Function (PMF) represents the probability distribution of a discrete random
variable, which can take on a finite or countably infinite number of possible values.

CALCULUS
What is Calculus?
Calculus is a branch of mathematics that deals with the study of rates of change. It was
developed independently by Isaac Newton and Gottfried Wilhelm Leibniz.
Calculus is commonly used in mathematical simulations to find the best solutions. It
focuses on core ideas such as limits, functions, differentiation, and integration.
Calculus is divided into two parts:
• Differential Calculus: used to determine the rate of change
• Integral Calculus: used to find a quantity from its known rate of change

Differential Calculus
• Differential calculus is used to calculate the rate at which a function changes with
respect to its variables.
• Derivatives are used to find a function's maximum and minimum values, which gives the
optimal answer.
• It primarily handles variables like x and y, functions like f(x), and the corresponding
changes in x and y.
• dy and dx are used to denote differentials.
• The process of differentiation allows us to compute derivatives. The derivative of a
function is written dy/dx or f′(x).
1. Limits
2. Derivatives
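Derivatives can also be approximated numerically. This sketch uses a central finite difference (a numerical approximation, not symbolic differentiation) on a hypothetical example function f(x) = x² - 4x + 1, whose exact derivative is f′(x) = 2x - 4. The derivative is zero at x = 2, where this function has its minimum, which illustrates the maxima/minima use of derivatives.

```python
def derivative(f, x, h=1e-5):
    """Approximate f'(x) with a central difference: (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def f(x):
    return x ** 2 - 4 * x + 1  # exact derivative: f'(x) = 2x - 4

print(round(derivative(f, 3.0), 6))     # 2.0 (exact value: 2*3 - 4 = 2)
print(abs(derivative(f, 2.0)) < 1e-6)   # True: f'(2) = 0 at the minimum of f
```

The step size `h` is a trade-off: too large and the approximation is coarse, too small and floating-point rounding dominates.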

Integral Calculus
The study of integrals and their properties is known as integral calculus. It is primarily useful
for:
• Computing f from f′ (i.e. from its derivative). If a function f is differentiable in the
range under consideration, then f′ is defined in that range.
• Finding the area under a curve.

Integration
Integration is the reverse process of differentiation. Differentiation breaks a quantity down
into smaller and smaller parts, while integration gathers tiny parts together to create a
whole. It is frequently applied to area calculations.
Definite Integral
A definite integral has specified limits within which it is computed. The lower and upper
limits of the function's independent variable are given, and the definite integral of f from
a to b is written $\int_a^b f(x)\,dx$.

Indefinite Integral
An indefinite integral has no fixed limits, i.e. there is no upper or lower limit. As a result,
the result of the integration is always followed by an arbitrary constant, C.
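A numerical sketch connecting the two kinds of integral, assuming the example function f(x) = x² (hypothetical, chosen for illustration). The indefinite integral of x² is x³/3 + C, so the definite integral from 0 to 3 is 27/3 - 0 = 9; the trapezoidal rule (a numerical approximation, not exact integration) should land very close to that value.

```python
def trapezoid(f, a, b, n=1000):
    """Approximate the definite integral of f from a to b with the trapezoidal rule."""
    h = (b - a) / n
    total = (f(a) + f(b)) / 2
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

def f(x):
    return x ** 2

# Exact value from the antiderivative x**3 / 3 (+ C): 3**3 / 3 - 0**3 / 3 = 9
approx = trapezoid(f, 0.0, 3.0)
print(round(approx, 4))  # 9.0
```

The constant C cancels out when the antiderivative is evaluated at the upper limit minus the lower limit, which is why a definite integral gives a single number.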

• Tangents and Normals
• Equation of Tangents and Normals
• Absolute Minima and Maxima
• Relative Minima and Maxima
• Concave Functions
