
AD3491 FUNDAMENTALS OF DATA SCIENCE AND ANALYTICS

II YEAR / IV SEMESTER (B.Tech- ARTIFICIAL INTELLIGENCE AND DATA SCIENCE)


UNIT – V
PREDICTIVE ANALYTICS

PREPARED BY

S.SANTHI PRIYA, M.E., (AP/ AI&DS)

VERIFIED BY

HOD PRINCIPAL CEO/CORRESPONDENT

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

SENGUNTHAR COLLEGE OF ENGINEERING, TIRUCHENGODE - 637 205.

UNIT V
PREDICTIVE ANALYTICS
 Linear Least Squares
 Implementation
 Goodness Of Fit
 Testing A Linear Model
 Weighted Resampling.
 Regression Using Statsmodels
 Multiple Regression
 Nonlinear Relationships
 Logistic Regression
 Estimating Parameters
 Time Series Analysis
 Moving Averages
 Missing Values
 Serial Correlation
 Autocorrelation.
 Introduction To Survival Analysis.

LIST OF IMPORTANT QUESTIONS

UNIT V
PREDICTIVE ANALYTICS

PART A (2 marks)

1. What is a linear least squares analysis?

2.What is the least squares of a linear regression?

3.What is the least squares formula?

4. What do you mean by goodness of fit test explain with examples?

5. Why is the goodness of fit test used?

6. How do you test a linear regression model?

7. What is meant by correlation?

8. Define the term positive correlation.

9. What is the meaning of perfect correlation?

10. What is Karl Pearson’s coefficient of correlation?

PART B(16 marks)

1. Explain the least square method.

2. Let's say we have data as shown below.

x 1 2 3 4 5

y 2 5 3 8 7

3.We will now calculate a chi-square statistic for a specific example. Suppose that we
have a simple random sample of 600 M&M candies with the following distribution:

 212 of the candies are blue.


 147 of the candies are orange.
 103 of the candies are green.
 50 of the candies are red.
 46 of the candies are yellow.
 42 of the candies are brown.

4. Calculate the chi-square value for the following data:

             Male                               Female

Full Stop    6 (observed), 6.24 (expected)      6 (observed), 5.76 (expected)

Rolling Stop 16 (observed), 16.12 (expected)    15 (observed), 14.88 (expected)

No Stop      4 (observed), 3.64 (expected)      3 (observed), 3.36 (expected)

5.Suppose that a marketing firm conducts a survey of 1,000 households to determine


the average number of TVs each household owns. The data show a large number of
households with two or three TVs and a smaller number with one or four. Every
household in the sample has at least one TV and no household has more than four.
Find the mean number of TVs per household.

Number of TVs per Household Number of Households

1 73

2 378

3 459

4 90

UNIT V
PART A(2-MARKS)

1.What is a linear least squares analysis?


In statistics and mathematics, linear least squares is an approach to fitting a
mathematical or statistical model to data in cases where the idealized value provided by the
model for any data point is expressed linearly in terms of the unknown parameters of the
model.
2. What is the least squares of a linear regression?

If the data shows a linear relationship between two variables, the line that best fits
this linear relationship is known as a least-squares regression line, which minimizes the
vertical distance from the data points to the regression line.
3.What is the least squares formula?

Least Square Method Formula


 Suppose we have to determine the equation of the line of best fit for the given
data; then we use the following formulas.
 The equation of the least square line is given by Y = a + bX.
 Normal equation for 'a': ∑Y = na + b∑X
 Normal equation for 'b': ∑XY = a∑X + b∑X²
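As an illustration, the two normal equations can be solved directly for a and b. A minimal sketch in Python; the sample x and y values are hypothetical, chosen only for demonstration:

```python
# Hypothetical sample data for illustration
x = [1, 2, 3, 4, 5]
y = [2, 5, 3, 8, 7]
n = len(x)

sum_x = sum(x)                                  # ∑X
sum_y = sum(y)                                  # ∑Y
sum_xy = sum(xi * yi for xi, yi in zip(x, y))   # ∑XY
sum_x2 = sum(xi * xi for xi in x)               # ∑X²

# Solve the two normal equations
#   ∑Y  = n·a + b·∑X
#   ∑XY = a·∑X + b·∑X²
# for a and b by Cramer's rule:
det = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_xy * sum_x) / det
b = (n * sum_xy - sum_x * sum_y) / det
```

For these particular values the line works out to Y = 1.1 + 1.3X.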

4. What do you mean by goodness of fit test explain with examples?


Goodness-of-fit tests are statistical methods that make inferences about
observed values. For instance, you can determine whether a sample group is truly
representative of the entire population. As such, they determine how actual values are
related to the predicted values in a model.

5. Why is the goodness of fit test used?


The chi-square goodness of fit test is conducted when the variable of interest in the
dataset is categorical. It is applied to determine whether sample data are consistent with a
hypothesized distribution.
6. How do you test a linear regression model?
A good way to examine regression results is to plot the predicted values
against the real values in the holdout set. In a perfect condition, we expect the
points to lie on the 45-degree line passing through the origin (whose equation is y = x). The
nearer the points are to this line, the better the regression.
7. What is meant by correlation?

The term correlation can be defined as the degree of interdependence between two
variables. Two variables are said to be correlated when a change in the value of one
variable leads to a change in the value of the other. The change may be in the same
direction (an increase in one leads to an increase in the other, or a decrease in one leads
to a decrease in the other) or in opposite directions (an increase in one leads to a
decrease in the other).

8. Define the term positive correlation.

When an increase in the value of one variable leads to an increase in the value of
the other variable and when the decrease in the value of one variable leads to the decrease
in the value of the other variable, the correlation between the two variables is said to be a
positive correlation.

9. What is the meaning of perfect correlation?

When the change in the values of two related variables is in the same direction and
same proportion, the correlation is called perfect positive correlation. The coefficient of
correlation, in this case, is +1. On the other hand, when the value of the two related
variables changes in the same proportion but in opposite direction, the correlation is called
a perfect negative correlation. The coefficient of correlation, in this case, is -1.

10. What is Karl Pearson’s coefficient of correlation?

This method of measuring correlation was given by Karl Pearson in 1896. It is
also known as the 'Pearson coefficient of correlation' or the 'product moment method of
correlation'. It is one of the most widely used mathematical methods of computing
correlation. It clearly explains the degree and direction of the relationship between two
variables. Karl Pearson's coefficient is denoted by 'r' and is computed on the basis of the
mean and standard deviation.

11. Write the three assumptions of Karl Pearson’s coefficient of correlation.

The three assumptions of Karl Pearson’s coefficient of correlation are:

 Normality.
 Cause and effect relationship.
 Linear relationship
12. Write the two merits and demerits of Spearman's rank difference method.

Merits: These are as follows:

 This method is easy to calculate and simple to understand as compared to Karl


Pearson’s method.
 This is the only method that can be used when the data is qualitative in nature.

13. Give any two merits and demerits of coefficient of concurrent deviation method.

Merits: The two merits are:

 It is the simplest of all the methods and it is also easy to understand.


 This method is specifically useful when the number of items is very large.
Demerits: The two demerits are:

 In this method, only the direction of change is studied and the magnitude of change
is completely ignored.
 It is a rough indicator of correlation.

14.What are regression lines?

The lines which give the best estimate of the value of one variable for any given
value of the other variable are called the ‘Lines of regression’ or ‘Regression lines’. In other
words, regression lines are used to predict the value of the dependent variable when the
value of the independent variable is known.

15. Write the three functions of regression lines.


The three functions of regression lines are:

1. They indicate the degree and direction of correlation.


2. They are useful in predicting the value of the dependent variable when the value of
the independent variable is known.
3. They are helpful in the calculation of mean value as the perpendiculars drawn at the
point where the two regression lines cut each other are the mean value of the two
variables.
16. Define the regression equation of X on Y.

The regression equation of X on Y describes the variations in the values of X for the
given changes in the values of Y. In other words, this equation is used for estimating or
predicting the value of X for a given value of Y. This equation is expressed as follows:

X = a + bY

When the coefficient of correlation (r) and the standard deviations (σX, σY) are given in the
question, it can be easily calculated as

X − X̄ = r (σX / σY)(Y − Ȳ)

17. Define the regression equation of Y on X.

The regression equation of Y on X describes the variations in the values of Y for the
given changes in the values of X. In other words, this equation is used for estimating or
predicting the value of Y for a given value of X. This equation is expressed as follows:

Y = a + bX

When the coefficient of correlation (r) and the standard deviations (σX, σY) are given in the
question, it can be easily calculated as

Y − Ȳ = r (σY / σX)(X − X̄)

18.Define the term regression coefficient.

The regression coefficient is an algebraic measure of the slope of the regression
lines. It is for this reason that it is also known as the 'slope coefficient'. Since there are two
regression equations, there are two regression coefficients.

19. What do you mean by partial correlation?

When we study the correlation among more than two variables, but in that study, we
only consider the inter-relationship between two variables, and the third variable is
assumed to be constant, the correlation is said to be a partial correlation.

20.What is weighted resampling?


In weighted sampling, each element is given a weight, where the probability of
an element being selected is based on its weight. In their work Efraimidis and Spirakis
presented an algorithm for weighted sampling without replacement.
21.What is the purpose of resampling?
Resampling is a methodology of economically using a data sample to improve the
accuracy and quantify the uncertainty of a population parameter.
22.What are resampling techniques?
Resampling is a series of techniques used in statistics to gather more
information about a sample. This can include retaking a sample or estimating its
accuracy. With these additional techniques, resampling often improves the overall accuracy
and estimates any uncertainty within a population.
23. What are the main types of resampling?
There are four main types of resampling methods: randomization, Monte Carlo,
bootstrap, and jackknife. These methods can be used to build the distribution of a statistic
based on our data, which can then be used to generate confidence intervals on a
parameter estimate.
24.What is the difference between resampling and bootstrapping?
Bootstrapping is any test or metric that uses random sampling with
replacement (e.g. mimicking the sampling process), and falls under the broader class
of resampling methods. Bootstrapping assigns measures of accuracy (bias, variance,
confidence intervals, prediction error, etc.) to sample estimates.
25.What are some real life examples of multiple regression?
For example, scientists might use different amounts of fertilizer and water on
different fields and see how it affects crop yield. They might fit a multiple linear
regression model using fertilizer and water as the predictor variables and crop yield as the
response variable.
26.What is regression give a real life example?
Formulating a regression analysis helps you predict the effects of the independent
variable on the dependent one. Example: we can say that age and height can be
described using a linear regression model. Since a person's height increases as age
increases, they have a linear relationship.

27. How is multiple regression used?

You can use multiple linear regression when you want to know: How strong the
relationship is between two or more independent variables and one dependent
variable (e.g. how rainfall, temperature, and amount of fertilizer added affect crop growth).
28. Comparison between Correlation and Regression

Basis: Meaning
Correlation: A statistical measure that defines the co-relationship or association of two variables.
Regression: Describes how an independent variable is associated with the dependent variable.

Basis: Dependent and independent variables
Correlation: No difference between the variables.
Regression: Both variables are different.

Basis: Usage
Correlation: To describe a linear relationship between two variables.
Regression: To fit the best line and estimate one variable based on another variable.

Basis: Objective
Correlation: To find a value expressing the relationship between variables.
Regression: To estimate values of a random variable based on the values of a fixed variable.

29. Write Correlation Coefficient Formula

Let X and Y be the two random variables.

The population correlation coefficient for X and Y is given by the formula:

ρXY = Cov(X, Y) / (σX σY)

Where,

ρXY = Population correlation coefficient between X and Y
μX = Mean of the variable X
μY = Mean of the variable Y
σX = Standard deviation of X
σY = Standard deviation of Y
E = Expected value operator
Cov = Covariance

The above formula can also be written as:

ρXY = E[(X − μX)(Y − μY)] / (σX σY)

The sample correlation coefficient formula is:

r = Σ(xi − x̄)(yi − ȳ) / √(Σ(xi − x̄)² Σ(yi − ȳ)²)

These formulas are used to find the correlation coefficient for the given data. Based on
the value obtained through them, we can determine how strong the association between
the two variables is.

30.Definition: logistic regression


A function that models the exponential growth of a population while also accounting
for factors such as the carrying capacity of the land is called the logistic function. It should
be remembered that the logistic function has an inflection point.

31. Write Logistic Function Equation

The general logistic function is f(x) = L / (1 + e^(−k(x − x₀))). The standard logistic
function is a logistic function with parameters k = 1, x₀ = 0, L = 1.

This reduces the logistic function to:

f(x) = 1 / (1 + e^(−x))

32. What is Time Series Analysis?


In order to evaluate the performance of a company, its past can be compared with
its present data. When comparisons of past and present data are made, the process is
known as time series analysis. Time series stretch over a period of time rather than
being confined to a shorter time period. Time series analysis draws its importance from its
ability to help predict the future: depending on past and present trends, time series can be
used to forecast what comes next.
33.Write Components of Time Series Analysis
The reasons or forces that change the attributes of a time series are known as the
Components of Time Series.

The following are the components of time series −

 Trend
 Seasonal Variations
 Cyclical Variations
 Random or Irregular Movements
34. What are parameter estimates?

Figure 1. parameter estimates


Parameters are descriptive measures of an entire population. However, their
values are usually unknown because it is infeasible to measure an entire population.
Because of this, you can take a random sample from the population to obtain parameter
estimates.
35. What are the 4 major moving averages?
The most popular simple moving averages include the 10-, 20-, 50-, 100- and 200-period
averages. Traders often use the smaller, faster moving averages as entry triggers and the
longer, slower moving averages as clear trend filters.
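A simple moving average is easy to sketch in Python; the function name and window length below are illustrative choices, not from the text:

```python
def simple_moving_average(values, window):
    """Average of each sliding window of the given length."""
    if window <= 0 or window > len(values):
        return []
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]
```

For example, a 3-period moving average of the series 1, 2, 3, 4, 5 yields 2.0, 3.0, 4.0: each output smooths three successive observations.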
36. Is autocorrelation the same as serial correlation?
Autocorrelation, also known as serial correlation, refers to the degree of
correlation of the same variables between two successive time intervals. The value of
autocorrelation ranges from -1 to 1. A value between -1 and 0 represents negative
autocorrelation. A value between 0 and 1 represents positive autocorrelation.

37. What is missing values in data science?


Missing data, or missing values, occur when you don't have data stored for
certain variables or participants. Data can go missing due to incomplete data entry,
equipment malfunctions, lost files, and many other reasons. In any dataset, there are
usually some missing data.
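One common way of handling missing values is to impute them with the mean of the observed values. A minimal sketch (representing missing entries as None is our assumption, not something the text specifies):

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]
```

Mean imputation keeps the sample mean unchanged, but other strategies (dropping rows, interpolation) may be preferable depending on why the data are missing.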

38.How do you calculate least squares?

Let us assume that the given points of data are (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ),
in which all x's are independent variables, while all y's are dependent ones. Also, let
f(x) be the fitting curve and let d represent the error or deviation from each given point.
The least-squares principle states that the best-fitting curve is the one for which the sum
of the squares of all the deviations from the given values is minimum.

39.How many methods are available for the Least Square?

There are two primary categories of least-squares method problems:

 Ordinary or linear least squares
 Nonlinear least squares

PART B (16- MARKS)

1. Explain the least square method.

Least Square Method

The least square method is the process of finding a regression line or best-fitted line for
any data set described by an equation. The method works by minimizing the sum of the
squares of the residuals, the offsets of the points from the curve or line, so that the trend
of the outcomes is found quantitatively. This kind of curve fitting arises in regression
analysis, and the least square method is the standard way of fitting an equation to derive
the curve.

Let us look at a simple example. Ms. Dolma tells her class, "Students who
spend more time on their assignments get better grades." A student wants to
estimate his grade for spending 2.3 hours on an assignment. Through the
least-squares method, it is possible to determine a predictive model that will help him
estimate his grade far more accurately. The method is simple because it requires
nothing more than some data and maybe a calculator.

In this section, we’re going to explore least squares, understand what it means, learn
the general formula, steps to plot it on a graph, know what are its limitations, and see what
tricks we can use with least squares.

Least Square Method Definition

The least-squares method is a statistical method used to find the line of best fit of the
form of an equation such as y = mx + b for the given data. The curve of the equation is
called the regression line. Our main objective in this method is to reduce the sum of the
squares of the errors as much as possible, which is why it is called the least-squares
method. The method is often used in data fitting, where the best-fit result is taken to be
the one that minimizes the sum of squared errors, each error being the difference between
an observed value and the corresponding fitted value. The sum of squared errors helps in
finding the variation in the observed data. For example, if we have 4 data points, using
this method we arrive at the following graph.

Figure 2 : Least Square Method

The two basic categories of least-square problems are ordinary or linear least squares and
nonlinear least squares.

Limitations for Least Square Method

Even though the least-squares method is considered the best method to find the line of
best fit, it has a few limitations. They are:

 This method exhibits only the relationship between the two variables. All other causes
and effects are not taken into consideration.
 This method is unreliable when data is not evenly distributed.
 This method is very sensitive to outliers. In fact, this can skew the results of the least-
squares analysis.

Least Square Method Graph

Look at the graph below: the straight line shows the potential relationship between
the independent variable and the dependent variable. The ultimate goal of this method is to
reduce the difference between each observed response and the response predicted by the
regression line; smaller residuals mean the model fits better. The fit is obtained by
reducing the residual of each point from the line. There are vertical residuals and
perpendicular residuals. Vertical residuals are mostly used in polynomial and hyperplane
problems, while perpendicular residuals are used in general, as seen in the image below.

Figure 3: Least Square Method graph

Least Square Method Formula

The least-square method finds the curve that best fits a set of observations with a minimum
sum of squared residuals or errors. Let us assume that the given points of data are (x₁, y₁),
(x₂, y₂), (x₃, y₃), …, (xₙ, yₙ), in which all x's are independent variables, while all y's are
dependent ones. This method is used to find a linear line of the form y = mx + b, where y
and x are variables, m is the slope, and b is the y-intercept. The formulas to calculate the
slope m and the value of b are given by:

m = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²)

b = (∑y − m∑x) / n

Here, n is the number of data points.

Following are the steps to calculate the least square using the above formulas.

 Step 1: Draw a table with 4 columns where the first two columns are for the x and y points.
 Step 2: In the next two columns, find xy and x².
 Step 3: Find ∑x, ∑y, ∑xy, and ∑x².
 Step 4: Find the value of slope m using the above formula.
 Step 5: Calculate the value of b using the above formula.
 Step 6: Substitute the value of m and b in the equation y = mx + b
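The steps above can be sketched in Python. This is a minimal illustration of the slope and intercept formulas; the function name is ours, not from the text:

```python
def least_squares_line(xs, ys):
    """Fit y = m*x + b by the least square method described above."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # Step 2/3: xy column and its sum
    sum_x2 = sum(x * x for x in xs)              # Step 2/3: x² column and its sum
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # Step 4
    b = (sum_y - m * sum_x) / n                                   # Step 5
    return m, b
```

For the data of the worked example that follows (x = 1…5, y = 2, 5, 3, 8, 7), this returns m = 1.3 and b = 1.1.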

2. Let's say we have data as shown below.

x 1 2 3 4 5

y 2 5 3 8 7

Solution: We will follow the steps to find the linear line.

x        y        xy       x²

1        2        2        1

2        5        10       4

3        3        9        9

4        8        32       16

5        7        35       25

∑x = 15  ∑y = 25  ∑xy = 88 ∑x² = 55

Find the value of m by using the formula,

m = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²)

m = [(5×88) − (15×25)] / [(5×55) − (15)²]

m = (440 − 375) / (275 − 225)

m = 65/50 = 1.3
Find the value of b by using the formula,

b = (∑y - m∑x)/n

b = (25 - 1.3×15)/5

b = (25 - 19.5)/5

b = 5.5/5 = 1.1

So, the required equation of least squares is y = mx + b = 1.3x + 1.1.

3. We will now calculate a chi-square statistic for a specific example. Suppose that
we have a simple random sample of 600 M&M candies with the following distribution:

 212 of the candies are blue.


 147 of the candies are orange.
 103 of the candies are green.
 50 of the candies are red.
 46 of the candies are yellow.
 42 of the candies are brown.

If the null hypothesis were true, then the expected counts for each of these colors would be
(1/6) x 600 = 100. We now use this in our calculation of the chi-square statistic.

We calculate the contribution to our statistic from each of the colors. Each is of the form
(Actual − Expected)² / Expected:

 For blue we have (212 − 100)²/100 = 125.44
 For orange we have (147 − 100)²/100 = 22.09
 For green we have (103 − 100)²/100 = 0.09
 For red we have (50 − 100)²/100 = 25
 For yellow we have (46 − 100)²/100 = 29.16
 For brown we have (42 − 100)²/100 = 33.64

We then total all of these contributions and determine that our chi-square statistic is
125.44 + 22.09 + 0.09 + 25 + 29.16 + 33.64 = 235.42.

Degrees of Freedom

The number of degrees of freedom for a goodness of fit test is simply one less than
the number of levels of our variable. Since there were six colors, we have 6 – 1 = 5 degrees
of freedom.
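The calculation above can be reproduced in a few lines of Python; the counts come from the example, while the dictionary layout is our own:

```python
observed = {"blue": 212, "orange": 147, "green": 103,
            "red": 50, "yellow": 46, "brown": 42}
expected = 600 / 6  # 100 candies per colour under the null hypothesis

# Sum of (Actual - Expected)^2 / Expected over the six colours
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())
degrees_of_freedom = len(observed) - 1  # 6 colours - 1 = 5
```

This reproduces the statistic of 235.42 with 5 degrees of freedom.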

4. Calculate the chi-square value for the following data:

             Male                               Female

Full Stop    6 (observed), 6.24 (expected)      6 (observed), 5.76 (expected)

Rolling Stop 16 (observed), 16.12 (expected)    15 (observed), 14.88 (expected)

No Stop      4 (observed), 3.64 (expected)      3 (observed), 3.36 (expected)
Solution:
Now calculate chi-square using the following formula:

χ² = ∑ (O − E)² / E

Calculate this formula for each cell, one at a time. For example, cell #1 (Male/Full Stop):

Observed number is: 6
Expected number is: 6.24

Therefore, (6 − 6.24)² / 6.24 = 0.0092

Continue doing this for the rest of the cells, and add the final numbers for each cell together
to get the final chi-square value. There are 6 total cells, so at the end you should be
adding six numbers together for your final chi-square value.
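The same cell-by-cell computation can be sketched in Python; the (observed, expected) pairs are taken from the table, with the Male/Full Stop expected count of 6.24 matching the worked cell in the solution:

```python
# (observed, expected) for the six cells: male then female in each row
cells = [(6, 6.24), (6, 5.76),      # Full Stop
         (16, 16.12), (15, 14.88),  # Rolling Stop
         (4, 3.64), (3, 3.36)]      # No Stop

# Sum (O - E)^2 / E over all six cells
chi_square = sum((o - e) ** 2 / e for o, e in cells)
```

The first cell contributes (6 − 6.24)²/6.24 ≈ 0.0092, and the six contributions total roughly 0.095.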

5. Suppose that a marketing firm conducts a survey of 1,000 households to


determine the average number of TVs each household owns. The data show a large
number of households with two or three TVs and a smaller number with one or four.
Every household in the sample has at least one TV and no household has more than
four. Find the mean number of TVs per household.

Number of TVs per Household    Number of Households

1                              73

2                              378

3                              459

4                              90

Solution:
As many of the values in this data set are repeated multiple times, you can easily
compute the sample mean as a weighted mean. Follow these steps to calculate the
weighted arithmetic mean:

Step 1: Assign a weight to each value in the dataset. Here the weights are the numbers
of households: 73, 378, 459 and 90.

Step 2: Compute the numerator of the weighted mean formula. Multiply each sample
value by its weight and then add the products together:

(1)(73) + (2)(378) + (3)(459) + (4)(90) = 73 + 756 + 1377 + 360 = 2566

Step 3: Now, compute the denominator of the weighted mean formula by adding the
weights together:

73 + 378 + 459 + 90 = 1000

Step 4: Divide the numerator by the denominator:

2566 / 1000 = 2.566

The mean number of TVs per household in this sample is 2.566.
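The weighted-mean calculation can be sketched in Python with the household counts as weights:

```python
households = {1: 73, 2: 378, 3: 459, 4: 90}  # TVs owned -> number of households

numerator = sum(tvs * count for tvs, count in households.items())  # weighted sum
denominator = sum(households.values())                             # total weight
mean_tvs = numerator / denominator
```

Here the numerator is 2566 and the denominator is 1000, so mean_tvs is 2.566, matching the answer above.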

6. Explain multiple regression.

Multiple regression analysis is a statistical technique that analyzes the relationship


between two or more variables and uses the information to estimate the value of the
dependent variables. In multiple regression, the objective is to develop a model that
describes a dependent variable y to more than one independent variable.

Multiple Regression Formula

In linear regression, there is only one independent and one dependent variable involved. But, in
the case of multiple regression, there will be a set of independent variables that helps us to
better explain or predict the dependent variable y.

The multiple regression equation is given by

y = a + b1x1 + b2x2 + …… + bkxk

where x1, x2, …, xk are the k independent variables and y is the dependent variable.

Multiple regression analysis permits us to control explicitly for the many other
circumstances that concurrently influence the dependent variable. The objective of
regression analysis is to model the relationship between a dependent variable and one or
more independent variables. Let k represent the number of independent variables, denoted
by x1, x2, x3, ……, xk. Such an equation is useful for the prediction of the value of y when
the values of x are known.
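A least-squares fit of the multiple regression equation can be sketched with NumPy. The fertilizer/water/yield numbers below are made-up illustrative data, constructed so that y = 1 + 2·x1 + 1·x2 holds exactly:

```python
import numpy as np

# Hypothetical data: crop yield explained by fertilizer (x1) and water (x2)
fertilizer = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
water = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
crop_yield = 1 + 2 * fertilizer + 1 * water  # constructed response

# Design matrix: a column of ones for the intercept a, then x1 and x2
X = np.column_stack([np.ones_like(fertilizer), fertilizer, water])
coeffs, _, _, _ = np.linalg.lstsq(X, crop_yield, rcond=None)
a, b1, b2 = coeffs  # recovers a = 1, b1 = 2, b2 = 1 for this constructed data
```

In practice a library such as statsmodels would also report standard errors and significance tests for each coefficient; the sketch above only recovers the point estimates.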

Stepwise Multiple Regression

Stepwise regression is a step-by-step process that begins by developing a
regression model with a single predictor variable and adds or deletes predictor variables
one step at a time. Stepwise multiple regression is the method of determining a regression
equation that begins with a single independent variable and adds independent variables one
by one. The stepwise multiple regression method is also known as the forward selection
method because we begin with no independent variables and add one independent variable
to the regression equation at each iteration. There is another method, called the
backward elimination method, which begins with the entire set of variables and eliminates
one independent variable at each iteration.

Residual: The variation in the dependent variable that is not explained by the regression
model is called residual or error variation. It is also known as random error, or sometimes
just "error". This is a random error due to different sampling methods.

Advantages of Stepwise Multiple Regression

 Only independent variables with non-zero regression coefficients are included in the
regression equation.
 The changes in the multiple standard errors of estimate and the coefficient of
determination are shown.
 The stepwise multiple regression is efficient in finding the regression equation with
only significant regression coefficients.
 The steps involved in developing the regression equation are clear.

Multivariate Multiple Regression

Mostly, statistical inference has been kept at the bivariate level. Inferential
statistical tests have also been developed for multivariate analyses, which analyze the
relations among more than two variables. A commonly used extension of correlation
analysis for multivariate inferences is multiple regression analysis, which shows the
correlation between each set of independent and dependent variables.

Multicollinearity

Multicollinearity is a term reserved to describe the case when the inter-correlation of


predictor variables is high.

Signs of Multicollinearity

 The high correlation between pairs of predictor variables.


 The magnitude or signs of regression coefficients do not make good physical sense.
 Non-significant regression coefficients on significant predictors.
 Extreme sensitivity of the magnitude or sign of regression coefficients to the
insertion or deletion of a predictor variable.

7. What is the difference between Linear and Nonlinear Equations?

To find the difference between the two equations, i.e. linear and nonlinear, one
should know the definitions for them. So, let us define and see the difference between
them.

Linear Equations:

 It forms a straight line or represents the equation for the straight line.
 It has only one degree; we can also define it as an equation having maximum degree 1.
 All such equations form a straight line in the XY plane. These lines can be extended in
any direction, but in a straight form.
 The general representation of a linear equation is y = mx + c, where x and y are the
variables, m is the slope of the line and c is a constant value.
 Examples: 10x = 1; 9y + x + 2 = 0; 4y = 3x; 99x + 12 = 23y

Non-Linear Equations:

 It does not form a straight line but forms a curve.
 A nonlinear equation has degree 2 or more than 2, but not less than 2.
 It forms a curve, and if we increase the value of the degree, the curvature of the graph
increases.
 The general representation of a nonlinear equation is ax² + by² = c, where x and y are
the variables and a, b and c are the constant values.
 Examples: x² + y² = 1; x² + 12xy + y² = 0; x² + x + 2 = 25

8. A linear equation usually has only one variable; if an equation has two variables in
it, then the equation is defined as a linear equation in two variables. For example, 5x + 2 =
1 is a linear equation in one variable, but 5x + 2y = 1 is a linear equation in two variables.

Let us see some examples based on these concepts.

Solved Examples

Example: Solve the linear equation 3x+9 = 2x + 18.

Solution: Given, 3x+9 = 2x + 18

⇒ 3x − 2x = 18 − 9

⇒ x = 9

Example: Solve the system of equations x + 2y = 1 and x = y.

Solution: Given, x+2y = 1

x=y

By putting the value of x in the first equation we get,

⇒ y + 2y = 1

⇒ 3y = 1

⇒ y = 1/3

∴ x = y = 1/3

9. How many years will it take for a bacteria population to reach 9000, if its growth is
modeled by P(t) = 10000 / (1 + e^(−0.12(t − 20))), where t is in years?

Solution:

According to the given,

9000 = 10000 / (1 + e^(−0.12(t − 20)))

so e^(−0.12(t − 20)) = 10000/9000 − 1 = 1/9 ≈ 0.111.

Taking logarithm on both sides,

−0.12(t − 20) = ln(0.111)

t = −ln(0.111)/0.12 + 20

On simplifying,

t = 38.31 years

The graph for the above solution is as below:

Figure 4: Sample graph
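Assuming the logistic growth model P(t) = 10000 / (1 + e^(−0.12(t − 20))) implied by the solution's numbers (the carrying capacity of 10000 is inferred, since 10000/9000 − 1 = 1/9 ≈ 0.111), the computation can be checked in Python:

```python
import math

def population(t):
    """Logistic growth model inferred from the worked solution."""
    return 10000 / (1 + math.exp(-0.12 * (t - 20)))

# Solve population(t) = 9000:  e^(-0.12(t - 20)) = 1/9
t = 20 - math.log(1 / 9) / 0.12
```

This gives t ≈ 38.31 years, matching the solution.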

Logistic function vs Sigmoid function

A mathematical function that has an S-shaped curve, or sigmoid curve, is called a
sigmoid function. The logistic function is the standard choice of sigmoid function. A
sigmoid function has a domain of all real numbers, with return value strictly increasing
from 0 to 1, or alternatively from −1 to 1, depending on convention.
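The standard 0-to-1 convention is easy to sketch in Python:

```python
import math

def sigmoid(x):
    """Standard logistic (sigmoid) function: 1 / (1 + e^(-x))."""
    return 1 / (1 + math.exp(-x))
```

sigmoid(0) = 0.5 (the inflection point), and the output increases strictly from 0 toward 1 as x grows.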

10. What is Serial Correlation / Autocorrelation?

Serial correlation (also called Autocorrelation) is where error terms in a time series
transfer from one period to another. In other words, the error for one time period a is
correlated with the error for a subsequent time period b. For example, an underestimate for
one quarter’s profits can result in an underestimate of profits for subsequent quarters. This
can result in a myriad of problems, including:

 Inefficient Ordinary Least Squares Estimates and any forecast based on those
estimates. An efficient estimator gives you the most information about a sample;
inefficient estimators can perform well, but require much larger sample sizes to do
so.
 Exaggerated goodness of fit (for a time series with positive serial correlation and
an independent variable that grows over time).
 Standard errors that are too small (for a time series with positive serial correlation
and an independent variable that grows over time).
 T-statistics that are too large.

 False positives for significant regression coefficients. In other words, a regression
coefficient appears to be statistically significant when it is not.
Types of Autocorrelation

The most common form of autocorrelation is first-order serial correlation, which can
either be positive or negative.

 Positive serial correlation is where a positive error in one period carries over into a
positive error for the following period.
 Negative serial correlation is where a negative error in one period carries over into a
negative error for the following period.
Second-order serial correlation is where an error affects data two time periods later. This
can happen when your data has seasonality. Orders higher than second-order do happen,
but they are rare.

Testing for Autocorrelation

You can test for autocorrelation with:

 A plot of residuals. Plot et against t and look for clusters of successive residuals on
one side of the zero line. You can also try adding a Lowess line, as in the image
below.
 A Durbin-Watson test.
 A Lagrange Multiplier Test.
 Ljung Box Test.
 A correlogram. A pattern in the results is an indication for autocorrelation. Any values
above zero should be looked at with suspicion.
 The Moran’s I statistic, which is similar to a correlation coefficient.
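Alongside the formal tests listed above, a quick numerical check is to compute the sample lag-1 autocorrelation of the residuals. A minimal sketch (the function name is ours):

```python
def lag1_autocorrelation(residuals):
    """Sample lag-1 autocorrelation of a residual series."""
    n = len(residuals)
    mean = sum(residuals) / n
    # Covariance of each residual with its predecessor, over the series variance
    numerator = sum((residuals[i] - mean) * (residuals[i - 1] - mean)
                    for i in range(1, n))
    denominator = sum((r - mean) ** 2 for r in residuals)
    return numerator / denominator
```

A value near +1 suggests positive serial correlation, a value near −1 suggests negative serial correlation, and a value near 0 suggests none.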
