From Equations to Predictions: Understanding the Mathematics and Machine Learning of Multiple Linear Regression
Vesna Knights1* and Marija Prchkovska2
1 University "St Kliment Ohridski" - Bitola, Faculty of Technology and Technical Sciences - Veles, 7000 Bitola, Republic of North Macedonia
2 Mother Teresa University, Faculty of Computer Science, Informatics, 1000 Skopje, Republic of North Macedonia
ABSTRACT
In this paper, the core concepts of multiple linear regression are explored, with a focus on its mathematical foundations and integration with machine
learning principles. The objective is to bridge the gap between theory and practical application, providing readers with a comprehensive understanding
of this versatile method and highlighting its synergy with traditional statistical approaches and modern computational methods. The paper begins by
applying multiple linear regression to predict wine quality based on physicochemical attributes, using a comprehensive dataset. The least squares method
is used to estimate regression coefficients, facilitating the construction of a predictive model. The study also encompasses the testing of assumptions
such as homoscedasticity and normality of residuals, along with the assessment of autocorrelation to ensure model robustness. To illustrate the practical
implementation of multiple linear regression, a demonstration using PyTorch, a popular deep learning framework, is provided. A linear model is defined,
and the significance of gradient descent in optimizing model parameters is elucidated. Additionally, the paper covers topics such as data preprocessing,
model evaluation, and insights into interpreting regression results.
Furthermore, the performance of linear regression is evaluated in comparison to decision trees, random forests, and support vector regression, showcasing
the versatility of this classic technique. By presenting a holistic view of multiple linear regression, emphasizing its mathematical foundations, practical
implementation, and integration with machine learning, researchers and practitioners are empowered to leverage the potential of linear regression across
various domains.
*Corresponding author
Vesna Knights, University "St Kliment Ohridski" - Bitola, Faculty of Technology and Technical Sciences - Veles, 7000 Bitola, Republic of North Macedonia.
Received: March 19, 2024; Accepted: March 21, 2024, Published: April 03, 2024
Keywords: Linear Regression, Machine Learning, Mathematical Foundations, Model Implementation, Predictive Modeling

Introduction
Multiple linear regression, a foundational statistical technique, plays a pivotal role in modeling the intricate relationships that exist between a dependent variable (response) and one or more independent variables (predictors) [1-3]. This method involves fitting a linear equation to observed data, enabling us to comprehend, quantify, and predict associations among variables. Its versatility extends across a multitude of domains, including economics, marketing, and scientific research, where it serves as an invaluable tool for making predictions and unraveling intricate variable connections [4-6].

At its core, multiple linear regression is a supervised learning algorithm. It is particularly adept at handling continuous, real-numbered target variables [7-9]. This method establishes relationships between the dependent variable, denoted as 'y', and one or more independent variables, collectively represented as 'x', through the creation of a best-fit line. This process operates under the fundamental principle of ordinary least squares (OLS) or mean square error (MSE) [10-12]. OLS serves as a method to estimate the unknown parameters of the linear regression function, with its primary objective being the minimization of the sum of squared differences between the observed dependent variable and the values predicted by the linear regression function [10,11].

This paper embarks on an exploration of the intricate world of multiple linear regression, aiming to bridge the chasm between theoretical understanding and practical application. The following sections delve into the mathematical foundations of this method, in alignment with the insights presented by Kutner, Nachtsheim, Neter, and Li [13]. The discussion extends further, encompassing the synergistic relationship between traditional statistical approaches and contemporary computational methods. Our journey begins with the practical application of multiple linear regression to predict wine quality based on physicochemical attributes, employing an extensive dataset [14]. Leveraging the least squares method, we estimate regression coefficients, paving the way for the construction of a predictive model [15]. Assumptions such as homoscedasticity and normality of residuals are rigorously tested. Additionally, we assess autocorrelation, ensuring the robustness of our model.
On the practical implementation of multiple linear regression, this paper provides a hands-on demonstration using PyTorch, a well-regarded deep learning framework [16-18]. Within this context, a linear model is defined, emphasizing the critical role of gradient descent in optimizing model parameters [18]. Subsequent sections of the paper delve into essential topics such as data preprocessing, model evaluation, and insightful approaches for interpreting regression results [19].

Furthermore, this study broadens its scope by evaluating the performance of linear regression against other contemporary machine learning techniques, including decision trees, random forests, and support vector regression [17,20,21]. This comparative analysis underscores the enduring adaptability of this time-honored method within the domain of predictive modeling. By offering a comprehensive perspective on multiple linear regression, emphasizing its mathematical foundations, practical applications, and integration with modern machine learning, this work aims to empower researchers and practitioners, equipping them to leverage the substantial potential of linear regression across various fields [22].

Material and Methods
Material
For the purpose of this study, a database from Cortez et al. (2009) was utilized [14]. The dataset includes the following attribute information:

Input variables (based on physicochemical tests):
1 - fixed acidity (tartaric acid - g/dm^3)
2 - volatile acidity (acetic acid - g/dm^3)
3 - citric acid (g/dm^3)
4 - residual sugar (g/dm^3)
5 - chlorides (sodium chloride - g/dm^3)
6 - free sulfur dioxide (mg/dm^3)
7 - total sulfur dioxide (mg/dm^3)
8 - density (g/cm^3)
9 - pH
10 - sulphates (potassium sulphate - g/dm^3)
11 - alcohol (% by volume)
Output variable (based on sensory data):
12 - quality (score between 0 and 10)

Methods
The Collection of the Data
The data for this study were obtained from the dataset provided by Cortez et al. in 2009 [14] (Modeling wine preferences by data mining from physicochemical properties, Decision Support Systems, 47(4), 547-553). The dataset contains information on physicochemical attributes of wine, making it suitable for the analysis and implementation of multiple linear regression.

Statistical Analysis
The statistical analysis in this study primarily involves the implementation of Multiple Linear Regression.

Implementation of Multiple Linear Regression
Objective: The objective of Multiple Linear Regression is to find the estimates of the regression coefficients (β0, β1, β2, ..., βp) that minimize the sum of the squared differences between the observed values (y) and the values predicted by the linear regression model.

Loss Function: Multiple Linear Regression employs a loss function that measures the squared differences between the observed and predicted values. The ultimate goal is to minimize the sum of squared residuals.

Assumptions
Multiple Linear Regression assumes that the errors (residuals) are normally distributed with constant variance (homoscedasticity) and does not require a specific probabilistic model for the errors.

Linear Regression Model
In simple linear regression, with one independent variable (X) and one dependent variable (Y), the model is defined as:
Y = β0 + β1X
For Multiple Linear Regression, where there are multiple independent variables (x1, x2, ..., xp), the model is represented as:
Y(yi) = β0 + β1x1 + β2x2 + ... + βpxp
where Y(yi) represents the observed value.

In order to make predictions, the model is expressed as:
Ŷ = β0 + β1X1 + β2X2 + ... + βpXp + ε
Ŷ represents the predicted value of the dependent variable Y for a given set of independent variables.
β0 is the y-intercept, representing the expected value of Y when all independent variables are 0.
β1, β2, ..., βp are the coefficients (slopes) for the independent variables.
ε (Error or Residual) is the difference between the actual observed value (Y(yi)) and the predicted value (Ŷ). Mathematically:
ε = yi - ŷi

The primary objective of linear regression is to determine the coefficients that minimize the sum of squared errors (SSE) and provide an accurate model for predicting the target variable based on the input features. This is achieved through methods like the least squares approach, optimizing the coefficients to create a predictive model.

In the context of machine learning, this approach allows us to find the best-fitting linear model that captures the relationship between the independent variables and the dependent variable, facilitating accurate predictions on new, unseen data.

Results
The dataset comprises m = 1599 examples and n = 11 independent variables (Table 1). The target variable, 'quality', falls within a range of 0 to 10, while the remaining eleven variables represent various physicochemical attributes. Given the presence of multiple independent variables, we are tasked with fitting a multiple linear regression model.

The equation for multiple linear regression can be expressed as:
Y(yi) = β0 + β1 * fixed acidity + β2 * volatile acidity + β3 * citric acid + β4 * residual sugar + β5 * chlorides + β6 * free sulfur dioxide + β7 * total sulfur dioxide + β8 * density + β9 * pH + β10 * sulphates + β11 * alcohol   (1)
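As a point of reference, the dataset and the `data` frame used in the fitting code later in this section can be loaded along the following lines. This is a minimal sketch rather than the authors' code: the file name winequality-red.csv, the ';' separator, and the column renaming are assumptions based on the publicly available UCI distribution of the Cortez et al. dataset.

import pandas as pd

# Load the red-wine dataset (Cortez et al., 2009); the file name and the
# ';' separator follow the public UCI distribution, adjust the path as needed.
data = pd.read_csv("winequality-red.csv", sep=";")

# Replace spaces in the column names with underscores so they match the
# regression formula used below (e.g., "fixed acidity" -> "fixed_acidity").
data.columns = data.columns.str.replace(" ", "_")

print(data.shape)               # expected: (1599, 12) -> 11 predictors + quality
print(data.columns.tolist())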
Before making predictions with linear regression, it is essential to estimate the coefficients β0 and βj from the available data. The estimate of each coefficient βj can be calculated using the following formula:

βj = Σi (xij - x̄j)(yi - Ȳ) / Σi (xij - x̄j)^2   (2)
Where xij is the value of the j-th feature for the i-th data point (e.g., fixed acidity, volatile acidity, citric acid, etc.), x̄j is the mean of the j-th feature across all data points, and Ȳ is the mean of the dependent variable (quality) across all data points.

The intercept term (β0) can be computed as:

β0 = Ȳ - (β1x̄1 + β2x̄2 + ... + βpx̄p)   (3)

Instead of performing complex calculations manually using the given formulas to estimate the coefficients (β0, β1, β2, β3, ..., β11), we leveraged machine learning techniques and libraries to automate this process. The coefficients were computed using the following code:

Table 2: Code for Computed Coefficients
Code
import statsmodels.formula.api as smf

# Update the formula to encompass the relevant variables
formula = ("quality ~ fixed_acidity + volatile_acidity + citric_acid "
           "+ residual_sugar + chlorides + free_sulfur_dioxide + total_sulfur_dioxide "
           "+ density + pH + sulphates + alcohol")

# Fit the regression model
est = smf.ols(formula=formula, data=data).fit()

# Display the summary of the regression analysis
print(est.summary())

By utilizing this approach, we achieved a more efficient and automated means of estimating the coefficients, allowing us to focus on the interpretation and insights drawn from the results.

The results of the multiple linear regression analysis are summarized in the following table:

Table 3: The Results of the Multiple Linear Regression Analysis
Variable                Coefficient   P-value
Intercept               21.9652       0.300
Fixed Acidity           0.0250        0.336
Volatile Acidity        -1.0836       0.000
Citric Acid             -0.1826       0.215
Chlorides               -1.8742       0.000
Free Sulfur Dioxide     0.0044        0.045
Total Sulfur Dioxide    -0.0033       0.000
Density                 -17.8812      0.409
pH                      -0.4137       0.031
Sulphates               0.9163        0.000
Alcohol                 0.2762        0.000

These coefficients represent the estimated associations between each independent variable and the dependent variable, quality. For instance, the coefficient for volatile acidity (-1.0836) indicates that an increase in volatile acidity is correlated with a decrease in wine quality. Conversely, the coefficient for alcohol (0.2762) suggests that a higher alcohol content tends to be associated with higher wine quality.

This comprehensive analysis contributes valuable insights into the collective impact of these physicochemical attributes on wine quality.

The next step is preparing data for a machine-learning model by performing:
• Separating the features (X) and the target variable (y - quality).
• Standardizing the features using `StandardScaler`, by performing the following transformations on each feature: it calculates the mean (μ) and standard deviation (σ) of each feature in the training data.
• For each feature, it subtracts the mean (μ) and then divides by the standard deviation (σ):
Xstandardized = (X - μ) / σ
where X is the original feature value and Xstandardized is the standardized feature value.
• Splitting the data into training and testing sets using train_test_split(X, y, random_state=0, test_size=0.25). (A code sketch of these preparation steps is given after equation (4) below.)

Once these coefficients have been calculated, they can be used to make predictions for new data points by plugging in the values of the independent variables into the linear regression equation:
ŷi - predicted values based on the linear model
ŷi = β0 + β1xi1 + β2xi2 + ... + βpxip   (4)
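The data-preparation and prediction steps just described can be sketched with scikit-learn as follows. This is an illustrative sketch rather than the authors' code; the variable names (X, y, X_train_scaled, y_pred) are chosen here so that the residual analysis below can reuse them.

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Separate the features (X) and the target variable (y = quality).
X = data.drop(columns="quality")
y = data["quality"]

# Split into training and testing sets, as described above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, test_size=0.25
)

# Standardize each feature: (X - mu) / sigma, with mu and sigma
# estimated on the training data only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Fit an ordinary least squares model and predict on the training set,
# producing the y_pred used in the residual analysis below.
model = LinearRegression()
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_train_scaled)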
The error term (e), known as a residual, represents the difference between the actual observed values (yi) and the predicted values (ŷi) for each data point (i).

Table 4: Code for Calculated Residuals
Code
residuals = y_train.values - y_pred
mean_residuals = np.mean(residuals)
print("Mean of Residuals {}".format(mean_residuals))
Mean of Residuals 1.2741174994864182e-16

Residuals are calculated by subtracting the predicted values (y_pred) from the actual values (y_train). These residuals represent the differences between the observed (actual) values and the values predicted by the linear regression model for each data point in the training dataset.

mean_residuals calculates the mean (average) of the residuals.

The output that is provided, "Mean of Residuals 1.2741174994864182e-16", indicates that the mean of the residuals is extremely close to zero but not exactly zero; the value is approximately 1.27 x 10^-16, which is a very small number. In theory, the mean of residuals should ideally be exactly zero for a well-fitted linear regression model. A mean this close to zero indicates that the linear regression model is reasonably well calibrated on the training data and, on average, does not exhibit systematic bias in its predictions.

In the context of regression analysis, homoscedasticity indicates that the residuals exhibit consistent or nearly consistent variance along the regression line. To assess this, we can create a scatter plot of the error terms against the predicted values, ensuring that there is no discernible pattern in the residuals.

Table 5: Code for Testing Heteroscedasticity
Code
residuals = y_train.values - y_pred
mean_residuals = np.mean(residuals)
print("Mean of Residuals {}".format(mean_residuals))
Mean of Residuals 1.2741174994864182e-16

In statistical analysis, the Goldfeld-Quandt test is commonly employed to assess homoscedasticity, a concept denoting the assumption that the variance of the errors (residuals) in a regression model remains consistent irrespective of the levels of the independent variables. Homoscedasticity holds significance in regression analysis as it signifies that the model's errors exhibit uniform variability, thereby contributing to the reliability of the model's performance.

When interpreting the Goldfeld-Quandt test results, the pivotal element is the p-value. In the context of the obtained p-value in the wine analysis (0.9197664304253765), it relates to the following hypotheses:

Null Hypothesis (H0): The error terms exhibit homoscedasticity, implying they possess a constant variance.
Alternative Hypothesis (Ha): The error terms display heteroscedasticity, indicating varying variance.

In our specific case, the calculated p-value (0.9197664304253765) significantly exceeds the conventional significance level of 0.05. When the p-value surpasses the significance level, it implies that there is insufficient evidence to support the conclusion that the error terms exhibit heteroscedasticity. The null hypothesis, which states that the error terms maintain homoscedasticity, is therefore not rejected.

Homoscedasticity is a fundamental assumption in linear regression models. When this assumption is met, it signifies that the errors in the model vary consistently across different levels of the independent variables. This uniformity ensures that the model's predictions maintain reliability across the entire spectrum of predictor values, and it facilitates a clearer interpretation of the relationship between the dependent and independent variables.
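The heteroscedasticity checks described above can be sketched with matplotlib and statsmodels as follows. This is an illustrative sketch, not the authors' exact code; it reuses the residuals, y_pred, y_train, and X_train_scaled variables from the earlier sketches.

import matplotlib.pyplot as plt
import statsmodels.stats.api as sms

# Scatter plot of residuals against predicted values: under
# homoscedasticity no clear pattern or funnel shape should appear.
plt.scatter(y_pred, residuals, alpha=0.5)
plt.axhline(0, color="red", linewidth=1)
plt.xlabel("Predicted quality")
plt.ylabel("Residuals")
plt.show()

# Goldfeld-Quandt test: H0 = homoscedastic errors. A large p-value
# (e.g., about 0.92 as reported above) means H0 is not rejected.
f_stat, p_value, _ = sms.het_goldfeldquandt(y_train, X_train_scaled)
print("Goldfeld-Quandt F statistic:", f_stat)
print("Goldfeld-Quandt p-value:", p_value)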
Figure 3: Autocorrelation Function

The autocorrelation function (ACF) is used to plot the correlation between a time series and its lagged values at various lags.
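Figure 3 itself is not reproduced in this excerpt. A sketch of how an autocorrelation plot of the residuals, together with the related Durbin-Watson statistic, can be produced with statsmodels is given below; it assumes the residuals array computed earlier and is illustrative rather than the authors' code.

import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.stats.stattools import durbin_watson

# Autocorrelation function of the residuals: significant spikes at
# nonzero lags would indicate autocorrelated errors.
plot_acf(residuals, lags=40)
plt.show()

# Durbin-Watson statistic: values near 2 suggest little autocorrelation.
print("Durbin-Watson:", durbin_watson(residuals))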
The least squares method minimizes the sum of squared differences between the observed Y values and the predicted Ŷ values. The relationship between ε and SSE is expressed by the formula for SSE:

SSE = Σi (yi - ŷi)^2

R2 = 1 - SSE / SST

where SSE is the Sum of Squared Errors and SST is the Total Sum of Squares.

Table 6: Multiple Linear Regression Model Loss Function
Metric    y_train    y_test
MAE       0.48949    0.53303
MSE       0.38888    0.490888
RMSE      0.62360    0.700634
R2        0.38123    0.303635
VIF       1.6161     1.43602

Lower MAE, MSE and RMSE values indicate a better fit of the model to the data, because the predicted values are closer to the actual values.

The Variance Inflation Factor (VIF) is a measure that helps us understand how much the variance of an estimated regression coefficient is inflated due to multicollinearity among the independent variables.
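The metrics in Table 6 can be computed along the following lines with scikit-learn and statsmodels. This is a sketch that reuses the fitted model and the train/test split from the earlier sketch; the authors' exact evaluation code is not shown in this excerpt.

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from statsmodels.stats.outliers_influence import variance_inflation_factor

y_pred_test = model.predict(X_test_scaled)

# Error metrics on the test set (the same calls apply to the training set).
mae = mean_absolute_error(y_test, y_pred_test)
mse = mean_squared_error(y_test, y_pred_test)
rmse = np.sqrt(mse)
r2 = r2_score(y_test, y_pred_test)
print(f"MAE={mae:.5f}  MSE={mse:.5f}  RMSE={rmse:.5f}  R2={r2:.5f}")

# Variance Inflation Factor for each predictor; a summary value such as
# the mean VIF can then be reported, as in Table 6.
vif = [variance_inflation_factor(X_train_scaled, i)
       for i in range(X_train_scaled.shape[1])]
print("Mean VIF:", np.mean(vif))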
The Random Forest Regressor, on the other hand, employs an ensemble approach, combining multiple decision trees to enhance prediction accuracy. This model often strikes a balance between complexity and generalization, making it a popular choice for various regression tasks.

Lastly, the Support Vector Machine, or SVM, is a powerful algorithm that excels in capturing intricate patterns within data. While it may exhibit a lower accuracy on the training set compared to other models, it can provide robust predictions and is particularly adept at handling non-linear relationships.

In this comparative analysis, we present the results of these models based on metrics such as accuracy, R-squared, and various error measures. By understanding the strengths and limitations of each model, we aim to guide the selection process towards the algorithm best suited for the specific nuances of our dataset and objectives.
Table 7: Comparing Linear Regression Problem Solving with Different Types of Machine Learning Models for the Wine Dataset
Model Performance Comparison of Regression Models
Model                    Accuracy   R2      MAE     MSE     RMSE
DecisionTreeRegressor    1.0        1.0     0.000   0.000   0.00
RandomForestRegressor    0.929      0.929   0.158   0.047   0.217
SVM                      0.556      0.556   0.380   0.295   0.543
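A sketch of how the three models in Table 7 could be trained and scored with scikit-learn follows. The hyperparameters (defaults) and the evaluation on the training set (consistent with the perfect decision-tree scores in Table 7) are assumptions rather than details given in the paper.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

models = {
    "DecisionTreeRegressor": DecisionTreeRegressor(random_state=0),
    "RandomForestRegressor": RandomForestRegressor(random_state=0),
    "SVM": SVR(),
}

for name, reg in models.items():
    reg.fit(X_train_scaled, y_train)
    pred = reg.predict(X_train_scaled)   # training-set scores, as assumed above
    mse = mean_squared_error(y_train, pred)
    print(
        name,
        "R2=%.3f" % r2_score(y_train, pred),
        "MAE=%.3f" % mean_absolute_error(y_train, pred),
        "MSE=%.3f" % mse,
        "RMSE=%.3f" % np.sqrt(mse),
    )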
Gradient descent is used for optimization. The code runs for 5,000 iterations, updating the coefficients with small steps in the direction of the negative gradient. This process iteratively refines the coefficients to improve the model's accuracy.

These assumptions collectively help ensure that a multiple linear regression model is appropriate for the given data and that the model's predictions are reliable. Violations of these assumptions may require further analysis or potential model adjustments.
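The PyTorch gradient-descent fitting referred to above is not listed in this excerpt. The following minimal sketch illustrates how a linear model can be defined and optimized for 5,000 iterations; the optimizer, learning rate, and loss function are assumptions rather than details taken from the paper, and the standardized training data from the earlier sketch are reused.

import torch

# Convert the standardized training data to tensors.
X_t = torch.tensor(X_train_scaled, dtype=torch.float32)
y_t = torch.tensor(y_train.values, dtype=torch.float32).reshape(-1, 1)

# Linear model: y_hat = X @ W + b, one weight per physicochemical feature.
model = torch.nn.Linear(X_t.shape[1], 1)
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # assumed learning rate

# Gradient descent: 5,000 small steps in the direction that reduces the loss.
for step in range(5000):
    optimizer.zero_grad()
    loss = loss_fn(model(X_t), y_t)
    loss.backward()
    optimizer.step()

print("Final training MSE:", loss.item())
print("Learned coefficients:", model.weight.data.numpy().ravel())
print("Intercept:", model.bias.item())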