regression

The document outlines a Python script that utilizes the pandas and statsmodels libraries to analyze financial data from an Excel file. It performs data cleaning, sets up a linear regression model to predict return on assets based on financial ratios, and evaluates the model's performance using R-squared and residuals. The results indicate a moderate fit with an R-squared value of approximately 0.21.

Uploaded by tianikban

In [1]: import pandas as pd

In [2]: fin=pd.read_excel(r"C:\Users\RISHI\Desktop\Aug 2024\financial.xlsx")

In [3]: fin.dropna(inplace=True)

In [4]: y=fin['returnOnAssets']

In [5]: x=fin[['currentRatio', 'debtToEquity', 'ebitda']]

In [6]: x

Out[6]:
     currentRatio  debtToEquity        ebitda
0           1.038       170.714  128217997312
1           2.247        50.217   90829996032
2           2.928        11.329   91144003584
3           2.928        11.329   91144003584
4           1.136       100.864   59174998016
..            ...           ...           ...
221         0.841       141.986    3068999936
222         1.569         3.537    3946700032
223         2.390       180.941    2355000064
224         3.353        85.034    3475000064
225         0.507       124.735    1319500032

[205 rows x 3 columns]

In [7]: from sklearn.model_selection import train_test_split

In [8]: import statsmodels.api as sm

In [9]: X=sm.add_constant(x)

In [10]: X
Out[10]:
     const  currentRatio  debtToEquity        ebitda
0      1.0         1.038       170.714  128217997312
1      1.0         2.247        50.217   90829996032
2      1.0         2.928        11.329   91144003584
3      1.0         2.928        11.329   91144003584
4      1.0         1.136       100.864   59174998016
..     ...           ...           ...           ...
221    1.0         0.841       141.986    3068999936
222    1.0         1.569         3.537    3946700032
223    1.0         2.390       180.941    2355000064
224    1.0         3.353        85.034    3475000064
225    1.0         0.507       124.735    1319500032

[205 rows x 4 columns]

In [11]: X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
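Note that `train_test_split` defaults to `test_size=0.25`, which is how the 205 rows become the 153 training observations reported in the summary below (the remaining 52 rows are held out). A quick check:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# With the default test_size=0.25, 205 rows split into 153 train / 52 test
# rows (the test count is rounded up).
X = np.arange(205).reshape(-1, 1)
y = np.arange(205)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
print(len(X_tr), len(X_te))  # 153 52
```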

In [12]: model=sm.OLS(y_train, X_train).fit()

In [13]: print(model.summary())

OLS Regression Results


==============================================================================
Dep. Variable: returnOnAssets R-squared: 0.209
Model: OLS Adj. R-squared: 0.193
Method: Least Squares F-statistic: 13.11
Date: Fri, 20 Sep 2024 Prob (F-statistic): 1.21e-07
Time: 14:27:11 Log-Likelihood: 248.14
No. Observations: 153 AIC: -488.3
Df Residuals: 149 BIC: -476.2
Df Model: 3
Covariance Type: nonrobust
================================================================================
coef std err t P>|t| [0.025 0.975]
--------------------------------------------------------------------------------
const 0.0309 0.008 3.895 0.000 0.015 0.047
currentRatio 0.0197 0.004 4.890 0.000 0.012 0.028
debtToEquity 2.635e-05 9.75e-06 2.701 0.008 7.08e-06 4.56e-05
ebitda 7.226e-13 2.28e-13 3.165 0.002 2.71e-13 1.17e-12
==============================================================================
Omnibus: 109.102 Durbin-Watson: 1.878
Prob(Omnibus): 0.000 Jarque-Bera (JB): 998.692
Skew: 2.494 Prob(JB): 1.37e-217
Kurtosis: 14.480 Cond. No. 4.42e+10
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 4.42e+10. This might indicate that there are
strong multicollinearity or other numerical problems.

In [14]: # Prediction for the test data

In [16]: y_test_pred = model.predict(X_test)

In [17]: y_test_pred

Out[17]:
59     0.084129
198 0.070687
5 0.064327
21 0.097716
207 0.065393
184 0.087238
86 0.071619
167 0.066811
114 0.075085
37 0.073800
13 0.085601
139 0.093931
62 0.087591
74 0.132399
51 0.073887
183 0.047443
140 0.055721
7 0.132760
42 0.144428
165 0.042749
90 0.062389
121 0.143262
141 0.065798
185 0.050141
196 0.042206
148 0.058587
169 0.071995
123 0.076719
174 0.058076
99 0.071946
200 0.081539
212 0.061014
135 0.083448
187 0.059122
102 0.108028
19 0.069159
208 0.090854
146 0.055471
25 0.074840
84 0.053265
49 0.064228
4 0.098656
81 0.056132
144 0.069132
156 0.056807
132 0.060130
30 0.054889
93 0.101201
162 0.082719
204 0.053697
8 0.064203
100 0.057359
dtype: float64

In [18]: # residual values
         model.resid

Out[18]:
176    0.003599
68 -0.031502
85 -0.009711
116 0.025656
71 -0.018216
...
75 0.041095
211 -0.001403
127 0.027477
54 -0.029534
186 0.009018
Length: 153, dtype: float64

In [20]: # r square
         model.rsquared

Out[20]: 0.20885212614490956
