Project
FINANCIAL ANALYTICS
Slot: T35
Submitted to
Dr. Sunitha K
TEAM MEMBERS
SWETHA M 23MBA0035
NANDHINI M 23MBA0049
MARISH BALAJI V 23MBA0104
SUBHASHINI P 23MBA0111
APARNA S 23MBA0112
TABLE OF CONTENTS
CHAPTER I: INTRODUCTION
1.1) Objectives
1.2) Background of the 10 Selected Cryptocurrencies in the Project
CHAPTER II: METHODOLOGY
2.1) Purpose of the Study
2.2) Research Design
2.3) Sampling Design
2.4) Sampling Method
2.5) Data Collection Methods
CHAPTER III: DATA ANALYSIS AND INTERPRETATION
3.1) Crypto Index Analysis (Bitwise 10 Crypto Index)
3.2) Bitcoin
3.3) Ethereum
3.4) Tether
3.5) BNB
3.6) Solana
3.7) USDC
3.8) XRP
3.9) Dogecoin
3.10) Toncoin
3.11) Tron
CHAPTER IV: FINDINGS AND CONCLUSION
4.1) Findings
4.2) Implications
4.3) Conclusion
CHAPTER V: REFERENCES
CHAPTER 1: INTRODUCTION
The project focuses on forecasting the prices of the top 10 cryptocurrencies using time series
modeling techniques. Given the volatility of digital assets such as Bitcoin and Ethereum,
predicting their future prices is a significant challenge. The project applies ARIMA and LSTM
models to historical price data, using their ability to identify trends and patterns, and evaluates
the accuracy of each model to compare their predictive performance. Python is used for data
processing, model development, and data visualization, and Excel is used for organizing the
data. By combining statistical and machine learning approaches, the project aims to provide
insights into market behavior that can help investors make informed decisions in the
cryptocurrency market, and to identify which model forecasts cryptocurrency prices more
accurately.
1.1) Objectives:
• Build ARIMA and LSTM models to predict the future prices of Bitwise 10 Crypto Index
Fund (BITW) and the top 10 cryptocurrencies by market capitalization.
• Evaluate and compare the accuracy of the ARIMA (statistical) and LSTM (machine
learning) models for predicting cryptocurrency price trends.
1.2) Background of the 10 Selected Cryptocurrencies in the Project:
This section describes the top 10 cryptocurrencies by market capitalization selected for the project.
support from high-profile individuals. Despite its humorous origins, it has become
widely used for tipping and donations in online communities.
▪ Toncoin (TON): Toncoin is the native cryptocurrency of the TON (Telegram Open
Network), a blockchain platform initially developed by the messaging app Telegram. It
aims to provide fast, scalable transactions and supports decentralized applications,
positioning itself as a versatile tool for digital finance.
▪ TRON (TRX): Launched in 2017 by Justin Sun, TRON is a blockchain platform
designed to decentralize the internet. It focuses on providing a platform for content
creators to share their work without relying on intermediaries. TRON's high throughput
and low transaction fees have made it popular for dApps and smart contracts.
CHAPTER 2: METHODOLOGY
2.1) Purpose of the Study:
The primary objective of this project is to predict the future prices of the top 10
cryptocurrencies by market capitalization using the time series forecasting models ARIMA and
LSTM. The study aims to give investors and researchers forecasts of price movements that can
inform their decisions. By comparing the results of ARIMA (AutoRegressive Integrated Moving
Average) and LSTM (Long Short-Term Memory), the project seeks to identify which model is
more accurate in highly volatile crypto markets. The study also explores the factors that
influence the performance of these models, such as the time horizon of the data, the model
parameters, and volatility patterns.
2.2) Research Design:
This project follows an exploratory and predictive research design. The exploratory aspect
involves identifying trends, seasonality, and the stochastic behavior of cryptocurrency prices,
using historical price data. The predictive part involves building time series forecasting models
(ARIMA and LSTM) to project future prices based on past data.
The study also aims to conduct a comparative analysis of ARIMA and LSTM models. ARIMA
is a statistical model that captures linear relationships in time series data, whereas LSTM is a
machine learning model designed to handle complex nonlinear dependencies. The project will
evaluate the strengths and weaknesses of each model to understand their applicability to
cryptocurrency price forecasting.
2.3) Sampling Design:
The sampling design for this project involves the selection of the top 10 cryptocurrencies by
market capitalization as of the most recent data (2023-2024): Bitcoin (BTC), Ethereum (ETH),
Tether (USDT), BNB (BNB), Solana (SOL), USDC (USDC), XRP (XRP), Dogecoin (DOGE),
Toncoin (TON), and TRON (TRX).
Historical price data for each cryptocurrency will be gathered from the date it started trading,
to provide enough data points to train the time series models.
2.4) Sampling Method:
• Time-based Sampling: Daily historical price data is collected from reliable data
sources such as Yahoo Finance, Investing.com, or other cryptocurrency exchanges.
2.5) Data Collection Methods:
The data for this project will be obtained from secondary sources:
• Historical price data for the top 10 cryptocurrencies will be gathered from platforms
such as Investing.com, Yahoo Finance, or cryptocurrency exchanges' APIs.
The statistical analysis for this project will include the following steps:
• Data Preprocessing: Cleaning and normalizing the data to ensure it's in the appropriate
format for ARIMA and LSTM models.
• ARIMA Model: After conducting stationarity tests (e.g., ADF test), the ARIMA model
will be fitted to the data. The optimal parameters (p, d, q) will be determined using
techniques such as grid search and Akaike Information Criterion (AIC).
• LSTM Model: The data will be transformed into sequences of past observations to
train the LSTM. The model will be tuned using hyperparameter optimization techniques
to identify the best architecture for forecasting.
• Model Comparison: The predicted results from both ARIMA and LSTM models will
be compared using evaluation metrics such as RMSE, MAE, and MAPE to determine
which model provides better performance.
• Visualization: The results will be visualized with line graphs to display the actual vs.
predicted values for both models over the time horizon of the study.
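The preprocessing and sequence steps above can be sketched in plain Python. This is a minimal illustration, not the project's code: it assumes min-max scaling to [0, 1] (the text says only that the data are normalized), a hypothetical look-back window of 3 days, and function names of our own choosing.

```python
from typing import List, Tuple


def min_max_scale(values: List[float]) -> Tuple[List[float], float, float]:
    """Scale values to [0, 1] and return (scaled, min, max) so the scaling can be inverted."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def inverse_scale(scaled: List[float], lo: float, hi: float) -> List[float]:
    """Map scaled values (e.g. LSTM outputs) back to price units."""
    return [s * (hi - lo) + lo for s in scaled]


def make_windows(series: List[float], lookback: int) -> Tuple[List[List[float]], List[float]]:
    """Build (X, y) pairs: each row of X holds `lookback` consecutive past values
    and the matching y is the value that immediately follows them."""
    X, y = [], []
    for i in range(lookback, len(series)):
        X.append(series[i - lookback:i])
        y.append(series[i])
    return X, y


if __name__ == "__main__":
    prices = [30.0, 32.0, 31.0, 35.0, 34.0, 38.0]   # hypothetical daily closes
    scaled, lo, hi = min_max_scale(prices)
    X, y = make_windows(scaled, lookback=3)          # hypothetical look-back of 3 days
    print(X[0], "->", y[0])
```

In practice the minimum and maximum should be taken from the training portion only, so that information from the test period does not leak into training.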
CHAPTER 3: DATA ANALYSIS AND INTERPRETATION
3.1) Crypto Index Analysis (Bitwise 10 Crypto Index)
The Bitwise 10 Crypto Index is a popular index that tracks the performance of the 10 largest
cryptocurrencies by market capitalization. It is maintained by Bitwise Asset Management, a
firm specializing in crypto asset management.
The primary goal of the Bitwise 10 Crypto Index is to give investors exposure to a diversified
selection of leading cryptocurrencies. Rather than investing in individual coins, investors can
gain broad exposure to the crypto market with a single investment, which helps spread risk
across different crypto assets.
As of October 2, 2023, the constituents of the Bitwise 10 Crypto Index and their
respective weights are:
Data Collection:
Historical price data for the Bitwise 10 Crypto Index Fund (BITW) was gathered from
Yahoo Finance for the period 2020-2024.
Inference:
Model 1: LSTM
• Training Loss: The model’s training loss reduces significantly after just a few epochs,
reaching a minimum of 0.0021. This indicates that the model learned well during
training and converged effectively.
• Model Performance (RMSE): The Root Mean Squared Error (RMSE) on the test
data is 0.025. Because the data were normalized before training, this figure is on the
scaled range rather than in price units; it indicates that the model's predictions deviate
only slightly from the actual values. For time series predictions in volatile markets such
as crypto (BITW), this is a reasonable level of error.
The graph of actual vs. predicted prices shows that while the model is able to capture
the general trend, the predicted values (in orange) appear smoother and less volatile
compared to the actual values (blue line). This is expected behavior in many LSTM
models as they tend to average out rapid fluctuations.
• Prediction of Future Prices: The forecasted prices for the next 40 days (starting from
Sept 27, 2024) show a consistent downward trend, with prices decreasing steadily
from 26.40 to around 17.11. The steady decline might be attributed to the patterns
recognized by the LSTM in historical data, which could suggest an expected bearish
market phase.
• Early Stopping Impact: Early stopping was implemented with a patience of 5 epochs,
which prevented the model from over-training and potentially overfitting. Training
stopped after 9 epochs; with a patience of 5, this implies that the monitored loss last
improved at epoch 4, so the model reached its best point early in training.
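The early-stopping rule can be made concrete with a short sketch. It assumes Keras-style semantics (a counter that resets on every improvement and stops training once it reaches the patience value); the function name is ours and the loss values are made up so that training stops at epoch 9 with a patience of 5, as reported above.

```python
from typing import List, Tuple


def early_stopping(losses: List[float], patience: int, min_delta: float = 0.0) -> Tuple[int, int]:
    """Return (stop_epoch, best_epoch), both 1-indexed.

    A counter resets whenever the loss beats the best value so far by more than
    `min_delta`; training stops at the epoch where the counter reaches `patience`.
    """
    best_loss, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(losses), best_epoch   # ran out of epochs without triggering the rule


if __name__ == "__main__":
    # Made-up validation losses: the best value occurs at epoch 4.
    val_losses = [0.010, 0.006, 0.004, 0.003, 0.0032, 0.0031, 0.0035, 0.0034, 0.0036, 0.0020]
    print(early_stopping(val_losses, patience=5))
```

The last value (epoch 10) is never reached, which is the trade-off of early stopping: a later improvement can be missed if it arrives after the patience window.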
Summary:
• The LSTM model captures the overall trend in BITW prices, but it smooths out short-
term fluctuations, leading to less volatile predictions.
• RMSE of 0.025 suggests a reasonable level of error for this time series forecasting
problem.
• The future price predictions show a downward trend, which might be a continuation
of the patterns recognized in the training data.
Adjustments in the architecture or additional training data might improve the model’s ability
to predict sudden price movements more accurately.
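➢ The chosen ARIMA model is configured with parameters (5, 1, 0), which indicates:
• 5 autoregressive (AR) terms (p = 5): each forecast uses the previous five values of the
differenced series.
• 1 order of differencing (d = 1): the price series is differenced once to make it stationary.
• 0 moving-average (MA) terms (q = 0).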
❖ Model Summary:
➢ Log Likelihood: The log likelihood of the model is -2499.066, which helps assess
model fit.
➢ AIC (Akaike Information Criterion): The AIC is 5010.132. This is used to compare
models, with lower AIC values generally indicating a better model fit.
➢ BIC (Bayesian Information Criterion): The BIC value is 5039.026, which similarly
helps evaluate model fit, but penalizes more complex models.
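The AIC and BIC values follow from the log likelihood and the number of estimated parameters. A minimal sketch, assuming k = 6 (five AR coefficients plus the error variance σ², with no constant term): with these inputs the AIC formula reproduces the reported 5010.132. Solving the BIC formula for the number of observations gives n ≈ 912; that count is back-solved here and is not stated in the report.

```python
import math


def aic(log_likelihood: float, k: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian Information Criterion: k ln(n) - 2 ln L.
    Each parameter costs ln(n) instead of 2, so BIC penalizes complexity more once n > e^2 (about 7.4)."""
    return k * math.log(n) - 2 * log_likelihood


def implied_n(bic_value: float, log_likelihood: float, k: int) -> float:
    """Solve the BIC formula for n: n = exp((BIC + 2 ln L) / k)."""
    return math.exp((bic_value + 2 * log_likelihood) / k)


LOG_LIK = -2499.066   # reported log likelihood of the BITW ARIMA(5, 1, 0) model
K = 6                 # assumption: 5 AR coefficients + sigma^2, no constant

print(aic(LOG_LIK, K))                    # compare with the reported AIC of 5010.132
print(implied_n(5039.026, LOG_LIK, K))    # observation count implied by the reported BIC
```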
❖ Coefficients:
➢ The second lag (ar.L2) has a strong negative impact with a coefficient of -0.2058, which
implies that values two time periods back negatively affect the current forecast.
➢ The third autoregressive term (ar.L3) is also negative and statistically significant,
showing additional negative correlation from past data.
➢ The fourth and fifth lags (ar.L4, ar.L5) are not statistically significant, indicating that
they may not contribute much to the model's prediction ability.
❖ Residual Analysis:
➢ The residuals plot shows fluctuations around zero, indicating that the model has
captured much of the patterns in the data. However, some volatility in residuals might
suggest that not all patterns are perfectly captured.
➢ The density plot of residuals indicates a right-skewed distribution, which may
suggest that there are some outliers or extreme values in the data that are not well
explained by the model.
➢ Residual statistics: The residuals have a mean close to zero (0.038), which is a good
indicator that the model does not have large bias. However, the standard deviation
(3.968) suggests that there are still notable deviations from the actual values.
❖ Forecasting Results:
➢ The model forecasts a relatively flat price trajectory for the next 40 days, with values
stabilizing around 35.90.
➢ The forecast doesn't exhibit much variation, which might indicate that the ARIMA
model struggles with capturing the high volatility typical of cryptocurrency prices,
leading to over-smoothing of future predictions.
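One mechanism behind a flat forecast can be shown with a sketch. Assuming the fitted model has no constant (drift) term, an ARIMA(p, 1, 0) model forecasts future price changes from earlier changes; with stationary AR coefficients those forecast changes shrink toward zero, so the cumulated price path levels off. The function name, coefficients, and prices below are hypothetical, not the fitted BITW values.

```python
from typing import List


def forecast_ar_on_differences(prices: List[float], phi: List[float], steps: int) -> List[float]:
    """Multi-step price forecast from an ARIMA(p, 1, 0) model with no constant.

    The AR part models first differences: dy_t = phi_1*dy_{t-1} + ... + phi_p*dy_{t-p}.
    Future shocks have expectation zero, so each forecast difference is built from
    earlier differences (observed or forecast) and added to the running price level.
    """
    p = len(phi)
    diffs = [b - a for a, b in zip(prices[:-1], prices[1:])]
    if len(diffs) < p:
        raise ValueError("need at least p + 1 prices")
    window = diffs[-p:]            # most recent p differences, oldest first
    level = prices[-1]
    path = []
    for _ in range(steps):
        # window[-1] is lag 1, window[-2] is lag 2, and so on
        dy = sum(phi[i] * window[-1 - i] for i in range(p))
        level += dy
        path.append(level)
        window = window[1:] + [dy]
    return path


if __name__ == "__main__":
    prices = [34.0, 35.2, 34.8, 36.1, 35.5, 36.4]   # hypothetical daily closes
    phi = [0.1, -0.2, -0.1, 0.05, 0.02]              # hypothetical AR(5) coefficients
    path = forecast_ar_on_differences(prices, phi, steps=40)
    print([round(x, 3) for x in path[:5]], round(path[-1], 3))
```

With all coefficients set to zero the model reduces to a random walk, whose forecast is simply the last observed price repeated.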
❖ Evaluation of Forecast:
➢ The Mean Squared Error (MSE) for the test set is 31.78, which suggests that there is
a considerable deviation between the predicted and actual test values. This relatively
high MSE indicates that the ARIMA model may not be the best for forecasting highly
volatile assets like BITW.
➢ The forecast results are less volatile compared to actual data, which could be due to the
model's structure (ARIMA tends to smooth out volatility) or insufficient modelling of
underlying factors affecting price volatility.
Key Observations:
• Strengths: The ARIMA model does a good job of capturing overall trends, but it is
limited in handling short-term fluctuations, especially in highly volatile markets like
crypto.
• Limitations: The model produces smooth forecasts, which may not adequately reflect
the real-world volatility of BITW. The residuals suggest that certain patterns remain
unexplained, and the right-skewness hints at possible outliers or abrupt changes that the
model fails to capture.
The ARIMA model's metrics indicate that it struggles to predict the actual values accurately:
• The MSE (31.78) is relatively high, suggesting that there is a large deviation
between the predicted and actual prices.
• The RMSE (5.64, the square root of the MSE) is also high, which highlights
significant forecast errors.
• Both MAE and MAPE return nan. MAE involves no division, so division by zero
cannot explain this result; the more likely cause is missing (nan) values in the
arrays used for these two calculations, for example forecasts and test prices stored
with mismatched indexes. BITW prices are never zero, so division by zero is also
an unlikely explanation for the MAPE result.
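A nan-tolerant version of the four metrics is sketched below. It is a plain-Python illustration with names of our own choosing, not the project's code: pairs containing a non-finite value are dropped and counted rather than allowed to turn every average into nan, and MAPE skips zero actual values to avoid division by zero.

```python
import math
from typing import Dict, List, Tuple


def forecast_metrics(actual: List[float], predicted: List[float]) -> Tuple[Dict[str, float], int]:
    """Return ({MAE, MSE, RMSE, MAPE in %}, number of dropped pairs)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Keep only pairs where both values are finite (nan or inf would poison the means).
    pairs = [(a, p) for a, p in zip(actual, predicted) if math.isfinite(a) and math.isfinite(p)]
    if not pairs:
        raise ValueError("no finite pairs to evaluate")
    errors = [a - p for a, p in pairs]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    # MAPE is undefined when the actual value is zero, so those pairs are skipped.
    nonzero = [(a, p) for a, p in pairs if a != 0]
    mape = 100 * sum(abs((a - p) / a) for a, p in nonzero) / len(nonzero) if nonzero else math.nan
    metrics = {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "MAPE": mape}
    return metrics, len(actual) - len(pairs)


if __name__ == "__main__":
    actual = [100.0, 110.0, float("nan"), 120.0]     # hypothetical test prices with one gap
    predicted = [98.0, 115.0, 105.0, 120.0]
    print(forecast_metrics(actual, predicted))
```

Reporting the dropped count alongside the metrics makes a silent alignment problem visible instead of hiding it.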
❖ LSTM Model Metrics:
• The MSE (4.70) is significantly lower than the ARIMA model’s, indicating that
the LSTM captures the pattern with much smaller errors.
• The RMSE (2.17) is also much lower, showing that the LSTM model's
predictions are more accurate on average.
• The MAPE (5.88%) suggests that the LSTM model's predictions deviate by an
average of 5.88% from the actual values, which is a reasonably good level of
accuracy.
• MAE (1.84) is quite low, meaning that on average, the LSTM model's
predictions are only off by about 1.84 units.
Conclusion:
The LSTM model is clearly the better model for forecasting BITW prices. It provides
significantly more accurate predictions with lower error metrics across the board. The ARIMA
model, while useful for linear, stationary time series, is not as effective in capturing the highly
volatile and nonlinear nature of cryptocurrency data like BITW.
Model    Accuracy
ARIMA    0.50
LSTM     0.85
➢ The ARIMA model achieves an accuracy of 50%, meaning that the model correctly
predicts whether the return is positive or negative in half of the cases. This is similar to
a random guess and indicates that the ARIMA model struggles with predicting the
direction of the returns.
➢ ARIMA's weak performance may be due to the complexity and volatility of
cryptocurrency data, which linear models like ARIMA typically find difficult to
handle effectively.
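Directional accuracy of the kind described above can be computed as the share of periods in which the predicted and actual returns have the same sign. This sketch assumes that definition and treats a zero return as "not positive"; both choices, the function name, and the returns used are ours. Note that 50% is the coin-flip benchmark only when up and down periods are roughly equally common.

```python
from typing import List


def directional_accuracy(actual_returns: List[float], predicted_returns: List[float]) -> float:
    """Fraction of periods where predicted and actual returns fall on the same side of zero.

    A return of exactly zero counts as 'not positive' (an assumption of this sketch).
    """
    if not actual_returns or len(actual_returns) != len(predicted_returns):
        raise ValueError("need two non-empty lists of equal length")
    hits = sum((a > 0) == (p > 0) for a, p in zip(actual_returns, predicted_returns))
    return hits / len(actual_returns)


if __name__ == "__main__":
    actual = [0.01, -0.02, 0.03, -0.01]      # hypothetical daily returns
    predicted = [0.02, 0.01, 0.01, -0.03]
    print(directional_accuracy(actual, predicted))
```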
❖ LSTM Model Accuracy:
➢ The LSTM model achieves a much higher accuracy of 85%, suggesting that it is better
at capturing the nonlinear patterns in the time series data.
Conclusion:
• LSTM Model is the better model in this case, achieving a significantly higher accuracy
of 85% compared to the 50% accuracy of the ARIMA model.
• The ARIMA model’s performance indicates that it is not suitable for this particular
dataset, likely due to its inability to capture the complex, nonlinear trends in
cryptocurrency data.
• LSTM's architecture is better suited to this sequential, time-dependent data, making
it the preferred model for predicting BITW prices.
Given the accuracy comparison, LSTM is the recommended model for future forecasting tasks
involving BITW or similar volatile assets.
3.2) BITCOIN
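Stationarity test:
ADF TEST:
H0: The time series has a unit root (the series is non-stationary).
H1: The time series does not have a unit root (the series is stationary).
KPSS TEST:
H0: The time series is stationary.
H1: The time series has a unit root (the series is non-stationary).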
Test | P-value  | Significance level | Test statistic | CV-10% | CV-5%   | CV-2.5% | CV-1%
ADF  | 0.874837 | 0.05               | -0.58273       | n/a    | -2.8621 | n/a     | -3.43162
KPSS | 0.01     | 0.05               | 7.729495       | 0.347  | 0.463   | 0.574   | 0.739
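The series is likely to be non-stationary. The ADF p-value (0.874837) is well above 0.05, and
the test statistic (-0.58273) is above the 5% critical value (-2.8621), so the unit-root null
cannot be rejected. The KPSS statistic (7.729495) far exceeds its 5% critical value (0.463), so
the stationarity null is rejected; the reported p-value of 0.01 is the lower bound of the KPSS
lookup table, so the true p-value is smaller. Both tests therefore point to non-stationarity, and
the series is differenced before modelling.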
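The decision rule used above can be written as a small function. It is a sketch that assumes a 5% significance level and the usual null hypotheses (ADF: unit root; KPSS: stationarity); the function name is ours, and the p-values passed in are the Bitcoin figures from the table.

```python
def stationarity_verdict(adf_p: float, kpss_p: float, alpha: float = 0.05) -> str:
    """Combine the ADF test (H0: unit root) and the KPSS test (H0: stationary)."""
    adf_stationary = adf_p < alpha      # ADF rejects the unit-root null
    kpss_stationary = kpss_p >= alpha   # KPSS does not reject the stationarity null
    if adf_stationary and kpss_stationary:
        return "stationary"
    if not adf_stationary and not kpss_stationary:
        return "non-stationary"
    return "inconclusive: the tests disagree"


print(stationarity_verdict(adf_p=0.874837, kpss_p=0.01))   # Bitcoin row of the table
```

Running the two tests together is useful precisely because their null hypotheses are opposite: agreement gives a firmer conclusion than either test alone.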
Differencing:
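First-order differencing (the d = 1 in the ARIMA(5, 1, 0) specification) replaces each price with its change from the previous period, and forecasts made on the differenced scale are mapped back by adding them cumulatively to the last observed price. A minimal sketch; the function names and values are hypothetical.

```python
from typing import List


def difference(series: List[float], order: int = 1) -> List[float]:
    """Apply differencing `order` times; each pass maps y_t to y_t - y_(t-1)."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out[:-1], out[1:])]
    return out


def undifference(last_level: float, diffs: List[float]) -> List[float]:
    """Invert one round of differencing by adding the changes cumulatively to the last level."""
    levels, level = [], last_level
    for d in diffs:
        level += d
        levels.append(level)
    return levels


if __name__ == "__main__":
    prices = [10.0, 12.0, 11.0, 15.0]               # hypothetical prices
    print(difference(prices))                        # one round of differencing
    print(undifference(prices[-1], [1.0, -2.0]))     # two hypothetical forecast changes
```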
Model 1: LSTM Model
The loss values indicate the model's prediction error during training, with lower values
reflecting better performance. Starting at 0.0027 in Epoch 1, the loss significantly decreases to
9.0149e-04 (0.00090149) in Epoch 2, suggesting the model is beginning to learn the underlying
data patterns. This trend continues across subsequent epochs, with losses of 6.8044e-04 (Epoch
3), 6.4960e-04 (Epoch 4), 5.4779e-04 (Epoch 5), 5.2417e-04 (Epoch 6), and 4.4458e-04
(Epoch 7). The consistent decline in loss values indicates effective learning and convergence
of the optimization algorithm.
Interpretation:
Overall Trend: The model generally captures the overall trend of the actual prices, with both
lines showing a similar upward and downward movement.
Underestimation: In several regions, the predicted prices consistently underestimate the actual
prices. This suggests that the model is not fully capturing the upward momentum of the series.
Lag: There is a noticeable lag between the actual and predicted prices, particularly during
periods of rapid price changes. This indicates that the model is struggling to keep up with the
fast-paced movements of the series.
Interpretation:
• Downward Trend: The forecasted prices exhibit a consistent downward trend over the
40-day period.
• Linearity: The trend appears to be linear, suggesting a constant rate of price decline.
• Magnitude of Decrease: The predicted price decrease is significant, with a drop from
approximately 60,000 to 42,500.
The forecast suggests a potential downward trend in prices over the next 40 days. However, it's
important to note that cryptocurrency markets are highly volatile, and the actual price
movements may deviate significantly from the forecast. The model used to generate this
forecast may not capture all the relevant factors that could influence the price, and the
prediction should be interpreted with caution.
Model observation:
The LSTM model demonstrates effective learning, with loss values consistently decreasing
from 0.0027 (Epoch 1) to 4.4458e-04 (Epoch 7), indicating improved data fitting. The model
converged efficiently and achieved a low RMSE of 0.0356, reflecting small prediction errors
despite cryptocurrency volatility. It successfully captures overall price trends but tends to
underestimate sharp upward movements and exhibits lag during rapid price changes. The 40-
day forecast predicts a linear downward trend from 60,000 to 42,500, though caution is advised
as the model may not fully account for the volatility inherent in cryptocurrency markets.
Conclusion:
The LSTM model demonstrates effective learning with a consistent reduction in loss and a
relatively low RMSE, indicating that it has achieved good accuracy in predicting
cryptocurrency prices. However, it underestimates the upward momentum in certain regions
and lags during rapid price changes. The forecasted linear downward trend should be
interpreted cautiously due to the high volatility inherent in cryptocurrency markets. While the
model captures general trends, it may not account for all factors influencing price movements,
and further tuning or additional data could improve its performance in capturing rapid
fluctuations.
❖ Model Overview
The SARIMAX results indicate an ARIMA model specified as ARIMA(5, 1, 0), where:
• 5 indicates the number of autoregressive (AR) terms (p).
• 1 indicates the differencing order (d): the data has been differenced once to make it
stationary.
• 0 indicates that no moving-average (MA) terms are included (q).
The model is based on 5146 observations (data points) collected from July 18, 2010, to
August 18, 2024. The log likelihood of -41275.345 measures how probable the observed data
are under the fitted parameters; it is mainly useful for comparing models fitted to the same data.
❖ Coefficients of AR Terms
Because the series is differenced once, the AR terms describe how past price changes affect
the current price change.
• ar.L1: -0.0556 (p < 0.001): The most recent price change has a negative influence on the
current change.
• ar.L2: 0.0149 (p = 0.019): The second lag has a positive influence, though weaker than
L1.
• ar.L4: 0.0353 (p < 0.001): This term has the largest positive influence among the lags.
• ar.L5: 0.0153 (p = 0.010): Like the second lag, it has a positive but weaker influence.
Overall, the autoregressive coefficients are statistically significant but small in magnitude; the
largest effect in absolute terms comes from the first lag.
• Sigma² (variance of the error term): 5.447e+05 is the estimated variance of the residuals,
equivalent to a residual standard deviation of roughly 738 in price units (√544,700 ≈ 738).
The standard error of this estimate is 3255.078 and the estimate is statistically significant
(p < 0.001), which confirms that the error variance is non-zero; it is the size of the variance
that shows the considerable variability in the model's errors.
❖ Statistical Tests
• Ljung-Box (L1) (Q): 0.00 with a Prob(Q) of 0.99 indicates that there is no evidence of
autocorrelation in the residuals at lag 1. The high p-value suggests that the model
adequately captures the short-term autocorrelation in the data.
• Jarque-Bera (JB): 88472.08 with Prob(JB) < 0.001 indicates that the residuals do not
follow a normal distribution. This high value suggests a strong departure from normality.
• Heteroskedasticity (H): 2598.23 with Prob(H) < 0.001 shows that the residuals exhibit
heteroskedasticity, meaning the variance of errors is not constant across observations.
• Kurtosis: 23.31 suggests that the residuals have heavy tails compared to a normal
distribution (kurtosis of 3), indicating potential outliers or extreme values in the
residuals.
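As a rough check, the Jarque-Bera statistic combines skewness and excess kurtosis. Plugging in the reported kurtosis (23.31) and n = 5146 with skewness set to zero gives about 88,446, so the reported 88,472.08 is almost entirely driven by the heavy tails. This assumes the test used all 5146 residuals; the software may exclude initial observations, so the split is approximate. The function name is ours.

```python
def jarque_bera(n: int, skewness: float, kurtosis: float) -> float:
    """Jarque-Bera statistic n/6 * (S^2 + (K - 3)^2 / 4), where K is ordinary (not excess) kurtosis."""
    return n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)


# Kurtosis contribution alone (skewness set to 0), using n and kurtosis from the summary.
kurtosis_part = jarque_bera(n=5146, skewness=0.0, kurtosis=23.31)
print(round(kurtosis_part, 2), "of the reported 88472.08")
```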
❖ Residual Diagnostics:
• No Clear Pattern: The residuals appear to be centered around zero and do not exhibit
any clear patterns or trends. This suggests that the ARIMA model, through differencing
and its autoregressive terms, captures the trend and short-term dependence in the data
(the model has no seasonal terms).
• Randomness: The residuals appear randomly distributed, consistent with the Ljung-Box
result, although the heteroskedasticity test shows that their variance is not constant.
• Outliers: There are a few outliers, which are points that deviate significantly from the
rest of the data. These outliers could be due to unusual events or noise in the data.
• Distribution Shape: The density plot shows a clear peak around zero, indicating that
the majority of residuals are close to zero. This is a good sign as it suggests that the
model is accurately predicting the values in most cases.
• Skewness: The distribution is slightly skewed to the right, with a longer tail on the
right side. This indicates that there are some larger positive residuals, meaning the
model underestimates the values in some cases.
• Kurtosis: The distribution appears to be leptokurtic, meaning it has heavier tails than
a normal distribution. This suggests that there are some outliers or extreme values in
the residuals.
❖ Forecasted Price:
• Initial Spike: The forecasted prices start with a sharp increase, reaching a peak
around October 1, 2024.
• Sharp Decline: After the initial spike, the prices experience a sharp decline, reaching
a low point around October 10, 2024.
• Stabilization: Following the decline, the prices stabilize at a relatively constant level.
Model Interpretation Based on Accuracy Metrics
❖ ARIMA Model
• MSE (Mean Squared Error): 12,131,087.25. Because squaring penalizes large errors
heavily, an MSE lower than the LSTM's (16,860,644.36) suggests that ARIMA produces
fewer very large errors.
• RMSE (Root Mean Squared Error): 3482.97, the square root of the MSE, expresses the
typical error in price units; it is also lower than the LSTM's.
Conclusion for ARIMA: The ARIMA model has decent accuracy, particularly when looking
at the MAPE. It strikes a balance between absolute error and percentage error, making it a
reasonably effective model for this task.
❖ LSTM Model
• MAE: 3508.58, higher than ARIMA, suggests that LSTM’s average error is larger,
indicating its predictions deviate more from actual values.
• MSE: 16,860,644.36, larger than ARIMA, indicates that LSTM struggles more with
larger errors and may be more affected by outliers.
• RMSE: 4106.17, also higher than ARIMA, implies that LSTM has larger prediction
errors overall.
• MAPE: 5.72%, slightly higher than ARIMA, suggests that LSTM's percentage error is
higher, making it less accurate in comparison.
Conclusion for LSTM: The LSTM model performs worse than ARIMA based on all metrics,
with larger absolute and percentage errors. While LSTM is often strong in time-series
predictions, in this case, it is outperformed by ARIMA.
Overall Conclusion:
• ARIMA is the most accurate model across all metrics, with the lowest MAE, MSE,
RMSE, and MAPE, making it the best performer for this prediction task.
• LSTM performs worse than ARIMA, with higher errors in both absolute and
percentage terms. While LSTM is often a powerful model for time-series data, it is not
the best choice here.
3.3) ETHEREUM
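Stationarity test:
ADF TEST:
H0: The time series has a unit root (the series is non-stationary).
H1: The time series does not have a unit root (the series is stationary).
KPSS TEST:
H0: The time series is stationary.
H1: The time series has a unit root (the series is non-stationary).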
Test | P-value  | Significance level | Test statistic | CV-10% | CV-5%    | CV-2.5% | CV-1%
ADF  | 0.471043 | 0.05               | -1.623217      | n/a    | -2.86247 | n/a     | -3.43246
KPSS | 0.01     | 0.05               | 6.057117       | 0.347  | 0.463    | 0.574   | 0.739
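The series is likely to be non-stationary. The ADF p-value (0.471043) is above 0.05 and the test
statistic (-1.623217) is above the 5% critical value (-2.86247), so the unit-root null cannot be
rejected, while the KPSS statistic (6.057117) far exceeds its 5% critical value (0.463), so the
stationarity null is rejected.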
Differencing:
As training progressed, the loss decreased steadily, reaching 0.0012 by Epoch 7, which
demonstrates effective learning. After Epoch 3 the reduction in loss became more gradual,
suggesting the model was nearing its best performance. By Epoch 7 the loss had stabilized,
indicating that further training would likely yield only minor improvements. This pattern
reflects a successful learning process, with the model approaching a point of diminishing
returns.
The RMSE of 0.0371 on the test data is measured on the normalized price scale, not in price
units: assuming min-max scaling to [0, 1], the model's typical error is about 3.71% of the price
range covered by the scaler, not 3.71% of the price itself. In the context of Ethereum's highly
volatile prices, this still suggests relatively good predictive accuracy.
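Because min-max scaling is an affine map, converting a scaled RMSE back to price units only requires multiplying by the width of the scaling range. The sketch below assumes min-max scaling; the function name is ours, and the range used is hypothetical, since the report does not give the minimum and maximum used by its scaler.

```python
def rmse_to_price_units(rmse_scaled: float, price_min: float, price_max: float) -> float:
    """Convert an RMSE computed on min-max scaled data back to price units.

    Scaling is x_s = (x - min) / (max - min). Every prediction error is therefore
    divided by (max - min), and multiplying the RMSE by that width undoes it.
    """
    if price_max <= price_min:
        raise ValueError("price_max must exceed price_min")
    return rmse_scaled * (price_max - price_min)


# Hypothetical scaler range, for illustration only.
print(rmse_to_price_units(0.0371, price_min=1000.0, price_max=3000.0))
```

The same scaled RMSE can therefore mean very different price errors depending on the range the scaler was fitted on, which is why scaled and unscaled errors should not be compared directly.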
Interpretation:
Model Performance: The model appears to capture the general trend of the actual prices, with
the predicted line following the overall shape of the actual line. However, there are noticeable
deviations, particularly during periods of high volatility.
Underestimation: The model consistently underestimates the actual prices, particularly during
periods of rapid price increases. This suggests that the model might be struggling to capture
the full extent of price fluctuations.
Lag: There seems to be a slight lag between the actual and predicted prices, indicating that the
model might not be responding quickly enough to sudden price changes.
Interpretation:
• Downward Trend: The forecasted prices exhibit a consistent downward trend over the
40-day period.
• Linearity: The trend appears to be linear, suggesting a constant rate of price decline.
The LSTM model predicts a steady decline in Ethereum prices over the next 40 days. However,
it's important to note that cryptocurrency markets are highly volatile, and the actual price
movements may deviate significantly from the forecast. The model's prediction is based on
historical data and may not fully capture the impact of future events or market sentiment.
Model observation:
The LSTM model shows effective learning with a steady decrease in loss across epochs,
stabilizing by Epoch 7, and achieving an RMSE of 0.0371, indicating decent predictive
accuracy. However, the model underestimates price increases and exhibits lag when responding
to sudden market shifts, likely due to the volatile nature of cryptocurrencies. The forecasted
linear decline in Ethereum prices over the next 40 days suggests the model may oversimplify
trends and fail to capture the complexity and rapid fluctuations typical of cryptocurrency
markets.
Conclusion:
The LSTM model effectively captures the general price trend but struggles with rapid market
fluctuations, especially during periods of volatility. The linear downward trend prediction may
suggest an oversimplified understanding of price movements. Given cryptocurrency’s
unpredictable nature, the model might benefit from incorporating more sophisticated features.
Enhancing the model by adding features like trading volume or external financial data, and
refining its architecture—such as modifying the number of LSTM layers or introducing
dropout layers—could improve its ability to capture complex price patterns and manage market
volatility more effectively.
❖ Coefficients of AR Terms:
• AR.L1 (Lag 1): -0.0700 (p-value < 0.05). Because the series is differenced before
fitting, this coefficient applies to the previous price change: a rise in the previous step
tends to be followed by a slight pullback in the next step.
• AR.L2 (Lag 2): 0.0175 (p-value = 0.041). The coefficient for lag 2 is small but
positive, suggesting a weak positive relationship between the change two steps back
and the current change.
• AR.L3 (Lag 3): 0.0404 (p-value < 0.05). This positive coefficient indicates a
significant positive effect of lag 3 on the current change.
• AR.L4 (Lag 4): 0.0298 (p-value < 0.05). The change from four steps back contributes
positively.
• AR.L5 (Lag 5): -0.0508 (p-value < 0.05). The change five steps back has a significant
negative effect.
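To show how these coefficients would be used, the sketch below computes a one-step price forecast. It assumes that the Ethereum model, like the Bitcoin model, is an ARIMA(5, 1, 0) without a constant term (the specification is not restated in this section); the function name and input prices are hypothetical.

```python
from typing import List

# AR coefficients reported above for Ethereum (lags 1 to 5).
PHI = [-0.0700, 0.0175, 0.0404, 0.0298, -0.0508]


def next_price(prices: List[float], phi: List[float] = PHI) -> float:
    """One-step forecast from an ARIMA(p, 1, 0) model without a constant.

    The AR terms act on price changes, so the forecast is the last price plus a
    weighted sum of the last p changes (most recent change first).
    """
    p = len(phi)
    if len(prices) < p + 1:
        raise ValueError("need at least p + 1 prices")
    changes = [b - a for a, b in zip(prices[:-1], prices[1:])]
    recent = changes[::-1][:p]                  # lag 1, lag 2, ..., lag p
    return prices[-1] + sum(c * d for c, d in zip(phi, recent))


if __name__ == "__main__":
    hypothetical_prices = [2500.0, 2520.0, 2510.0, 2550.0, 2540.0, 2600.0]
    print(next_price(hypothetical_prices))
```

Because the coefficients are all small, the forecast stays close to the last observed price, which matches the small individual effects described above.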
❖ Statistical Tests:
• Ljung-Box (L1) (Q) Test: This test checks for autocorrelation in the residuals. A p-value
of 0.83 indicates no significant autocorrelation at lag 1, so the residuals show no sign of
the short-term dependence the model is meant to capture.
• Jarque-Bera (JB) Test: This tests for normality in residuals. A JB value of 48793.94
with a p-value of 0.00 indicates non-normality in the residuals, which may suggest the
presence of outliers or skewness in the data.
• Heteroskedasticity Test (H): The test shows H = 12.50 and a p-value of 0.00, meaning
that heteroskedasticity (non-constant variance) is present: the variance of the residuals
changes over time.
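The Ljung-Box statistic behind the "(L1)" line can be computed directly from the residuals. This sketch implements the standard formula; with max_lag = 1 it corresponds to the lag-1 test reported in the summary, and under the null of no autocorrelation Q is compared with a chi-square distribution (for one degree of freedom, the 5% critical value is about 3.84). The function name is ours, and the residuals in the example are synthetic and deliberately alternate in sign, so Q is large.

```python
from typing import List


def ljung_box_q(residuals: List[float], max_lag: int) -> float:
    """Ljung-Box Q = n(n + 2) * sum_{k=1}^{h} r_k^2 / (n - k),
    where r_k is the lag-k sample autocorrelation of the residuals."""
    n = len(residuals)
    if n <= max_lag:
        raise ValueError("need more observations than lags")
    mean = sum(residuals) / n
    dev = [x - mean for x in residuals]
    denom = sum(d * d for d in dev)
    q = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q


if __name__ == "__main__":
    alternating = [1.0, -1.0] * 10      # synthetic residuals with strong lag-1 autocorrelation
    print(ljung_box_q(alternating, max_lag=1))
```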
❖ Residual Diagnostics:
• No Clear Pattern: The residuals appear to be centered around zero and do not exhibit
any clear patterns or trends. This suggests that the ARIMA model, through differencing
and its autoregressive terms, captures the trend and short-term dependence in the data
(the model has no seasonal terms).
• Randomness: The residuals appear randomly distributed, consistent with the Ljung-Box
result, although the heteroskedasticity test shows that their variance is not constant.
• Outliers: There are a few outliers, which are points that deviate significantly from the
rest of the data. These outliers could be due to unusual events or noise in the data.
• Distribution Shape: The density plot shows a clear peak around zero, indicating that the
majority of residuals are close to zero. This is a good sign as it suggests that the model is
accurately predicting the values in most cases.
• Skewness: The distribution is slightly skewed to the right, with a longer tail on the right
side. This indicates that there are some larger positive residuals, meaning the model
underestimates the values in some cases.
• Kurtosis: The distribution appears to be leptokurtic, meaning it has heavier tails than a
normal distribution. This suggests that there are some outliers or extreme values in the
residuals.
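The skewness and kurtosis described above are tied to the Jarque-Bera statistic reported earlier by JB = n/6 · (S² + (K − 3)²/4), where K is the ordinary (non-excess) kurtosis, so a normal distribution has K = 3. The minimal sketch below assumes population (divide-by-n) moments, which may differ slightly from the estimator used by the software in this project. Under normality JB is approximately χ² with 2 degrees of freedom, whose upper tail is exactly exp(−JB/2).

```python
import math
from typing import Sequence, Tuple


def skew_kurtosis(x: Sequence[float]) -> Tuple[float, float]:
    """Sample skewness and (non-excess) kurtosis from central moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2  # a normal distribution has kurtosis 3


def jarque_bera(x: Sequence[float]) -> Tuple[float, float]:
    """Return the JB statistic and its chi-square(2) p-value."""
    s, k = skew_kurtosis(x)
    jb = len(x) / 6 * (s ** 2 + (k - 3) ** 2 / 4)
    return jb, math.exp(-jb / 2)


# Mostly-zero residuals with two large outliers are heavy-tailed (kurtosis above 3).
print(skew_kurtosis([0.0] * 8 + [10.0, -10.0]))
print(jarque_bera([0.0] * 8 + [10.0, -10.0]))
```

In a short sample like this the statistic stays small even though the tails are heavy; with thousands of daily observations, as here, the same shape produces the very large JB values reported.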
The graph shows the forecasted prices for the next 40 days, starting from September 28, 2024.
The mean squared error (MSE) is a metric used to evaluate the accuracy of the forecast.
• Initial Spike: The forecasted prices start with a sharp increase, reaching a peak around
October 1, 2024.
• Stabilization: After the initial spike, the prices stabilize at a relatively constant level.
• MSE: The MSE of 34259.14 indicates that the model's predictions are relatively far
from the actual values. This suggests that the model may not be capturing the underlying
dynamics of the price series effectively.
MODELS PERFORMANCE COMPARISON
❖ LSTM Model
• MAE: 135.35, lower than ARIMA, implies that LSTM's average prediction error is
smaller, indicating a closer fit to the actual data.
• MSE: 25,984.50, also lower than ARIMA, suggests that LSTM handles larger errors
better, reducing the effect of outliers.
• RMSE: 161.20, shows the predictive accuracy is better than ARIMA, and fewer
extreme errors are influencing the model.
• MAPE: 5.33%, the lowest among the three models, indicates that LSTM provides the
most accurate predictions in terms of percentage deviation from the actual values.
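The four error metrics used in this comparison can be computed as follows. This is a minimal sketch, not the project's own code. It assumes the actual and predicted series are already aligned day by day, and it reports MAPE in percent. If either series contains NaN values (for example after a misaligned join), the metrics computed this way become NaN as well.

```python
import math
from typing import Dict, Sequence


def forecast_metrics(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """MAE, MSE, RMSE and MAPE (in percent) for aligned actual/predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        # MAPE is undefined when an actual value is zero
        "MAPE": 100 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n,
    }


print(forecast_metrics([100.0, 200.0], [110.0, 190.0]))
```

Because RMSE is the square root of MSE, the two always move together. The LSTM figures above are consistent in this respect, since 161.20² ≈ 25,985.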
Conclusion for LSTM: The LSTM model has the best overall performance across all error metrics,
showing its strength in handling time series data. It has lower errors and is more accurate in
its predictions than ARIMA and Logistic Regression.
Overall Conclusion:
• The LSTM model is the most accurate and reliable across all the given metrics, with
the lowest errors and best overall performance.
• The ARIMA model performs reasonably well but is not as accurate as LSTM. It could
still be useful depending on the specific use case.
3.4) BNB
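Stationarity Test:
ADF TEST:
H0: The time series possesses a unit root and is non-stationary.
H1: The time series does not have a unit root (the series is stationary).
KPSS TEST:
H0: The time series is stationary.
H1: The time series is non-stationary (has a unit root).
Interpretation: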
Differencing
Model 1: LSTM
An RMSE of 0.0122 indicates that the model predicts the target variable with small error,
suggesting that it has learned the main patterns in the training data and that predicted and
actual values fit closely. Since the RMSE reported for the LSTM in the model performance
comparison below is 36.90, the 0.0122 figure is presumably measured on the scaled price
series rather than in price units, and the two should not be compared directly.
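If the LSTM was trained on min-max scaled prices (a common choice, but the scaling method is not stated in this report), an RMSE on the scaled series converts to price units by multiplying by the range (max − min) of the training prices, because min-max scaling is a linear transformation. The sketch below checks this with made-up numbers; all names and values are hypothetical.

```python
import math
from typing import List, Sequence


def scale(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Min-max scale values using a fixed training range [lo, hi]."""
    return [(v - lo) / (hi - lo) for v in values]


def rmse(a: Sequence[float], b: Sequence[float]) -> float:
    """Root mean squared error between two aligned sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical prices and predictions, for illustration only.
actual = [200.0, 400.0, 600.0]
predicted = [210.0, 390.0, 600.0]
lo, hi = min(actual), max(actual)

rmse_scaled = rmse(scale(actual, lo, hi), scale(predicted, lo, hi))
rmse_price = rmse(actual, predicted)
print(rmse_scaled, rmse_scaled * (hi - lo), rmse_price)  # the last two agree
```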
• Accuracy and Trend Capturing
• Trend Alignment
The actual prices, represented by the blue line, demonstrate significant volatility, showcasing
sharp fluctuations over time. In contrast, the predicted prices, indicated by the orange line,
closely follow the trends of the actual prices, suggesting that the prediction model captures the
underlying market behavior effectively. However, there are instances, particularly during
periods of rapid price changes, where the predicted values lag behind the actual prices.
• Smooth Fit
The predicted prices appear smoother compared to the actual prices, indicating that the model
incorporates some level of smoothing, making it less sensitive to short-term fluctuations while
effectively capturing longer-term trends. This smoothness suggests that while the model
performs well for long-term forecasts, it may struggle with short-term accuracy during volatile
market conditions, highlighting potential areas for improvement in forecasting strategies.
❖ Predicted price
• General Trend: The forecast predicts a consistent decline in prices over the next 40
days. Starting from approximately 550 at the beginning of October, the price steadily
drops to around 250 by the end of October.
• Rate of Decline: The curve suggests that the rate of decline is gradual but continuous.
It does not show any major fluctuations or spikes, indicating a smooth downward
trajectory in the predicted values.
• Forecast Smoothness: The forecasted prices follow a steady, predictable path with no
signs of volatility or unexpected changes. This smoothness reflects how the forecast is
generated rather than the model's confidence: no prediction intervals are reported, so the
uncertainty of the forecast is not quantified.
• The forecast suggests a bearish outlook for BNB, with prices expected to decrease
significantly over the next 40 days. Such a steady decline could reflect market conditions
such as oversupply, weak demand, or other external factors. The forecast therefore implies
caution, as prices may continue to fall in the near future unless external factors change.
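A 40-day LSTM forecast like this one is typically produced recursively: the model predicts one day ahead, the prediction is appended to the input window, and the step is repeated. The report does not say how its forecasts were generated, so this is an assumption. The sketch below uses a simple moving average as a stand-in for the trained network; in practice the trained model's prediction call would take the place of the hypothetical `predict_next` argument. Because each step is fed the model's own smooth output rather than real, noisy prices, recursive forecasts tend to become smooth and trend-like, consistent with the behavior described above.

```python
from typing import Callable, List, Sequence


def recursive_forecast(
    history: Sequence[float],
    predict_next: Callable[[List[float]], float],
    steps: int,
    lookback: int,
) -> List[float]:
    """Roll a one-step-ahead model forward `steps` times."""
    window = list(history[-lookback:])
    forecasts: List[float] = []
    for _ in range(steps):
        nxt = predict_next(window)
        forecasts.append(nxt)
        window = window[1:] + [nxt]  # feed the prediction back in as input
    return forecasts


def moving_average(window: List[float]) -> float:
    """Stand-in for a trained model: predicts the mean of the window."""
    return sum(window) / len(window)


print(recursive_forecast([1.0, 2.0, 3.0], moving_average, steps=2, lookback=3))
```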
Model 2: ARIMA
❖ AR Coefficients:
The autoregressive coefficients (ar.L1 to ar.L5) show how past values influence the current
value. Specifically:
• ar.L1 (-0.1167): The first lag has a significant negative impact on the current price, as
the coefficient is statistically significant (p-value < 0.05).
• ar.L2 (0.0860): This coefficient is positive and statistically significant (p-value < 0.05),
indicating that the second lag has a positive influence on the current price.
• ar.L3 (0.0551): Positive and statistically significant (p-value < 0.05), suggesting that
the third lag positively influences the current price.
• ar.L4 (0.0008): The fourth lag is not significant, as the coefficient is very close to zero
and the p-value is high (p = 0.934), indicating no meaningful influence on the current
price.
• ar.L5 (-0.0941): This coefficient is negative and statistically significant (p-value <
0.05), suggesting a significant negative influence from the fifth lag on the current price.
• sigma² (149.4350): This is the estimated variance of the error term, equivalent to a
residual standard deviation of about 12.2 in price terms. It measures the noise the model
leaves unexplained, not noise it has captured.
• Skewness (-0.36): The negative skewness indicates that the distribution of the residuals
is slightly left-skewed: its left tail is longer than its right, meaning there are occasional
large negative residuals (days on which the model's prediction exceeded the actual price).
• Kurtosis (32.19): The kurtosis is far higher than 3 (the value for a normal
distribution), so the residuals have heavy tails: extreme values occur much more often
than would be expected under a normal distribution.
❖ Residual diagnostics
• Ljung-Box (L1) (Q = 0.02, Prob(Q) = 0.88): The high p-value suggests that there is
no significant autocorrelation in the residuals at lag 1. This indicates that the model has
adequately captured the short-term dependence in the time series data.
• Jarque-Bera (JB = 87,884.71, Prob(JB) = 0.00): The Jarque-Bera test indicates that
the residuals are not normally distributed (p-value < 0.05). This suggests that there may
still be skewness or kurtosis in the data that the model has not captured.
• Heteroskedasticity (H = 152.59, Prob(H) = 0.00): The test indicates significant
heteroskedasticity (variance changes over time) in the residuals, suggesting the
potential need for a model that accounts for changing volatility (e.g., GARCH).
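The H statistic in these diagnostics appears to be the break-variance test reported in statsmodels model summaries (an assumption about the software used): the sum of squared residuals in the last third of the sample divided by that in the first third. H far above 1 (152.59 here) means the residual variance grew over the sample; H far below 1 (as with H = 0.02 for USDC later in this chapter) means it shrank. The simplified sketch below ignores details such as excluding the first observations lost to differencing.

```python
from typing import Optional, Sequence


def breakvar_h(residuals: Sequence[float], h: Optional[int] = None) -> float:
    """Ratio of squared residuals in the last h vs the first h observations."""
    n = len(residuals)
    if h is None:
        h = round(n / 3)  # default: compare the first and last thirds
    first = sum(r * r for r in residuals[:h])
    last = sum(r * r for r in residuals[-h:])
    return last / first


# Residuals whose spread doubles towards the end of the sample.
print(breakvar_h([1.0, -1.0, 0.5, -0.5, 2.0, -2.0]))
```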
❖ Forecasted price
The forecasted prices in the plot are relatively stable, holding at around 535.0 with minimal
fluctuation over the forecast horizon, so the model expects the price to remain close to this
level. This flat path largely reflects the structure of the model rather than a period of market
equilibrium: an autoregressive model fitted to price changes predicts changes that shrink
towards zero within a few steps, so the forecast settles near the most recent price instead of
anticipating large swings or trends. The forecast therefore gives no indication of significant
upward or downward movement in the near term.
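Why the ARIMA forecast flattens can be seen by iterating the fitted AR equation. Assuming an ARIMA(5,1,0) specification without a constant (the order stated for the Solana and USDC models; it is not stated for BNB), each step predicts the next price change as a weighted sum of the previous five changes, using the BNB coefficients listed above, and adds it to the last price. Because the coefficients are small, the predicted changes shrink towards zero within a few steps. The starting price and recent changes below are hypothetical.

```python
from typing import List, Sequence


def ar_on_differences_forecast(
    last_price: float,
    recent_diffs: Sequence[float],
    phi: Sequence[float],
    steps: int,
) -> List[float]:
    """Forecast price levels from an AR(p) model on first differences.

    recent_diffs is in chronological order (oldest first); phi[0] is the
    lag-1 coefficient, phi[1] the lag-2 coefficient, and so on.
    """
    diffs = list(recent_diffs)
    level = last_price
    path: List[float] = []
    for _ in range(steps):
        next_diff = sum(p * diffs[-(i + 1)] for i, p in enumerate(phi))
        diffs.append(next_diff)
        level += next_diff
        path.append(level)
    return path


bnb_phi = [-0.1167, 0.0860, 0.0551, 0.0008, -0.0941]  # ar.L1 ... ar.L5 above
path = ar_on_differences_forecast(535.0, [2.0, -1.5, 3.0, -0.5, 1.0], bnb_phi, 40)
print([round(p, 3) for p in path[:5]], round(path[-1], 3))
```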
Model performance comparison
❖ ARIMA Model
• MSE (Mean Squared Error): 58,621.63, which is quite high, indicating that the
model's predictions deviate substantially from the actual values in squared-error terms.
• RMSE (Root Mean Squared Error): 242.12, also high, indicating the model has a
substantial error magnitude in predicting values.
• Inference: Although the ARIMA model produced MSE and RMSE values, its MAE and
MAPE were returned as NaN, which, together with the high errors, affects the model's
overall reliability and accuracy.
❖ LSTM Model
• MAE (Mean Absolute Error): 33.53, which is relatively low and indicates that the
average error in predictions is small.
• MSE (Mean Squared Error): 1,361.55, which is significantly lower compared to the
ARIMA model, indicating better accuracy in predicting values close to the actual ones.
• RMSE (Root Mean Squared Error): 36.90, further reflecting that the LSTM model
performs well with much smaller errors compared to ARIMA.
• MAPE (Mean Absolute Percentage Error): 6.02%, indicating that the model's
predictions are accurate within a small percentage error of the actual values.
• Inference: The LSTM model performs well across all metrics, showing that it captures
the underlying patterns in the data effectively. Its low error metrics make it the most
accurate and suitable model for this task.
Conclusion
• The LSTM model is the most effective based on its low error rates across all metrics,
showing high predictive accuracy.
• The ARIMA model shows high errors (MSE, RMSE), and the NaN values returned for MAE
and MAPE indicate potential issues with the data or the model's reliability.
3.5) TETHER
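Stationarity test:
ADF TEST:
H0: The time series possesses a unit root and is non-stationary.
H1: The time series does not have a unit root (the series is stationary).
KPSS TEST:
H0: The time series is stationary.
H1: The time series is non-stationary (has a unit root).
Interpretation:
The series is stationary.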
Model 1: LSTM
❖ Training Epochs and Loss
• Epochs 1-50: The model trained for 50 epochs, and the loss consistently decreased,
indicating that the model was learning and improving over time. The loss values start at
0.0239 and gradually decrease.
• Improving Loss: The loss reduction shows that the model is fitting the data better as
training progresses.
• Small Loss Values: The low loss values suggest that the model is performing well
during training, as it is minimizing the error effectively.
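• Loss Plateau: Since the loss plateaus at later epochs, the model appears to have
captured most of the learnable structure, and training beyond 50 epochs is unlikely to
improve results significantly. Whether the model overfits cannot be judged from the
training loss alone; that requires comparing it with the loss on a held-out validation set.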
An RMSE of 0.0122 shows that the model predicts the target variable with small error,
suggesting that it has learned the main patterns in the training data and fits the data
closely.
• Accuracy and Trend Capturing
• Trend Alignment: The predicted prices (orange dots) generally follow the overall trend
of the actual Tether prices (blue line), indicating that the model has successfully
captured the general movement of prices. However, there are clear deviations in certain
periods where the actual prices experience sharper fluctuations, showing that the model
struggles to fully capture the volatility in those instances.
• Smoother Predictions: The predicted prices are noticeably smoother compared to the
actual prices, which show more frequent and larger spikes and dips. This smoothing
effect suggests that the model is filtering out short-term market noise and focusing on
broader trends. While this can be beneficial for generating stable predictions, it also
means the model may be underestimating sudden, significant price movements in the
actual data.
❖ Predicted Prices
❖ General Trend:
There is an upward trend in the plot. The predicted prices start at 1.00206 on 2024-09-28
and rise to 1.00555 on 2024-11-06.
❖ Prediction Characteristics:
• The predictions show no significant volatility or large fluctuations over the 40-day
forecast, indicating that the model predicts a stable price with a gentle upward drift.
• Prices stay within a narrow range over the 40 days (from 1.00206 to 1.00555),
suggesting a stable market outlook according to the LSTM model.
❖ Key Observations
• Good Fit for Smooth Trends: The model successfully captures a clear upward trend
in the predicted prices, making it suitable for scenarios where smooth, predictable
trends are observed. This characteristic indicates that the LSTM model is well-
calibrated to detect and project consistent growth patterns over time.
• Limited Volatility Capture: The predictions demonstrate minimal price volatility,
suggesting that the model may not be designed to account for sudden market
fluctuations or high-risk scenarios. This limitation means that while the model provides
reliable forecasts for stable market conditions, it may overlook more erratic behaviours
in price movements that could occur in volatile markets.
Model 2: ARIMA Model
❖ AR coefficients:
The autoregressive coefficients (ar.L1 to ar.L5) show how the past values influence the current
value. Specifically:
• ar.L1 (-0.3931): The first lag has a strong and significant negative impact on the
current price, with a p-value of 0.000, indicating it is highly statistically significant. A
1-unit increase in the first lag of price results in a 0.3931 decrease in the current price.
• ar.L2 (-0.1808): The second lag also has a significant negative influence on the current
price (p-value < 0.05), showing that past price movements negatively affect the current
price, but less so than the first lag. A 1-unit increase in the second lag of price results
in a 0.1808 decrease in the current price.
• ar.L3 (-0.1555): The third lag continues to have a significant negative effect on the
current price, with a coefficient of -0.1555 and a p-value of 0.000, meaning it has a
statistically significant role. A 1-unit increase in the third lag of price results in a 0.1555
decrease in the current price.
• ar.L4 (-0.1224): This lag has a smaller but still statistically significant negative impact
on the current price, with a p-value of 0.000. A 1-unit increase in the fourth lag of price
results in a 0.1224 decrease in the current price.
• ar.L5 (-0.0299): The fifth lag has the smallest negative effect, but it is still statistically
significant (p-value < 0.05), indicating a minor impact. A 1-unit increase in the fifth lag
of price results in a 0.0299 decrease in the current price.
• sigma2 (1.342e-05): This is the variance of the residuals (errors). It has a very small
value and is statistically significant (p-value = 0.000), indicating the model has captured
a considerable portion of the variance in the data.
• Log Likelihood (11235.274): This value measures how well the model fits the data;
higher values indicate a better fit, and it is mainly useful for comparing models fitted to
the same series.
• AIC (-22458.549): The Akaike Information Criterion balances goodness of fit against
the number of parameters, with lower values indicating a better model. AIC is only
meaningful relative to other models fitted to the same data, so this value indicates a good
fit only in comparison with alternative specifications.
• BIC (-22423.183): The Bayesian Information Criterion, similar to AIC, also favors
lower values but penalizes model complexity more heavily; the same caveat about
comparison applies.
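The reported information criteria follow directly from the log-likelihood: AIC = 2k − 2·log L and BIC = k·ln(n) − 2·log L. Assuming k = 6 estimated parameters (the five AR coefficients plus sigma²), the Tether AIC above is reproduced, and the same check works for the USDC model later in this chapter. The number of observations n is not reported here, so BIC is not recomputed.

```python
import math


def aic(log_likelihood: float, k: int) -> float:
    """Akaike Information Criterion: 2k - 2 log L."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood: float, k: int, nobs: int) -> float:
    """Bayesian Information Criterion: k ln(n) - 2 log L."""
    return k * math.log(nobs) - 2 * log_likelihood


print(round(aic(11235.274, 6), 3))  # Tether ARIMA; reported AIC is -22458.549
print(round(aic(7602.898, 6), 3))   # USDC ARIMA; reported AIC is -15193.797
```

The last digit differs only because the log-likelihood is rounded to three decimals. The two AIC values cannot be compared with each other, since they come from different price series.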
❖ Residual diagnostics
• Ljung-Box (L1) (Q: 3.19): The Ljung-Box test checks for autocorrelation in the
residuals. With a p-value of 0.07, there is no significant autocorrelation at lag 1 at the 5%
level, although the result is borderline; the residuals appear uncorrelated at that lag.
• Jarque-Bera (JB: 149759.29, Prob (JB): 0.00): The Jarque-Bera test indicates a
significant deviation from normality (p < 0.05), suggesting that the residuals may not
be normally distributed.
• Heteroskedasticity (Prob(H): 0.00): The test for heteroskedasticity indicates the
presence of non-constant variance in the residuals, which could affect model
predictions and inferences.
❖ Forecasted Prices
• The forecasted prices for the next 40 days are mostly stable around 3.0 after the initial
period, with minimal fluctuations. This indicates that the model expects the price to
remain steady over the forecast horizon, with no significant trends or deviations
predicted.
• At the start of the forecast (around 2024-10-01), there is a sharp increase in the price,
rising quickly to just under 4.0. This is followed by a correction, where the price
declines over the next few days and settles around 3.0.
• Beyond the first week of October, the price stabilizes and shows little to no change until
November 1st, where it continues to hover around 3.0. This flattening indicates that the
model predicts very low volatility, suggesting that the price may have reached an
equilibrium.
• This type of behavior, where the price stabilizes after an initial fluctuation, may indicate
that the ARIMA model is capturing the underlying mean-reverting characteristics of the
time series. The model suggests that prices are likely to settle near an average value
after a short-term spike.
• The model does not predict any substantial upward or downward trends after the initial
adjustment. The price is expected to stabilize, with no signs of further large fluctuations
in the near future.
❖ Key takeaway
• Trend Detection: The model captures short-term mean-reverting behavior, showing an
initial spike followed by price stabilization around 3.0. No significant long-term trends
are detected.
• Volatility Capture: The model captures early volatility but predicts a stable price after
the first week of October, implying limited volatility beyond that point.
• Improvement: Incorporating models like LSTM or hybrid approaches could better
capture long-term trends. Using GARCH could improve the model’s ability to forecast
dynamic volatility. Addressing residual non-normality and heteroskedasticity would
enhance robustness.
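To make the GARCH suggestion concrete: a GARCH(1,1) model lets the variance of the next error depend on the previous squared error and the previous variance, σ²ₜ = ω + α·ε²ₜ₋₁ + β·σ²ₜ₋₁. The sketch below only filters a residual series through this recursion with hypothetical parameter values. In practice ω, α and β would be estimated by maximum likelihood, for example with the `arch_model` function of the `arch` Python package, fitted to the ARIMA residuals or to returns.

```python
from typing import List, Sequence


def garch11_variance(
    residuals: Sequence[float], omega: float, alpha: float, beta: float
) -> List[float]:
    """Conditional variances from a GARCH(1,1) recursion.

    The first value is the unconditional variance omega / (1 - alpha - beta);
    each later value uses the previous residual and the previous variance.
    """
    if alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need alpha, beta >= 0 and alpha + beta < 1")
    sigma2 = [omega / (1 - alpha - beta)]
    for eps in residuals[:-1]:
        sigma2.append(omega + alpha * eps ** 2 + beta * sigma2[-1])
    return sigma2


# Hypothetical parameters; the large residual (2.0) raises the next variance.
print(garch11_variance([1.0, 2.0, 0.5], omega=0.1, alpha=0.1, beta=0.8))
```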
❖ ARIMA Model
• MSE (Mean Squared Error): Very low at 9.36e-08, suggesting that the model's
predictions are close to the actual values in squared-error terms.
• RMSE (Root Mean Squared Error): Low at 0.000306, indicating good performance in
terms of error magnitude.
• Inference: While the ARIMA model shows low MSE and RMSE, the NaN values returned
for MAE and MAPE indicate potential data-quality issues that might hinder its overall
reliability.
❖ LSTM Model
• MAE: 0.000218, which is low and indicates that the average prediction error is small.
• MSE: 6.67e-08, also very low, suggesting that the model predicts values that are very
close to the actual values.
• RMSE: 0.000258, indicating that the model performs well with small errors in
prediction.
• MAPE: 0.0218. Since the MAE is about 0.000218 on prices of roughly 1.00, this figure
is already a percentage (about 0.022%, not 2.18%), showing that the model's
predictions are accurate to within a very small fraction of the actual values.
• Inference: The LSTM model performs well across all metrics, indicating it effectively
captures patterns in the data, making it suitable for this task.
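A quick check on the scale of the MAPE figure: when the actual prices all sit close to a level ȳ, MAPE is approximately 100 × MAE / ȳ. With Tether prices of about 1.00 and the LSTM's MAE of 0.000218:

```latex
\text{MAPE} = \frac{100}{n}\sum_{t=1}^{n}\frac{\lvert y_t - \hat{y}_t\rvert}{\lvert y_t\rvert}
\;\approx\; 100 \times \frac{\text{MAE}}{\bar{y}}
\;\approx\; 100 \times \frac{0.000218}{1.00}
\;\approx\; 0.022\%
```

A reported MAPE of 0.0218 is therefore consistent with a value already expressed in percent.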
Conclusion
• The LSTM model is the most effective based on its metrics, showing low error rates
and high predictive accuracy.
• The ARIMA model also has low MSE and RMSE, but its MAE and MAPE were returned as NaN.
• The Logistic Regression model shows moderate performance, with relatively high
percentage errors compared to LSTM, indicating it might not be the best fit for this
data.
3.6) SOLANA
Stationarity test:
ADF TEST:
H0: The time series possesses a unit root and is non-stationary.
H1: The time series does not have a unit root (the series is stationary).
KPSS TEST:
H0: The time series is stationary around a constant mean (level-stationary).
H1: The time series is non-stationary (has a unit root).
Test | p-value | Sig. level | Test statistic | 1% | 5% | 10% | Decision
ADF | 1.830e-08 | 0.05 | -6.417035 | -3.434 | -2.863 | – | Reject H0
KPSS | 1.000e-01 | 0.05 | 0.110194 | 0.739 | 0.463 | 0.347 | Fail to reject H0
The series is likely to be stationary.
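Combining the two tests follows a standard decision table: ADF rejects its unit-root null when p < α, while KPSS rejects its stationarity null when p < α, so the results can agree or point to trend- or difference-stationarity. The function below is an illustrative sketch of that logic. One caveat, if the tests were run with statsmodels (the software is not named here): its `kpss` function only interpolates p-values between 0.01 and 0.10 and returns 0.1 with a warning when the statistic is below the 10% critical value, so a KPSS p-value of exactly 0.1 should be read as "at least 0.1".

```python
def stationarity_verdict(adf_p: float, kpss_p: float, alpha: float = 0.05) -> str:
    """Combine ADF (H0: unit root) and KPSS (H0: stationary) p-values."""
    adf_says_stationary = adf_p < alpha      # unit-root H0 rejected
    kpss_says_stationary = kpss_p >= alpha   # stationarity H0 not rejected
    if adf_says_stationary and kpss_says_stationary:
        return "stationary"
    if not adf_says_stationary and not kpss_says_stationary:
        return "non-stationary"
    if kpss_says_stationary:
        return "trend-stationary: remove the trend"
    return "difference-stationary: difference the series"


print(stationarity_verdict(adf_p=1.830e-08, kpss_p=0.10))  # Solana results above
```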
Model 1: LSTM
Inference from LSTM Model Training and Predictions:
❖ Training Epochs and Loss:
• Epochs 1-50: The model trained for 50 epochs, and the loss consistently decreased,
indicating that the model was learning and improving over time. The loss values start
at 0.0081 and gradually decrease to as low as 0.0015.
o Improving Loss: The loss reduction shows that the model is fitting the data
better as training progresses.
o Small Loss Values: The low loss values suggest that the model is performing
well during training, as it is minimizing the error effectively.
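• Loss Plateau: Since the loss plateaus at later epochs, the model appears to have
captured most of the learnable structure, and training beyond 50 epochs is unlikely to
improve results significantly. Whether the model overfits cannot be judged from the
training loss alone; that requires comparing it with the loss on a held-out validation set.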
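Rather than fixing the number of epochs at 50, training can stop automatically once a validation loss stops improving (early stopping; Keras provides this as the `EarlyStopping` callback with `monitor`, `patience` and `restore_best_weights` arguments). The report only shows training loss, so the use of a validation split is an assumption. The plain-Python sketch below shows the patience rule itself on hypothetical loss values.

```python
from typing import Sequence, Tuple


def early_stopping(
    val_losses: Sequence[float], patience: int = 5, min_delta: float = 0.0
) -> Tuple[int, int]:
    """Return (epoch at which training stops, epoch with the best loss), 1-based."""
    best = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:      # a meaningful improvement
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:       # no improvement for `patience` epochs
                return epoch, best_epoch
    return len(val_losses), best_epoch


print(early_stopping([0.50, 0.30, 0.20, 0.21, 0.22, 0.23], patience=3))
```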
• Trend Alignment: The predicted prices closely follow the actual price trends,
especially in areas where the price fluctuates. This shows that the LSTM model has
effectively captured the major trends in the time series data.
• Smooth Fit: The predicted prices (orange dots) are generally smoother than the actual
prices (blue line). The actual prices tend to be more volatile, while the predicted prices
smooth out some of the short-term noise in the data.
• General Trend:
o The predicted prices start at 151.22 on 2024-09-28 and increase very gradually to
154.59 by 2024-10-06.
o After 2024-10-06, prices dip slightly and then fluctuate between about 152 and 154
until 2024-11-06.
• Prediction Characteristics:
o The predictions do not show significant price volatility or large fluctuations over
the 40-day forecast. This indicates that the model predicts stable price
movement with only minor upward or downward changes.
o Prices are converging to a narrow range toward the end of the 40 days (from
151.55 to 154.59), suggesting a stable market outlook according to the LSTM
model.
❖ Key Observations:
• Good Fit for Smooth Trends: The LSTM model has successfully learned the general
trends in price movements, as indicated by low training loss and RMSE. It is predicting
fairly stable price movements over the forecast period.
• Limited Volatility Capture: The model's predictions indicate low price volatility,
which may not fully reflect more chaotic or rapidly changing market conditions.
Depending on the actual market behavior, this could be a limitation in real-world
applicability.
Model 2: ARIMA
❖ Model Overview:
o The ARIMA model used here is ARIMA(5, 1, 0), which indicates:
▪ AR(5): The model uses five lagged values of the time series for the
autoregressive component.
▪ I(1): The data has been differenced once to make the time series
stationary.
▪ MA(0): There is no moving average component in this model.
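The I(1) step can be illustrated directly: the model is fitted to day-to-day price changes, and forecasts of changes are turned back into prices by cumulative summation from the last known price. A minimal sketch with made-up prices:

```python
from typing import List, Sequence


def difference(series: Sequence[float]) -> List[float]:
    """First differences: y[t] - y[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def undifference(first_value: float, diffs: Sequence[float]) -> List[float]:
    """Invert first differencing by cumulative summation from first_value."""
    levels = [first_value]
    for d in diffs:
        levels.append(levels[-1] + d)
    return levels


prices = [150.0, 152.5, 151.0, 154.0]  # hypothetical daily prices
changes = difference(prices)
print(changes, undifference(prices[0], changes))
```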
❖ Key Parameters:
o AR coefficients:
▪ The autoregressive coefficients (ar.L1 to ar.L5) show how the past
values influence the current value. Specifically:
▪ ar.L1 (-0.0081): The first lag has a very small and non-
significant impact on the current price.
▪ ar.L2 (0.0297), ar.L3 (0.0425), ar.L4 (0.0658): These
coefficients are positive and statistically significant (p-values <
0.05), indicating that the second, third, and fourth lag values
have a positive influence on the current price.
▪ ar.L5 (-0.1019): This coefficient is negative and statistically
significant, indicating that the fifth lag has a significant negative
influence on the current price.
o sigma² (22.35): This is the estimated variance of the error term, corresponding to a
residual standard deviation of about 4.7 in price terms, or roughly 3% of a price level
of about 150. This is the typical size of the model's one-day-ahead error.
❖ Residual Diagnostics:
o Ljung-Box (L1) (Q) = 0.01, Prob(Q) = 0.94: The high p-value indicates that
there is no significant autocorrelation in the residuals, which suggests that the
model has adequately captured the patterns in the time series data.
o Jarque-Bera (JB = 2531.69, Prob(JB) = 0.00): The Jarque-Bera test indicates
that the residuals are not normally distributed (p-value < 0.05). This suggests
that there may still be some skewness or kurtosis present in the data that the
model has not captured.
o Heteroskedasticity (H = 0.93, Prob(H) = 0.42): The p-value for this test
indicates that there is no significant heteroskedasticity (variance changes over
time) in the residuals.
❖ Forecasted Prices:
o The forecasted prices for the upcoming days are relatively stable, around
142.87, showing minimal fluctuations over the forecast horizon. This suggests
that the model expects the price to stabilize and hover around this level in the
future.
o This lack of significant change in forecasted prices may be an indication that
the ARIMA model is primarily capturing the mean-reverting behavior of the
time series rather than anticipating large swings or trends.
❖ Model Performance:
o Mean Squared Error (MSE = 95.31): This value quantifies the average
squared difference between the actual and predicted values. A lower MSE
indicates better model performance, but its value alone doesn’t offer much
insight without comparison to alternative models.
o Based on the MSE, we can infer that the ARIMA model provides a reasonably
good fit for the data but may not capture short-term price volatility as effectively
as other models (e.g., LSTM).
Key Takeaways:
• Trend Detection: The ARIMA model seems to predict a stable price trend in the
forecasted period, which may indicate that the time series is reverting to its mean after
significant fluctuations.
• Volatility Capture: While the ARIMA model does a good job of predicting general
trends, it appears to miss some short-term fluctuations: the high Jarque-Bera test
statistic shows that the residuals are heavy-tailed and non-normal, meaning the model
occasionally makes large errors.
• Further Improvements: Depending on the specific use case, further tuning (such as
increasing the AR terms or experimenting with seasonal components) or trying
alternative models (such as SARIMA or LSTM) might enhance predictive performance,
especially in the presence of volatility or trends.
In summary, the ARIMA(5,1,0) model provides a good overall fit to the data, but its ability to
predict short-term fluctuations and account for residual skewness and kurtosis may be limited.
❖ ARIMA Model:
Inference:
• The ARIMA model has reasonable MSE and RMSE values, indicating that the
predictions are somewhat accurate.
• However, the absence of meaningful MAE and MAPE could point to issues in certain
areas, such as missing or infinite values, which suggest that this model may not have
been robust enough to handle all aspects of the dataset.
❖ LSTM Model:
Inference:
• The LSTM model outperforms the ARIMA model in terms of MSE and RMSE,
indicating that it predicts more accurately.
• The MAE is lower than the ARIMA model, implying that on average, the LSTM
model's predictions are closer to the actual values.
• The MAPE of 4.98% indicates that the LSTM model makes more accurate percentage-
based predictions, making it a more reliable model overall for this dataset.
Conclusion:
• LSTM performs the best across all metrics, making it the most accurate model for
predicting this time series.
• ARIMA performs reasonably well, though it is less accurate than LSTM.
Thus, for future time series forecasting tasks, LSTM is the best choice based on these metrics.
3.7) USDC
Model 1: LSTM Model
Model Performance (Epochs and Loss):
❖ Loss Values Across Epochs:
o The training began with an initial loss of 0.0477 and dropped significantly by
the second epoch to 0.0047. By the fifth epoch, the loss further reduced to
0.0033 and continued to hover around this value for the remaining epochs.
o The loss values indicate that the model quickly improved and learned
meaningful patterns in the early epochs but did not see a significant drop after
the fifth epoch, suggesting diminishing returns on further training.
❖ Model Evaluation - RMSE:
o The Root Mean Squared Error (RMSE) achieved is 0.01126, which is
relatively low and implies the model is doing a decent job at predicting future
prices. The low RMSE reflects the accuracy of predictions compared to actual
price values, though there is still room for improvement, particularly in volatile
areas.
Graphical Insights (Actual vs Predicted Prices):
o Smoothness of Predictions:
▪ The orange dots (predicted prices) exhibit a smoother trajectory
compared to the actual price line, which is more volatile. This reflects a
characteristic of LSTM models where short-term noise in the data is
smoothed out, capturing broader trends but potentially missing sudden
spikes or drops.
❖ Trend Alignment:
o The predicted prices closely follow the actual prices, particularly in the general
peaks and troughs. However, there are moments where the predicted values
slightly deviate from the actual values, especially around sharp changes. This is
a common limitation of LSTM models, which may struggle with sudden,
dramatic shifts in time-series data.
❖ Predicted Prices for the Next 40 Days:
• Gradual Increase:
o The LSTM model forecasts a slight, consistent increase in prices over the next
40 days. Starting from 1.0023 on 2024-09-28, the predictions gradually rise to
1.0062 by 2024-11-06.
o Low Volatility:
▪ The predicted prices do not show significant fluctuations, with daily
increments ranging from 0.0001 to 0.0003. This suggests that the
model expects minimal volatility and stable price movement, which
could be a limitation when applied to more volatile markets.
❖ Model Observations:
• Trend Learning:
o The LSTM model has effectively learned the major price movements but does
not fully capture the high-frequency volatility of the market. The smooth nature
of the predictions suggests that the model is more attuned to long-term trends
than short-term price fluctuations.
• Prediction Stability:
o Over the 40-day forecast, the predicted prices stabilize around a narrow range
of 1.0023 to 1.0062, suggesting a steady market outlook. However, this might
not be reflective of real-world scenarios, especially in volatile markets, where
price movements can be more erratic.
❖ Conclusion and Recommendations:
• Good Trend Prediction:
o The LSTM model is well-suited for capturing broad trends in the price data, as
demonstrated by the close alignment between actual and predicted values and
the low RMSE. However, its tendency to smooth out fluctuations means it may
underperform in highly volatile markets.
• Improvement Suggestions:
o Adding more features (e.g., trading volume, external financial data) and fine-
tuning the model’s architecture (such as adjusting the number of LSTM layers
or incorporating dropout layers) could help the model capture more complex
price dynamics and volatility.
Model 2: ARIMA
❖ Model Overview:
• The model used is ARIMA(5, 1, 0), meaning:
o AR(5): The model incorporates five lagged values of the time series for the
autoregressive component.
o I(1): The data has been differenced once to make it stationary.
o MA(0): There is no moving average component in this model.
❖ Key Parameters:
• Autoregressive (AR) Coefficients:
o ar.L1 (-0.6069): The first lag has a large negative and statistically significant
impact on the current price.
o ar.L2 (-0.3491): The second lag also has a significant negative impact, though
smaller than the first.
o ar.L3 (-0.3237), ar.L4 (-0.2517), ar.L5 (-0.1681): These lags further contribute
negatively, but their impact decreases with each lag. All of these coefficients are
statistically significant (p-values < 0.05), indicating their relevance in predicting
the current price.
• Sigma² (3.936e-05): This value represents the estimated variance of the error term. The
low value suggests that the model has a small level of noise, indicating strong predictive
performance.
❖ Model Fit Metrics:
• AIC (-15193.797): The Akaike Information Criterion is a measure of model quality
that balances fit against complexity; lower values indicate better performance, but only
relative to other models fitted to the same data.
• BIC (-15159.950): The Bayesian Information Criterion is similar to AIC but penalizes
for model complexity. The low BIC value confirms a good model fit, but comparisons
with alternative models are necessary to contextualize this performance.
• Log Likelihood (7602.898): The high log-likelihood indicates that the model fits the
data well, as values closer to zero are preferable.
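Because AIC and BIC are both computed from the log-likelihood, the three reported figures can be cross-checked against each other. The sketch below is a minimal illustration that assumes the model has k = 6 estimated parameters (ar.L1 to ar.L5 plus sigma², with no constant). The sample size n is not stated in the report, so the sketch backs it out from the BIC instead of assuming a value:

```python
import math

def aic(log_likelihood: float, k: int) -> float:
    """Akaike Information Criterion: AIC = 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood

def implied_nobs(bic: float, log_likelihood: float, k: int) -> float:
    """Solve BIC = k ln(n) - 2 ln L for the number of observations n."""
    return math.exp((bic + 2 * log_likelihood) / k)

# Values reported above for the ARIMA(5, 1, 0) fit
llf, reported_aic, reported_bic = 7602.898, -15193.797, -15159.950
k = 6  # assumption: ar.L1 to ar.L5 plus sigma2, no constant

print(round(aic(llf, k), 3))                       # -15193.796, the reported AIC up to rounding
print(round(implied_nobs(reported_bic, llf, k)))   # sample size implied by the reported BIC
```

The BIC implies an effective sample of roughly 2,080 observations, which is plausible for several years of daily prices. Because n is inferred here and not stated in the report, this is only a consistency check.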
❖ Residual Diagnostics:
• Ljung-Box (L1) (Q = 0.37), Prob(Q) = 0.54: The high p-value shows that there is no
significant autocorrelation in the residuals, meaning the model has captured the
underlying patterns in the data effectively.
• Jarque-Bera (JB = 10,136,496.38), Prob(JB) = 0.00: The very low p-value and very high
JB statistic show that the residuals are not normally distributed, indicating significant
skewness or kurtosis (fat tails) that the model is not fully capturing.
• Heteroskedasticity (H = 0.02), Prob(H) = 0.00: The very low p-value indicates significant
heteroskedasticity in the residuals, meaning their variance changes over time. Because H is
well below 1, the residuals late in the sample are far less variable than those early on.
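The H statistic compares the size of the residuals at the end of the sample with their size at the start. The sketch below is a simplified illustration of that idea. It takes the sum of squared residuals in the last third and divides it by the sum in the first third. This appears to be how statsmodels' default "breakvar" heteroskedasticity test forms H, but the sketch leaves out that test's burn-in handling and its F-distribution p-value. The residuals are synthetic:

```python
import numpy as np

def breakvar_h(residuals) -> float:
    """Ratio of squared residuals in the last third to those in the first third."""
    resid = np.asarray(residuals, dtype=float)
    h = int(round(len(resid) / 3))
    return float(np.sum(resid[-h:] ** 2) / np.sum(resid[:h] ** 2))

# Synthetic residuals whose spread shrinks over time (hypothetical data)
resid = np.concatenate([np.full(100, 2.0), np.full(100, 1.0), np.full(100, 0.5)])
print(breakvar_h(resid))  # 0.0625: late residuals far less volatile than early ones
```

H below 1, as in the USDC summary, means the later residuals are smaller than the early ones. H above 1 means the opposite.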
❖ Forecasted Prices:
• The forecasted prices from 2024-09-28 to 2024-11-06 show very little fluctuation,
ranging from 0.999843 to 0.999827, so the ARIMA model predicts an essentially flat price
over this period.
• Stability in Forecasts: The flat path is partly built into the model. With no constant
term, the forecast price change of an ARIMA(5,1,0) model decays towards zero as the
horizon grows, so the price forecast levels off close to the most recent prices. The
forecast is therefore consistent with mean-reverting behavior in the time series, but not
by itself evidence of it, and the model expects no large swings or trends.
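The reason the forecast flattens can be seen by iterating the model by hand. The sketch below uses the USDC AR coefficients reported above. It first checks that the AR part is stationary. It then forecasts the differences with future shocks set to zero and adds them back onto the last price. The starting price and the last five daily changes are hypothetical values, not the project's data:

```python
import numpy as np

# AR coefficients reported above for the differenced USDC series (ARIMA(5, 1, 0))
phi = np.array([-0.6069, -0.3491, -0.3237, -0.2517, -0.1681])

# The AR part is stationary if all roots of 1 - phi_1 z - ... - phi_5 z^5 lie outside
# the unit circle; np.roots expects the highest-degree coefficient first.
poly = np.concatenate([-phi[::-1], [1.0]])
root_moduli = np.abs(np.roots(poly))
print(root_moduli.min())  # greater than 1, so forecast price changes die out

def forecast_levels(last_level, recent_diffs, phi, steps):
    """Iterate the AR recursion on differences with future shocks set to zero,
    then add each forecast difference to the running price level."""
    history = list(recent_diffs)              # oldest first, most recent last
    level = last_level
    levels = []
    for _ in range(steps):
        lags = history[::-1][: len(phi)]      # most recent difference first
        next_diff = float(np.dot(phi, lags))
        history.append(next_diff)
        level += next_diff
        levels.append(level)
    return np.array(levels)

# Hypothetical starting point (not the project's data): last price and last five daily changes
path = forecast_levels(1.0, [0.0002, -0.0001, 0.0003, -0.0002, 0.0001], phi, steps=200)
print(abs(path[-1] - path[-2]))  # effectively zero: over a long horizon the forecast is flat
```

Any stationary AR part on the differences gives this behavior. The flat forecast therefore says more about the ARIMA(p, 1, 0) structure than about the market.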
❖ Model Performance:
• Mean Squared Error (MSE = 5.54e-08): The extremely low MSE suggests that the
model is performing well, providing accurate predictions with minimal deviation from
actual values. However, this number alone is not sufficient to judge performance
without comparing it to alternative models.
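One common yardstick for judging an MSE is a naive "no-change" forecast, which predicts each day's price as the previous day's actual price. This baseline is an addition here and is not something the project reports. For a price that stays very close to 1.0, such a baseline would also have a tiny MSE. The sketch below computes the ratio of the model's MSE to the naive MSE, where values below 1 mean the model beats the baseline. It assumes one-step-ahead forecasts and uses hypothetical numbers:

```python
import numpy as np

def mse(actual, predicted) -> float:
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean((actual - predicted) ** 2))

def relative_to_naive(actual, predicted) -> float:
    """MSE of the model divided by the MSE of a one-step 'no-change' forecast.
    Both are scored on actual[1:], because the naive forecast needs a previous value."""
    actual = np.asarray(actual, float)
    naive = actual[:-1]                          # yesterday's price as today's forecast
    model = np.asarray(predicted, float)[1:]
    return mse(actual[1:], model) / mse(actual[1:], naive)

# Hypothetical stablecoin-like prices and model forecasts
actual = [1.0001, 0.9998, 1.0002, 0.9999, 1.0000]
predicted = [1.0000, 1.0000, 1.0000, 1.0000, 1.0000]
print(relative_to_naive(actual, predicted))  # about 0.26: below 1, so the model beats the baseline here
```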
❖ Key Takeaways:
• Trend Detection: The ARIMA model is predicting stable prices in the forecasted
period, suggesting that the time series is reverting to its mean after past fluctuations.
• Volatility Capture: While the ARIMA model captures general trends well, the
significant Jarque-Bera test statistic and heteroskedasticity suggest it struggles to
capture short-term volatility or distributional anomalies.
• Further Improvements: The model could benefit from further tuning, such as
incorporating a moving average component (ARIMA(5,1,1) or SARIMA), or exploring
alternative models like LSTM for better volatility and short-term trend capture.
In summary, the ARIMA(5,1,0) model offers strong overall performance with good trend
detection, but it may underperform in capturing price volatility and addressing the non-normal
distribution of residuals.
ARIMA Model:
• Inference:
o The ARIMA model performs exceptionally well with very low errors across all
metrics.
o The MSE (5.54e-08) and RMSE (0.0002) values indicate highly accurate
predictions that are very close to the actual values.
o A MAE of 0.00017 demonstrates the precision of ARIMA in predicting values.
o The MAPE value of 0.0172% shows minimal percentage error, indicating
reliable forecasting ability for future values in the time series.
o This strong performance suggests that ARIMA effectively captures long-term
trends and reduces overall prediction errors, making it a highly suitable model
for this dataset.
LSTM Model:
• Inference:
o While the LSTM model performs well, it does not match the accuracy of the
ARIMA model.
o Its MSE (about 5.5e-07, the square of its 0.00074 RMSE) and RMSE are several
times higher than ARIMA's, indicating larger prediction errors.
o Its MAE of 0.00070, about four times ARIMA's 0.00017, shows that LSTM is less
precise on average.
o Its MAPE of 0.07013%, again about four times ARIMA's 0.0172%, indicates a
higher percentage error, though LSTM still captures the overall patterns.
o LSTM’s strength lies in its ability to model complex, non-linear patterns, but in
this dataset, ARIMA outperforms it in terms of accuracy.
Conclusion:
• ARIMA is the most suitable model for this dataset, achieving superior performance
with near-zero errors and minimal prediction deviation.
• LSTM, while capable of handling more complex patterns, does not match ARIMA's
precision, especially in terms of percentage-based error (MAPE).
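All model comparisons in this chapter rest on the same four error measures. The sketch below shows one way to compute them with numpy. The helper function and the toy numbers are hypothetical. Because RMSE is the square root of MSE, each comparison table can be sanity-checked. For example, √(5.54e-08) ≈ 0.00024, which agrees with the reported ARIMA RMSE of 0.0002:

```python
import numpy as np

def forecast_errors(actual, predicted) -> dict:
    """Return MSE, RMSE, MAE and MAPE (in percent) for aligned actual/predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = actual - predicted
    mse = float(np.mean(err ** 2))
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAE": float(np.mean(np.abs(err))),
        # MAPE is undefined if any actual value is 0
        "MAPE_%": float(np.mean(np.abs(err / actual)) * 100),
    }

# Hypothetical values, not the project's data
m = forecast_errors([2.0, 4.0, 5.0], [1.0, 4.0, 7.0])
print(m)  # MSE 1.667, RMSE 1.291, MAE 1.0, MAPE 30%
```

MAE and MAPE weight all errors equally. MSE and RMSE punish large misses more, which is why the two groups of metrics can rank models differently.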
3.8) XRP
Stationarity test:
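ADF TEST:
H0: The time series has a unit root (the series is non-stationary).
H1: The time series does not have a unit root (the series is stationary).
KPSS TEST:
H0: The time series is stationary.
H1: The time series has a unit root (the series is non-stationary).

Test   p-value    Sig. level   Test stat    1%         5%          10%      Decision
ADF    0.005402   0.05         -3.619349    -3.43221   -2.862364   -        Reject H0
KPSS   0.010000   0.05          3.310496     0.739      0.463       0.347   Reject H0

The two tests disagree. The ADF test rejects the unit-root null (p = 0.0054 < 0.05, and the
statistic of -3.62 lies below the 5% critical value of -2.86). The KPSS test rejects
stationarity (3.31 is well above even the 1% critical value of 0.739). Given the KPSS result,
the series is treated as likely non-stationary, and the ARIMA model below differences it once.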
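The two tests have opposite null hypotheses, so their decisions have to be combined with care. The sketch below writes out the decision rule at the 5% level as a small helper function. The function is hypothetical and is not code from the project. The p-values themselves would typically come from `adfuller` and `kpss` in `statsmodels.tsa.stattools`, though which tooling the project used is an assumption. Applied to the XRP p-values above, the helper reports a conflict:

```python
def stationarity_verdict(adf_p: float, kpss_p: float, alpha: float = 0.05) -> str:
    """Combine ADF (H0: unit root) and KPSS (H0: stationary) p-values."""
    adf_says_stationary = adf_p < alpha       # reject the unit-root null
    kpss_says_stationary = kpss_p >= alpha    # fail to reject the stationarity null
    if adf_says_stationary and kpss_says_stationary:
        return "stationary"
    if not adf_says_stationary and not kpss_says_stationary:
        return "non-stationary"
    return "conflicting"

print(stationarity_verdict(0.005402, 0.010000))  # XRP: conflicting
print(stationarity_verdict(0.891952, 0.010000))  # Toncoin (reported later): non-stationary
```

When the tests conflict, the conservative choice is to difference the series once and retest, which is what the d = 1 in ARIMA(5, 1, 0) does.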
Model 1: LSTM
• Epochs 1-50: The model trained for 50 epochs, and the loss consistently decreased over
the course of training, indicating that the model was learning and improving its
predictions. The loss values start at 0.0029 and gradually decrease to 0.00068 by the
later epochs.
o Improving Loss: The consistent reduction in loss values shows that the model
is effectively fitting the data as the training progresses, capturing patterns in the
dataset.
o Small Loss Values: The low loss values (reaching as low as 0.00068) suggest
that the model is performing well and minimizing prediction error effectively
during the training process.
• Potential Overfitting: The loss starts to plateau toward the later epochs, indicating that
the model has captured most of the trends and that training beyond 50 epochs is unlikely to
improve performance significantly. A training-loss plateau is not itself evidence of
overfitting; overfitting would show up as a validation loss that rises while the training
loss keeps falling.
• Trend Alignment: The predicted prices in the first image (orange dots) closely follow
the actual price trends (blue line), especially during periods of price fluctuation. This
suggests that the model has effectively captured major trends and patterns in the time
series data.
• Smooth Fit: The predicted prices are smoother compared to the actual prices, which
exhibit more short-term volatility. The LSTM model smooths out some of this noise,
focusing on longer-term trends.
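Overfitting is usually diagnosed on a held-out validation loss rather than on the training loss. The sketch below is a minimal, framework-free version of patience-based early stopping. It follows the same idea as Keras' `EarlyStopping` callback (with `monitor="val_loss"` and `patience`), but it is an illustration and not the project's training code. The validation losses are hypothetical:

```python
def early_stopping_epoch(val_losses, patience=5, min_delta=0.0):
    """Return the 1-based epoch with the best validation loss, stopping once
    `patience` epochs pass without an improvement larger than `min_delta`."""
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Hypothetical validation losses: they improve, then rise as the model starts to overfit
val = [0.0030, 0.0020, 0.0012, 0.0009, 0.0008, 0.0009, 0.0010, 0.0011, 0.0012, 0.0013, 0.0014]
print(early_stopping_epoch(val, patience=3))  # 5
```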
❖ Predicted Prices for the Next 40 Days:
• General Trend:
o The second image shows the forecasted prices over the next 40 days, starting at
0.5683 on 2024-09-28 and gradually decreasing to 0.3449 by 2024-11-06.
o There is a consistent, gradual downward trend throughout the forecast period,
with no significant volatility or large fluctuations.
❖ Prediction Characteristics:
o The predictions show no significant volatility or large fluctuations over the 40-day
forecast period: the model expects a smooth, consistent downward movement rather than
erratic swings.
o Starting from 0.5683 on 2024-09-28, the prices decrease steadily to 0.3449 by 2024-11-06,
a cumulative fall of about 39% made up of small daily declines, with no sharp spikes or
rapid changes.
o This smooth decline suggests that the model anticipates low day-to-day volatility even as
the price level falls substantially. The overall picture is a steady downtrend rather than
erratic movement.
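The size of this "gradual" decline can be made concrete using the two endpoints quoted above. The calculation below gives the total change and the equivalent compound daily rate. It assumes one forecast per calendar day, so the 40 forecasts from 2024-09-28 to 2024-11-06 contain 39 day-to-day changes:

```python
start, end = 0.5683, 0.3449   # XRP LSTM forecast on 2024-09-28 and 2024-11-06
steps = 39                    # 40 daily forecasts span 39 day-to-day changes

total_change = (end / start - 1) * 100
daily_rate = ((end / start) ** (1 / steps) - 1) * 100
print(f"{total_change:.1f}% total, {daily_rate:.2f}% per day")  # -39.3% total, -1.27% per day
```

A fall of about 1.3% per day looks calm from one day to the next, but it adds up to a substantial bearish forecast.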
❖ Key Observations:
o Good Fit for General Trends: The LSTM model has effectively captured the general
trends in the data, with a smooth fit between predicted and actual prices, as indicated
by the low loss and RMSE values.
o Limited Volatility Capture: The model predicts a stable decline in prices over the
forecasted 40 days, with limited price volatility. This may not fully capture chaotic or
rapidly fluctuating market conditions, which could be a limitation in real-world
applications depending on the context.
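❖ Model Overview:
❖ The ARIMA Model: The model used here is ARIMA(5, 1, 0), which means:
o AR(5): Five autoregressive terms are included, meaning the model uses five
lagged values of the (differenced) time series.
o I(1): The data has been differenced once to make it stationary.
o MA(0): There is no moving average component in this model.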
❖ Key Parameters:
• AR Coefficients:
o ar.L1 (-0.0111): The first lag has a small, non-significant effect on the current
price change (p-value > 0.05).
o ar.L2 (-0.0003): The second lag also has a non-significant influence (p-value >
0.05).
o ar.L3 (0.0417), ar.L4 (0.0362), ar.L5 (0.0580): These lags have positive and
statistically significant effects (p-values < 0.05), indicating that the price changes
three, four and five days earlier positively influence the current price change.
• Error Variance (sigma²): The estimated error variance is 0.0015 (a residual standard
deviation of about 0.039), indicating a relatively low noise level in the model.
❖ Residual Diagnostics:
o Ljung-Box (L1) (Q = 0.02, Prob(Q) = 0.89): The high p-value indicates no
significant autocorrelation in the residuals, suggesting the model has adequately
captured the time series patterns.
o Jarque-Bera (JB = 745352.21, Prob(JB) = 0.00): The very low p-value shows that
the residuals are not normally distributed, with skewness and kurtosis that are
not captured by the model.
o Heteroskedasticity (H = 0.46, Prob(H) = 0.50): The high p-value implies no
significant heteroskedasticity in the residuals.
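The "significant" and "non-significant" labels come from z-tests. In a statsmodels ARIMA summary, each coefficient is divided by its standard error to give z, and a two-sided p-value is read from the standard normal distribution. The standard errors are not reproduced in this report, so the value used below is hypothetical. The sketch only shows how the p-value follows from the coefficient and its standard error:

```python
import math

def two_sided_p(coef: float, std_err: float) -> float:
    """Two-sided p-value for H0: coefficient = 0, using the standard normal distribution."""
    z = coef / std_err
    # P(|Z| > |z|) = erfc(|z| / sqrt(2)) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2))

# ar.L5 = 0.0580 is taken from the report; the standard error of 0.02 is hypothetical
print(round(two_sided_p(0.0580, 0.02), 4))
print(round(two_sided_p(1.96, 1.0), 3))  # 0.05, the familiar 5% threshold at |z| = 1.96
```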
❖ Forecasted Prices:
o The forecasted prices are relatively stable, hovering around 0.562228 after an
initial drop. This suggests that the model anticipates the price to stabilize at this
level with minimal fluctuations through the forecast horizon.
o The lack of significant change in the forecasted prices indicates that the ARIMA
model is likely capturing the mean-reverting behavior of the time series rather
than predicting any large fluctuations or trends.
Model Performance:
Key Takeaways:
• Trend Detection:
o The ARIMA model appears to predict a steady price level in the forecasted
period, likely indicating that the time series is reverting to its mean after an
initial adjustment.
• Volatility Capture:
o While the ARIMA model is effective at predicting the general trend, it seems to
miss out on short-term fluctuations in the data, as shown by the lack of
variability in the forecast after October 6, 2024.
• Further Improvements:
o Depending on the use case, additional tuning (e.g., increasing AR terms or
experimenting with seasonal components) or using alternative models like
SARIMA or LSTM might enhance predictive performance, especially in cases
of price volatility or trends.
In Summary: The ARIMA (5,1,0) model provides a reasonable overall fit to the data,
successfully predicting a stable trend with low MSE. However, its ability to capture short-term
fluctuations may be limited. Further tuning or the adoption of more advanced models could
improve performance, particularly in capturing volatility and short-term variations.
Models Performance Comparison:
ARIMA Model:
Inference:
• The ARIMA model shows a very low Mean Squared Error (MSE = 0.00069) and Root
Mean Squared Error (RMSE = 0.02631), indicating that it provides accurate predictions
in terms of squared error metrics.
• However, the absence of meaningful Mean Absolute Error (MAE) and Mean Absolute
Percentage Error (MAPE) values suggests that there may be limitations or issues with
this model when handling the full range of data, possibly due to missing values or the
nature of the ARIMA model's predictions on this specific dataset.
LSTM Model:
Inference:
• The LSTM model achieves an MAE of 0.0228 and an MSE of 0.00074. Its MSE is slightly
higher than ARIMA's (0.00069), so on squared-error metrics the two models are
comparable, with ARIMA marginally ahead.
• The MAPE of 3.9390% shows that the LSTM model's percentage-based errors are
small, making it a reliable choice for capturing the time series patterns in this
dataset.
Conclusion:
• LSTM is the only model with a complete set of valid metrics, and its errors are low on
all of them, making it the more dependable option for predicting this dataset.
• ARIMA is marginally better on MSE and RMSE, but its missing MAE and MAPE values
prevent a full comparison and point to potential limitations.
Based on these metrics, LSTM would be the preferred choice for future time series forecasting
tasks on this dataset, mainly because of its complete and consistently low error profile rather
than a clear accuracy advantage over ARIMA.
3.9) DOGECOIN
Stationarity test:
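ADF TEST:
H0: The time series has a unit root (the series is non-stationary).
H1: The time series does not have a unit root (the series is stationary).
KPSS TEST:
H0: The time series is stationary.
H1: The time series has a unit root (the series is non-stationary).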
Differencing
Model 1: LSTM
• Epochs 1-50: The model trained for 50 epochs, and the loss consistently decreased,
indicating that the model was learning and improving over time. The loss values start
at 0.0036 and gradually decrease to as low as 0.0010.
o Improving Loss: The loss reduction shows that the model is fitting the data
better as training progresses.
o Small Loss Values: The low loss values suggest that the model is performing
well during training, as it is minimizing the error effectively.
• Potential Overfitting: The loss plateaus in the later epochs, suggesting that the model has
captured most of the trends and that continuing beyond 50 epochs would not improve results
significantly. Confirming or ruling out overfitting would require monitoring the loss on a
validation set.
❖ Model Evaluation - RMSE:
• RMSE: 0.0144: The Root Mean Squared Error (RMSE) of 0.0144 is relatively low,
indicating that the model has good prediction accuracy on the log returns. RMSE is a
good measure of how well the model fits the data, and lower values represent better
fits.
• Trend Alignment: The predicted prices closely follow the actual price trends,
especially in areas where the price fluctuates. This shows that the LSTM model has
effectively captured the major trends and seasonal patterns in the time series data.
• Smooth Fit: The predicted prices (orange dots) are generally smoother than the actual
prices (blue line). The actual prices tend to be more volatile, while the predicted prices
smooth out some of the short-term noise in the data.
• General Trend:
o The predicted prices start from 0.109 on 2024-10-01 and show a gradual
decrease to 0.079 by 2024-11-06.
• Prediction Characteristics:
o The predictions do not show significant price volatility or large fluctuations over
the 40-day forecast; day-to-day changes are small.
o Although the overall decline from 0.109 to 0.079 amounts to about 28%, prices
converge to a narrow range toward the end of the period, suggesting a stable (if
lower) price outlook according to the LSTM model.
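The RMSE of 0.0144 quoted above is measured on log returns, not on prices. The sketch below shows the two conversions involved. It assumes daily log returns defined as r_t = ln(P_t / P_{t-1}), and it uses hypothetical prices and a hypothetical predicted return:

```python
import numpy as np

prices = np.array([0.100, 0.105, 0.102, 0.108])   # hypothetical daily prices
log_returns = np.diff(np.log(prices))              # r_t = ln(P_t / P_{t-1})

def next_price(last_price: float, predicted_log_return: float) -> float:
    """Invert a one-step log-return forecast back into a price."""
    return last_price * np.exp(predicted_log_return)

# A log-return error of 0.0144 corresponds to roughly a 1.45% error in the next day's price
print(next_price(prices[-1], 0.0144) / prices[-1] - 1)  # about 0.0145
```

For small values, a log-return error is close to the percentage price error. This makes it easier to compare an RMSE on returns with the price-level metrics reported later.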
❖ Key Observations:
• Good Fit for Smooth Trends: The LSTM model has successfully learned the general
trends in price movements, as indicated by low training loss and RMSE. It is predicting
fairly stable price movements over the forecast period.
• Limited Volatility Capture: The model's predictions indicate low price volatility,
which may not fully reflect more chaotic or rapidly changing market conditions.
Depending on the actual market behavior, this could be a limitation in real-world
applicability.
❖ Model Overview:
▪ AR(5): The model uses five lagged values of the time series for the
autoregressive component.
▪ I(1): The data has been differenced once to make the time series
stationary.
❖ Key Parameters:
o AR coefficients:
o sigma^2 (0.0001): This represents the estimated variance of the error term
(white noise). The very low value suggests that the model has low noise,
meaning the model residuals are small, implying good fit.
❖ Model Fit Metrics:
❖ Residual Diagnostics:
• Ljung-Box (Q = 1.39, p = 0.24): The p-value above 0.05 indicates that the residuals
are uncorrelated, implying the model adequately captures the autocorrelation in the
data.
• Heteroskedasticity (H = 213.45, p = 0.00): The significant p-value indicates
heteroskedasticity in the residuals, meaning that their variance changes over time;
with H far above 1, the residuals are much more variable late in the sample than early on.
❖ Forecasted Prices:
• Price Stability:
o The forecasted prices show some initial fluctuations in the first few days of
October 2024, with a spike and a dip before stabilizing around the 0.1014 mark.
The model predicts a consistent trend with minimal changes beyond mid-
October, suggesting that the price will stabilize and remain steady for the
remainder of the forecast period.
o This behavior indicates that the ARIMA model is capturing a mean-reverting
process after the initial volatility, projecting stability in future prices.
Key Takeaways:
• Trend Detection:
o The ARIMA (5,1,0) model effectively predicts a stable price trend after early
October. This suggests that the time series has a mean-reverting tendency, and
the model has captured this long-term stability well.
• Volatility Capture:
o While the model's performance is strong in terms of accuracy (as indicated by
the low MSE), it might be missing out on capturing the initial short-term
volatility. The forecast shows that after early fluctuations, the model expects
little to no volatility moving forward, which might not fully represent real-world
scenarios with continuous fluctuations.
• Further Improvements:
o If capturing short-term price swings is important, further tuning (e.g., adding
more autoregressive terms or experimenting with alternative models like
SARIMA or even LSTM) could improve the model's ability to handle volatility.
o Alternative models that are more sensitive to short-term changes could offer
better predictions of fluctuations while maintaining long-term stability.
❖ ARIMA Model
Inference:
• The MAE of 0.0056 is slightly higher than LSTM's 0.0048, implying that the ARIMA
model's predictions are marginally further from the actual values.
• The MSE of 4.8661e-05 and RMSE of 0.0069 indicate that ARIMA's predictions are less
accurate than LSTM's in this particular case.
• With a MAPE of 5.2429%, compared with 4.4875% for LSTM, ARIMA performs slightly
worse in percentage terms, suggesting it does not handle this dataset as well.
❖ LSTM Model:
Inference:
• The LSTM model demonstrates solid performance, with a MAE of 0.0048 and an MSE
of 4.3437e-05, indicating relatively accurate predictions.
• The RMSE of 0.0065 suggests that the LSTM model provides a fairly close
approximation of the actual values.
• The MAPE of 4.4875% signifies that the model is reliable in percentage terms.
• Overall, the LSTM model shows good accuracy and handles this dataset effectively.
Conclusion:
• LSTM performs the best across all metrics, making it the most accurate model for
predicting this time series.
Thus, for future time series forecasting tasks, LSTM is the best choice based on these metrics.
3.10) TONCOIN
Stationarity test:
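ADF TEST:
H0: The time series has a unit root (the series is non-stationary).
H1: The time series does not have a unit root (the series is stationary).
KPSS TEST:
H0: The time series is stationary.
H1: The time series has a unit root (the series is non-stationary).

Test   p-value    Sig. level   Test stat    1%         5%          10%      Decision
ADF    0.891952   0.05         -0.500329    -3.44082   -2.866160   -        Fail to reject H0
KPSS   0.010000   0.05          2.701556     0.739      0.463       0.347   Reject H0

Both tests point the same way: the ADF test fails to reject a unit root (p = 0.89), and the
KPSS test rejects stationarity (2.70 exceeds the 1% critical value of 0.739). The series is
therefore likely to be non-stationary.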
Differencing:
Model 1: LSTM
➢ Improving Loss: The steady decline in loss over the training epochs shows that the model
effectively learned the patterns in the data, minimizing prediction errors over time.
➢ Small Loss Values: The final loss values indicate strong in-sample performance. The drop
in loss from 0.0165 to 0.0021 suggests that the model is capable of capturing the
underlying trend.
➢ Potential Overfitting: Low training loss does not rule out overfitting, so the model should
be checked against a held-out validation set. The flattening of the loss also suggests that
training beyond the 50th epoch would bring little additional accuracy.
❖ Model Evaluation - RMSE:
➢ RMSE: 0.0612: The Root Mean Squared Error (RMSE) of 0.0612 shows that the model
has decent accuracy in capturing price movements. RMSE values closer to zero indicate
better fits, and here the RMSE suggests the LSTM model predicts prices reasonably well.
❖ Accuracy and Trend Capturing:
➢ Trend Alignment: The model successfully captured the price trends and patterns seen
in the historical data, as demonstrated by how closely the predicted prices (orange line)
follow the actual prices (blue line) in the chart. However, some noise or volatility in
actual prices is smoothed out in the predictions, leading to smoother curves in the
forecasted prices.
➢ Smooth Fit: The smoother nature of the predicted prices indicates that the LSTM
model was effective in filtering out short-term volatility while capturing broader market
trends, which could be beneficial for long-term investors.
➢ General Trend: Starting from 5.73 on September 28, 2024, the predicted prices show
a steady upward trend, reaching approximately 6.02 by November 6, 2024. The growth
is gradual, indicating stable price movement.
➢ Prediction Characteristics: The forecast does not show any significant price volatility
or dramatic changes, suggesting that the model predicts a relatively stable market for
the next 40 days, with prices fluctuating within a narrow range.
❖ Key Observations:
➢ Good Fit for Smooth Trends: The LSTM model is highly effective at learning long-
term price trends while maintaining low prediction errors, as seen by the low loss and
RMSE values. This is useful for generating forecasts where long-term stability is
expected.
➢ Limited Volatility Capture: Although the model does well in predicting the overall
trend, it appears less capable of capturing short-term volatility. This could pose a
limitation in more volatile or speculative market conditions where rapid price
movements are common.
❖ In summary, the LSTM model provides a reliable forecast for the next 40 days, predicting
steady price growth with limited volatility. While the model is well-suited for capturing
broad trends, its prediction smoothness may overlook short-term price fluctuations, which
could be significant in highly dynamic markets.
❖ Model Overview: The model used is ARIMA(5, 1, 0).
❖ Key Parameters:
• AR Coefficients:
o ar.L1 (-0.0214): The first lag is small and statistically insignificant, indicating
little effect on the current price change.
o ar.L2 (-0.0240): Similarly, the second lag is insignificant with minimal
impact.
o ar.L3 (-0.0294): The third lag also lacks significance, suggesting little
influence on price changes.
o ar.L4 (-0.0608): Statistically significant (p = 0.011), meaning the fourth lag
has a negative effect on the current price change.
o ar.L5 (-0.0523): Also significant (p = 0.025), indicating a small but statistically
significant negative influence from the fifth lag.
• Sigma² (0.0267): Represents the variance of the error term. A relatively small value
suggests low noise in the model’s predictions.
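• AIC (-464.183): Lower AIC values indicate a better trade-off between fit and complexity,
but the figure is only meaningful when compared with other models fitted to the same data.
• BIC (-437.732): The Bayesian Information Criterion is higher than AIC because it applies
a larger penalty per parameter (ln n instead of 2). The gap between the two reflects the
sample size, not evidence that the model avoids overfitting.
• Log Likelihood (238.091): Higher values indicate a better fit. Whether the value is
positive or negative depends on the scale of the data, so the sign alone says little about
fit quality.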
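Both criteria are built from the same maximized likelihood L̂ and differ only in the penalty per parameter. With k estimated parameters and n observations, the formulas below show why BIC is larger than AIC here:

```latex
\begin{aligned}
\mathrm{AIC} &= 2k - 2\ln\hat{L}, \\
\mathrm{BIC} &= k\ln n - 2\ln\hat{L}, \\
\mathrm{BIC} - \mathrm{AIC} &= k(\ln n - 2).
\end{aligned}
```

BIC therefore exceeds AIC whenever n > e² ≈ 7.4. Here BIC minus AIC is 26.451. With k = 6 (an assumption: five AR terms plus σ², which also reproduces the reported AIC of 2·6 − 2·238.091 ≈ −464.18), this implies ln n ≈ 6.41, or roughly 600 observations. The gap reflects the sample size rather than any property of the fit.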
❖ Residual Diagnostics:
• Jarque-Bera (JB = 1586.37, Prob = 0.00): The residuals are not normally distributed,
which may indicate skewness or kurtosis in the data that has not been fully captured by
the model.
• Heteroskedasticity (H = 19.71, Prob = 0.00): This test indicates heteroskedasticity: the
residual variance changes over time, and with H well above 1 the residuals are
considerably more variable late in the sample than early on.
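The Jarque-Bera statistic combines the sample skewness S and kurtosis K as JB = n/6 · (S² + (K − 3)²/4). It equals 0 when skewness is 0 and kurtosis is 3, as for a normal distribution. The sketch below computes the statistic with numpy on synthetic data, not on the model's residuals. For p-values, `scipy.stats.jarque_bera` and `statsmodels.stats.stattools.jarque_bera` both provide the test:

```python
import numpy as np

def jarque_bera_stat(x) -> float:
    """JB = n/6 * (S^2 + (K - 3)^2 / 4), using population (biased) moments."""
    x = np.asarray(x, dtype=float)
    n = x.size
    centred = x - x.mean()
    m2 = np.mean(centred ** 2)
    skew = np.mean(centred ** 3) / m2 ** 1.5
    kurt = np.mean(centred ** 4) / m2 ** 2
    return float(n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4))

# Hypothetical residuals alternating between -1 and +1: skewness 0, kurtosis 1
x = np.tile([-1.0, 1.0], 300)
print(jarque_bera_stat(x))  # 100.0
```

Fat-tailed residuals, which are common in crypto returns, push K far above 3. This is why the JB values in these summaries run into the thousands or millions.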
❖ Forecasted Prices:
• The forecasted prices show minimal fluctuations and suggest a stable price around 6.84
over the forecast horizon. This indicates that the model expects the price to stabilize
without significant volatility.
❖ Model Performance:
• Mean Squared Error (MSE = 1.894): The model provides a decent fit with a relatively
low MSE. However, this model may miss capturing some short-term volatility and
fluctuations due to its simplistic ARIMA structure.
❖ Key Takeaways:
• Trend Detection: The ARIMA (5,1,0) model forecasts a stable price trend in the
coming days, which may suggest a return to the mean after fluctuations.
• Volatility Capture: Although the model predicts overall trends well, it misses short-
term volatility, as suggested by the non-normal (Jarque-Bera) and heteroskedastic residuals.
• Further Model Improvements: Further tuning (e.g., adding seasonal components or
experimenting with other models like SARIMA or machine learning models like
LSTM) may enhance predictive accuracy, especially in volatile markets like
cryptocurrencies.
Conclusion:
The ARIMA (5,1,0) model provides a reasonably good fit for predicting general trends in
cryptocurrency prices, but may not capture rapid short-term fluctuations due to its simplicity.
❖ ARIMA Model:
• Mean Absolute Error (MAE: 1.27): The ARIMA model has a relatively high MAE,
indicating that its predictions deviate more from the actual values on average.
• Mean Squared Error (MSE: 1.89): A higher MSE value suggests that the ARIMA
model produces predictions with significant error.
• Root Mean Squared Error (RMSE: 1.38): The RMSE also reflects a considerable
difference between predicted and actual values, implying that the ARIMA model's
predictions are not highly accurate.
• Mean Absolute Percentage Error (MAPE: 23.93%): With a MAPE close to 24%,
the ARIMA model's predictions are off by a considerable percentage on average, which
could limit its reliability in this case.
• Conclusion: The ARIMA model shows moderate prediction performance. It is
somewhat accurate but produces considerable error across all metrics, meaning it might
not be suitable for highly accurate short-term predictions in this dataset.
❖ LSTM Model:
• Mean Absolute Error (MAE: 4.40): The LSTM model has a much higher MAE
compared to ARIMA, indicating that the predictions deviate more significantly from the
actual values.
• Mean Squared Error (MSE: 19.60): A very high MSE reveals that the LSTM model
is producing large errors in its predictions, with much worse accuracy compared to
ARIMA.
• Root Mean Squared Error (RMSE: 4.43): Similarly, the high RMSE confirms that the
LSTM model has substantial variance between predicted and actual values.
• Mean Absolute Percentage Error (MAPE: 78.70%): The MAPE indicates that the
LSTM model’s percentage-based error is around 79%, which is extremely high, showing
poor predictive performance.
• Conclusion: The LSTM model significantly underperforms in this case, with large
errors across all metrics. Despite its potential in other time series tasks, it is not a good
fit for this dataset or may require significant adjustments in tuning to improve
performance.
Overall Conclusion:
• LSTM Model: LSTM performs worse across all metrics, with large prediction errors and
high percentage-based inaccuracies, making it unsuitable for this dataset in its current form.
• Recommendation: ARIMA is the better choice for this specific task. Its errors are
considerably lower on every metric (a MAPE of 23.93% against 78.70% for LSTM), although
they remain substantial. LSTM would need significant improvements or reconfiguration to
provide useful results.
3.11) TRON
Stationarity test:
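ADF TEST:
H0: The time series has a unit root (the series is non-stationary).
H1: The time series does not have a unit root (the series is stationary).
KPSS TEST:
H0: The time series is stationary.
H1: The time series has a unit root (the series is non-stationary).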
Differencing
Model 1: LSTM
Epochs 1-50: The model trained for 50 epochs, with a consistent reduction in loss,
showing learning and improvements over time.
➢ Initial Loss: It started with a loss of 0.0083 in the first epoch, dropping to 0.0016 by
the 10th epoch and further reducing to 0.0013 by the end of training.
➢ Small Loss Values: These values indicate that the model has been effective in
minimizing error, suggesting a good fit.
➢ Potential Overfitting: The loss continued to decrease but flattened in the later epochs
(around 50), suggesting that further training would not significantly improve performance.
Whether the model overfits would need to be checked on a validation set.
❖ Model Evaluation - RMSE:
RMSE: The model's RMSE for the prediction was 0.03705, which is low. This suggests that
the model is relatively accurate in predicting price movements based on past trends. RMSE
is a strong indicator of the model's fit, and lower values point to a better fit.
❖ Accuracy and Trend Capturing:
• Trend Alignment: The predicted prices align closely with actual trends, particularly
in periods with significant price fluctuations, demonstrating that the LSTM model
successfully captured major patterns.
• Smoothing of Volatility: The predicted prices tend to smooth out short-term
volatility, showing a less erratic trend compared to the actual data. This suggests that
while the model captures long-term trends, it may not be fully sensitive to rapid
market changes.
❖ Predicted Prices for the Next 40 Days:
❖ General Trend:
➢ Starting at 0.1404 on 2024-09-28, the predicted prices decline steadily from the first two
weeks onward, reaching around 0.084 by 2024-11-06, a cumulative fall of about 40%.
➢ This indicates a downtrend over the next 40 days, with no significant upward
movement or price recovery expected.
❖ Prediction Characteristics:
➢ The model does not predict substantial volatility. Instead, it shows a continuous,
gradual price decline.
➢ Toward the end of the forecast, the predicted prices converge to a narrow range
between 0.0798 and 0.0855, reflecting a stable and less volatile outlook for the
market.
❖ Key Observations:
Good Fit for Smooth Trends: The model has effectively captured the overall trend and
patterns of the time series. The low RMSE and loss values reinforce its ability to predict price
movements with relatively high accuracy.
Limited Volatility Capture: The predictions show reduced volatility, which could be a
limitation, especially if the actual market experiences more chaotic and sudden price shifts.
This suggests that the model is better suited for smoother market conditions and might need
adjustments for highly volatile markets.
Summary:
The LSTM model demonstrates solid performance in capturing general trends, but it exhibits
limitations in responding to short-term volatility. Over the next 40 days, the predicted decline
in prices points to a bearish outlook, with stable and narrow price movements toward the end
of the forecast period.
MODEL 2: ARIMA
❖ Model Overview: The model used is ARIMA(5, 1, 0).
❖ Key Parameters:
• AR Coefficients:
o ar.L1 (-0.0472): The first lag has a statistically significant negative effect
(p-value = 0.000) on the current price change.
o ar.L2 (0.1324): The second lag has a positive and highly significant effect.
o ar.L3 (0.0872): Similarly, the third lag is positive and significant.
o ar.L4 (-0.1278): The fourth lag has the largest negative effect, with a high
significance level (p-value = 0.000).
o ar.L5 (-0.0343): The fifth lag is negative and statistically significant (p-value
= 0.000).
• Sigma² (1.544e-05): This is the variance of the error term, indicating very low noise
in the model, suggesting accurate predictions.
• AIC (-20328.903): Lower is better, but like BIC this value is mainly useful for comparing
alternative specifications fitted to the same data.
• BIC (-20294.034): The Bayesian Information Criterion is higher than AIC because of its
larger per-parameter penalty (ln n rather than 2), and it is useful for comparing models
while accounting for complexity.
• Log Likelihood (10170.452): Higher values indicate a better fit; AIC and BIC are both
derived from it.
❖ Residual Diagnostics:
• Ljung-Box (Q = 0.01, Prob = 0.94): The high p-value indicates that there is no
significant autocorrelation in the residuals, meaning the model has effectively
captured the patterns in the time series data.
• Jarque-Bera (JB = 1048252.76, Prob = 0.00): The residuals are not normally
distributed, as indicated by the low p-value, suggesting skewness or high kurtosis.
• Heteroskedasticity (H = 0.15, Prob(H) = 0.00): There is evidence of
heteroskedasticity: the variance of the residuals changes over time (with H below 1, it is
lower late in the sample than early on), which might affect prediction accuracy.
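The Ljung-Box (L1) entries in these summaries use the lag-1 autocorrelation ρ̂₁ of the residuals, with Q = n(n + 2) ρ̂₁² / (n − 1). Because only one lag is used, Q is compared with a chi-squared distribution with one degree of freedom, whose upper-tail probability equals erfc(√(Q/2)). The sketch below uses only the standard library, and the helper names are hypothetical. As a check, Q = 0.37 gives p ≈ 0.54, matching the USDC summary:

```python
import math

def chi2_1_sf(q: float) -> float:
    """Upper-tail probability P(X > q) for a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(q / 2))

def ljung_box_lag1(resid):
    """Ljung-Box Q at lag 1, Q = n(n + 2) * rho1^2 / (n - 1), with its chi-squared(1) p-value."""
    n = len(resid)
    mean = sum(resid) / n
    dev = [r - mean for r in resid]
    rho1 = sum(dev[t] * dev[t + 1] for t in range(n - 1)) / sum(d * d for d in dev)
    q = n * (n + 2) * rho1 ** 2 / (n - 1)
    return q, chi2_1_sf(q)

# Hypothetical, strongly alternating residuals: large Q, small p-value
print(ljung_box_lag1([1.0, -1.0, 1.0, -1.0]))  # Q = 4.5, p ≈ 0.034
# Reported USDC statistic Q = 0.37
print(round(chi2_1_sf(0.37), 2))               # 0.54
```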
❖ Forecasted Prices:
• The forecasted prices remain relatively stable around 0.1349 over the next 40 days,
showing minimal fluctuations. This indicates the model is capturing a consistent
trend, without large deviations or volatility.
• Price Stabilization: The ARIMA model suggests that the price will stabilize around
this level, without anticipating any significant spikes or drops.
❖ Model Performance:
• Mean Squared Error (MSE = 0.0003668): The MSE is very low, indicating the
model provides a good fit. The low error suggests accurate predictions with minimal
variance from actual values.
❖ Key Takeaways:
• Trend Prediction: The ARIMA (5,1,0) model forecasts stable trends in the future,
indicating that the series may be mean-reverting after fluctuations.
• Volatility: While the model predicts the general trend well, it may not capture short-
term volatility, as suggested by the high Jarque-Bera statistic and the evidence of
heteroskedasticity.
• Further Improvements: The model could potentially be improved by incorporating
seasonal or volatility components (e.g., SARIMA or GARCH) or by exploring more
complex machine learning models (like LSTM) for better handling of market
dynamics.
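GARCH models target the time-varying residual variance flagged by the heteroskedasticity test. The sketch below shows the GARCH(1,1) conditional-variance recursion σ²ₜ = ω + α e²ₜ₋₁ + β σ²ₜ₋₁. The parameters ω, α and β are hypothetical, and the project did not fit this model. Estimating them in practice would typically be done with a package such as `arch` (its `arch_model` function), which is not shown here:

```python
import numpy as np

def garch11_variance(resid, omega, alpha, beta):
    """Conditional variances sigma2_t = omega + alpha * e_{t-1}^2 + beta * sigma2_{t-1}."""
    resid = np.asarray(resid, dtype=float)
    sigma2 = np.empty_like(resid)
    sigma2[0] = omega / (1 - alpha - beta)   # start at the unconditional variance
    for t in range(1, resid.size):
        sigma2[t] = omega + alpha * resid[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

# Hypothetical parameters (alpha + beta < 1 keeps the variance process stationary)
omega, alpha, beta = 1e-6, 0.10, 0.85
resid = np.array([0.0, 0.05, 0.0, 0.0, 0.0])   # one large shock at t = 1
print(garch11_variance(resid, omega, alpha, beta))  # sigma2 jumps at t = 2, then decays
```

Unlike ARIMA, which assumes a constant σ², this recursion lets expected volatility rise after large moves and fade afterwards.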
Conclusion:
The ARIMA (5,1,0) model provides a good fit for the data and predicts a stable price trend
for the next 40 days. However, its ability to predict sharp short-term movements may be
limited due to residual skewness and kurtosis. Further model tuning or alternative approaches
may improve performance in more volatile market conditions.
❖ ARIMA Model:
• Mean Absolute Error (MAE: 0.0186): The ARIMA model has a relatively low
MAE, indicating that, on average, the model's predictions are quite close to the actual
values.
• Mean Squared Error (MSE: 0.0003668): The low MSE suggests that the ARIMA
model predicts with minimal error, showing it performs well for this time series data.
• Root Mean Squared Error (RMSE: 0.0192): The RMSE is also low, confirming
that the ARIMA model's predictions are accurate with minimal deviation from actual
values.
• Mean Absolute Percentage Error (MAPE: 12.03%): The MAPE suggests that, on
average, the ARIMA model’s predictions are off by about 12%, which is a reasonably
accurate performance for forecasting crypto prices.
• Conclusion: The ARIMA model provides a good balance between accuracy and
stability. While not perfect, it is reliable for general trend predictions but could
potentially be further improved for capturing short-term fluctuations.
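The four error measures above are standard and can be written out directly. This pure-Python sketch is a generic implementation, not the report's code, and the prices in the usage lines are hypothetical. It also shows a useful cross-check: RMSE is the square root of MSE, and √0.0003668 ≈ 0.0192, which matches the reported ARIMA RMSE.

```python
import math


def forecast_errors(actual, predicted):
    """Return MAE, MSE, RMSE and MAPE (%) for paired actual/predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    # MAPE divides by each actual value, so it is undefined if any actual is 0.
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "MAPE": mape}


# Hypothetical prices and a flat forecast, for illustration only.
actual = [0.15, 0.16, 0.14]
predicted = [0.135, 0.135, 0.135]
print(forecast_errors(actual, predicted))

# RMSE is the square root of MSE, so the reported figures can be cross-checked.
print(round(math.sqrt(0.0003668), 4))  # prints 0.0192 (reported ARIMA RMSE)
```

The same check on the LSTM figures gives √0.0229 ≈ 0.1513, which agrees with the reported 0.1514 up to rounding of the MSE.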
❖ LSTM Model:
• Mean Absolute Error (MAE: 0.1513): The LSTM model has a much higher MAE
compared to ARIMA, indicating that the average deviation from actual values is
larger.
• Mean Squared Error (MSE: 0.0229): The LSTM model has a significantly higher
MSE than ARIMA, showing that the predictions are less accurate and exhibit more
error.
• Root Mean Squared Error (RMSE: 0.1514): Similarly, the RMSE is higher for
LSTM, indicating that its predictions are further from actual values.
• Mean Absolute Percentage Error (MAPE: 98.55%): A very high MAPE indicates
that the LSTM model performs poorly in terms of percentage-based accuracy. It
suggests that on average, the LSTM model's predictions are off by nearly 99%.
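• Conclusion: Despite LSTM's potential for capturing complex patterns in time series
data, it performed poorly in this case. A MAPE close to 100% means the typical error is
about as large as the price itself, which points to the model configuration or data
preprocessing, and not only the nature of the dataset, as the likely cause.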
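As a reference point for the configuration question, LSTM forecasting pipelines commonly min-max scale prices, frame the series as sliding windows of past values, and map predictions back to price units before computing MAE, MSE, RMSE and MAPE. Errors computed on the wrong scale can make a model look far worse or far better than it is. The sketch below is a hypothetical pure-Python illustration of that preprocessing only: the report does not state its scaler, window length or network settings, the lookback of 3 and the prices are made up, and the neural network itself is omitted.

```python
def fit_min_max(train):
    """Learn scaling bounds from the training portion only (avoids test leakage)."""
    lo, hi = min(train), max(train)
    if hi == lo:
        raise ValueError("constant series cannot be min-max scaled")
    return lo, hi


def scale(values, lo, hi):
    return [(v - lo) / (hi - lo) for v in values]


def unscale(values, lo, hi):
    """Map model outputs back to price units before computing error metrics."""
    return [v * (hi - lo) + lo for v in values]


def make_windows(series, lookback):
    """Turn a series into (past `lookback` values, next value) training pairs."""
    inputs, targets = [], []
    for i in range(lookback, len(series)):
        inputs.append(series[i - lookback:i])
        targets.append(series[i])
    return inputs, targets


# Hypothetical prices; lookback=3 is an arbitrary illustrative choice.
prices = [0.150, 0.147, 0.144, 0.141, 0.138, 0.136, 0.135]
lo, hi = fit_min_max(prices[:5])
X, y = make_windows(scale(prices, lo, hi), lookback=3)
# An LSTM would be trained on X -> y here (omitted); its outputs go through unscale().
print(len(X), unscale(y, lo, hi))
```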
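Overall Conclusion:
The ARIMA model outperforms the LSTM model on every reported metric (MAE 0.0186 vs 0.1513,
MSE 0.0003668 vs 0.0229, RMSE 0.0192 vs 0.1514, MAPE 12.03% vs 98.55%), making it the
preferred model for forecasting this series.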
CHAPTER 4: FINDINGS, IMPLICATIONS AND CONCLUSION
4.1) Findings:
❖ ARIMA Model:
ARIMA (AutoRegressive Integrated Moving Average) is a statistical model that works well for
linear time-series data that are stationary or can be made stationary by differencing (the
"Integrated" part of the model). It assumes the data are autocorrelated, so future values can
be predicted as linear combinations of past values and past forecast errors. The model also
performs well with short-term dependencies in data.
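For the ARIMA(5,1,0) specification reported for several of the coins, the model can be written as follows, where y_t is the price on day t, the φ_i are the five autoregressive coefficients estimated from the data, and ε_t is a white-noise error term. The 1 means the model is fitted to first differences, and the 0 means there are no moving-average terms. No constant (drift) term is shown, which is an assumption about how the models were fitted.

```latex
\Delta y_t = y_t - y_{t-1}, \qquad
\Delta y_t = \phi_1 \Delta y_{t-1} + \phi_2 \Delta y_{t-2} + \phi_3 \Delta y_{t-3}
           + \phi_4 \Delta y_{t-4} + \phi_5 \Delta y_{t-5} + \varepsilon_t
```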
• USDC, Bitcoin, TRON: These cryptocurrencies tend to have more stable, consistent
price movements compared to more volatile ones.
• USDC (USD Coin) is a stablecoin designed to maintain a stable value by being pegged 1:1
to the U.S. dollar. Its price fluctuations are minimal and predictable, making it more
suitable for a linear model like ARIMA.
• Bitcoin has historically shown persistent long-term trends, and its price series becomes
stationary after differencing, so ARIMA can identify patterns over time and make reasonable
forecasts based on past data.
• TRON has less extreme volatility compared to many altcoins, making its price patterns
more amenable to linear modelling over certain periods.
❖ LSTM Model:
LSTM (Long Short-Term Memory) is a recurrent neural network architecture particularly suited
to non-linear, non-stationary time-series data. It is effective at capturing long-term
dependencies in the data, making it well suited to complex patterns, sudden shifts and
high volatility. LSTM can model both short-term and long-term trends by learning from the
data over time.
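The long-term memory described above comes from the LSTM cell's gated state. In the standard formulation with a forget gate (the report does not state the exact layer configuration used), each time step updates the cell as below, where x_t is the input at step t (for example, one scaled price in the input window), h_t is the hidden state, c_t is the cell state, σ is the logistic sigmoid and ⊙ is element-wise multiplication. The forget gate f_t controls how much of the previous cell state is kept, which is what lets information persist across long sequences.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```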
• Solana, ETH, XRP, Dogecoin, TONCOIN, BNB, Tether: These cryptocurrencies are
more volatile and tend to exhibit complex price behaviours with non-linear trends.
• Solana, ETH (Ethereum), XRP: These are high-growth, highly volatile altcoins. Their
prices are influenced by a wide range of factors such as network updates, partnerships,
and market sentiment, which leads to non-linear and unpredictable trends. LSTM's
ability to capture long-term dependencies and non-linearity makes it more effective for
these assets.
• Dogecoin: Known for its sudden surges and social media-driven price jumps,
Dogecoin's price movements are highly non-linear and hard to predict with a simple
linear model, making LSTM a better fit.
• TONCOIN and BNB: These cryptocurrencies may exhibit sporadic price spikes due
to market activities or ecosystem changes, which LSTM can adapt to by capturing those
long-term dependencies.
• Tether (USDT): While Tether is a stablecoin like USDC, short-term swings in its trading
volume and liquidity produce small fluctuations around its peg, which LSTM captures better
due to its ability to handle more complex, non-linear patterns.
• ARIMA excels with cryptocurrencies that have more predictable, linear price
movements, making it suitable for stablecoins and relatively less volatile assets.
• LSTM performs better with volatile, non-linear time series, capturing more complex
patterns and shifts in high-growth or volatile cryptocurrencies.
4.2) Implications
Ethereum:
The non-stationary nature of the Ethereum price series requires transformations like
differencing for accurate modelling, enabling reliable predictions with ARIMA. However,
LSTM outperforms ARIMA in price prediction by effectively capturing long-term
dependencies and complex, non-linear relationships, resulting in lower prediction errors
(RMSE = 0.0371, MAE = 0.0244) and better adaptability to market fluctuations. While ARIMA
is suitable for short-term trends, its limitations with extreme volatility make LSTM the superior
choice for forecasting Ethereum prices.
Bitcoin:
The stationarity tests (ADF and KPSS) reveal that Bitcoin's price series is non-stationary,
necessitating differencing for accurate forecasting, particularly in the volatile cryptocurrency
market. The ARIMA model stands out for its simplicity, interpretability and effectiveness in
handling non-stationary data through differencing, and it achieves better MAE, MSE and
RMSE values along with robust diagnostic results. In comparison, the LSTM model achieves an
RMSE of 0.0356, demonstrating good predictive accuracy but underestimating sharp upward
movements and lagging during rapid price changes. While ARIMA generally outperforms
LSTM in handling Bitcoin's price dynamics, both models exhibit limitations in predicting
sudden fluctuations inherent to the cryptocurrency market.
Solana:
Analysis of Solana's time series data indicates non-stationarity, confirmed by ADF and KPSS
tests, with differencing resulting in stationarity. The LSTM model outperforms ARIMA
across all evaluation metrics (MSE, RMSE, MAE, and MAPE), effectively capturing trends
and indicating a stable price outlook. Its training loss plateaued, which may point to
overfitting; confirming this would require comparing it with the validation loss. In contrast,
the ARIMA(5, 1, 0) model fits mean-reverting behavior and indicates
stable trends but struggles with short-term volatility, as shown by a high Jarque-Bera test
statistic indicating residual non-normality. While LSTM demonstrates superior predictive
power, a complementary approach that combines insights from both models could enhance
forecasting accuracy and provide a deeper understanding of market dynamics for informed
trading strategies.
USDC:
The analysis of USDC price trends indicates that the ARIMA model, with its low error metrics,
is effective for predicting trends in stable market conditions, making it a reliable tool for
financial analysts and traders in strategic planning and risk management. Conversely, while the
LSTM model underperformed in accuracy, its ability to model complex, non-linear patterns
offers potential for capturing short-term price fluctuations, particularly in volatile markets. This
suggests that integrating additional features into LSTM models could enhance predictive
accuracy. Overall, the findings emphasize the importance of selecting suitable forecasting
methods based on data characteristics and the need for thorough model evaluation to improve
performance in dynamic financial markets.
Tether:
The ADF and KPSS tests confirm that the Tether price time series is stationary, which supports
forecasting accuracy and simplifies model selection, making reliable predictions possible with
techniques like ARIMA and LSTM. The LSTM model excels in capturing long-term dependencies and
trends, showing low error rates and making it particularly effective for informed trading
decisions in the volatile cryptocurrency market. Conversely, while the ARIMA model offers
valuable insights into overall market behavior and long-term trends, its mean-reverting nature
limits its effectiveness during high volatility and rapid price fluctuations. Thus, while LSTM
is preferred for accurate forecasting in dynamic environments, ARIMA can complement it
by providing foundational insights into the price structure.
BNB:
The analysis of BNB price prediction highlights critical implications regarding stationarity,
with both ADF and KPSS tests confirming the original time series was non-stationary.
Differencing was successfully applied to stabilize the series, enhancing the reliability of
subsequent modelling efforts. The Long Short-Term Memory (LSTM) model demonstrated
impressive predictive capabilities, achieving a Root Mean Square Error (RMSE) of 0.0122,
although it struggled to capture sharp price surges. In contrast, the ARIMA model showed less
favourable results, with high Mean Squared Error (MSE) and RMSE values, indicating limited
forecasting reliability. While ARIMA passed the Ljung-Box test for autocorrelation, the
residuals indicated non-normality, suggesting potential enhancements like integrating
Generalized Autoregressive Conditional Heteroskedasticity (GARCH) to better capture market
volatility. Overall, the LSTM model is prioritized for BNB price forecasting due to its
superior performance, with continuous monitoring and retraining essential to adapt to the
dynamic nature of the cryptocurrency market.
Toncoin:
The analysis of TONCOIN price prediction through stationarity testing and model evaluation
revealed crucial insights into the data characteristics and model performances. Both the ADF
and KPSS tests confirmed that the original time series was non-stationary, necessitating
differencing to stabilize the data for reliable forecasting. Once differenced, the series passed
both tests, indicating it was stationary and suitable for modelling with ARIMA and LSTM,
which depend on consistent statistical properties. The LSTM model showed potential in
learning underlying patterns, as indicated by a significant decrease in training loss from 0.0165
to 0.0021 over 50 epochs; however, a high RMSE of 4.43 suggested challenges in capturing
short-term volatility. Conversely, the ARIMA(5,1,0) model demonstrated better performance,
with lower MAE (1.27) and MSE (1.89), making it more effective for detecting stable price
trends, despite limitations in handling rapid fluctuations. Overall, ARIMA emerged as the
more reliable choice for forecasting TONCOIN prices: its MSE of 1.89 corresponds to an RMSE
of about 1.37, roughly a third of the LSTM model's RMSE of 4.43.
TRON:
The analysis of TRON price prediction indicates that stationarity tests confirm the time series
becomes stationary after differencing: the ADF test rejects its null hypothesis of a unit root
(non-stationarity), while the KPSS test fails to reject its null hypothesis of stationarity.
The ARIMA model demonstrates lower error metrics (MAE, MSE, RMSE),
effectively capturing stable trends and providing a reliable forecasting tool for relatively stable
market conditions, supported by low AIC and BIC values. In contrast, the LSTM model shows
significant limitations, with higher error metrics and poor predictive accuracy, struggling to
capture short-term volatility, which could pose risks for trading strategies. Overall, ARIMA
significantly outperforms LSTM in accuracy and reliability, making it the preferred model
for forecasting TRON prices, especially for capturing long-term trends and ensuring consistent
predictions.
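The ADF and KPSS tests have opposite null hypotheses (ADF: the series has a unit root; KPSS: the series is stationary), so a series is treated as stationary when ADF rejects its null and KPSS does not. A minimal pure-Python sketch of that decision rule and of first differencing follows. The 5% significance level and the p-values in the usage lines are assumptions for illustration; in practice the p-values would come from a statistics library, for example the second element returned by `adfuller` and `kpss` in `statsmodels.tsa.stattools`.

```python
ALPHA = 0.05  # assumed significance level


def first_difference(series):
    """Return y_t - y_{t-1}; the result is one element shorter than the input."""
    return [b - a for a, b in zip(series, series[1:])]


def stationarity_verdict(adf_pvalue, kpss_pvalue, alpha=ALPHA):
    """Combine ADF (H0: unit root) and KPSS (H0: stationarity) p-values."""
    adf_stationary = adf_pvalue < alpha      # ADF rejects the unit-root null
    kpss_stationary = kpss_pvalue >= alpha   # KPSS does not reject stationarity
    if adf_stationary and kpss_stationary:
        return "stationary"
    if not adf_stationary and not kpss_stationary:
        return "non-stationary: difference the series and test again"
    return "inconclusive: the two tests disagree"


# Hypothetical p-values before and after differencing.
print(stationarity_verdict(adf_pvalue=0.62, kpss_pvalue=0.01))
print(stationarity_verdict(adf_pvalue=0.001, kpss_pvalue=0.10))
print(first_difference([0.150, 0.147, 0.144]))
```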
Dogecoin:
The analysis of Dogecoin's price prediction shows that the two models perform closely. The
LSTM model achieved an RMSE of 0.0069, slightly higher than ARIMA's RMSE of 0.0065, and
ARIMA also recorded slightly better MAE and MSE values, suggesting its robustness in
providing reliable forecasts, especially in stable market conditions. Despite ARIMA's
marginally lower errors, the findings suggest that the LSTM model's ability to model
non-linear relationships and capture intricate dynamics in Dogecoin's price data positions it as
the preferred choice for future forecasting tasks, particularly in volatile markets.
XRP:
The stationarity tests for XRP's price series revealed non-stationarity initially, but after
differencing, both the ADF and KPSS tests confirmed stationarity. In comparing models, the
LSTM model trained for 50 epochs, showing a low RMSE of 0.0117 and capturing general
trends well, though it smoothed out short-term volatility. The ARIMA model (5,1,0) also
showed a good fit, with a low MSE (0.00069), capturing steady price levels but missing
short-term fluctuations. Overall, the LSTM model proved more accurate, with better performance
across metrics, making it the preferred choice for predicting XRP prices.
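Several of the comparisons above mix MSE and RMSE. Because RMSE is the square root of MSE, the two always rank models the same way, and converting the reported figures to RMSE puts them on one scale. The short pure-Python sketch below does this for the Toncoin, Dogecoin and XRP figures quoted above, assuming both models for each coin were evaluated on the same test period (the report does not restate this).

```python
import math

# Figures as quoted in the implications above (MSE or RMSE, whichever was given).
reported = {
    "Toncoin": {"ARIMA": {"mse": 1.89}, "LSTM": {"rmse": 4.43}},
    "Dogecoin": {"ARIMA": {"rmse": 0.0065}, "LSTM": {"rmse": 0.0069}},
    "XRP": {"ARIMA": {"mse": 0.00069}, "LSTM": {"rmse": 0.0117}},
}


def to_rmse(metrics):
    """Return RMSE, deriving it from MSE when only MSE was reported."""
    return metrics["rmse"] if "rmse" in metrics else math.sqrt(metrics["mse"])


def lower_rmse_model(models):
    """Name of the model with the lower RMSE (and therefore the lower MSE)."""
    return min(models, key=lambda name: to_rmse(models[name]))


for coin, models in reported.items():
    rmses = {name: round(to_rmse(m), 4) for name, m in models.items()}
    print(coin, rmses, "lower RMSE:", lower_rmse_model(models))
```

Because the square root is monotonic, a model with a lower MSE necessarily has a lower RMSE on the same test data; MAE, by contrast, can rank models differently.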
Bitwise 10 Crypto Index:
The LSTM model's superior performance in predicting the Bitwise 10 Crypto Index (BITW)
with 85% accuracy highlights its effectiveness in capturing the nonlinear patterns of volatile
cryptocurrency markets, offering investors better decision-making tools and risk management
strategies. In contrast, the ARIMA model’s 50% accuracy reveals its limitations in this dynamic
environment, suggesting that relying on traditional linear models alone may expose investors
to substantial forecasting risk.
This analysis advocates for the adoption of advanced machine learning techniques like LSTM
for more reliable forecasting and encourages financial institutions to develop robust
frameworks for real-time analysis, driving innovation in cryptocurrency asset management.
4.3) Conclusion:
The analysis underscores the necessity of addressing non-stationarity in price series through
differencing, as evidenced in cryptocurrencies like Ethereum, Bitcoin, and TRON. This
transformation is crucial for reliable modelling and forecasting.
While ARIMA works well once the data have been made stationary, LSTM’s architecture allows it
to learn from sequences, making it better suited to capturing the underlying dynamics of price
series that were non-stationary before transformation.
The findings advocate for a shift towards advanced machine learning techniques like LSTM
for more accurate and reliable forecasting in the cryptocurrency market. Investors and financial
institutions are encouraged to develop frameworks that leverage LSTM's capabilities,
particularly for risk management and decision-making.
ARIMA can still play a complementary role by providing foundational insights into price
structures and stable trends, especially in less volatile scenarios. Integrating both approaches
may yield enhanced forecasting accuracy and a deeper understanding of market dynamics, thus
empowering investors to make more informed trading decisions.
The analysis reveals that while traditional models like ARIMA have their merits, the
complexity and volatility of the cryptocurrency market necessitate the adoption of more
sophisticated techniques like LSTM for effective forecasting. This strategic approach can
significantly improve risk management and decision-making processes, helping stakeholders
remain competitive in an ever-evolving financial landscape.
CHAPTER 5: REFERENCES
https://www.investing.com/crypto/bitcoin/historical-data
https://www.investing.com/indices/investing.com-eth-usd
https://www.investing.com/indices/investing.com-usdt-usd
https://www.investing.com/indices/investing.com-bnb-usd
https://www.investing.com/indices/investing.com-sol-usd
https://www.investing.com/crypto/tether/usdt-usdc
https://www.investing.com/indices/investing.com-xrp-usd
https://www.investing.com/crypto/dogecoin
https://www.investing.com/crypto/toncoin/ton-usd
https://www.investing.com/crypto/tron/historical-data
https://finance.yahoo.com/quote/BITW/history/