
DATA SCIENCE PROJECT:

CRYPTO PRICE PREDICTION

FINANCIAL ANALYTICS

Course code: PMBA658E

Slot: T35

Submitted to

Dr. Sunitha K

TEAM MEMBERS

SWETHA M 23MBA0035
NANDHINI M 23MBA0049
MARISH BALAJI V 23MBA0104
SUBHASHINI P 23MBA0111
APARNA S 23MBA0112

TABLE OF CONTENTS
CHAPTER I: INTRODUCTION
1.1) Objectives
1.2) Background of the 10 Selected Cryptocurrencies in the Project
CHAPTER II: METHODOLOGY
2.1) Purpose of the Study
2.2) Research Design
2.3) Sampling Design
2.4) Sampling Method
2.5) Data Collection Methods
2.6) Statistical Analysis
CHAPTER III: DATA ANALYSIS AND INTERPRETATION
3.1) Crypto Index Analysis (Bitwise 10 Crypto Index)
3.2) Bitcoin
3.3) Ethereum
3.4) Tether
3.5) BNB
3.6) Solana
3.7) USDC
3.8) XRP
3.9) Dogecoin
3.10) Toncoin
3.11) TRON
CHAPTER IV: FINDINGS AND CONCLUSION
4.1) Findings
4.2) Implications
4.3) Conclusion
CHAPTER V: REFERENCES

CHAPTER 1: INTRODUCTION
The project focuses on forecasting the prices of the top 10 cryptocurrencies using advanced
modeling techniques. Given the volatile nature of digital assets like Bitcoin, Ethereum, and
others, predicting their future prices presents a significant challenge. This project employs
ARIMA and LSTM models to analyze historical price data, leveraging their capabilities to
identify trends and patterns. We evaluate the accuracy of these models to determine their
predictive performance. The analysis is carried out using Python for data processing, model
development, and data visualization, while Excel is utilized for organizing data. By combining
statistical and machine learning approaches, the project aims to provide insights into market
behavior, offering a reliable tool for investors to make informed decisions in the cryptocurrency
landscape. This comprehensive evaluation seeks to identify the most effective model for
predicting cryptocurrency prices with precision.

ARIMA (Auto-Regressive Integrated Moving Average) and LSTM (Long Short-Term
Memory) models were chosen for their distinct strengths in time-series forecasting. ARIMA is
widely used for its simplicity and effectiveness in capturing linear relationships within data,
making it suitable for analyzing historical price trends. LSTM, a type of recurrent neural
network (RNN), excels at handling sequential data and learning long-term dependencies,
making it ideal for predicting non-linear patterns in cryptocurrency markets. The combination
of these methods allows the project to explore both linear and complex non-linear price
movements, providing a well-rounded approach to understanding cryptocurrency price
dynamics. By comparing these models, the project aims to determine the most accurate
approach for reliable price predictions, balancing precision and computational efficiency.

1.1) Objectives:

• Build ARIMA and LSTM models to predict the future prices of Bitwise 10 Crypto Index
Fund (BITW) and the top 10 cryptocurrencies by market capitalization.

• Evaluate and compare the accuracy of the ARIMA (statistical) and LSTM (machine
learning) models for predicting cryptocurrency price trends.

1.2) Background of the 10 Selected Cryptocurrencies in the Project:
The study covers the top 10 cryptocurrencies by market capitalization:

▪ Bitcoin (BTC): Launched in 2009 by an anonymous entity known as Satoshi
Nakamoto, Bitcoin is the first and most well-known cryptocurrency. It operates on a
decentralized blockchain network, using a proof-of-work consensus mechanism.
Bitcoin's limited supply of 21 million coins has led many investors to treat it as digital
gold and a hedge against inflation.
▪ Ethereum (ETH): Created by Vitalik Buterin and launched in 2015, Ethereum
introduced the concept of smart contracts and decentralized applications (dApps) to the
blockchain world. Its programmable blockchain allows developers to build a variety of
applications, making it a fundamental part of the DeFi and NFT ecosystems.
▪ Tether (USDT): Tether is the most widely used stablecoin, pegged to the value of the
U.S. dollar. It was designed to minimize the price volatility of cryptocurrencies,
offering a stable digital currency option that facilitates trading on cryptocurrency
exchanges and acts as a store of value.
▪ BNB (Binance Coin): Initially created in 2017 as a utility token for the Binance
cryptocurrency exchange, BNB has evolved to power the Binance Smart Chain (BSC).
It is used to pay transaction fees, participate in token sales, and earn rewards, making it
a key component of the Binance ecosystem.
▪ Solana (SOL): Solana is a high-performance blockchain known for its fast transaction
speeds and low fees. Launched in 2020, it utilizes a unique proof-of-history (PoH)
consensus mechanism that enables it to process thousands of transactions per second,
positioning it as a competitor to Ethereum for dApp and DeFi development.
▪ USD Coin (USDC): USDC is a fully-backed stablecoin pegged to the U.S. dollar,
developed by Circle and Coinbase. It provides stability, transparency, and security,
being frequently used in trading, lending, and DeFi applications. Each USDC token is
backed by a reserve of fiat currency, ensuring its value remains constant.
▪ XRP (Ripple): Ripple's XRP is designed to facilitate fast and low-cost international
money transfers. Launched in 2012, it focuses on enabling financial institutions to make
cross-border payments more efficiently. XRP's consensus protocol does not rely on
mining, making transactions faster and more energy-efficient.
▪ Dogecoin (DOGE): Originally created in 2013 as a joke based on the popular "Doge"
meme, Dogecoin has gained significant popularity due to its active community and
support from high-profile individuals. Despite its humorous origins, it has become
widely used for tipping and donations in online communities.
▪ Toncoin (TON): Toncoin is the native cryptocurrency of the TON (Telegram Open
Network), a blockchain platform initially developed by the messaging app Telegram. It
aims to provide fast, scalable transactions and supports decentralized applications,
positioning itself as a versatile tool for digital finance.
▪ TRON (TRX): Launched in 2017 by Justin Sun, TRON is a blockchain platform
designed to decentralize the internet. It focuses on providing a platform for content
creators to share their work without relying on intermediaries. TRON's high throughput
and low transaction fees have made it popular for dApps and smart contracts.

CHAPTER 2: METHODOLOGY

2.1) Purpose of the Study:

The primary objective of this project is to predict the future prices of the top 10
cryptocurrencies by market capitalization using advanced time series forecasting models such
as ARIMA and LSTM. The study aims to provide investors and researchers with accurate
predictions of price movements, thereby helping them make informed decisions. By comparing
the results of ARIMA (AutoRegressive Integrated Moving Average) and LSTM (Long Short-
Term Memory), the project seeks to identify which model provides better accuracy for
predicting highly volatile crypto markets. This study will also explore the factors influencing
the performance of these models, such as the time horizon of the data, model parameters, and
volatility patterns.

2.2) Research Design:

This project follows an exploratory and predictive research design. The exploratory aspect
involves identifying trends, seasonality, and the stochastic behavior of cryptocurrency prices,
using historical price data. The predictive part involves building time series forecasting models
(ARIMA and LSTM) to project future prices based on past data.

The study also aims to conduct a comparative analysis of ARIMA and LSTM models. ARIMA
is a statistical model that captures linear relationships in time series data, whereas LSTM is a
machine learning model designed to handle complex nonlinear dependencies. The project will
evaluate the strengths and weaknesses of each model to understand their applicability to
cryptocurrency price forecasting.

2.3) Sampling Design:

The sampling design for this project involves the selection of the top 10 cryptocurrencies by
market capitalization as of the most recent data (2023-2024). The specific cryptocurrencies
included are Bitcoin (BTC), Ethereum (ETH), Tether (USDT), BNB (BNB), Solana (SOL),
USDC (USDC), XRP(XRP), Dogecoin (DOGE), Toncoin (TON), and TRON (TRX).

The historical price data for each cryptocurrency will be gathered from the date it started
trading, to ensure sufficient data points for training the time series models.

2.4) Sampling Method:

The sampling method involves purposive and time-based sampling:

• Purposive Sampling: The top 10 cryptocurrencies are deliberately chosen based on
their market capitalization, ensuring that the sample represents the most prominent and
liquid assets in the crypto market.

• Time-based Sampling: Daily historical price data is collected from reliable data
sources such as Yahoo Finance, Investing.com, or other cryptocurrency exchanges.

2.5) Data Collection Methods:

The data for this project will be obtained from secondary sources:

• Historical price data for the top 10 cryptocurrencies will be gathered from platforms
such as Investing.com, Yahoo Finance, or cryptocurrency exchanges' APIs.

2.6) Statistical Analysis:

The statistical analysis for this project will include the following steps:

• Data Preprocessing: Cleaning and normalizing the data to ensure it's in the appropriate
format for ARIMA and LSTM models.

• ARIMA Model: After conducting stationarity tests (e.g., ADF test), the ARIMA model
will be fitted to the data. The optimal parameters (p, d, q) will be determined using
techniques such as grid search and Akaike Information Criterion (AIC).

• LSTM Model: The data will be transformed into sequences of past observations to
train the LSTM. The model will be tuned using hyperparameter optimization techniques
to identify the best architecture for forecasting.

• Model Comparison: The predicted results from both ARIMA and LSTM models will
be compared using evaluation metrics such as RMSE, MAE, and MAPE to determine
which model provides better performance.

• Visualization: The results will be visualized with line graphs to display the actual vs.
predicted values for both models over the time horizon of the study.
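
The preprocessing and sequence-building steps listed above can be sketched as follows. This is a minimal illustration and not the project's actual code: the function names, the lookback length, and the sample prices are all assumptions, and min-max scaling is only one common way to normalize prices.

```python
def min_max_scale(values):
    """Scale values to [0, 1]; also return (lo, hi) so predictions can be unscaled."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return [(v - lo) / (hi - lo) for v in values], (lo, hi)


def inverse_scale(scaled, bounds):
    """Map scaled values back to the original price units."""
    lo, hi = bounds
    return [s * (hi - lo) + lo for s in scaled]


def make_windows(series, lookback):
    """Build (inputs, targets): each input holds the `lookback` values before its target."""
    inputs, targets = [], []
    for i in range(lookback, len(series)):
        inputs.append(series[i - lookback:i])
        targets.append(series[i])
    return inputs, targets


# Hypothetical closing prices, for illustration only.
prices = [10.0, 12.0, 11.0, 15.0, 14.0, 20.0]
scaled, bounds = min_max_scale(prices)
X, y = make_windows(scaled, lookback=3)  # 3 windows of 3 values each
restored = inverse_scale(scaled, bounds)
```

Error metrics computed on the scaled series are in scaled units, so predictions would need to pass through something like `inverse_scale` before being compared with actual prices. In practice the scaling bounds are usually taken from the training split only, so that information from the test period does not leak into training.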

CHAPTER 3: ANALYSIS AND INTERPRETATION

3.1) CRYPTO INDEX ANALYSIS (BITWISE 10 CRYPTO INDEX)

About the Index:

The Bitwise 10 Crypto Index is a popular index that tracks the performance of the 10 largest
cryptocurrencies by market capitalization. It is maintained by Bitwise Asset Management, a
firm specializing in crypto asset management.

The primary goal of the Bitwise 10 Crypto Index is to give investors exposure to a diversified
selection of leading cryptocurrencies. Rather than investing in individual coins, the index
allows investors to gain broad exposure to the crypto market with a single investment. This
helps to spread risk across different crypto assets.

Constituents of the Index:

As of October 2, 2023, the constituents of the Bitwise 10 Crypto Index and their
respective weights are:

CONSTITUENTS          WEIGHTS (%)
Bitcoin (BTC)         66.69
Ethereum (ETH)        24.62
XRP (XRP)             3.36
Solana (SOL)          1.20
Cardano (ADA)         1.12
Polkadot (DOT)        0.65
Polygon (MATIC)       0.63
Bitcoin Cash (BCH)    0.60
Litecoin (LTC)        0.60
Chainlink (LINK)      0.52

Data Collection:

Historical price data for the Bitwise 10 Crypto Index Fund (BITW) was gathered from
Yahoo Finance for the period 2020-2024.

Inference:

Model 1: LSTM

Inference for the LSTM Model:

• Training Loss: The model’s training loss reduces significantly after just a few epochs,
reaching a minimum of 0.0021. This indicates that the model learned well during
training and converged effectively.

• Model Performance (RMSE): The Root Mean Squared Error (RMSE) on the test
data is 0.025. This figure appears to be on the normalized scale, since the price-scale
RMSE reported in the comparison table later in this section is 2.1683. For time series
predictions in volatile markets like crypto (BITW), this is considered a decent level of
error.

• Actual vs. Predicted Prices:

The graph of actual vs. predicted prices shows that while the model is able to capture
the general trend, the predicted values (in orange) appear smoother and less volatile
compared to the actual values (blue line). This is expected behavior in many LSTM
models as they tend to average out rapid fluctuations.

• Prediction of Future Prices: The forecasted prices for the next 40 days (starting from
Sept 27, 2024) show a consistent downward trend, with prices decreasing steadily
from 26.40 to around 17.11. The steady decline might be attributed to the patterns
recognized by the LSTM in historical data, which could suggest an expected bearish
market phase.

• Early Stopping Impact: Early stopping was implemented with a patience of 5 epochs,
which prevented the model from over-training and potentially overfitting. The model
stopped training after 9 epochs, which suggests that it reached an optimal point early
on.
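
The patience rule described in the last bullet can be sketched in a few lines. This is an illustrative re-implementation, not the training code used in the project: the loss values are hypothetical, and which loss was monitored is not stated in the report.

```python
def train_with_early_stopping(losses, patience):
    """Return (epochs_run, best_epoch) for a list of per-epoch monitored losses.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(losses), best_epoch


# Hypothetical losses: the best value occurs at epoch 4.
losses = [0.0100, 0.0060, 0.0035, 0.0021, 0.0024,
          0.0023, 0.0026, 0.0022, 0.0025, 0.0027]
epochs_run, best_epoch = train_with_early_stopping(losses, patience=5)
```

With a patience of 5, stopping after 9 epochs is what one would see if the monitored loss last improved at epoch 4, which is the situation simulated here.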

Summary:

• The LSTM model captures the overall trend in BITW prices, but it smooths out short-
term fluctuations, leading to less volatile predictions.

• RMSE of 0.025 suggests a reasonable level of error for this time series forecasting
problem.

• The future price predictions show a downward trend, which might be a continuation
of the patterns recognized in the training data.

Adjustments in the architecture or additional training data might improve the model’s ability
to predict sudden price movements more accurately.

Model 2: ARIMA Model

Inference for ARIMA Model (SARIMAX Results):

❖ ARIMA Model (5, 1, 0):

➢ The chosen ARIMA model is configured with parameters (5, 1, 0), which indicates:

▪ p (autoregressive term): The model uses 5 lagged observations.

▪ d (differencing): The data was differenced once to achieve stationarity.

▪ q (moving average term): No moving average component was included.

❖ Model Summary:

➢ Log Likelihood: The log likelihood of the model is -2499.066, which helps assess
model fit.

➢ AIC (Akaike Information Criterion): The AIC is 5010.132. This is used to compare
models, with lower AIC values generally indicating a better model fit.

➢ BIC (Bayesian Information Criterion): The BIC value is 5039.026, which similarly
helps evaluate model fit, but penalizes more complex models.

❖ Coefficients:

➢ The first autoregressive term (ar.L1) is statistically significant with a coefficient of
0.0466, meaning the most recent lag has a positive impact on the future values.

➢ The second lag (ar.L2) has a strong negative impact with a coefficient of -0.2058, which
implies that values two time periods back negatively affect the current forecast.

➢ The third autoregressive term (ar.L3) is also negative and statistically significant,
showing additional negative correlation from past data.

➢ The fourth and fifth lags (ar.L4, ar.L5) are not statistically significant, indicating that
they may not contribute much to the model's prediction ability.

❖ Residual Analysis:

➢ The residuals plot shows fluctuations around zero, indicating that the model has
captured much of the patterns in the data. However, some volatility in residuals might
suggest that not all patterns are perfectly captured.

➢ The density plot of residuals indicates a right-skewed distribution, which may
suggest that there are some outliers or extreme values in the data that are not well
explained by the model.

➢ Residual statistics: The residuals have a mean close to zero (0.038), which is a good
indicator that the model does not have large bias. However, the standard deviation
(3.968) suggests that there are still notable deviations from the actual values.

❖ Forecasting Results:

➢ The model forecasts a relatively flat price trajectory for the next 40 days, with values
stabilizing around 35.90.

➢ The forecast doesn't exhibit much variation, which might indicate that the ARIMA
model struggles with capturing the high volatility typical of cryptocurrency prices,
leading to over-smoothing of future predictions.

❖ Evaluation of Forecast:

➢ The Mean Squared Error (MSE) for the test set is 31.78, which suggests that there is
a considerable deviation between the predicted and actual test values. This relatively
high MSE indicates that the ARIMA model may not be the best for forecasting highly
volatile assets like BITW.

➢ The forecast results are less volatile compared to actual data, which could be due to the
model's structure (ARIMA tends to smooth out volatility) or insufficient modelling of
underlying factors affecting price volatility.

Key Observations:

• Strengths: The ARIMA model does a good job of capturing overall trends, but it is
limited in handling short-term fluctuations, especially in highly volatile markets like
crypto.

• Limitations: The model produces smooth forecasts, which may not adequately reflect
the real-world volatility of BITW. The residuals suggest that certain patterns remain
unexplained, and the right-skewness hints at possible outliers or abrupt changes that the
model fails to capture.

PERFORMANCE COMPARISON OF ARIMA AND LSTM MODELS FOR BITW FORECASTING:

MODEL   MAE      MSE       RMSE     MAPE
ARIMA   nan      31.7817   5.6375   nan
LSTM    1.8436   4.7018    2.1683   5.8833

❖ ARIMA Model Metrics:

The ARIMA model's metrics indicate that it struggles to predict the actual values accurately:

• The MSE (31.78) is relatively high, suggesting that there is a large deviation
between the predicted and actual prices.
• The RMSE (5.64) is also high, which highlights significant forecast errors.
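• Both MAE and MAPE are reported as nan. Division by zero can affect only
MAPE, since MAE involves no division, and a valid MSE was obtained for
the same forecast. The nan values therefore more likely stem from how these
two metrics were computed, for example missing values or misaligned
indices between the forecast and test series.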
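
The four metrics in the table above can be computed as in the sketch below. It is an illustration under stated assumptions, not the project's code: the function name and sample values are hypothetical, and returning NaN for MAPE when an actual value is zero is a design choice made here.

```python
import math


def forecast_errors(actual, predicted):
    """Return MAE, MSE, RMSE and MAPE (%) for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    # MAPE is undefined when any actual value is zero, so report NaN in that case.
    if any(a == 0 for a in actual):
        mape = float("nan")
    else:
        mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / len(errors)
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "MAPE": mape}


# Hypothetical values, for illustration only.
clean = forecast_errors([100.0, 110.0, 120.0], [98.0, 113.0, 120.0])
# A single missing (NaN) prediction propagates into every metric.
with_gap = forecast_errors([100.0, 110.0, 120.0], [98.0, float("nan"), 120.0])
```

Because RMSE is the square root of MSE, the two columns of the table can be cross-checked: the square root of 31.7817 is about 5.6375, and the square root of 4.7018 is about 2.1683.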

❖ LSTM Model Metrics:

The LSTM model performs much better than ARIMA:

• The MSE (4.70) is significantly lower than the ARIMA model’s, indicating that
the LSTM captures the pattern with much smaller errors.
• The RMSE (2.17) is also much lower, showing that the LSTM model's
predictions are more accurate on average.
• The MAPE (5.88%) suggests that the LSTM model's predictions deviate by an
average of 5.88% from the actual values, which is a reasonably good level of
accuracy.
• MAE (1.84) is quite low, meaning that on average, the LSTM model's
predictions are only off by about 1.84 units.

Conclusion:

The LSTM model is clearly the better model for forecasting BITW prices. It provides
significantly more accurate predictions with lower error metrics across the board. The ARIMA
model, while useful for linear, stationary time series, is not as effective in capturing the highly
volatile and nonlinear nature of cryptocurrency data like BITW.

ACCURACY SCORE COMPARISON:

MODEL ACCURACY
ARIMA 0.50
LSTM 0.85

❖ ARIMA Model Accuracy:

➢ The ARIMA model achieves an accuracy of 50%, meaning that the model correctly
predicts whether the return is positive or negative in half of the cases. This is similar to
a random guess and indicates that the ARIMA model struggles with predicting the
direction of the returns.

➢ The reason for ARIMA’s performance may be due to the complexity and volatility of
the cryptocurrency data, which is typically difficult for linear models like ARIMA to
handle effectively.
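
A directional accuracy score of this kind can be sketched as below. The exact definition used in the project is not given, so this is an assumption: the predicted move is measured from the previous actual price, and the sample prices are hypothetical.

```python
def directional_accuracy(actual, predicted):
    """Share of steps on which predicted and actual price changes have the same sign."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length series with at least two points")
    hits = 0
    for t in range(1, len(actual)):
        actual_up = actual[t] - actual[t - 1] > 0
        # Assumption: the predicted move is measured from the previous actual price.
        predicted_up = predicted[t] - actual[t - 1] > 0
        hits += actual_up == predicted_up
    return hits / (len(actual) - 1)


# Hypothetical prices, for illustration only.
actual = [30.0, 31.0, 29.5, 30.5, 32.0]
predicted = [30.0, 30.6, 30.2, 29.0, 31.0]
accuracy = directional_accuracy(actual, predicted)  # 3 of 4 directions match
```

A score near 0.5 on a roughly balanced series means the direction calls are no better than a coin flip, which is the reading given to the ARIMA result above.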

❖ LSTM Model Accuracy:

➢ The LSTM model achieves a much higher accuracy of 85%, suggesting that it is better
at capturing the nonlinear patterns in the time series data.

➢ The deep learning architecture of LSTM, which is specifically designed to handle
sequential data, appears to have learned more complex relationships in the BITW price
data, making it more reliable for predictions.

Conclusion:

• LSTM Model is the better model in this case, achieving a significantly higher accuracy
of 85% compared to the 50% accuracy of the ARIMA model.

• The ARIMA model’s performance indicates that it is not suitable for this particular
dataset, likely due to its inability to capture the complex, nonlinear trends in
cryptocurrency data.

• LSTM’s architecture allows it to perform much better for this multivariate and time-
dependent data, making it the preferred model for predicting BITW prices.

Given the accuracy comparison, LSTM is the recommended model for future forecasting tasks
involving BITW or similar volatile assets.

3.2) BITCOIN

Stationarity test:

ADF TEST:

H0: The time series possesses a unit root and is non-stationary.

H1: The time series does not have a unit root (the series is stationary).

KPSS TEST:

H0: The time series is stationary around a constant level (level-stationary).

H1: The time series is non-stationary (has a unit root).

Test   P-value    Significance level   Test statistic   CV-10%   CV-5%     CV-2.5%   CV-1%
ADF    0.874837   0.05                 -0.58273         -        -2.8621   -         -3.43162
KPSS   0.01       0.05                 7.729495         0.347    0.463     0.574     0.739

The series is likely to be non-stationary.

Differencing:

Test   P-value    Significance level   Test statistic   CV-10%   CV-5%     CV-2.5%   CV-1%
ADF    8.29E-19   0.05                 -10.548617       -        -2.8621   -         -3.43162
KPSS   1.00E-01   0.05                 0.196639         0.347    0.463     0.574     0.739

The series is likely to be stationary.
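
The way the two tests are read together can be sketched as a small decision rule. This is an illustration only: the function names are hypothetical, and the p-values are simply the ones reported in the two tables above rather than being recomputed from data.

```python
def difference(series):
    """First difference: y[t] - y[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]


def stationarity_verdict(adf_p, kpss_p, alpha=0.05):
    """Combine ADF (H0: unit root) and KPSS (H0: stationary) p-values."""
    adf_rejects_unit_root = adf_p < alpha
    kpss_rejects_stationarity = kpss_p < alpha
    if adf_rejects_unit_root and not kpss_rejects_stationarity:
        return "stationary"
    if not adf_rejects_unit_root and kpss_rejects_stationarity:
        return "non-stationary"
    return "inconclusive"


# p-values as reported for the raw series and for the differenced series.
before = stationarity_verdict(adf_p=0.874837, kpss_p=0.01)
after = stationarity_verdict(adf_p=8.29e-19, kpss_p=0.10)
```

The two tests have opposite null hypotheses, so agreement between them (ADF fails to reject while KPSS rejects, or the reverse) gives a stronger conclusion than either test alone. One round of differencing flips both results here, which supports the choice of d = 1 in the ARIMA model.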

Model 1: LSTM Model

Model Performance (Epochs and Loss):

❖ Loss Values Across Epochs:

The loss values indicate the model's prediction error during training, with lower values
reflecting better performance. Starting at 0.0027 in Epoch 1, the loss significantly decreases to
9.0149e-04 (0.00090149) in Epoch 2, suggesting the model is beginning to learn the underlying
data patterns. This trend continues across subsequent epochs, with losses of 6.8044e-04 (Epoch
3), 6.4960e-04 (Epoch 4), 5.4779e-04 (Epoch 5), 5.2417e-04 (Epoch 6), and 4.4458e-04
(Epoch 7). The consistent decline in loss values indicates effective learning and convergence
of the optimization algorithm.

❖ Model Evaluation - RMSE:

The model evaluation yielded a Root Mean Squared Error (RMSE) of 0.0356 after training.
RMSE quantifies the typical magnitude of prediction errors, with values nearer to zero
indicating smaller errors. This figure appears to be measured on the normalized price scale,
since the price-scale RMSE reported in the model comparison table is 4106.17. It is
therefore best read as evidence that the model fits the scaled data closely, rather than as
an error in price units.

Graphical Insights (Actual vs Predicted Prices):

Interpretation:

Overall Trend: The model generally captures the overall trend of the actual prices, with both
lines showing a similar upward and downward movement.

Underestimation: In several regions, the predicted prices consistently underestimate the actual
prices. This suggests that the model is not fully capturing the upward momentum of the series.

Lag: There is a noticeable lag between the actual and predicted prices, particularly during
periods of rapid price changes. This indicates that the model is struggling to keep up with the
fast-paced movements of the series.

Interpretation:

• Downward Trend: The forecasted prices exhibit a consistent downward trend over the
40-day period.

• Linearity: The trend appears to be linear, suggesting a constant rate of price decline.

• Magnitude of Decrease: The predicted price decrease is significant, with a drop from
approximately 60,000 to 42,500.

The forecast suggests a potential downward trend in prices over the next 40 days. However, it's
important to note that cryptocurrency markets are highly volatile, and the actual price
movements may deviate significantly from the forecast. The model used to generate this
forecast may not capture all the relevant factors that could influence the price, and the
prediction should be interpreted with caution.

Model observation:

The LSTM model demonstrates effective learning, with loss values consistently decreasing
from 0.0027 (Epoch 1) to 4.4458e-04 (Epoch 7), indicating improved data fitting. The model
converged efficiently and achieved a low RMSE of 0.0356, reflecting small prediction errors
despite cryptocurrency volatility. It successfully captures overall price trends but tends to
underestimate sharp upward movements and exhibits lag during rapid price changes. The 40-
day forecast predicts a linear downward trend from 60,000 to 42,500, though caution is advised
as the model may not fully account for the volatility inherent in cryptocurrency markets.

Conclusion:

The LSTM model demonstrates effective learning with a consistent reduction in loss and a
relatively low RMSE, indicating that it has achieved good accuracy in predicting
cryptocurrency prices. However, it underestimates the upward momentum in certain regions
and lags during rapid price changes. The forecasted linear downward trend should be
interpreted cautiously due to the high volatility inherent in cryptocurrency markets. While the
model captures general trends, it may not account for all factors influencing price movements,
and further tuning or additional data could improve its performance in capturing rapid
fluctuations.

Model 2 : ARIMA Model

Interpretation of SARIMAX (ARIMA(5, 1, 0)) Model Results

❖ Model Overview

The SARIMAX results indicate an ARIMA model specified as ARIMA(5, 1, 0), where:

• 5 is the order of the autoregressive terms (AR).

• 1 indicates the differencing order (the data has been differenced once to make it
stationary).

• 0 signifies no moving average terms (MA) are included.

The model is based on 5146 observations (data points) collected from July 18, 2010, to
August 18, 2024. The Log Likelihood value of -41275.345 represents the likelihood of the
observed data given the model parameters.

❖ Coefficients of AR Terms

The table includes coefficients for the autoregressive (AR) terms:

• ar.L1: -0.0556 (p < 0.001): The first lag of the dependent variable has a negative
influence on the current value.

• ar.L2: 0.0149 (p = 0.019): The second lag has a positive influence, though weaker than
L1.

• ar.L3: 0.0212 (p = 0.001): A positive influence, showing a significant contribution to
the current value.

• ar.L4: 0.0353 (p < 0.001): This term shows the largest positive influence on the current
price.

• ar.L5: 0.0153 (p = 0.010): Similar to the second lag, it also has a positive influence but
is weaker.

Overall, the autoregressive coefficients indicate that the past values of the price have significant
effects on the current price, with the most substantial influence from the first lag.
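
To make the role of these coefficients concrete, the sketch below applies them in the forecasting recursion of an ARIMA(5, 1, 0) model: each forecast change is a weighted sum of the five most recent changes, and the level is rebuilt by adding the changes back. It assumes no constant term, and the recent prices are hypothetical; only the five coefficients come from the results above.

```python
def arima_510_forecast(history, ar_coefs, steps):
    """Forecast price levels from an ARIMA(5,1,0)-style recursion with no constant."""
    if len(history) < len(ar_coefs) + 1:
        raise ValueError("history is too short for the number of AR terms")
    diffs = [b - a for a, b in zip(history, history[1:])]
    level = history[-1]
    forecasts = []
    for _ in range(steps):
        recent = diffs[-len(ar_coefs):][::-1]  # most recent change first
        next_diff = sum(c * d for c, d in zip(ar_coefs, recent))
        diffs.append(next_diff)
        level += next_diff
        forecasts.append(level)
    return forecasts


# Coefficients ar.L1 to ar.L5 as reported; the prices are hypothetical.
coefs = [-0.0556, 0.0149, 0.0212, 0.0353, 0.0153]
recent_prices = [60000.0, 60500.0, 59800.0, 61000.0, 60200.0, 60900.0, 59500.0]
forecasts = arima_510_forecast(recent_prices, coefs, steps=40)
```

Because the coefficients are small in magnitude, the forecast changes shrink quickly toward zero and the forecast level settles at a near-constant value. This is one reason multi-step ARIMA forecasts of this form tend to look flat after the first few days.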

❖ Error Term (Sigma²)

• Sigma² (Variance of the error term): 5.447e+05 is the variance of the residuals
(error terms) from the model. A larger value signifies higher variability in the model's
errors. The standard error associated with this variance is 3255.078, and the estimate is
statistically significant (p < 0.001), suggesting that the residuals have a significant
degree of variability.

❖ Statistical Tests

• Ljung-Box (L1) (Q): 0.00 with a Prob(Q) of 0.99 indicates that there is no evidence of
autocorrelation in the residuals. A high p-value suggests that the model adequately
captures the autocorrelation in the data.

• Jarque-Bera (JB): 88472.08 with Prob(JB) < 0.001 indicates that the residuals do not
follow a normal distribution. This high value suggests a departure from normality.

• Heteroskedasticity (H): 2598.23 with Prob(H) < 0.001 shows that the residuals exhibit
heteroskedasticity, meaning the variance of errors is not constant across observations.

❖ Model Fit Criteria

• AIC (Akaike Information Criterion): 82562.689 and BIC (Bayesian Information
Criterion): 82601.964 are both used to evaluate the model fit. Lower values of AIC
and BIC generally indicate a better-fitting model. However, they should be compared
to those of other models to determine which model is preferred.

• HQIC (Hannan-Quinn Information Criterion): 82576.434 serves a similar purpose
as AIC and BIC.
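
The three criteria can be reproduced from the reported log likelihood using their standard formulas, as sketched below. Two inputs are assumptions rather than values stated in the report: 6 estimated parameters (five AR coefficients plus the error variance) and 5145 effective observations (5146 minus one lost to differencing).

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return (AIC, BIC, HQIC) from a fitted model's log-likelihood."""
    deviance = -2.0 * log_likelihood
    aic = deviance + 2 * n_params
    bic = deviance + n_params * math.log(n_obs)
    hqic = deviance + 2 * n_params * math.log(math.log(n_obs))
    return aic, bic, hqic


# Log likelihood as reported; n_params and n_obs are assumptions (see above).
aic, bic, hqic = information_criteria(-41275.345, n_params=6, n_obs=5145)
```

Under these assumptions the results agree with the reported 82562.689, 82601.964 and 82576.434 to within rounding. All three share the same fit term and differ only in how heavily they penalize the number of parameters.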

❖ Skewness and Kurtosis

• Skewness: 0.09 indicates that the distribution of residuals is nearly symmetrical,
with only a very slight right skew.

• Kurtosis: 23.31 suggests that the residuals have heavy tails compared to a normal
distribution (kurtosis of 3), indicating potential outliers or extreme values in the
residuals.
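
The Jarque-Bera statistic reported earlier is built from exactly these two quantities, which the sketch below makes explicit. The sample size of 5146 is an assumption (the library may use a slightly different effective count), and the skewness and kurtosis are the rounded values from the summary, so the result only approximates the reported figure.

```python
def jarque_bera(n_obs, skewness, kurtosis):
    """Jarque-Bera statistic; `kurtosis` is the raw value (3 for a normal distribution)."""
    excess = kurtosis - 3.0
    return n_obs / 6.0 * (skewness ** 2 + excess ** 2 / 4.0)


# Rounded inputs from the model summary; n_obs is an assumption.
jb = jarque_bera(n_obs=5146, skewness=0.09, kurtosis=23.31)
```

The result is close to the reported 88472.08, and almost all of it comes from the kurtosis term. The rejection of normality is therefore driven by the heavy tails, not by asymmetry. For comparison, the 5% critical value of the chi-square distribution with 2 degrees of freedom is about 5.99.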

❖ Residual Diagnostics:

The graph shows the residuals of an ARIMA model:

• No Clear Pattern: The residuals appear to be centered around zero and do not exhibit
any clear patterns or trends. This suggests that the ARIMA model is capturing the
underlying trend and seasonality in the data.
• Randomness: The residuals seem to be randomly distributed, indicating that the
model is not missing any significant components.
• Outliers: There are a few outliers, which are points that deviate significantly from the
rest of the data. These outliers could be due to unusual events or noise in the data.

• Distribution Shape: The density plot shows a clear peak around zero, indicating that
the majority of residuals are close to zero. This is a good sign as it suggests that the
model is accurately predicting the values in most cases.
• Skewness: The distribution is slightly skewed to the right, with a longer tail on the
right side. This indicates that there are some larger positive residuals, meaning the
model underestimates the values in some cases.
• Kurtosis: The distribution appears to be leptokurtic, meaning it has heavier tails than
a normal distribution. This suggests that there are some outliers or extreme values in
the residuals.

❖ Forecasted Price:

• Initial Spike: The forecasted prices start with a sharp increase, reaching a peak
around October 1, 2024.
• Sharp Decline: After the initial spike, the prices experience a sharp decline, reaching
a low point around October 10, 2024.
• Stabilization: Following the decline, the prices stabilize at a relatively constant level.

Model Interpretation Based on Accuracy Metrics

Models Performance Comparison:

Model    MAE         MSE         RMSE        MAPE (%)
ARIMA    2772.281    12131087    3482.971    4.508763
LSTM     3508.575    16860644    4106.171    5.720896

❖ ARIMA Model

• MAE (Mean Absolute Error): 2772.28 indicates that, on average, ARIMA's predictions deviate from the actual values by 2772.28 units.

• MSE (Mean Squared Error): 12,131,087.25 reflects the heavier penalty placed on larger errors; it is still well below the LSTM model's MSE (16,860,644.36), suggesting ARIMA handles outliers reasonably well.

• RMSE (Root Mean Squared Error): 3482.97 expresses the same error in price units, and it remains lower than the LSTM model's RMSE (4106.17).

• MAPE (Mean Absolute Percentage Error): 4.51% suggests that ARIMA's predictions, on average, are off by 4.51%, making it relatively accurate in terms of percentage error.
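The four metrics above follow from their standard definitions. The sketch below is a minimal plain-Python version; the function name and the two-point sample series are hypothetical and serve only to make the arithmetic visible.

```python
import math


def forecast_errors(actual, predicted):
    """Compute MAE, MSE, RMSE and MAPE (in percent) for paired series."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)  # same units as the price
    # MAPE divides each error by the actual value, so actual must be non-zero
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape}


# Hypothetical actual and predicted prices
metrics = forecast_errors([100.0, 200.0], [110.0, 190.0])
print(metrics)
```

Because RMSE is the square root of MSE, the two always rank models in the same order; MAE and MAPE can rank them differently when a few errors are much larger than the rest.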

Conclusion for ARIMA: The ARIMA model has decent accuracy, particularly when looking
at the MAPE. It strikes a balance between absolute error and percentage error, making it a
reasonably effective model for this task.

❖ LSTM Model

• MAE: 3508.58, higher than ARIMA, suggests that LSTM’s average error is larger,
indicating its predictions deviate more from actual values.

• MSE: 16,860,644.36, larger than ARIMA, indicates that LSTM struggles more with
larger errors and may be more affected by outliers.

• RMSE: 4106.17, also higher than ARIMA, implies that LSTM has larger prediction
errors overall.

• MAPE: 5.72%, slightly higher than ARIMA, suggests that LSTM's percentage error is
higher, making it less accurate in comparison.

Conclusion for LSTM: The LSTM model performs worse than ARIMA based on all metrics,
with larger absolute and percentage errors. While LSTM is often strong in time-series
predictions, in this case, it is outperformed by ARIMA.

Overall Conclusion:

• ARIMA is the most accurate model across all metrics, with the lowest MAE, MSE,
RMSE, and MAPE, making it the best performer for this prediction task.

• LSTM performs worse than ARIMA, with higher errors in both absolute and
percentage terms. While LSTM is often a powerful model for time-series data, it is not
the best choice here.

3.3) ETHEREUM

Stationarity test:

ADF TEST:

H0: The time series possesses a unit root and is non-stationary.

H1: The time series does not have a unit root (the series is stationary).

KPSS TEST:

H0: The time series is stationary around a deterministic trend (trend-stationary).

H1: The time series is non-stationary (has a unit root).

Test   P-value    Significance level   Test statistic   CV-10%   CV-5%      CV-2.5%   CV-1%
ADF    0.471043   0.05                 -1.623217        -        -2.86247   -         -3.43246
KPSS   0.01       0.05                 6.057117         0.347    0.463      0.574     0.739

The series is likely to be non-stationary.

Differencing:

Test   P-value    Significance level   Test statistic   CV-10%   CV-5%      CV-2.5%   CV-1%
ADF    2.19E-16   0.05                 -9.57827         -        -2.86247   -         -3.43246
KPSS   1.00E-01   0.05                 0.044971         0.347    0.463      0.574     0.739

The series is likely to be stationary.
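Because the two tests have opposite null hypotheses, their p-values have to be read together. The following sketch shows one way to combine them into a single verdict; the function name and the "inconclusive" category are assumptions made for illustration, and the p-values are the ones reported in the two tables above.

```python
def stationarity_verdict(adf_p, kpss_p, alpha=0.05):
    """Combine ADF and KPSS p-values into one verdict."""
    adf_rejects_unit_root = adf_p < alpha        # ADF H0: unit root present
    kpss_rejects_stationarity = kpss_p < alpha   # KPSS H0: series is stationary
    if adf_rejects_unit_root and not kpss_rejects_stationarity:
        return "stationary"
    if not adf_rejects_unit_root and kpss_rejects_stationarity:
        return "non-stationary"
    return "inconclusive"  # the two tests disagree


# P-values reported for the Ethereum series
print(stationarity_verdict(0.471043, 0.01))   # original prices
print(stationarity_verdict(2.19e-16, 0.10))   # after first differencing
```

Both tests agree in each case here, which is what justifies differencing the series once before fitting the ARIMA model.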

Model 1: LSTM Model

Model Performance (Epochs and Loss):

❖ Loss Values Across Epochs:

As training progressed, the loss steadily decreased, reaching 0.0012 by Epoch 7, demonstrating
effective learning and improvement. After Epoch 3, the reduction in loss became more gradual,
suggesting the model was nearing its optimal performance. By Epoch 7, the loss had stabilized,
indicating that further training would likely yield only minor improvements, and the model
might soon converge. This pattern reflects a successful learning process, with the model
approaching a point of diminishing returns in performance enhancements.

❖ Model Evaluation - RMSE:

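The RMSE value of 0.0371 on the test data appears to be expressed in the scaled units on which
the model was trained, not in price units or percent; the comparison table later in this section
reports the same model's error in price terms (RMSE 161.20, MAPE 5.33%). A typical error of about
0.037 on the scaled series is small, which, in the context of Ethereum's highly volatile prices,
suggests the model has relatively good predictive accuracy.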
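To relate a scaled RMSE to prices, it can be multiplied back by the range used for scaling. The sketch below assumes min-max scaling to the interval [0, 1], which the report does not state explicitly; the minimum and maximum prices are hypothetical placeholders, not values from the data set.

```python
def rmse_in_price_units(rmse_scaled, price_min, price_max):
    """Convert an RMSE measured on min-max scaled data back to price units.

    Under x_scaled = (x - price_min) / (price_max - price_min), every error
    is divided by the same range, so the RMSE scales by that range as well.
    """
    return rmse_scaled * (price_max - price_min)


# Hypothetical training-set price range (an assumption for illustration)
price_min, price_max = 100.0, 1100.0
print(rmse_in_price_units(0.0371, price_min, price_max))
```

The result depends entirely on the range used when the scaler was fitted, so the scaled RMSE cannot be compared across coins with different price ranges.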

Graphical Insights (Actual vs Predicted Prices):

Interpretation:

Model Performance: The model appears to capture the general trend of the actual prices, with
the predicted line following the overall shape of the actual line. However, there are noticeable
deviations, particularly during periods of high volatility.

Underestimation: The model consistently underestimates the actual prices, particularly during
periods of rapid price increases. This suggests that the model might be struggling to capture
the full extent of price fluctuations.

Lag: There seems to be a slight lag between the actual and predicted prices, indicating that the
model might not be responding quickly enough to sudden price changes.

Interpretation:

• Downward Trend: The forecasted prices exhibit a consistent downward trend over the
40-day period.

• Linearity: The trend appears to be linear, suggesting a constant rate of price decline.

• Magnitude of Decrease: The predicted price decrease is relatively significant, with a drop from approximately 2400 to 1600.

The LSTM model predicts a steady decline in Ethereum prices over the next 40 days. However,
it's important to note that cryptocurrency markets are highly volatile, and the actual price
movements may deviate significantly from the forecast. The model's prediction is based on
historical data and may not fully capture the impact of future events or market sentiment.

Model observation:

The LSTM model shows effective learning with a steady decrease in loss across epochs,
stabilizing by Epoch 7, and achieving an RMSE of 0.0371, indicating decent predictive
accuracy. However, the model underestimates price increases and exhibits lag when responding
to sudden market shifts, likely due to the volatile nature of cryptocurrencies. The forecasted
linear decline in Ethereum prices over the next 40 days suggests the model may oversimplify
trends and fail to capture the complexity and rapid fluctuations typical of cryptocurrency
markets.

Conclusion:

The LSTM model effectively captures the general price trend but struggles with rapid market
fluctuations, especially during periods of volatility. The linear downward trend prediction may
suggest an oversimplified understanding of price movements. Given cryptocurrency’s
unpredictable nature, the model might benefit from incorporating more sophisticated features.
Enhancing the model by adding features like trading volume or external financial data, and
refining its architecture—such as modifying the number of LSTM layers or introducing
dropout layers—could improve its ability to capture complex price patterns and manage market
volatility more effectively.

Model 2: ARIMA Model

Interpretation of SARIMAX (ARIMA(5, 1, 0)) Model Results:

❖ Model: ARIMA(5, 1, 0) (Autoregressive Integrated Moving Average model with order (5, 1, 0)). This means:
• 5 AR (autoregressive) terms: The model uses 5 past values of the differenced series to predict the
current value.
• 1 order of differencing (I): Differencing is applied once to make the series stationary.
• 0 MA (moving average) terms: No past errors are used in the model.

❖ Coefficients of AR Terms (because the series is differenced once, each lag refers to a past
price change rather than a past price level):
• AR.L1 (Lag 1): -0.0700 (p-value < 0.05). The most recent price change has a negative effect on
the current change. The negative sign suggests that after a rise in the previous step, the
price has a tendency to pull back slightly in the next step.
• AR.L2 (Lag 2): 0.0175 (p-value = 0.041). The coefficient for lag 2 is small but
positive, suggesting a weak positive relationship between the change at lag 2 and the
current change.
• AR.L3 (Lag 3): 0.0404 (p-value < 0.05). This positive coefficient indicates a
significant positive impact of lag 3 on the current change.
• AR.L4 (Lag 4): 0.0298 (p-value < 0.05). This positive coefficient suggests the change
from 4 steps back contributes positively.
• AR.L5 (Lag 5): -0.0508 (p-value < 0.05). This indicates that the change 5 steps back
has a significant negative impact.
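To make the role of these coefficients concrete, the sketch below produces a one-step-ahead point forecast for an ARIMA(p, 1, 0) model: the AR weights are applied to the most recent price changes and the result is added to the last price. It assumes no constant term, and the short price history is hypothetical; only the five coefficients come from the results above.

```python
def one_step_forecast(prices, ar_coefs):
    """Point forecast of the next price for an ARIMA(p, 1, 0) model, no constant."""
    p = len(ar_coefs)
    if len(prices) < p + 1:
        raise ValueError("need at least p + 1 prices to form p differences")
    # First differences: change from each step to the next
    diffs = [b - a for a, b in zip(prices, prices[1:])]
    recent = diffs[-p:][::-1]  # most recent change first (lag 1, lag 2, ...)
    predicted_change = sum(phi * d for phi, d in zip(ar_coefs, recent))
    return prices[-1] + predicted_change


# AR coefficients reported for Ethereum; the prices are hypothetical
eth_ar = [-0.0700, 0.0175, 0.0404, 0.0298, -0.0508]
history = [100.0, 101.0, 103.0, 102.0, 105.0, 107.0]
print(one_step_forecast(history, eth_ar))
```

Since all five coefficients are small in magnitude, the predicted change is only a small fraction of recent moves, so the forecast stays close to the last observed price.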

❖ Error Term (Sigma²):


• Sigma² (Error variance): 4943.3434. This is the variance of the residuals (errors),
indicating the amount of unexplained variation in the model.

❖ Statistical Tests:
• Ljung-Box (L1) (Q) Test: This test checks for autocorrelation in the residuals. A p-value
of 0.83 indicates no significant autocorrelation at lag 1, meaning the model has captured
the short-term dependence in the series.
• Jarque-Bera (JB) Test: This tests for normality in residuals. A JB value of 48793.94
with a p-value of 0.00 indicates non-normality in residuals, which may suggest the
presence of outliers or skewness in the data.
• Heteroskedasticity Test (H): The test shows H = 12.50 and a p-value of 0.00, meaning
that heteroskedasticity (non-constant variance) is present. This suggests that the
variance of the residuals is not constant over time.
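The Jarque-Bera statistic combines the skewness and kurtosis reported further below into a single number. The sketch uses the standard formula; the sample size is not stated in the report, so n = 3083 is a hypothetical value chosen for illustration.

```python
def jarque_bera(n, skewness, kurtosis):
    """Jarque-Bera statistic from sample size, skewness and Pearson kurtosis."""
    excess = kurtosis - 3.0  # a normal distribution has kurtosis 3
    return n / 6.0 * (skewness ** 2 + excess ** 2 / 4.0)


# Skewness and kurtosis as reported for the Ethereum residuals;
# n is a hypothetical sample size (an assumption, not taken from the report)
jb = jarque_bera(n=3083, skewness=-0.63, kurtosis=22.45)
print(round(jb, 2))
```

Almost all of the statistic comes from the kurtosis term, so the rejection of normality here is driven by heavy tails far more than by the mild skew.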

❖ Model Fit Criteria:


• AIC (Akaike Information Criterion): 34979.579. A lower AIC indicates a better fit
of the model, but it is used comparatively with other models.
• BIC (Bayesian Information Criterion): 35015.781. Similar to AIC, but BIC penalizes
complex models more, so lower values are preferable.

❖ Skewness and Kurtosis:


• Skewness: -0.63, indicating that the residuals are slightly skewed to the left (negatively
skewed).
• Kurtosis: 22.45, indicating heavy-tailed data, suggesting that the residuals have
extreme values or outliers.

❖ Residual Diagnostics:

• No Clear Pattern: The residuals appear to be centered around zero and do not exhibit
any clear patterns or trends. This suggests that the ARIMA model is capturing the
underlying trend and seasonality in the data.
• Randomness: The residuals seem to be randomly distributed, indicating that the model
is not missing any significant components.

• Outliers: There are a few outliers, which are points that deviate significantly from the
rest of the data. These outliers could be due to unusual events or noise in the data.

• Distribution Shape: The density plot shows a clear peak around zero, indicating that the
majority of residuals are close to zero. This is a good sign as it suggests that the model is
accurately predicting the values in most cases.
• Skewness: The distribution is slightly skewed to the left, with a longer tail on the left
side, consistent with the reported skewness of -0.63. This indicates that there are some
larger negative residuals, meaning the model overestimates the values in some cases.
• Kurtosis: The distribution appears to be leptokurtic, meaning it has heavier tails than a
normal distribution. This suggests that there are some outliers or extreme values in the
residuals.

The graph shows the forecasted prices for the next 40 days, starting from September 28, 2024.
The mean squared error (MSE) is a metric used to evaluate the accuracy of the forecast.

• Initial Spike: The forecasted prices start with a sharp increase, reaching a peak around
October 1, 2024.
• Stabilization: After the initial spike, the prices stabilize at a relatively constant level.
• MSE: The MSE of 34259.14 indicates that the model's predictions are relatively far
from the actual values. This suggests that the model may not be capturing the underlying
dynamics of the price series effectively.

MODELS PERFORMANCE COMPARISON

MODEL    MAE         MSE         RMSE        MAPE (%)
ARIMA    150.6316    34259.14    185.0922    6.258512
LSTM     135.3454    25984.5     161.1971    5.327235

Model Interpretation Based on Accuracy Metrics


❖ ARIMA Model
• MAE (Mean Absolute Error): 150.63 indicates that the ARIMA model's predictions
deviate from the actual values by an average of 150.63 units.
• MSE (Mean Squared Error): 34,259.14. Because the errors are squared, larger errors are
penalized more heavily than in the MAE.
• RMSE (Root Mean Squared Error): 185.09, the square root of the MSE, is noticeably higher
than the MAE (150.63), which suggests that a few large errors contribute disproportionately
to the overall result.
• MAPE (Mean Absolute Percentage Error): 6.26% means ARIMA's predictions are,
on average, 6.26% off from the actual values.

Conclusion for ARIMA:


The ARIMA model shows reasonable accuracy with a relatively low MAPE, indicating that
its performance is decent in terms of percentage error, though larger errors may have a
stronger impact.

❖ LSTM Model
• MAE: 135.35, lower than ARIMA, implies that LSTM's average prediction error is
smaller, indicating a closer fit to the actual data.
• MSE: 25,984.50, also lower than ARIMA, suggests that LSTM handles larger errors
better, reducing the effect of outliers.
• RMSE: 161.20, shows the predictive accuracy is better than ARIMA, and fewer
extreme errors are influencing the model.
• MAPE: 5.33%, lower than ARIMA's 6.26%, indicates that LSTM provides the more accurate
predictions in terms of percentage deviation from the actual values.

Conclusion for LSTM: The LSTM model has the best overall performance across all error
metrics, showing its strength in handling time series data. It has lower errors and is more
accurate in its predictions compared to ARIMA.

Overall Conclusion:
• The LSTM model is the most accurate and reliable across all the given metrics, with
the lowest errors and best overall performance.
• The ARIMA model performs reasonably well but is not as accurate as LSTM. It could
still be useful depending on the specific use case.

3.4) BNB

Stationarity Test:

ADF TEST:

H0: The time series possesses a unit root and is non-stationary.

H1: The time series does not have a unit root (the series is stationary).

KPSS TEST:

H0: The time series is stationary around a deterministic trend (trend-stationary).

H1: The time series is non-stationary (has a unit root).

Interpretation:

Test   P value    Sig    Test statistic   1%        5%       10%     Decision
ADF    0.729836   0.05   -1.062387        -3.4330   -2.862   -       Fail to reject H0; the series is non-stationary
KPSS   0.010000   0.05   5.497626         0.739     0.463    0.347   Reject H0; the series is non-stationary

Differencing

Test   P value        Sig    Test statistic   1%        5%        10%     Decision
ADF    1.908805e-16   0.05   -9.602071        -3.4330   -2.8627   -       Reject H0; the series is stationary
KPSS   1.000000e-01   0.05   0.068863         0.739     0.463     0.347   Fail to reject H0; the series is stationary

Model 1: LSTM

• Training Epochs and Loss


• Epochs 1-50 Analysis: The training process involved the model undergoing 50 epochs,
during which the loss consistently decreased from an initial value of 0.0089 to a final
value of 0.0015. This trend indicates that the model was effectively learning and
adapting to the data over time. The steady reduction in loss values suggests that the
model was progressively fitting the data better, showcasing its capacity to minimize
errors effectively.
• Improving Loss: The significant drop in loss values throughout the training epochs
reflects a successful learning process. This reduction demonstrates that the model is
capturing the underlying patterns and trends in the training data, which is critical for
achieving accurate predictions.
• Small Loss Values: The low loss values achieved by the end of the training signify
strong performance during the training phase. This effective minimization of error
implies that the model is performing well and can make reliable predictions based on
the training data.
• Potential Overfitting: However, the training results also raise concerns about potential
overfitting. As the loss plateaus in the later epochs, it suggests that the model may have
captured most of the trends present in the data. Continuing to train beyond 50 epochs
may not yield significant improvements in performance and could lead to overfitting,
where the model learns noise rather than the actual signal in the data.

• Model Evaluation – RMSE

An RMSE of 0.0122 indicates a small prediction error on the scaled series; in price units, the
comparison table at the end of this section reports an RMSE of about 36.90 and a MAPE of about
6.02% for the same model. This low value shows that the model has learned the main patterns in
the data and suggests a reasonably close fit between the predicted and actual values, although
it should be read alongside the price-unit metrics rather than as proof of reliability on its own.

• Accuracy and Trend Capturing

• Trend Alignment

The actual prices, represented by the blue line, demonstrate significant volatility, showcasing
sharp fluctuations over time. In contrast, the predicted prices, indicated by the orange line,
closely follow the trends of the actual prices, suggesting that the prediction model captures the
underlying market behavior effectively. However, there are instances, particularly during
periods of rapid price changes, where the predicted values lag behind the actual prices.

• Smooth Fit

The predicted prices appear smoother compared to the actual prices, indicating that the model
incorporates some level of smoothing, making it less sensitive to short-term fluctuations while
effectively capturing longer-term trends. This smoothness suggests that while the model
performs well for long-term forecasts, it may struggle with short-term accuracy during volatile
market conditions, highlighting potential areas for improvement in forecasting strategies.

❖ Predicted price

• General Trend: The forecast predicts a consistent decline in prices over the next 40
days. Starting from approximately 550 at the beginning of October, the price steadily
drops to around 250 by the end of October.
• Rate of Decline: The curve suggests that the rate of decline is gradual but continuous.
It does not show any major fluctuations or spikes, indicating a smooth downward
trajectory in the predicted values.
• Forecast Smoothness: The forecasted prices show a steady, predictable pattern with no
signs of volatility or unexpected changes. This smoothness reflects how the model
extrapolates the recent trend and should not be read as a measure of its confidence.
• Outlook: The forecast suggests a bearish outlook for BNB, with prices expected to
decrease significantly over the next 40 days. This steady decline could indicate market
conditions such as oversupply, weak demand, or other external factors leading to a price
drop. This forecast implies caution, as prices may continue to drop in the near future
unless external factors change.

Model 2: ARIMA

❖ AR Coefficients:

The autoregressive coefficients (ar.L1 to ar.L5) show how past values influence the current
value. Specifically:

• ar.L1 (-0.1167): The first lag has a significant negative impact on the current price, as
the coefficient is statistically significant (p-value < 0.05).
• ar.L2 (0.0860): This coefficient is positive and statistically significant (p-value < 0.05),
indicating that the second lag has a positive influence on the current price.
• ar.L3 (0.0551): Positive and statistically significant (p-value < 0.05), suggesting that
the third lag positively influences the current price.
• ar.L4 (0.0008): The fourth lag is not significant, as the coefficient is very close to zero
and the p-value is high (p = 0.934), indicating no meaningful influence on the current
price.
• ar.L5 (-0.0941): This coefficient is negative and statistically significant (p-value <
0.05), suggesting a significant negative influence from the fifth lag on the current price.

• sigma² (149.4350): This represents the estimated variance of the error term, that is, the
variation in price changes that the autoregressive terms leave unexplained.

❖ Model Fit Metrics:


• AIC (19,419.965): The Akaike Information Criterion, used to compare models, where
lower values indicate better model fit. This value suggests the quality of the model, but
it is more meaningful when compared with alternative models.
• BIC (19,454.846): The Bayesian Information Criterion penalizes for model
complexity. A lower BIC suggests a better model fit, but again, it should be compared
with other models.
• Log Likelihood (-9,703.982): The log-likelihood measures how well the model fits the
data. Higher values (closer to zero) indicate a better fit.
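The three fit metrics are linked by simple formulas, which can be checked against the reported values. In the sketch below, k = 6 (five AR coefficients plus the error variance) is an assumption about how the parameters are counted, and n = 2474 is a hypothetical number of observations, since the report does not state the sample size.

```python
import math


def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_likelihood


log_likelihood = -9703.982  # as reported for the BNB ARIMA model
k = 6                       # assumed: 5 AR terms + sigma^2
n = 2474                    # hypothetical sample size (not stated in the report)

print(round(aic(log_likelihood, k), 3))
print(round(bic(log_likelihood, k, n), 3))
```

Both criteria start from the same -2 ln(L) term; they differ only in the penalty, which is why BIC is always the larger of the two once n exceeds about 7 observations.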

❖ Skewness and Kurtosis

• Skew value: -0.36. The negative skewness indicates that the distribution of the residuals
is slightly left-skewed. This means that the left tail of the distribution is longer than the
right tail, so large negative residuals (cases where the model overpredicts) are somewhat
more extreme than large positive ones.
• Kurtosis value: 32.19 The kurtosis is significantly higher than 3 (which would indicate
a normal distribution), suggesting that the residuals have heavy tails. This high kurtosis
indicates that the residuals experience extreme values more often than would be
expected under a normal distribution.

❖ Residual diagnostics

• Ljung-Box (L1) (Q = 0.02, Prob(Q) = 0.88): The high p-value suggests that there is
no significant autocorrelation in the residuals. This indicates that the model has
adequately captured the patterns in the time series data.
• Jarque-Bera (JB = 87,884.71, Prob(JB) = 0.00): The Jarque-Bera test indicates that
the residuals are not normally distributed (p-value < 0.05). This suggests that there may
still be skewness or kurtosis in the data that the model has not captured.
• Heteroskedasticity (H = 152.59, Prob(H) = 0.00): The test indicates significant
heteroskedasticity (variance changes over time) in the residuals, suggesting the
potential need for a model that accounts for changing volatility (e.g., GARCH).
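The Ljung-Box figures above can be reproduced by hand. The sketch below computes the lag-1 statistic from a residual series and converts it to a p-value using the chi-square distribution with one degree of freedom; the function names and the alternating sample series are hypothetical.

```python
import math


def chi2_sf_1df(q):
    """P(chi-square with 1 degree of freedom > q)."""
    return math.erfc(math.sqrt(q / 2.0))


def ljung_box_lag1(residuals):
    """Return (Q, p-value) of the Ljung-Box test at lag 1."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denom = sum(c * c for c in centered)
    # Lag-1 sample autocorrelation
    r1 = sum(a * b for a, b in zip(centered[1:], centered[:-1])) / denom
    q = n * (n + 2) * r1 ** 2 / (n - 1)
    return q, chi2_sf_1df(q)


# Reported Q = 0.02 for BNB gives a p-value close to the reported 0.88
print(round(chi2_sf_1df(0.02), 2))

# Hypothetical strongly autocorrelated residuals are flagged by the test
q, p = ljung_box_lag1([1.0, -1.0] * 5)
print(q, p)
```

A small Q means the lag-1 autocorrelation of the residuals is close to zero, which is the basis for concluding that no short-term structure has been left in the errors.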

❖ Forecasted price

The forecasted prices in the plot are relatively stable, maintaining an average around 535.0
with minimal fluctuations over the forecast horizon. This stability suggests that the model
expects the price to stabilize and remain close to this level in the future. The flat trend
indicates a lack of significant volatility, reflecting a period of equilibrium in the market.

The lack of substantial change in the forecasted prices may imply that the model captures
the mean-reverting behavior of the time series rather than anticipating large price swings
or trends. This suggests that any deviations from the average price are expected to be
temporary, reinforcing the idea that prices will converge back to the forecasted level rather
than exhibiting significant upward or downward movements in the near term.
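The flat forecast follows from how a pure AR model on differenced data is iterated: each predicted change is a small fraction of earlier changes, so the changes shrink toward zero and the price level settles. The sketch below iterates the BNB coefficients reported above over 40 steps, assuming no constant term and setting future shocks to zero; the starting prices are hypothetical.

```python
def multi_step_forecast(prices, ar_coefs, steps):
    """Iterate an ARIMA(p, 1, 0) point forecast, feeding forecasts back in."""
    diffs = [b - a for a, b in zip(prices, prices[1:])]
    level = prices[-1]
    path = []
    for _ in range(steps):
        recent = diffs[-len(ar_coefs):][::-1]  # most recent change first
        change = sum(phi * d for phi, d in zip(ar_coefs, recent))
        diffs.append(change)   # the predicted change becomes the next lag
        level += change
        path.append(level)
    return path


# AR coefficients reported for BNB; the price history is hypothetical
bnb_ar = [-0.1167, 0.0860, 0.0551, 0.0008, -0.0941]
history = [520.0, 528.0, 524.0, 531.0, 535.0, 540.0]
path = multi_step_forecast(history, bnb_ar, steps=40)
print(path[0], path[-1])
```

Because the absolute values of the coefficients sum to well below 1, the predicted changes die out within a few steps, which matches the nearly constant forecast line described above.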

Model performance comparison

Metric   ARIMA               LSTM
MAE      nan                 33.532433685302735
MSE      58621.62946074004   1361.5518546470605
RMSE     242.1190398558941   36.89921211417746
MAPE     nan                 6.021393035301048

❖ ARIMA Model

• MSE (Mean Squared Error): 58,621.63, which is quite high, indicating that the
model's predictions are far from the actual values in squared-error terms.
• RMSE (Root Mean Squared Error): 242.12, also high, indicating the model has a
substantial error magnitude in predicting values.
• Inference: While the ARIMA model has provided results for MSE and RMSE, the missing
(NaN) values for MAE and MAPE affect the model's overall reliability and accuracy.

❖ LSTM Model

• MAE (Mean Absolute Error): 33.53, which is relatively low and indicates that the
average error in predictions is small.
• MSE (Mean Squared Error): 1,361.55, which is significantly lower compared to the
ARIMA model, indicating better accuracy in predicting values close to the actual ones.
• RMSE (Root Mean Squared Error): 36.90, further reflecting that the LSTM model
performs well with much smaller errors compared to ARIMA.
• MAPE (Mean Absolute Percentage Error): 6.02%, indicating that the model's
predictions are accurate within a small percentage error of the actual values.
• Inference: The LSTM model performs well across all metrics, showing that it captures
the underlying patterns in the data effectively. Its low error metrics make it the most
accurate and suitable model for this task.

Conclusion

• The LSTM model is the most effective based on its low error rates across all metrics,
showing high predictive accuracy.
• The ARIMA model shows high errors (MSE, RMSE), and the presence of NaN in key
metrics like MAE and MAPE indicates potential issues with data or model reliability.
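The report does not say why MAE and MAPE came out as NaN. One common cause, offered here only as a possibility, is a missing value in the actual or forecast series, since a single NaN propagates through a plain average. The sketch below shows that effect and a version that drops incomplete pairs first; all names and values are hypothetical.

```python
import math


def nan_safe_mae_mape(actual, predicted):
    """MAE and MAPE (percent) over pairs where both values are present."""
    pairs = [(a, p) for a, p in zip(actual, predicted)
             if not (math.isnan(a) or math.isnan(p))]
    n = len(pairs)
    mae = sum(abs(a - p) for a, p in pairs) / n
    # Skip zero actual values, which would make the percentage undefined
    pct = [abs((a - p) / a) for a, p in pairs if a != 0]
    mape = 100.0 * sum(pct) / len(pct)
    return mae, mape, n


# Hypothetical series with one missing actual value
actual = [500.0, 510.0, float("nan"), 530.0]
predicted = [505.0, 500.0, 520.0, 530.0]

naive_mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
safe_mae, safe_mape, used = nan_safe_mae_mape(actual, predicted)
print(naive_mae, safe_mae, safe_mape, used)
```

Reporting how many pairs were used alongside the metric makes it clear when values have been dropped, so the ARIMA and LSTM figures can be compared on the same observations.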

3.5) TETHER

Stationarity test:

ADF TEST:

H0: The time series possesses a unit root and is non-stationary.

H1: The time series does not have a unit root (the series is stationary).

KPSS TEST:

H0: The time series is stationary around a deterministic trend (trend-stationary).

H1: The time series is non-stationary (has a unit root).

Interpretation:

Test   P value        Sig    Test statistic   1%        5%        10%     Decision
ADF    1.094265e-18   0.05   -10.499348       -3.4328   -2.8626   -       Reject H0; the series is stationary
KPSS   1.000000e-01   0.05   0.163888         0.739     0.463     0.347   Fail to reject H0; the series is stationary

The data is stationary.

Model 1: LSTM
❖ Training Epochs and Loss

• Epochs 1-50: The model trained for 50 epochs, and the loss consistently decreased,
indicating that the model was learning and improving over time. The loss values start at
0.0239 and gradually decrease.
• Improving Loss: The loss reduction shows that the model is fitting the data better as
training progresses.
• Small Loss Values: The low loss values suggest that the model is performing well
during training, as it is minimizing the error effectively.
• Potential Overfitting: Since the loss plateaus at later epochs, it might indicate that the
model has captured most of the trends, and continuing beyond 50 epochs might not
improve results significantly.

❖ Model Evaluation – RMSE

An RMSE of 0.0122, measured on the scaled series, shows that the model predicts the target
variable with a small error and has learned the main patterns in the data. This low RMSE value
indicates good prediction precision and a close fit between the model and the data; the
price-unit metrics are given in the comparison table at the end of this section.

• Accuracy and Trend Capturing

• Trend Alignment: The predicted prices (orange dots) generally follow the overall trend
of the actual Tether prices (blue line), indicating that the model has successfully
captured the general movement of prices. However, there are clear deviations in certain
periods where the actual prices experience sharper fluctuations, showing that the model
struggles to fully capture the volatility in those instances.
• Smoother Predictions: The predicted prices are noticeably smoother compared to the
actual prices, which show more frequent and larger spikes and dips. This smoothing
effect suggests that the model is filtering out short-term market noise and focusing on
broader trends. While this can be beneficial for generating stable predictions, it also
means the model may be underestimating sudden, significant price movements in the
actual data.

❖ Predicted Prices

❖ General Trend:

There is an upward trend in the plot. The predicted prices start from 1.0020647048950195
on 2024-09-28 and increase to 1.0055464506149292 on 2024-11-06.

❖ Prediction Characteristics:
• The predictions do not show significant price volatility or large fluctuations over the
40-day forecast. This indicates that the model predicts stable price movement with
upward changes.

• Prices stay within a narrow range over the 40 days (from 1.0020647048950195 to
1.0055464506149292), suggesting a stable market outlook according to the LSTM model.

❖ Key Observations
• Good Fit for Smooth Trends: The model successfully captures a clear upward trend
in the predicted prices, making it suitable for scenarios where smooth, predictable
trends are observed. This characteristic indicates that the LSTM model is well-
calibrated to detect and project consistent growth patterns over time.
• Limited Volatility Capture: The predictions demonstrate minimal price volatility,
suggesting that the model may not be designed to account for sudden market
fluctuations or high-risk scenarios. This limitation means that while the model provides
reliable forecasts for stable market conditions, it may overlook more erratic behaviours
in price movements that could occur in volatile markets.

Model 2: ARIMA Model

❖ AR coefficients:
The autoregressive coefficients (ar.L1 to ar.L5) show how the past values influence the current
value. Specifically:

• ar.L1 (-0.3931): The first lag has a strong and significant negative impact on the
current price, with a p-value of 0.000, indicating it is highly statistically significant. A
1-unit increase in the first lag of price results in a 0.3931 decrease in the current price.
• ar.L2 (-0.1808): The second lag also has a significant negative influence on the current
price (p-value < 0.05), showing that past price movements negatively affect the current
price, but less so than the first lag. A 1-unit increase in the second lag of price results
in a 0.1808 decrease in the current price.
• ar.L3 (-0.1555): The third lag continues to have a significant negative effect on the
current price, with a coefficient of -0.1555 and a p-value of 0.000, meaning it has a
statistically significant role. A 1-unit increase in the third lag of price results in a 0.1555
decrease in the current price.
• ar.L4 (-0.1224): This lag has a smaller but still statistically significant negative impact
on the current price, with a p-value of 0.000. A 1-unit increase in the fourth lag of price
results in a 0.1224 decrease in the current price.
• ar.L5 (-0.0299): The fifth lag has the smallest negative effect, but it is still statistically
significant (p-value < 0.05), indicating a minor impact. A 1-unit increase in the fifth lag
of price results in a 0.0299 decrease in the current price.
• sigma2 (1.342e-05): This is the variance of the residuals (errors). It is very small
and statistically significant (p-value = 0.000); its size likely reflects the narrow range
in which Tether trades as much as the quality of the fit.

❖ Model Fit Statistics:

• Log Likelihood (11235.274): This value indicates the goodness of fit for the model.
Higher values suggest a better model fit to the data.
• AIC (-22458.549): The Akaike Information Criterion helps compare models, with
lower values indicating a better model fit. The value is negative because the
log-likelihood is large and positive; on its own it does not show that the model fits
well, only how it would rank against alternatives fitted to the same data.
• BIC (-22423.183): The Bayesian Information Criterion, similar to AIC, also favors
lower values while penalizing complexity more heavily. It should likewise be compared
with the BIC of alternative models.

❖ Skewness and Kurtosis

• Skewness: 0.56. Indicates that the distribution of residuals is right-skewed.
• Kurtosis: 39.59. A high kurtosis value indicates that the residuals have heavy tails,
suggesting the presence of outliers.

❖ Residual diagnostics

• Ljung-Box (L1) (Q: 3.19): The Ljung-Box test checks for autocorrelation in the
residuals. A Q value of 3.19 indicates no significant autocorrelation at lag 1 (since the
p-value is 0.07), suggesting that the residuals are white noise.
• Jarque-Bera (JB: 149759.29, Prob (JB): 0.00): The Jarque-Bera test indicates a
significant deviation from normality (p < 0.05), suggesting that the residuals may not
be normally distributed.
• Heteroskedasticity (Prob(H): 0.00): The test for heteroskedasticity indicates the
presence of non-constant variance in the residuals, which could affect model
predictions and inferences.
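The H statistic can be read as a ratio of residual variability late in the sample to variability early in the sample. The sketch below assumes H is the break-variance ratio that statsmodels reports in its SARIMAX summary (sum of squared residuals in the last third divided by that in the first third); the residual values are hypothetical.

```python
def break_variance_ratio(residuals):
    """Ratio of squared residuals in the last third to the first third."""
    h = round(len(residuals) / 3)
    first = sum(r * r for r in residuals[:h])
    last = sum(r * r for r in residuals[-h:])
    return last / first


# Hypothetical residuals whose spread triples by the end of the sample
residuals = [1.0, -1.0, 1.0, 0.0, 0.0, 0.0, 3.0, -3.0, 3.0]
print(break_variance_ratio(residuals))
```

A ratio near 1 is consistent with constant variance, while values far above 1 mean the errors grew over time, which is the situation a volatility model such as GARCH is meant to handle.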

❖ Forecasted Prices

• The forecasted prices for the next 40 days are mostly stable around 3.0 after the initial
period, with minimal fluctuations. This indicates that the model expects the price to
remain steady over the forecast horizon, with no significant trends or deviations
predicted.
• At the start of the forecast (around 2024-10-01), there is a sharp increase in the price,
rising quickly to just under 4.0. This is followed by a correction, where the price
declines over the next few days and settles around 3.0.

• Beyond the first week of October, the price stabilizes and shows little to no change until
November 1st, where it continues to hover around 3.0. This flattening indicates that the
model predicts very low volatility, suggesting that the price may have reached an
equilibrium.
• This type of behavior, where the price stabilizes after an initial fluctuation, may indicate
that the ARIMA model is capturing the underlying mean-reverting characteristics of the
time series. The model suggests that prices are likely to settle near an average value
after a short-term spike.
• The model does not predict any substantial upward or downward trends after the initial
adjustment. The price is expected to stabilize, with no signs of further large fluctuations
in the near future.

❖ Key takeaway
• Trend Detection: The model captures short-term mean-reverting behavior, showing an
initial spike followed by price stabilization around 3.0. No significant long-term trends
are detected.
• Volatility Capture: The model captures early volatility but predicts a stable price after
the first week of October, implying limited volatility beyond that point.
• Improvement: Incorporating models like LSTM or hybrid approaches could better
capture long-term trends. Using GARCH could improve the model’s ability to forecast
dynamic volatility. Addressing residual non-normality and heteroskedasticity would
enhance robustness.

Model performance comparison

| Metric | ARIMA | LSTM |
|---|---|---|
| MAE | nan | 0.00021842650413512744 |
| MSE | 9.35648204162056e-08 | 6.671413513154018e-08 |
| RMSE | 0.0003058836713788521 | 0.0002582907956771595 |
| MAPE | nan | 0.021835715043512845 |

❖ ARIMA Model
• MSE (Mean Squared Error): Very low at 9.35648204162056e-08, suggesting that the
squared differences between the model's predictions and the actual values are small.
• RMSE (Root Mean Squared Error): Low at 0.0003058836713788521, indicating
good performance in terms of error magnitude.
• Inference: While the ARIMA model shows low MSE and RMSE, the presence of nan
for MAE and MAPE indicates potential data quality issues that might hinder its overall
reliability.
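
One common way for MAE and MAPE to come out as nan is a single missing value in the actual or predicted series. The sketch below is a minimal illustration with hypothetical numbers (not the project data) and hypothetical helper names; it does not claim to reproduce how the metrics in the table were computed.

```python
import math


def error_metrics(actual, predicted):
    """Return MAE, MSE, RMSE and MAPE (in percent) for two equal-length lists."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mape = 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "MAPE": mape}


def drop_missing(actual, predicted):
    """Keep only the pairs where both values are real numbers."""
    pairs = [(a, p) for a, p in zip(actual, predicted)
             if not (math.isnan(a) or math.isnan(p))]
    return [a for a, _ in pairs], [p for _, p in pairs]


# Hypothetical values, not taken from the project data.
actual = [1.0001, 0.9998, float("nan"), 1.0003]
predicted = [1.0000, 1.0000, 1.0001, 1.0001]

raw = error_metrics(actual, predicted)                   # the NaN propagates
clean = error_metrics(*drop_missing(actual, predicted))  # NaN pair removed
print(raw["MAE"], clean["MAE"])
```

Aligning the two series on their dates and dropping incomplete pairs before scoring would make all four metrics comparable across models.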

❖ LSTM Model
• MAE: 0.00021842650413512744, which is low and indicates that the average
prediction error is small.
• MSE: 6.671413513154018e-08, also very low, suggesting that the model predicts
values that are very close to the actual values.
• RMSE: 0.0002582907956771595, indicating that the model performs well with small
errors in prediction.
• MAPE: 0.021835715043512845, or approximately 2.18%, showing that the model's
predictions are accurate within a small percentage of the actual values.
• Inference: The LSTM model performs well across all metrics, indicating it effectively
captures patterns in the data, making it suitable for this task.

Conclusion

• The LSTM model is the most effective based on its metrics, showing low error rates
and high predictive accuracy.
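• The ARIMA model also has low MSE and RMSE, although both are slightly higher than
the LSTM's, and its MAE and MAPE are nan, so it cannot be compared on those metrics.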
• The Logistic Regression model shows moderate performance, with relatively high
percentage errors compared to LSTM, indicating it might not be the best fit for this
data.

3.6) SOLANA
Stationarity test:
ADF TEST:
H0: The time series possesses a unit root and is non-stationary.
H1: The time series does not have a unit root (the series is stationary).
KPSS TEST:
H0: The time series is stationary around a constant level (level-stationary); the critical values 0.739, 0.463 and 0.347 reported below belong to this form of the test.
H1: The time series is non-stationary (has a unit root).

| Test | P value | Sig. level | Test stat | 1% | 5% | 10% | Decision |
|---|---|---|---|---|---|---|---|
| ADF | 0.517835 | 0.05 | -1.531582 | -3.43468 | -2.863453 | | Fail to reject H0 |
| KPSS | 0.010000 | 0.05 | 1.051244 | 0.739 | 0.463 | 0.347 | Reject H0 |

The series is likely to be non-stationary.
Differencing

| Test | P value | Sig. level | Test stat | 1% | 5% | 10% | Decision |
|---|---|---|---|---|---|---|---|
| ADF | 1.830e-08 | 0.05 | -6.417035 | -3.434 | -2.863 | | Reject H0 |
| KPSS | 0.100000 | 0.05 | 0.110194 | 0.739 | 0.463 | 0.347 | Fail to reject H0 |

The series is likely to be stationary.
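
Because the ADF and KPSS null hypotheses point in opposite directions, the two p-values are easiest to read together. The following is a small sketch of that joint decision rule; the function name is hypothetical, and the p-values passed in are the ones reported in the two tables above.

```python
def joint_stationarity_verdict(adf_p, kpss_p, alpha=0.05):
    """Combine the two tests, whose null hypotheses point in opposite directions."""
    adf_rejects_unit_root = adf_p < alpha        # evidence FOR stationarity
    kpss_rejects_stationarity = kpss_p < alpha   # evidence AGAINST stationarity
    if adf_rejects_unit_root and not kpss_rejects_stationarity:
        return "stationary"
    if not adf_rejects_unit_root and kpss_rejects_stationarity:
        return "non-stationary"
    if adf_rejects_unit_root and kpss_rejects_stationarity:
        return "conflicting: both nulls rejected"
    return "inconclusive: neither null rejected"


level_verdict = joint_stationarity_verdict(0.517835, 0.01)   # price level
diff_verdict = joint_stationarity_verdict(1.830e-08, 0.10)   # after differencing
print(level_verdict, "|", diff_verdict)
```

When both nulls are rejected, differencing is usually still the safer choice, since the KPSS result signals that the level series is not stationary.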
Model 1: LSTM
Inference from LSTM Model Training and Predictions:
❖ Training Epochs and Loss:
• Epochs 1-50: The model trained for 50 epochs, and the loss consistently decreased,
indicating that the model was learning and improving over time. The loss values start
at 0.0081 and gradually decrease to as low as 0.0015.
o Improving Loss: The loss reduction shows that the model is fitting the data
better as training progresses.
o Small Loss Values: The low loss values suggest that the model is performing
well during training, as it is minimizing the error effectively.
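• Convergence and Overfitting Risk: The loss plateaus in the later epochs, which suggests
that the model has captured most of the learnable structure and that continuing beyond 50
epochs is unlikely to improve results significantly. A plateau in training loss alone does
not show overfitting; that would require comparing it against a validation loss.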
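
The point at which the loss stops improving can be made explicit with a simple patience rule. This is a minimal sketch, assuming a hypothetical loss curve shaped like the one described (0.0081 falling to 0.0015); the function name and the patience and min_delta thresholds are illustrative choices, not settings taken from the project.

```python
def first_plateau_epoch(losses, patience=5, min_delta=1e-4):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once the loss has failed to improve on its best value
    by at least `min_delta` for `patience` consecutive epochs.
    """
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(losses, start=1):
        if best - loss >= min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return epoch
    return None


# Hypothetical loss curve, not the recorded training log.
losses = [0.0081, 0.0050, 0.0032, 0.0022, 0.0017, 0.0015] + [0.0015] * 10
stop_epoch = first_plateau_epoch(losses)
print(stop_epoch)
```

Applied to a validation loss instead of the training loss, the same rule becomes early stopping, which is what actually guards against overfitting.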

❖ Model Evaluation - RMSE:


• RMSE: 0.0394: The Root Mean Squared Error (RMSE) of 0.0394 is relatively low,
indicating that the model has good prediction accuracy on the log returns. RMSE is a
good measure of how well the model fits the data, and lower values represent better
fits.
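
Since the RMSE is measured on log returns while the forecasts are quoted as prices, it helps to see how one maps to the other. The sketch below uses a hypothetical last price and hypothetical predicted returns; it only illustrates the conversion and is not the project's own code.

```python
import math


def prices_from_log_returns(last_price, log_returns):
    """Rebuild a price path: P_t = P_(t-1) * exp(r_t)."""
    path = []
    price = last_price
    for r in log_returns:
        price *= math.exp(r)
        path.append(price)
    return path


# Hypothetical inputs: a last observed price and three predicted log returns.
path = prices_from_log_returns(150.0, [0.01, -0.005, 0.002])
print([round(p, 2) for p in path])

# An error of 0.0394 in log-return terms, expressed as a one-step price error.
one_step_error = math.exp(0.0394) - 1.0
print(f"{100 * one_step_error:.1f}%")
```

Read this way, an RMSE of 0.0394 corresponds to a typical one-step price error of roughly 4%, which is a more tangible figure than the raw log-return value.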
❖ Accuracy and Trend Capturing:

• Trend Alignment: The predicted prices closely follow the actual price trends,
especially in areas where the price fluctuates. This shows that the LSTM model has
effectively captured the major trends and seasonal patterns in the time series data.
• Smooth Fit: The predicted prices (orange dots) are generally smoother than the actual
prices (blue line). The actual prices tend to be more volatile, while the predicted prices
smooth out some of the short-term noise in the data.

❖ Predicted Prices for the Next 40 Days:

• General Trend:
o The predicted prices start from 151.22 on 2024-09-28 and show a very gradual
increase up to 154.59 by 2024-10-06, and then remain fairly flat thereafter.
o After 2024-10-06, there is a slight drop, and prices fluctuate around 152 to 154
until 2024-11-06.

• Prediction Characteristics:
o The predictions do not show significant price volatility or large fluctuations over
the 40-day forecast. This indicates that the model predicts stable price
movement with only minor upward or downward changes.
o Prices are converging to a narrow range toward the end of the 40 days (from
151.55 to 154.59), suggesting a stable market outlook according to the LSTM
model.
❖ Key Observations:
• Good Fit for Smooth Trends: The LSTM model has successfully learned the general
trends in price movements, as indicated by low training loss and RMSE. It is predicting
fairly stable price movements over the forecast period.

• Limited Volatility Capture: The model's predictions indicate low price volatility,
which may not fully reflect more chaotic or rapidly changing market conditions.
Depending on the actual market behavior, this could be a limitation in real-world
applicability.

Model 2: ARIMA Model


Inference for ARIMA Model (SARIMAX Results):

❖ Model Overview:
o The ARIMA model used here is ARIMA(5, 1, 0), which indicates:
▪ AR(5): The model uses five lagged values of the time series for the
autoregressive component.
▪ I(1): The data has been differenced once to make the time series
stationary.
▪ MA(0): There is no moving average component in this model.

❖ Key Parameters:
o AR coefficients:
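▪ The autoregressive coefficients (ar.L1 to ar.L5) show how past values influence
the current value. Because the series is differenced once, each coefficient acts
on past price changes rather than on past price levels. Specifically: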
▪ ar.L1 (-0.0081): The first lag has a very small and non-
significant impact on the current price.

▪ ar.L2 (0.0297), ar.L3 (0.0425), ar.L4 (0.0658): These
coefficients are positive and statistically significant (p-values <
0.05), indicating that the second, third, and fourth lag values
have a positive influence on the current price.
▪ ar.L5 (-0.1019): This coefficient is negative and statistically
significant, indicating that the fifth lag has a significant negative
influence on the current price.
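o sigma^2 (22.35): This represents the estimated variance of the error term, which
corresponds to a standard deviation of about 4.73 in price units. Whether this counts
as low noise depends on the price level, so it should be read relative to the series.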

❖ Model Fit Metrics:


o AIC (8869.684): The Akaike Information Criterion is used to compare models.
The lower the AIC, the better the model. This value indicates the quality of the
model, but it would be more informative when compared with alternative
models.
o BIC (8901.524): The Bayesian Information Criterion, similar to AIC, penalizes
for model complexity. Again, a lower BIC suggests a better model fit.
o Log Likelihood (-4428.842): The log-likelihood measures how well the model
fits the data. Higher values (closer to 0) indicate a better fit.
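
The three fit metrics above are linked by simple formulas, which makes the reported values easy to cross-check. The sketch below assumes k = 6 estimated parameters (five AR coefficients plus sigma^2); the number of observations is not stated in this section, so it is left as a function argument rather than filled in.

```python
import math


def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 * logL."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k * ln(n) - 2 * logL."""
    return k * math.log(n) - 2 * log_likelihood


# ARIMA(5,1,0): five AR coefficients plus sigma^2 gives k = 6 (assumed).
log_l = -4428.842
aic_value = aic(log_l, 6)
print(round(aic_value, 3))
```

With these inputs the AIC formula returns the reported 8869.684, so the log-likelihood and AIC in the summary are mutually consistent.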

❖ Residual Diagnostics:

o Ljung-Box (L1) (Q) = 0.01, Prob(Q) = 0.94: The high p-value indicates that
there is no significant autocorrelation in the residuals, which suggests that the
model has adequately captured the patterns in the time series data.

o Jarque-Bera (JB = 2531.69, Prob(JB) = 0.00): The Jarque-Bera test indicates
that the residuals are not normally distributed (p-value < 0.05). This suggests
that there may still be some skewness or kurtosis present in the data that the
model has not captured.
o Heteroskedasticity (H = 0.93, Prob(H) = 0.42): The p-value for this test
indicates that there is no significant heteroskedasticity (variance changes over
time) in the residuals.
❖ Forecasted Prices:

o The forecasted prices for the upcoming days are relatively stable, around
142.87, showing minimal fluctuations over the forecast horizon. This suggests
that the model expects the price to stabilize and hover around this level in the
future.
o This lack of significant change in forecasted prices may be an indication that
the ARIMA model is primarily capturing the mean-reverting behavior of the
time series rather than anticipating large swings or trends.
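
The flat forecast follows directly from the small AR coefficients: each predicted price change is a weighted sum of recent changes, so the changes shrink quickly and the price levels off. The sketch below uses the coefficients reported above, but the price history is hypothetical and the model is assumed to have no constant term; the function name is illustrative.

```python
def forecast_ar_on_differences(prices, ar_coefs, steps):
    """Forecast an ARIMA(p,1,0) model without a constant term.

    The AR part acts on first differences:
        d_t = sum(phi_i * d_(t-i)),  price_t = price_(t-1) + d_t
    """
    p = len(ar_coefs)
    diffs = [b - a for a, b in zip(prices[:-1], prices[1:])][-p:]
    last = prices[-1]
    forecast = []
    for _ in range(steps):
        # reversed(diffs) yields lag 1 first, then lag 2, and so on.
        d_next = sum(phi * d for phi, d in zip(ar_coefs, reversed(diffs)))
        last += d_next
        forecast.append(last)
        diffs = diffs[1:] + [d_next]
    return forecast


# AR coefficients reported above; the price history is hypothetical.
phi = [-0.0081, 0.0297, 0.0425, 0.0658, -0.1019]
history = [150.0, 147.0, 149.5, 144.0, 146.0, 143.0]
path = forecast_ar_on_differences(history, phi, 40)
print([round(x, 2) for x in path[:3]], round(path[-1], 2))
```

After the first few steps the predicted changes are close to zero, so the forecast settles near the last observed price, which matches the behaviour described above.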
❖ Model Performance:
o Mean Squared Error (MSE = 95.31): This value quantifies the average
squared difference between the actual and predicted values. A lower MSE
indicates better model performance, but its value alone doesn’t offer much
insight without comparison to alternative models.
o Based on the MSE, we can infer that the ARIMA model provides a reasonably
good fit for the data but may not capture short-term price volatility as effectively
as other models (e.g., LSTM).

Key Takeaways:
• Trend Detection: The ARIMA model seems to predict a stable price trend in the
forecasted period, which may indicate that the time series is reverting to its mean after
significant fluctuations.
• Volatility Capture: While the ARIMA model does a good job of predicting general
trends, its flat forecast misses short-term fluctuations. The high Jarque-Bera statistic
also shows that the residuals are heavy-tailed, so large price moves are not well described.
• Further Improvements: Depending on the specific use case, further tuning (such as
increasing the AR terms or experimenting with seasonal components) or trying
alternative models (such as SARIMA or LSTM) might enhance predictive performance,
especially in the presence of volatility or trends.
In summary, the ARIMA(5,1,0) model provides a good overall fit to the data, but its ability to
predict short-term fluctuations and account for residual skewness and kurtosis may be limited.

Models Performance Comparison:

| Model | MAE | MSE | RMSE | MAPE |
|---|---|---|---|---|
| ARIMA | nan | 95.3138 | 9.7628 | nan |
| LSTM | 6.9096 | 73.3505 | 8.5644 | 4.9786 |

❖ ARIMA Model:
Inference:
• The ARIMA model has reasonable MSE and RMSE values, indicating that the
predictions are somewhat accurate.
• However, the absence of meaningful MAE and MAPE could point to issues in certain
areas, such as missing or infinite values, which suggest that this model may not have
been robust enough to handle all aspects of the dataset.

❖ LSTM Model:
Inference:
• The LSTM model outperforms the ARIMA model in terms of MSE and RMSE,
indicating that it predicts more accurately.
• The MAE of 6.9096 means that, on average, the LSTM model's predictions are about
6.91 price units away from the actual values. No comparison with ARIMA is possible on
this metric because the ARIMA MAE is nan.

• The MAPE of 4.98% indicates that the LSTM predictions deviate from actual prices by
about 5% on average. Together with its lower MSE and RMSE, this makes it the more
reliable model overall for this dataset.

Conclusion:
• LSTM performs best on the two metrics available for both models (MSE and RMSE),
making it the more accurate model for predicting this time series.
• ARIMA performs reasonably well, though it is less accurate than LSTM and has no valid
MAE or MAPE.
Thus, for future forecasting tasks on this dataset, LSTM is the better choice based on these metrics.

3.7) USDC
Model 1: LSTM Model
Model Performance (Epochs and Loss):
❖ Loss Values Across Epochs:
o The training began with an initial loss of 0.0477 and dropped significantly by
the second epoch to 0.0047. By the fifth epoch, the loss further reduced to
0.0033 and continued to hover around this value for the remaining epochs.
o The loss values indicate that the model quickly improved and learned
meaningful patterns in the early epochs but did not see a significant drop after
the fifth epoch, suggesting diminishing returns on further training.
❖ Model Evaluation - RMSE:
o The Root Mean Squared Error (RMSE) achieved is 0.01126, which is
relatively low and implies the model is doing a decent job at predicting future
prices. The low RMSE reflects the accuracy of predictions compared to actual
price values, though there is still room for improvement, particularly in volatile
areas.

Graphical Insights (Actual vs Predicted Prices):

❖ Actual and Predicted Price Trends:


o In the graph (from the reference image), the blue line represents the actual
prices, while the orange dots show the predicted prices. The predicted prices
follow the general trends of the actual data fairly well, indicating that the LSTM
model has captured the broad market movements effectively.

o Smoothness of Predictions:
▪ The orange dots (predicted prices) exhibit a smoother trajectory
compared to the actual price line, which is more volatile. This reflects a
characteristic of LSTM models where short-term noise in the data is
smoothed out, capturing broader trends but potentially missing sudden
spikes or drops.

❖ Trend Alignment:
o The predicted prices closely follow the actual prices, particularly in the general
peaks and troughs. However, there are moments where the predicted values
slightly deviate from the actual values, especially around sharp changes. This is
a common limitation of LSTM models, which may struggle with sudden,
dramatic shifts in time-series data.

❖ Predicted Prices for the Next 40 Days:

• Gradual Increase:
o The LSTM model forecasts a slight, consistent increase in prices over the next
40 days. Starting from 1.0023 on 2024-09-28, the predictions gradually rise to
1.0062 by 2024-11-06.
o Low Volatility:
▪ The predicted prices do not show significant fluctuations, with daily
increments ranging between 0.0001 and 0.0003. This suggests that the
model expects minimal volatility and stable price movement, which
could be a limitation when applied to more volatile markets.
❖ Model Observations:
• Trend Learning:
o The LSTM model has effectively learned the major price movements but does
not fully capture the high-frequency volatility of the market. The smooth nature
of the predictions suggests that the model is more attuned to long-term trends
than short-term price fluctuations.
• Prediction Stability:
o Over the 40-day forecast, the predicted prices stabilize around a narrow range
of 1.0023 to 1.0062, suggesting a steady market outlook. However, this might
not be reflective of real-world scenarios, especially in volatile markets, where
price movements can be more erratic.
❖ Conclusion and Recommendations:
• Good Trend Prediction:
o The LSTM model is well-suited for capturing broad trends in the price data, as
demonstrated by the close alignment between actual and predicted values and
the low RMSE. However, its tendency to smooth out fluctuations means it may
underperform in highly volatile markets.
• Improvement Suggestions:
o Adding more features (e.g., trading volume, external financial data) and fine-
tuning the model’s architecture (such as adjusting the number of LSTM layers
or incorporating dropout layers) could help the model capture more complex
price dynamics and volatility.

Model 2: ARIMA Model


Inference for ARIMA Model (SARIMAX Results):

❖ Model Overview:
• The model used is ARIMA(5, 1, 0), meaning:
o AR(5): The model incorporates five lagged values of the time series for the
autoregressive component.
o I(1): The data has been differenced once to make it stationary.
o MA(0): There is no moving average component in this model.
❖ Key Parameters:
• Autoregressive (AR) Coefficients:
o ar.L1 (-0.6069): The first lag has a large negative and statistically significant
impact on the current price.

o ar.L2 (-0.3491): The second lag also has a significant negative impact, though
smaller than the first.
o ar.L3 (-0.3237), ar.L4 (-0.2517), ar.L5 (-0.1681): These lags further contribute
negatively, but their impact decreases with each lag. All of these coefficients are
statistically significant (p-values < 0.05), indicating their relevance in predicting
the current price.
• Sigma² (3.936e-05): This value represents the estimated variance of the error term. It
corresponds to a standard deviation of about 0.0063, which is small relative to a price
close to 1 and suggests a low level of noise in the one-step-ahead errors.
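
Because sigma² is expressed in squared price units, converting it to a standard deviation makes it easier to judge. The short sketch below assumes a reference price of 1.0 for a USD stablecoin; that reference level is an assumption for illustration, not a figure from the model output.

```python
import math

sigma2 = 3.936e-05            # error variance reported above
sigma = math.sqrt(sigma2)     # typical one-step error, in price units
reference_price = 1.0         # assumed peg level for a USD stablecoin

relative_error = sigma / reference_price
print(f"sigma = {sigma:.5f}, or {100 * relative_error:.2f}% of the reference price")
```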
❖ Model Fit Metrics:
• AIC (-15193.797): The Akaike Information Criterion is a measure of relative model
quality. A lower AIC indicates a better model, but only in comparison with other models
fitted to the same data.
• BIC (-15159.950): The Bayesian Information Criterion is similar to AIC but penalizes
model complexity more heavily. As with AIC, comparisons with alternative models are
necessary to contextualize this value.
• Log Likelihood (7602.898): The log-likelihood measures how well the model fits the
data, with higher values indicating a better fit.
❖ Residual Diagnostics:

• Ljung-Box (L1) (Q = 0.37), Prob(Q) = 0.54: The high p-value shows that there is no
significant autocorrelation in the residuals, meaning the model has captured the
underlying patterns in the data effectively.
• Jarque-Bera (JB = 10,136,496.38), Prob(JB) = 0.00: The very low p-value and high
JB statistic suggest that the residuals are not normally distributed, indicating the
presence of significant skewness or kurtosis in the data that the model may not be fully
capturing.
• Heteroskedasticity (H = 0.02), Prob(H) = 0.00: The very low p-value for the
heteroskedasticity test indicates that there is significant heteroskedasticity in the
residuals, meaning the variance of the residuals changes over time.
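
An H value of 0.02 is easier to interpret once the statistic is written out. The sketch below assumes that H is a break-variance ratio, that is, the sum of squared residuals in the last third of the sample divided by that in the first third, which is how this statistic in a SARIMAX summary is commonly described. The residuals are hypothetical and the function name is illustrative.

```python
def break_variance_h(residuals):
    """Ratio of squared residuals in the last third to those in the first third."""
    h = len(residuals) // 3
    early = sum(r * r for r in residuals[:h])
    late = sum(r * r for r in residuals[-h:])
    return late / early


# Hypothetical residuals: large early errors, much smaller late errors.
residuals = [0.05, -0.04, 0.06, 0.01, -0.02, 0.01, 0.005, -0.008, 0.006]
h_stat = break_variance_h(residuals)
print(round(h_stat, 3))
```

Under that assumption, a value far below 1 means the residual variance was much larger early in the sample than late, consistent with the price becoming more stable over time.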

❖ Forecasted Prices:

• The forecasted prices from 2024-09-28 to 2024-11-06 show very little fluctuation,
ranging from 0.999843 to 0.999827, suggesting that the ARIMA model predicts a stable
trend in the price over this period.
• Stability in Forecasts: The lack of significant change in forecasted prices indicates
that the model is capturing a mean-reverting behavior in the time series, with minimal
expectation of large swings or trends.
❖ Model Performance:
• Mean Squared Error (MSE = 5.54e-08): The extremely low MSE suggests that the
model is performing well, providing accurate predictions with minimal deviation from
actual values. However, this number alone is not sufficient to judge performance
without comparing it to alternative models.
❖ Key Takeaways:
• Trend Detection: The ARIMA model is predicting stable prices in the forecasted
period, suggesting that the time series is reverting to its mean after past fluctuations.
• Volatility Capture: While the ARIMA model captures general trends well, the
significant Jarque-Bera test statistic and heteroskedasticity suggest it struggles to
capture short-term volatility or distributional anomalies.

• Further Improvements: The model could benefit from further tuning, such as
incorporating a moving average component (ARIMA(5,1,1) or SARIMA), or exploring
alternative models like LSTM for better volatility and short-term trend capture.
In summary, the ARIMA(5,1,0) model offers strong overall performance with good trend
detection, but it may underperform in capturing price volatility and addressing the non-normal
distribution of residuals.

Model Performance Comparison:

| Model | MAE | MSE | RMSE | MAPE |
|---|---|---|---|---|
| ARIMA | 0.00017 | 5.54108e-08 | 0.0002 | 0.0172 |
| LSTM | 0.00070 | 5.50078e-07 | 0.00074 | 0.07013 |

ARIMA Model:

• Inference:
o The ARIMA model performs exceptionally well with very low errors across all
metrics.
o The MSE (5.54108e-08) and RMSE (0.0002) values indicate highly accurate
predictions that are very close to the actual values.
o A MAE of 0.00017 demonstrates the precision of ARIMA in predicting values.
o The MAPE value of 0.0172% shows minimal percentage error, indicating
reliable forecasting ability for future values in the time series.
o This strong performance suggests that ARIMA effectively captures long-term
trends and reduces overall prediction errors, making it a highly suitable model
for this dataset.

LSTM Model:

• Inference:
o While the LSTM model performs well, it does not surpass the accuracy of the
ARIMA model.
o The MSE (5.50078e-07, the order of magnitude implied by the RMSE) and RMSE
(0.00074) values are higher than ARIMA's, with an RMSE more than three times as
large, indicating larger prediction errors.
o A MAE of 0.00070 reveals that LSTM is not as precise on average as ARIMA.
o The MAPE value of 0.07013% indicates a higher percentage error compared to
ARIMA, though LSTM still manages to capture patterns effectively.

o LSTM’s strength lies in its ability to model complex, non-linear patterns, but in
this dataset, ARIMA outperforms it in terms of accuracy.

Conclusion:

• ARIMA is the most suitable model for this dataset, achieving superior performance
with near-zero errors and minimal prediction deviation.
• LSTM, while capable of handling more complex patterns, does not match ARIMA's
precision, especially in terms of percentage-based error (MAPE).

3.8) XRP

Stationarity test:

ADF TEST:

H0: The time series possesses a unit root and is non-stationary.

H1: The time series does not have a unit root (the series is stationary).

KPSS TEST:

H0: The time series is stationary around a constant level (level-stationary); the critical values 0.739, 0.463 and 0.347 reported below belong to this form of the test.

H1: The time series is non-stationary (has a unit root).

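| Test | P value | Sig. level | Test stat | 1% | 5% | 10% | Decision |
|---|---|---|---|---|---|---|---|
| ADF | 0.005402 | 0.05 | -3.619349 | -3.43221 | -2.862364 | | Reject H0 |
| KPSS | 0.010000 | 0.05 | 3.310496 | 0.739 | 0.463 | 0.347 | Reject H0 |

The ADF test rejects a unit root (p = 0.0054 < 0.05), while the KPSS test rejects stationarity. Because the two tests disagree, the series is treated as non-stationary and differenced.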

| Test | P value | Sig. level | Test stat | 1% | 5% | 10% | Decision |
|---|---|---|---|---|---|---|---|
| ADF | 5.767e-22 | 0.05 | -11.8943 | -3.432 | -2.863 | | Reject H0 |
| KPSS | 0.100000 | 0.05 | 0.01384 | 0.739 | 0.463 | 0.347 | Fail to reject H0 |

The series is likely to be stationary.

Model 1: LSTM

Inference from LSTM Model Training and Predictions:

❖ Training Epochs and Loss:

• Epochs 1-50: The model trained for 50 epochs, and the loss consistently decreased over
the course of training, indicating that the model was learning and improving its
predictions. The loss values start at 0.0029 and gradually decrease to 0.00068 by the
later epochs.
o Improving Loss: The consistent reduction in loss values shows that the model
is effectively fitting the data as the training progresses, capturing patterns in the
dataset.

o Small Loss Values: The low loss values (reaching as low as 0.00068) suggest
that the model is performing well and minimizing prediction error effectively
during the training process.
• Convergence and Overfitting Risk: The loss starts to plateau toward the later epochs,
indicating that the model has captured most of the trends. Further training beyond 50
epochs may not significantly improve performance. A plateau in training loss alone does
not show overfitting; that would require comparing it against a validation loss.

❖ Model Evaluation - RMSE:


• RMSE: 0.0117: The Root Mean Squared Error (RMSE) of 0.0117 is very low,
demonstrating high prediction accuracy. The low RMSE indicates that the model is
providing reliable forecasts for future prices with minimal deviation from actual values.

❖ Accuracy and Trend Capturing:

• Trend Alignment: The predicted prices in the first image (orange dots) closely follow
the actual price trends (blue line), especially during periods of price fluctuation. This
suggests that the model has effectively captured major trends and patterns in the time
series data.
• Smooth Fit: The predicted prices are smoother compared to the actual prices, which
exhibit more short-term volatility. The LSTM model smooths out some of this noise,
focusing on longer-term trends.
❖ Predicted Prices for the Next 40 Days:

• General Trend:

o The second image shows the forecasted prices over the next 40 days, starting at
0.5683 on 2024-09-28 and gradually decreasing to 0.3449 by 2024-11-06.
o There is a consistent, gradual downward trend throughout the forecast period,
with no significant volatility or large fluctuations.

❖ Prediction Characteristics:

o The forecast is smooth rather than stable: there are no sharp spikes or reversals over
the 40-day period, but prices fall steadily from 0.5683 on 2024-09-28 to 0.3449 by
2024-11-06, a cumulative decline of about 39%.
o The daily decreases are individually small, so the path shows low day-to-day volatility
even though the overall move is large.
o A decline this smooth and persistent may reflect the model compounding its own
predictions over a long horizon rather than a reliable market signal, so it should be
interpreted with caution.
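
The size of the predicted move can be quantified from the two endpoints quoted above. This is a short sketch using only those two values; the 40-point horizon is taken from the text, and the per-day figure assumes a constant compound rate between the endpoints.

```python
start, end, horizon = 0.5683, 0.3449, 40   # values quoted above

total_change = end / start - 1.0
# 40 forecast points are separated by 39 daily steps.
daily_rate = (end / start) ** (1.0 / (horizon - 1)) - 1.0

print(f"total: {100 * total_change:.1f}%, per day: {100 * daily_rate:.2f}%")
```

A daily change of this size is modest, but compounded over the full horizon it produces a large cumulative decline.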

❖ Key Observations:

o Good Fit for General Trends: The LSTM model has effectively captured the general
trends in the data, with a smooth fit between predicted and actual prices, as indicated
by the low loss and RMSE values.
o Limited Volatility Capture: The model predicts a stable decline in prices over the
forecasted 40 days, with limited price volatility. This may not fully capture chaotic or
rapidly fluctuating market conditions, which could be a limitation in real-world
applications depending on the context.

Model 2: ARIMA Model

Inference for ARIMA Model (SARIMAX Results):

❖ Model Overview:
• The model used here is ARIMA(5, 1, 0), which means:

o AR (5): Five autoregressive terms are included, meaning the model uses five
lagged values of the time series.

o I (1): The data is differenced once to ensure stationarity.

o MA (0): There is no moving average component in this model.

❖ Key Parameters:

• AR Coefficients:
o ar.L1 (-0.0111): The first lag has a small, non-significant impact on the current
price (p-value > 0.05).
o ar.L2 (-0.0003): The second lag also has a non-significant influence (p-value >
0.05).
o ar.L3 (0.0417), ar.L4 (0.0362), ar.L5 (0.0580): These lags have positive and
statistically significant effects (p-values < 0.05), indicating that the third, fourth,
and fifth lagged values positively influence the current price.
• Error Variance (sigma²): The estimated error variance is 0.0015,
indicating a relatively low noise level in the model.

❖ Model Fit Metrics:


o AIC (-12746.514): This low value suggests a relatively good model fit. However,
it would be most informative when compared to alternative models.
o BIC (-12709.560): Also a low value, supporting the model fit but again more
meaningful in comparison with other models.
o Log Likelihood (6379.257): Indicates a good model fit, with higher values
indicating a better fit.
❖ Residual Diagnostics:

o Ljung-Box (L1) (Q = 0.02, Prob(Q) = 0.89): The high p-value indicates no
significant autocorrelation in the residuals, suggesting the model has adequately
captured the time series patterns.
o Jarque-Bera (JB = 745352.21, Prob (JB) = 0.00): The p-value suggests that
the residuals are not normally distributed, with potential skewness and kurtosis
not captured by the model.
o Heteroskedasticity (H = 0.46, Prob(H) = 0.50): The high p-value implies no
significant heteroskedasticity in the residuals.
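
The Ljung-Box p-value at lag 1 can be checked by hand, because the Q statistic is compared against a chi-square distribution with one degree of freedom. The sketch below is an illustration using only the standard library; the function names are hypothetical, and the final line uses the Q value reported above.

```python
import math


def chi2_sf_1df(q):
    """Upper-tail probability of a chi-square(1) variable: erfc(sqrt(q / 2))."""
    return math.erfc(math.sqrt(q / 2.0))


def ljung_box_lag1(residuals):
    """Return (Q, p-value) of the Ljung-Box test at lag 1.

    Q = n * (n + 2) * r1^2 / (n - 1), where r1 is the lag-1 autocorrelation.
    """
    n = len(residuals)
    mean = sum(residuals) / n
    centred = [r - mean for r in residuals]
    denom = sum(c * c for c in centred)
    r1 = sum(a * b for a, b in zip(centred[:-1], centred[1:])) / denom
    q = n * (n + 2) * r1 ** 2 / (n - 1)
    return q, chi2_sf_1df(q)


p_reported_q = chi2_sf_1df(0.02)   # Q reported above
print(round(p_reported_q, 2))
```

The result agrees with the reported Prob(Q) of 0.89 after rounding, which confirms that the residuals show no detectable autocorrelation at lag 1.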

❖ Forecasted Prices:

o The forecasted prices are relatively stable, hovering around 0.562228 after an
initial drop. This suggests that the model anticipates the price to stabilize at this
level with minimal fluctuations through the forecast horizon.
o The lack of significant change in the forecasted prices indicates that the ARIMA
model is likely capturing the mean-reverting behavior of the time series rather
than predicting any large fluctuations or trends.

Model Performance:

• Mean Squared Error (MSE = 0.0006925):


o This low MSE quantifies the average squared difference between the actual and
predicted values, indicating that the model performs well in terms of fit.
However, this MSE alone doesn’t provide much insight without comparison to
alternative models.
o The low MSE suggests that while the ARIMA model fits the overall trend well,
it may not capture short-term price volatility as effectively as some other models
(e.g., LSTM).

Key Takeaways:

• Trend Detection:
o The ARIMA model appears to predict a steady price level in the forecasted
period, likely indicating that the time series is reverting to its mean after an
initial adjustment.
• Volatility Capture:
o While the ARIMA model is effective at predicting the general trend, it seems to
miss out on short-term fluctuations in the data, as shown by the lack of
variability in the forecast after October 6, 2024.
• Further Improvements:
o Depending on the use case, additional tuning (e.g., increasing AR terms or
experimenting with seasonal components) or using alternative models like
SARIMA or LSTM might enhance predictive performance, especially in cases
of price volatility or trends.

In Summary: The ARIMA (5,1,0) model provides a reasonable overall fit to the data,
successfully predicting a stable trend with low MSE. However, its ability to capture short-term
fluctuations may be limited. Further tuning or the adoption of more advanced models could
improve performance, particularly in capturing volatility and short-term variations.

Models Performance Comparison:

| Model | MAE | MSE | RMSE | MAPE |
|---|---|---|---|---|
| ARIMA | nan | 0.00069 | 0.02631 | nan |
| LSTM | 0.0228 | 0.00074 | 0.02726 | 3.9390 |

ARIMA Model:

Inference:

• The ARIMA model shows a very low Mean Squared Error (MSE = 0.00069) and Root
Mean Squared Error (RMSE = 0.02631), indicating that it provides accurate predictions
in terms of squared error metrics.

• However, the absence of meaningful Mean Absolute Error (MAE) and Mean Absolute
Percentage Error (MAPE) values suggests that there may be limitations or issues with
this model when handling the full range of data, possibly due to missing values or the
nature of the ARIMA model's predictions on this specific dataset.

LSTM Model:

Inference:

• The LSTM model performs comparably to ARIMA on squared-error metrics, with a
marginally higher MSE (0.00074 versus 0.00069) and RMSE (0.02726 versus 0.02631). Its
MAE of 0.0228 cannot be compared with ARIMA's, which is nan.

• The MAPE of 3.9390% shows that the LSTM model's percentage-based errors are
small, making it a more reliable choice for capturing the time series patterns in this
dataset.

Conclusion:

• LSTM is the only model with a complete set of valid metrics, which makes it the safer
option for predicting this dataset, although ARIMA is marginally ahead on MSE and RMSE.

• ARIMA provides reasonable results, especially with low MSE and RMSE, though its
missing MAE and MAPE values highlight some potential limitations.

Based on these metrics, LSTM would be the preferred choice for future time series forecasting
tasks on this dataset, because its errors are fully measurable (MAPE of about 3.94%) and its
MSE and RMSE are close to ARIMA's.

3.9) DOGECOIN

Stationarity test:

ADF TEST:

H0: The time series possesses a unit root and is non-stationary.

H1: The time series does not have a unit root (the series is stationary).

KPSS TEST:

H0: The time series is stationary around a constant level (level-stationary); the critical values 0.739, 0.463 and 0.347 reported below belong to this form of the test.

H1: The time series is non-stationary (has a unit root).

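| Test | P value | Sig. level | Test stat | 1% | 5% | 10% | Decision |
|---|---|---|---|---|---|---|---|
| ADF | 0.035913 | 0.05 | -2.989225 | -3.43282 | -2.86263 | | Reject H0 (at 5%) |
| KPSS | 0.010000 | 0.05 | 3.052097 | 0.739 | 0.463 | 0.347 | Reject H0 |

The ADF test rejects a unit root at the 5% level (p = 0.036) but not at the 1% level, while the KPSS test rejects stationarity. Because the two tests disagree, the series is treated as non-stationary and differenced.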
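
The ADF decision can also be read directly from the test statistic and the critical values. The sketch below applies that rule to the price-level figures in the table above; the function name is hypothetical, and only the two critical values that appear in the table are used.

```python
def adf_rejection_levels(test_stat, critical_values):
    """List the significance levels at which the unit-root null is rejected.

    The ADF test is left-tailed: reject H0 when the statistic is MORE
    negative than the critical value.
    """
    return [level for level, crit in critical_values.items() if test_stat < crit]


# Price-level ADF figures from the table above.
levels = adf_rejection_levels(-2.989225, {"1%": -3.43282, "5%": -2.86263})
print(levels)
```

A statistic of -2.989225 lies between the two critical values, so the unit-root null is rejected at the 5% level but not at the 1% level, which agrees with the p-value of 0.035913.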

Differencing

| Test | P value | Sig. level | Test stat | 1% | 5% | 10% | Decision |
|---|---|---|---|---|---|---|---|
| ADF | 1.830e-08 | 0.05 | -8.441660 | -3.432 | -2.862 | | Reject H0 |
| KPSS | 0.100000 | 0.05 | 0.025181 | 0.739 | 0.463 | 0.347 | Fail to reject H0 |

The series is likely to be stationary.

Model 1: LSTM

Inference from LSTM Model Training and Predictions:

❖ Training Epochs and Loss:

• Epochs 1-50: The model trained for 50 epochs, and the loss consistently decreased,
indicating that the model was learning and improving over time. The loss values start
at 0.0036 and gradually decrease to as low as 0.0010.

o Improving Loss: The loss reduction shows that the model is fitting the data
better as training progresses.

o Small Loss Values: The low loss values suggest that the model is performing
well during training, as it is minimizing the error effectively.

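• Potential Overfitting: The training loss plateaus in the later epochs, which indicates
that the model has captured most of the learnable trend and that training beyond 50
epochs is unlikely to improve results significantly. A plateau in training loss alone
does not reveal overfitting; that requires comparing training loss with the loss on
held-out validation data.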
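An LSTM is normally trained on fixed-length windows of past values, each paired with the value that follows. The report does not state the window length or the framework used, so the sketch below is a generic illustration; `make_windows` and the `lookback` parameter are hypothetical names.

```python
def make_windows(series, lookback):
    """Split a 1-D series into (input window, next value) training pairs."""
    inputs, targets = [], []
    for start in range(len(series) - lookback):
        inputs.append(series[start:start + lookback])  # the past `lookback` values
        targets.append(series[start + lookback])       # the value to predict
    return inputs, targets


X, y = make_windows([1, 2, 3, 4, 5, 6], lookback=3)
print(X)  # [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(y)  # [4, 5, 6]
```

A longer lookback gives the network more context per sample but leaves fewer training pairs, which matters for short price histories.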

❖ Model Evaluation - RMSE:

• RMSE: 0.0144: The Root Mean Squared Error (RMSE) of 0.0144 is relatively low,
indicating that the model has good prediction accuracy on the log returns. RMSE is a
good measure of how well the model fits the data, and lower values represent better
fits.
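Since the RMSE is quoted on log returns while the forecasts are quoted as prices, a conversion step sits between the two. A minimal sketch of that conversion follows, assuming daily log returns r_t = ln(P_t / P_{t-1}); the function names are illustrative and are not taken from the project code.

```python
import math


def log_returns(prices):
    """Log return between each pair of consecutive prices."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def prices_from_log_returns(last_price, returns):
    """Rebuild a price path by compounding log returns from a known price."""
    prices, price = [], last_price
    for r in returns:
        price *= math.exp(r)
        prices.append(price)
    return prices


history = [0.100, 0.110, 0.105]  # made-up prices for illustration
rebuilt = prices_from_log_returns(history[0], log_returns(history))
print([round(p, 3) for p in rebuilt])  # [0.11, 0.105]
```

Because returns compound, small errors in predicted returns accumulate along the price path, so an RMSE on returns and an RMSE on prices are not directly comparable.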

❖ Accuracy and Trend Capturing:

• Trend Alignment: The predicted prices closely follow the actual price trends,
especially in areas where the price fluctuates. This shows that the LSTM model has
effectively captured the major trends and seasonal patterns in the time series data.
• Smooth Fit: The predicted prices (orange dots) are generally smoother than the actual
prices (blue line). The actual prices tend to be more volatile, while the predicted prices
smooth out some of the short-term noise in the data.

❖ Predicted Prices for the Next 40 Days:

• General Trend:

o The predicted prices start from 0.109 on 2024-10-01 and show a gradual
decrease to 0.079 by 2024-11-06.

• Prediction Characteristics:

o The predictions do not show significant price volatility or large fluctuations over
the 40-day forecast. This indicates that the model predicts stable price
movement with only minor upward or downward changes.

o Prices decline from 0.109 to 0.079 over the forecast (a fall of roughly 28%) and
converge to a narrow range toward the end of the 40 days, suggesting that the
LSTM model expects a gradual downtrend that levels off rather than a volatile
market.

❖ Key Observations:

• Good Fit for Smooth Trends: The LSTM model has successfully learned the general
trends in price movements, as indicated by low training loss and RMSE. It is predicting
fairly stable price movements over the forecast period.

• Limited Volatility Capture: The model's predictions indicate low price volatility,
which may not fully reflect more chaotic or rapidly changing market conditions.
Depending on the actual market behavior, this could be a limitation in real-world
applicability.

Model 2: ARIMA Model

Inference for ARIMA Model (SARIMAX Results):

❖ Model Overview:

o The ARIMA model used here is ARIMA(5, 1, 0), which indicates:

▪ AR(5): The model uses five lagged values of the time series for the
autoregressive component.

▪ I(1): The data has been differenced once to make the time series
stationary.

▪ MA(0): There is no moving average component in this model.

❖ Key Parameters:

o AR coefficients:

▪ The autoregressive coefficients (ar.L1 to ar.L5) show how the past
values influence the current value. Because the series is differenced once,
each lag refers to a past price change rather than a past price level.
Specifically:

ar.L1 (-0.1585): The first lag has a negative and statistically
significant effect (p < 0.05). This indicates that a price change in the
previous period tends to be partly reversed in the current period.
ar.L2 (0.0604): The second lag has a positive and statistically
significant effect, suggesting that the change from two periods ago
pushes the current change in the same direction.
ar.L3 (0.1064): The third lag also shows a positive and
significant influence, with a stronger effect compared to the
second lag.
ar.L4 (0.0269): The fourth lag shows a smaller but still
positive and statistically significant effect.
ar.L5 (-0.2006): The fifth lag has the largest negative coefficient
and is significant (p < 0.05), indicating that changes further back
have a dampening effect on the current change.

o sigma^2 (0.0001): This represents the estimated variance of the error term
(white noise). A small value means small residuals, but sigma^2 is expressed in
squared units of the differenced series, so it should be judged against the scale
of the data rather than read as proof of a good fit.

❖ Model Fit Metrics:

Log Likelihood (8180.997): The log-likelihood is meaningful only relative to
other models fitted to the same data, where a higher value indicates a better fit.
Its sign depends on the scale of the data, so a positive value is not in itself
evidence of a good fit.

AIC (-16349.995): The Akaike Information Criterion (AIC) penalizes the
log-likelihood for the number of parameters. Lower values are preferred when
comparing candidate models on the same series; the value has no absolute
interpretation.
BIC (-16314.739): The Bayesian Information Criterion (BIC) applies a heavier
penalty for model complexity than AIC, which is why it is slightly higher here.
It is used in the same way: the candidate with the lowest BIC is preferred.
HQIC (-16337.229): The Hannan-Quinn Information Criterion (HQIC) is a third
criterion of the same kind; here its value falls between the AIC and the BIC.
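The information criteria follow directly from the log-likelihood. The sketch below recomputes the AIC under the assumption that the model has k = 6 estimated parameters (five AR coefficients plus the error variance); that count is not stated in the report, but it reproduces the reported AIC to within rounding. The sample size is not reported, so `n_obs` in the BIC call is a made-up value.

```python
import math


def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_likelihood


def bic(log_likelihood, k, n_obs):
    """Bayesian Information Criterion: k ln(n) - 2 ln L."""
    return k * math.log(n_obs) - 2 * log_likelihood


K = 6  # assumption: 5 AR coefficients + sigma^2

# Reported log-likelihood for the Dogecoin ARIMA(5,1,0) model
print(round(aic(8180.997, K), 3))  # -16349.994

# With any sample larger than about 8 observations, ln(n) > 2, so BIC > AIC
print(bic(8180.997, K, n_obs=1000) > aic(8180.997, K))  # True
```

Both criteria are useful only for ranking candidate orders (p, d, q) on the same series, for instance ARIMA(5,1,0) against ARIMA(2,1,1).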

❖ Residual Diagnostics:

Ljung-Box (Q = 1.39, p = 0.24): The p-value is greater than 0.05, so there is no
evidence of autocorrelation in the residuals at the tested lag, implying the model
adequately captures the autocorrelation in the data.

Jarque-Bera (JB = 916475.81, p = 0.00): A p-value of 0 for the Jarque-Bera
test suggests that the residuals are not normally distributed; a statistic this large
points to heavy tails (excess kurtosis) and possibly skewness in the residuals.

Heteroskedasticity (H = 213.45, p = 0.00): A significant p-value suggests the
presence of heteroskedasticity in the residuals, meaning the variance of the
residuals changes over time.
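The Ljung-Box p-value can be checked by hand. The sketch below computes the Q statistic from a residual series and converts it to a p-value using a chi-square distribution with one degree of freedom, which is an assumption (a single tested lag) that happens to reproduce the reported p = 0.24 from Q = 1.39. The function names are illustrative.

```python
import math


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(residuals, max_lag=1):
    """Ljung-Box Q = n(n+2) * sum_k rho_k^2 / (n - k)."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorr(residuals, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


def chi2_sf_1df(q):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(q / 2))


print(round(chi2_sf_1df(1.39), 2))  # 0.24
```

A large p-value here only says that the linear autocorrelation has been removed; it says nothing about changing variance, which the heteroskedasticity test addresses.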

❖ Forecasted Prices:

• Price Stability:
o The forecasted prices show some initial fluctuations in the first few days of
October 2024, with a spike and a dip before stabilizing around the 0.1014 mark.
The model predicts a consistent trend with minimal changes beyond mid-
October, suggesting that the price will stabilize and remain steady for the
remainder of the forecast period.
o This behavior indicates that the ARIMA model is capturing a mean-reverting
process after the initial volatility, projecting stability in future prices.
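The flattening of the forecast can be reproduced by iterating the model by hand. The sketch below assumes an ARIMA(5,1,0) with no constant term, so the AR coefficients act on first differences and each forecast difference is added back to the last level. The coefficients are the ones reported above; the price history is made up, and `forecast_arima_p10` is a hypothetical helper rather than the project's code.

```python
def forecast_arima_p10(prices, ar_coefs, steps):
    """Iterate an ARIMA(p,1,0) model with no constant term."""
    p = len(ar_coefs)
    if len(prices) < p + 1:
        raise ValueError("need at least p + 1 observations")
    diffs = [b - a for a, b in zip(prices, prices[1:])]
    level = prices[-1]
    forecasts = []
    for _ in range(steps):
        recent = diffs[-p:][::-1]  # most recent difference first (lag 1)
        next_diff = sum(c * d for c, d in zip(ar_coefs, recent))
        diffs.append(next_diff)    # forecasts feed later forecasts
        level += next_diff         # undo the differencing
        forecasts.append(level)
    return forecasts


# ar.L1 .. ar.L5 as reported for Dogecoin; the prices are hypothetical.
DOGE_AR = [-0.1585, 0.0604, 0.1064, 0.0269, -0.2006]
hypothetical_prices = [0.100, 0.104, 0.101, 0.106, 0.103, 0.108, 0.105]
path = forecast_arima_p10(hypothetical_prices, DOGE_AR, steps=40)
print(round(path[0], 4), round(path[-1], 4))
```

Because the absolute values of the coefficients sum to less than one, the forecast differences shrink toward zero, so the price path wiggles for a few steps and then settles at a constant. This is a property of the model rather than a prediction that the market itself will be calm.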

Key Takeaways:

• Trend Detection:
o The ARIMA (5,1,0) model effectively predicts a stable price trend after early
October. This suggests that the time series has a mean-reverting tendency, and
the model has captured this long-term stability well.
• Volatility Capture:
o While the model's performance is strong in terms of accuracy (as indicated by
the low MSE), it might be missing out on capturing the initial short-term
volatility. The forecast shows that after early fluctuations, the model expects
little to no volatility moving forward, which might not fully represent real-world
scenarios with continuous fluctuations.
• Further Improvements:
o If capturing short-term price swings is important, further tuning (e.g., adding
more autoregressive terms or experimenting with alternative models like
SARIMA or even LSTM) could improve the model's ability to handle volatility.
o Alternative models that are more sensitive to short-term changes could offer
better predictions of fluctuations while maintaining long-term stability.

Model Performance Comparison

MODEL   MAE      MSE          RMSE     MAPE (%)
ARIMA   0.0056   4.8661e-05   0.0069   5.2429
LSTM    0.0048   4.3437e-05   0.0065   4.4875

❖ ARIMA Model

Inference:

• The MAE of 0.0056 is slightly higher than LSTM's, implying that the ARIMA model's
predictions are marginally further from the actual values.
• The MSE of 4.8661e-05 and RMSE of 0.0069 indicate that ARIMA's predictions are
slightly less accurate than LSTM's in this particular case.
• With a MAPE of 5.2429%, ARIMA performs slightly worse than LSTM in percentage
terms, suggesting it may not handle this dataset quite as well.

❖ LSTM Model:

Inference:

• The LSTM model demonstrates solid performance, with a MAE of 0.0048 and an
MSE of 4.3437e-05, which indicates relatively accurate predictions.
• The RMSE of 0.0065 suggests that the LSTM model provides a fairly close
approximation of the actual values.
• The MAPE of 4.4875% means that predictions are off by about 4.5% on average,
so the model is reliable in percentage terms.
• Overall, the LSTM model shows good accuracy and handles this dataset effectively.

Conclusion:

• LSTM outperforms ARIMA on all four metrics, although the margins are small (for
example, a MAPE of 4.49% against 5.24%).

Thus, for future forecasting tasks on this dataset, LSTM is the preferred choice based on
these metrics.
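For reference, the four metrics used in the comparison can be written out explicitly. This is a plain-Python sketch of the standard definitions, not the project's evaluation code, and the sample values in the usage lines are made up.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the data."""
    return math.sqrt(mse(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


actual = [100.0, 200.0]     # hypothetical observed values
predicted = [110.0, 180.0]  # hypothetical forecasts
print(mae(actual, predicted))   # 15.0
print(mse(actual, predicted))   # 250.0
print(mape(actual, predicted))  # 10.0
```

MAE, MSE and RMSE depend on the price scale, so only MAPE can be compared directly between coins that trade at very different price levels.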

3.10) TONCOIN

Stationarity test:

ADF TEST:

H0: The time series possesses a unit root and is non-stationary.

H1: The time series does not have a unit root (the series is stationary).

KPSS TEST:

H0: The time series is stationary around a deterministic trend (trend-stationary)

H1: The time series is non-stationary (has a unit root).

TEST   P Value    Sig. Value   Test Stat    1%         5%          10%     Decision
ADF    0.891952   0.05         -0.500329    -3.44082   -2.866160   -       Fail to reject H0
KPSS   0.010000   0.05         2.701556     0.739      0.463       0.347   Reject H0

The series is likely to be non-stationary.

Differencing:

TEST   P Value       Sig. Value   Test Stat    1%        5%         10%     Decision
ADF    2.50827e-07   0.05         -5.920892    -3.4408   -2.86160   -       Reject H0
KPSS   0.100000      0.05         0.167286     0.739     0.463      0.347   Fail to reject H0

The series is likely to be stationary.
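First differencing, the transformation applied between the two tables, replaces each value with its change from the previous value. A minimal sketch follows; the helper names and the sample prices are illustrative only.

```python
def difference(series, order=1):
    """Apply first differencing `order` times."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def undifference(first_value, diffs):
    """Invert one round of differencing given the first original value."""
    levels = [first_value]
    for d in diffs:
        levels.append(levels[-1] + d)
    return levels


prices = [5.0, 5.5, 5.25, 6.0]  # hypothetical prices
changes = difference(prices)
print(changes)                           # [0.5, -0.25, 0.75]
print(undifference(prices[0], changes))  # [5.0, 5.5, 5.25, 6.0]
```

One round of differencing corresponds to d = 1 in ARIMA(p, d, q); the differenced series is one observation shorter than the original.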

Model 1: LSTM

Inference from LSTM Model Training and Predictions:

❖ Training Epochs and Loss:


➢ The model was trained for 50 epochs, with the loss starting at 0.0165 and decreasing
consistently to around 0.0021 by the 10th epoch. This reduction in loss indicates that
the model effectively learned the patterns in the data, minimizing prediction errors over
time.
➢ Small Loss Values: The final loss values indicate that the model fits the training data
well. The drop in loss from 0.0165 to 0.0021 suggests that the model is capable of
capturing the underlying trend.
➢ Potential Overfitting: Despite the improvement in loss, it is important to check for
overfitting by comparing training loss with validation loss, as the model may have
reached its limit in terms of additional accuracy gains by the 50th epoch.
❖ Model Evaluation - RMSE:
➢ RMSE: 0.0612: The Root Mean Squared Error (RMSE) of 0.0612 shows that the model
has decent accuracy in capturing price movements. RMSE values closer to zero indicate
better fits, and here the RMSE suggests the LSTM model is able to predict Toncoin prices
reasonably well.
❖ Accuracy and Trend Capturing:

➢ Trend Alignment: The model successfully captured the price trends and patterns seen
in the historical data, as demonstrated by how closely the predicted prices (orange line)
follow the actual prices (blue line) in the chart. However, some noise or volatility in
actual prices is smoothed out in the predictions, leading to smoother curves in the
forecasted prices.

➢ Smooth Fit: The smoother nature of the predicted prices indicates that the LSTM
model was effective in filtering out short-term volatility while capturing broader market
trends, which could be beneficial for long-term investors.

❖ Predicted Prices for the Next 40 Days:

➢ General Trend: Starting from 5.73 on September 28, 2024, the predicted prices show
a steady upward trend, reaching approximately 6.02 by November 6, 2024. The growth
is gradual, indicating stable price movement.
➢ Prediction Characteristics: The forecast does not show any significant price volatility
or dramatic changes, suggesting that the model predicts a relatively stable market for
the next 40 days, with prices fluctuating within a narrow range.
❖ Key Observations:
➢ Good Fit for Smooth Trends: The LSTM model is highly effective at learning long-
term price trends while maintaining low prediction errors, as seen by the low loss and
RMSE values. This is useful for generating forecasts where long-term stability is
expected.
➢ Limited Volatility Capture: Although the model does well in predicting the overall
trend, it appears less capable of capturing short-term volatility. This could pose a
limitation in more volatile or speculative market conditions where rapid price
movements are common.
❖ In summary, the LSTM model provides a reliable forecast for the next 40 days, predicting
steady price growth with limited volatility. While the model is well-suited for capturing
broad trends, its prediction smoothness may overlook short-term price fluctuations, which
could be significant in highly dynamic markets.

Model 2: ARIMA Model

Inference for ARIMA Model (SARIMAX Results):


❖ Model Overview:

• The model used is ARIMA (5,1,0), which implies:


o AR (5): The autoregressive component uses five lagged values.
o I (1): The time series has been differenced once to make it stationary.
o MA (0): No moving average component is considered.

❖ Key Parameters:

• AR Coefficients:
o ar.L1 (-0.0214): The first lag is small and statistically insignificant, indicating
it has little impact on the current price change.
o ar.L2 (-0.0240): Similarly, the second lag is insignificant with a minimal
impact.
o ar.L3 (-0.0294): The third lag also lacks significance, suggesting little
influence on price changes.
o ar.L4 (-0.0608): Statistically significant (p = 0.011), meaning the fourth lag
has a negative effect on the current price change.
o ar.L5 (-0.0523): Also significant (p = 0.025), indicating a small negative
influence from the fifth lag.
• Sigma² (0.0267): Represents the variance of the error term, equivalent to a residual
standard deviation of about 0.16. Relative to a price near 6, this suggests fairly low
noise in the one-step-ahead fit.

❖ Model Fit Metrics:

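• AIC (-464.183): The Akaike Information Criterion is used to compare candidate models
fitted to the same data, with lower values preferred. On its own it does not show
whether the fit is good.
• BIC (-437.732): The Bayesian Information Criterion is higher than AIC because it
penalizes the number of parameters more heavily. It is read the same way, with lower
values preferred.
• Log Likelihood (238.091): Higher values indicate a better fit when comparing models
on the same data; the sign alone says nothing about fit quality.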

❖ Residual Diagnostics:

• Ljung-Box (Q = 0.00, Prob = 0.98): This p-value suggests no significant
autocorrelation in the residuals, implying the model has successfully captured patterns
in the data.
• Jarque-Bera (JB = 1586.37, Prob = 0.00): The residuals are not normally distributed,
which may indicate skewness or kurtosis in the data that has not been fully captured by
the model.
• Heteroskedasticity (H = 19.71, Prob = 0.00): This test indicates there is
heteroskedasticity, suggesting that the variance changes over time.
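The Jarque-Bera statistic measures how far the skewness and kurtosis of the residuals depart from those of a normal distribution. The sketch below uses the textbook formula JB = n/6 (S² + (K - 3)²/4) with a chi-square reference distribution with two degrees of freedom; it is an illustration of the test, not the exact routine that produced the reported figure, and the sample data are made up.

```python
import math


def jarque_bera(residuals):
    """Return (JB statistic, p-value) for a list of residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6 * (skew ** 2 + (kurt - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)  # chi-square upper tail, 2 degrees of freedom
    return jb, p_value


jb, p = jarque_bera([1, 2, 3, 4, 5])  # hypothetical residuals
print(round(jb, 4))  # 0.3521

# A statistic as large as the reported 1586.37 gives a p-value of effectively zero
print(math.exp(-1586.37 / 2) < 1e-300)  # True
```

Non-normal residuals do not bias the point forecasts, but they do mean that prediction intervals based on a normal assumption will understate the chance of large moves.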

❖ Forecasted Prices:

• The forecasted prices show minimal fluctuations and suggest a stable price around 6.84
over the forecast horizon. This indicates that the model expects the price to stabilize
without significant volatility.

❖ Model Performance:

• Mean Squared Error (MSE = 1.894): This corresponds to an RMSE of about 1.38 on a
price of roughly 6, so the forecast error is sizeable. The model may miss short-term
volatility and fluctuations because of its simple ARIMA structure.

❖ Key Takeaways:

• Trend Detection: The ARIMA (5,1,0) model forecasts a stable price trend in the
coming days, which may suggest a return to the mean after fluctuations.
• Volatility Capture: Although the model predicts overall trends well, it misses short-
term volatility, as seen from the non-normal residuals (Jarque-Bera test).
• Further Model Improvements: Further tuning (e.g., adding seasonal components or
experimenting with other models like SARIMA or machine learning models like
LSTM) may enhance predictive accuracy, especially in volatile markets like
cryptocurrencies.

Conclusion:

The ARIMA (5,1,0) model provides a reasonably good fit for predicting general trends in
cryptocurrency prices, but may not capture rapid short-term fluctuations due to its simplicity.

Models Performance Comparison:

MODEL   MAE         MSE          RMSE         MAPE (%)
ARIMA   1.2744798   1.89440224   1.37637285   23.93294499
LSTM    4.3961050   19.6027089   4.42749466   78.69529471

Inference for the Models (ARIMA and LSTM):

❖ ARIMA Model:

• Mean Absolute Error (MAE: 1.27): The ARIMA model has a relatively high MAE,
indicating that its predictions deviate more from the actual values on average.
• Mean Squared Error (MSE: 1.89): A higher MSE value suggests that the ARIMA
model produces predictions with significant error.
• Root Mean Squared Error (RMSE: 1.38): The RMSE also reflects a considerable
difference between predicted and actual values, implying that the ARIMA model's
predictions are not highly accurate.

• Mean Absolute Percentage Error (MAPE: 23.93%): With a MAPE close to 24%,
the ARIMA model's predictions are off by a considerable percentage on average, which
could limit its reliability in this case.
• Conclusion: The ARIMA model shows moderate prediction performance. It is
somewhat accurate but produces considerable error across all metrics, meaning it might
not be suitable for highly accurate short-term predictions in this dataset.

❖ LSTM Model:

• Mean Absolute Error (MAE: 4.40): The LSTM model has a much higher MAE
compared to ARIMA, indicating that the predictions deviate more significantly from the
actual values.
• Mean Squared Error (MSE: 19.60): A very high MSE reveals that the LSTM model
is producing large errors in its predictions, with much worse accuracy compared to
ARIMA.
• Root Mean Squared Error (RMSE: 4.43): Similarly, the high RMSE confirms that the
LSTM model has substantial variance between predicted and actual values.
• Mean Absolute Percentage Error (MAPE: 78.70%): The MAPE indicates that the
LSTM model’s percentage-based error is around 79%, which is extremely high, showing
poor predictive performance.
• Conclusion: The LSTM model significantly underperforms in this case, with large
errors across all metrics. Despite its potential in other time series tasks, it is not a good
fit for this dataset or may require significant adjustments in tuning to improve
performance.

Overall Conclusion:

• ARIMA Model: ARIMA is the better of the two models on every metric, but with a
MAPE of about 24% its forecasts are only moderately accurate.
• LSTM Model: LSTM performs worse across all metrics, with large prediction
errors and high percentage-based inaccuracies, making it unsuitable for this dataset.
• Recommendation: ARIMA can be chosen for this specific task, as its errors are far
lower than LSTM's. LSTM, on the other hand, would need significant improvements
or reconfiguration to provide useful results.

3.11) TRON

Stationarity test:

ADF TEST:

H0: The time series possesses a unit root and is non-stationary.

H1: The time series does not have a unit root (the series is stationary).

KPSS TEST:

H0: The time series is stationary around a deterministic trend (trend-stationary).

H1: The time series is non-stationary (has a unit root).

TEST   P Value    Sig. Value   Test Stat    1%         5%         10%     Decision
ADF    0.376673   0.05         -1.807645    -3.43298   -2.86270   -       Fail to reject H0
KPSS   0.010000   0.05         5.079495     0.739      0.463      0.347   Reject H0

The series is likely to be non-stationary.

Differencing

TEST   P Value     Sig. Value   Test Stat   1%       5%        10%     Decision
ADF    1.387e-19   0.05         -10.8685    -3.432   -2.8627   -       Reject H0
KPSS   0.100000    0.05         0.030382    0.739    0.463     0.347   Fail to reject H0

The series is likely to be stationary.

Model 1: LSTM

Inference from LSTM Model Training and Predictions

❖ Training Epochs and Loss:

Epochs 1-50: The model trained for 50 epochs, with a consistent reduction in loss,
showing learning and improvements over time.

➢ Initial Loss: It started with a loss of 0.0083 in the first epoch, dropping to 0.0016 by
the 10th epoch and further reducing to 0.0013 by the end of training.
➢ Small Loss Values: These values indicate that the model has been effective in
minimizing error, suggesting a good fit.
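➢ Potential Overfitting: The loss continued to decrease but flattened in the later
epochs (approaching epoch 50), which suggests that further training would not
significantly improve performance. Whether the model overfits can only be judged
by comparing training loss with validation loss.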
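One common way to act on a loss plateau is early stopping on a validation loss: training ends once the validation loss has failed to improve for a set number of epochs. The report does not mention a validation split, so the sketch below is a generic illustration; the function name, the `patience` and `min_delta` parameters, and the loss values are all hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=1e-4):
    """Return the 1-based epoch with the best validation loss before stopping."""
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0  # meaningful improvement
        else:
            waited += 1
            if waited >= patience:
                break  # no improvement for `patience` epochs
    return best_epoch


# Hypothetical validation losses: improving at first, then drifting upward
val_losses = [0.0083, 0.0040, 0.0016, 0.0013, 0.0016, 0.0017, 0.0018, 0.0019]
print(early_stopping_epoch(val_losses))  # 4
```

A validation loss that rises while the training loss keeps falling is the usual signature of overfitting; a training loss that merely flattens is not.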
❖ Model Evaluation - RMSE:

RMSE: The model's RMSE for the prediction was 0.03705, which is low. This low value
suggests that the model is relatively accurate in predicting TRON price movements based
on past trends. RMSE is a strong indicator of the model's fit, and lower values point to a
better fit.

❖ Accuracy and Trend Capturing:
• Trend Alignment: The predicted prices align closely with actual trends, particularly
in periods with significant price fluctuations, demonstrating that the LSTM model
successfully captured major patterns.
• Smoothing of Volatility: The predicted prices tend to smooth out short-term
volatility, showing a less erratic trend compared to the actual data. This suggests that
while the model captures long-term trends, it may not be fully sensitive to rapid
market changes.
❖ Predicted Prices for the Next 40 Days:

❖ General Trend:
➢ Starting at 0.1404 on 2024-09-28, the predicted prices exhibit a steady decline over
the first two weeks, reaching around 0.084 by 2024-11-06.

➢ This indicates a downtrend over the next 40 days, with no significant upward
movement or price recovery expected.
❖ Prediction Characteristics:
➢ The model does not predict substantial volatility. Instead, it shows a continuous,
gradual price decline.
➢ Toward the end of the forecast, the predicted prices converge to a narrow range
between 0.0798 and 0.0855, reflecting a stable and less volatile outlook for the
market.
❖ Key Observations:

Good Fit for Smooth Trends: The model has effectively captured the overall trend and
patterns of the time series. The low RMSE and loss values reinforce its ability to predict price
movements with relatively high accuracy.

Limited Volatility Capture: The predictions show reduced volatility, which could be a
limitation, especially if the actual market experiences more chaotic and sudden price shifts.
This suggests that the model is better suited for smoother market conditions and might need
adjustments for highly volatile markets.

Summary:

The LSTM model demonstrates solid performance in capturing general trends, but it exhibits
limitations in responding to short-term volatility. Over the next 40 days, the predicted decline
in prices points to a bearish outlook, with stable and narrow price movements toward the end
of the forecast period.

MODEL 2: ARIMA

Inference from ARIMA Model (SARIMAX Results):

❖ Model Overview:

• The model used is ARIMA (5,1,0), which represents:


o AR (5): The autoregressive part uses five lagged values of the time series.
o I (1): The time series has been differenced once to ensure stationarity.
o MA (0): No moving average component is included in the model.

❖ Key Parameters:

• AR Coefficients:
o ar.L1 (-0.0472): The first lag has a statistically significant negative influence
(p-value = 0.000) on the current price change.
o ar.L2 (0.1324): The second lag has a positive and highly significant impact
on the current price change.
o ar.L3 (0.0872): Similarly, the third lag is also positive and significant.
o ar.L4 (-0.1278): The fourth lag has a strong negative effect, with a high
significance level (p-value = 0.000).
o ar.L5 (-0.0343): The fifth lag is negative and statistically significant (p-value
= 0.000).
• Sigma² (1.544e-05): This is the variance of the error term. Its square root, about
0.0039, is small relative to a price near 0.13, indicating low noise in the
one-step-ahead fit.

❖ Model Fit Metrics:

• AIC (-20328.903): The Akaike Information Criterion is used to compare candidate
models on the same data, with lower values preferred; on its own it does not show
whether the fit is good.
• BIC (-20294.034): The Bayesian Information Criterion is slightly higher than AIC
because it penalizes model complexity more heavily. Lower values are preferred.
• Log Likelihood (10170.452): A higher log likelihood indicates a better fit when
comparing models on the same data.

❖ Residual Diagnostics:

• Ljung-Box (Q = 0.01, Prob = 0.94): The high p-value indicates that there is no
significant autocorrelation in the residuals, meaning the model has effectively
captured the patterns in the time series data.
• Jarque-Bera (JB = 1048252.76, Prob = 0.00): The residuals are not normally
distributed, as indicated by the low p-value, suggesting skewness or high kurtosis.
• Heteroskedasticity (H = 0.15, Prob(H) = 0.00): There is evidence of
heteroskedasticity, meaning the variance of the residuals changes over time, which
might affect prediction accuracy.
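One way to read the H statistic is as a break-variance ratio: the sum of squared residuals in the last third of the sample divided by the sum in the first third. That is an assumption about how the reported figure was produced, stated here for illustration; under it, H = 0.15 would mean the residuals were far less variable late in the sample than early on. The function name and the sample residuals are hypothetical.

```python
def break_variance_h(residuals):
    """Ratio of squared residuals: last third of the sample over the first third."""
    h = len(residuals) // 3
    if h == 0:
        raise ValueError("need at least 3 residuals")
    first = sum(r * r for r in residuals[:h])
    last = sum(r * r for r in residuals[-h:])
    return last / first


# Made-up residuals: large early on, smaller at the end
residuals = [2.0, -2.0, 2.0, 0.5, 0.1, -0.3, 1.0, -1.0, 1.0]
print(break_variance_h(residuals))  # 0.25
```

A ratio near 1 indicates stable variance; values far above or below 1 indicate that volatility shifted over the sample, which is where a GARCH-type model becomes relevant.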

❖ Forecasted Prices:

• The forecasted prices remain relatively stable around 0.1349 over the next 40 days,
showing minimal fluctuations. This indicates the model is capturing a consistent
trend, without large deviations or volatility.
• Price Stabilization: The ARIMA model suggests that the price will stabilize around
this level, without anticipating any significant spikes or drops.

❖ Model Performance:

• Mean Squared Error (MSE = 0.0003668): The MSE is very low, indicating the
model provides a good fit. The low error suggests accurate predictions with minimal
variance from actual values.

❖ Key Takeaways:

• Trend Prediction: The ARIMA (5,1,0) model forecasts stable trends in the future,
indicating that the series may be mean-reverting after fluctuations.
• Volatility: While the model predicts the general trend well, it may not capture short-
term volatility, as suggested by the high Jarque-Bera statistic and the evidence of
heteroskedasticity.
• Further Improvements: The model could potentially be improved by incorporating
seasonal or volatility components (e.g., SARIMA or GARCH) or by exploring more
complex machine learning models (like LSTM) for better handling of market
dynamics.

Conclusion:

The ARIMA (5,1,0) model provides a good fit for the data and predicts a stable price trend
for the next 40 days. However, its ability to predict sharp short-term movements may be
limited due to residual skewness and kurtosis. Further model tuning or alternative approaches
may improve performance in more volatile market conditions.

Models Performance Comparison:

MODEL   MAE            MSE           RMSE           MAPE (%)
ARIMA   0.018588825    0.000366180   0.0191524939   12.02954170788
LSTM    0.1513101731   0.022915655   0.1513791782   98.551004800

Inference for the Models (ARIMA and LSTM):

❖ ARIMA Model:

• Mean Absolute Error (MAE: 0.0186): The ARIMA model has a relatively low
MAE, indicating that, on average, the model's predictions are quite close to the actual
values.
• Mean Squared Error (MSE: 0.0003668): The low MSE suggests that the ARIMA
model predicts with minimal error, showing it performs well for this time series data.
• Root Mean Squared Error (RMSE: 0.0192): The RMSE is also low, confirming
that the ARIMA model's predictions are accurate with minimal variance from actual
values.
• Mean Absolute Percentage Error (MAPE: 12.03%): The MAPE suggests that, on
average, the ARIMA model’s predictions are off by about 12%, which is a reasonably
accurate performance for forecasting crypto prices.

• Conclusion: The ARIMA model provides a good balance between accuracy and
stability. While not perfect, it is reliable for general trend predictions but could
potentially be further improved for capturing short-term fluctuations.

❖ LSTM Model:

• Mean Absolute Error (MAE: 0.1513): The LSTM model has a much higher MAE
compared to ARIMA, indicating that the average deviation from actual values is
larger.
• Mean Squared Error (MSE: 0.0229): The LSTM model has a significantly higher
MSE than ARIMA, showing that the predictions are less accurate and exhibit more
error.
• Root Mean Squared Error (RMSE: 0.1514): Similarly, the RMSE is higher for
LSTM, indicating that its predictions are further from actual values.
• Mean Absolute Percentage Error (MAPE: 98.55%): A very high MAPE indicates
that the LSTM model performs poorly in terms of percentage-based accuracy. It
suggests that on average, the LSTM model's predictions are off by nearly 99%.
• Conclusion: Despite LSTM's potential for capturing complex patterns in time series
data, it performed poorly in this case, likely due to the nature of the dataset or model
configuration.

Overall Conclusion:

• ARIMA Model: ARIMA performs well, providing reasonably accurate predictions
and a strong fit for the data.
• LSTM Model: LSTM significantly underperforms in all metrics, suggesting it is not
well-suited for this dataset or requires further tuning.
• Recommendation: For this specific task, ARIMA is the preferred model due to its
strong performance, while LSTM is not recommended based on these metrics.
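The model choice for each coin amounts to picking the model with the lowest error on a chosen metric. The sketch below applies that rule to the Toncoin and TRON figures reported in the comparison tables (rounded); the dictionary layout and the function name are assumptions made for illustration.

```python
# Rounded metrics from the Toncoin and TRON comparison tables
REPORTED = {
    "Toncoin": {
        "ARIMA": {"MAE": 1.2745, "RMSE": 1.3764, "MAPE": 23.93},
        "LSTM": {"MAE": 4.3961, "RMSE": 4.4275, "MAPE": 78.70},
    },
    "TRON": {
        "ARIMA": {"MAE": 0.0186, "RMSE": 0.0192, "MAPE": 12.03},
        "LSTM": {"MAE": 0.1513, "RMSE": 0.1514, "MAPE": 98.55},
    },
}


def preferred_model(metrics_by_model, metric="MAPE"):
    """Return the model name with the lowest value of the chosen error metric."""
    return min(metrics_by_model, key=lambda name: metrics_by_model[name][metric])


for coin, metrics in REPORTED.items():
    print(coin, preferred_model(metrics))
# Toncoin ARIMA
# TRON ARIMA
```

MAPE is used as the default here because it is scale-free; when metrics disagree, the choice of metric should follow from how the forecast will be used.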

CHAPTER 4: FINDINGS, IMPLICATIONS AND CONCLUSION

4.1) Findings:

CRYPTOS            APPROPRIATE MODEL
Bitcoin (BTC)      ARIMA
Ethereum (ETH)     LSTM
BNB (BNB)          LSTM
Tether (USDT)      LSTM
XRP (XRP)          LSTM
Dogecoin (DOGE)    LSTM
Solana (SOL)       LSTM
USDC (USDC)        ARIMA
Toncoin (TON)      ARIMA
Tron (TRX)         ARIMA

❖ ARIMA Model:

ARIMA (AutoRegressive Integrated Moving Average) is a statistical model that works well for
linear, stationary time-series data. It assumes a certain amount of autocorrelation in the
data, which means future values can be predicted by linear combinations of past values. This
model also performs well with short-term dependencies in data.

Reasons why some cryptocurrencies suit ARIMA:

• USDC, Bitcoin, Toncoin, TRON: For these cryptocurrencies ARIMA produced the lower
forecast errors; most of them show more stable, consistent price movements than the
more volatile coins.
• USDC (USD Coin) is a stablecoin, which means it's designed to maintain a stable value
over time, typically pegged to the U.S. dollar. Its price fluctuations are minimal and
predictable, making it more suitable for a linear model like ARIMA.
• Bitcoin's price series is non-stationary but becomes stationary after differencing,
so ARIMA can identify patterns over time and make reasonable forecasts based
on past data.
• Toncoin: ARIMA's errors (a MAPE of about 24%) were far lower than LSTM's (about
79%) on this dataset, although the fit is only moderate.
• TRON has less extreme volatility compared to many altcoins, making its price patterns
more amenable to linear modelling over certain periods.

❖ LSTM Model:

LSTM (Long Short-Term Memory) is a type of deep learning model particularly suited for
non-linear, non-stationary time-series data. It’s powerful at capturing long-term
dependencies in the data, making it ideal for handling complex patterns, sudden shifts, and
high volatility. LSTM can model both short-term and long-term trends by learning from the
data over time.

Reasons why some cryptocurrencies suit LSTM:

• Solana, ETH, XRP, Dogecoin, BNB, Tether: For these cryptocurrencies LSTM produced
the lower forecast errors; most of them are more volatile and tend to exhibit complex
price behaviours with non-linear trends.
• Solana, ETH (Ethereum), XRP: These are high-growth, highly volatile altcoins. Their
prices are influenced by a wide range of factors such as network updates, partnerships,
and market sentiment, which leads to non-linear and unpredictable trends. LSTM's
ability to capture long-term dependencies and non-linearity makes it more effective for
these assets.
• Dogecoin: Known for its sudden surges and social media-driven price jumps,
Dogecoin's price movements are highly non-linear and hard to predict with a simple
linear model, making LSTM a better fit.
• BNB: This cryptocurrency may exhibit sporadic price spikes due to market activities
or ecosystem changes, which LSTM can adapt to by capturing long-term
dependencies.
• Tether (USDT): While Tether is a stablecoin like USDC, its trading volume and
liquidity patterns can have short-term fluctuations that are better captured by LSTM
due to its ability to handle more complex, non-linear trends.

Key Differences in Model Performance:

• ARIMA excels with cryptocurrencies that have more predictable, linear price
movements, making it suitable for stablecoins and relatively less volatile assets.
• LSTM performs better with volatile, non-linear time series, capturing more complex
patterns and shifts in high-growth or volatile cryptocurrencies.

4.2) Implications

Ethereum:

The non-stationary nature of the Ethereum price series requires transformations like
differencing for accurate modelling, enabling reliable predictions with ARIMA. However,
LSTM outperforms ARIMA in price prediction by effectively capturing long-term
dependencies and complex, non-linear relationships, resulting in lower prediction errors
(RMSE = 0.0371, MAE = 0.0244) and better adaptability to market fluctuations. While ARIMA
is suitable for short-term trends, its limitations with extreme volatility make LSTM the superior
choice for forecasting in the cryptocurrency market.

Bitcoin:

The stationarity tests (ADF and KPSS) reveal that Bitcoin's price series is non-stationary,
necessitating differencing for accurate forecasting, particularly in the volatile cryptocurrency
market. The ARIMA model stands out for its simplicity, interpretability, and effectiveness in
handling non-stationary data, boasting superior performance metrics like MAE, MSE, and
RMSE, along with robust diagnostic capabilities. In comparison, the LSTM model achieves an
RMSE of 0.0356, demonstrating good predictive accuracy but underestimating sharp upward
movements and lagging during rapid price changes. While ARIMA generally outperforms
LSTM in handling Bitcoin's price dynamics, both models exhibit limitations in predicting
sudden fluctuations inherent to the cryptocurrency market.
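
For reference, the three error metrics named above can be computed as follows. This is a sketch under assumptions: the helper name and the sample values are hypothetical and are not taken from the project's workings.

```python
import math


def forecast_errors(actual, predicted):
    """Return MAE, MSE and RMSE for two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse)}


# Hypothetical values, for illustration only.
actual = [10.0, 12.0, 11.0, 13.0]
predicted = [9.0, 12.0, 13.0, 13.0]
metrics = forecast_errors(actual, predicted)
```

Because RMSE squares each error before averaging, it penalises the large misses that occur during rapid price changes more heavily than MAE does. The metrics are only comparable between models when both are computed on the same scale (for example, both on raw prices or both on normalised prices).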

Solana:

Analysis of Solana's time series data indicates non-stationarity, confirmed by ADF and KPSS
tests, with differencing resulting in stationarity. The LSTM model outperforms ARIMA
across all evaluation metrics (MSE, RMSE, MAE, and MAPE), effectively capturing trends
and indicating a stable price outlook, though potential overfitting is suggested by plateauing
training loss. In contrast, the ARIMA(5, 1, 0) model fits mean-reverting behavior and indicates
stable trends but struggles with short-term volatility, as shown by a high Jarque-Bera test
statistic indicating residual non-normality. While LSTM demonstrates superior predictive
power, a complementary approach that combines insights from both models could enhance
forecasting accuracy and provide a deeper understanding of market dynamics for informed
trading strategies.
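
The Jarque-Bera statistic referred to above measures how far the skewness and kurtosis of the residuals depart from those of a normal distribution. The sketch below shows the standard formula; the function name and the residual values are hypothetical and only illustrate the calculation.

```python
def jarque_bera(residuals):
    """Return (JB statistic, skewness, kurtosis) of a residual sequence.

    JB = n / 6 * (S**2 + (K - 3)**2 / 4). Under normality the statistic is
    approximately chi-square with 2 degrees of freedom (5% critical value
    about 5.99), so large values point to non-normal residuals.
    """
    n = len(residuals)
    if n < 2:
        raise ValueError("need at least two residuals")
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n
    m3 = sum((r - mean) ** 3 for r in residuals) / n
    m4 = sum((r - mean) ** 4 for r in residuals) / n
    if m2 == 0:
        raise ValueError("residuals have zero variance")
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    return jb, skewness, kurtosis


# Hypothetical residuals: a symmetric set, and a set with one large spike.
jb_symmetric, skew_symmetric, kurt_symmetric = jarque_bera([-2, -1, 0, 1, 2])
jb_spike, _, _ = jarque_bera([0] * 19 + [10])
```

A single large residual, such as an unmodelled price spike, is enough to push the statistic far above the critical value, which is consistent with the reading that ARIMA residuals are non-normal in volatile periods.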

USDC:

The analysis of USDC price trends indicates that the ARIMA model, with its low error metrics,
is effective for predicting trends in stable market conditions, making it a reliable tool for
financial analysts and traders in strategic planning and risk management. Conversely, while the
LSTM model underperformed in accuracy, its ability to model complex, non-linear patterns
offers potential for capturing short-term price fluctuations, particularly in volatile markets. This
suggests that integrating additional features into LSTM models could enhance predictive
accuracy. Overall, the findings emphasize the importance of selecting suitable forecasting
methods based on data characteristics and the need for thorough model evaluation to improve
performance in dynamic financial markets.

Tether:

The stationarity of the Tether price time series, confirmed by ADF and KPSS tests, enhances
forecasting accuracy and model selection, making reliable predictions possible with techniques
like ARIMA and LSTM. The LSTM model excels in capturing long-term dependencies and
trends, showing low error rates and making it particularly effective for informed trading
decisions in the volatile cryptocurrency market. Conversely, while the ARIMA model offers
valuable insights into overall market behavior and long-term trends, its mean-reverting nature
limits its effectiveness during high volatility and rapid price fluctuations. Thus, while LSTM
is preferred for accurate forecasting in dynamic environments, ARIMA can complement it
by providing foundational insights into the price structure.

BNB:

The analysis of BNB price prediction highlights critical implications regarding stationarity,
with both ADF and KPSS tests confirming the original time series was non-stationary.
Differencing was successfully applied to stabilize the series, enhancing the reliability of
subsequent modelling efforts. The Long Short-Term Memory (LSTM) model demonstrated
impressive predictive capabilities, achieving a Root Mean Square Error (RMSE) of 0.0122,
although it struggled to capture sharp price surges. In contrast, the ARIMA model showed less
favourable results, with high Mean Squared Error (MSE) and RMSE values, indicating limited
forecasting reliability. While ARIMA passed the Ljung-Box test for autocorrelation, the
residuals indicated non-normality, suggesting potential enhancements like integrating
Generalized Autoregressive Conditional Heteroskedasticity (GARCH) to better capture market
volatility. Overall, the LSTM model is prioritized for BNB price forecasting due to its
superior performance, with continuous monitoring and retraining essential to adapt to the
dynamic nature of the cryptocurrency market.
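
The Ljung-Box check mentioned above tests whether model residuals are still autocorrelated. The sketch below computes the Q statistic directly from its definition; the function names and the residual sequence are hypothetical, and in practice a statistics library would also supply the p-value.

```python
def autocorrelation(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return num / denom


def ljung_box_q(x, max_lag):
    """Ljung-Box Q = n (n + 2) * sum_{k=1..h} rho_k**2 / (n - k).

    Under the null of no autocorrelation Q is approximately chi-square
    distributed; for ARIMA residuals the degrees of freedom are usually
    reduced by the number of fitted AR and MA terms.
    """
    n = len(x)
    return n * (n + 2) * sum(
        autocorrelation(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )


# Hypothetical residuals that alternate in sign, i.e. strongly autocorrelated.
residuals = [1.0, -1.0] * 10
q_stat = ljung_box_q(residuals, max_lag=1)
```

A model "passes" the test when Q is small relative to the chi-square critical value, meaning no significant autocorrelation is left in the residuals. Passing says nothing about normality or changing variance, which is why a volatility model such as GARCH can still be a useful addition.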

Toncoin:

The analysis of TONCOIN price prediction through stationarity testing and model evaluation
revealed crucial insights into the data characteristics and model performances. Both the ADF
and KPSS tests confirmed that the original time series was non-stationary, necessitating
differencing to stabilize the data for reliable forecasting. Once differenced, the series passed
both tests, indicating it was stationary and suitable for modelling with ARIMA and LSTM,
which depend on consistent statistical properties. The LSTM model showed potential in
learning underlying patterns, as indicated by a significant decrease in training loss from 0.0165
to 0.0021 over 50 epochs; however, a high RMSE of 4.43 suggested challenges in capturing
short-term volatility. Conversely, the ARIMA(5,1,0) model demonstrated better performance,
with lower MAE (1.27) and MSE (1.89), making it more effective for detecting stable price
trends, despite limitations in handling rapid fluctuations. Overall, ARIMA emerged as the
more reliable choice for forecasting TONCOIN prices, especially given its adaptability to the
nature of the time series data.
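
Since the two models are reported with different metrics here (RMSE for LSTM, MAE and MSE for ARIMA), a like-for-like comparison needs the ARIMA MSE converted to an RMSE. The short calculation below uses only the figures quoted above and assumes, as the text implies but does not state, that both models were evaluated on the same price scale.

```python
import math

# Figures quoted in the text above.
arima_mse = 1.89
lstm_rmse = 4.43

# RMSE is the square root of MSE, so the two models can be compared directly.
arima_rmse = math.sqrt(arima_mse)
rmse_ratio = lstm_rmse / arima_rmse

print(f"ARIMA RMSE: {arima_rmse:.2f}, LSTM RMSE: {lstm_rmse:.2f}")
print(f"LSTM error is about {rmse_ratio:.1f} times the ARIMA error")
```

On that assumption the ARIMA RMSE is roughly 1.37, well below the LSTM RMSE of 4.43, which supports the preference for ARIMA stated above.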

TRON:

The analysis of TRON price prediction indicates that the time series becomes stationary after
differencing: the ADF test rejects its null hypothesis of a unit root (non-stationarity), and the
KPSS test does not reject its null hypothesis of stationarity. The ARIMA model demonstrates
lower error metrics (MAE, MSE, RMSE),
effectively capturing stable trends and providing a reliable forecasting tool for relatively stable
market conditions, supported by low AIC and BIC values. In contrast, the LSTM model shows
significant limitations, with higher error metrics and poor predictive accuracy, struggling to
capture short-term volatility, which could pose risks for trading strategies. Overall, ARIMA
significantly outperforms LSTM in accuracy and reliability, making it the preferred model
for forecasting TRON prices, especially for capturing long-term trends and ensuring consistent
predictions.
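
Because the ADF and KPSS tests have opposite null hypotheses, their results are read together. The sketch below shows one way to combine the two p-values into a verdict; the function name, the 5% threshold, and the example p-values are assumptions for illustration and are not the project's outputs.

```python
def stationarity_verdict(adf_pvalue, kpss_pvalue, alpha=0.05):
    """Combine ADF and KPSS p-values into a single reading.

    ADF null:  the series has a unit root (non-stationary).
    KPSS null: the series is stationary.
    """
    adf_rejects_unit_root = adf_pvalue < alpha
    kpss_rejects_stationarity = kpss_pvalue < alpha
    if adf_rejects_unit_root and not kpss_rejects_stationarity:
        return "stationary"
    if not adf_rejects_unit_root and kpss_rejects_stationarity:
        return "non-stationary"
    # The two tests disagree, or neither is decisive.
    return "inconclusive"


# Hypothetical p-values, for illustration only.
raw_prices = stationarity_verdict(adf_pvalue=0.60, kpss_pvalue=0.01)
differenced = stationarity_verdict(adf_pvalue=0.001, kpss_pvalue=0.10)
```

With these example values the raw series is classified as non-stationary and the differenced series as stationary, which is the pattern described for most of the coins in this chapter.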

Dogecoin:

The analysis of Dogecoin's price prediction favours the LSTM model, although the reported
error metrics are close. The LSTM model's RMSE (0.0069) is marginally higher than ARIMA's
(0.0065), and ARIMA also shows slightly better MAE and MSE, suggesting its robustness in
providing reliable forecasts, especially in stable market conditions. Despite ARIMA's strong
performance, the findings suggest that the LSTM model's ability to model non-linear
relationships and capture intricate dynamics in Dogecoin's price data positions it as the
preferred choice for future forecasting tasks, particularly in volatile markets.

XRP:

The stationarity tests for XRP's price series initially indicated non-stationarity, but after
differencing, both the ADF and KPSS tests confirmed stationarity. In the model comparison,
the LSTM model, trained for 50 epochs, showed a low RMSE of 0.0117 and captured general
trends well, though it smoothed out short-term volatility. The ARIMA(5,1,0) model also
showed a good fit, with a low MSE (0.00069), capturing steady price levels but missing
short-term fluctuations. Overall, the LSTM model proved more accurate, with better
performance across metrics, making it the preferred choice for predicting XRP prices.

Crypto Index (BITW):

The LSTM model's superior performance in predicting the Bitwise 10 Crypto Index (BITW)
with 85% accuracy highlights its effectiveness in capturing the nonlinear patterns of volatile
cryptocurrency markets, offering investors better decision-making tools and risk management
strategies. In contrast, the ARIMA model’s 50% accuracy reveals its limitations in this dynamic
environment, suggesting that traditional linear models may pose substantial risks for investors.
This analysis advocates for the adoption of advanced machine learning techniques like LSTM
for more reliable forecasting and encourages financial institutions to develop robust
frameworks for real-time analysis, driving innovation in cryptocurrency asset management.
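
The text does not state how the accuracy percentages were calculated. One common choice for price forecasts is directional accuracy, the share of periods in which the forecast gets the direction of the move right. The sketch below illustrates that measure only as an assumption; the function name and the values are hypothetical and are not the project's method or data.

```python
def directional_accuracy(actual, predicted):
    """Share of steps where the predicted move has the same direction as the
    actual move, both measured from the previous actual value.

    A move of exactly zero is treated as "not up".
    """
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    hits = 0
    for t in range(1, len(actual)):
        actual_up = actual[t] - actual[t - 1] > 0
        predicted_up = predicted[t] - actual[t - 1] > 0
        if actual_up == predicted_up:
            hits += 1
    return hits / (len(actual) - 1)


# Hypothetical index levels, for illustration only.
actual = [10.0, 11.0, 10.0, 12.0, 13.0]
predicted = [10.0, 10.5, 11.3, 11.0, 12.5]
accuracy = directional_accuracy(actual, predicted)
```

Under this definition a score of 50% is what a coin flip would achieve on average, which is one way to read the gap between the two models' figures if they were computed in this manner.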

4.3) Conclusion:

I. Stationarity and Differencing:

The analysis underscores the necessity of addressing non-stationarity in price series through
differencing, as evidenced in cryptocurrencies like Ethereum, Bitcoin, and TRON. This
transformation is crucial for reliable modelling and forecasting.

However, while ARIMA excels in handling stationary data, LSTM’s architecture allows it to
learn from sequences, making it better suited for capturing the underlying dynamics of non-
stationary data once transformed.
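
Learning "from sequences" means that the price series is first cut into overlapping windows, each paired with the value that follows it. The sketch below shows that preparation step only; the function name, the window length, and the values are hypothetical, and the project's own window size is not stated here.

```python
def make_windows(series, window):
    """Split a series into (input window, next value) training pairs."""
    if window < 1 or window >= len(series):
        raise ValueError("window must be between 1 and len(series) - 1")
    inputs, targets = [], []
    for start in range(len(series) - window):
        inputs.append(series[start:start + window])
        targets.append(series[start + window])
    return inputs, targets


# Hypothetical scaled prices, for illustration only.
series = [1, 2, 3, 4, 5]
inputs, targets = make_windows(series, window=3)
```

Each input window is what the network sees, and the matching target is the value it is trained to predict. ARIMA, by contrast, works on the (differenced) series directly through its autoregressive terms.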

II. Model Performance:

LSTM outperforms ARIMA in capturing complex, non-linear relationships and long-term
dependencies for several of the cryptocurrencies studied (Ethereum, Solana, Tether, BNB, and
XRP), particularly in volatile market conditions. For example, LSTM achieves notable accuracy
(85%) when predicting BITW, against 50% for ARIMA, highlighting its ability to navigate the
intricate patterns of the cryptocurrency market.

ARIMA, while effective in simpler scenarios and the better performer for Bitcoin, USDC,
Toncoin, and TRON, demonstrates limitations in adapting to extreme price movements, often
underestimating sharp fluctuations. This is particularly evident with Bitcoin and Dogecoin,
where ARIMA captures stable trends but struggles with short-term volatility, leading to lower
predictive accuracy in rapidly changing environments.

III. Strategic Implications for Investors:

The findings advocate for a shift towards advanced machine learning techniques like LSTM
for more accurate and reliable forecasting in the cryptocurrency market. Investors and financial
institutions are encouraged to develop frameworks that leverage LSTM's capabilities,
particularly for risk management and decision-making.

ARIMA can still play a complementary role by providing foundational insights into price
structures and stable trends, especially in less volatile scenarios. Integrating both approaches
may yield enhanced forecasting accuracy and a deeper understanding of market dynamics, thus
empowering investors to make more informed trading decisions.
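
One simple way to integrate the two approaches is a weighted average of their forecasts. The sketch below is an illustration under assumptions, not a method used in this project: the function name, the weight, and the forecast values are hypothetical, and in practice the weight would be chosen on a validation set.

```python
def blend_forecasts(arima_forecast, lstm_forecast, lstm_weight=0.5):
    """Weighted average of two forecasts of equal length.

    lstm_weight is the share given to the LSTM forecast; the remainder
    goes to the ARIMA forecast.
    """
    if not 0.0 <= lstm_weight <= 1.0:
        raise ValueError("lstm_weight must be between 0 and 1")
    if len(arima_forecast) != len(lstm_forecast):
        raise ValueError("forecasts must have the same length")
    return [
        lstm_weight * l + (1.0 - lstm_weight) * a
        for a, l in zip(arima_forecast, lstm_forecast)
    ]


# Hypothetical two-day forecasts, for illustration only.
arima_forecast = [100.0, 102.0]
lstm_forecast = [104.0, 106.0]
equal_blend = blend_forecasts(arima_forecast, lstm_forecast)
lstm_heavy = blend_forecasts(arima_forecast, lstm_forecast, lstm_weight=0.75)
```

A higher LSTM weight would suit the more volatile coins, while a higher ARIMA weight would suit stable series such as USDC, in line with the per-coin findings above.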

The analysis reveals that while traditional models like ARIMA have their merits, the
complexity and volatility of the cryptocurrency market necessitate the adoption of more
sophisticated techniques like LSTM for effective forecasting. This strategic approach can
significantly improve risk management and decision-making processes, ensuring stakeholders
remain competitive in an ever-evolving financial landscape.

CHAPTER V: REFERENCES

https://www.investing.com/crypto/bitcoin/historical-data
https://www.investing.com/indices/investing.com-eth-usd
https://www.investing.com/indices/investing.com-usdt-usd
https://www.investing.com/indices/investing.com-bnb-usd
https://www.investing.com/indices/investing.com-sol-usd
https://www.investing.com/crypto/tether/usdt-usdc
https://www.investing.com/indices/investing.com-xrp-usd
https://www.investing.com/crypto/dogecoin
https://www.investing.com/crypto/toncoin/ton-usd
https://www.investing.com/crypto/tron/historical-data
https://finance.yahoo.com/quote/BITW/history/

DRIVE LINK (Workings): https://drive.google.com/drive/folders/1-rP4Wd6V4bwKiu7VV9StidNYc-4rxeq_?usp=sharing
