IFT6759
Winter 2025
Advanced Projects in Machine Learning
Task 3: Predicting Stock Market Trends
Description
This project aims to leverage advanced machine learning techniques to predict stock market trends based on historical
data and various economic indicators. The stock market is inherently complex, exhibiting non-linear, dynamic, and
noisy behavior influenced by global events, market sentiment, and macroeconomic factors. The primary goal is to
build robust predictive models capable of forecasting stock prices or market trends over both short- and long-term horizons.
Data
An important aspect of this project is gaining experience with unstructured datasets and working with API calls.
To encourage creativity and problem-solving, we do not mandate a fixed dataset. Instead, students are expected to
strategize and identify suitable data sources for their analysis. Some suggestions include:
• Historical stock prices from platforms like Yahoo Finance or Google Finance.
• Financial news articles from reputable websites.
• Social media sentiment data gathered from platforms such as Twitter or Reddit.
• Free-tier APIs, such as Polygon, for financial and market data.
Related Works
This project builds on a rich body of literature and existing methodologies at the intersection of finance and machine
learning. Below are key areas of relevant work:
Time Series Forecasting Models
Traditional methods like ARIMA and GARCH have long been used for modeling financial time series, capturing
linear dependencies and volatility clustering in stock prices [1]. Advanced deep learning models such as Long Short-
Term Memory (LSTM) networks [2] and GRUs have shown promise in capturing complex temporal dependencies in
financial data. Transformer-based architectures like Temporal Fusion Transformers (TFT) have recently emerged as
powerful tools for handling long-term dependencies and integrating multimodal data [3].
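As a minimal illustration of what "linear dependencies" and "volatility clustering" mean for the classical models above, one common textbook specification is an AR(1) mean equation with GARCH(1,1) errors. The symbols below (r_t for the return at time t, and the parameters c, \phi, \omega, \alpha, \beta) are notation chosen for this sketch and are not prescribed by the project:

```latex
\begin{align}
  r_t &= c + \phi\, r_{t-1} + \varepsilon_t,
      & \varepsilon_t &= \sigma_t z_t, \quad z_t \overset{\text{iid}}{\sim} \mathcal{N}(0, 1), \\
  \sigma_t^2 &= \omega + \alpha\, \varepsilon_{t-1}^2 + \beta\, \sigma_{t-1}^2,
      & \omega &> 0, \ \alpha, \beta \ge 0, \ \alpha + \beta < 1 .
\end{align}
```

The first line captures the linear dependence of today's return on yesterday's. The second makes the conditional variance depend on the previous squared shock and the previous variance, so large moves tend to be followed by large moves.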
Feature Engineering and Sentiment Analysis
Research has highlighted the importance of combining technical indicators (e.g., RSI, MACD) with external factors
like news sentiment and social media analysis for better predictive accuracy. Natural Language Processing (NLP)
methods, such as BERT [4] and FinBERT [5], are increasingly used to extract sentiment signals from unstructured
text data.
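The two technical indicators named above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the default windows (12/26/9 for MACD, 14 for RSI) are common conventions rather than project requirements, the RSI uses Wilder's smoothing, and returning 50 for a perfectly flat series is a choice made here.

```python
from typing import List, Optional, Tuple


def ema(values: List[float], span: int) -> List[float]:
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]  # seed with the first observation (an assumption)
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(closes: List[float], fast: int = 12, slow: int = 26,
         signal: int = 9) -> Tuple[List[float], List[float], List[float]]:
    """Return (macd_line, signal_line, histogram) for a list of closing prices."""
    fast_ema = ema(closes, fast)
    slow_ema = ema(closes, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd_line, signal)
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


def _rsi_value(avg_gain: float, avg_loss: float) -> float:
    if avg_gain == 0 and avg_loss == 0:
        return 50.0  # flat series: treated as neutral (a convention chosen here)
    if avg_loss == 0:
        return 100.0  # only gains in the window
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def rsi(closes: List[float], period: int = 14) -> List[Optional[float]]:
    """Relative Strength Index with Wilder's smoothing.

    The first `period` entries are None because not enough price
    changes are available yet.
    """
    if len(closes) <= period:
        return [None] * len(closes)
    changes = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]
    # Initial averages are simple means over the first `period` changes.
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    out: List[Optional[float]] = [None] * period
    out.append(_rsi_value(avg_gain, avg_loss))
    # Subsequent averages are updated recursively (Wilder's smoothing).
    for g, l in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + g) / period
        avg_loss = (avg_loss * (period - 1) + l) / period
        out.append(_rsi_value(avg_gain, avg_loss))
    return out
```

Both functions return lists aligned with the input prices, so their outputs can be used directly as feature columns alongside sentiment scores or other external signals.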
Tree-Based and Ensemble Learning Methods
Models like Random Forests [6] and Gradient Boosting Machines (e.g., XGBoost [7], LightGBM [8]) have been effective
in building robust, interpretable stock prediction models, especially when dealing with tabular datasets.
Hybrid and Multi-Modal Models
Recent works integrate multimodal data—such as numerical, textual, and image-based information—to improve
market predictions. Examples include combining price data with sentiment and macroeconomic indicators [9, 10].
Expectations
This project aims to provide a comprehensive understanding of applying machine learning techniques to financial
time series data. By the end of the project, students should be able to preprocess and analyze large datasets of stock
prices and related indicators, build and evaluate predictive models, and interpret the results to derive actionable
insights. The emphasis is placed on robust data engineering, leveraging domain knowledge, and understanding the
limitations and assumptions of predictive models in the dynamic and noisy stock market environment. A significant
focus will be on the generalizability and explainability of the models. Students are expected to ensure their models
perform well across different market conditions and can provide insights into their predictions. To validate these
aspects, all models must be rigorously tested using real stock price data.
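One way to test performance across different market conditions without leaking future information is walk-forward (expanding-window) evaluation, sketched below. The helper names, the window sizes, the made-up return series, and the naive "predict the training mean" baseline are all assumptions for illustration; the project does not prescribe a specific protocol.

```python
from typing import Iterator, List, Sequence, Tuple


def walk_forward_splits(n_samples: int, initial_train: int,
                        test_size: int) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, test_indices) with an expanding training window.

    Each test block starts where the training window ends, so the model
    is only ever evaluated on data that comes after its training data.
    """
    if initial_train < 1 or test_size < 1:
        raise ValueError("initial_train and test_size must be positive")
    start = initial_train
    while start + test_size <= n_samples:
        yield range(0, start), range(start, start + test_size)
        start += test_size


def directional_accuracy(actual: Sequence[float],
                         predicted: Sequence[float]) -> float:
    """Fraction of periods where the predicted and actual returns agree on
    whether the return is positive."""
    hits = sum(1 for a, p in zip(actual, predicted) if (a > 0) == (p > 0))
    return hits / len(actual)


if __name__ == "__main__":
    # Made-up daily returns, for demonstration only.
    returns: List[float] = [0.010, -0.020, 0.015, 0.003, -0.007,
                            0.012, -0.004, 0.009, 0.002, -0.011]
    for train_idx, test_idx in walk_forward_splits(len(returns), 4, 2):
        # Naive baseline: predict the mean of the training returns.
        train_mean = sum(returns[i] for i in train_idx) / len(train_idx)
        preds = [train_mean] * len(test_idx)
        actual = [returns[i] for i in test_idx]
        acc = directional_accuracy(actual, preds)
        print(f"train=[0, {train_idx.stop})  "
              f"test=[{test_idx.start}, {test_idx.stop})  dir_acc={acc:.2f}")
```

Reporting the metric per fold, rather than a single average, makes it easier to see whether a model degrades in particular market regimes.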
References
[1] Box, G. E., Jenkins, G. M., Reinsel, G. C., & Ljung, G. M. (2015). Time Series Analysis: Forecasting and
Control. Wiley.
[2] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural computation, 9(8), 1735-1780.
[3] Lim, B., Arık, S. Ö., Loeff, N., & Pfister, T. (2021). Temporal Fusion Transformers for Interpretable Multi-horizon Time
Series Forecasting. International Journal of Forecasting, 37(4), 1748-1764. arXiv preprint arXiv:1912.09363.
[4] Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding. arXiv preprint arXiv:1810.04805.
[5] Araci, D. (2019). FinBERT: Financial Sentiment Analysis with Pre-trained Language Models.
arXiv preprint arXiv:1908.10063.
[6] Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5-32.
[7] Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining, 785-794.
[8] Ke, G., Meng, Q., Finley, T., Wang, T., Chen, W., Ma, W., ... & Liu, T. Y. (2017). LightGBM: A Highly Efficient
Gradient Boosting Decision Tree. Advances in Neural Information Processing Systems, 30.
[9] Tsantekidis, A., Passalis, G., Tefas, A., & Kakadiaris, I. A. (2017). Forecasting stock prices from the limit order
book using deep learning models. Expert Systems with Applications, 88, 195-206.
[10] Fischer, T., & Krauss, C. (2018). Deep learning with long short-term memory networks for financial market
predictions. European Journal of Operational Research, 270(2), 654-669.