Stock Return Prediction using Machine Learning-Based Techniques
The paper attempts to forecast the intraday return of the HNX index using three machine learning models: Support Vector Machine (SVM), Random Forest (RF), and Extra-Trees Classifier. Kernel principal component analysis is used for feature extraction and dimensionality reduction. Prediction performance is compared with that of classic Logistic Regression. Our empirical results show that the Extra-Trees Classifier achieves the highest prediction accuracy, about 55%, outperforming Logistic Regression by about 0.6%. Although the Extra-Trees Classifier and Random Forest are based on the same approach, the former always obtains better prediction performance. In addition, while it does not deliver the best results, the Support Vector Machine's performance appears not to depend on the number of features or the training length.
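The comparison described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, an RBF kernel for the kernel PCA step, default-like hyperparameters, a chronological train/test split, and synthetic placeholder data in place of HNX intraday features. The function name `compare_models` and all parameter values are hypothetical.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def compare_models(X, y, train_size, n_components=5, seed=0):
    """Fit each classifier on the first `train_size` rows and return
    its accuracy on the remaining rows (chronological split, assumed)."""
    X_train, X_test = X[:train_size], X[train_size:]
    y_train, y_test = y[:train_size], y[train_size:]

    # The four classifiers named in the abstract; hyperparameters are assumptions.
    models = {
        "SVM": SVC(kernel="rbf"),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "Extra-Trees": ExtraTreesClassifier(n_estimators=100, random_state=seed),
        "Logistic Regression": LogisticRegression(max_iter=1000),
    }

    scores = {}
    for name, clf in models.items():
        # Standardize, reduce dimension with kernel PCA, then classify.
        pipe = make_pipeline(
            StandardScaler(),
            KernelPCA(n_components=n_components, kernel="rbf"),
            clf,
        )
        pipe.fit(X_train, y_train)
        scores[name] = pipe.score(X_test, y_test)  # mean accuracy
    return scores


# Synthetic placeholder data: 400 observations, 12 features,
# binary label standing in for the sign of the intraday return.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 12))
y = (X[:, 0] + 0.5 * rng.normal(size=400) > 0).astype(int)

scores = compare_models(X, y, train_size=300)
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

Varying `n_components` and `train_size` in this sketch is one way to probe how sensitive each model is to the number of features and the training length. The accuracies printed for the synthetic data say nothing about the HNX results.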