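The electronic thesis has not yet been authorized for public release; for the print copy, please check the library catalog.
(※ If the thesis cannot be found, or the holdings status shows "closed stacks, not public", the copy is not in the stacks and cannot be retrieved.)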
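System ID U0026-2006202023232400
Title (Chinese) 整合新聞資訊之情緒、語義及圖像特徵對國際金融指標之預測
Title (English) International Financial Indices Prediction Incorporating News Information Based on Sentiment, Semantic and Image Representations
Institution National Cheng Kung University
Department (Chinese) 統計學系
Department (English) Department of Statistics
Academic year 108 (ROC calendar; 2019-2020)
Semester 2
Publication year 109 (ROC calendar; 2020)
Graduate student (Chinese) 翁萃瑩
Graduate student (English) Tsui-Ying Weng
Student ID R26071024
Degree Master's
Language English
Pages 97
Oral defense committee Advisor: 鄭順林
Committee member: 顏盟峯
Committee member: 林良靖
Keywords (Chinese) International stock markets  Image features  Semantic features  Sentiment analysis
Keywords (English) Multivariate Adaptive Regression Splines  Image Representation  Semantic Representation  Sentiment Analysis
Subject classification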
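Abstract (Chinese) To preserve more fully the influence of news information on the prediction of financial indices, this study integrates sentiment, semantic and image features extracted from news, extending news information from the usual word level to the sentence level and even the paragraph level. The financial indices considered are the S&P 500 in the United States, the Shanghai Stock Exchange Index (SSE) in mainland China, the Hang Seng Index (HSI) in Hong Kong and the Taiwan Stock Exchange Weighted Index (TWII) in Taiwan; 50 individual stocks from each of these four regions are also selected for data mining. For modeling, we use a statistical model with explanatory and inferential capability, Multivariate Adaptive Regression Splines (MARS), which can fit the data locally and accommodate interactions among variables.

To verify the consistency of the predictions, this study uses two different splits to define the training and testing sets. The first trains on all of 2018 and tests on data from the beginning of 2019 to mid-March. The second trains only through November 2018 and tests on data from December 2018 to mid-March 2019. December 2018 was chosen as the cut point because the escalation of the US-China trade war made stock markets more volatile; we expect the results of the two splits to corroborate each other and so strengthen the study's inferences. According to the experimental results, compared with models that consider only common technical indicators, adding news sentiment, semantic and image features under the proposed framework improves the prediction of the composite indices, particularly for the S&P 500 and TWII. In addition, compared with models that include common technical indicators and word-level sentiment features, the proposed framework also yields improvements in most cases.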
Abstract (English) The main goal of this study is to integrate news information more thoroughly into the prediction of financial indices. To this end, we combine sentiment, semantic and image features extracted from news, capturing information at the word, sentence and paragraph levels. Four financial indices from different regions are considered: the Standard and Poor's 500 Index (S&P 500) in the United States, the Shanghai Stock Exchange Index (SSE) in China, the Hang Seng Index (HSI) in Hong Kong and the Taiwan Stock Exchange Weighted Index (TWII) in Taiwan. In addition, 200 individual stocks from these four regions are examined. For the prediction model, we use Multivariate Adaptive Regression Splines (MARS), chosen for its capacity to support inference; it can also fit the data locally and accommodate higher-order interaction terms.
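
As an illustration of what "fitting locally" and "interaction terms" mean for MARS, the following minimal sketch evaluates a model built from hinge basis functions. The coefficients, knots and variable names are hypothetical and chosen only for illustration; they are not taken from the thesis, and no fitting procedure is shown.

```python
def hinge(x, knot, sign=1):
    """MARS hinge basis: max(0, x - knot) for sign=+1, max(0, knot - x) for sign=-1."""
    return max(0.0, sign * (x - knot))


def mars_predict(x1, x2):
    """Evaluate a hypothetical MARS model with made-up coefficients and knots."""
    return (
        1.0                                                 # intercept
        + 2.0 * hinge(x1, 0.5)                              # active only when x1 > 0.5
        - 1.5 * hinge(x1, 0.5, sign=-1)                     # active only when x1 < 0.5
        + 0.8 * hinge(x1, 0.5) * hinge(x2, 1.0, sign=-1)    # interaction of two hinges
    )


if __name__ == "__main__":
    print(mars_predict(1.0, 0.0))
    print(mars_predict(0.0, 2.0))
```

Each hinge is zero on one side of its knot, so a term only influences predictions in part of the input space; a product of hinges is nonzero only where both factors are active, which is how interactions between variables enter the model.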

To validate the consistency of the prediction results, we use two different splits of the dataset. In the first, the training set runs from January 1, 2018 to December 31, 2018 and the testing set from January 1, 2019 to March 15, 2019. In the second, the training set runs from January 1, 2018 to November 30, 2018 and the testing set from December 1, 2018 to March 15, 2019. December 2018 was chosen as the cut point because the US-China trade war intensified in that month, making the stock market unstable. We expect the results from the two splits to corroborate each other. The results show that models containing news sentiment features, semantic features and image representations, obtained through our proposed procedure, outperform the model with only common basic and technical features, especially for the S&P 500 and TWII. Compared with the model that combines basic and technical features with word-level news sentiment features, the sentence-level sentiment features and the semantic and image representations we construct improve prediction in many cases.
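
The two date-based splits described in the abstract can be sketched as follows. This is a minimal illustration using only the dates stated above; the record layout, the `split_by_date` helper and the stand-in data (one value per calendar day rather than per trading day) are assumptions, not the thesis's actual pipeline.

```python
from datetime import date, timedelta

# Train/test windows as stated in the abstract (bounds are inclusive).
SPLITS = {
    "split_1": {
        "train": (date(2018, 1, 1), date(2018, 12, 31)),
        "test": (date(2019, 1, 1), date(2019, 3, 15)),
    },
    "split_2": {
        "train": (date(2018, 1, 1), date(2018, 11, 30)),
        "test": (date(2018, 12, 1), date(2019, 3, 15)),
    },
}


def split_by_date(records, split):
    """Partition (date, value) records into train and test lists by date window."""
    (train_lo, train_hi), (test_lo, test_hi) = split["train"], split["test"]
    train = [r for r in records if train_lo <= r[0] <= train_hi]
    test = [r for r in records if test_lo <= r[0] <= test_hi]
    return train, test


# Hypothetical stand-in data: one record per calendar day in the study period.
start, end = date(2018, 1, 1), date(2019, 3, 15)
records = [(start + timedelta(days=i), float(i)) for i in range((end - start).days + 1)]

if __name__ == "__main__":
    for name, split in SPLITS.items():
        train, test = split_by_date(records, split)
        print(name, len(train), len(test))
```

Splitting by date rather than at random keeps every test observation later than every training observation, which matters for time-ordered market data.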
Table of contents Abstract (Chinese)
Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
1.1. Research Background
1.2. Research Process
1.3. Data Description
1.3.1. Historical Stock Dataset
1.3.2. Financial News Dataset
Chapter 2. Related Work
2.1. Stock Prediction
2.1.1. Statistical Methods
2.1.2. Technical and Sentiment Analysis
2.2. Advanced News Feature Extraction
2.2.1. Text Features: BERT
2.2.2. Image Representations
Chapter 3. Methodology
3.1. Bidirectional Encoder Representations from Transformers (BERT)
3.2. Multivariate Adaptive Regression Splines (MARS)
Chapter 4. Features Extraction
4.1. Basic and Technical Features Selection
4.1.1. Features Selection
4.1.2. Lag Setting
4.2. News Sub-Sampling Process
4.2.1. Sub-Sampling Method 1 (NS1): Based on Significant Events
4.2.2. Sub-Sampling Method 2 (NS2): Based on 8 Integrated Indices
4.2.3. Sub-Sampling Method 3 (NS3): Based on 4 Integrated Indices
4.3. News Features Extraction
4.3.1. Word-Level Sentiment Features
4.3.2. Sentence-Level Sentiment Features
4.3.3. Novel Features: Semantic Features
4.3.4. Novel Features: Image Representations
4.3.5. Lag Setting
4.3.6. Modified News Features
Chapter 5. Evaluation
5.1. Experiment Structures
5.2. Threshold for Sentence-level Sentiment
5.3. Case Studies for Modified Features
5.3.1. Features Selection Based on Random Forest
5.3.2. Dimensions Reduction for BERT Embeddings Based on Random Forest
5.4. Evaluation Metrics
5.5. Modeling Results for Integrated Indices
5.5.1. Prediction Results
5.5.2. Important Features
5.6. Data Mining Searching Results on Individual Stocks
5.6.1. US Individual Stocks
5.6.2. CN Individual Stocks
5.6.3. HK Individual Stocks
5.6.4. TW Individual Stocks
5.7. Discussion of Implementation Time
Chapter 6. Conclusion
6.1. Summary
6.1.1. Summary for Integrated Indices Modeling
6.1.2. Summary for Data Mining Searching on Individual Stocks
6.2. Discussion
6.3. Future Work
References
Appendix A. List of Individual Stocks
Appendix B. Vital Events Wordlists for News Selection
Appendix C. Mean News Counts among Months and Sources
Appendix D. Adjusted LMD and CLMD Wordlists
Appendix E. Wordlists for Specific Categories
Appendix F. Features Summary
Appendix G. MARS Prediction Accuracy for Individual Stocks in Accuracy Order
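Full-text access permissions
  • Authorization granted for on-campus viewing and printing of the electronic full text; release date not stated.
  • Authorization granted for off-campus viewing and printing of the electronic full text; release date not stated.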


  • If you have any questions, please contact the library.
    Phone: (06)2757575#65773
    E-mail: etds@email.ncku.edu.tw