System ID: U0026-1106201423273600
Thesis title (Chinese): 應用新聞分析與搜尋趨勢預測股價之波動
Thesis title (English): Using News Analysis and Search Trends to Forecast Stock Volatility
University: National Cheng Kung University
Department: Institute of Information Management (資訊管理研究所)
Academic year: 102 (ROC calendar, i.e., 2013-2014)
Semester: 2
Year of publication: 103 (ROC calendar, i.e., 2014)
Graduate student: Jyun-Da Huang (黃俊達)
Student ID: R76011121
Degree: Master's
Language: Chinese
Number of pages: 76
Oral examination committee: 王惠嘉 (advisor)
Keywords (Chinese): 中文斷詞, 特徵選取, 情緒分析, 搜尋趨勢, 股價波動預測
Keywords (English): Chinese Word Segmentation, Feature Selection, Sentiment Analysis, Search Trends, Stock Volatility Forecasting
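Abstract (Chinese, translated): With the rapid development of information technology and the Internet, online information has overtaken printed newspapers as the public's main channel for obtaining information, and many people browse financial websites specifically to read financial news. However, an excess of information easily causes information overload: faced with large amounts of unstructured information, users find it difficult to analyze it and make judgments quickly and effectively. Text mining techniques can be used to address this problem.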
Abstract (English): In recent years, the Internet has become the main source of financial information. Online financial news in particular carries qualitative information about firms that influences stock volatility. Many previous studies have applied sentiment analysis to financial news to forecast stock price volatility. However, news written in Chinese raises several language-processing issues. For example, Chinese text does not use blank characters to separate words, so the extracted features may be incorrect. In addition, because many people use search engines to look up financial information, search trends may be used to analyze current economic trends. This research proposes a method that uses search suggestions to improve the quality of Chinese word segmentation. The proposed method aims to select suitable features from news articles and to improve sentiment analysis of financial news in order to forecast stock price volatility. The experimental results show that using search suggestions improves Chinese word segmentation and thereby increases the forecast accuracy of news sentiment analysis. Regarding search behavior, search trends analysis can identify relationships between search terms and stock prices through pattern matching. In the comparison, search trends analysis achieved higher forecast accuracy than news sentiment analysis.
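The abstract says that search trends analysis relates search terms to stock prices through pattern matching, but it does not name the matching technique. Because the reference list includes Berndt & Clifford (1994) and Coelho (2012) on dynamic time warping (DTW), the sketch below assumes DTW as the similarity measure between a search-volume series and a price series. It is an illustration under that assumption, not the thesis's implementation; the function names `z_normalize` and `dtw_distance` and the sample data are hypothetical.

```python
import math
from typing import Sequence


def z_normalize(series: Sequence[float]) -> list[float]:
    """Rescale a series to zero mean and unit variance so that search
    volume and stock price can be compared on the same scale."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n  # a flat series carries no shape information
    return [(x - mean) / std for x in series]


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping distance using the
    absolute difference as the local cost (assumed DTW variant)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] also matches an earlier b point
                cost[i][j - 1],      # b[j-1] also matches an earlier a point
                cost[i - 1][j - 1],  # one-to-one step
            )
    return cost[n][m]


if __name__ == "__main__":
    # Hypothetical weekly series for illustration only (not thesis data).
    search_volume = [20, 22, 35, 60, 58, 40, 30]
    closing_price = [100, 100, 101, 104, 108, 107, 103]
    dist = dtw_distance(z_normalize(search_volume), z_normalize(closing_price))
    print(f"DTW distance: {dist:.3f}")
```

A smaller distance means the two normalized series have more similar shapes, even when one lags or stretches relative to the other. Under this assumption, candidate search terms could be ranked by their DTW distance to a stock's price series. A warping-window constraint is often added to prevent pathological alignments and to reduce the quadratic cost; this sketch omits it.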
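Table of contents:
1. Introduction 1
1.1. Research Background and Motivation 2
1.2. Research Objectives 5
1.3. Research Scope and Limitations 6
1.4. Research Process 7
1.5. Thesis Outline 7
2. Literature Review 9
2.1. Natural Language Processing 9
2.1.1. Chinese Word Segmentation 9
2.2. Search Engines 12
2.2.1. Search Suggestions 12
2.2.2. Search Trends 13
2.3. Document Analysis 14
2.3.1. Similarity Computation 15
2.3.2. Document Classification 16
2.4. Feature Processing 19
2.4.1. Feature Extraction 19
2.4.2. Feature Selection 19
2.4.3. Sentiment Analysis 22
2.5. Stock Price Trend Prediction 22
2.6. Summary 24
3. Research Methodology 25
3.1. Research Framework 25
3.2. Data Preprocessing Module 26
3.3. Word Segmentation Improvement Module 27
3.3.1. Potential Word Detection 28
3.3.2. Word Merging and Filtering 32
3.4. Prediction Module 35
3.4.1. News Sentiment Analysis and Prediction 36
3.4.2. Search Trends Analysis and Prediction 41
3.5. Summary 46
4. System Implementation and Validation 47
4.1. System Implementation 47
4.2. Experimental Design 47
4.2.1. Data Sources 48
4.2.2. Evaluation Metrics 49
4.3. Experimental Results and Analysis 50
4.3.1. Experiment 1: Comparison of Chinese Word Segmentation 50
4.3.2. Experiment 2: News Sentiment Analysis and Prediction 53
4.3.3. Experiment 3: Search Trends Analysis and Prediction 57
4.4. Summary 65
5. Conclusions and Future Research Directions 66
5.1. Research Results 66
5.2. Future Research Directions 70
References 72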
References
English-language references
Berndt, D. J., & Clifford, J. (1994). Using Dynamic Time Warping to Find Patterns in Time Series. Paper presented at the KDD workshop.
Butler, M., & Keselj, V. (2009). Financial Forecasting Using Character N-Gram Analysis and Readability Scores of Annual Reports. Paper presented at the Proceedings of the 22nd Canadian Conference on Artificial Intelligence: Advances in Artificial Intelligence, Kelowna, Canada.
Campbell, J. Y., & Shiller, R. J. (1988). Cointegration and tests of present value models. Cambridge, MA: National Bureau of Economic Research.
Chang, P.-C., Galley, M., & Manning, C. D. (2008). Optimizing Chinese word segmentation for machine translation performance. Paper presented at the Proceedings of the Third Workshop on Statistical Machine Translation, Columbus, Ohio.
Chen, K.-J., & Bai, M.-H. (1998). Unknown Word Detection for Chinese by a Corpus-based Learning Method. Computational Linguistics and Chinese Language Processing, 3(1), 27-44.
Chen, K.-J., & Liu, S.-H. (1992). Word identification for Mandarin Chinese sentences. Paper presented at the Proceedings of the 14th conference on Computational linguistics - Volume 1, Nantes, France.
Chen, K.-J., & Ma, W.-Y. (2002). Unknown word extraction for Chinese documents. Paper presented at the Proceedings of the 19th international conference on Computational linguistics - Volume 1, Taipei, Taiwan.
Chiu, Y.-T., & Chen, Y.-L. (2011). An IPC-based vector space model for patent retrieval. Information Processing & Management, 47(3), 309-322.
Choi, H., & Varian, H. (2012). Predicting the Present with Google Trends. Economic Record, 88, 2-9.
Chowdhury, G. G. (2003). Natural language processing. Annual Review of Information Science and Technology, 37(1), 51-89.
Ciravegna, D., & Petrelli, D. (2001). User involvement in adaptive information extraction: Position paper.
Coelho, M. S. (2012). Patterns in financial markets: Dynamic time warping.
Das, S. R., & Chen, M. Y. (2007). Yahoo! for Amazon: Sentiment extraction from small talk on the web. Management Science, 53(9), 1375-1388.
Davis, N. F., Breslin, N., & Creagh, T. (2013). Using Google Trends to Assess Global Interest in ‘Dysport®’ for the Treatment of Overactive Bladder. Urology, 82(5), 1189.
Dumais, S., & Chen, H. (2000). Hierarchical classification of Web content. Paper presented at the Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval, Athens, Greece.
Forman, G. (2002). Choose Your Words Carefully: An Empirical Study of Feature Selection Metrics for Text Classification. Paper presented at the Proceedings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery.
Forman, G. (2003). An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research, 3, 1289-1305.
Fu, G., & Luke, K.-K. (2005). Chinese named entity recognition using lexicalized HMMs. SIGKDD Explorations Newsletter, 7(1), 19-25.
Fu, T.-C. (2011). A review on time series data mining. Engineering Applications of Artificial Intelligence, 24(1), 164-181.
Galavotti, L., Sebastiani, F., & Simi, M. (2000). Feature selection and negative evidence in automated text categorization. Paper presented at the Proceedings of KDD.
Groth, S. S., & Muntermann, J. (2009). Supporting Investment Management Processes with Machine Learning Techniques. Paper presented at the Wirtschaftsinformatik (2).
Groth, S. S., & Muntermann, J. (2011). An intraday market risk management approach based on textual analysis. Decision Support Systems, 50(4), 680-691.
Hagenau, M., Liebmann, M., & Neumann, D. (2013). Automated news reading: Stock price prediction based on financial news using context-capturing features. Decision Support Systems, 55(3), 685-697.
Konchady, M. (2006). Text Mining Application Programming (Programming Series). Charles River Media.
Leinweber, D. (2011). Event-Driven Trading and the “New News”. Journal of Portfolio Management, 38(1), 110.
Levy, R., & Manning, C. (2003). Is it harder to parse Chinese, or the Chinese Treebank? Paper presented at the Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1, Sapporo, Japan.
Li, F. (2010). The Information Content of Forward-Looking Statements in Corporate Filings—A Naïve Bayesian Machine Learning Approach. Journal of Accounting Research, 48(5), 1049-1102.
Li, S., Xia, R., Zong, C., & Huang, C.-R. (2009). A framework of feature selection methods for text categorization. Paper presented at the Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2 - Volume 2, Suntec, Singapore.
Ma, W.-Y., & Chen, K.-J. (2003). A bottom-up merging algorithm for Chinese unknown word extraction. Paper presented at the Proceedings of the second SIGHAN workshop on Chinese language processing - Volume 17, Sapporo, Japan.
MacKinlay, A. C. (1997). Event Studies in Economics and Finance. Journal of Economic Literature, 35(1), 13-39.
Mittermayer, M. A. (2004). Forecasting intraday stock price trends with text mining techniques. Paper presented at the Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 5-8 January 2004.
O'Leary, D. E. (2011). Blog mining-review and extensions: “From each according to his opinion”. Decision Support Systems, 51(4), 821-830.
Omar, N., Jusoh, F., Ibrahim, R., & Othman, M. (2013). Review of Feature Selection for Solving Classification Problems. Journal of Information System Research and Innovation, 3.
Palmer, D. D. (1997). A trainable rule-based algorithm for word segmentation. Paper presented at the Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics, Madrid, Spain.
Preis, T., Moat, H. S., & Stanley, H. E. (2013). Quantifying Trading Behavior in Financial Markets Using Google Trends. Scientific Reports, 3.
Salton, G. (1991). Developments in Automatic Text Retrieval. Science, 253(5023), 974-980.
Salton, G., Wong, A., & Yang, C. S. (1975). A vector space model for automatic indexing. Communications of the ACM, 18(11), 613-620.
Schumaker, R. P., & Chen, H. (2009). Textual analysis of stock market prediction using breaking financial news: The AZFin text system. ACM Transactions on Information Systems, 27(2), 1-19.
Sun, X., Wang, H., & Li, W. (2012). Fast online training with frequency-adaptive learning rates for Chinese word segmentation and new word detection. Paper presented at the Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1, Jeju Island, Korea.
Teahan, W. J., McNab, R., Wen, Y., & Witten, I. H. (2000). A compression-based algorithm for Chinese word segmentation. Computational Linguistics, 26(3), 375-393.
Tetlock, P. C. (2007). Giving content to investor sentiment: The role of media in the stock market. Journal of Finance, 62(3), 1139.
Tetlock, P. C. (2011). All the news that's fit to reprint: Do investors react to stale information? Review of Financial Studies, 24(5), 1481.
Tetlock, P. C., Saar-Tsechansky, M., & Macskassy, S. (2008). More than words: Quantifying language to measure firms' fundamentals. Journal of Finance, 63(3), 1437-1467.
Tsai, J.-L., Kliegl, R., & Yan, M. (2012). Parafoveal semantic information extraction in traditional Chinese reading. Acta Psychologica, 141(1), 17-23.
Vapnik, V. N. (1995). The nature of statistical learning theory. New York, NY: Springer-Verlag.
Wang, F. L., & Yang, C. C. (2007). Mining Web data for Chinese segmentation. Journal of the American Society for Information Science and Technology, 58(12), 1820-1837.
Wong, P.-k., & Chan, C. (1996). Chinese word segmentation based on maximum matching and word binding force. Paper presented at the Proceedings of the 16th conference on Computational linguistics - Volume 1, Copenhagen, Denmark.
Wu, Z., & Tseng, G. (1993). Chinese text segmentation for text retrieval: Achievements and problems. Journal of the American Society for Information Science, 44(9), 532-542.
Yang, C. C., Luk, J. W. K., Yung, S. K., & Yen, J. (2000). Combination and boundary detection approaches on Chinese indexing. Journal of the American Society for Information Science, 51(4), 340-351.
Ye, Y., Wu, Q., Li, Y., Chow, K. P., Hui, L. C. K., & Yiu, S. M. (2013). Unknown Chinese word extraction based on variety of overlapping strings. Information Processing & Management, 49(2), 497-512.
Yeh, C.-L., & Lee, H.-J. (1991). Rule-based word identification for Mandarin Chinese sentences-A unification approach. Computer Processing of Chinese and Oriental Languages, 5(2), 97-118.
Zeng, D., Wei, D., Chau, M., & Wang, F. (2011). Domain-specific Chinese word segmentation using suffix tree and mutual information. Information Systems Frontiers, 13(1), 115-125.
Gao, Z.-M. (2012). 語料庫建構技術研究報告 [Research report on corpus construction techniques]. Retrieved August 30, 2013, from http://wd.naer.edu.tw/project/NAER-101-12-F-2-03-00-2-01.pdf
Google. (2012). Ngram Viewer. Retrieved July 30, 2013, from http://books.google.com/ngrams/datasets
Google. (2013). Google搜尋趨勢 [Google Trends]. Retrieved August 20, 2013, from http://goo.gl/pJ0XRk
Google. (2014). Google自動完成 [Google Autocomplete]. Retrieved January 27, 2014, from https://support.google.com/websearch/answer/106230?hl=zh-tw
InsightXplorer. (2011). 2011.06 創市際報紙新聞網站篇 [June 2011 InsightXplorer report: Newspaper news websites]. Retrieved July 30, 2013, from http://www.insightxplorer.com/specialtopic/2011_06_17.htm
InsightXplorer. (2012). 2012.11 財經理財資訊網站篇 [November 2012 report: Financial and investment information websites]. Retrieved July 30, 2013, from http://www.insightxplorer.com/specialtopic/2012_11_12.htm
InsightXplorer. (2013). 2013.06 創市際月刊報告書 [June 2013 InsightXplorer monthly report]. Retrieved July 30, 2013, from http://news.ixresearch.com/?p=7207
MoneyDJ理財網. (2014). 財經知識庫-標準差 [Financial knowledge base: Standard deviation]. Retrieved May 11, 2014, from http://www.moneydj.com/KMDJ/wiki/WikiViewer.aspx?Title=%E6%A8%99%E6%BA%96%E5%B7%AE
StatCounter. (2013). Top 5 Search Engines in Taiwan. Retrieved November 6, 2013, from http://gs.statcounter.com/#search_engine-TW-daily-20130101-20131231
Wikipedia. (2013). Google Trends. Retrieved July 30, 2013, from http://en.wikipedia.org/wiki/Google_Trends
Wikipedia. (2014). Google Ngram Viewer. Retrieved May 11, 2014, from http://en.wikipedia.org/wiki/Google_Ngram_Viewer
Yahoo! (2013). 什麼是搜尋建議? [What are search suggestions?]. Retrieved July 30, 2013, from http://help.yahoo.com/kb/index?page=content&id=SLN12539&locale=zh_TW&y=PROD_TWAUCT
世新華人傳播教育網. (2013). 2013臺灣民眾媒體評鑑大調查 [2013 survey of Taiwanese public evaluations of the media]. Retrieved July 30, 2013, from http://cce-online.shu.edu.tw/index.php/2012-03-09-18-00-59/797-1231
財團法人台灣網路資訊中心 [Taiwan Network Information Center]. (2013). 2013年台灣寬頻網路使用調查 [2013 Taiwan broadband Internet usage survey]. Retrieved July 30, 2013, from http://www.twnic.net.tw/download/200307/20130926c.pdf
資策會FIND [Institute for Information Industry, FIND]. (2013). 2012年9月底止台灣上網人口 [Taiwan's Internet population as of the end of September 2012]. Retrieved July 30, 2013, from http://www.find.org.tw/find/home.aspx?page=many&id=357
  • On-campus browsing and printing of the electronic full text authorized; publicly available from 2019-07-09.
