System ID: U0026-0706202011591900
Title (Chinese): 具適應性停止策略之深度強化學習法於時間序列早期分類之研究
Title (English): Deep Reinforcement Learning with Adaptive-Halting Policy for Temporal Early Classification
University: National Cheng Kung University
Department (Chinese): 工業與資訊管理學系
Department (English): Department of Industrial and Information Management
Academic Year: 108
Semester: 2
Publication Year: 109 (2020)
Author (Chinese): 周佳志
Author (English): Chia-Chih Chou
Student ID: R36074113
Degree: Master's
Language: Chinese
Pages: 52
Committee: Advisor: 李昇暾; Members: 林清河, 耿伯文, 王宏仁
Keywords (Chinese): 強化學習; 深度確定性策略梯度; 時間序列早期預測; 多變量時間序列
Keywords (English): Reinforcement Learning; Deep Deterministic Policy Gradient; Early Prediction; Multivariate Time Series
Subject Classification:
Abstract (Chinese): Prediction has long been one of the most important topics in machine learning. Whether in manufacturing, healthcare, or finance, detecting opportunities or anomalies early and responding accordingly can bring considerable benefits or cost reductions. Time series prediction uses the temporal characteristics of an event over a past period to estimate the likelihood of that event occurring over a future period. In practice, many time series prediction studies use multivariate time series, which carry many different features at each time point; these features let the machine understand the real situation better. As computing technology has matured, deep learning networks are often applied to problems such as multivariate time series prediction; methods such as convolutional neural networks and long short-term memory let machines learn more efficiently and analyze likely future trends more accurately. Reinforcement learning, in turn, develops machines that learn on their own how to act according to the state of the environment so as to maximize expected return, that is, to make decisions in an unknown environment. Deep Deterministic Policy Gradient (DDPG), proposed by DeepMind, can effectively analyze continuous signals and actions, letting an agent interact with high-dimensional environments and learn to make optimal decisions. Combining these lines of research, this study takes DDPG as its main framework and proposes a deep reinforcement learning framework with an adaptive halting policy for early prediction (EarlyDDPG), which analyzes multivariate time series and makes early predictions: the machine learns at which time point to halt and still deliver predictions of a certain quality. Compared with the prior literature, this study obtains results close to, and sometimes better than, earlier work while using shorter time windows. The framework can also be adapted to make early predictions in a variety of real-world scenarios.
This study uses the UCR time series classification archive, built by the University of California, Riverside specifically for time series classification problems. Following Martinez, Perrin, Ramasso, and Rombaut (2018), three real-world datasets are used: the wafer manufacturing-process dataset, the ECG myocardial-infarction electrocardiogram dataset, and the GunPoint motion-position dataset.
Finally, in tests on most of the experimental datasets, the proposed adaptive-halting deep reinforcement learning framework (EarlyDDPG) obtains good predictive results while, in most cases, using at least 40% less time information than the prior literature, that is, it reaches comparable or better predictions with less information.
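To make the adaptive-halting idea concrete, the following is a minimal inference-time sketch of the observe-or-halt loop in Python. It is an illustration only, not the thesis code: `halting_score`, `classify`, `early_classify`, the 0.8 threshold, and the `LAMBDA` penalty are all assumed placeholder names and values standing in for the learned halting policy and classifier.

```python
# Minimal sketch of early classification with an adaptive halting policy:
# at each step the agent sees a growing prefix of a time series and
# chooses to keep waiting or to halt and classify. All names here are
# illustrative assumptions, not the thesis implementation.
import numpy as np

LAMBDA = 0.05  # assumed per-step earliness penalty (used in training reward)

def halting_score(prefix: np.ndarray) -> float:
    """Stand-in for the learned halting policy: probability of halting
    given the prefix observed so far (here it just grows with length)."""
    return min(1.0, len(prefix) / 40.0)

def classify(prefix: np.ndarray) -> int:
    """Stand-in for the learned classifier head (a toy decision rule)."""
    return int(prefix.mean() > 0)

def early_classify(series: np.ndarray, threshold: float = 0.8):
    """Walk the series step by step; halt once the policy is confident."""
    for t in range(1, len(series) + 1):
        if halting_score(series[:t]) >= threshold:
            return classify(series[:t]), t          # label and halting time
    return classify(series), len(series)            # fall back to full length

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(0.2, 1.0, size=100)              # toy univariate series
    label, t_halt = early_classify(x)
    # In training, a reward like (accuracy - LAMBDA * t_halt) would trade
    # off correctness against how early the agent halts.
    print(f"class {label} after observing {t_halt} of {len(x)} steps")
```

In training, the halting policy and classifier would be learned jointly under a reward that trades classification accuracy against earliness, rather than using the fixed placeholder rules above.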
Abstract (English): Prediction has always been an important issue in machine learning. Whether in manufacturing, healthcare, or finance, recognizing opportunities or anomalies early and responding to them brings considerable benefits. Time series prediction utilizes the temporal characteristics of an event in a past period to predict the probability of the event occurring in the next period. In real life, most studies on time series prediction use multivariate time series, which have many different characteristics at each time point; these characteristics help the machine better understand the real situation. Reinforcement learning tries to make the machine rely on itself and act according to the state of the environment to obtain the maximum expected return, letting the machine learn how to make decisions in an unknown environment. Deep Deterministic Policy Gradient (DDPG), proposed by DeepMind, can effectively analyze continuous signals and actions, allowing machines to interact with high-dimensional environments and learn to make the best decisions by themselves. First, this study uses DDPG as the main research framework and proposes a deep reinforcement learning network framework with an adaptive halting policy for early prediction (EarlyDDPG). Second, EarlyDDPG analyzes multivariate time series and makes early predictions. Third, the machine learns when to halt and still make predictions of a certain quality. This study aims to use less time information to achieve the results of existing methods, or even better ones. The experiments use the UCR time series classification archive built by the University of California, Riverside, with the wafer, ECG, and GunPoint datasets, all of which are real datasets. According to the experimental results, EarlyDDPG obtains good predictions in most cases, and the time information it requires is at least 40% less than in the previous literature, meaning it can use less information to reach comparable or better predictions.
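For reference, the generic DDPG update that EarlyDDPG builds on (Lillicrap et al., 2015, cited in the references below) can be sketched as follows. This is a textbook-style sketch under assumed network sizes and hyperparameters, not the thesis implementation; PyTorch is used for illustration.

```python
# Generic DDPG update step (Lillicrap et al., 2015): actor-critic with
# target networks and soft updates. Dimensions, widths, learning rates,
# GAMMA, and TAU below are illustrative assumptions.
import copy
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, GAMMA, TAU = 8, 1, 0.99, 0.005  # assumed values

# Actor maps state -> action in [-1, 1]; critic maps (state, action) -> Q.
actor = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                      nn.Linear(64, ACTION_DIM), nn.Tanh())
critic = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_tgt, critic_tgt = copy.deepcopy(actor), copy.deepcopy(critic)
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(s, a, r, s2, done):
    """One DDPG step on a batch of transitions from a replay buffer;
    r and done are expected to have shape (batch, 1)."""
    with torch.no_grad():  # TD target from the slow-moving target networks
        a2 = actor_tgt(s2)
        y = r + GAMMA * (1 - done) * critic_tgt(torch.cat([s2, a2], dim=1))
    q = critic(torch.cat([s, a], dim=1))
    critic_loss = nn.functional.mse_loss(q, y)      # critic regression to y
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    # Deterministic policy gradient: push actions toward higher Q-values.
    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

    # Soft update keeps targets trailing the learned nets for stability.
    for tgt, src in ((actor_tgt, actor), (critic_tgt, critic)):
        for pt, p in zip(tgt.parameters(), src.parameters()):
            pt.data.mul_(1 - TAU).add_(p.data, alpha=TAU)
```

The target networks and soft updates stabilize the TD targets, which is what lets DDPG learn continuous actions in high-dimensional state spaces.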
Table of Contents:
Abstract (Chinese) i
Abstract (English) iii
Acknowledgements x
Table of Contents xiii
List of Figures xv
List of Tables xv
Chapter 1 Introduction 1
1.1 Research Background and Motivation 1
1.2 Research Objectives 3
1.3 Research Process 3
Chapter 2 Literature Review 5
2.1 Early Classification of Multivariate Time Series 5
2.2 Wavelet Transform 6
2.3 Deep Learning 8
2.3.1 Convolutional Neural Network 8
2.3.2 Long Short-Term Memory 10
2.4 Reinforcement Learning 13
2.4.1 Basic Reinforcement Learning Framework 13
2.4.2 Deep Q-Learning Network 14
2.4.3 Policy Gradient 16
2.4.4 Deep Deterministic Policy Gradient 19
Chapter 3 Research Methodology 23
3.1 Problem and Notation Definitions 23
3.2 Research Framework 24
3.2.1 Data Preprocessing 25
3.2.2 Feature Extraction 25
3.2.3 Reinforcement Learning Framework 26
3.2.3.1 State, Action, and Environment Settings 26
3.2.3.2 Reward Settings 27
3.2.3.3 DDPG and Prediction Model Architecture 28
3.2.4 Hyperparameter Description 29
Chapter 4 Experimental Results and Analysis 31
4.1 Experimental Design 31
4.2 UCR Dataset Description 32
4.3 Discrete Wavelet Transform 35
4.4 Hyperparameter and Architecture Settings 36
4.5 Evaluation Metrics 39
4.6 Experimental Results and Comparison 41
Chapter 5 Conclusions and Future Work 45
5.1 Conclusions 45
5.2 Future Work 46
References 47
Appendix 52
References:
Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157-166. doi:10.1109/72.279181
Bopardikar, A. S. (1999). Wavelet transforms: Introduction to theory and applications. Pearson Education.
Casas, N. (2017). Deep deterministic policy gradient for urban traffic light control. arXiv preprint arXiv:1703.09035. Retrieved from https://arxiv.org/abs/1703.09035.
Dau, H. A., Bagnall, A., Kamgar, K., Yeh, C.-C. M., Zhu, Y., Gharghabi, S., Ratanamahatana, C. A., & Keogh, E. (2019). The UCR time series archive. IEEE/CAA Journal of Automatica Sinica, 6(6), 1293-1305.
Doya, K. (1993). Bifurcations of recurrent neural networks in gradient descent learning. IEEE Transactions on Neural Networks, 1(75), 164.
Goertzel, B., & Pennachin, C. (Eds.). (2007). Artificial general intelligence (Vol. 2). Springer.
Grossberg, S., & Merrill, J. W. (1992). A neural network model of adaptively timed reinforcement learning and hippocampal dynamics. Cognitive Brain Research, 1(1), 3-38.
Grossmann, A., & Morlet, J. (1984). Decomposition of Hardy functions into square integrable wavelets of constant shape. SIAM Journal on Mathematical Analysis, 15(4), 723-736.
Hartvigsen, T., Sen, C., Kong, X., & Rundensteiner, E. (2019). Adaptive-halting policy network for early classification. Paper presented at the 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Anchorage, AK, USA. doi:10.1145/3292500.3330974
He, G., Duan, Y., Peng, R., Jing, X., Qian, T., & Wang, L. (2015). Early classification on multivariate time series. Neurocomputing, 149, 777-787.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.
Huang, B.-Q., Cao, G.-Y., & Guo, M. (2005). Reinforcement learning neural network to the problem of autonomous mobile robot obstacle avoidance. Paper presented at the 2005 International Conference on Machine Learning and Cybernetics, Guangzhou, China. doi:10.1109/ICMLC.2005.1526924.
Huang, H.-S., Liu, C.-L., & Tseng, V. S. (2018). Multivariate time series early classification using multi-domain deep neural network. Paper presented at the 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), Turin, Italy. doi:10.1109/DSAA.2018.00019.
Jaeger, H., & Haas, H. (2004). Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science, 304(5667), 78-80.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, 25, 1097-1105. doi:10.1145/3065386
Le, T. P., Quang, N. D., Choi, S., & Chung, T. (2018). Learning a self-driving bicycle using deep deterministic policy gradient. Paper presented at the 2018 18th International Conference on Control, Automation and Systems (ICCAS), Daegwallyeong, South Korea.
Li, Y. (2017). Deep reinforcement learning: An overview. arXiv preprint arXiv:1701.07274. Retrieved from https://arxiv.org/abs/1701.07274.
Liang, X., Du, X., Wang, G., & Han, Z. (2019). A Deep Reinforcement Learning Network for Traffic Light Cycle Control. IEEE Transactions on Vehicular Technology, 68(2), 1243-1253.
Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., & Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971. Retrieved from https://arxiv.org/abs/1509.02971.
Lin, C.-T., & Jou, C.-P. (1999). Controlling chaos by GA-based reinforcement learning neural network. IEEE Transactions on Neural Networks, 10(4), 846-859.
Lin, E., Chen, Q., & Qi, X. (2020). Deep reinforcement learning for imbalanced classification. Applied Intelligence, 1-15.
Lin, L.-J. (1992). Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8(3-4), 293-321.
Liu, C.-L., Hsaio, W.-H., & Tu, Y.-C. (2018). Time series classification with multivariate convolutional neural network. IEEE Transactions on Industrial Electronics, 66(6), 4788-4797.
Martinez, C., Perrin, G., Ramasso, E., & Rombaut, M. (2018). A deep reinforcement learning approach for early classification of time series. Paper presented at the 2018 26th European Signal Processing Conference (EUSIPCO), Rome, Italy. doi:10.23919/EUSIPCO.2018.8553544.
Martinez, C., Ramasso, E., Perrin, G., & Rombaut, M. (2020). Adaptive early classification of temporal sequences using deep reinforcement learning. Knowledge-Based Systems, 190, 105290.
Mikolov, T. (2012). Statistical language models based on neural networks (PhD thesis). Brno University of Technology. Retrieved from https://www.fit.vut.cz/study/phd-thesis/283/
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., & Ostrovski, G. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533.
Pascanu, R., Mikolov, T., & Bengio, Y. (2013). On the difficulty of training recurrent neural networks. Paper presented at the International Conference on Machine Learning.
Rodríguez, J. J., Alonso, C. J., & Boström, H. (2001). Boosting interval based literals. Intelligent Data Analysis, 5(3), 245-262.
Silver, D., Lever, G., Heess, N., Degris, T., Wierstra, D., & Riedmiller, M. (2014). Deterministic policy gradient algorithms. Paper presented at the Proceedings of the 31st International Conference on Machine Learning, Beijing, China.
Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.
Sutton, R. S., McAllester, D. A., Singh, S. P., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. Paper presented at Advances in Neural Information Processing Systems.
Tsitsiklis, J. N., & Van Roy, B. (1997). Analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5), 674-690. doi:10.1109/9.580874
Wang, W., Chen, C., Wang, W., Rai, P., & Carin, L. (2016). Earliness-aware deep convolutional networks for early time series classification. arXiv preprint arXiv:1611.04578. Retrieved from https://arxiv.org/abs/1611.04578.
Xing, Z., Pei, J., Dong, G., & Yu, P. S. (2008). Mining sequence classifiers for early prediction. Paper presented at the Proceedings of the 2008 SIAM International Conference on Data Mining. doi:10.1137/1.9781611972788.59
Xing, Z., Pei, J., & Yu, P. S. (2009). Early prediction on time series: A nearest neighbor approach. Paper presented at the Twenty-First International Joint Conference on Artificial Intelligence.
Xing, Z., Pei, J., Yu, P. S., & Wang, K. (2011). Extracting interpretable features for early classification on time series. Paper presented at the Proceedings of the 2011 SIAM International Conference on Data Mining.
Xu, J., Hou, Z., Wang, W., Xu, B., Zhang, K., & Chen, K. (2018). Feedback Deep Deterministic Policy Gradient With Fuzzy Reward for Robotic Multiple Peg-in-Hole Assembly Tasks. IEEE Transactions on Industrial Informatics, 15(3), 1658-1667.
Yang, C.-L., Yang, C.-Y., Chen, Z.-X., & Lo, N.-W. (2019). Multivariate Time Series Data Transformation for Convolutional Neural Network. Paper presented at the 2019 IEEE/SICE International Symposium on System Integration (SII), Paris, France. doi:10.1109/SII.2019.8700425.
Yang, J., Nguyen, M. N., San, P. P., Li, X. L., & Krishnaswamy, S. (2015). Deep convolutional neural networks on multichannel time series for human activity recognition. Paper presented at the Proceedings of the 24th International Joint Conference on Artificial Intelligence.
Zhao, B., Lu, H., Chen, S., Liu, J., & Wu, D. (2017). Convolutional neural networks for time series classification. Journal of Systems Engineering and Electronics, 28(1), 162-169.
Zheng, Y., Liu, Q., Chen, E., Ge, Y., & Zhao, J. L. (2014). Time series classification using multi-channels deep convolutional neural networks. Paper presented at the International Conference on Web-Age Information Management. doi:10.1007/978-3-319-08010-9_33

Full-Text Access Rights
  • The author agrees to authorize on-campus browsing/printing of the electronic full text, publicly available from 2025-06-15.

