System ID U0026-0607202018301300
Title (Chinese) 基於標籤基因組和深度強化學習在大量離散動作之動態電影推薦演算法
Title (English) Dynamic Movie Recommendation Algorithm Based on Tag Genome and Deep Reinforcement Learning in Large Discrete Action Spaces
University National Cheng Kung University
Department (Chinese) 工程科學系
Department (English) Department of Engineering Science
Academic year 108 (2019–2020)
Semester 2
Year of publication 109 (2020)
Author (Chinese) 張淙垣
Author (English) Tsung-Yuan Chang
Student ID N96071172
Degree Master's
Language Chinese
Number of pages 54
Oral defense committee Committee member - 趙涵捷
Committee member - 黃悅民
Committee member - 黃仁竑
Committee member - 賴盈勳
Committee member - 陳世曄
Advisor - 賴槿峰
Keywords (Chinese) 推薦演算法, 強化學習, 大量離散動作
Keywords (English) Recommendation algorithm, Reinforcement learning, Large number of discrete actions
Subject classification
Abstract (Chinese) As the volume of data generated by users worldwide keeps growing, extracting useful features from massive amounts of data and turning them into value has become essential. Recommendation algorithms have therefore developed rapidly, using new techniques to achieve better performance and better recommendation results.
In this thesis, we propose a new type of recommendation model based on deep reinforcement learning and define its architecture and related parameters. It produces personalized recommendations from the movies each user clicks on and the ratings they give, is pre-trained in a simulated online environment, and is further adapted to the problems that arise in systems with a very large number of movies, with the goal of improving performance and making good recommendations.
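To make the model description above concrete, the minimal sketch below shows the kind of Q-network such a deep-reinforcement-learning recommender could use. It is not the thesis's implementation; every layer size, feature dimension, and class name is an illustrative assumption. The state is assumed to summarize the movies a user has recently clicked and rated, and the network scores a set of candidate movies.

# Minimal sketch (illustrative assumptions only, not the thesis's implementation).
import torch
import torch.nn as nn

class RecommendationQNetwork(nn.Module):
    def __init__(self, state_dim: int = 128, action_dim: int = 64, hidden: int = 256):
        super().__init__()
        # state_dim: dimension of the user-state vector (e.g. pooled features of
        # recently rated movies); action_dim: dimension of a candidate movie's features.
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),  # Q(s, a) for one state-candidate pair
        )

    def forward(self, state: torch.Tensor, candidates: torch.Tensor) -> torch.Tensor:
        # state: (batch, state_dim); candidates: (batch, k, action_dim)
        k = candidates.size(1)
        s = state.unsqueeze(1).expand(-1, k, -1)          # broadcast state to each candidate
        q = self.net(torch.cat([s, candidates], dim=-1))  # (batch, k, 1)
        return q.squeeze(-1)                              # one Q-value per candidate movie

# Toy forward pass: a batch of 4 user states, each with 10 candidate movies.
q_net = RecommendationQNetwork()
q_values = q_net(torch.randn(4, 128), torch.randn(4, 10, 64))
print(q_values.shape)  # torch.Size([4, 10])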
In the experiments, this study uses the open MovieLens dataset for movie recommendation and evaluates the model with five metrics: reward score, click-through rate, normalized discounted cumulative gain (NDCG), Top-k accuracy, and average steps per episode. The experimental design includes comparing the Deep Q Network and Dueling Deep Q Network architectures, tuning experimental parameters and action-selection methods, running experiments under a large discrete action space and under the cold-start problem, comparing the results against several collaborative-filtering methods, and finally testing recommendations for users outside the training set.
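Two of the five metrics named above, NDCG and Top-k accuracy, can be sketched in a few lines. These are the standard formulations; the exact definitions used in the thesis may differ slightly.

# Standard formulations of two evaluation metrics (the thesis's definitions may vary).
import numpy as np

def ndcg_at_k(relevances, k):
    """NDCG for one recommendation list; relevances are given in recommended order
    (e.g. the rating the user actually gave each recommended movie)."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = float(np.sum(rel * discounts))
    idcg = float(np.sum(np.sort(rel)[::-1] * discounts))  # ideal ordering of the same items
    return dcg / idcg if idcg > 0 else 0.0

def top_k_accuracy(recommended_ids, liked_ids, k):
    """Fraction of the top-k recommended items the user actually liked."""
    top_k = list(recommended_ids)[:k]
    hits = sum(1 for item in top_k if item in set(liked_ids))
    return hits / k

# Example: a 5-item list where the user's ratings were 5, 3, 0, 4, 1.
print(ndcg_at_k([5, 3, 0, 4, 1], k=5))
print(top_k_accuracy([10, 42, 7, 99, 3], liked_ids=[42, 3, 8], k=5))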
The results show that, under a large number of actions, our method converges earlier than the conventional DQN algorithm and is more than 20 times faster. Compared with user-based and item-based KNN and with Bayesian Personalized Ranking, it performs better on multiple metrics and can update the model dynamically. The method gives reasonable recommendations even for users the system has never seen, and it shows no overfitting on the validation and test sets, demonstrating the generality of the model.
Abstract (English) In this thesis, we propose a recommendation algorithm based on deep reinforcement learning and define its architecture and related parameters. It makes personalized recommendations according to the movies each user clicks on and the ratings they give. Besides combining the strengths of different recommendation approaches, such as collaborative filtering and content-based recommendation, it is adapted to the problems that arise in the large discrete action spaces of reinforcement learning, with the goal of improving system performance and producing good recommendations.
The experimental results show that, with a large number of actions, our method converges at an early epoch and is more than 20 times faster than the DQN algorithm. Comparison with other commonly used recommendation algorithms shows that it performs better on multiple metrics and can update the model dynamically. The method gives good recommendations even for users the system has never seen, and there is no overfitting on the validation and test datasets, demonstrating the generality of the model.
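One plausible way to keep action selection tractable in a large discrete movie space, in the spirit of the action candidate strategy listed in the methodology chapter, is to cluster movies by their tag-genome vectors and score only the movies in the clusters nearest the current user state. The sketch below assumes k-means clustering over toy data and is not the thesis's exact procedure; all sizes and names are illustrative.

# Hedged sketch: restrict the candidate action set via k-means over tag-genome vectors.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
tag_genome = rng.random((5000, 32))          # 5000 movies x 32 tag-genome features (toy data)
kmeans = KMeans(n_clusters=50, n_init=10, random_state=0).fit(tag_genome)

def candidate_movies(user_state: np.ndarray, n_clusters: int = 3) -> np.ndarray:
    """Return indices of movies in the clusters whose centroids are closest to the user state."""
    dists = np.linalg.norm(kmeans.cluster_centers_ - user_state, axis=1)
    nearest = np.argsort(dists)[:n_clusters]
    return np.flatnonzero(np.isin(kmeans.labels_, nearest))

# The Q-network then only evaluates this reduced candidate set at each step,
# instead of scoring the full catalogue of movies.
user_state = tag_genome[:20].mean(axis=0)    # e.g. average features of recently liked movies
print(candidate_movies(user_state).shape)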
Table of contents Abstract i
Extended English abstract ii
Acknowledgements vi
Table of contents vii
List of tables ix
List of figures x
List of symbols xi
Chapter 1. Introduction 1
1.1. Motivation 1
1.2. Research directions and contributions 2
1.3. Chapter overview 2
Chapter 2. Background and literature review 4
2.1. Background 4
2.1.1. Introduction to recommender systems 4
2.1.2. Overview of reinforcement learning 5
2.1.3. Neural network architectures 7
2.2. Personalized recommendation algorithms 7
2.2.1. Content-based recommendation 7
2.2.2. Collaborative filtering 8
2.2.3. Association-rule-based recommendation 9
2.2.4. Knowledge-based recommendation 10
2.2.5. Hybrid recommendation 10
2.3. Recent developments 11
Chapter 3. Methodology 13
3.1. Application environment and problem description 13
3.2. Online environment simulation 14
3.3. Reinforcement learning model 15
3.3.1. Network architecture 15
3.3.2. Action-value function and policy 17
3.3.3. State 18
3.3.4. Action 19
3.3.5. Reward function 19
3.3.6. Action candidate strategy 20
3.3.7. Episode termination conditions 21
3.4. Action selection in large discrete action spaces 21
3.5. Recommendation workflow and advantages 23
3.6. Learning workflow 24
Chapter 4. Experiments and result analysis 27
4.1. Experimental design 27
4.1.1. Experimental environment 27
4.1.2. Datasets 27
4.1.3. Experimental procedure 28
4.2. Experimental results 32
4.2.1. Algorithm architecture comparison 32
4.2.2. Maximum steps per episode 35
4.2.3. Action cluster selection 38
4.2.4. Large action-space experiment 39
4.2.5. Cold-start experiment 40
4.2.6. Test-set metric experiment 43
4.2.7. Non-training-set user experiment 44
4.2.8. Discussion 44
Chapter 5. Conclusions and future work 46
5.1. Conclusions 46
5.2. Future work 46
References 48
Appendix A. Experiment tables 53
A.1. Experiment 6: metric results for model predictions on non-training users 53

Full-text access permissions
  • On-campus browsing/printing of the electronic full text is authorized, available from 2025-07-01.
  • Off-campus browsing/printing of the electronic full text is authorized, available from 2025-07-01.

