

   The electronic thesis has not yet been authorized for public release; please check the library catalog for the print copy.
(Note: if the record cannot be found, or the holding status shows "closed stacks, not public", the thesis is not in the stacks and cannot be accessed.)
System ID: U0026-0908201814445300
Title (Chinese): 神經嫁接網絡: 利用遷移學習建構多任務情境策略
Title (English): Neural Grafting Network: Construct Multi-Task Contextual Policy by Transfer Learning
University: National Cheng Kung University (成功大學)
Department (Chinese): 資訊工程學系
Department (English): Institute of Computer Science and Information Engineering
Academic Year: 106 (ROC calendar)
Semester: 2
Year of Publication: 107 (ROC calendar, i.e., 2018)
Author (Chinese): 陳杰翰
Author (English): Jie-Han Chen
Email: ita3051@gmail.com
Student ID: P76054305
Degree: Master's
Language: English
Pages: 36
Committee: Advisor - 莊坤達
Committee Member - 謝孫源
Committee Member - 高宏宇
Committee Member - 李政德
Keywords (Chinese): 神經網絡, Dropout, 遷移學習, 多任務強化學習, 星海爭霸
Keywords (English): Neural Network, Dropout, transfer learning, multi-task reinforcement learning, StarCraft
Subject Classification:
Abstract (Chinese): In recent years, deep reinforcement learning has achieved impressive results in many fields. For reinforcement learning, however, how the reward function is defined is crucial for guiding learning, and this becomes even more challenging in multi-task settings. Without sufficient reward signals to guide it, a reinforcement learning algorithm can hardly lead an agent to acquire useful knowledge. In this thesis, we propose a novel concept for guiding reinforcement learning to acquire knowledge across multiple tasks. We introduce Dropout as a dynamic routing mechanism that controls which neurons are activated under different task contexts, and we further assist learning by transferring the parameters of the network's policy layers. Based on this concept, we implement a special neural network architecture, the Neural Grafting Network. Experimental results show that the Neural Grafting Network can use reinforcement learning to handle complex multi-task scenarios, that transfer learning greatly reduces training time and helps the network acquire policy knowledge, and that this architecture mitigates the effect of catastrophic forgetting. Moreover, transferring the parameters of the policy layers also yields a significant benefit under the Neural Grafting Network, supporting the hypothesis from previous work that the output layers of a neural network are more closely related to the task itself.
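To make the Dropout-as-routing idea above concrete, the sketch below shows one way a per-task binary mask could gate the units of a shared hidden layer so that different tasks activate different sub-networks. This is a minimal, hypothetical PyTorch illustration; the class name, layer sizes, and keep probability are assumptions and are not taken from the thesis.

```python
import torch
import torch.nn as nn

class TaskRoutedMLP(nn.Module):
    """Hypothetical sketch: a shared hidden layer whose units are gated by a
    per-task binary mask, in the spirit of using Dropout as dynamic routing
    across tasks (names and sizes are illustrative)."""

    def __init__(self, obs_dim, hidden_dim, action_dim, num_tasks, keep_prob=0.5):
        super().__init__()
        self.shared = nn.Linear(obs_dim, hidden_dim)
        self.policy_head = nn.Linear(hidden_dim, action_dim)
        # One fixed Bernoulli mask per task, sampled once and reused, so each
        # task activates its own sub-network of the shared units.
        masks = (torch.rand(num_tasks, hidden_dim) < keep_prob).float()
        self.register_buffer("task_masks", masks)

    def forward(self, obs, task_id):
        h = torch.relu(self.shared(obs))
        h = h * self.task_masks[task_id]  # route: silence units not assigned to this task
        return self.policy_head(h)        # task-conditioned policy logits

# Usage: the same network yields a different routing per task.
net = TaskRoutedMLP(obs_dim=16, hidden_dim=64, action_dim=4, num_tasks=3)
obs = torch.randn(1, 16)
logits_task0 = net(obs, task_id=0)
logits_task1 = net(obs, task_id=1)
```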
Abstract (English): Deep reinforcement learning has recently achieved impressive performance in many fields. However, designing the reward function for the learning agent is critical, especially in multi-task reinforcement learning; without explicit rewards, it is challenging to learn useful knowledge in a multi-task scenario. We propose a novel concept that lets the agent learn knowledge in multi-task decision problems, combining transfer learning with dynamic neural network routing based on Dropout. Building on this dynamic routing, we implement the concept with a special neural network architecture called the Neural Grafting Network. The results show that the Neural Grafting Network can handle the domain adaptation problem in a multi-task environment and mitigate catastrophic forgetting when transferring different kinds of prior knowledge for a specific task. In addition, transferring the weights of both the early layers and the latter layers, while skipping the hidden layers in between, significantly helps the multi-task agent during training in the Neural Grafting Network, which can be regarded as supporting the earlier assumption in transfer learning that the weights of the latter layers are highly related to knowledge about the specific task.
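As a rough illustration of the transfer described above (copying the early-layer and latter policy-layer weights while leaving the layers in between to be relearned), the following hedged sketch grafts selected weights from a trained source policy into a fresh target network. The helper names, the three-block layout, and the layer indices are illustrative assumptions rather than the thesis's actual implementation.

```python
import torch.nn as nn

def build_policy_net(obs_dim=16, hidden_dim=64, action_dim=4):
    # Illustrative three-block policy network: early feature layer,
    # intermediate hidden layer, and a latter policy (output) layer.
    return nn.Sequential(
        nn.Linear(obs_dim, hidden_dim), nn.ReLU(),     # early layers (block 0)
        nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),  # intermediate hidden layer (block 2)
        nn.Linear(hidden_dim, action_dim),             # latter policy layer (block 4)
    )

def graft(source, target):
    """Copy early-layer and output-layer weights from a trained source policy
    into a fresh target, leaving the middle layer to be relearned."""
    src, tgt = source.state_dict(), target.state_dict()
    for key in src:
        if key.startswith("0.") or key.startswith("4."):  # early and latter blocks only
            tgt[key] = src[key].clone()
    target.load_state_dict(tgt)
    return target

source_net = build_policy_net()  # pretend this was trained on a source task
target_net = graft(source_net, build_policy_net())
```

In this sketch the grafted target would then be fine-tuned on the multi-task environment, with the untransferred middle layer learned from scratch.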
Table of Contents
Abstract (Chinese)  i
Abstract  ii
Acknowledgment  iii
Contents  iv
List of Figures  vi
1 Introduction  1
2 Background  4
3 Related Work  7
4 Method  9
4.1 Inspiration  9
4.2 Neural Grafting Network  10
5 Experimental Settings  13
5.1 Learning Environment  13
5.2 Base Model  14
5.3 Learning Algorithm  15
5.4 Basic Tasks  15
5.4.1 CollectMineralShards-SingleMarine  15
5.4.2 CollectMineralShards-SingleSCV  16
5.4.3 DestroyBuildings-SingleBanShee  16
5.5 Complicated Tasks  17
5.5.1 CollectByFiveMarines  17
5.5.2 CollectByFiveMarines-Sparse  18
5.5.3 CollectAndDestroy-Sparse  18
5.6 Neural Grafting Model  19
6 Experimental Results  20
6.1 Performance Evaluation  20
6.1.1 CollectByFiveMarines  20
6.1.2 CollectByFiveMarines-Sparse  22
6.1.3 CollectAndDestroy-Sparse  23
6.2 Ablation Test  25
6.3 Transfer Analysis  26
7 Conclusions  30
Bibliography  32
References
[1] S. Levine, C. Finn, T. Darrell, and P. Abbeel, “End-to-end training of deep visuomotor policies,” Journal of Machine Learning Research, JMLR, vol. 17, pp. 39:1–39:40, 2016.
[2] S. Levine, P. Pastor, A. Krizhevsky, J. Ibarz, and D. Quillen, “Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection,” I. J. Robotics Res., vol. 37, no. 4-5, pp. 421–436, 2018.
[3] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, p. 529, 2015.
[4] Y. Wu and Y. Tian, “Training agent for first-person shooter game with actor-critic curriculum learning,” in Fifth International Conference on Learning Representations, ICLR, 2016.
[5] G. Lample and D. S. Chaplot, “Playing fps games with deep reinforcement learning.” in AAAI, 2017, pp. 2140–2146.
[6] J. N. Foerster, G. Farquhar, T. Afouras, N. Nardelli, and S. Whiteson, “Counterfactual multi-agent policy gradients,” in Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018.
[7] J. Li, W. Monroe, A. Ritter, D. Jurafsky, M. Galley, and J. Gao, “Deep reinforcement learning for dialogue generation,” in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP, 2016, pp. 1192–1202.
[8] B. Dhingra, L. Li, X. Li, J. Gao, Y. Chen, F. Ahmed, and L. Deng, “Towards end-to-end reinforcement learning of dialogue agents for information access,” in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, ACL, 2017, pp. 484–495.
[9] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot et al., “Mastering the game of go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, p. 484, 2016.
[10] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton et al., “Mastering the game of go without human knowledge,” Nature, vol. 550, no. 7676, p. 354, 2017.
[11] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel et al., “Mastering chess and shogi by self-play with a general reinforcement learning algorithm,” arXiv preprint arXiv:1712.01815, 2017.
[12] B. Zoph and Q. V. Le, “Neural architecture search with reinforcement learning,” in Proceedings of the International Conference on Learning Representations, ICLR, 2017.
[13] C. Liu, B. Zoph, J. Shlens, W. Hua, L.-J. Li, L. Fei-Fei, A. Yuille, J. Huang, and K. Murphy, “Progressive neural architecture search,” arXiv preprint arXiv:1712.00559, 2017.
[14] H. Pham, M. Y. Guan, B. Zoph, Q. V. Le, and J. Dean, “Efficient neural architecture search via parameter sharing,” arXiv preprint arXiv:1802.03268, 2018.
[15] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, “How transferable are features in deep neural networks?” in Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems, NIPS, 2014, pp. 3320–3328.
[16] J. Blitzer, R. McDonald, and F. Pereira, “Domain adaptation with structural correspondence learning,” in Proceedings of the 2006 conference on empirical methods in natural language processing, EMNLP, 2006, pp. 120–128.
[17] H. Daume III and D. Marcu, “Domain adaptation for statistical classifiers,” Journal of Artificial Intelligence Research, JAIR, vol. 26, pp. 101–126, 2006.
[18] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, “Domain adaptation via transfer component analysis,” IEEE Transactions on Neural Networks, vol. 22, no. 2, pp. 199–210, 2011.
[19] R. M. French, “Catastrophic forgetting in connectionist networks,” Trends in cognitive sciences, vol. 3, no. 4, pp. 128–135, 1999.
[20] M. McCloskey and N. J. Cohen, “Catastrophic interference in connectionist networks: The sequential learning problem,” in Psychology of learning and motivation, 1989, vol. 24, pp. 109–165.
[21] J. L. McClelland, B. L. McNaughton, and R. C. O’Reilly, “Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory,” Psychological review, vol. 102, no. 3, p. 419, 1995.
[22] O. Vinyals, T. Ewalds, S. Bartunov, P. Georgiev, A. S. Vezhnevets, M. Yeo, A. Makhzani, H. Küttler, J. Agapiou, J. Schrittwieser, J. Quan, S. Gaffney, S. Petersen, K. Simonyan, T. Schaul, H. van Hasselt, D. Silver, T. P. Lillicrap, K. Calderone, P. Keet, A. Brunasso, D. Lawrence, A. Ekermo, J. Repp, and R. Tsing, “StarCraft II: A new challenge for reinforcement learning,” CoRR, vol. abs/1708.04782, 2017.
[23] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 1998.
[24] V. Mnih, A. P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu, “Asynchronous methods for deep reinforcement learning,” in Proceedings of the 33rd International Conference on Machine Learning, ICML, 2016, pp. 1928–1937.
[25] V. R. Konda and J. N. Tsitsiklis, “Actor-critic algorithms,” in Advances in neural information processing systems, NIPS, 2000, pp. 1008–1014.
[26] J. Andreas, D. Klein, and S. Levine, “Modular multitask reinforcement learning with policy sketches,” in Proceedings of the 34th International Conference on Machine Learning, ICML, 2017, pp. 166–175.
[27] C. Devin, A. Gupta, T. Darrell, P. Abbeel, and S. Levine, “Learning modular neural network policies for multi-task and multi-robot transfer,” in International Conference on Robotics and Automation, ICRA, 2017, pp. 2169–2176.
[28] R. S. Sutton, D. Precup, and S. Singh, “Between mdps and semi-mdps: A framework for temporal abstraction in reinforcement learning,” Artificial intelligence, vol. 112, no. 1-2, pp. 181–211, 1999.
[29] T. D. Kulkarni, K. Narasimhan, A. Saeedi, and J. Tenenbaum, “Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation,” in Advances in neural information processing systems, NIPS, 2016, pp. 3675–3683.
[30] K. Frans, J. Ho, X. Chen, P. Abbeel, and J. Schulman, “Meta learning shared hierarchies,” in International Conference on Learning Representations, ICLR, 2018.
[31] G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” in NIPS Deep Learning and Representation Learning Workshop, 2015.
[32] E. Parisotto, L. J. Ba, and R. Salakhutdinov, “Actor-mimic: Deep multitask and transfer reinforcement learning,” in International Conference on Learning Representations, ICLR, 2016.
[33] Y. Teh, V. Bapst, W. M. Czarnecki, J. Quan, J. Kirkpatrick, R. Hadsell, N. Heess, and R. Pascanu, “Distral: Robust multitask reinforcement learning,” in Advances in Neural Information Processing Systems, NIPS, 2017, pp. 4496–4506.
[34] A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick, K. Kavukcuoglu, R. Pascanu, and R. Hadsell, “Progressive neural networks,” arXiv preprint arXiv:1606.04671, 2016.
[35] S. Tubbs, E. Rizk, M. M. Shoja, M. Loukas, N. Barbaro, and R. J. Spinner, “Chapter 17 - nerve grafting methods,” in Nerves and Nerve Injuries. Academic Press, 2015, pp. 237–248.
[36] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proceedings of the thirteenth international conference on artificial intelligence and statistics, AISTATS, 2010, pp. 249–256.
[37] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” in Proceedings of the IEEE international conference on computer vision, ICCV, 2015, pp. 1026–1034.
[38] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proceedings of the 32nd International Conference on Machine Learning, ICML, 2015, pp. 448–456.
[39] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, JMLR, vol. 15, no. 1, pp. 1929–1958, 2014.
[40] R. Pascanu, T. Mikolov, and Y. Bengio, “On the difficulty of training recurrent neural networks,” in Proceedings of the 30th International Conference on Machine Learning, ICML, 2013, pp. 1310–1318.
[41] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in IEEE Conference on Computer Vision and Pattern Recognition, CVPR, 2015, pp. 3431–3440.
[42] J. Schulman, P. Moritz, S. Levine, M. I. Jordan, and P. Abbeel, “High-dimensional continuous control using generalized advantage estimation,” CoRR, vol. abs/1506.02438, 2015.
[43] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in International Conference on Learning Representations, ICLR, 2015.
[44] S. Thrun and L. Pratt, Learning to learn, 2012.
[45] T. G. Dietterich, “Hierarchical reinforcement learning with the maxq value function decomposition,” Journal of Artificial Intelligence Research, JAIR, vol. 13, pp. 227–303, 2000.
Full-Text Usage Rights
  • The author agrees to authorize on-campus browsing/printing of the electronic full text, publicly available from 2020-08-31.

