The electronic full text has not yet been released to the public; for the print copy, please consult the library catalog.
(Note: if no record is found, or the holding status shows "closed stacks, not open to the public", the thesis is not in the stacks and cannot be accessed.)
System ID: U0026-2408202011280500
Title (Chinese): 四何學習用於演變環境中的行為預測:以智慧管家為例
Title (English): Quadro-W Learning for Behavior Prediction in Evolved Environment: Case Study of Intelligent Butler
University: National Cheng Kung University (成功大學)
Department (Chinese): 資訊工程學系
Department (English): Institute of Computer Science and Information Engineering
Academic Year: 108 (2019-2020)
Semester: 2
Year of Publication: 109 (2020)
Author (Chinese): 蔡冠廷
Author (English): Kuan-Ting Tsai
Student ID: P76074583
Degree: Master's
Language: English
Pages: 46
Oral Defense Committee: Advisor - 鄭憲宗
Committee Member - 許智威
Committee Member - 施嘉興
Committee Member - 陳盈鈞
Keywords (Chinese): 影音識別, 行為預測, Q學習, 智慧家庭
Keywords (English): Audiovisual Recognition, Behavior Prediction, Q Learning, Intelligent House
Subject Classification:
Abstract (Chinese): In recent years, with advances in embedded hardware (sensors, microprocessors, etc.), the maturation of software technology, the spread of the Internet, and falling prices, embedded systems have been widely deployed in many settings, including crop growth monitoring, product defect inspection, traffic system management, and patient vital-sign monitoring. However, to collect enough data for good analysis results, the application environment is typically covered with sensors of every kind, which raises many problems: the extra hardware alters the original structure of the environment, system installation is too time-consuming, and the large number of devices makes the system expensive and hard to maintain. The required hardware should therefore be reduced without compromising the analysis results. Moreover, beyond the hardware that collects environmental data, each application scenario also has a corresponding software model for analyzing that data. Such models are usually configured for a fixed set of conditions in the environment and cannot evolve as the environment evolves; because the model cannot learn on its own, the system's flexibility and life cycle are reduced.
In this study, we propose a Quadro-W Learning method to predict human behavior, where the four Ws are the person (who), the object (what), the place (where), and the time (when). We obtain the Quadro-W information from camera data alone, without additional sensing hardware, and build a behavior prediction model on top of it. The model not only makes predictions based on the initial environment but also adapts as people's living habits change, which increases its usability and flexibility.
Abstract (English): In recent years, with the progress of embedded hardware (sensors, microprocessors, etc.), the maturity of software technology, the spread of the Internet, and falling prices, embedded systems have been widely used in many scenarios, including crop growth monitoring, commodity defect detection, transportation system management, and vital-sign monitoring. However, to obtain enough information for good analysis, the application environment is usually filled with various sensors, which causes many problems: the hardware changes the original environment, the initial system setup is too time-consuming, and the large number of devices makes the system expensive and difficult to maintain. The required hardware should therefore be reduced without affecting the results. In addition to the hardware used to collect environmental information, each application scenario also has a corresponding software model for analyzing that information. Such a model is usually configured for particular situations in the environment and cannot evolve as the environment evolves; because the model cannot learn by itself, the flexibility and life cycle of the system decrease.
In this study, we propose a Quadro-W Learning (QW-Learning) method to predict human behavior. Quadro-W stands for human (who), object (what), place (where), and time (when). We obtain the Quadro-W information solely from data collected by the camera, without extra sensors, and build a behavior prediction model from it. The model not only makes predictions based on the initial environment but also evolves along with the environment, increasing the system's flexibility and life cycle.
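To make the abstract's idea concrete, below is a minimal, hypothetical Python sketch of tabular Q-learning over a Quadro-W state tuple (who, what, where, when). The behavior labels, reward rule, hyperparameters, and every name in it are illustrative assumptions for this sketch only, not the design actually used in the thesis (the real Quadro-W extraction and evolving prediction model are described in Sections 3.4-3.6 of the table of contents below).

# Hypothetical sketch only: tabular Q-learning over a Quadro-W state.
# The behavior set, reward rule, and hyperparameters are assumptions
# made for illustration; they are not taken from the thesis.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate

BEHAVIORS = ["watch_tv", "cook", "sleep", "read"]  # hypothetical behavior labels

Q = defaultdict(float)  # Q[(state, behavior)] -> estimated value

def predict(state):
    """Epsilon-greedy choice of the most likely next behavior."""
    if random.random() < EPSILON:
        return random.choice(BEHAVIORS)
    return max(BEHAVIORS, key=lambda b: Q[(state, b)])

def update(state, behavior, reward, next_state):
    """Standard Q-learning update: Q <- Q + alpha*(r + gamma*max Q' - Q)."""
    best_next = max(Q[(next_state, b)] for b in BEHAVIORS)
    Q[(state, behavior)] += ALPHA * (reward + GAMMA * best_next - Q[(state, behavior)])

# One step: a Quadro-W observation, e.g. extracted from camera data.
# Reward is 1 if the prediction matched the behavior actually observed
# afterwards, 0 otherwise (an assumption for this sketch).
state = ("alice", "remote_control", "living_room", "evening")
guess = predict(state)
observed = "watch_tv"
update(state, guess, 1.0 if guess == observed else 0.0, state)

Because mismatched predictions earn no reward, the table's values drift toward whatever behavior is currently observed in a given Quadro-W context, which illustrates one simple way a prediction model can evolve with its environment.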
Table of Contents
摘要 (Chinese Abstract) I
Abstract II
ACKNOWLEDGEMENT IV
LIST OF CONTENTS V
LIST OF FIGURES VI
LIST OF TABLES VII
Chapter 1. Introduction 1
1.1 Introduction & Motivation 1
1.2 Thesis Overview 3
Chapter 2. Background & Related Work 4
2.1 Background 4
2.1.1 Action Recognition 4
2.1.2 Temporal Action Detection 8
2.2 Related Work 9
2.2.1 Residual Block 9
2.2.2 Q-Learning 12
Chapter 3. Method 14
3.1 Problem Description 14
3.2 System Architecture 15
3.3 Data Pre-processing 17
3.4 Quadro-W Model 19
3.4.1 Human Detection & Recognition 19
3.4.2 Object Detection & Recognition 21
3.4.3 Place Recognition 22
3.4.4 Sound Split & Recognition 23
3.5 Quadro-W Information Merge 25
3.6 Evolved Behavior Prediction 30
Chapter 4. Experiment 34
4.1 Experiment Environment Setup 34
4.2 Implementation 34
4.3 Experiment Result 35
Chapter 5. Conclusion & Future Work 43
Reference 45
References
[1] Library of Congress. "Who is credited with inventing the telephone?" https://www.loc.gov/everyday-mysteries/item/who-is-credited-with-inventing-the-telephone/ (accessed Jul. 1, 2020).
[2] H. Wang and C. Schmid, "Action Recognition with Improved Trajectories," in 2013 IEEE International Conference on Computer Vision, Dec. 2013, pp. 3551-3558, doi: 10.1109/ICCV.2013.441.
[3] K. Simonyan and A. Zisserman, "Two-stream convolutional networks for action recognition in videos," in Advances in neural information processing systems, 2014, pp. 568-576.
[4] J. Yue-Hei Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici, "Beyond short snippets: Deep networks for video classification," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 4694-4702.
[5] Z. Qiu, T. Yao, and T. Mei, "Learning spatio-temporal representation with pseudo-3D residual networks," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 5533-5541.
[6] Z. Shou, D. Wang, and S.-F. Chang, "Temporal action localization in untrimmed videos via multi-stage CNNs," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1049-1058.
[7] T. Lin, X. Zhao, H. Su, C. Wang, and M. Yang, "BSN: Boundary sensitive network for temporal action proposal generation," in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 3-19.
[8] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[9] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," in Advances in neural information processing systems, 2012, pp. 1097-1105.
[10] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[11] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778.
[12] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2017, pp. 2117-2125.
[13] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2014, pp. 580-587.
[14] K. He, X. Zhang, S. Ren, and J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition," IEEE transactions on pattern analysis and machine intelligence, vol. 37, no. 9, pp. 1904-1916, 2015.
[15] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," in Advances in neural information processing systems, 2015, pp. 91-99.
[16] C. J. Watkins and P. Dayan, "Q-learning," Machine learning, vol. 8, no. 3-4, pp. 279-292, 1992.
[17] E. Reinhard, M. Adhikhmin, B. Gooch, and P. Shirley, "Color transfer between images," IEEE Computer graphics and applications, vol. 21, no. 5, pp. 34-41, 2001.
[18] Z. Li, Z. Jing, X. Yang, and S. Sun, "Color transfer based remote sensing image fusion using non-separable wavelet frame transform," Pattern Recognition Letters, vol. 26, no. 13, pp. 2006-2014, 2005, doi: 10.1016/j.patrec.2005.02.010.
[19] L. Vincent and P. Soille, "Watersheds in digital spaces: an efficient algorithm based on immersion simulations," IEEE Transactions on Pattern Analysis & Machine Intelligence, no. 6, pp. 583-598, 1991.
[20] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779-788.
[21] W. Liu et al., "SSD: Single shot multibox detector," in European conference on computer vision, 2016: Springer, pp. 21-37.
[22] O. Ronneberger, P. Fischer, and T. Brox, "U-net: Convolutional networks for biomedical image segmentation," in International Conference on Medical image computing and computer-assisted intervention, 2015: Springer, pp. 234-241.
[23] Google, "Speech-to-Text: Automatic Speech Recognition | Cloud Speech-to-Text." [Online]. Available: https://cloud.google.com/speech-to-text.
Full-Text Access Rights
  • Authorized for on-campus browsing/printing of the electronic full text, available from 2025-09-01.
  • Authorized for off-campus browsing/printing of the electronic full text, available from 2025-09-01.

