System ID: U0026-0812200912110761
Title (Chinese): 整合視覺特徵與語音資訊之視訊註解方法
Title (English): Video Annotation by Using Visual and Speech Features
Institution: National Cheng Kung University
Department (Chinese): 資訊工程學系碩博士班
Department (English): Institute of Computer Science and Information Engineering
Academic Year: 94
Semester: 2
Year of Publication: 95
Author (Chinese): 陳智仁
Author (English): Chih-Jen Chen
Student ID: p7693148
Degree: Master's
Language: Chinese
Pages: 72
Oral Defense Committee: Committee Member: 謝孫源
Committee Member: 吳宗憲
Advisor: 曾新穆
Committee Member: 林嘉文
Keywords (Chinese): Video Annotation, Association Rules, Statistics-Based Prediction Model, Fusion, Data Mining
Keywords (English): Video Annotation, Statistics-Based Model, Rule-Based Model, Fusion, Association Rule
Subject Classification:
Abstract (Chinese)
Video carries multiple types of features, such as image, audio, and text, so the high-level semantic concepts implicit in it are correspondingly complex. Automatic annotation based on visual features alone cannot fully derive higher-level semantics (close to natural language) from low-level image features. Conversely, although speech content is closer to human natural language, annotation based on speech information alone risks cases where the speech in a video segment is unrelated to its visual content, leading to incorrect annotations. To narrow the gap between low-level features and high-level semantic concepts, we propose an approach that integrates visual features and speech information: the two are processed separately to construct two prediction models, the statistics-based ModelCRM and the association-rule-based ModelSAR, and the probability lists produced by the two models are then combined through fusion to improve the accuracy of video annotation. In the final experimental analysis, we adopt a public video dataset (TRECVID 2003) as experimental data, and the results confirm that our approach indeed achieves good prediction performance.



Abstract (English)
Video is composed of various types of multimedia data, such as image, audio, and text, so the high-level concepts implicit in it are highly complex. Accordingly, it is hard to capture high-level semantics by analyzing visual features alone. Conversely, automatic video annotation may mismatch a shot with its speech if only speech information is considered. To reduce the gap between low-level features and high-level concepts, we propose an approach that integrates visual features and speech information to build two prediction models, namely ModelCRM (statistics-based) and ModelSAR (rule-based). By fusing the two prediction models, the generated probability list effectively enhances the precision of video annotation. Through experimental evaluation on the well-known TRECVID 2003 dataset, our proposed approach was shown to deliver higher precision than other existing methods.
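As a rough illustration of the fusion step described above (a sketch only: the thesis does not publish its exact fusion formula, and the names `fuse`, `p_crm`, `p_sar`, and the weight `alpha` are assumptions for illustration), a late fusion of the two models' per-concept probability lists might look like:

```python
def fuse(p_crm, p_sar, alpha=0.5):
    """Linearly combine two per-concept probability lists.

    p_crm, p_sar: dicts mapping annotation concept -> probability, one from
    the statistics-based model and one from the rule-based model.
    alpha: weight given to the statistics-based model (an assumed parameter;
    not taken from the thesis).
    Returns concepts ranked by fused probability, highest first.
    """
    concepts = set(p_crm) | set(p_sar)
    fused = {c: alpha * p_crm.get(c, 0.0) + (1 - alpha) * p_sar.get(c, 0.0)
             for c in concepts}
    # The top-ranked concepts become the shot's annotation.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical probability lists for one shot:
visual = {"sky": 0.6, "building": 0.3, "person": 0.1}   # e.g. from ModelCRM
speech = {"building": 0.5, "person": 0.4, "car": 0.1}   # e.g. from ModelSAR
print(fuse(visual, speech)[0][0])  # prints "building"
```

With equal weights, "building" wins because both sources support it moderately, even though neither ranks it first on its own; this is the intuition behind combining the two evidence streams.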



Table of Contents

English Abstract
Chinese Abstract
Acknowledgements
Table of Contents
List of Tables
List of Figures

Chapter 1  Introduction
1.1 Research Objectives
1.2 Problem Description
1.3 Research Methods
1.4 Contributions
1.5 Thesis Organization

Chapter 2  Literature Review
2.1 Shot Detection & Keyframe Extraction
2.2 Composition of Video
2.3 Low-Level Image Feature Extraction
2.4 Keyframe-Based Annotation Techniques
2.5 Automatic Speech Recognition (ASR)
2.6 Text Processing Techniques
2.7 Association Rule Mining
2.7.1 Definition of Association Rules
2.7.2 Purpose of Association Rule Mining
2.7.3 Association Rule Mining Methods
2.7.4 The Apriori Algorithm

Chapter 3  Research Methods
3.1 Method Architecture
3.2 Training Stage
3.2.1 Construction of ModelCRM
3.2.2 Construction of ModelSAR
3.3 Prediction Stage
3.3.1 Annotation by ModelCRM
3.3.2 Annotation by ModelSAR
3.3.3 Fusion of ModelCRM and ModelSAR

Chapter 4  Experimental Analysis
4.1 Experimental Data
4.2 Evaluation Metrics
4.3 Experimental Design
4.3.1 Parameter-Setting Experiments for ModelCRM
4.3.2 ModelCRM Prediction Experiments
4.3.3 ModelSAR Prediction Experiments
4.3.4 Prediction Experiments on the Fusion of ModelCRM and ModelSAR
4.4 Comparison of Prediction Models
4.5 Supplementary Experiments
4.6 Summary of Experiments

Chapter 5  Conclusions and Future Work
5.1 Conclusions
5.2 Future Work

References

Author Biography

References

[1] R. Agrawal, T. Imielinski, and A. Swami, "Mining Association Rules Between Sets of Items in Large Databases," Proc. of the ACM SIGMOD Conference on Management of Data, pp. 207-216, 1993.

[2] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules," Proc. 20th Very Large Databases (VLDB) Conference, pp. 487-499, Chile, 1994.

[3] K. Barnard, P. Duygulu, N. De Freitas, D. A. Forsyth, D. Blei, and M. Jordan, "Matching Words and Pictures," Journal of Machine Learning Research, 3:1107-1135, 2003.

[4] K. Barnard and D. A. Forsyth, "Exploiting Image Semantics for Picture Libraries," The First ACM/IEEE-CS Joint Conference on Digital Libraries, pp. 408-415, 2001.

[5] A. Bouajila, C. Claus, and A. Herkersdorf, "MPEG-7 eXperimentation Model (XM)." Available at: http://www.lis.e-technik.tu-muenchen.de/research/bv/topics/mmdb/e_mpeg7.html

[6] A. Dorado, J. Calic, and E. Izquierdo, "A Rule-Based Video Annotation System," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 14, No. 5, May 2004.

[7] P. Duygulu and H. Wactlar, "Associating Video Frames with Text," Proceedings of the SIGIR Multimedia Information Retrieval Workshop, Aug. 2003.

[8] C. Fellbaum (ed.), "WordNet: An Electronic Lexical Database," MIT Press, May 1998.

[9] S. L. Feng, R. Manmatha, and V. Lavrenko, "Multiple Bernoulli Relevance Models for Image and Video Annotation," 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 2, pp. 1002-1009, 2004.

[10] J. L. Gauvain, L. Lamel, and G. Adda, "The LIMSI Broadcast News Transcription System," Speech Communication, 37(1-2):89-108, 2002.

[11] A. Ghoshal, P. Ircing, and S. Khudanpur, "Hidden Markov Models for Automatic Annotation and Content-Based Retrieval of Images and Video," Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '05), August 2005.

[12] K. Hacioglu and B. Pellom, "A Distributed Architecture for Robust Automatic Speech Recognition," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03), Vol. 1, pp. 328-331, April 2003.

[13] W. J. Heng, "Shot Boundary Refinement for Long Transition in Digital Video Sequence," IEEE Trans. on Multimedia, Vol. 4, No. 4, pp. 434-445, December 2002.

[14] J. H. Huang, "A Novel Video Annotation Method by Integrating Visual Features and Frequent Patterns," Master Thesis, National Cheng Kung University, 2006.

[15] IBM Research Center, "IBM VideoAnnEx Annotation Tool." Available at: http://www.research.ibm.com/VideoAnnEx/

[16] C. Jelmini and S. Marchand-Maillet, "DEVA: An Extensible Ontology-Based Annotation Model for Visual Document Collections," Proceedings of SPIE Photonics West, Electronic Imaging 2002, Internet Imaging IV, Santa Clara, CA, USA, 2003.

[17] K. Johar and R. Simha, "The George Washington University JWord 3.0." Available at: http://www.seas.gwu.edu/~simhaweb/software/jword/

[18] J. R. Kender and M. R. Naphade, "Visual Concepts for News Story Tracking: Analyzing and Exploiting the NIST TRECVID Video Annotation Experiment," Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 2005.

[19] V. Lavrenko, R. Manmatha, and J. Jeon, "A Model for Learning the Semantics of Pictures," Proceedings of NIPS '03, 2003.

[20] V. Lavrenko, S. L. Feng, and R. Manmatha, "Statistical Models for Automatic Video Annotation and Retrieval," The International Conference on Acoustics, Speech and Signal Processing, May 2004.

[21] J. Li and J. Z. Wang, "Automatic Linguistic Indexing of Pictures by a Statistical Modeling Approach," IEEE Trans. on Pattern Analysis and Machine Intelligence, 25(10), 2003.

[22] G. A. Miller, "WordNet: A Dictionary Browser," in Information in Data, Proceedings of the First Conference of the UW Centre for the New Oxford Dictionary, University of Waterloo, Waterloo, Canada, 1985.

[23] NIST (National Institute of Standards and Technology), Proceedings of the TREC Video Retrieval Evaluation Conference (TRECVID 2003), November 2003.

[24] B. Pellom, "SONIC: The University of Colorado Continuous Speech Recognizer," Technical Report TR-CSLR-2001-01, CSLR, University of Colorado, March 2001.

[25] M. F. Porter, "An Algorithm for Suffix Stripping," Program, 14(3), pp. 130-137, July 1980.

[26] J. D. M. Rennie, "Derivation of the F-Measure," Available at: http://people.csail.mit.edu/jrennie/writing, February 2004.

[27] Y. Rui, T. S. Huang, and S. Mehrotra, "Constructing Table-of-Content for Videos," ACM Multimedia Systems Journal, Special Issue on Video Libraries, Vol. 7, No. 5, pp. 359-368, September 1999.

[28] G. Salton and M. J. McGill, "Introduction to Modern Information Retrieval," McGraw-Hill, 1983.

[29] M. Srikanth, J. Varner, M. Bowden, and Moldovan, "Exploiting Ontologies for Automatic Image Annotation," Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '05), August 2005.

[30] C. Zhang, S. C. Chen, and M. L. Shyu, "PixSO: A System for Video Shot Detection," ICICS-PCM, December 2003.
Full-Text Use Authorization
  • The author consents to on-campus browsing/printing of the electronic full text, publicly available from 2007-08-29.
  • The author consents to off-campus browsing/printing of the electronic full text, publicly available from 2007-08-29.

