Structural Video Summarization Based on Multimodal Semantic Analysis and Mining
Department of Electrical Engineering
Keywords: structuralized video summarization, video semantic analysis, concept expansion tree, social network analysis, salient motion entropy, motion vector analysis
In this dissertation, we propose a novel video summarization technique that converts a video into a structural summary. The proposed structuralized video abstract comprises four types of entities: who, what, where, and when. Corresponding shots and their annotations are listed under each entity type, and correlated shots are connected by relational edges. To discover important clues in a video, we employ a graph mining algorithm to extract salient components from these entities, so that users can comprehend the story without spending too much time. In addition, novel audio feature extraction and classification methods are adopted to identify audio segments in the video.
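The entity-graph idea above can be illustrated with a minimal sketch. The shot annotations, entity terms, and co-occurrence edges here are hypothetical, and weighted degree merely stands in for the dissertation's actual graph mining algorithm:

```python
from collections import defaultdict

# Hypothetical shot annotations: each shot maps to (entity_type, term) pairs
# following the four categories of the summary: who/what/where/when.
shot_annotations = {
    "shot_01": [("who", "reporter"), ("where", "studio")],
    "shot_02": [("who", "reporter"), ("what", "interview")],
    "shot_03": [("what", "interview"), ("where", "studio"), ("when", "evening")],
}

# Build a relational graph: vertices are (type, term) entities; an edge links
# two entities whenever they co-occur in the same shot.
edges = defaultdict(int)
for shot, entities in shot_annotations.items():
    for i, a in enumerate(entities):
        for b in entities[i + 1:]:
            edges[frozenset((a, b))] += 1

# A crude stand-in for graph mining: rank entities by weighted degree so the
# most strongly connected vertices surface first.
degree = defaultdict(int)
for pair, w in edges.items():
    for v in pair:
        degree[v] += w
ranking = sorted(degree, key=degree.get, reverse=True)
```

On this toy input, "studio" and "interview" tie for the highest degree, so either would head the ranking; a real system would break such ties with richer centrality measures.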
On the other hand, a single video usually provides users with limited information. If resources relevant to the video can be found, more complete knowledge can be delivered. Accordingly, to augment the existing contents of a video, we make use of the structuralized clues and integrate them with social network analysis to retrieve media from online resources. The retrieved media then serve as complementary materials that enrich the original video contents.
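One step of the social network analysis outlined in Chapter 3 is converting the network to its line graph before clustering. The following sketch shows only the standard line-graph transform on a toy network; the entity names are illustrative and the dissertation's clustering stage is omitted:

```python
from itertools import combinations

# A toy social network among entities extracted from the video.
# Each undirected edge connects two entities that interact.
network = {("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")}

def line_graph(edges):
    """Each original edge becomes a vertex of the line graph; two such
    vertices are joined when their underlying edges share an endpoint."""
    verts = [tuple(sorted(e)) for e in edges]
    adj = {v: set() for v in verts}
    for e1, e2 in combinations(verts, 2):
        if set(e1) & set(e2):  # shared endpoint
            adj[e1].add(e2)
            adj[e2].add(e1)
    return adj

lg = line_graph(network)
```

Clustering the line graph groups relationships (edges) rather than entities, which is why the pipeline later maps clusters back to the vertex graph.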
In this dissertation, we also present a novel technique for generating dynamic video skims. Salient motion entropy and an improved mutual information measure are proposed to efficiently highlight events in a video.
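The salient-motion-entropy idea can be sketched as follows. The direction-histogram formulation, bin count, and saliency weighting here are illustrative assumptions, not the dissertation's exact equations:

```python
import math

def salient_motion_entropy(motion_vectors, saliency, bins=8):
    """Quantize motion-vector directions into bins, weight each vector by
    its saliency, and return the Shannon entropy of the distribution."""
    hist = [0.0] * bins
    for (dx, dy), w in zip(motion_vectors, saliency):
        angle = math.atan2(dy, dx) % (2 * math.pi)
        hist[int(angle / (2 * math.pi) * bins) % bins] += w
    total = sum(hist)
    if total == 0:
        return 0.0
    return -sum((p / total) * math.log2(p / total) for p in hist if p > 0)

# Uniform motion in one direction yields zero entropy; scattered motion,
# as during an eventful play, yields higher entropy.
calm = salient_motion_entropy([(1, 0)] * 4, [1.0] * 4)
busy = salient_motion_entropy([(1, 0), (0, 1), (-1, 0), (0, -1)], [1.0] * 4)
```

Frames whose entropy rises sharply are natural candidates for inclusion in a skim, which is the intuition behind using such a measure for event highlighting.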
Lastly, we combine the proposed techniques with commercial applications to develop an enhanced video-on-demand system. With the structural information and the support of scalable video coding, our system not only facilitates video selection but also allows the server side to store digital files more efficiently.
ABSTRACT (CHINESE)
ABSTRACT (ENGLISH)
LIST OF TABLES
LIST OF FIGURES
CHAPTER 1
1.1 Motivation
1.2 Background and Literature Review
1.2.1 Video Summarization Techniques
1.2.2 Video Content Augmentation Techniques
1.2.3 Browsing Interfaces and Applications
1.3 Contributions of the Dissertation
1.4 Outline of the Dissertation
CHAPTER 2
STRUCTURALIZED VIDEO SUMMARIZATION BASED ON SEMANTIC CONCEPT ENTITIES
2.1 Introduction
2.2 Mapping Visual Contents to Text
2.2.1 Visual and Textual Content Pre-Analysis
2.2.2 Maximum Entropy Criterion-Based Annotator
2.3 Concept Expansion
2.3.1 Background
2.3.2 Constructing Trees
2.3.3 Dependency Degree Function
2.4 Structuralizing Video Contents Using Multimodal Information
2.4.1 Annotation Classification Using WordNet
2.4.2 Building Vertices in the Relational Graph
2.4.3 Building Relations in the Relational Graph
2.4.4 Mining Important Vertices and Edges
2.4.5 Audio Feature Classification
2.5 Experimental Results
2.5.1 Concept Expansion and Word Relations
2.5.2 Informativeness and Interrelation
2.5.3 Contextual Information Among Shots
2.5.4 Adaptation of the Proposed System to the Traditional Storyboard
2.5.5 Discussions About the Browsing and Indexing Interface
2.6 Summary
CHAPTER 3
VIDEO KNOWLEDGE AUGMENTATION USING STRUCTURALIZED SEMANTIC CONTENTS
3.1 Introduction
3.2 Graph-Organizing via Spatial-Temporal Modeling
3.3 Content Augmentation Based on the Social Network Analysis
3.3.1 Constructing the Fundamental Social Network
3.3.2 Converting to the Line Graph
3.3.3 Mapping Back to the Vertex Graph
3.3.4 Calculating Relevant Clusters
3.4 Experimental Results
3.4.1 Associations Between the Augmented and Structuralized Contents
3.4.2 Evaluation of the Contextual Features in the Social Network Analysis
3.4.3 Discussions of Time Complexity
3.5 Summary
CHAPTER 4
STRUCTURALIZED CONTEXT-AWARE VIDEO CONTENT FOR VOD SERVICES
4.1 Introduction
4.2 Processing Context-Aware Contents
4.3 Scalable Resolution Support
4.4 Experimental Results
4.5 Summary
CHAPTER 5
SPORTS VIDEO SUMMARIZATION BASED ON THE SALIENT MOTION AND INFORMATION ANALYSIS
5.1 Introduction
5.2 Saliency Map Extraction
5.3 Salient Motion Entropy
5.4 Mutual Information Based on Salient Motions
5.5 Experimental Results
5.6 Summary
CHAPTER 6
6.1 Summary
6.2 Future Work
PUBLICATION LIST