Predicting Cooperation Relationships in Heterogeneous Movie Networks
Department of Engineering Science
heterogeneous information network
social network analysis
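In social network analysis, a widely discussed problem is predicting whether people will form new interactions (such as friendship or cooperation) in the future. The problem becomes more challenging, however, when real-world interpersonal interactions are represented as a more complex heterogeneous information network rather than a simple homogeneous one. In this study, we focus on a movie network composed of multiple types of entities (for example, movies, directors, actors, studios, and genres) and multiple types of links. To clearly represent the structural fragments of this movie network and their corresponding meanings, we adopt a prediction model based on meta-paths. The proposed method has two main advantages. First, the topological features of the movie network can be extracted systematically. Second, supervised learning can be used to obtain the best weight for each feature, which yields an effective cooperation prediction model. In experiments on the real IMDb dataset, our method accurately predicts future cooperation between directors and actors (for example, between a director and an actor, between two directors, or between two actors) in a large-scale movie network.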
In social network analysis, predicting relationships among people in an interpersonal network is a widely discussed problem. When a real network is modeled as a heterogeneous information network rather than a homogeneous one, however, this problem becomes more challenging. In this work, we focus on a movie network constituted by multiple types of entities (e.g., movies, participants, studios, and genres) and multiple types of links among these entities. To clearly represent the semantic meanings in such a movie network, we use a meta-path-based prediction model. Our approach has two advantages. First, the meta-path-based method systematically extracts topological features from a movie network. Second, we use supervised learning to obtain the best weights associated with the different topological features when building the cooperation prediction model. Empirical studies on the real IMDb dataset show that our approach accurately predicts cooperation relationships in a large-scale movie network.
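The two steps named in the abstract (extracting meta-path-based topological features, then learning a weight per feature) can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the thesis: the toy network, the node names, the two meta-paths, the labels, the use of raw path counts as features, and the choice of logistic regression trained by plain stochastic gradient ascent as the supervised learner.

```python
import math
from collections import defaultdict

# Toy heterogeneous movie network; every node name here is made up.
directed = {"d1": {"m1", "m2"}, "d2": {"m3"}}                     # director -> movies
acted_in = {"a1": {"m1"}, "a2": {"m2", "m3"}, "a3": {"m3"}}       # actor -> movies
has_genre = {"m1": {"drama"}, "m2": {"drama"}, "m3": {"comedy"}}  # movie -> genres


def invert(relation):
    """Reverse a relation, e.g. actor -> movies becomes movie -> actors."""
    inverse = defaultdict(set)
    for source, targets in relation.items():
        for target in targets:
            inverse[target].add(source)
    return dict(inverse)


cast_of = invert(acted_in)      # movie -> actors
movies_of = invert(has_genre)   # genre -> movies

# Hypothetical meta-paths from a director (D) to an actor (A),
# each written as the sequence of relations to follow.
META_PATHS = {
    "D-M-G-M-A": [directed, has_genre, movies_of, cast_of],  # shared genre
    "D-M-A-M-A": [directed, cast_of, acted_in, cast_of],     # shared co-star
}


def path_count(start, end, relations):
    """Count path instances from start to end along the given relations."""
    frontier = {start: 1}  # node -> number of partial paths reaching it
    for relation in relations:
        reached = defaultdict(int)
        for node, count in frontier.items():
            for neighbor in relation.get(node, ()):
                reached[neighbor] += count
        frontier = reached
    return frontier.get(end, 0)


def features(director, actor):
    """One raw path count per meta-path, in META_PATHS order."""
    return [path_count(director, actor, rels) for rels in META_PATHS.values()]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, x):
    """Probability of cooperation; the last weight is the bias term."""
    return sigmoid(weights[-1] + sum(w * v for w, v in zip(weights, x)))


def train_logistic(samples, labels, lr=0.1, epochs=500):
    """Fit logistic-regression weights by stochastic gradient ascent."""
    weights = [0.0] * (len(samples[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            error = y - predict(weights, x)
            for i, value in enumerate(x):
                weights[i] += lr * error * value
            weights[-1] += lr * error
    return weights


# Hypothetical training pairs with made-up "cooperated later" labels.
train_pairs = [("d1", "a1"), ("d1", "a2"), ("d1", "a3"), ("d2", "a1"), ("d2", "a2")]
train_labels = [1, 1, 0, 0, 1]
weights = train_logistic([features(d, a) for d, a in train_pairs], train_labels)

# Score an unseen director/actor pair.
print(features("d2", "a3"), predict(weights, features("d2", "a3")))
```

Each learned weight indicates how strongly the corresponding meta-path contributes to the predicted probability of cooperation. The path counts here allow a path to revisit the same movie, and a real study would more likely use normalized counts and a held-out future time window for the labels.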
Chapter 1 Introduction
1.1 Motivation and Overview of the Thesis
1.2 Contributions of the Thesis
Chapter 2 Literature Survey
2.1 Types of Social Networks
2.2 Social Network Analysis and Its Usage in Different Applications
2.3 Challenges of Link Mining
2.4 Link Prediction in Social Network Analysis
Chapter 3 A Meta-path-based Model for Heterogeneous Movie Networks
3.1 Utilizing the Meta Structure in Movie Networks
3.2 Meta-paths and Their Topological Features
3.3 Establishing the Cooperation Prediction Model
Chapter 4 Empirical Studies
4.1 Experimental Environments
4.2 Experimental Results
Chapter 5 Conclusions and Future Works