||Feature-based Characterization and Clustering of Social Networks
||Institute of Computer & Communication
Social Network Analysis
How do we distinguish one type of social networks from another if topologies are the only available information? In this thesis, we proposed an approach to characterize social networks with techniques widely used in the social network analysis. These features are computed only based on the given topologies. With the aid of proposed characteristics, classification can then be performed between different types of networks. Our experiments show that a high accuracy can be achieved based on the proposed method. The approach can be used for advertisement distribution system, recommendation systems, and DGA-based botnet detection systems. Experiment also shows that the proposed system can be applied to anomaly detection system.
Table of Contents iii
List of Tables iv
List of Figures v
Chapter 1 Introduction 1
1.1 Overview 1
1.1.1 Definition and Representation of a Network 3
1.1.2 Centrality Measures 4
1.2 Motivation 9
1.3 Objective 10
1.4 Outline 10
Chapter 2 Related Work and Literature Review 11
2.1 Graph Matching Problem 11
2.2 Statistical Classification of Social Networks 15
Chapter 3 System Architecture and Design 16
3.1 System Overview 16
3.2 Feature Selection 17
3.3 Weight Setting 18
3.4 Clustering 20
3.4.1 K-means Algorithm 20
Chapter 4 Experiment Results 23
4.1 Data Sets 23
4.2 Data Preprocessing 24
4.3 Results 26
4.3.1 Classification Between Social Networks about Politics 26
4.3.2 Classification Between Different Social Networks 29
4.3.3 Detecting the Change of Network Patterns 30
4.3.4 DGA-based Botnet Detection 35
Chapter 5 Conclusion and Future work 38
 A. Passarella, R. I. Dunbar, M. Conti,and F. Pezzoni, "Ego network models for future internet social networking environments," Computer Communications, Vol. 35, No. 18, pp. 2201-2217, 2012.
 Belnet History, https://www.belnet.be/en/about-us/history.
 C. Biemann, "Chinese whispers: an efficient graph clustering algorithm and its application to natural language processing problems," Proceedings of the first workshop on graph based methods for natural language processing, 2006.
 D. Fay, H. Haddadi, A. Thomason, A. W. Moore, R. Mortier, A. Jamakovic, S. Uhlig, and M. Rio, "Weighted Spectral Distribution for Internet Topology Analysis: Theory and Applications," Networking, IEEE/ACM Transactions on , vol.18, no.1, pp.164-176, Feb. 2010.
 D. Liben‐Nowell and J. Kleinberg, "The link‐prediction problem for social networks,” Journal of the American society for information science and technology, pp. 1019-1031, 2007.
 E. Serin and S. Balcisoy, "Entropy Based Sensitivity Analysis and Visualization of Social Networks," Advances in Social Networks Analysis and Mining (ASONAM), 2012 IEEE/ACM International Conference on , vol., no., pp.1099,1104, 26-29, Aug. 2012.
 Foreseeing Innovative New Digiservices (FIND), http://www.find.org.tw/find/home.aspx?page=many&id=390
 G. Gu, R. Perdisci, J. Zhang, and W. Lee, "BotMiner: Clustering Analysis of Network Traffic for Protocol-and Structure-Independent Botnet Detection," USENIX Security Symposium, Vol. 5, No. 2, pp. 139-154, 2008.
 G. Guofei, J. Zhang, and W. Lee, "BotSniffer: Detecting botnet command and control channels in network traffic," Network and Distributed System Security Symposium, 2008.
 H. Bunke and K. Shearer, "A graph distance metric based on the maximal common subgraph" Pattern recognition letters, pp. 255-259, 1998.
 H. Jiang, H. Wang, P. S. Yu, and S. Zhou, "Gstring: A novel approach for efficient search in graph databases," Proceedings of the International Conference on Data Engineering, pp 566-575, 2007.
 H. Yu, M. Kaminsky, P. B. Gibbons, A. Flaxman, "Sybilguard: defending against sybil attacks via social networks," ACM SIGCOMM Computer Communication Review, Vol. 36, No. 4, pp. 267-278, 2006.
 J. J. McGregor, "Backtrack search algorithms and the maximal common subgraph problem." Software: Practice and Experience, pp. 23-34, 1982.
 J. Leskovec and J. J. Mcauley, "Learning to discover social circles in ego networks," Advances in neural information processing systems, 2012.
 J. MacQueen, "Some methods for classification and analysis of multivariate observations," Proceedings of the fifth Berkeley symposium on mathematical statistics and probability, Vol. 1, No. 14, 1967.
 K. Aoyama, "Advertisement distribution system," U.S. Patent Application 10/952,776, 2005.
 L. A. Adamic and N. Glance, "The political blogosphere and the 2004 US election: divided they blog," Proceedings of the 3rd international workshop on Link discovery, ACM, 2005.
 L. Akoglu, H. Tong, and D. Koutra, "Graph based anomaly detection and Description: a survey," Data Mining and Knowledge Discovery, 2014.
 M. Bastian, S. Heymann, and M. Jacomy, "Gephi: an open source software for exploring and manipulating networks," International AAAI Conference on Weblogs and Social Media, 2009.
 M. Berlingerio, D. Koutra, T. Eliassi-Rad, and C. Faloutsos, "Netsimile: A scalable approach to size-independent network similarity," CoRR, Vol. abs/1209.2684, 2012.
 M. E. J. Newman, "Community detection and graph partitioning," Europhysics Letters, 2013.
 M. E. J. Newman, "Finding community structure in networks using the eigenvectors of matrices," Physical review E, 2006.
 M. E. J. Newman, "Mathematics of networks," The New Palgrave Encyclopedia of Economics, Palgrave Macmillan, Basingstoke, 2008.
 M. E. J. Newman, "The structure of scientific collaboration networks," Proceedings of the National Academy of Sciences, pp. 404-409, 2001.
 M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA Data Mining Software: An Update," SIGKDD Explorations, Volume 11, Issue 1, 2009.
 M. R. Gary and D. S. Johnson, "Computers and Intractability: A Guide to the Theory of NP-Completeness, " WH. Freeman and Co., 1979.
 N. B. Silva, I. R. Tsang, G. D. Cavalcanti, and I. J. Tsang, "A graph-based friend recommendation system using Genetic Algorithm," Evolutionary Computation (CEC), 2010 IEEE Congress on, pp.1,7, 18-23, July 2010.
 P. Papadimitriou, A. Dasdan, H. Garcia-Molina, "Web graph similarity for anomaly detection," Journal of Internet Services and Applications, pp. 19-30, 2010.
 R. Balasubramanyan, F. Lin, and W. W. Cohen, "Node clustering in graphs: An empirical study," NIPS Workshop on Networks Across Disciplines in Theory and Applications, 2010.
 R. Giugno and D. Shasha, "Graphgrep: A fast and universal method for querying graphs," Proceedings of the International Conference on Pattern Recognition, Vol.2, pp. 112-115, 2002.
 T. S. Wang, W. T. Cheng, and H. T. Lin, "Research and implementation of DGA-based Botnet Detection," National Computer Symposium, Taiwan, 2013.
 T. Wang and H. Krim, "Statistical classification of social networks," Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on , pp.3977-3980, 25-30, March 2012.
 T. Yingfei, "Centrality characteristics analysis of urban rail network," Intelligent Rail Transportation (ICIRT), 2013 IEEE International Conference on, pp.285-290, 2013.
 V. Arnaboldi, M. Conti, A. Passarella, F. Pezzoni, "Analysis of Ego Network Structure in Online Social Networks," Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Confernece on Social Computing (SocialCom), pp.31-40, 3-5, Sept. 2012.
 V. E. Krebs, http://www.orgnet.com/.
 Wash Univ. BIO5488 lecture, 2004.
 X. Huang, J. Lai, and S. F. Jennings, "Maximum common subgraph: some upper bound and lower bound results," BMC bioinformatics, 2006.
 Y. Matsukawa, "Advertisement distribution system." U.S. Patent Application 09/851,518, 2001.