||Cyber-Attack Detection and Defense Based on Spectral Analysis and Community Structure Recognition
||Institute of Computer & Communication
Domain Generation Algorithm
Botnet Detection Mechanism
網際網路的便利性讓使用者忽略了其背後許多資訊安全層面的議題，對於網路服務的過度依賴讓使用者經常曝露在未經保護的網路環境中，而在眾多資安議題中，殭屍網路 (Botnet)透過網際網路進行垃圾郵件散佈、竊取隱私資料、架設釣魚網站、散播惡意程式以及分散式服務阻斷等攻擊惡意行為，儼然成為具有高度威脅性的資訊安全隱憂。然而隨著資安防護能力的提升，驅使著殭屍網路產生多種不同規避偵測的型態以提高存活率，其中網域生成演算法 (Domain Generation Algorithm， DGA) 型態之殭屍網路透過演算法產生大量的備選控制網域以混淆偵查，大幅提升殭屍網路存活機率並增加傳統防禦機制辨識上之困難度，是目前相當主流的殭屍網路型態之一。另一方面，社群網站的興盛除了吸引一般使用者之外也引起惡意攻擊者的注意，其中女巫攻擊 (Sybil Attack) 透過佈建大量虛假帳號以癱瘓系統是目前新興的社群網路攻擊行為之一。而不論是DGA殭屍網路之連線行為或是女巫攻擊中虛假帳號與真實帳號的連結關係，如將之對應到網路拓樸上時均可發現兩者都有特殊的社群結構。因此，本研究提出一基於頻譜分析之分群演算法，針對網路群體進行分群以配合後續之偵測機制，與過往分群演算法不同之處在於，本研究提出的分群演算法不需要事先確認分群數量便可以自適應式根據群體結構產生最合適之分群結果。基於此一分群演算法，針對DGA型態殭屍網路提出一偵防機制，此偵防機制透過觀察與分析DGA型態殭屍網路在網路流量中的行為特徵，利用受感染主機與控制中心通聯時所產生之大量失效查詢流量進行辨識。研究結果顯示本偵測機制可以佈建於真實網路環境中，並且準確偵測出已知與未知的DGA型態殭屍網路。另一方面，針對女巫攻擊，本研究提出一偵防機制，此方法透過群體結構特性進行辨識，並改善過往研究須事先知曉系統中哪些是正常節點方可進行偵測的缺點，本偵防機制無此限制即可進行有效的偵測，並且透過真實社群資料進行驗證，研究結果顯示本機制確實能夠在未知正常節點的情況下有效的偵測女巫攻擊下之假帳號。
With the ever-growing number of online services nowadays, and the proliferation of wireless access services, more and more users are connecting to the Internet. However, the increasing reliance on the Internet and associated network services exposes users to the risk of malicious attacks by third parties intent on causing short-term disruption or more serious long-term damage. Among the various network security concerns, botnets are regarded as one of the leading threats to network security, and are used to conduct a wide range of malicious activities, including information theft, phishing, spam mail distribution, and Distributed Denial of Service (DDoS) attacks. Of the various forms of botnet, DGA-based botnets, which utilize a Domain Generation Algorithm (DGA) to avoid detection, are one of the most disruptive and difficult to detect. In addition to botnets, attacks on social network sites have also emerged as a major concern in recent years. One of the most common and harmful types of attack is the Sybil attack, in which the attacker creates multiple identities and uses these identities to breech a running system with fake information. Although botnets and Sybil attacks are both difficult to detect, they leave behind several important clues which can be used to identify their presence. For example, when mapping the communication patterns of a botnet, or the relationships among the sybil nodes and the honest nodes, on to a graph, the graph shows a unique characteristic in terms of the community structure. Accordingly, this dissertation proposes a clustering algorithm for detecting the community structure of cyber-attacks. More specifically, to address the problem of DGA-based botnets, a scheme is proposed for detecting botnet activity by analyzing the query behavior of the DNS traffic. The proposed scheme exploits the fact that hosts compromised by the same DGA-based malware query the same sets of domains in the domain list and most of these queries fail since only a very small number of the domains are actually associated with an active C&C. The evaluation results show that the proposed scheme provides an accurate and effective means of detecting both existing and new DGA-based botnet patterns in real-world networks. To counter the problem of Sybil attacks, the dissertation additionally proposes a defense mechanism based on the characteristic structural properties of honest and sybil groups. Notably, in contrast to most existing Sybil defense schemes, which require a knowledge of at least one honest node in advance, the scheme proposed in this dissertation has the ability to detect sybil groups in a network without the need for any prior knowledge regarding the honest nodes. The performance of the proposed defense scheme is evaluated using data obtained from a real-world social network (Facebook). The results show that the proposed scheme has the ability to detect Sybil attacks in real social networks with a low false positive ratio.
Chinese Abstract i
List of Tables ix
List of Figures x
Chapter 1 1
1.1. Cyber-Attacks: An Overview 1
1.2. Research Motivation 4
1.2.1. DGA-based Botnet Detection 4
1.2.2. Sybil Attack Detection 5
1.2.3. Clustering Algorithms 7
1.3. Objectives and Outline of Dissertation 9
1.3.1. Objectives 9
1.3.2. Dissertation Outline 13
Chapter 2 14
Weighted-Spectral Clustering Algorithm 14
2.1. Weighted-Spectral Distribution Metric 15
2.2. Weighted-Spectral Clustering Algorithm 20
2.3. Performance Evaluation of WSCA 25
2.3.1. Evaluation with Computer-Generated Graphs 26
2.3.2. Evaluation with Real-World Networks 29
2.3.3. Benchmark Evaluations 36
2.4. Summary 38
Chapter 3 39
DGA-based Botnet Detection Mechanism 39
3.1. Overview of Botnet Detection Mechanisms 40
3.2. Architecture of DBod 43
3.2.1. Filtering Module 44
3.2.2. Clustering Module 46
3.2.3. Group Identification Module 47
3.3. DBod Design 47
3.3.1. Definitions and Notations 48
3.3.2. Filtering Module 49
3.3.3. Clustering Module 50
3.3.4. Group Identification Module 50
3.4. Performance Evaluation of DBod 53
3.4.1. Overview of Tainan City Education Network Dataset 54
3.4.2. Evaluation Using Computer-Generated Malicious Traffic 55
3.4.3. Evaluation Performance Using Real-World DNS Traffic 57
3.5. Summary 74
Chapter 4 75
Sybil Attack Defense Mechanism 75
4.1. Overview of Existing and Proposed Sybil Attack Defense Mechanisms 76
4.2. SybilCurb Design 78
4.2.1. Definitions and Notations 79
4.2.2. Clustering Module 80
4.2.3. Group Detection Module 80
4.3. Performance Evaluation of SybilCurb 84
4.3.1. Evaluation Dataset and Setup 84
4.3.2. Evaluation in Environment with No Sybil Nodes 85
4.3.3. Evaluation in Environments with Malicious Users 88
4.3.4. Discussion 98
4.4. Summary 102
Chapter 5 103
 2016 Symantec Internet Security Threat Report, [Online]. Available: https://www.symantec.com/security-center/threat-report, 2016
 J. Goebel and T. Holz, "Rishi: Identify bot contaminated hosts by IRC nickname evaluation," in first conference on First Workshop on Hot Topics in Understanding Botnets, 2007.
 W. Salusky and R. Danford, "Know your enemy: Fast-flux service networks," in The Honeynet Project, pp. 1-24, 2007.
 M. Abu Rajab, J. Zarfoss, F. Monrose, and A. Terzis, "A multifaceted approach to understanding the botnet phenomenon," in 6th ACM SIGCOMM on Internet measurement, pp. 41-52, 2006.
 G. Gu, P. A. Porras, V. Yegneswaran, M. W. Fong, and W. Lee, "Bothunter: Detecting malware infection through ids-driven dialog correlation," in USENIX Security Symposium, pp. 1-16, 2007.
 G. Gu, J. Zhang, and W. Lee, "BotSniffer: Detecting botnet command and control channels in network traffic”, in 15th Annual Network and Distributed System Security Symposium, 2008.
 G. Gu, R. Perdisci, J. Zhang, and W. Lee, "BotMiner: Clustering Analysis of Network Traffic for Protocol-and Structure-Independent Botnet Detection," in USENIX Security Symposium, pp. 139-154, 2008.
 B. Stone-Gross, M. Cova, L. Cavallaro, B. Gilbert, M. Szydlowski, R. Kemmerer, C. Kruegel, and G. Vigna, "Your botnet is my botnet: analysis of a botnet takeover," in 16th ACM conference on Computer and communications security, pp. 635-647, 2009.
 S. Yadav, A. K. K. Reddy, and S. Ranjan, "Detecting algorithmically generated domain-flux attacks with DNS traffic analysis," IEEE/ACM Transactions on Networking, vol. 20, pp. 1663-1677, 2012.
 P. Porras, "Inside risks reflections on Conficker," Communications of the ACM, vol. 52, pp. 23-24, 2009.
 P. Porras, H. Saidi, and V. Yegneswaran, "An analysis of conficker's logic and rendezvous points," Computer Science Laboratory, SRI International, Tech. Rep, 2009.
 S. Shevchenko, "Domain name generator for murofet." [Online]. Available: http://blog.threatexpert.com/2010/10/domain-name-generator-for-murofet.html, 2010.
 J. R. Douceur, "The Sybil attack," in Peer-to-peer Systems, pp. 251-260, 2002.
 C. R. Davis, J. M. Fernandez, S. Neville, and J. McHugh, "Sybil attacks as a mitigation strategy against the storm botnet," in 3rd International Conference on Malicious and Unwanted Software, pp. 32-40, 2008.
 R. Bhattacharjee and A. Goel, "Avoiding ballot stuffing in ebay-like reputation systems," in ACM SIGCOMM workshop on Economics of peer-to-peer systems, pp. 133-137, 2005.
 P. Judge, "Net vote rigging illustrates importance of Web services," [Online]. Available: http://www.zdnet.com/article/net-vote-rigging-illustrates-importance-of-web-services/, 2002
 M. Bianchini, M. Gori, and F. Scarselli, "Inside pagerank," ACM Transactions on Internet Technology, vol. 5, pp. 92-128, 2005.
 H. Choi, H. Lee, and H. Kim, "BotGAD: detecting botnets by capturing group activities in network traffic," in Fourth International ICST Conference on communication System software and middleware, p. 2-9, 2009.
 R. Villamarín-Salomón and J. C. Brustoloni, "Bayesian bot detection based on DNS traffic similarity," in ACM symposium on Applied Computing, pp. 2035-2041, 2009.
 S. Yadav, A. K. K. Reddy, A. L. Reddy, and S. Ranjan, "Detecting algorithmically generated malicious domain names," in 10th ACM SIGCOMM conference on Internet measurement, pp. 48-61, 2012.
 S. Yadav and A. L. Reddy, "Winning with DNS failures: Strategies for faster botnet detection," Security and privacy in communication networks, pp. 446-459, 2012.
 L. Bilge, E. Kirda, C. Kruegel, and M. Balduzzi, "EXPOSURE: Finding Malicious Domains Using Passive DNS Analysis," in 18th Annual Network and Distributed System Security Symposium, 2011.
 M. Antonakakis, R. Perdisci, Y. Nadji, N. Vasiloglou Ii, S. Abu-Nimeh, W. Lee, and D. Dagon, "From Throw-Away Traffic to Bots: Detecting the Rise of DGA-Based Malware," in USENIX Security Symposium, pp. 491-506, 2012.
 P. V. Mockapetris, "Domain names-concepts and facilities," 1987.
 J. Lee, J. Kwon, H.-J. Shin, and H. Lee, "Tracking multiple C&C botnets by analyzing DNS traffic," in 6th IEEE Workshop on Secure Network Protocols, pp. 67-72, 2010.
 H. Yu, P. B. Gibbons, M. Kaminsky, and F. Xiao, "Sybillimit: A near-optimal social network defense against Sybil attacks," in IEEE Symposium on Security and Privacy, pp. 3-17, 2008.
 H. Yu, M. Kaminsky, P. B. Gibbons, and A. D. Flaxman, "Sybilguard: defending against Sybil attacks via social networks," IEEE/ACM Transactions on Networking, vol. 16, pp. 576-589, 2008.
 V. Gruhn, M. Hülder, and V. Wolff-Marting, "Utilizing Social Networking Platforms to Support Public Key Infrastructures," in SECRYPT, pp. 245-250, 2007.
 G. Danezis and P. Mittal, "SybilInfer: Detecting sybil Nodes using Social Networks," in 16th Annual Network and Distributed System Security Symposium, 2009.
 W. Wei, F. Xu, C. C. Tan, and Q. Li, "Sybildefender: Defend against Sybil attacks in large social networks," in IEEE INFOCOM, pp. 1951-1959, 2012.
 L. Shi, S. Yu, W. Lou, and Y. T. Hou, "Sybilshield: An agent-aided social network-based sybil defense among multiple communities," in IEEE INFOCOM, pp. 1034-1042, 2013.
 A. Mohaisen, A. Yun, and Y. Kim, "Measuring the mixing time of social graphs," in 10th ACM SIGCOMM on Internet measurement, pp. 383-389, 2010.
 S. Wasserman and K. Faust, “Social network analysis: Methods and applications,” Cambridge university press, vol. 8, 1994.
 D. J. Watts and S. H. Strogatz. “Collective dynamics of ‘small-world’networks,” nature, vol. 393(6684), pp. 440-442, 1998.
 J. Scott, “Social network analysis: developments, advances, and prospects,” Social network analysis and mining, vol. 1(1), pp. 21-26, 2011.
 L. A. N. Amaral, A. Scala, M. Barthelemy, and H. E. Stanley, “Classes of small-world networks,” National Academy of Sciences, vol. 97(21), pp. 11149-11152, 2000.
 M. A. Figueiredo, and A. K. Jain, “Unsupervised learning of finite mixture models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24(3), pp. 381-396, 2002.
 R. Tibshirani, G. Walther, and T. Hastie, “Estimating the number of clusters in a data set via the gap statistic,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 63(2), pp. 411-423, 2001.
 D. Pelleg, and A. W. Moore, “X-means: Extending K-means with Efficient Estimation of the Number of Clusters,” in ICML, pp. 727-734, 2000.
 M. C. Nascimento, and L. Pitsoulis, “Community detection by modularity maximization using GRASP with path relinking,” Computers & Operations Research, vol. 40(12), pp. 3121-3131, 2013.
 Z. Li, “A Non-MCMC Procedure for Fitting Dirichlet Process Mixture Models,” Doctoral dissertation, University of Saskatchewan, 2012.
 M. E. Newman, “Fast algorithm for detecting community structure in networks,” Physical review E, vol. 69(6), 2004.
 A. Clauset, M. E. Newman, and C. Moore, “Finding community structure in very large networks,” Physical review E, vol. 70(6), 2004.
 B. Xiang, E. H. Chen, and T. Zhou, “Finding community structure based on subgraph similarity,” Complex Networks, pp. 73-81, 2009.
 M. C. Nascimento, and L. Pitsoulis, “Community detection by modularity maximization using GRASP with path relinking,” Computers & Operations Research, vol. 40(12), pp. 3121-3131, 2013.
 B. H. Good, Y. A. de Montjoye, and A. Clauset, “Performance of modularity maximization in practical contexts,” Physical Review E, vol. 81(4), 2010.
 M. Girvan, and M. E. Newman, “Community structure in social and biological networks,” National Academy of Sciences, vol. 99(12), pp. 7821-7826, 2002.
 T. S. Wang, H. T. Lin, and P. Wang, “Weighted-spectral clustering algorithm for detecting community structures in complex networks,” Artificial Intelligence Review, 2016. doi:10.1007/s10462-016-9488-4
 T. S. Wang, H. T. Lin, W. T. Cheng, and C. Y. Chen, “DBod: clustering and detecting DGA-based botnets using DNS traffic analysis,” Computers & Security, vol. 64, pp. 1-15, 2017.
 M. S. Granovetter, "The strength of weak ties," American journal of sociology, pp. 1360-1380, 1973.
 P. De Meo, E. Ferrara, G. Fiumara, and A. Provetti, "On Facebook, most ties are weak," Communications of the ACM, vol. 57, pp. 78-84, 2014.
 E. Ferrara, P. De Meo, G. Fiumara, and A. Provetti, "The role of strong and weak ties in Facebook: a community structure perspective," Preprint at http://arXiv. org/abs/1203.0535.
 V. Pareto, “Cours d'économie politique: Librairie Droz,” 1964.
 D. Fay, H. Haddadi, A. Thomason, A. W. Moore, R. Mortier, A. Jamakovic, and M. Rio, “Weighted spectral distribution for internet topology analysis: theory and applications,” IEEE/ACM Transactions on Networking, vol. 18(1), pp. 164-176, 2010.
 L. Jure, and K. Andrej, SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data/, 2014.
 B. Mohar, and Y. Alavi, “The Laplacian spectrum of graphs,” Graph theory, combinatorics, and applications, vol. 2, pp. 871-898, 1991.
 F. R. Chung, “Spectral graph theory,” American Mathematical Soc., vol. 92, 1997.
 K. Wehmuth, A. T. A. Gomes, A. Ziviani, and A. P. C. Da Silva, “On the joint dynamics of network diameter and spectral gap under node removal,” in Latin-American Workshop on Dynamic Networks, 2010.
 K. Wehmuth, and A. Ziviani, “Distributed location of the critical nodes to network robustness based on spectral analysis,” in IEEE 7th Latin American Network Operations and Management Symposium, 2011.
 B. M. Waxman, “Routing of multipoint connections,” IEEE Journal on Selected Areas in Communications, vol. 6(9), pp. 1617-1622, 1988.
 A. L. Barabási, and R. Albert, “Emergence of scaling in random networks,” Science, vol. 286(5439), pp. 509-512, 1999.
 W. W. Zachary, “An information flow model for conflict and fission in small groups,” Journal of anthropological research, pp. 452-473, 1977.
 D. Lusseau, “The emergent properties of a dolphin social network,” Royal Society of London. Series B: Biological Sciences, vol. 270(2), pp.186-188, 2003.
 D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, “The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations,” Behavioral Ecology and Sociobiology, vol. 54(4), pp. 396-405, 2003.
 D. E. Knuth, “The Stanford GraphBase: a platform for combinatorial computing," Reading: Addison-Wesley, vol. 37, 1993.
 T. S. Evans, “Clique graphs and overlapping communities,” Journal of Statistical Mechanics: Theory and Experiment, P12037, 2010.
 J. Leskovec, and J. J. Mcauley, “Learning to discover social circles in ego networks,” Advances in neural information processing systems, pp. 539-547, 2012.
 J. Leskovec, D. Huttenlocher, and J. Kleinberg, “Predicting positive and negative links in online social networks,” in 19th international conference on World wide web, pp. 641-650, 2010.
 C. Biemann, “Chinese whispers: an efficient graph clustering algorithm and its application to natural language processing problems,” in first workshop on graph based methods for natural language processing, pp. 73-80, 2006.
 S. M. Van Dongen, “Graph clustering by flow simulation,” Ph.D. Thesis, Dutch National Research Institute for Mathematics and Computer Science, University of Utrecht, Netherlands, 2001.
 D. LaSalle, and G. Karypis, “Multi-threaded modularity based graph clustering using the multilevel paradigm,” Journal of Parallel and Distributed Computing, vol. 76, pp. 66-80, 2015.
 S. S. Silva, R. M. Silva, R. C. Pinto, and R. M. Salles, "Botnets: A survey," Computer Networks, vol. 57(2), pp.378-403, 2013.
 Y. Kugisaki, Y. Kasahara, Y. Hori, K. Sakurai, "Bot detection based on traffic analysis," in International Conference on Intelligent Pervasive Computing, pp. 303–306, 2007.
 P. Wurzinger, L. Bilge, T. Holz, J. Goebel, C. Kruegel, E. Kirda, "Automatically generating models for botnet detection," M. Backes, P. Ning (Eds.), Computer Security – ESORICS, Lecture Notes in Computer Science, vol. 5789, pp. 232–249, 2009.
 J. Liu, Y. Xiao, K. Ghaboosi, H. Deng, J. Zhang, "Botnet: classification, attacks, detection, tracing, and preventive measures," EURASIP journal on wireless communications and networking, pp. 1184-1187, 2009.
 W. Lu, A. Ghorbani, “Botnets detection based on IRC-Community,” in IEEE GLOBECOM, pp. 1–5, 2008.
 E. Stalmans and B. Irwin, "A framework for DNS based detection and mitigation of malware infections on a network," Information Security South Africa, pp. 1-8, 2011.
 M. Feily, A. Shahrestani, and S. Ramadass, "A survey of botnet and botnet detection," in Third International Conference on Emerging Security Information, Systems and Technologies, pp. 268-273, 2009.
 Y. L. Zhou, Q. S. Li, Q. Miao, and K. Yim, "DGA-Based Botnet Detection Using DNS Traffic," Journal of Internet Services and Information Security, vol. 3, pp. 116-123, 2013.
 S. Schiavoni, F. Maggi, L. Cavallaro, and S. Zanero, "Phoenix: DGA-based botnet tracking and intelligence," in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment, pp. 192-211, 2014.
 M. Mowbray and J. Hagen, "Finding domain-generation algorithms by looking at length distribution," in IEEE International Symposium on Software Reliability Engineering Workshops, pp. 395-400, 2014.
 R. Sharifnya and M. Abadi, "DFBotKiller: domain-flux botnet detection based on the history of group activities and failures in DNS traffic," Digital Investigation, vol. 12, pp. 15-26, 2015.
 The Spamhaus Project, [Online]. Available: http://www.spamhaus.org/
 Barracuda Reputation Block List, [Online]. Available: http://www.barracudacentral.org/
 SpamCop, [Online]. Available: https://www.spamcop.net/
 The Abusive Hosts Blocking List, [Online]. Available: http://www.ahbl.org/node
 Open Malware – Community Malicious Code Research and Analysis, [Online]. Available: http://www.offensivecomputing.net/
 Virus Share, [Online]. Available: https://virusshare.com/
 Virus Total, [Online]. Available: https://www.virustotal.com/
 V. N. Bruce., DNS-Based DDoS: Diverse Options for Attackers. [Online]. Available: http://www.circleid.com/posts/20150415_dns_based_ddos_diverse_options_for_attackers/
 ScorecardResearch, [Online]. Available: https://www.scorecardresearch.com/
 McAfee secure, [Online]. Available: http://www.mcafee.com/
 R. Levien and A. Aiken, "Attack-Resistant Trust Metrics for Public Key Certification," in 7th USENIX Security Symposium, 1998.
 G. Danezis, C. Lesniewski-Laas, M. F. Kaashoek, and R. Anderson, "Sybil-resistant DHT routing," Computer Security, pp. 305-318, 2005.
 N. Tran, J. Li, L. Subramanian, and S. S. M. Chow, "Optimal sybil-resilient node admission control," in IEEE INFOCOM, pp. 3218-3226, 2011.
 Z. Yang, C. Wilson, X. Wang, T. Gao, B. Y. Zhao, and Y. Dai, "Uncovering social network sybils in the wild," ACM Transactions on Knowledge Discovery from Data, vol. 8, pp. 2, 2014.
 D. Coppersmith, and S. Winograd, “Matrix multiplication via arithmetic progressions,” in nineteenth annual ACM symposium on Theory of computing, pp.1-6, 1987.