Knowledge Evolution with Search Correlation
Institute of Computer Science and Information Engineering
In this paper, we explore a novel problem, called Knowledge Evolution, which aims to identify new knowledge triples in a timely manner. The literature has recognized knowledge enrichment as key to the success of knowledge-based search. However, previous work on automatic knowledge extraction, such as Google Knowledge Vault, aims to identify unannotated knowledge triples from the full web-scale content in an offline execution. In contrast, our study shows that most people seek a specific piece of knowledge, such as the marriage between Brad Pitt and Angelina Jolie, soon after the information is announced. Moreover, the number of queries for such knowledge declines dramatically after a few days, meaning that most people cannot obtain the knowledge they are looking for in time through offline knowledge enrichment. To remedy this, we propose the SCKE framework, which extracts new knowledge triples in an online scenario. We model the 'Query-Click Page' bipartite graph to extract query correlations and identify cohesive pairwise entities, and finally identify the confident relation between entities statistically. Our experimental studies show that new triples can be identified shortly after an event happens, enabling up-to-date knowledge summaries for most user queries.
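To make the first stage of this pipeline concrete, the sketch below builds a 'Query-Click Page' bipartite graph from a click log, correlates queries through the pages they lead to, and scores entity pairs by how strongly their clicked pages overlap. It is a minimal illustration under stated assumptions, not the SCKE algorithm: the (query, page, clicks) log format, cosine similarity as the correlation measure, substring matching of entity names, the 0.5 threshold, and the helper names (`build_bipartite`, `query_correlation`, `cohesive_pairs`) are all hypothetical. Relation identification and the triples classifier are not shown.

```python
from collections import Counter, defaultdict
from itertools import combinations
from math import sqrt


def build_bipartite(log):
    """Map each query to a Counter of clicked pages (the query side of the graph).

    Assumption: `log` is an iterable of (query, page, click_count) tuples.
    """
    graph = defaultdict(Counter)
    for query, page, clicks in log:
        graph[query][page] += clicks
    return graph


def cosine(u, v):
    """Cosine similarity of two sparse click vectors keyed by page."""
    dot = sum(c * v[p] for p, c in u.items() if p in v)
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def query_correlation(graph):
    """Correlation (assumed: cosine) for every query pair sharing a clicked page."""
    scores = {}
    for q1, q2 in combinations(sorted(graph), 2):
        s = cosine(graph[q1], graph[q2])
        if s > 0:
            scores[(q1, q2)] = s
    return scores


def cohesive_pairs(graph, entities, threshold=0.5):
    """Score entity pairs by the cosine of their aggregated click vectors.

    Assumption: a query mentions an entity if the lower-cased entity name is a
    substring of the query; the 0.5 threshold is illustrative only.
    """
    profile = defaultdict(Counter)
    for query, pages in graph.items():
        for entity in entities:
            if entity.lower() in query.lower():
                profile[entity].update(pages)  # add this query's clicks to the entity
    pairs = {}
    for a, b in combinations(sorted(profile), 2):
        s = cosine(profile[a], profile[b])
        if s >= threshold:
            pairs[(a, b)] = s
    return pairs


if __name__ == "__main__":
    # Toy, made-up click log echoing the Brad Pitt / Angelina Jolie example.
    log = [
        ("brad pitt wife", "news/pitt-jolie-wedding", 40),
        ("angelina jolie married", "news/pitt-jolie-wedding", 35),
        ("angelina jolie married", "wiki/angelina-jolie", 5),
        ("brad pitt movies", "wiki/brad-pitt-films", 20),
    ]
    graph = build_bipartite(log)
    print(query_correlation(graph))
    print(cohesive_pairs(graph, ["Brad Pitt", "Angelina Jolie"]))
```

On this toy log, the two queries that both lead to the wedding page are correlated, and the pair (Angelina Jolie, Brad Pitt) scores about 0.885, because most clicks for both entities land on the same news page. Cosine similarity is used here only because it is a simple, symmetric overlap measure for sparse click vectors; the framework may use a different statistic.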
List of Tables
List of Figures
1 Introduction
2 Related Work
3 The SCKE Framework and Algorithms
3.1 Cohesive Pairwise-entities Generation
3.2 Relation Identification
3.3 Triples Classifier
4 Experimental Results
5 Conclusions
6 Bibliography
S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. G. Ives, “DBpedia: A nucleus for a web of open data,” in SEMWEB, 2007.
K. D. Bollacker, C. Evans, P. Paritosh, T. Sturge, and J. Taylor, “Freebase: a collaboratively created graph database for structuring human knowledge,” in ACM SIGMOD,
A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. R. Hruschka Jr., and T. M. Mitchell, “Toward an architecture for never-ending language learning,” in AAAI, 2010.
F. M. Suchanek, G. Kasneci, and G. Weikum, “YAGO: a core of semantic knowledge,” in WWW, 2007.
B. Suh, G. Convertino, E. H. Chi, and P. Pirolli, “The singularity is not near: slowing growth of Wikipedia,” in Proceedings of the 2009 International Symposium on Wikis, Orlando, Florida, USA, October 25–27, 2009.
X. Dong, E. Gabrilovich, G. Heitz, W. Horn, N. Lao, K. Murphy, T. Strohmann, S. Sun, and W. Zhang, “Knowledge vault: a web-scale approach to probabilistic knowledge fusion,” in ACM SIGKDD, 2014.
V. Leroy, B. B. Cambazoglu, and F. Bonchi, “Cold start link prediction,” in ACM SIGKDD, 2010.
D. Liben-Nowell and J. M. Kleinberg, “The link prediction problem for social networks,” in CIKM, 2003.
H. H. Song, T. W. Cho, V. Dave, Y. Zhang, and L. Qiu, “Scalable proximity estimation and link prediction in online social networks,” in ACM SIGCOMM, 2009.
J. Zhu, Z. Nie, X. Liu, B. Zhang, and J. Wen, “StatSnowball: a statistical approach to extracting entity relationships,” in WWW, 2009.
P. Domingos, S. Kok, D. Lowd, H. Poon, M. Richardson, P. Singla, M. Sumner, and J. Wang, “Markov logic: A unifying language for structural and statistical pattern recognition,” in SSPR, 2008.
L. Backstrom and J. Leskovec, “Supervised random walks: predicting and recommending links in social networks,” in WSDM, 2011.
H. Chen, W. Ku, H. Wang, L. Tang, and M. Sun, “LinkProbe: Probabilistic inference on large-scale social networks,” in IEEE ICDE.
N. Lao, T. M. Mitchell, and W. W. Cohen, “Random walk inference and learning in a large scale knowledge base,” in EMNLP, 2011.
N. Lao and W. W. Cohen, “Relational retrieval using a combination of path-constrained random walks,” Machine Learning, vol. 81, no. 1, pp. 53–67, 2010.
A. Fader, S. Soderland, and O. Etzioni, “Identifying relations for open information extraction,” in EMNLP, 2011.
F. Niu, C. Zhang, C. Ré, and J. W. Shavlik, “Elementary: Large-scale knowledge-base construction via machine learning and statistical inference,” Int. J. Semantic Web Inf. Syst., vol. 8, no. 3, pp. 42–73, 2012.
A. Neelakantan and M. Collins, “Learning dictionaries for named entity recognition using minimal supervision,” CoRR, vol. abs/1504.06650, 2015.
C. Li, A. Sun, J. Weng, and Q. He, “Tweet segmentation and its application to named entity recognition,” IEEE Trans. Knowl. Data Eng., vol. 27, no. 2, pp. 558–570, 2015.
L. Derczynski, D. Maynard, G. Rizzo, M. van Erp, G. Gorrell, R. Troncy, J. Petrak, and K. Bontcheva, “Analysis of named entity recognition and linking for tweets,” Inf. Process. Manage., vol. 51, no. 2, pp. 32–49, 2015.
M. Konkol, T. Brychcin, and M. Konopík, “Latent semantics in named entity recognition,” Expert Syst. Appl., vol. 42, no. 7, pp. 3470–3479, 2015.
S. Keretna, C. P. Lim, D. C. Creighton, and K. B. Shaban, “Enhancing medical named entity recognition with an extended segment representation technique,” Computer Methods and Programs in Biomedicine, vol. 119, no. 2, pp. 88–100, 2015.
A. Demski, V. Ustun, P. S. Rosenbloom, and C. Kommers, “Outperforming word2vec on analogy tasks with random projections,” CoRR, vol. abs/1412.6616, 2014.
T. Shi and Z. Liu, “Linking GloVe with word2vec,” CoRR, vol. abs/1411.5595, 2014.
X. Rong, “word2vec parameter learning explained,” CoRR, vol. abs/1411.2738, 2014.
Y. Goldberg and O. Levy, “word2vec explained: deriving Mikolov et al.’s negative-sampling word-embedding method,” CoRR, vol. abs/1402.3722, 2014.
R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin, “LIBLINEAR: A library for large linear classification,” Journal of Machine Learning Research, vol. 9, 2008.
S. Lee, H. Lee, P. Abbeel, and A. Y. Ng, “Efficient L1 regularized logistic regression,” in AAAI, 2006.