Methods of Data Analytics for Social Media by Considering Social Interaction Patterns and Post Evolving Trends
Institute of Computer Science and Information Engineering
post trend forecast
Social media provides a platform where people share life experiences and connect with friends, businesses advertise products, and governments announce information. As the services offered by social media diversify, the number of users continues to grow. Understanding users’ demands and perspectives from the information they share on social media nevertheless remains an important challenge. This dissertation therefore focuses on developing data analytic technologies that exploit the implicit information hidden in post-related content and user behaviours on social media.
First, a sentiment analysis method for social media users is proposed that integrates the textual opinions in their posts with their social interactions. In this method, a social opinion graph is constructed to represent users’ social actions and relationships; a sentiment guiding matrix denotes the strength of influence between users’ sentiments; a textual sentiment classifier is built to classify textual opinions; and social enthusiasm is treated as the degree of care between two users. The method is applied to two real cases: Taiwan’s presidential election and hot social issues. In both cases, the integrated sentiment classification achieves better accuracy than previous research.
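The idea of combining per-user textual sentiment with neighbours' influence can be illustrated with the classic relaxation labeling update. The following is only a minimal sketch under stated assumptions, not the dissertation's formulation: the `weights` dictionary is a hypothetical stand-in for social enthusiasm between users, the `compat` matrix is a hypothetical stand-in for the sentiment guiding matrix, and the initial probabilities are assumed to come from some textual sentiment classifier. All names and numbers are illustrative.

```python
from typing import Dict, List

Labels = List[float]  # probability per sentiment label, here [positive, negative]


def relaxation_step(
    probs: Dict[str, Labels],
    weights: Dict[str, Dict[str, float]],
    compat: List[List[float]],
) -> Dict[str, Labels]:
    """One synchronous update of the classic relaxation labeling rule.

    weights[u][v] is an assumed influence of v on u (each user's weights
    should sum to at most 1); compat[a][b] in [-1, 1] is an assumed
    compatibility between label a on u and label b on v.
    """
    updated = {}
    for user, p in probs.items():
        support = [0.0] * len(p)
        for neighbor, w in weights.get(user, {}).items():
            q = probs[neighbor]
            for a in range(len(p)):
                support[a] += w * sum(compat[a][b] * q[b] for b in range(len(q)))
        raw = [p[a] * (1.0 + support[a]) for a in range(len(p))]
        total = sum(raw)
        # Keep the old distribution if the update degenerates to all zeros.
        updated[user] = [r / total for r in raw] if total > 0 else list(p)
    return updated


def classify(probs, weights, compat, rounds=10):
    """Iterate the update and return the index of the most likely label per user."""
    for _ in range(rounds):
        probs = relaxation_step(probs, weights, compat)
    return {u: max(range(len(p)), key=p.__getitem__) for u, p in probs.items()}


# Illustrative data: u2's text alone leans slightly negative, but u2 interacts
# heavily with u1, whose text is clearly positive.
text_probs = {"u1": [0.9, 0.1], "u2": [0.45, 0.55]}
influence = {"u1": {"u2": 0.2}, "u2": {"u1": 0.8}}
same_label_agrees = [[1.0, -1.0], [-1.0, 1.0]]

labels = classify(text_probs, influence, same_label_agrees)
```

In this toy example the text-only label for `u2` is negative, while the integrated label becomes positive because of the strong tie to `u1`. How the real method weights textual evidence against social evidence is defined in the dissertation itself and is not reproduced here.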
Second, an early forecast method for post-based evolution trajectories (TEF) on social media is presented. In the model generation phase, historical data on post forwarding and responding activities are collected over consecutive time periods to form post-based evolution trajectories. A classification function is then defined for each trend type, and each trajectory is labeled as one of four trend types: L-type, B-type, D-type, or G-type. The post-based evolution trajectory model is then generated by random forests for subsequent forecasting. In the trajectory forecast phase, real-time data on post forwarding and responding activities are collected over consecutive time periods to form the target trajectory, which is forecast through three stages: classification estimation, correlation evaluation, and distance calculation between the target trajectory and the post-based evolution trajectory model. Applied to social media posts, TEF achieves excellent performance in forecasting which of the four trend types a post will follow, while also reducing the time consumed in data processing.
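The correlation and distance stages of the forecast phase can be sketched as follows. This is a hedged illustration only: it assumes a trajectory is a list of activity counts per fixed time interval, uses Pearson correlation and Euclidean distance as the two measures (the dissertation may define them differently), and omits the random-forest classification stage entirely. The function names, the reference trajectories, and their labels are hypothetical and do not correspond to the L-, B-, D-, or G-type definitions.

```python
import math
from typing import Dict, List, Tuple


def build_trajectory(event_times: List[float], interval: float, steps: int) -> List[int]:
    """Count forwarding/responding events in consecutive fixed-width intervals."""
    counts = [0] * steps
    for t in event_times:
        idx = int(t // interval)
        if 0 <= idx < steps:
            counts[idx] += 1
    return counts


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def euclidean(x: List[float], y: List[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def match_prefix(target: List[float], models: Dict[str, List[float]]) -> Tuple[str, float, float]:
    """Compare the observed prefix with the same-length prefix of each reference
    trajectory; prefer the highest correlation, then the smallest distance."""
    k = len(target)
    scored = []
    for label, traj in models.items():
        prefix = traj[:k]
        scored.append((label, pearson(target, prefix), euclidean(target, prefix)))
    return max(scored, key=lambda s: (s[1], -s[2]))


# Hypothetical reference trajectories (made-up numbers and labels).
reference = {
    "steady_growth": [1, 2, 3, 4, 5, 6],
    "early_burst": [5, 9, 4, 2, 1, 1],
}

# Only the first three intervals of the target post have been observed so far.
label, corr, dist = match_prefix([2, 3, 4], reference)
```

Matching on a prefix is what makes the forecast "early": the target trajectory is compared before its full evolution is known. In the method described above, a classifier trained on labeled historical trajectories would supply the classification estimate that this sketch leaves out.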
Third, an integrated social data analytics hub is built that combines an integrated content viewer with social data analytic services. In this hub, a user’s files stored on different social media platforms are backed up to local network-attached storage. Files such as videos, pictures, and documents stored in the user’s multiple social media spaces are managed through a single viewer for unified file management. In addition, a private social network viewer is provided for sharing files with close friends and family rather than in public social media spaces. Furthermore, microblog information from different social media platforms is displayed together through the viewer of personal social media services. This viewer provides three social data analytic services: hotly discussed post analysis, sentiment analysis, and resonance user mining. The experimental results show that the hub makes it easy to manage cross-platform information through the integrated content viewer and provides various social data analytic services.
List of Tables
List of Figures
1. Introduction
1.1 Background
1.2 Motivation
1.3 Contribution
1.4 Organization
2. Related Work
2.1 Characters of Microblog
2.2 User-Generated Content on Social Media
2.3 Textual Opinion Mining of Social Media Content
2.4 Non-Textual Social Information Analysis
2.5 Time Series Data Analytics
3. Integrated Sentiment Analysis from Textual Content and Social Interactions
3.1 Data Collection and Social Opinion Graph
3.1.1 Social Opinion Graph Construction
3.1.2 Problem Formulation
3.2 Training Phase and Textual Sentiment Classification
3.2.1 Sentiment Guiding Matrix Construction
3.2.2 Textual Sentiment Classifier
3.2.3 User-Level Textual Sentiment Classification
3.3 Integrated Sentiment Analysis
3.3.1 Emotion Homophily
3.3.2 Social Enthusiasm
3.3.3 Relaxation Labeling
3.3.4 Integrated User-Level Sentiment Classification
3.4 Experimental Results
3.4.1 2012 Presidential Election in Taiwan
3.4.2 2014 Hot Social Issues in Taiwan
4. Early Forecast via Post-based Evolution Trajectory Analytics
4.1 Post-based Evolution Trajectory Model Generation
4.1.1 Concrete Problem Setting
4.1.2 Post-based Evolution Trajectory
4.1.3 Setting of the Trend Types
4.1.4 Post-based Evolution Trajectory Model Generation
4.2 Trajectory Early Forecast Method
4.3 Performance Analysis
4.3.1 Data Processing
4.3.2 Model Generation Results
4.3.3 Trajectory Forecast Results
5. Integrated Social Data Analytics Hub
5.1 System Architecture and Functions
5.1.1 System Architecture
5.1.2 System Functions
5.2 Social Data Analytic Services
5.2.1 Hotly Discussed Posts Analysis
5.2.2 Sentiment Classification
5.2.3 Resonance User Mining
5.3 Performance Analysis
5.3.1 System Performance
5.3.2 Results of Social Data Analytic Services
6. Conclusion
Author’s Publications