Preliminary Study on Aviation Safety System using Big Data Analytics
Institute of Civil Aviation
Big data analytics
Engine life cycle
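Based on CRISP-DM (Cross Industry Standard Process for Data Mining), this thesis applies this mature and widely used framework to the aviation industry as a preliminary exploration of big data analytic thinking. Because the aviation industry changes rapidly and data analysis is carried out case by case, a general procedure can equip analysts with data-analytic thinking in whatever situation they face. This thesis uses engine life prediction as an example, feeding the data produced by engine sensors into the data analytic procedure and examining it to uncover its hidden value.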
Big data analysis has recently become a very popular topic. Many fields are trying to bring its insights into their enterprises to increase revenue or to find potential patterns. Airlines can likewise seek new ways to serve the marketplace and increase profit while balancing safety and revenue. Big data analysis rests on the volume, variety, velocity, and veracity of data, which must be well collected, analyzed, and applied before its value can be revealed. Aviation has strict regulations for collecting and preserving data, but most of the data are not well used.
In this study, CRISP-DM, a robust and well-proven data mining methodology, serves as the fundamental concept for creating a data analytic process suitable for aviation. Generic data thinking can help analysts handle diverse data types, a variety of data sources, and different analysis software. We focus on a procedure that gives aviation analysts a logical, data-oriented way of thinking. The purpose of this study is to create a preliminary approach to aviation data analytics, because there is no standard operating procedure yet. Data analytics is, however, a case-by-case study, and it is impossible to have an all-powerful tool able to analyze every database. An engine sensor database is run through this procedure as an example of data thinking and investigation, with the aim of finding hidden information and revealing its value.
Bringing big data analytics into the aviation safety system does not mean excluding experts' opinions. Humans working with data can generate great insights.
List of Figures
List of Tables
CHAPTER I INTRODUCTION
1.1 Motivation
1.2 Main Idea
1.3 Scope
1.4 Literature survey
1.5 Thesis Outline
CHAPTER II RESEARCH BACKGROUND
2.1 Background
2.2 Data collection method
2.3 Aviation Data
2.4 Unlabeled and Labeled Data
2.5 Supervised, Semi-Supervised and Unsupervised Learning
2.6 Perspective difficulties
CHAPTER III METHODOLOGY
3.1 Basic concept of data mining
3.1.1 Data mining and big data
3.1.2 Applications of data mining
3.2 Basic concept of CRISP-DM
3.2.1 Aviation business Understanding
3.2.2 Data Understanding
3.2.3 Data Preparation
3.2.4 Modeling
3.2.5 Evaluation
3.2.6 Deployment
CHAPTER IV RESULTS AND DISCUSSION
4.1 Expected outcomes
4.2 Aviation business Understanding
4.3 Data Understanding
4.4 Data Preparation
4.4.1 Time series shifting
4.5 Modeling
4.6 Evaluation
4.7 Deployment
CHAPTER V CONCLUSIONS