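   The electronic full text has not yet been authorized for public release; for the print copy, please check the library catalog.
(Note: if the item cannot be found, or its holding status shows "closed stacks, not public", the thesis is not in the stacks and cannot be retrieved.)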
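System ID: U0026-0707202010463500
Title (Chinese): 探討線性轉換方法對隱私保護資料探勘流程效用及安全性影響之研究
Title (English): The Impact of Linear Transformation on the Effectiveness and Security of the Privacy Preserving Data Mining Process
Institution: National Cheng Kung University
Department (Chinese): 資訊管理研究所
Department (English): Institute of Information Management
Academic year: 108 (ROC calendar)
Semester: 2
Year of publication: 109 (ROC calendar)
Author (Chinese): 方荷雅
Author (English): He-Ya Fang
Student ID: R76074111
Degree: Master's
Language: Chinese
Pages: 54
Oral defense committee: Advisor: 翁慈宗
Committee member: 王維聰
Committee member: 蔡青志
Keywords (translated from Chinese): privacy preserving  data mining  data perturbation  encryption
Keywords (English): classification  data perturbation  encryption  linear transformation  privacy preserving
Subject classification: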
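Abstract (translated from Chinese): Because data mining techniques can extract useful knowledge from data, helping organizations understand and serve customers better and thereby gain a competitive advantage, a growing number of people are working in and studying this field. With advances in information technology, personal data such as shopping habits, credit records, and medical histories can now be collected and processed. This information is undoubtedly useful in many domains. However, public concern about personal privacy keeps growing, and many people do not want their private data disclosed, so how to keep data secure while still obtaining useful information through data mining is a question that needs attention today. This study therefore attempts to establish a procedure that protects both the privacy and the utility of the original data, in the hope of contributing to the field of privacy-preserving data mining.
In privacy-preserving data mining today, apart from encryption and anonymization, most approaches protect privacy by perturbing the data, but perturbation distorts the data and reduces its utility. This study therefore uses one-piece and multi-piece linear functions to transform the original continuous data into different values, so that the original values cannot be learned directly while the data retain their original utility. Through model restoration and a prediction program, the data recipient can then use their own data to make predictions on new data. Besides transforming the data set, noise and encryption are also applied to the transformed data set to strengthen security during transmission, so that a third party (neither the data provider nor the data recipient) cannot use the transformed data set. The experimental results show that for decision trees and rule-based classification, multi-piece linear transformation preserves data utility and offers higher security than one-piece transformation, and the more pieces are used, the higher the security. For logistic regression and support vector machines, one-piece transformation is sufficient, because it preserves data utility and, under the procedure and evaluation method of this study, does not affect the security of these two classification methods.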
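As a hedged illustration of why a one-piece transformation can leave a linear classifier's predictions recoverable, suppose (an assumption; this record does not give the exact form used in the thesis) that each attribute is transformed as $x'_j = a_j x_j + b_j$ with $a_j \neq 0$. A linear decision function fitted on the transformed data, with hypothetical weights $w'_j$ and intercept $c'$, can then be rewritten on the original scale:

```latex
\begin{aligned}
g(\mathbf{x}') &= \sum_{j} w'_j x'_j + c'
  = \sum_{j} w'_j \,(a_j x_j + b_j) + c' \\
 &= \sum_{j} \underbrace{(a_j w'_j)}_{w_j}\, x_j
  + \underbrace{\Bigl(c' + \sum_{j} w'_j b_j\Bigr)}_{c}
\end{aligned}
```

Under this assumption, a restored model with weights $w_j = a_j w'_j$ and intercept $c = c' + \sum_j w'_j b_j$ gives the same score for every instance. With a multi-piece transformation the map is no longer a single affine function, so in general no single pair $(w_j, c)$ reproduces the transformed-scale model on all segments.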
Abstract (English): Since data mining techniques can extract useful knowledge from data, more and more people are devoting themselves to this field. The data used for mining generally contain personal records, and hence people are paying more attention to preventing their private data from being disclosed. This study attempts to establish a procedure that ensures both the effectiveness and the security of the original data. Data are transformed by piecewise linear functions before being sent to data analysts, who apply classification methods to the transformed data. Data in transit are also protected by perturbation and encryption. The classification models produced by the data analysts can be sent back to the data providers, who restore the models for the data analysts. This restoration process is designed to ensure that data analysts have models for classifying new instances. According to the experimental results on ten data sets, the more pieces a linear function has, the higher the security that can be achieved for the decision tree and rule-based classifier algorithms. Data analyzed by logistic regression and support vector machines should not be transformed by multi-piece linear functions, because for these two algorithms the accuracies on the original and transformed data will differ.
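The transformation and model-restoration idea described in the abstract can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes the piecewise linear function is continuous and strictly increasing (the record does not state this), and the function name `make_piecewise`, the breakpoints, the slopes, and the sample values are all hypothetical. Under that assumption the transformation preserves the ordering of values, so a decision-tree split learned on transformed data can be mapped back to the original scale by inverting the threshold.

```python
from bisect import bisect_right


def make_piecewise(breaks, slopes, offset=0.0):
    """Return (forward, inverse) for a continuous increasing piecewise linear map.

    breaks: sorted breakpoints on the original scale (illustrative values).
    slopes: one positive slope per segment, so len(slopes) == len(breaks) + 1.
    offset: value of the first segment's line at x = 0.
    An empty `breaks` list gives the one-piece case f(x) = offset + slopes[0] * x.
    """
    breaks = list(breaks)
    if len(slopes) != len(breaks) + 1 or any(s <= 0 for s in slopes):
        raise ValueError("need len(breaks) + 1 strictly positive slopes")
    if breaks != sorted(breaks):
        raise ValueError("breaks must be sorted")

    # Transformed value at each breakpoint, so neighbouring segments meet.
    y_breaks = []
    for i, b in enumerate(breaks):
        if i == 0:
            y_breaks.append(offset + slopes[0] * b)
        else:
            y_breaks.append(y_breaks[-1] + slopes[i] * (b - breaks[i - 1]))

    def forward(x):
        i = bisect_right(breaks, x)  # index of the segment containing x
        if i == 0:
            return offset + slopes[0] * x
        return y_breaks[i - 1] + slopes[i] * (x - breaks[i - 1])

    def inverse(y):
        i = bisect_right(y_breaks, y)  # segment lookup on the transformed scale
        if i == 0:
            return (y - offset) / slopes[0]
        return breaks[i - 1] + (y - y_breaks[i - 1]) / slopes[i]

    return forward, inverse


# Hypothetical three-piece transformation of one continuous attribute.
forward, inverse = make_piecewise(breaks=[10.0, 20.0], slopes=[2.0, 0.5, 3.0], offset=5.0)
original = [0.0, 9.5, 10.0, 15.0, 20.0, 31.0]
transformed = [forward(x) for x in original]

# A split "x' <= t'" learned on transformed data maps back to "x <= inverse(t')".
t_transformed = 27.5
t_restored = inverse(t_transformed)
left_on_transformed = [x for x, y in zip(original, transformed) if y <= t_transformed]
left_on_original = [x for x in original if x <= t_restored]
```

Both lists contain the same instances, which is why order-based classifiers such as decision trees can keep their utility under this kind of transformation. Only the party holding the breakpoints and slopes can compute `inverse`, which is the role the abstract assigns to the data provider.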
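Table of contents: Abstract
Acknowledgments
Contents
List of tables
List of figures
Chapter 1 Introduction
1.1 Research background and motivation
1.2 Research objectives
1.3 Research process
1.4 Research limitations
Chapter 2 Literature review
2.1 Privacy-preserving data mining
2.1.1 Techniques used in privacy-preserving data mining
2.1.2 Practical applications of privacy-preserving data mining
2.1.3 Challenges of privacy-preserving data mining
2.2 Data perturbation techniques
2.3 Encryption techniques
2.3.1 Homomorphic encryption
2.3.2 RSA encryption
2.4 PPDM evaluation methods
2.4.1 Privacy level
2.4.2 Data quality
2.5 Summary
Chapter 3 Research method
3.1 Data transformation
3.1.1 One-piece linear transformation
3.1.2 Multi-piece linear transformation
3.2 Effect testing
3.2.1 Decision trees and rule-based classification
3.2.2 Logistic/linear regression and support vector machines
3.3 Prediction on new data
3.3.1 Model restoration
3.3.2 Prediction program
3.4 Noise and encryption
3.5 Evaluation metrics
Chapter 4 Empirical study
4.1 Data sets
4.2 Utility experiments
4.2.1 One-piece linear analysis
4.2.2 Multi-piece linear analysis
4.3 Security experiments
4.4 Summary
Chapter 5 Conclusions and suggestions
5.1 Conclusions
5.2 Future research and development
References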
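Full-text usage rights
  • Authorized for on-campus browsing/printing of the electronic full text, open from 2024-05-30.
  • Authorized for off-campus browsing/printing of the electronic full text, open from 2024-05-30.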

