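System ID: U0026-2608201613495400
Title (Chinese): 利用語者辨識技術於居家安全應用
Title (English): Exploiting Speaker Recognition for Home Security Applications
Institution: National Cheng Kung University
Department (Chinese): 工程科學系
Department (English): Department of Engineering Science
Academic year: 104 (ROC calendar; 2015-2016)
Semester: 2
Publication year: 105 (ROC calendar; 2016)
Author (Chinese): 潘冠中
Author (English): Kuan-Chung Pan
Student ID: n96031130
Degree: Master's
Language: English
Pages: 45
Oral defense committee: Advisor - 鄧維光
Committee member - 侯廷偉
Committee member - 林耕霈
Committee member - 席家年
Keywords (Chinese): 語者識別  生物特徵  向量量化  居家安全
Keywords (English): speaker recognition  biometrics  vector quantization  home security
Subject classification: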
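Abstract (translated from Chinese): In today's era of advanced information technology, personal privacy and security are increasingly valued, and the conventional authentication method based on usernames and passwords has gradually revealed its insufficient security. By comparison, biometric technology can replace or strengthen existing authentication methods: the biological characteristics unique to each person can, after feature processing, serve as the basis for security authentication. In this study, we use the human voice as the basis for verification to improve home security, because voice data are easier to obtain than other biometric features. Specifically, the key to speaker recognition is capturing the characteristics of a person's voice effectively: vocal cord and vocal tract characteristics differ from person to person, and each person's manner of speaking has its own rhythm and style. According to prior research, voice features are mostly extracted with Mel-frequency cepstral coefficients, which convert speech segments into corresponding feature vectors; vector quantization is then used to encode the feature vectors and build a voice model. During recognition, unknown voice data are compared for similarity against the previously built voice models, and the model with the smallest distortion is taken as the recognition result. To make this home security application more practical, the prototype system we designed and implemented provides functions such as speaker recognition and history browsing. According to our experimental evaluation, the system not only identifies family members promptly and correctly but also recognizes the voices of strangers and proactively raises an alert.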
Abstract (English): Privacy is now considered an important issue for all individuals. The conventional approach of authenticating with a username and password is becoming less secure. Biometrics, by contrast, are emerging as new features for authentication because they are inherently unique and measurable characteristics that can be used to identify a person. Among the several types of biometrics, voice data are generally easier to obtain. To improve home security, our research uses speaker recognition technology, which identifies a person from the characteristics of his or her voice. Speaker recognition techniques can effectively extract a person's vocal tract features. Because vocal tract shapes, larynx sizes, and other parts of the voice production organs differ, no two individuals sound identical. In prior works, Mel-Frequency Cepstral Coefficients have been used to describe and capture vocal tract characteristics effectively. The extracted features are then encoded by a vector quantization approach to create a voice model. During identification, a speech sample or utterance is compared against a previously created voice model. Our prototype system has two main functions: speaker recognition and history review. Experimental results show that our system can instantly and accurately identify family members in the home environment. Moreover, strangers can be detected so that family members are actively alerted.
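The matching step described in the abstract (build a vector-quantization codebook per speaker, then pick the model with the smallest distortion) can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes plain k-means as the codebook trainer (the table of contents also lists the Self-Organizing Map and the Growing Quadtree Self-Organizing Map), uses synthetic 3-dimensional vectors as stand-ins for MFCC features, and the speaker names, the `threshold` value, and all function names are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors, k, iterations=20, seed=0):
    """Build a VQ codebook of k codewords with plain k-means (an assumption)."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            nearest = min(range(k), key=lambda i: squared_distance(v, codebook[i]))
            clusters[nearest].append(v)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous codeword
                dim = len(members[0])
                codebook[i] = [sum(m[d] for m in members) / len(members)
                               for d in range(dim)]
    return codebook


def average_distortion(vectors, codebook):
    """Mean squared distance from each vector to its nearest codeword."""
    return sum(min(squared_distance(v, c) for c in codebook)
               for v in vectors) / len(vectors)


def identify(vectors, models, threshold):
    """Return the enrolled speaker with the smallest distortion,
    or "stranger" when even the best match exceeds the threshold."""
    best_name, best_score = min(
        ((name, average_distortion(vectors, cb)) for name, cb in models.items()),
        key=lambda item: item[1],
    )
    return best_name if best_score <= threshold else "stranger"


def synthetic_features(center, n, rng, spread=0.5):
    """Hypothetical stand-in for MFCC vectors: Gaussian noise around a center."""
    return [[c + rng.gauss(0, spread) for c in center] for _ in range(n)]


rng = random.Random(42)
# Enrollment: one codebook per (hypothetical) family member.
models = {
    "alice": train_codebook(synthetic_features([0.0, 0.0, 0.0], 200, rng), k=4),
    "bob": train_codebook(synthetic_features([5.0, 5.0, 5.0], 200, rng), k=4),
}

# Identification: unknown utterances are scored against every enrolled model.
known = identify(synthetic_features([0.0, 0.0, 0.0], 50, rng), models, threshold=3.0)
unknown = identify(synthetic_features([20.0, 20.0, 20.0], 50, rng), models, threshold=3.0)
print(known, unknown)
```

The rejection threshold is what lets a closed-set identifier flag strangers; in practice it would have to be tuned on real enrollment data rather than fixed by hand as it is here.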
Table of contents: Chapter 1 Introduction
1.1 Motivation and Overview
1.2 Contributions of this Work
Chapter 2 Preliminaries
2.1 Biometrics Authentication for Home Security
2.2 Overview of a Speaker Recognition System
2.3 Variants of Spoken Input in Speaker Recognition Systems
2.4 Preprocessing of Voice Data
Chapter 3 Utilizing GQSOM for Voice Modeling
3.1 Voice Modeling
3.1.1 Usage of K-means
3.1.2 Usage of the Self-Organizing Map
3.1.3 Usage of the Growing Quadtree Self-Organizing Map
3.2 Exceptional Enrollment and Identification for Family Members
Chapter 4 Empirical Studies
4.1 Proposed Scheme
4.2 Experimental Results
4.3 Proposed Speaker Recognition System
Chapter 5 Conclusions and Future Works
Bibliography

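Full-text access rights
  • On-campus browsing and printing of the electronic full text is authorized; publicly available from 2018-01-01.
  • Off-campus browsing and printing of the electronic full text is authorized; publicly available from 2018-01-01.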

