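System ID U0026-0812200914141460
Thesis Title (Chinese) 利用自調節門檻值與追蹤能量封包動態於複合式語音活動偵測演算法
Thesis Title (English) Using a Self-Regulatory Threshold and a Tracking Power Envelope Dynamics for an Integrated Voice Activity Detection Algorithm
Institution National Cheng Kung University
Department (Chinese) 電機工程學系碩博士班
Department (English) Department of Electrical Engineering
Academic Year 96 (ROC calendar)
Semester 2
Year of Publication 97 (ROC calendar; 2008)
Student (Chinese) 毛成一
Student (English) Cheng-yi Mao
E-mail n2695422@mail.ncku.edu.tw
Student ID n2695422
Degree Master's
Language English
Pages 56
Committee Committee member: 雷曉方
Committee member: 林俊宏
Advisor: 王振興
Keywords (Chinese) 自調節門檻值  語音活動偵測
Keywords (English) voice activity detection  self-regulatory threshold
Subject Classification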
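Abstract (Chinese) This thesis describes an integrated voice activity detection (VAD) algorithm that combines a self-regulatory threshold with tracking of power envelope dynamics to classify speech and non-speech. It improves on conventional VAD algorithms that require a predetermined threshold: the key parameter needed for threshold estimation is computed automatically from the noise spectrum, replacing the parameter-setting procedures of earlier algorithms. To meet the practical requirement of avoiding loss of speech data, the self-regulatory threshold is combined with the tracking power envelope dynamics algorithm, with the aim of improving the accuracy of noise detection and thereby lowering the speech detection error rate. Because the parameters of both algorithms are computed from spectral energy and both are non-causal, the two can be suitably combined into an integrated structure. The final output is obtained by applying different weights to the past decisions and to the degree of trust placed in each method when the two algorithms give opposite results. The detection accuracy of the algorithm was validated on the AURORA database: at an acceptable speech detection error rate, it achieves a higher non-speech hit rate and comes close to real-time processing.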
Abstract (English) This thesis presents an integrated voice activity detection (VAD) algorithm that combines an adaptive VAD with a self-regulatory threshold-setting mechanism and an energy-based VAD that tracks power envelope dynamics. We develop a scheme that automatically computes a crucial parameter from a noise spectrum estimate, which eases the parameter-setting problem of conventional adaptive threshold schemes. The integration with tracking power envelope dynamics is motivated by that method's superior noise detection, which is used to suppress the speech false alarm rate and so meet the needs of practical applications. The final VAD output is determined by the proposed integrated scheme, which considers past decisions and the confidence level in each algorithm when the two detections disagree. The effectiveness of the proposed scheme has been validated on the AURORA database. According to the experimental results, the proposed scheme achieves an acceptable speech false alarm rate and a higher non-speech hit rate in real-time processing than some existing VAD algorithms.
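The fusion step described in the abstract (past decisions plus a confidence weight for each detector when the two disagree) can be illustrated with a minimal sketch. This is not the thesis's actual scheme: the function name `fuse_vad`, the three weights, and the 0.5 decision level are hypothetical values chosen only for illustration, and the two input sequences stand in for the per-frame outputs of the adaptive-threshold VAD and the power-envelope VAD.

```python
from typing import List, Sequence


def fuse_vad(adaptive: Sequence[int],
             envelope: Sequence[int],
             w_past: float = 0.3,       # hypothetical weight on the previous fused decision
             w_adaptive: float = 0.3,   # hypothetical confidence in the adaptive-threshold VAD
             w_envelope: float = 0.4    # hypothetical confidence in the power-envelope VAD
             ) -> List[int]:
    """Fuse two per-frame binary VAD decisions (1 = speech, 0 = non-speech).

    When the detectors agree, their common decision is used. When they
    disagree, a weighted vote over the previous fused decision and the two
    current decisions is compared against 0.5 (an assumed decision level).
    """
    if len(adaptive) != len(envelope):
        raise ValueError("both decision sequences must have the same length")

    fused: List[int] = []
    prev = 0  # assume the stream starts in non-speech
    for a, e in zip(adaptive, envelope):
        if a == e:
            decision = a
        else:
            score = w_past * prev + w_adaptive * a + w_envelope * e
            decision = 1 if score >= 0.5 else 0
        fused.append(decision)
        prev = decision
    return fused


if __name__ == "__main__":
    adaptive_out = [0, 1, 1, 0, 1, 0]
    envelope_out = [0, 1, 0, 1, 1, 1]
    print(fuse_vad(adaptive_out, envelope_out))  # → [0, 1, 1, 1, 1, 1]
```

With these assumed weights, a single detector cannot start a speech segment on its own (0.3 or 0.4 is below 0.5), but either detector can extend a segment that is already active, which loosely mirrors the stated goal of avoiding loss of speech data.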
Table of Contents CHINESE ABSTRACT
ABSTRACT
ACKNOWLEDGEMENT
LIST OF TABLES
LIST OF FIGURES
1 Introduction
1.1 Motivation
1.2 Literature Survey
1.3 Purpose of the Study
1.4 Organization of the Thesis
2 Self-Regulative Threshold Based Adaptive VAD
2.1 Signal Preprocessing
2.2 Long-term Spectrum Estimation and Threshold Calculation
2.2.1 Signal-to-Noise Ratio Measure
2.2.2 Threshold Calculation
2.2.3 Scheme of the PFA Calculation
2.3 Compression Procedure
2.4 Flow Diagram of the Adaptive VAD with Self-Regulatory Threshold
3 Integrated VAD for Reducing the Missing Information of Speech
3.1 Application Requirement
3.2 Tracking Power Envelope Dynamics Algorithm
3.3 Fusion Scheme
4 Simulation Results
4.1 Performance Indicators
4.2 Parameters Setting and Speech Database Introduction
4.3 Performance of PFA Estimation
4.4 Performance of the Integration Structure
4.5 Comparison with Three Different VADs
5 Conclusions and Future Work
5.1 Conclusions
5.2 Future Work
References
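Full-Text Access Rights
  • On-campus browsing and printing of the electronic full text is authorized; publicly available from 2010-07-09.
  • Off-campus browsing and printing of the electronic full text is authorized; publicly available from 2010-07-09.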

