

   The electronic thesis has not yet been authorized for public release; for the print copy, please check the library catalog.
(Note: if the thesis cannot be found, or its holding status shows "closed stacks, not public", it is not in the stacks and cannot be accessed.)
System ID: U0026-2708201919002800
Title (Chinese): 音源定位及分離之陪伴機器人
Title (English): Sound Source Localization and Separation for a Companion Robot
University: National Cheng Kung University
Department (Chinese): 工程科學系
Department (English): Department of Engineering Science
Academic Year: 107
Semester: 2
Publication Year: 108 (2019)
Author (Chinese): 顏姿宜
Author (English): Zih-Yi Yen
Student ID: N96064434
Degree: Master's
Language: Chinese
Pages: 57
Oral Defense Committee: 侯廷偉, 王宗一, 陳澤生, 王榮泰, 郭乃文
Advisor: 周榮華
Keywords (Chinese): 陪伴機器人, 音源定位, 音訊分離, 麥克風陣列
Keywords (English): companion robot, sound source localization, sound source separation, microphone array
Subject Classification: (none)
Abstract (Chinese) The information obtained from sound source localization and separation can help a robot provide more functions. Taking the companion robot and the social robot as examples: the former, while serving elderly or young users, can recognize the individual commands of the users around it and execute the corresponding functions; the latter can avoid talking at cross purposes in social settings.
This thesis addresses the needs of a companion robot. Sound is captured with a microphone array; the MUSIC (MUltiple SIgnal Classification) algorithm first localizes multiple sound sources, suitable source information is then selected from this result, and the GCC-NMF (Generalized Cross Correlation - Non-negative Matrix Factorization) algorithm completes the audio separation. The final goal is to present the direction and audio content of each source for subsequent analysis.
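As a rough illustration of the localization stage, the sketch below implements narrowband MUSIC for a four-microphone linear array in NumPy. It is not the thesis code: the array geometry, source frequency, snapshot model, and function name are all assumptions chosen to keep the example self-contained (the actual system uses a ReSpeaker array with broadband speech).

```python
import numpy as np

def music_doa(X, mic_pos, freq, c=343.0,
              angles=np.linspace(-90, 90, 181), n_src=1):
    """Estimate DOAs of narrowband sources with MUSIC.

    X: (n_mics, n_snapshots) complex snapshots at frequency `freq`.
    mic_pos: microphone positions along a line (metres).
    Returns the scanned angle grid and the MUSIC pseudospectrum.
    """
    R = X @ X.conj().T / X.shape[1]       # spatial covariance estimate
    _, eigvecs = np.linalg.eigh(R)        # eigenvalues in ascending order
    En = eigvecs[:, :-n_src]              # noise subspace (smallest eigenvalues)
    spectrum = []
    for theta in angles:
        # far-field steering vector for a source at angle theta
        delays = mic_pos * np.sin(np.deg2rad(theta)) / c
        a = np.exp(-2j * np.pi * freq * delays)
        # peaks where the steering vector is orthogonal to the noise subspace
        spectrum.append(1.0 / np.linalg.norm(En.conj().T @ a) ** 2)
    return angles, np.array(spectrum)

# Toy check: one 1 kHz source at +30 degrees, 4 mics with 5 cm spacing
rng = np.random.default_rng(0)
freq, c = 1000.0, 343.0
mic_pos = np.arange(4) * 0.05
delays = mic_pos * np.sin(np.deg2rad(30.0)) / c
steering = np.exp(-2j * np.pi * freq * delays)
s = rng.standard_normal(200) + 1j * rng.standard_normal(200)
noise = 0.05 * (rng.standard_normal((4, 200)) + 1j * rng.standard_normal((4, 200)))
X = np.outer(steering, s) + noise

angles, spec = music_doa(X, mic_pos, freq)
est = angles[np.argmax(spec)]
print(est)
```

With a high signal-to-noise ratio and the source exactly on the scan grid, the pseudospectrum peak lands at the true angle; in practice the thesis applies this per frequency band to real array recordings.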
The experimental environment was an indoor space with 45-55 dB of background noise. A mobile phone, a Bluetooth speaker, and human voices served as test sounds, with the volume controlled at 65-75 dB (typical speech level). Since the companion robot is assumed to operate in a small household, and the selected microphone array consists of four microphones, this thesis focuses on localizing and separating at most three sound sources.
According to the experimental results, sound source localization can measure sources within 1.5 m, with an angular error of roughly ±3 degrees and a computation time of about 0.45 s, a substantial reduction compared with conventional beamforming. Adding a microphone-channel selection step before separation clearly improves the separation quality, whether judged from chart analysis or by listening to the audio files.
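The directionality used for channel selection, and the GCC half of GCC-NMF, both rest on estimating the time difference of arrival between microphone pairs. A minimal GCC-PHAT sketch follows; the sampling rate, signal lengths, and synthetic two-channel signal are stand-ins for real array recordings, not the thesis setup.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the time delay of `sig` relative to `ref` with GCC-PHAT."""
    n = len(sig) + len(ref)                 # zero-pad for linear correlation
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12          # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    # rearrange so lags run from -max_shift to +max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

# Toy check: the second channel is the first delayed by 40 samples
fs = 16000
rng = np.random.default_rng(1)
x = rng.standard_normal(4096)
y = np.concatenate((np.zeros(40), x))[:4096]
tau = gcc_phat(y, x, fs)
print(tau)  # about 40 / 16000 seconds
```

The estimated delay, converted to an angle via the microphone spacing and the speed of sound, is one simple way to judge which channels face a source most directly.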
Abstract (English) Recently, most companion robots have been designed to interact with people through vision and sound. In this thesis, the author added a sound source recognition system to an existing facial-expression recognition robot by using a microphone array. The sound source recognition system consists of two parts: sound source localization and sound source separation. The former is achieved with the MUSIC (MUltiple SIgnal Classification) algorithm, which estimates the angle of each sound source, whereas the latter uses the GCC-NMF (Generalized Cross Correlation - Non-negative Matrix Factorization) algorithm to separate the different sources. To improve separation accuracy after localization, the author selected appropriate microphone channels according to the sound directionality before separation, which enhanced the separation results.
Since the companion robot is intended to serve in small families, the main goal of this study is to handle two to three sound sources with background noise levels typically in the range of 45 to 55 dB. The results show that the MUSIC algorithm can estimate the target source accurately and requires less computation time than conventional methods such as beamforming. As for separation, the channel-selection step had a significant positive effect on the results, whether evaluated by listening to the audio files directly or by spectrogram analysis.
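The NMF half of GCC-NMF factorizes a non-negative matrix (in that context, a magnitude spectrogram) into spectral atoms and their activations. A minimal sketch of Euclidean multiplicative-update NMF on a toy matrix; the function name, rank, and toy data are illustrative, not the thesis implementation.

```python
import numpy as np

def nmf(V, rank, n_iter=300, seed=0):
    """Factor non-negative V ~= W @ H with Lee-Seung multiplicative updates
    (Euclidean cost). Updates keep W and H non-negative by construction."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 1e-3
    H = rng.random((rank, m)) + 1e-3
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)   # update activations
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)   # update spectral atoms
    return W, H

# Toy check: an exactly rank-2 non-negative matrix is reconstructed closely
rng = np.random.default_rng(2)
V = rng.random((20, 2)) @ rng.random((2, 30))
W, H = nmf(V, rank=2)
err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(err)
```

In GCC-NMF, each learned atom is assigned to the source whose GCC direction it matches best, and the per-source atoms form a mask for separation.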
Table of Contents Abstract I
Extended Abstract II
Acknowledgments X
Table of Contents XI
List of Tables XIII
List of Figures XIV
Chapter 1: Introduction 1
1-1 Research Motivation and Background 1
1-2 Research Objectives 1
1-3 Research Contributions 1
1-4 Literature Review 2
1-4-1 Companion Robots 2
1-4-2 Robot Audition Systems 4
1-4-3 Sound Source Localization 5
1-4-4 Sound Source Separation 10
1-5 Thesis Organization 14
Chapter 2: Background Techniques 15
2-1 Multiple Signal Classification (MUSIC) Algorithm 15
2-2 GCC-NMF Algorithm 19
2-2-1 Introduction to NMF 19
2-2-2 Introduction to GCC 21
2-2-3 Combining GCC and NMF 22
Chapter 3: System Architecture, Hardware, and Software 24
3-1 Overall System Architecture 24
3-2 System Hardware 26
3-2-1 Microphone Array 26
3-2-2 HMI Touch LCD Module 27
3-2-3 Webcam and Speaker 28
3-3 Robot Mechanical Design 29
3-3-1 Robot Exterior 29
3-3-2 Robot Internal Design 32
3-4 Software Specifications 35
Chapter 4: Experimental Methods and Discussion of Results 36
4-1 Experimental Methods 36
4-1-1 Localization with Different Algorithms 37
4-1-2 Number of Localized Sources 39
4-1-3 Localization Computation Time 40
4-1-4 Localization Distance 41
4-1-5 Microphone Channel Selection 44
4-1-6 Final Optimization 47
4-2 Discussion of Results 49
Chapter 5: Conclusions and Suggestions 52
5-1 Conclusions 52
5-2 Suggestions 53
References 54

References [1] The robot invasion arrived at CES 2019 — and it was cuter than we expected. https://www.digitaltrends.com/home/cutest-companion-robots-ces-2019/. Accessed June 2019.
[2] The robots of CES 2019. https://www.androidauthority.com/new-robots-942422/. Accessed June 2019.
[3] Zoetic AI Kiki. https://www.zoeticai.com/. Accessed June 2019.
[4] GROOVE X LOVOT. https://lovot.life/en/. Accessed June 2019.
[5] Stanley Black & Decker Pria. https://www.okpria.com/. Accessed June 2019.
[6] Okuno, H. G., & Nakadai, K. (2015, April). Robot audition: Its rise and perspectives. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5610-5614, IEEE.
[7] EARS. https://robot-ears.eu/. Accessed June 2019.
[8] Nakadai, K., Okuno, H. G., Takahashi, T., Nakamura, K., Mizumoto, T., Yoshida, T., ... & Ince, G. (2011, September). Introduction to open source robot audition software HARK. In The 29th Annual Conference of the Robotics Society of Japan. Robotics Society of Japan.
[9] Elkachouchi, H., & Elsalam Mofeed, M. A. (2005, March). Direction-of-arrival methods (DOA) and time difference of arrival (TDOA) position location technique. In Proceedings of the Twenty-Second National Radio Science Conference, 2005. NRSC 2005, pp. 173-182, IEEE.
[10] Tuma, J., Janecka, P., Vala, M., & Richter, L. (2012, May). Sound source localization. In Proceedings of the 13th International Carpathian Control Conference (ICCC), pp. 740-743, IEEE.
[11] Kim, S., On, B., Im, S., & Kim, S. (2017, February). Performance comparison of FFT-based and GCC-PHAT time delay estimation schemes for target azimuth angle estimation in a passive SONAR array. In 2017 IEEE Underwater Technology (UT), pp. 1-4, IEEE.
[12] Yue, X., Qu, G., Liu, B., & Liu, A. (2018, September). Detection sound source direction in 3D space using convolutional neural networks. In 2018 First International Conference on Artificial Intelligence for Industries (AI4I), pp. 81-84, IEEE.
[13] 劉子維 (2019). Design and implementation of a sound source localization mechanism using a microphone array (in Chinese). Master's thesis, Department of Engineering Science, National Cheng Kung University.
[14] Baig, N. A., & Malik, M. B. (2013). Comparison of direction of arrival (DOA) estimation techniques for closely spaced targets. International Journal of Future Computer and Communication, 2(6), 654.
[15] Lavate, T. B., Kokate, V. K., & Sapkal, A. M. (2010, April). Performance analysis of MUSIC and ESPRIT DOA estimation algorithms for adaptive array smart antenna in mobile communication. In 2010 Second International Conference on Computer and Network Technology, pp. 308-311, IEEE.
[16] Boll, S. (1979). Suppression of acoustic noise in speech using spectral subtraction. IEEE Transactions on Acoustics, Speech, and Signal Processing, 27(2), pp. 113-120.
[17] Ephraim, Y., & Malah, D. (1984). Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32(6), pp. 1109-1121.
[18] Vuvuzela sound denoising algorithm. https://www.mathworks.com/matlabcentral/fileexchange/27912-vuvuzela-sound-denoising-algorithm. Accessed June 2019.
[19] Huang, P. S., Kim, M., Hasegawa-Johnson, M., & Smaragdis, P. (2014, October). Singing-voice separation from monaural recordings using deep recurrent neural networks. In ISMIR, pp. 477-482.
[20] Aytar, Y., Vondrick, C., & Torralba, A. (2016). SoundNet: Learning sound representations from unlabeled video. In Advances in Neural Information Processing Systems, pp. 892-900, NIPS 2016.
[21] SoundNet. https://www.youtube.com/watch?v=yJCjVvIY4dU. Accessed June 2019.
[22] Schmidt, R. (1986). Multiple emitter location and signal parameter estimation. IEEE Transactions on Antennas and Propagation, 34(3), pp. 276-280.
[23] Tang, H. (2014). DOA estimation based on MUSIC algorithm. [Online]. Available: https://pdfs.semanticscholar.org/5ff7/806b44e60d41c21429e1ad2755d72bba41d7.pdf. Accessed June 2019.
[24] Wood, S. U., Rouat, J., Dupont, S., & Pironkov, G. (2017). Blind speech separation and enhancement with GCC-NMF. IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 25(4), pp. 745-755.
[25] SiSEC dataset. [Online]. Available: https://sisec.inria.fr/. Accessed June 2019.
[26] Emiya, V., Vincent, E., Harlander, N., & Hohmann, V. (2011). Subjective and objective quality assessment of audio source separation. IEEE Transactions on Audio, Speech, and Language Processing, 19(7), pp. 2046-2057.
[27] Vincent, E., Gribonval, R., & Févotte, C. (2006). Performance measurement in blind audio source separation. IEEE Transactions on Audio, Speech, and Language Processing, 14(4), pp. 1462-1469.
[28] Blandin, C., Ozerov, A., & Vincent, E. (2012). Multi-source TDOA estimation in reverberant audio using angular spectra and clustering. Signal Processing, 92(8), pp. 1950-1960.
[29] Duong, N. Q., Vincent, E., & Gribonval, R. (2010). Under-determined reverberant audio source separation using a full-rank spatial covariance model. IEEE Transactions on Audio, Speech, and Language Processing, 18(7), pp. 1830-1840.
[30] 孫佾微 (2018). A companion robot that recognizes emotions from facial expressions (in Chinese). Master's thesis, Department of Engineering Science, National Cheng Kung University.
[31] Seeed Studio - Seeed Wiki - ReSpeaker Mic Array v2.0. http://wiki.seeedstudio.com/ReSpeaker_Mic_Array_v2.0/. Accessed June 2019.
[32] TJC (淘晶馳) HMI touch LCD module. http://www.tjc1688.com/Product/Txilie/. Accessed June 2019.
[33] Logitech C525 webcam. https://www.logitech.com/zh-tw/product/hd-webcam-c525#specification-tabular. Accessed June 2019.
[34] Xiaomi portable Bluetooth speaker. https://www.mi.com/tw/littleaudio/. Accessed June 2019.
[35] Mouser Electronics - MP34DT01TR-M microphone. https://reurl.cc/rqWeZ. Accessed June 2019.
[36] Scheibler, R., Bezzam, E., & Dokmanić, I. (2018, April). Pyroomacoustics: A Python package for audio room simulation and array processing algorithms. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 351-355, IEEE.
[37] DiBiase, J. H. (2000). A high-accuracy, low-latency technique for talker localization in reverberant environments using microphone arrays. PhD thesis, Engineering, Brown University, Providence RI, USA.
[38] Yoon, Y. S., Kaplan, L. M., & McClellan, J. H. (2006). TOPS: New DOA estimator for wideband signals. IEEE Transactions on Signal Processing, 54(6), pp. 1977-1989.
[39] GitHub - Sean Wood - GCC-NMF. https://github.com/seanwood/gcc-nmf. Accessed June 2019.
[40] 陳旻甄 (2018). A companion robot for the elderly (in Chinese). Master's thesis, Department of Engineering Science, National Cheng Kung University.
Full-Text Authorization
  • On-campus browsing/printing of the electronic full text authorized, available from 2021-08-31.
  • Off-campus browsing/printing of the electronic full text authorized, available from 2021-08-31.


  • If you have any questions, please contact the library.
    Tel: (06) 2757575 ext. 65773
    E-mail: etds@email.ncku.edu.tw