

   The electronic thesis has not been released for public access; for the print copy, please check the library catalog.
(Note: if no record is found, or the holdings status shows "closed stacks, not open to the public", the thesis is not in the stacks and cannot be accessed.)
System ID U0026-0508201914572200
Title (Chinese) 混合音頻與歌詞之歌曲自動標籤方法
Title (English) A Method of Music Auto-tagging Based on Audio and Lyric
University National Cheng Kung University
Department (Chinese) 資訊管理研究所
Department (English) Institute of Information Management
Academic year 107 (ROC calendar)
Semester 2
Publication year 108 (ROC calendar; 2019)
Author (Chinese) 徐陞瑋
Author (English) Sheng-Wei Syu
Student ID R76064027
Degree Master
Language Chinese
Pages 94
Committee Advisor – 王惠嘉
Committee member – 高宏宇
Committee member – 劉任修
Committee member – 李偉柏
Keywords (Chinese) 音樂自動標籤, 深度學習, 多目標學習, 多標籤分類
Keywords (English) Music Auto-tagging, Deep Learning, Multi-task Learning, Multi-tag Classification
Subject classification
Abstract (Chinese, translated) With the advance of the Internet and related technology, online music platforms and streaming services have flourished, and the sheer volume of digital music leaves users facing information overload. To address this, these platforms need to build comprehensive recommendation systems from user information and auxiliary data to help users retrieve, query, and discover new music; keyword search is currently the most common query method.
Among keyword-based queries, social tags are considered helpful for making better recommendations. However, social tags suffer from tag sparsity and the cold-start problem, which limits how much they can help a recommendation system. To resolve these issues, an auto-tagging system is needed to fill in the missing tags and thereby support recommendation. Most earlier auto-tagging studies analyzed audio alone, yet many studies have shown that lyrics give a music classification system additional information and improve classification accuracy. This study therefore brings lyrics into the classification system, extracts features from audio and lyrics jointly, and proposes a music auto-tagging system that combines the two.
In recent years, with the development of neural networks, many researchers have used them to extract audio and text features, with demonstrated effectiveness. For lyric feature extraction in particular, several studies report that taking the structure of the lyrics into account yields more effective features for classification. This study uses neural network architectures for music feature extraction and auto-tagging; for lyric features, it combines a convolutional neural network (CNN) with a recurrent neural network (RNN) so that the structural features of the lyrics are captured.
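As a rough illustration of this hybrid idea (convolution within each lyric line, recurrence across lines), here is a minimal pure-Python sketch. The vocabulary, dimensions, and random weights are invented for the example; this is not the thesis's actual architecture or trained model:

```python
import random

random.seed(0)

EMB_DIM, CONV_FILTERS, HIDDEN = 8, 4, 6
KERNEL = 2  # convolution window: 2 consecutive words

def rand_mat(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

# Hypothetical toy vocabulary and embedding table (randomly initialized here;
# a real system would use trained word embeddings such as Word2Vec or GloVe).
VOCAB = {w: i for i, w in enumerate("i love you night sky stars <pad>".split())}
EMB = rand_mat(len(VOCAB), EMB_DIM)

# CNN part: convolve over word windows inside one lyric line, then max-pool
# over time, giving a fixed-size feature vector per line.
W_conv = [rand_mat(KERNEL, EMB_DIM) for _ in range(CONV_FILTERS)]

def line_features(words):
    vecs = [EMB[VOCAB.get(w, VOCAB["<pad>"])] for w in words]
    if len(vecs) < KERNEL:                      # pad short lines
        vecs += [EMB[VOCAB["<pad>"]]] * (KERNEL - len(vecs))
    feats = []
    for f in W_conv:
        acts = []
        for t in range(len(vecs) - KERNEL + 1):
            s = sum(f[k][d] * vecs[t + k][d]
                    for k in range(KERNEL) for d in range(EMB_DIM))
            acts.append(max(0.0, s))            # ReLU
        feats.append(max(acts))                 # max-pool over time
    return feats

# RNN part: a simple Elman recurrence over the sequence of line features,
# so the order of lines (verse/chorus structure) influences the song vector.
W_in, W_rec = rand_mat(HIDDEN, CONV_FILTERS), rand_mat(HIDDEN, HIDDEN)

def song_features(lines):
    h = [0.0] * HIDDEN
    for line in lines:
        x = line_features(line.split())
        h = [max(0.0, sum(W_in[i][j] * x[j] for j in range(CONV_FILTERS))
                  + sum(W_rec[i][j] * h[j] for j in range(HIDDEN)))
             for i in range(HIDDEN)]
    return h

song = ["i love you", "night sky stars", "i love you"]
vec = song_features(song)
print(len(vec))  # → 6 : fixed-size lyric feature vector for the whole song
```

In a real system the convolution and recurrence would be trained jointly over learned word embeddings; the point here is only the shape of the computation: word windows → per-line features → order-aware song vector.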
In addition, prior work has shown that multi-task learning can improve classification performance by learning the relationships among labels. This study applies multi-task learning to tag classification in music auto-tagging.
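The multi-task formulation can be illustrated with a toy example: a shared song representation feeds one small output head per tag, and the training objective sums the per-tag binary cross-entropy losses, so the shared layers learn from all tags at once. Every name and number below (tags, weights, features) is made up for illustration and does not come from the thesis:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

TAGS = ["rock", "sad", "party"]

# shared song representation (e.g. fused audio+lyric features)
shared = [0.2, -0.5, 0.9, 0.1]

# one small linear head per tag (multi-task: separate outputs, shared input)
heads = {
    "rock":  [0.3, 0.1, -0.2, 0.4],
    "sad":   [-0.1, 0.5, 0.2, 0.0],
    "party": [0.2, -0.3, 0.1, 0.6],
}

truth = {"rock": 1.0, "sad": 0.0, "party": 1.0}  # ground-truth tags

def bce(p, y):
    """Binary cross-entropy for one tag."""
    eps = 1e-12
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

# each head predicts its own tag probability from the shared features
preds = {t: sigmoid(sum(w * x for w, x in zip(heads[t], shared))) for t in TAGS}

# joint objective: the per-tag losses are summed, so gradients from all
# tags would flow back into the shared representation during training
total_loss = sum(bce(preds[t], truth[t]) for t in TAGS)
print(round(total_loss, 4))
```

Because the heads share their input representation, correlated tags can reinforce each other through the shared layers, which is the effect multi-task learning exploits.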
Our experiments confirm that auto-tagging songs from combined audio and lyric features, with a multi-task tag classifier, achieves better classification performance than the audio-only, single-task methods of previous research.
Abstract (English) With the development of the Internet and technology, online music platforms and music streaming services are booming, and the large amount of digital music leaves users facing information overload. To solve this problem, these platforms need to construct comprehensive recommendation systems, using user information and metadata, to help users search, query, and discover new music.
Social tags are considered to help music recommendation systems make better recommendations. However, social tags face the problems of tag sparsity and cold start, which limit their effectiveness. To solve these problems, a music auto-tagging system is needed to make up for the shortage of tags. In the past, most research on auto-tagging used only audio for analysis, yet many studies have shown that lyrics can give a music classification system more information and improve classification accuracy.
This study proposes a music auto-tagging method that uses both audio and lyrics for analysis. We also experimented with different tag classification architectures; the results show that the design using a late-fusion model with multi-task classification performs best.
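The early- vs. late-fusion distinction can be sketched in a few lines: early fusion concatenates the modalities' inputs before any model processes them, while late fusion runs each modality through its own feature extractor and concatenates the resulting high-level vectors just before the tag classifier. The branch functions and numbers below are hypothetical stand-ins for the audio and lyric networks, not the thesis's models:

```python
audio_input = [0.3, 0.8, 0.1, 0.5]   # e.g. summarized audio features
lyric_input = [0.7, 0.2, 0.9]        # e.g. summarized lyric features

def audio_branch(x):                  # stand-in for the audio CNN
    return [sum(x) / len(x), max(x)]

def lyric_branch(x):                  # stand-in for the lyric CNN+RNN
    return [sum(x) / len(x), min(x)]

# early fusion: raw inputs are concatenated; one shared model
# would then process the combined vector
early = audio_input + lyric_input

# late fusion: each modality is processed separately, and only the
# extracted features are concatenated for the tag classifier
late = audio_branch(audio_input) + lyric_branch(lyric_input)

print(len(early), len(late))  # → 7 4
```

Late fusion lets each branch specialize in its own modality before the features are combined, which matches the configuration the experiments found to perform best.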
Table of contents Chapter 1 Introduction 1
1.1 Research Background and Motivation 1
1.2 Research Objectives 6
1.3 Scope and Limitations 7
1.4 Research Process 7
1.5 Thesis Outline 8
Chapter 2 Literature Review 10
2.1 Music Information Retrieval (MIR) 10
2.1.1 Music Auto-tagging 11
2.2 Music Feature Extraction 12
2.2.1 Audio Feature Extraction 12
2.2.2 Lyric Feature Extraction 15
2.3 Word Embedding 18
2.3.1 Word2Vec 19
2.3.2 GloVe 20
2.4 Neural Networks 21
2.4.1 Convolutional Neural Networks (CNN) 22
2.4.2 Recurrent Neural Networks (RNN) 23
2.4.2.1 Encoder-Decoder Network 27
2.4.2.2 Attention Mechanism 29
2.4.2.3 Bidirectional Recurrent Neural Networks (BRNN) 31
2.5 Multi-Label Learning (MLL) 32
2.5.1 Multi-Label Learning Architectures 32
2.5.2 Multi-task Learning 34
2.5.3 Strong Labels and Weak Labels 34
2.6 Summary 36
Chapter 3 Research Method 37
3.1 Research Architecture 37
3.2 Data Collection Module 38
3.3 Audio Preprocessing Module 40
3.4 Lyric Preprocessing Module 42
3.4.1 Text Preprocessing 42
3.4.2 Word Embedding Model 44
3.4.3 Lyric Matrixization 46
3.5 Tag Preprocessing Module 48
3.6 Neural Networks 50
3.6.1 Audio Feature Extraction 50
3.6.2 Lyric Feature Extraction 52
3.6.2.1 Sentence Feature Extraction 53
3.6.2.2 Sentence Feature Integration 56
3.7 Tag Generation 58
3.7.1 Model Fusion 58
3.7.2 Tag Classifier 59
3.8 Summary 62
Chapter 4 System Implementation and Evaluation 63
4.1 System Implementation 63
4.2 Experimental Method 63
4.2.1 Data Sources 64
4.2.2 Evaluation Metrics 65
4.3 Parameter Settings 66
4.3.1 Parameter 1: Word Embedding Dimension 66
4.3.2 Parameter 2: Maximum Lyric Sentence Length 66
4.3.3 Parameter 3: Maximum Number of Lyric Sentences 67
4.3.4 Parameter 4: Neural Network Training Parameters 68
4.4 Experimental Results and Analysis 68
4.4.1 Experiment 1: Merged vs. Unmerged Tags 69
4.4.2 Experiment 2: Word Embedding Training Methods 72
4.4.3 Experiment 3: Text Feature Extraction Methods 74
4.4.4 Experiment 4: Hybrid Model vs. Individual Models 77
4.4.5 Experiment 5: Multi-task vs. Single-task Learning 81
4.4.6 Experiment 6: Early Fusion vs. Late Fusion 83
4.5 Summary 85
Chapter 5 Conclusion 86
5.1 Research Results 86
5.2 Future Research Directions 88
References 90
References Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. arXiv preprint arXiv:1409.0473.
Bahuleyan, H. (2018). Music Genre Classification Using Machine Learning Techniques. arXiv preprint arXiv:1804.01149.
Bertin-Mahieux, T., Ellis, D. P., Whitman, B., & Lamere, P. (2011). The Million Song Dataset. Paper presented at the International Society for Music Information Retrieval Conference, Miami, Florida, USA.
Casey, M. A., Veltkamp, R., Goto, M., Leman, M., Rhodes, C., & Slaney, M. (2008). Content-Based Music Information Retrieval: Current Directions and Future Challenges. Proceedings of the IEEE, 96(4), 668-696.
Chen, Z., Zhan, Z., Shi, W., Chen, W., & Zhang, J. (2016). When Neural Network Computation Meets Evolutionary Computation: A Survey. Paper presented at the International Symposium on Neural Networks, St. Petersburg, Russia.
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation. arXiv preprint arXiv:1406.1078.
Choi, K. (2018). Deep Neural Networks for Music Tagging. Queen Mary University of London.
Choi, K., Fazekas, G., & Sandler, M. (2016). Automatic Tagging Using Deep Convolutional Neural Networks. arXiv preprint arXiv:1606.00298.
Choi, K., Fazekas, G., Sandler, M., & Cho, K. (2017). Convolutional Recurrent Neural Networks for Music Classification. Paper presented at the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
Choi, K., Lee, J. H., Hu, X., & Downie, J. S. (2016). Music Subject Classification Based on Lyrics and User Interpretations. Paper presented at the ASIS&T Annual Meeting, Copenhagen, Denmark.
Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv preprint arXiv:1412.3555.
Datta, A. K., Solanki, S. S., Sengupta, R., Chakraborty, S., Mahto, K., & Patranabis, A. (2017). Signal Analysis of Hindustani Classical Music: Springer Singapore.
Delbouys, R., Hennequin, R., Piccoli, F., Royo-Letelier, J., & Moussallam, M. (2018). Music Mood Detection Based On Audio And Lyrics With Deep Neural Net. arXiv preprint arXiv:1809.07276.
Dieleman, S., & Schrauwen, B. (2014). End-to-End Learning for Music Audio. Paper presented at the IEEE International Conference on Acoustics, Speech and Signal Processing, Florence, Italy.
Downie, J. S. (2003). Music information retrieval. Annual review of information science and technology, 37(1), 295-340.
Duan, S. F., Zhang, J. L., Roe, P., & Towsey, M. (2014). A Survey of Tagging Techniques for Music, Speech and Environmental Sound. Artificial Intelligence Review, 42(4), 637-661.
Elman, J. L. (1990). Finding Structure in Time. Cognitive science, 14(2), 179-211.
Fang, J., Grunberg, D., Litman, D. T., & Wang, Y. (2017). Discourse Analysis of Lyric and Lyric-Based Classification of Music. Paper presented at the International Society for Music Information Retrieval Conference, Suzhou, China.
Fell, M., & Sporleder, C. (2014). Lyrics-Based Analysis and Classification of Music. Paper presented at the International Conference on Computational Linguistics, Dublin, Ireland.
Gossi, D., & Gunes, M. H. (2016). Lyric-Based Music Recommendation. In H. Cherifi, B. Gonçalves, R. Menezes, & R. Sinatra (Eds.), Complex Networks VII: Proceedings of the 7th Workshop on Complex Networks CompleNet 2016 (pp. 301-310). Cham: Springer International Publishing.
Hassan, A., & Mahmood, A. (2018). Convolutional Recurrent Deep Learning Model for Sentence Classification. IEEE Access, 6, 13949-13957.
Horsburgh, B., Craw, S., & Massie, S. (2015). Learning Pseudo-Tags to Augment Sparse Tagging in Hybrid Music Recommender Systems. Artificial Intelligence Review, 219(C), 25-39.
Hu, X., Choi, K., & Downie, J. S. (2017). A Framework for Evaluating Multimodal Music Mood Classification. Journal of the Association for Information Science and Technology, 68(2), 273-285.
Huang, Y., Wang, W., & Wang, L. (2015). Unconstrained Multimodal Multi-Label Learning. IEEE Transactions on Multimedia, 17(11), 1923-1935.
Huang, Y., Wang, W., Wang, L., & Tan, T. (2013). Multi-Task Deep Neural Network for Multi-Label Learning. Paper presented at the IEEE International Conference on Image Processing, Melbourne, Australia.
Hyung, Z., Park, J.-S., & Lee, K. (2017). Utilizing Context-Relevant Keywords Extracted from a Large Collection of User-Generated Documents for Music Discovery. Information Processing & Management, 53(5), 1185-1200.
Kaminskas, M., Ricci, F., & Schedl, M. (2013). Location-Aware Music Recommendation Using Auto-tagging and Hybrid Matching. Paper presented at the 7th ACM conference on Recommender systems, Hong Kong, China.
Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification. arXiv preprint arXiv:1408.5882.
Knees, P., & Schedl, M. (2013). A Survey of Music Similarity and Recommendation from Music Context Data. Acm Transactions on Multimedia Computing Communications and Applications, 10(1), 21.
Labrosa. (2011a). Last.Fm Dataset. Retrieved from: http://labrosa.ee.columbia.edu/millionsong/lastfm
Labrosa. (2011b). musiXmatch dataset. Retrieved from: http://labrosa.ee.columbia.edu/millionsong/musixmatch
Lai, S., Xu, L., Liu, K., & Zhao, J. (2015). Recurrent Convolutional Neural Networks for Text Classification. Paper presented at the Association for the Advancement of Artificial Intelligence, Austin Texas, USA.
Lamere, P. (2008). Social Tagging and Music Information Retrieval. Journal of New Music Research, 37(2), 101-114.
Lauren, P., Qu, G., Yang, J., Watta, P., Huang, G.-B., & Lendasse, A. (2018). Generating Word Embeddings from an Extreme Learning Machine for Sentiment Analysis and Sequence Labeling Tasks. Cognitive Computation, 10(4), 625-638. doi:10.1007/s12559-018-9548-y
Lee, J., & Nam, J. (2017). Multi-Level and Multi-Scale Feature Aggregation Using Pretrained Convolutional Neural Networks for Music Auto-Tagging. IEEE signal processing letters, 24(8), 1208-1212.
Levy, O., Goldberg, Y., & Dagan, I. (2015). Improving Distributional Similarity with Lessons Learned from Word Embeddings. Transactions of the Association for Computational Linguistics, 3, 211-225.
Lipton, Z. C., Berkowitz, J., & Elkan, C. (2015). A Critical Review of Recurrent Neural Networks for Sequence Learning. arXiv preprint arXiv:1506.00019.
Liu, K., Li, Y., Xu, N., & Natarajan, P. (2018). Learn to Combine Modalities in Multimodal Deep Learning. arXiv preprint arXiv:1805.11730.
Liu, W., Wang, Z., Liu, X., Zeng, N., Liu, Y., & Alsaadi, F. E. (2017). A Survey of Deep Neural Network Architectures and Their Applications. Neurocomputing, 234, 11-26.
Malheiro, R., Panda, R., Gomes, P., & Paiva, R. P. (2018). Emotionally-Relevant Features for Classification and Regression of Music Lyrics. IEEE Transactions on Affective Computing, (2), 240-254.
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient Estimation of Word Representations in Vector Space. arXiv preprint arXiv:1301.3781.
Murthy, Y. V. S., & Koolagudi, S. G. (2018). Content-Based Music Information Retrieval (CB-MIR) and Its Applications toward the Music Industry: A Review. ACM Computing Surveys, 51(3), 1-46.
Nematzadeh, A., Meylan, S. C., & Griffiths, T. L. (2017). Evaluating Vector-Space Models of Word Representation, or, the Unreasonable Effectiveness of Counting Words Near Other Words. Paper presented at the Cognitive Science Society, London, UK.
Oğul, H., & Kırmacı, B. (2016). Lyrics Mining for Music Meta-Data Estimation. Paper presented at the International Conference on Artificial Intelligence Applications and Innovations, Thessaloniki, Greece.
Panwar, S., Das, A., Roopaei, M., & Rad, P. (2017). A Deep Learning Approach for Mapping Music Genres. Paper presented at the System of Systems Engineering Conference (SoSE), Waikoloa, Hawaii, USA.
Pennington, J., Socher, R., & Manning, C. (2014). Glove: Global vectors for word representation. Paper presented at the Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar.
PwC. (2017). Perspectives from the Global Entertainment and Media Outlook 2017–2021. Retrieved from https://www.pwc.com/gx/en/entertainment-media/pdf/outlook-2017-curtain-up.pdf
Řehůřek, R. (2014). Making sense of word2vec. Retrieved from https://rare-technologies.com/making-sense-of-word2vec/
Scaringella, N., Zoia, G., & Mlynek, D. (2006). Automatic Genre Classification of Music Content: A Survey. IEEE Signal Processing Magazine, 23(2), 133-141.
Schedl, M., Gómez, E., & Urbano, J. (2014). Music Information Retrieval: Recent Developments and Applications. Foundations and Trends® in Information Retrieval, 8(2-3), 127-261.
Schuster, M., & Paliwal, K. K. (1997). Bidirectional Recurrent Neural Networks. IEEE Transactions on Signal Processing, 45(11), 2673-2681.
Song, G., Wang, Z., Han, F., Ding, S., & Iqbal, M. A. (2018). Music Auto-Tagging Using Deep Recurrent Neural Networks. Neurocomputing, 292, 104-110.
Tarwani, K. M., & Edem, S. (2017). Survey on Recurrent Neural Network in Natural Language Processing. International Journal of Engineering Trends and Technology, 48(6), 301-304.
Tsaptsinos, A. (2017). Lyrics-Based Music Genre Classification Using a Hierarchical Attention Network. arXiv preprint arXiv:1707.04678.
Van Den Oord, A., Dieleman, S., & Schrauwen, B. (2014). Transfer Learning by Supervised Pre-Training for Audio-Based Music Classification. Paper presented at the International Society for Music Information Retrieval Conference, Taipei, Taiwan.
Wang, S. Y., Wang, Y. C., Yang, Y. H., & Wang, H. M. (2014). Towards Time-Varying Music Auto-Tagging Based on CAL500 Expansion. Paper presented at the 2014 IEEE International Conference on Multimedia and Expo (ICME).
Yang, Y. H., & Liu, J. Y. (2013). Quantitative Study of Music Listening Behavior in a Social and Affective Context. IEEE Transactions on Multimedia, 15(6), 1304-1315.
Yang, Z., Yang, D., Dyer, C., He, X., Smola, A., & Hovy, E. (2016). Hierarchical Attention Networks for Document Classification. Paper presented at the Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
Zhang, M., & Zhou, Z. (2014). A Review on Multi-Label Learning Algorithms. IEEE transactions on knowledge and data engineering, 26(8), 1819-1837.
Zhang, Y., & Yang, Q. (2017). A Survey on Multi-Task Learning. arXiv preprint arXiv:1707.08114.
Zhuang, N., Yan, Y., Chen, S., Wang, H., & Shen, C. (2018). Multi-Label Learning Based Deep Transfer Neural Network for Facial Attribute Classification. Pattern Recognition, 80, 225-240.
Zuo, Y., Zeng, J., Gong, M., & Jiao, L. (2016). Tag-Aware Recommender Systems Based on Deep Neural Networks. Neurocomputing, 204, 51-60.

Full-text access rights
  • Authorized for on-campus browsing/printing of the electronic full text, public from 2024-05-24.
  • Authorized for off-campus browsing/printing of the electronic full text, public from 2024-05-24.

