   The electronic thesis has not yet been released for public access; for the print copy, please check the library catalog.
(Note: if the record cannot be found, or the holding status shows "closed stacks, not open to the public", the thesis is not in the stacks and cannot be retrieved.)
System ID U0026-0607202018325600
Title (Chinese) 基於CNN混合模型的相似融合度背景圖像分割之方法
Title (English) A CNN Based Hybrid Model Method for Image Segmentation in Similar Fusion Background Image
University National Cheng Kung University
Department (Chinese) 工程科學系
Department (English) Department of Engineering Science
Academic year 108 (2019–2020)
Semester 2
Year of publication 109 (2020)
Author (Chinese) 戴子期
Author (English) Tzu-Chi Tai
Student ID N96071130
Degree Master's
Language Chinese
Pages 55
Committee Advisor: 賴槿峰
Committee member: 賴盈勳
Committee member: 黃仁竑
Committee member: 黃悅民
Committee member: 趙涵捷
Committee member: 陳世曄
Keywords (Chinese) 相似融合度背景圖像  圖像分割  圖像強化
Keywords (English) Similar Fusion Background Image  Image Segmentation  Image Enhancement
Subject classification
Abstract (Chinese) This study proposes an image segmentation method for similar fusion background images; the core algorithm uses two convolutional neural network models for image processing. A similar fusion background image is one in which features of the objects, such as color and texture, resemble the background, which causes errors in models that rely on convolutional layers for feature extraction.

Instead of feeding the image directly to convolutional layers for feature extraction, this study first uses the PyNET model to enhance the features of the objects and of the background, and then overlays the enhanced image with the original image according to a chosen overlap weight. Because the overlaid image retains the features of both the original and the enhanced image, the convolutional layers can more easily extract distinct features from the object and the background, so the object is not fused into the background and overlooked. To ensure the best final segmentation result, the image is segmented twice: the first pass cuts out the rough object shape, and the second pass refines the object shape and assigns the correct class label to the object.

In the experiments, the IoU score is used as the metric to evaluate the performance of the segmentation models U-Net, DeepLab, and FCN. In the first segmentation pass (Experiment 1), the U-Net model performed best, and in the second pass (Experiment 2) the segmentation of similar fusion background images was further improved. Looking at the total scores, when processing the same images the U-Net model achieved on average a 20% higher IoU score than the DeepLab model.
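The overlay described above amounts to a weighted blend of the original image with the PyNET-enhanced image. Below is a minimal sketch of that step, assuming both images are available as NumPy arrays; the function name overlay_images and the default weight alpha are illustrative choices, not taken from the thesis.

import numpy as np

def overlay_images(original: np.ndarray, enhanced: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    # Blend the PyNET-enhanced image with the original image so that the
    # result keeps features of both; alpha is the overlap (blending) weight.
    blended = alpha * enhanced.astype(np.float32) + (1.0 - alpha) * original.astype(np.float32)
    return np.clip(blended, 0, 255).astype(np.uint8)

A weight of 0.5 gives the original and enhanced features equal influence; the abstract only states that a fixed overlap weight is used, so the value here is a placeholder.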
Abstract (English) In this study, we propose a convolutional neural network (CNN) based hybrid model method to improve the image segmentation of similar fusion background images, that is, images in which features such as the color and texture of the objects are similar to the background. This similarity leads to errors when convolutional layers are used for feature extraction in image segmentation.

To address this problem, we use the PyNET model to enhance the features of the objects and the background in the image, and overlay the enhanced image with the original image according to a chosen overlap weight. Because the overlaid image retains the features of both the original image and the enhanced image, the convolutional layers can easily extract different features from the object and the background, so that the object is not lost in the background.

In the experiments, this study uses the Intersection over Union (IoU) score as the criterion to evaluate the performance of the image segmentation models. Our results show that the U-Net model achieved an average IoU score about 20% higher than the DeepLab model.
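For reference, the Intersection over Union (IoU) score used as the evaluation criterion can be computed for a binary mask as in the following sketch; the array names are hypothetical and the handling of empty masks is an assumption, not something specified in the thesis.

import numpy as np

def iou_score(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    # IoU = |prediction AND ground truth| / |prediction OR ground truth|
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks are empty; count as a perfect match
    return float(np.logical_and(pred, gt).sum()) / float(union)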
Table of contents Abstract i
Extended Abstract (English) ii
Acknowledgements vi
Table of Contents vii
List of Tables ix
List of Figures x
List of Abbreviations xi
List of Symbols xii
Chapter 1. Introduction 1
1.1. Motivation 1
1.2. Objectives 1
1.3. Thesis Organization 2
Chapter 2. Background and Related Work 3
2.1. Background 3
2.1.1. Neural Network Architectures 3
2.1.2. Convolutional Neural Networks 5
2.1.3. Hybrid Models 8
2.2. Image Segmentation 8
2.2.1. Datasets and Evaluation Metrics 9
2.2.2. Image Segmentation Studies 10
2.3. Similar Fusion Background Segmentation 13
2.3.1. Causes and Problems of Similar Fusion Background Segmentation 14
2.3.2. Recent Related Work 15
Chapter 3. Methodology 16
3.1. Image Enhancement 16
3.1.1. PyNET 16
3.1.2. Image Overlay 19
3.2. Image Segmentation 20
3.2.1. Fully Convolutional Networks 20
3.2.2. U-Net 22
3.2.3. DeepLab 24
3.3. Test Dataset 27
Chapter 4. Results and Discussion 29
4.1. Experimental Design 29
4.1.1. Experimental Environment 29
4.1.2. Experimental Procedure 30
4.2. Experimental Results 32
4.2.1. Image Segmentation Comparison Experiment 32
4.2.2. Segmentation Result Recognition Experiment 44
4.3. Discussion 47
Chapter 5. Conclusions and Future Work 48
5.1. Conclusions 48
5.2. Future Work 49
References 50
References [1] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[2] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[3] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in neural information processing systems, pp. 2672–2680, 2014.
[4] P. F. Felzenszwalb and D. P. Huttenlocher, “Efficient graph-based image segmentation,” International journal of computer vision, vol. 59, no. 2, pp. 167–181, 2004.
[5] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Semantic image segmentation with deep convolutional nets and fully connected crfs,” arXiv preprint arXiv:1412.7062, 2014.
[6] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,” IEEE transactions on pattern analysis and machine intelligence, vol. 40, no. 4, pp. 834–848, 2017.
[7] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, “Rethinking atrous convolution for semantic image segmentation,” arXiv preprint arXiv:1706.05587, 2017.
[8] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” in Proceedings of the European conference on computer vision (ECCV), pp. 801–818, 2018.
[9] F. Rosenblatt, “The perceptron: a probabilistic model for information storage and organization in the brain.,” Psychological review, vol. 65, no. 6, p. 386, 1958.
[10] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning representations by back-propagating errors,” nature, vol. 323, no. 6088, pp. 533–536, 1986.
[11] D. A. Pomerleau, “Alvinn: An autonomous land vehicle in a neural network,” in Advances in neural information processing systems, pp. 305–313, 1989.
[12] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, “Backpropagation applied to handwritten zip code recognition,” Neural computation, vol. 1, no. 4, pp. 541–551, 1989.
[13] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, pp. 1097–1105, 2012.
[14] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
[15] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1–9, 2015.
[16] S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” arXiv preprint arXiv:1502.03167, 2015.
[17] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818–2826, 2016.
[18] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, “Inception-v4, inception-resnet and the impact of residual connections on learning,” in Thirty-first AAAI conference on artificial intelligence, 2017.
[19] F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1251–1258, 2017.
[20] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778, 2016.
[21] K. He, X. Zhang, S. Ren, and J. Sun, “Identity mappings in deep residual networks,” in European conference on computer vision, pp. 630–645, Springer, 2016.
[22] S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, “Aggregated residual transformations for deep neural networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1492–1500, 2017.
[23] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, “Learning transferable architectures for scalable image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 8697–8710, 2018.
[24] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning internal representations by error propagation,” tech. rep., California Univ San Diego La Jolla Inst for Cognitive Science, 1985.
[25] S. Sabour, N. Frosst, and G. E. Hinton, “Dynamic routing between capsules,” in Advances in neural information processing systems, pp. 3856–3866, 2017.
[26] G. E. Hinton, S. Sabour, and N. Frosst, “Matrix capsules with em routing,” 2018.
[27] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 580–587, 2014.
[28] R. Girshick, “Fast r-cnn,” in Proceedings of the IEEE international conference on computer vision, pp. 1440–1448, 2015.
[29] S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” in Advances in neural information processing systems, pp. 91–99, 2015.
[30] K. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask r-cnn,” in Proceedings of the IEEE international conference on computer vision, pp. 2961–2969, 2017.
[31] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 779–788, 2016.
[32] J. Redmon and A. Farhadi, “Yolov3: An incremental improvement,” arXiv preprint arXiv:1804.02767, 2018.
[33] A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, “Yolov4: Optimal speed and accuracy of object detection,” arXiv preprint arXiv:2004.10934, 2020.
[34] O. Alsing, “Mobile object detection using tensorflow lite and transfer learning,” 2018.
[35] C.-F. Lai, W.-C. Chien, L. T. Yang, and W. Qiang, “Lstm and edge computing for big data feature recognition of industrial electrical equipment,” IEEE Transactions on Industrial Informatics, vol. 15, no. 4, pp. 2469–2477, 2019.
[36] X. Jia, E. Gavves, B. Fernando, and T. Tuytelaars, “Guiding the long-short term memory model for image caption generation,” in Proceedings of the IEEE international conference on computer vision, pp. 2407–2415, 2015.
[37] A. Graves, N. Jaitly, and A.-r. Mohamed, “Hybrid speech recognition with deep bidirectional lstm,” in 2013 IEEE workshop on automatic speech recognition and understanding, pp. 273–278, IEEE, 2013.
[38] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Advances in neural information processing systems, pp. 3104–3112, 2014.
[39] P. Mishra, K. Khurana, S. Gupta, and M. K. Sharma, “Vmanalyzer: Malware semantic analysis using integrated cnn and bidirectional lstm for detecting vm-level attacks in cloud,” in 2019 Twelfth International Conference on Contemporary Computing (IC3), pp. 1–6, IEEE, 2019.
[40] S. Song, H. Huang, and T. Ruan, “Abstractive text summarization using lstm-cnn based deep learning,” Multimedia Tools and Applications, vol. 78, no. 1, pp. 857–875, 2019.
[41] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv preprint arXiv:1409.0473, 2014.
[42] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using rnn encoder-decoder for statistical machine translation,” arXiv preprint arXiv:1406.1078, 2014.
[43] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in neural information processing systems, pp. 5998–6008, 2017.
[44] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhudinov, R. Zemel, and Y. Bengio, “Show, attend and tell: Neural image caption generation with visual attention,” in International conference on machine learning, pp. 2048–2057, 2015.
[45] M.-T. Luong, H. Pham, and C. D. Manning, “Effective approaches to attention-based neural machine translation,” arXiv preprint arXiv:1508.04025, 2015.
[46] Z. Yang, D. Yang, C. Dyer, X. He, A. Smola, and E. Hovy, “Hierarchical attention networks for document classification,” in Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies, pp. 1480–1489, 2016.
[47] Y. Cui, Z. Chen, S. Wei, S. Wang, T. Liu, and G. Hu, “Attention-over-attention neural networks for reading comprehension,” arXiv preprint arXiv:1607.04423, 2016.
[48] J. Gehring, M. Auli, D. Grangier, D. Yarats, and Y. N. Dauphin, “Convolutional sequence to sequence learning,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, pp. 1243–1252, JMLR.org, 2017.
[49] A. Oliva and A. Torralba, “Building the gist of a scene: The role of global image features in recognition,” Progress in brain research, vol. 155, pp. 23–36, 2006.
[50] C. Vondrick, A. Khosla, T. Malisiewicz, and A. Torralba, “Hoggles: Visualizing object detection features,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 1–8, 2013.
[51] D. G. Lowe, “Object recognition from local scale-invariant features,” in Proceedings of the seventh IEEE international conference on computer vision, vol. 2, pp. 1150–1157, Ieee, 1999.
[52] T. Ojala, M. Pietikainen, and T. Maenpaa, “Multiresolution gray-scale and rotation invariant texture classification with local binary patterns,” IEEE Transactions on pattern analysis and machine intelligence, vol. 24, no. 7, pp. 971–987, 2002.
[53] Y. Fan, X. Lu, D. Li, and Y. Liu, “Video-based emotion recognition using cnn-rnn and c3d hybrid networks,” in Proceedings of the 18th ACM International Conference on Multimodal Interaction, pp. 445–450, 2016.
[54] Y. Hu, Y. Wong, W. Wei, Y. Du, M. Kankanhalli, and W. Geng, “A novel attention-based hybrid cnn-rnn architecture for semg-based gesture recognition,” PloS one, vol. 13, no. 10, 2018.
[55] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, “Show and tell: A neural image caption generator,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3156–3164, 2015.
[56] G. Trigeorgis, F. Ringeval, R. Brueckner, E. Marchi, M. A. Nicolaou, B. Schuller, and S. Zafeiriou, “Adieu features? end-to-end speech emotion recognition using a deep convolutional recurrent network,” in 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 5200–5204, IEEE, 2016.
[57] Z. Zhang, D. Robinson, and J. Tepper, “Detecting hate speech on twitter using a convolution-gru based deep neural network,” in European semantic web conference, pp. 745–760, Springer, 2018.
[58] Y. Lyu and X. Huang, “Road segmentation using cnn with gru,” arXiv preprint arXiv:1804.05164, 2018.
[59] A. Nanduri and L. Sherry, “Anomaly detection in aircraft data using recurrent neural networks (rnn),” in 2016 Integrated Communications Navigation and Surveillance (ICNS), pp. 5C2–1, Ieee, 2016.
[60] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 3431–3440, 2015.
[61] W. Liu, A. Rabinovich, and A. C. Berg, “Parsenet: Looking wider to see better,” arXiv preprint arXiv:1506.04579, 2015.
[62] H. Noh, S. Hong, and B. Han, “Learning deconvolution network for semantic segmentation,” in Proceedings of the IEEE international conference on computer vision, pp. 1520–1528, 2015.
[63] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention, pp. 234–241, Springer, 2015.
[64] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, “Feature pyramid networks for object detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2117–2125, 2017.
[65] H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia, “Pyramid scene parsing network,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2881–2890, 2017.
[66] S. Liu, L. Qi, H. Qin, J. Shi, and J. Jia, “Path aggregation network for instance segmentation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8759–8768, 2018.
[67] H. Zhang, K. Dana, J. Shi, Z. Zhang, X. Wang, A. Tyagi, and A. Agrawal, “Context encoding for semantic segmentation,” in Proceedings of the IEEE conference on Computer Vision and Pattern Recognition, pp. 7151–7160, 2018.
[68] R. Archibald, K. Chen, A. Gelb, and R. Renaut, “Improving tissue segmentation of human brain mri through preprocessing by the gegenbauer reconstruction method,” NeuroImage, vol. 20, no. 1, pp. 489–502, 2003.
[69] D. Gottlieb, C.-W. Shu, A. Solomonoff, and H. Vandeven, “On the gibbs phenomenon i: recovering exponential accuracy from the fourier partial sum of a nonperiodic analytic function,” Journal of Computational and Applied Mathematics, vol. 43, no. 1–2, pp. 81–98, 1992.
[70] Y. Zhang and C. Zhang, “A new algorithm for character segmentation of license plate,” in IEEE IV2003 Intelligent Vehicles Symposium. Proceedings (Cat. No. 03TH8683), pp. 106–109, IEEE, 2003.
[71] M. H. Jafari, N. Karimi, E. Nasr-Esfahani, S. Samavi, S. M. R. Soroushmehr, K. Ward, and K. Najarian, “Skin lesion segmentation in clinical images using deep learning,” in 2016 23rd International conference on pattern recognition (ICPR), pp. 337–342, IEEE, 2016.
[72] K. He, J. Sun, and X. Tang, “Guided image filtering,” IEEE transactions on pattern analysis and machine intelligence, vol. 35, no. 6, pp. 1397–1409, 2012.
[73] A. Ignatov, L. Van Gool, and R. Timofte, “Replacing mobile camera isp with a single deep learning model,” arXiv preprint arXiv:2002.05509, 2020.
[74] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International journal of computer vision, vol. 60, no. 2, pp. 91–110, 2004.
[75] A. Vedaldi and B. Fulkerson, “Vlfeat: An open and portable library of computer vision algorithms,” in Proceedings of the 18th ACM international conference on Multimedia, pp. 1469–1472, 2010.
[76] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 4700–4708, 2017.
[77] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125–1134, 2017.
[78] J. Johnson, A. Alahi, and L. Fei-Fei, “Perceptual losses for real-time style transfer and super-resolution,” in European conference on computer vision, pp. 694–711, Springer, 2016.
[79] A. Dosovitskiy, J. T. Springenberg, M. Riedmiller, and T. Brox, “Discriminative unsupervised feature learning with convolutional neural networks,” in Advances in neural information processing systems, pp. 766–774, 2014.
[80] A. Adams, J. Baek, and M. A. Davis, “Fast high-dimensional filtering using the permutohedral lattice,” in Computer Graphics Forum, vol. 29, pp. 753–762, Wiley Online Library, 2010.
[81] T. Carneiro, R. V. M. Da Nóbrega, T. Nepomuceno, G.-B. Bian, V. H. C. De Albuquerque, and P. P. Reboucas Filho, “Performance analysis of google colaboratory as a tool for accelerating deep learning applications,” IEEE Access, vol. 6, pp. 61677–61685, 2018.
Full-text use authorization
  • On-campus browsing/printing of the electronic full text is authorized, available from 2025-07-01.
  • Off-campus browsing/printing of the electronic full text is authorized, available from 2025-07-01.

