A CNN Based Hybrid Model Method for Image Segmentation in Similar Fusion Background Images
Department of Engineering Science
In this study, we propose a convolutional neural network (CNN) based hybrid model method to improve the segmentation of similar fusion background images, that is, images in which features of the objects, such as color and texture, closely resemble those of the background. This similarity causes errors when convolutional layers are used for feature extraction.
To address this problem, we use the PyNET model to enhance the features of both the objects and the background, and then overlay the enhanced image on the original image with a chosen overlap weight. Because the overlaid image retains features from both the original and the enhanced image, the convolutional layers can more easily extract distinct features for the objects and the background, so that objects are not lost in the background.
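The thesis does not specify the exact value of the overlap weight; as an illustrative sketch, the per-pixel weighted blend described above could be implemented as follows, where `weight` is a hypothetical parameter controlling how much of the enhanced image is kept:

```python
import numpy as np

def overlay(original, enhanced, weight=0.5):
    """Blend an enhanced image back onto the original image.

    Both inputs are float arrays in [0, 1] with identical shapes.
    `weight` is the fraction of the enhanced image in the result;
    the remainder comes from the original image.
    """
    blended = weight * enhanced + (1.0 - weight) * original
    # Guard against rounding outside the valid intensity range.
    return np.clip(blended, 0.0, 1.0)
```

With `weight=0.5`, the result keeps equal contributions from the original and enhanced features, which is the behavior the overlap step relies on.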
In the experiments, this study uses the Intersection over Union (IoU) score as the criterion for evaluating the performance of the image segmentation models. Our results show that the U-Net model scores 20% higher than DeepLab on the average IoU criterion.
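For reference, the IoU score used as the evaluation criterion above is the ratio of the overlap between a predicted mask and the ground-truth mask to their union. A minimal sketch for binary masks (the helper name `iou_score` is our own):

```python
import numpy as np

def iou_score(pred, target):
    """Intersection over Union for binary segmentation masks."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    # Two empty masks agree perfectly; avoid division by zero.
    return intersection / union if union > 0 else 1.0
```

An IoU of 1.0 means the prediction matches the ground truth exactly, while 0.0 means no overlap at all.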
Chapter 1. Introduction
1.1. Research Motivation
1.2. Research Objectives
1.3. Chapter Overview
Chapter 2. Research Background and Related Literature
2.1. Research Background
2.1.1. Research on Neural Network Architectures
2.1.2. Research on Convolutional Neural Networks
2.1.3. Research on Hybrid Models
2.2. Research on Image Segmentation
2.2.1. Datasets and Evaluation Metrics
2.2.2. Research on Image Segmentation
2.3. Similar Fusion Background Segmentation
2.3.1. Causes and Problems of Similar Fusion Background Segmentation
2.3.2. Recent Related Research
Chapter 3. Research Methods
3.1. Image Enhancement
3.1.1. PyNET
3.1.2. Image Overlay
3.2. Image Segmentation
3.2.1. Fully Convolutional Networks
3.2.2. UNet
3.2.3. DeepLab
3.3. Test Dataset
Chapter 4. Results and Discussion
4.1. Experimental Design
4.1.1. Experimental Environment
4.1.2. Experimental Procedure
4.2. Experimental Results
4.2.1. Image Segmentation Comparison Experiment
4.2.2. Segmentation Result Recognition Experiment
4.3. Discussion of Results
Chapter 5. Conclusions and Future Work
5.1. Conclusions
5.2. Future Work