 |
System ID |
U0026-1808202016280300 |
Title (Chinese) |
雙向引導擴散式神經網路應用於立體視覺影像匹配 |
Title (English) |
Dual-GDNet: Dual Guided-diffusion Network for Stereo Image Dense Matching |
Institution |
National Cheng Kung University |
Department (Chinese) |
測量及空間資訊學系 |
Department (English) |
Department of Geomatics |
Academic Year |
108 |
Semester |
2 |
Publication Year |
109 |
Author (Chinese) |
王瑞評 |
Author (English) |
Ruei-Ping Wang |
Student ID |
P66074086 |
Degree |
Master's |
Language |
English |
Number of Pages |
71 pages |
Committee |
Advisor: 林昭宏; Committee members: 張智安, 蔡富安, 曾義星
|
Keywords (Chinese) |
Stereo matching
Deep convolutional neural network
|
Keywords (English) |
Stereo matching
Deep Convolutional Neural Network
|
Subject Classification |
|
Abstract (Chinese) |
Stereo dense matching is one of the key steps in 3D reconstruction, and it remains a challenging task in photogrammetry and computer vision. Beyond window-based matching, recent machine-learning studies have made great progress in dense matching by using deep convolutional neural networks (DCNNs). This study proposes the dual guided-diffusion network (Dual-GDNet), which not only adopts the conventional left-to-right matching architecture in the network design and training, but also incorporates a right-to-left, left-right consistency check during training, with the aim of reducing mismatching. In addition, suppressed regression is proposed: it estimates disparity by removing irrelevant probability information before disparity regression, avoiding estimates that belong to none of the peaks of a multi-peak probability distribution. The left-right consistency concept in Dual-GDNet can be applied to existing DCNN models to further improve disparity estimation. To evaluate the performance of each proposed network design, GANet was selected as the backbone and main baseline, and the Scene Flow and KITTI 2015 stereo datasets were used for training and evaluation. Experimental results show that, compared with related models, the proposed model performs better in terms of end-point error (EPE), >1-pixel error rate, and top-2 error, with improvements of 2-10% on the Scene Flow dataset and 2-8% on the KITTI 2015 dataset.
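The left-right consistency check mentioned above is, in its usual form, a simple agreement test: a pixel's left-to-right disparity should match the right-to-left disparity at the location it maps to. Below is a minimal NumPy sketch of that standard check; the function name, array layout, and the 1-pixel threshold are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def lr_consistency_mask(disp_left, disp_right, thresh=1.0):
    """Flag pixels whose left->right and right->left disparities disagree.

    disp_left[y, x]  : disparity of the left image (match is at x - d in the right image)
    disp_right[y, x] : disparity of the right image
    Returns a boolean mask: True = consistent, False = likely mismatch or occlusion.
    """
    h, w = disp_left.shape
    xs = np.arange(w)[None, :].repeat(h, axis=0)                    # x coordinate of each pixel
    ys = np.arange(h)[:, None].repeat(w, axis=1)                    # y coordinate of each pixel
    # Location in the right image that each left pixel matches to (clamped to the image)
    x_right = np.clip(np.round(xs - disp_left).astype(int), 0, w - 1)
    # Disparity the right image assigns to that matched location
    d_back = disp_right[ys, x_right]
    return np.abs(disp_left - d_back) <= thresh
```

Pixels failing the check are typically occluded or mismatched and are invalidated or interpolated in a refinement step.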
|
Abstract (English) |
Stereo dense matching, which plays a key role in 3D reconstruction, remains a challenging task in photogrammetry and computer vision. In addition to block-based matching, recent studies based on machine learning have achieved great progress in stereo dense matching by using deep convolutional neural networks (DCNNs). In this paper, a novel neural network called the dual guided-diffusion network (Dual-GDNet) is proposed, which utilizes not only left-to-right but also right-to-left image matching in the network design and training, with a left-right consistency check to reduce the possibility of mismatching. In addition, suppressed regression is proposed to refine disparity estimation by removing unrelated information before regression, preventing ambiguous predictions on multi-peak probability distributions. The proposed Dual-GDNet can be applied to existing DCNN models for further improvement of disparity estimation. To evaluate the performance, GA-Net is selected as the backbone, and the model was evaluated on the Scene Flow and KITTI 2015 stereo datasets. Experimental results demonstrate the superiority of the proposed model over related models in terms of end-point error, >1-pixel error rate, and top-2 error. Improvements of 2-10% on the Scene Flow dataset and 2-8% on the KITTI 2015 dataset were obtained.
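The motivation for suppressed regression can be illustrated with a small NumPy sketch. Standard soft-argmax disparity regression takes the expectation over the full distribution, so a bimodal distribution yields a disparity between the two peaks, belonging to neither surface; suppressing probability mass away from the dominant peak before regressing avoids this. The function names and the fixed suppression radius are illustrative assumptions, not the thesis's exact rule.

```python
import numpy as np

def soft_argmax(prob):
    """Standard disparity regression: expectation over the full 1-D distribution."""
    d = np.arange(prob.shape[-1])
    return (prob * d).sum(-1)

def suppressed_regression(prob, radius=2):
    """Keep only probability mass within `radius` of the dominant peak,
    renormalize, then take the expectation (illustrative suppression rule)."""
    d = np.arange(prob.shape[-1])
    peak = prob.argmax(-1)
    p = np.where(np.abs(d - peak) <= radius, prob, 0.0)   # zero out unrelated mass
    p = p / p.sum(-1)                                     # renormalize to a distribution
    return (p * d).sum(-1)
```

For a distribution with peaks at d=3 (mass 0.55) and d=10 (mass 0.45), soft-argmax returns 6.15, a disparity belonging to neither peak, while the suppressed estimate stays at the dominant peak, d=3.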
|
Table of Contents |
Abstract (Chinese) - i
Abstract - ii
Acknowledgements - iii
Table of Contents - iv
List of Tables - vi
List of Figures - vii
Chapter 1. Introduction - 1
Chapter 2. Related Work - 5
Chapter 3. Background - 7
Semi-Global Matching (SGM) - 7
Matching cost calculation - 7
Cost aggregation - 17
Disparity computation/optimization - 19
Disparity refinement - 19
GANet - 21
Feature extraction - 22
Cost volume construction - 24
Guidance network - 25
Semi-global aggregation (SGA) - 26
Local guided aggregation (LGA) - 29
Disparity regression - 29
Chapter 4. Methodology - 32
Dual-GDNet - 34
Visible and invisible pixels problem - 36
Probability distribution with higher generalizability - 38
Flipped training - 39
Guided Diffusion Layer (GD) - 44
Training in Cross Entropy with Softmax - 46
Suppressed Regression - 50
Chapter 5. Experimental Results - 53
Evaluation on Scene Flow Dataset - 54
Evaluation on KITTI 2015 Dataset - 56
Effect on Flipped Training - 64
Chapter 6. Conclusions - 68
References - 69
|
References |
Atienza, R. (2018). Fast disparity estimation using dense networks. CoRR, abs/1805.07499. Retrieved from http://arxiv.org/abs/1805.07499
Žbontar, J., & LeCun, Y. (2015). Stereo matching by training a convolutional neural network to compare image patches. CoRR, abs/1510.05970. Retrieved from http://dblp.uni-trier.de/db/journals/corr/corr1510.html#ZbontarL15
Chang, J., & Chen, Y. (2018). Pyramid stereo matching network. CoRR, abs/1803.08669. Retrieved from http://arxiv.org/abs/1803.08669
Chen, J., & Yuan, C. (2016). Convolutional neural network using multiscale information for stereo matching cost computation. In 2016 IEEE International Conference on Image Processing (ICIP) (pp. 3424–3428). doi: 10.1109/ICIP.2016.7532995
Cheng, F., He, X., & Zhang, H. (2017, 08). Learning to refine depth for robust stereo estimation. Pattern Recognition, 74. doi: 10.1016/j.patcog.2017.07.027
Cheng, X., Wang, P., & Yang, R. (2018). Learning depth with convolutional spatial propagation network. arXiv preprint arXiv:1810.02695
Godard, C., Aodha, O. M., & Brostow, G. J. (2016). Unsupervised monocular depth estimation with left-right consistency. CoRR, abs/1609.03677. Retrieved from http://arxiv.org/abs/1609.03677
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep residual learning for image recognition. CoRR, abs/1512.03385. Retrieved from http://arxiv.org/abs/1512.03385
Hinton, G. E., Vinyals, O., & Dean, J. (2015). Distilling the knowledge in a neural network. ArXiv, abs/1503.02531
Hirschmüller, H. (2008). Stereo processing by semi-global matching and mutual information. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2), 328–341
Huang, G., Liu, Z., & Weinberger, K. Q. (2016). Densely connected convolutional networks. CoRR, abs/1608.06993. Retrieved from http://arxiv.org/abs/1608.06993
Kang, J., Chen, L., Deng, F., & Heipke, C. (2019, 09). Context pyramidal network for stereo matching regularized by disparity gradients. ISPRS Journal of Photogrammetry and Remote Sensing, 157. doi:10.1016/j.isprsjprs.2019.09.012
Kendall, A., Martirosyan, H., Dasgupta, S., Henry, P., Kennedy, R., Bachrach, A., & Bry, A. (2017). End-to-end learning of geometry and context for deep stereo regression. CoRR, abs/1703.04309. Retrieved from http://arxiv.org/abs/1703.04309
Kingma, D., & Ba, J. (2014, 12). Adam: A method for stochastic optimization. International Conference on Learning Representations
Liu, S., De Mello, S., Gu, J., Zhong, G., Yang, M.H., & Kautz, J. (2017). Learning affinity via spatial propagation networks. In I. Guyon et al. (Eds.), Advances in neural information processing systems 30 (pp. 1520–1530). Curran Associates, Inc. Retrieved from http://papers.nips.cc/paper/6750-learning-affinity-via-spatial-propagation-networks.pdf
Luo, W., Schwing, A. G., & Urtasun, R. (2016). Efficient deep learning for stereo matching. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 5695–5703)
Mayer, N., Ilg, E., Häusser, P., Fischer, P., Cremers, D., Dosovitskiy, A., & Brox, T. (2015). A large dataset to train convolutional networks for disparity, optical flow, and scene flow estimation. CoRR, abs/1512.02134. Retrieved from http://arxiv.org/abs/1512.02134
Menze, M., & Geiger, A. (2015). Object scene flow for autonomous vehicles. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 3061–3070)
Mozerov, M., & Weijer, J. (2015). Accurate stereo matching by two-step energy minimization. IEEE Transactions on Image Processing, 24. doi: 10.1109/TIP.2015.2395820
Pierrot-Deseilligny, M., & Paparoditis, N. (2006). A multi-resolution and optimization-based image matching approach: An application to surface reconstruction from SPOT5-HRS stereo imagery. The International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, 36(1/W41), 328–341
Scharstein, D., & Szeliski, R. (2002). A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1-3), 7–42
Seki, A., & Pollefeys, M. (2017). SGM-Nets: Semi-global matching with neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 6640–6649
Wolf, P. R., & Dewitt, B. A. (2000). Elements of Photogrammetry: with Applications in GIS. New York: McGraw-Hill, 30
Zhang, F., Prisacariu, V., Yang, R., & Torr, P. H. (2019). GA-Net: Guided Aggregation Net for End-to-end Stereo Matching. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 185–194
|
Full-Text Availability |
On-campus browsing/printing of the electronic full text is authorized and available from 2022-08-19. Off-campus browsing/printing of the electronic full text is authorized and available from 2022-08-19. |
 |
|
 |