System ID: U0026-2703202007470000
Title (Chinese): 利用深度學習應用於食物影像辨識與營養評估
Title (English): Food Image Recognition and Nutrition Estimation via Deep Learning
University: National Cheng Kung University
Department (Chinese): 電機工程學系
Department (English): Department of Electrical Engineering
Academic year: 108 (ROC calendar; 2019-2020)
Semester: 2
Year of publication: 109 (ROC calendar; 2020)
Author (Chinese): 陳佳宏
Author (English): Jia-Hong Chen
Student ID: N26061767
Degree: Master
Language: English
Pages: 116
Committee: Advisor - 李國君
Committee member - 彭文孝
Committee member - 陳秀玲
Committee member - 鄭國順
Committee member - 戴顯權
Keywords (Chinese): 食物影像數據集; 白平衡; 對比度限制自適應直方圖均衡; 營養估算; 影像識別; 食物類別預測; 殘差卷積神經網絡; 擴張卷積
Keywords (English): food image dataset; white balancing; contrast limited adaptive histogram equalization; nutrition estimation; image recognition; food category prediction; residual convolutional neural network; dilated convolution
Subject classification: (none)
Abstract (Chinese, translated): This thesis proposes a large-scale food image dataset named AIFood and, using deep learning, builds a food category recognition system and a nutrition estimation system for food images. We collect food images from existing food image datasets and from a food website, and label every collected image with 24 food categories. We also preprocess the images with automatic white balancing and contrast limited adaptive histogram equalization to improve their visual quality, computing statistics of each image and applying preset constraints to detect whether it needs preprocessing. Next, for food image recognition, we construct a 50-layer residual convolutional neural network, removing its max pooling layer to reduce information loss and replacing some of its convolutional layers with dilated convolutions to enlarge their receptive fields. After training the network on the AIFood dataset, it reaches a Micro-F1 of 83.63% and a Macro-F1 of 76.90% for food category prediction; our modified residual network predicts more accurately than the original one. We then collect the nutrient information of every food category from Taiwan's Ministry of Health and Welfare and calculate how much of each nutrient a food category contributes to an average single-person meal. Finally, the nutrient content of a food image is the sum of the nutrient contents of all food categories the network detects in the image.
Abstract (English): This thesis proposes a large-scale food image dataset, the AIFood dataset, together with a deep-learning system for food category recognition and nutrition estimation from food images. To build a food image dataset for food image recognition, we collect food images from existing food image datasets and a food website and relabel all images using 24 food categories. In addition, we preprocess the images using automatic white balancing and contrast limited adaptive histogram equalization (CLAHE) to improve their visual quality; we compute statistics of each image and apply constraints to detect whether it needs to be preprocessed. Next, for food image recognition, we modify a 50-layer residual convolutional neural network (ResNet50) by removing the max pooling layer, to decrease the loss of information, and by replacing part of the convolutional layers with dilated convolutions, to increase their receptive fields. After training the neural network on the AIFood dataset, we achieve a Micro-F1 of 83.63% and a Macro-F1 of 76.90% for food category prediction, and our modified ResNet50 outperforms the original ResNet50. Next, we collect the nutrition information of each food category from the Ministry of Health and Welfare, Taiwan, and calculate the nutrient content of each category in an average single-person meal. Finally, the nutrient content of a food image is estimated as the sum of the nutrient contents of all food categories the neural network detects in the image.
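The preprocessing stage pairs automatic white balancing with CLAHE, applied only when image statistics violate preset constraints. A minimal sketch of this decide-then-enhance flow, assuming a simple gray-world white balance in place of the thesis's AWB algorithm [70] and a placeholder standard-deviation contrast test; only the CLAHE call follows the standard OpenCV implementation of [73]:

```python
import cv2
import numpy as np

def gray_world_awb(bgr):
    """Gray-world white balance: scale each channel toward the global mean.
    A stand-in for the AWB algorithm of [70], for illustration only."""
    channels = cv2.split(bgr.astype(np.float32))
    means = [c.mean() for c in channels]
    gray = sum(means) / 3.0
    balanced = [c * (gray / (m + 1e-6)) for c, m in zip(channels, means)]
    return np.clip(cv2.merge(balanced), 0, 255).astype(np.uint8)

def clahe_on_luminance(bgr, clip_limit=2.0, tiles=(8, 8)):
    """CLAHE [73] on the L channel of Lab, so chroma is left untouched."""
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tiles)
    return cv2.cvtColor(cv2.merge([clahe.apply(l), a, b]), cv2.COLOR_LAB2BGR)

def preprocess(bgr, contrast_thresh=40.0):
    """Enhance only when a simple contrast statistic falls below a
    placeholder threshold (the thesis's actual constraints differ)."""
    out = gray_world_awb(bgr)
    if out.std() < contrast_thresh:  # low global contrast -> equalize
        out = clahe_on_luminance(out)
    return out
```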
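The architectural changes map naturally onto torchvision's ResNet. This is a sketch under assumptions, not the thesis's code: `replace_stride_with_dilation` dilates the last two residual stages, the stem max pooling is replaced by an identity op, and the 24-way multi-label head with per-category sigmoids is inferred from the dataset's 24 categories:

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CATEGORIES = 24  # the AIFood label set

# ResNet50 whose last two stages use dilation instead of striding,
# analogous to replacing convolutions with dilated convolutions.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1,
                        replace_stride_with_dilation=[False, True, True])

model.maxpool = nn.Identity()  # drop the stem max pooling (less information loss)
model.fc = nn.Linear(model.fc.in_features, NUM_CATEGORIES)  # multi-label head

criterion = nn.BCEWithLogitsLoss()  # one independent sigmoid per category

x = torch.randn(2, 3, 224, 224)            # dummy batch
logits = model(x)                           # shape: (2, 24)
predictions = torch.sigmoid(logits) > 0.5   # per-category decisions
```

Because dilation keeps the stride at 1 while enlarging the receptive field, the deeper stages see more context without further downsampling of the feature maps.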
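The two reported scores weight categories differently: Micro-F1 pools true and false positives over every (image, category) pair, while Macro-F1 averages the per-category F1 scores, so rare categories count as much as common ones. A toy check with scikit-learn (made-up labels, not the thesis's test set):

```python
import numpy as np
from sklearn.metrics import f1_score

# Toy multi-label indicators: 3 images x 4 categories.
y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0],
                   [1, 1, 0, 1]])
y_pred = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0],
                   [1, 0, 0, 1]])

# Micro: pool TP/FP/FN over all (image, category) pairs.
print("Micro-F1:", f1_score(y_true, y_pred, average="micro"))  # 10/11 = 0.909...
# Macro: per-category F1 first, then the unweighted mean.
print("Macro-F1:", f1_score(y_true, y_pred, average="macro"))  # 11/12 = 0.916...
```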
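The nutrition estimate is additive over detected categories: each detected category contributes its per-meal nutrient vector, and the image's estimate is the sum. A minimal sketch with a fabricated three-category nutrient table (placeholder values, not figures from the Ministry of Health and Welfare database [11]):

```python
# Illustrative nutrient table (placeholder numbers, NOT the MOHW values):
# nutrients per food category for an average single-person meal.
NUTRIENTS_PER_MEAL = {
    "rice":      {"calories": 280.0, "protein_g": 5.0,  "fat_g": 0.5},
    "vegetable": {"calories": 25.0,  "protein_g": 2.0,  "fat_g": 0.2},
    "meat":      {"calories": 250.0, "protein_g": 21.0, "fat_g": 17.0},
}

def estimate_nutrition(detected_categories):
    """Sum the per-meal nutrients of every category detected in the image."""
    total = {}
    for category in detected_categories:
        for nutrient, amount in NUTRIENTS_PER_MEAL[category].items():
            total[nutrient] = total.get(nutrient, 0.0) + amount
    return total

# Categories the CNN detected in one food image (toy example).
print(estimate_nutrition(["rice", "vegetable", "meat"]))
# {'calories': 555.0, 'protein_g': 28.0, 'fat_g': 17.7}
```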
Table of Contents:
Abstract (in Chinese) i
Abstract iii
Acknowledgments (in Chinese) v
Table of Contents vii
List of Tables xi
List of Figures xiii
Chapter 1 Introduction 1
1.1 Introduction 1
1.2 Background Information 3
1.2.1 Food Category 3
1.2.2 Nutrition Information 4
1.3 Motivation 6
1.4 Organization of this Thesis 6
Chapter 2 Survey of Related Works in the Literature 7
2.1 Convolutional Neural Network (CNN) 7
2.1.1 Convolution 7
2.1.1.1 Dilated Convolution 8
2.1.1.2 Octave Convolution 9
2.1.2 Neural Network 10
2.1.3 Common CNN Models 13
2.1.3.1 LeNet 13
2.1.3.2 AlexNet 14
2.1.3.3 VGG 15
2.1.3.4 GoogLeNet 15
2.1.3.5 ResNet 16
2.2 Learning Technique 19
2.2.1 Early Stopping 19
2.2.2 Transfer Learning 19
2.2.3 Data Augmentation 20
2.2.4 Global Average Pooling 21
2.2.5 Optimizers 22
2.2.5.1 Stochastic Gradient Descent 22
2.2.5.2 Momentum 23
2.2.5.3 AdaGrad 24
2.2.5.4 Adam 24
2.3 Existing Food Image Dataset 26
2.4 Research on Food Image Recognition 28
2.5 Research on Nutrition Estimation 30
2.6 Software for Nutrition Estimation 31
Chapter 3 Proposed Algorithms 33
3.1 Overview of Proposed Algorithm 33
3.2 AIFood Image Dataset 34
3.2.1 Image Collection and Labeling 35
3.2.2 Image Preprocessing 37
3.2.2.1 Automatic White Balancing 38
3.2.2.2 Contrast Limited Adaptive Histogram Equalization 43
3.3 Food Image Recognition Based on CNN 45
3.3.1 Convolutional Neural Network Architecture 45
3.3.1.1 Frequency Analysis of Dilated Filter 48
3.3.2 Model Fine-tuning and Optimization 50
3.3.2.1 Optimization Step 50
3.3.2.2 Dataset Division 50
3.3.2.3 Weight Initialization 51
3.3.2.4 Image Normalization 53
3.3.2.5 Random Image Rotation and Flip 54
3.3.2.6 Loss Function and Optimizer 55
3.4 Nutrition Estimation 56
3.5 Method of Prediction Analysis and Visualization of CNN 59
3.5.1 Class Activation Maps 59
3.5.2 Feature Visualization 61
Chapter 4 Experimental Results and Discussion 63
4.1 Comparison of Food Image Datasets 63
4.2 Results of Image Preprocessing 68
4.3 Results of Food Image Recognition 71
4.3.1 Comparison of Different Initial Weights 72
4.3.2 Comparison of Results of Random Image Rotation and Flip 74
4.3.3 Comparison of Different CNN Architectures 75
4.3.4 Comparison of Different Convolutions 79
4.3.4.1 Experimental Result of Octave Convolution 79
4.3.4.2 Experimental Result of Dilated Convolution 79
4.3.5 Result of Classification of Food Category 80
4.4 Class Activation Maps 84
4.5 Feature Visualization 86
4.6 Response and Feature Using Different Input 92
4.7 Result of Nutrition Estimation 96
4.8 Comparison with Previous Work 98
4.8.1 Food Image Dataset 98
4.8.2 Food Image Recognition 99
4.8.3 Nutrition Estimation 100
Chapter 5 Conclusion and Future Work 101
5.1 Conclusions 101
5.2 Future Work 102
Acknowledgments 103
References 104
Appendix 109

References:
[1] Ytower [Online]. Available: https://www.ytower.com.tw/.
[2] M. E. J. Lean, "Principles of human nutrition," Medicine, vol. 43, pp. 61-65, 2015.
[3] J. Vaughan and C. Geissler, The new Oxford book of food plants. OUP Oxford, 2009.
[4] UNICEF, Facts for Life. UNICEF, 2010.
[5] World Health Organization, "Essential nutrition actions: improving maternal, newborn, infant and young child health and nutrition," 2013.
[6] Disabled-World. (2019). Nutrition: Nutritious Food Types and Dietary Information [Online]. Available: https://www.disabled-world.com/fitness/nutrition/.
[7] Reference intake for dietary nutrient [Online]. Available: https://www.hpa.gov.tw/Pages/Detail.aspx?nodeid=544&pid=725.
[8] N. Abbaspour, R. Hurrell, and R. Kelishadi, "Review on iron and its importance for human health," Journal of research in medical sciences: the official journal of Isfahan University of Medical Sciences, vol. 19, no. 2, p. 164, 2014.
[9] A. B. Evert et al., "Nutrition therapy recommendations for the management of adults with diabetes," Diabetes care, vol. 37, no. Supplement 1, pp. S120-S143, 2014.
[10] M. S. Seelig and A. Rosanoff, The magnesium factor. Penguin, 2003.
[11] Taiwan Food Nutrients Database [Online]. Available: https://consumer.fda.gov.tw/Food/TFND.aspx?nodeID=178.
[12] F. Yu and V. Koltun, "Multi-scale context aggregation by dilated convolutions," arXiv preprint arXiv:1511.07122, 2015.
[13] Y. Chen et al., "Drop an octave: Reducing spatial redundancy in convolutional neural networks with octave convolution," arXiv preprint arXiv:1904.05049, 2019.
[14] P. Buranajun, M. Sasananan, and S. Sasananan, "Prediction of Product Design and Development Success Using Artificial Neural Network," 2007.
[15] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[16] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in neural information processing systems, 2012, pp. 1097-1105.
[17] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "Imagenet: A large-scale hierarchical image database," in 2009 IEEE conference on computer vision and pattern recognition, 2009: IEEE, pp. 248-255.
[18] V. Nair and G. E. Hinton, "Rectified linear units improve restricted boltzmann machines," in Proceedings of the 27th international conference on machine learning (ICML-10), 2010, pp. 807-814.
[19] G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors," arXiv preprint arXiv:1207.0580, 2012.
[20] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[21] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2818-2826.
[22] C. Szegedy et al., "Going deeper with convolutions," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015, pp. 1-9.
[23] C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, "Inception-v4, inception-resnet and the impact of residual connections on learning," in Thirty-First AAAI Conference on Artificial Intelligence, 2017.
[24] S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," arXiv preprint arXiv:1502.03167, 2015.
[25] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778.
[26] R. Caruana, S. Lawrence, and C. L. Giles, "Overfitting in neural nets: Backpropagation, conjugate gradient, and early stopping," in Advances in neural information processing systems, 2001, pp. 402-408.
[27] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on knowledge and data engineering, vol. 22, no. 10, pp. 1345-1359, 2009.
[28] E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, and Q. V. Le, "AutoAugment: Learning Augmentation Strategies From Data," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 113-123.
[29] M. Lin, Q. Chen, and S. Yan, "Network in network," arXiv preprint arXiv:1312.4400, 2013.
[30] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, "Learning deep features for discriminative localization," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 2921-2929.
[31] H. Robbins and S. Monro, "A stochastic approximation method," The annals of mathematical statistics, pp. 400-407, 1951.
[32] N. Qian, "On the momentum term in gradient descent learning algorithms," Neural networks, vol. 12, no. 1, pp. 145-151, 1999.
[33] J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," Journal of Machine Learning Research, vol. 12, no. Jul, pp. 2121-2159, 2011.
[34] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," arXiv preprint arXiv:1412.6980, 2014.
[35] M. Chen, K. Dhingra, W. Wu, L. Yang, R. Sukthankar, and J. Yang, "PFID: Pittsburgh fast-food image dataset," in 2009 16th IEEE International Conference on Image Processing (ICIP), 2009: IEEE, pp. 289-292.
[36] H. Hoashi, T. Joutou, and K. Yanai, "Image recognition of 85 food categories by feature fusion," in 2010 IEEE International Symposium on Multimedia, 2010: IEEE, pp. 296-301.
[37] Y. Matsuda, H. Hoashi, and K. Yanai, "Recognition of multiple-food images by detecting candidate regions," in 2012 IEEE International Conference on Multimedia and Expo, 2012: IEEE, pp. 25-30.
[38] Y. Kawano and K. Yanai, "Automatic expansion of a food image dataset leveraging existing categories with domain adaptation," in European Conference on Computer Vision, 2014: Springer, pp. 3-17.
[39] G. M. Farinella, D. Allegra, and F. Stanco, "A benchmark dataset to study the representation of food images," in European Conference on Computer Vision, 2014: Springer, pp. 584-599.
[40] L. Bossard, M. Guillaumin, and L. Van Gool, "Food-101–mining discriminative components with random forests," in European Conference on Computer Vision, 2014: Springer, pp. 446-461.
[41] X. Wang, D. Kumar, N. Thome, M. Cord, and F. Precioso, "Recipe recognition with large multimodal food dataset," in 2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 2015: IEEE, pp. 1-6.
[42] O. Beijbom, N. Joshi, D. Morris, S. Saponas, and S. Khullar, "Menu-match: Restaurant-specific food logging from images," in 2015 IEEE Winter Conference on Applications of Computer Vision, 2015: IEEE, pp. 844-851.
[43] P. Pouladzadeh, A. Yassine, and S. Shirmohammadi, "Foodd: food detection dataset for calorie measurement using food images," in International Conference on Image Analysis and Processing, 2015: Springer, pp. 441-448.
[44] G. Ciocca, P. Napoletano, and R. Schettini, "Food recognition: a new dataset, experiments, and results," IEEE journal of biomedical and health informatics, vol. 21, no. 3, pp. 588-598, 2016.
[45] A. Singla, L. Yuan, and T. Ebrahimi, "Food/non-food image classification and food categorization using pre-trained googlenet model," in Proceedings of the 2nd International Workshop on Multimedia Assisted Dietary Management, 2016: ACM, pp. 3-11.
[46] J. Chen and C.-W. Ngo, "Deep-based ingredient recognition for cooking recipe retrieval," in Proceedings of the 24th ACM international conference on Multimedia, 2016: ACM, pp. 32-41.
[47] Y. Liang and J. Li, "Computer vision-based food calorie estimation: dataset, method, and experiment," arXiv preprint arXiv:1705.07632, 2017.
[48] X. Chen, Y. Zhu, H. Zhou, L. Diao, and D. Wang, "ChineseFoodNet: A large-scale image dataset for chinese food recognition," arXiv preprint arXiv:1705.02743, 2017.
[49] J. Harashima, Y. Someya, and Y. Kikuta, "Cookpad image dataset: An image collection as infrastructure for food research," in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017: ACM, pp. 1229-1232.
[50] J. Marin et al., "Recipe1M+: A Dataset for Learning Cross-Modal Embeddings for Cooking Recipes and Food Images," arXiv preprint arXiv:1810.06553, 2018.
[51] Z. Fu, D. Chen, and H. Li, "Chinfood1000: A large benchmark dataset for chinese food recognition," in International Conference on Intelligent Computing, 2017: Springer, pp. 273-281.
[52] M.-Y. Chen et al., "Automatic chinese food identification and quantity estimation," in SIGGRAPH Asia 2012 Technical Briefs, 2012: ACM, p. 29.
[53] B. V. R. e Silva, M. G. Rad, J. Cui, M. McCabe, and K. Pan, "A Mobile-Based Diet Monitoring System for Obesity Management," Journal of health & medical informatics, vol. 9, no. 2, 2018.
[54] N. Martinel, C. Piciarelli, and C. Micheloni, "A supervised extreme learning committee for food recognition," Computer Vision and Image Understanding, vol. 148, pp. 67-86, 2016.
[55] A. Mariappan et al., "Personal dietary assessment using mobile devices," in Computational Imaging VII, 2009, vol. 7246: International Society for Optics and Photonics, p. 72460Z.
[56] F. Zhu, M. Bosch, N. Khanna, C. J. Boushey, and E. J. Delp, "Multiple hypotheses image segmentation and classification with application to dietary assessment," IEEE journal of Biomedical and Health Informatics, vol. 19, no. 1, pp. 377-388, 2014.
[57] J. Zheng, Z. Wang, and C. Zhu, "Food image recognition via superpixel based low-level and mid-level distance coding for smart home applications," Sustainability, vol. 9, no. 5, p. 856, 2017.
[58] Y. Kawano and K. Yanai, "Foodcam: A real-time mobile food recognition system employing fisher vector," in International Conference on Multimedia Modeling, 2014: Springer, pp. 369-373.
[59] C. Liu, Y. Cao, Y. Luo, G. Chen, V. Vokkarane, and Y. Ma, "Deepfood: Deep learning-based food image recognition for computer-aided dietary assessment," in International Conference on Smart Homes and Health Telematics, 2016: Springer, pp. 37-48.
[60] M. M. Anthimopoulos, L. Gianola, L. Scarnato, P. Diem, and S. G. Mougiakakou, "A food recognition system for diabetic patients based on an optimized bag-of-features model," IEEE journal of biomedical and health informatics, vol. 18, no. 4, pp. 1261-1271, 2014.
[61] Y. He, C. Xu, N. Khanna, C. J. Boushey, and E. J. Delp, "Analysis of food images: Features and classification," in 2014 IEEE International Conference on Image Processing (ICIP), 2014: IEEE, pp. 2744-2748.
[62] A. Meyers et al., "Im2Calories: towards an automated mobile vision food diary," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1233-1241.
[63] C. Liu et al., "A new deep learning-based food recognition system for dietary assessment on an edge computing service infrastructure," IEEE Transactions on Services Computing, vol. 11, no. 2, pp. 249-261, 2017.
[64] H. Hassannejad, G. Matrella, P. Ciampolini, I. De Munari, M. Mordonini, and S. Cagnoni, "Food image recognition using very deep convolutional networks," in Proceedings of the 2nd International Workshop on Multimedia Assisted Dietary Management, 2016: ACM, pp. 41-49.
[65] K. Yanai and Y. Kawano, "Food image recognition using deep convolutional network with pre-training and fine-tuning," in 2015 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 2015: IEEE, pp. 1-6.
[66] S. Christodoulidis, M. Anthimopoulos, and S. Mougiakakou, "Food recognition for dietary assessment using deep convolutional neural networks," in International Conference on Image Analysis and Processing, 2015: Springer, pp. 458-465.
[67] A. Salvador, M. Drozdzal, X. Giro-i-Nieto, and A. Romero, "Inverse cooking: Recipe generation from food images," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 10453-10462.
[68] T. Ege and K. Yanai, "Simultaneous estimation of food categories and calories with multi-task CNN," in 2017 Fifteenth IAPR International Conference on Machine Vision Applications (MVA), 2017: IEEE, pp. 198-201.
[69] J. Noronha, E. Hysen, H. Zhang, and K. Z. Gajos, "Platemate: crowdsourcing nutritional analysis from food photographs," in Proceedings of the 24th annual ACM symposium on User interface software and technology, 2011: ACM, pp. 1-12.
[70] C.-L. Wang, "Automatic White Balancing Algorithm with Detection of Potential White Color Areas," Department of Electrical Engineering, National Cheng Kung University, 2007.
[71] G. D. Finlayson, S. D. Hordley, and A. Alsam, "Investigating von Kries-like adaptation using local linear models," Color Research & Application, vol. 31, no. 2, pp. 90-101, 2006.
[72] R. C. Gonzalez and R. E. Woods, Digital Image Processing. Publishing House of Electronics Industry, 2002.
[73] K. Zuiderveld, "Contrast limited adaptive histogram equalization," in Graphics gems IV, 1994: Academic Press Professional, Inc., pp. 474-485.
[74] M. Abadi et al., "Tensorflow: A system for large-scale machine learning," in 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), 2016, pp. 265-283.
[75] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proceedings of the thirteenth international conference on artificial intelligence and statistics, 2010, pp. 249-256.
[76] Y.-Y. Chou, "Convolutional Neural Network Analytics of Melasma in Harmonically Generated Microscopy Images," Department of Electrical Engineering, National Cheng Kung University, 2018.
[77] Daily Diet Guide [Online]. Available: https://www.hpa.gov.tw/Pages/EBook.aspx?nodeid=1208.
[78] Food Substitute Table [Online]. Available: https://www.hpa.gov.tw/Pages/Detail.aspx?nodeid=543&pid=8382.
[79] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-cam: Visual explanations from deep networks via gradient-based localization," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 618-626.
[80] D. Erhan, Y. Bengio, A. Courville, and P. Vincent, "Visualizing higher-layer features of a deep network," University of Montreal, vol. 1341, no. 3, p. 1, 2009.
[81] C. Olah, A. Mordvintsev, and L. Schubert, "Feature Visualization," Distill, Nov. 7 2017.
[82] M. D. Zeiler and R. Fergus, "Visualizing and understanding convolutional networks," in European conference on computer vision, 2014: Springer, pp. 818-833.
[83] R.-E. Fan and C.-J. Lin, "A study on threshold selection for multi-label classification," Department of Computer Science, National Taiwan University, pp. 1-23, 2007.
[84] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, "Feature pyramid networks for object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2117-2125.
[85] Y. Kawano and K. Yanai, "Foodcam-256: a large-scale real-time mobile food recognition system employing high-dimensional features and compression of classifier weights," in Proceedings of the 22nd ACM international conference on Multimedia, 2014: ACM, pp. 761-762.
[86] I. Woo, K. Otsmo, S. Kim, D. S. Ebert, E. J. Delp, and C. J. Boushey, "Automatic portion estimation and visual refinement in mobile dietary assessment," in Computational Imaging VIII, 2010, vol. 7533: International Society for Optics and Photonics, p. 75330O.
[87] M. Puri, Z. Zhu, Q. Yu, A. Divakaran, and H. Sawhney, "Recognition and volume estimation of food intake using a mobile device," in 2009 Workshop on Applications of Computer Vision (WACV), 2009: IEEE, pp. 1-8.
Full-Text Usage Rights
  • On-campus browsing/printing of the electronic full text is authorized, publicly available from 2025-03-26.
  • Off-campus browsing/printing of the electronic full text is authorized, publicly available from 2025-03-26.

