   The electronic thesis has not yet been authorized for public release; for the print copy, please check the library catalog.
(※ If the record cannot be found, or the holdings status shows "closed stacks, not public", the thesis is not in the stacks and cannot be accessed.)
System ID U0026-0608202016043800
Title (Chinese) 利用自動編碼器與光流對影片進行重新編排
Title (English) Video Reordering with Optical Flows and Autoencoder
University National Cheng Kung University
Department (Chinese) 資訊工程學系
Department (English) Institute of Computer Science and Information Engineering
Academic Year 108
Semester 2
Publication Year 109
Author (Chinese) 吳俊德
Author (English) Chun-Te Wu
Student ID P76071446
Degree Master's
Language English
Pages 36
Committee Advisor - 李同益
Committee member - 葉奕成
Committee member - 林昭宏
Committee member - 顏韶威
Committee member - 林士勛
Keywords (Chinese) 影片重組  自動編碼器  光流  路徑搜尋演算法
Keywords (English) video resequencing  autoencoder architecture  optical flows  path finding algorithms
Subject Classification
Abstract (Chinese) To solve the video resequencing problem, we propose a novel deep learning framework for generating videos with smooth motion. Given a video or an unordered collection of images, we first use our proposed neural network to extract a feature vector from each image or video frame. We then construct a complete graph from the distances between feature vectors. Finally, depending on the user's requirements, one of three path finding algorithms traverses the graph to produce the resulting video; these correspond to the three applications of our framework: original video reconstruction, in-between frame insertion, and video resequencing. To ensure that the motion in the generated video is "as smooth and plausible as possible", we use optical flows as constraints in the path finding algorithms and compute the differences between optical flows with our proposed network. Experimental results show that our network outperforms previous work on feature extraction. The resulting videos also demonstrate that our framework can be applied to videos and unordered image collections in many styles, including cartoons, animation, and real-world footage, without producing the implausible motions that appeared in previous studies.
Abstract (English) To solve the general video resequencing problem, we propose a novel deep learning framework that generates natural result videos with smooth motion. Given an unordered image collection or a video, we first extract latent vectors from the images or video frames with a novel architecture we propose. We then build a complete graph from the distances between latent vectors. Three different path finding algorithms are used to traverse the graph and produce video sequences, corresponding to the three applications of our framework: original video reconstruction, in-between frame insertion, and video resequencing. To ensure that the motion of the resulting videos is "as smooth and reasonable as possible", we use optical flows as constraints in the path finding algorithms, and the proposed network architecture is used to compute the difference between optical flows. The experimental evaluation demonstrates that our network outperforms previous work on feature extraction, and the appealing result videos also show that our framework can be applied to many styles of videos and unordered image collections, including cartoon and realistic videos, without the unappealing motion problems of previous studies.
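The pipeline described in the abstract (latent vectors → complete distance graph → path search) can be illustrated with a minimal sketch. Everything below is an illustrative assumption: `resequence_greedy`, the toy 2-D "latent vectors", and the greedy nearest-neighbour traversal, which stands in for the thesis's optical-flow-constrained path finding algorithms.

```python
import numpy as np

def resequence_greedy(latents, start=0):
    """Greedy nearest-neighbour walk over a complete graph of
    pairwise latent-space distances. A simplified sketch only:
    the thesis's path search additionally constrains each step
    with optical-flow differences, omitted here."""
    latents = np.asarray(latents, dtype=float)
    n = len(latents)
    # Complete graph as an n x n matrix of Euclidean distances.
    dist = np.linalg.norm(latents[:, None, :] - latents[None, :, :], axis=-1)
    order = [start]
    visited = {start}
    while len(order) < n:
        d = dist[order[-1]].copy()
        d[list(visited)] = np.inf      # never revisit a frame
        nxt = int(np.argmin(d))
        order.append(nxt)
        visited.add(nxt)
    return order

# Toy check: shuffled points on a smooth 1-D curve come back in
# curve order when we start from the curve's first point.
t = np.linspace(0.0, 1.0, 8)
curve = np.stack([t, t ** 2], axis=1)           # stand-in "latent vectors"
perm = np.random.default_rng(0).permutation(8)  # shuffle the frames
order = resequence_greedy(curve[perm], start=int(np.argmin(perm)))
recovered = [int(perm[i]) for i in order]
print(recovered)                                # [0, 1, 2, 3, 4, 5, 6, 7]
```

In the actual framework each edge would also carry an optical-flow coherence cost, so the traversal prefers transitions whose motion is consistent, not merely whose appearances are close.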
Table of Contents Abstract (Chinese) i
Abstract ii
Acknowledgements iii
Table of Contents iv
List of Tables v
List of Figures vi
Chapter 1 Introduction 1
Chapter 2 Related Work 3
2.1 Feature Extraction and Dimension Reduction 3
2.2 Image sequence ordering 4
Chapter 3 Method 7
3.1 Perceptual distance 8
3.1.1 Network architecture 10
3.1.2 Training 12
3.2 Optical flow coherency 13
3.2.1 Optical flow computing 13
3.2.2 Difference of optical flow 15
3.3 Animation sequencing 16
3.3.1 Original video reconstructing 18
3.3.2 In-between frames insertion 19
3.3.3 Animation resequencing 20
Chapter 4 Result 27
4.1 2AFC dataset comparison 27
4.2 Encoder evaluation 29
4.3 Video Results 31
4.3.1 In-between frames insertion results 31
4.3.2 Video resequencing results 32
Chapter 5 Conclusion and Future Work 34
References 35
References [1] O. Fried, S. Avidan, and D. Cohen-Or. “Patch2vec: Globally consistent image patch representation.” In Computer Graphics Forum, volume 36, pages 183–194. Wiley Online Library, 2017.
Available: https://onlinelibrary.wiley.com/doi/abs/10.1111/cgf.13284
[2] J. Yu, D. Tao, J. Li, J. Chen. “Semantic preserving distance metric learning and applications.” Inform. Sci. 281 (2014) 674–686,
Available: http://dx.doi.org/10.1016/j.ins.2014.01.025
[3] Y. Yang, Y. Zhuang, D. Tao, D. Xu, J. Yu, and J. Luo. “Recognizing cartoon image gestures for retrieval and interactive cartoon clip synthesis,” IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 12, pp. 1745–1756, Dec. 2010.
[4] Alex Gammerman, Volodya Vovk, Vladimir Vapnik. “Learning by transduction.” arXiv preprint, arXiv:1301.7375, 2013
[5] A. Schödl, R. Szeliski, D. H. Salesin, and I. Essa. “Video textures.” Proceedings of SIGGRAPH 2000 (July), 489–498. ISBN 1-58113-208-5.
[6] L. P. Kaelbling, M. L. Littman, A. W. Moore. “Reinforcement learning: A survey.” J. Artif. Int. Res., vol. 4, no. 1, pp. 237–285, May 1996. [Online].
Available: http://dl.acm.org/citation.cfm?id=1622737.1622748
[7] Jun Yu, Dacheng Tao, Meng Wang. “Semi-automatic cartoon generation by motion planning.” Multimedia Systems, 17(5):409–419, 2011.
[8] Charles C. Morace, Chi-Kuo Yeh, Shang-Wei Zhang, Tong-Yee Lee. “Learning a Perceptual Manifold with Deep Features for Animation Video Resequencing.” IEEE Transactions on Visualization and Computer Graphics, Sep. 2018.
[9] J. Zhang, J. Yu, and D. Tao. “Local deep-feature alignment for unsupervised dimension reduction.” IEEE Trans. Image Process., vol. 27, no. 5, pp. 2420–2432, May 2018.
[10] M. Osadchy, Y. L. Cun, and M. L. Miller. “Synergistic face detection and pose estimation with energy-based models.” J. Mach. Learn. Res., vol. 8, pp. 1197–1215, May 2007. [Online].
Available: http://dl.acm.org/citation.cfm?id=1248659.1248700
[11] D. Holden, J. Saito, T. Komura, and T. Joyce. “Learning motion manifolds with convolutional autoencoders.” in SIGGRAPH Asia 2015 Technical Briefs, ser. SA ’15. New York, NY, USA: ACM, 2015, pp. 18:1–18:4. [Online].
Available: http://doi.acm.org/10.1145/2820903.2820918
[12] A. Schödl and I. A. Essa. “Machine learning for video-based rendering.” in Advances in Neural Information Processing Systems 13, T. K. Leen, T. G. Dietterich, and V. Tresp, Eds. MIT Press, 2001, pp. 1002–1008. [Online].
Available: http://papers.nips.cc/paper/1874-machine-learningfor-video-based-rendering.pdf
[13] A. Schödl and I. A. Essa. “Controlled animation of video sprites.” in Proceedings of the 2002 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, ser. SCA ’02. New York, NY, USA: ACM, 2002, pp. 121–127. [Online].
Available: http://doi.acm.org/10.1145/545261.545281
[14] Shang-Wei Zhang, Charles C. Morace, Thi Ngoc Hanh Le, Chih-Kuo Yeh, Shih-Syun Lin, Sheng-Yi Yao, Tong-Yee Lee. “Animation Video Resequencing with a Convolutional AutoEncoder.” SIGGRAPH Asia 2019, Poster, Brisbane, Australia, Nov. 2019.
[15] R. Zhang, P. Isola, A. A. Efros, E. Shechtman, and O. Wang. “The unreasonable effectiveness of deep features as a perceptual metric.” CoRR, vol. abs/1801.03924, 2018. [Online].
Available: http://arxiv.org/abs/1801.03924
[16] K. Simonyan and A. Zisserman. “Very deep convolutional networks for large-scale image recognition.” arXiv preprint, arXiv:1409.1556, 2014.
[17] K. He, X. Zhang, S. Ren, and J. Sun, “Identity Mappings in Deep Residual Networks,” CoRR, vol. abs/1603.05027, 2016. [Online].
Available: https://arxiv.org/abs/1603.05027
[18] L. A. Gatys, A. S. Ecker, and M. Bethge. “Image style transfer using convolutional neural networks.” CVPR, 2016.
[19] G. Huang, Z. Liu, K.Q. Weinberger, L. van der Maaten. “Densely connected convolutional networks.” In: Proceedings of the IEEE conference on computer vision and pattern recognition. vol. 1, p. 3 (2017)
[20] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” CoRR, vol. abs/1412.6980, 2014. [Online].
Available: http://arxiv.org/abs/1412.6980
[21] D. Sun, X. Yang, M.-Y. Liu, and J. Kautz. “PWC-Net: CNNs for optical flow using pyramid, warping, and cost volume.” arXiv preprint, arXiv:1709.02371, 2017.
Full-Text Access Rights
  • Authorized for on-campus browsing/printing of the electronic full text, available to the public from 2024-09-01.
  • Authorized for off-campus browsing/printing of the electronic full text, available to the public from 2024-09-01.


  • If you have any questions, please contact the library.
    Phone: (06)2757575 #65773
    E-mail: etds@email.ncku.edu.tw