Learning a Perceptual Manifold for Animation Video Resequencing
Institute of Computer Science and Information Engineering
Charles C. Morace
This work proposes a framework for animation video resequencing using deep learning and optimal graph traversal techniques. The proposed system produces new animation sequences by reordering a collection of animation images or the frames of an existing animation video. To maintain temporal coherence in the generated sequences, the system uses a perceptual distance so that adjacent frames in the resequenced animation are as perceptually similar as possible. To measure perceptual distance, we extract image features from the activations of deep convolutional neural networks and learn the distance by training a small network on these activation features, using data consisting of human perceptual judgments. With this perceptual metric and graph-based manifold learning techniques, the framework produces smooth and visually appealing results for a variety of animation styles. In contrast to previous work on animation resequencing, the proposed framework applies to a broader range of image styles and does not require handcrafted feature extraction, background subtraction, or feature correspondence. The framework also applies to sequencing unstructured collections of images.
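The resequencing idea summarized above can be illustrated with a minimal sketch. It is not the method of this work: plain Euclidean distance on hypothetical 2-D feature vectors stands in for the learned perceptual distance, and a greedy nearest-neighbor walk stands in for the optimal graph traversal. The function names `pairwise_distances`, `greedy_resequence`, and `path_cost` are illustrative assumptions.

```python
import math


def pairwise_distances(features):
    """Euclidean distance matrix (assumed stand-in for a learned perceptual metric)."""
    n = len(features)
    return [[math.dist(features[i], features[j]) for j in range(n)]
            for i in range(n)]


def greedy_resequence(dist, start=0):
    """Order frames by repeatedly stepping to the nearest unvisited frame.

    This is a simple heuristic, not an optimal traversal.
    """
    order = [start]
    unvisited = set(range(len(dist))) - {start}
    while unvisited:
        current = order[-1]
        # Break distance ties by the lower frame index for determinism.
        nxt = min(unvisited, key=lambda j: (dist[current][j], j))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def path_cost(dist, order):
    """Sum of distances between consecutive frames in the given order."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


# Hypothetical features for five frames, deliberately out of order.
features = [(0.0, 0.0), (3.0, 0.0), (1.0, 0.0), (4.0, 0.0), (2.0, 0.0)]
dist = pairwise_distances(features)
order = greedy_resequence(dist, start=0)

print(order)                              # new frame order
print(path_cost(dist, list(range(5))))    # cost of the original order
print(path_cost(dist, order))             # cost after resequencing
```

A lower path cost means adjacent frames are closer under the chosen distance, which is the sense in which a resequenced animation is smoother. In practice the quality of the result depends on how well the distance matches human perception, which is why the framework learns it from perceptual judgments rather than using a fixed metric.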
Table of Contents
List of Tables
List of Figures
Chapter 1. Introduction
Chapter 2. Related Work
Chapter 3. System Overview
Chapter 4. Method
Chapter 5. Results
Chapter 6. Conclusion