Shapes and Context:
In-the-wild Image Synthesis & Manipulation
Aayush Bansal
Yaser Sheikh
Deva Ramanan
[Paper]
[Interface]
[Code]


Our approach synthesizes images from label maps by non-parametric matching of shapes, parts, and pixels.



Demo video of our web-app (beta version).

Abstract

We introduce a data-driven approach for interactively synthesizing in-the-wild images from semantic label maps. Our approach differs dramatically from recent work in this space in that it uses no learning. Instead, it uses simple but classic tools for matching scene context, shapes, and parts to a stored library of exemplars. Though simple, this approach has several notable advantages over recent work: (1) because nothing is learned, it is not limited to specific training data distributions (such as cityscapes, facades, or faces); (2) it can synthesize arbitrarily high-resolution images, limited only by the resolution of the exemplar library; (3) by appropriately composing shapes and parts, it can generate an exponentially large set of viable candidate output images (that can, say, be interactively searched by a user). We present results on the diverse COCO dataset, significantly outperforming learning-based approaches on standard image synthesis metrics. Finally, we explore user interaction and user controllability, demonstrating that our system can be used as a platform for user-driven content creation.
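To make the idea of non-parametric shape matching concrete, here is a minimal sketch in Python. It is not the authors' implementation: the similarity measure (mask intersection-over-union), the library layout (a list of dicts with `label`, `mask`, and `rgb` keys), and the function names `mask_iou`, `retrieve_exemplars`, and `composite` are all assumptions made for illustration. It also assumes query and exemplar masks are already aligned on the same pixel grid, and it leaves out scene context, part matching, and blending.

```python
import numpy as np


def mask_iou(a, b):
    """Intersection-over-union between two boolean masks of equal shape."""
    a = a.astype(bool)
    b = b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


def retrieve_exemplars(query_mask, query_label, library, top_k=3):
    """Return indices of the top_k library entries with the same label,
    ranked by mask IoU with the query shape (hypothetical library format)."""
    scored = [
        (mask_iou(query_mask, entry["mask"]), idx)
        for idx, entry in enumerate(library)
        if entry["label"] == query_label
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [idx for _, idx in scored[:top_k]]


def composite(canvas, query_mask, exemplar_rgb):
    """Copy exemplar pixels into a copy of the canvas wherever the query
    mask is set. No blending or alignment is attempted in this sketch."""
    out = canvas.copy()
    out[query_mask] = exemplar_rgb[query_mask]
    return out
```

Under these assumptions, retrieving k candidates for each of n shapes in a label map yields up to k^n distinct composites, which is one way to read the "exponentially large set" of candidate outputs mentioned in the abstract.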



A. Bansal, Y. Sheikh, D. Ramanan
Shapes and Context: In-the-wild Image Synthesis & Manipulation.
In CVPR, 2019.
(Oral Presentation, Best Paper Award Finalist)

[Bibtex]

Five Minutes


A quick five-minute summary of this work!



The Art of COCO


This video is played at 4X speed. Enjoy it with Bitter Sweet Symphony!






We present a diverse set of examples from the COCO Panoptic Segmentation dataset to demonstrate in-the-wild image synthesis using our approach. We also present multiple outputs generated using our approach. The left side shows the input, and the right side shows the output. The video is played at 20X speed. Enjoy it with a symphony by Mozart!





Facades


We present a video showing various facades generated by our approach. The left side shows the input, and the right side shows the output. The video is played at 20X speed. Enjoy it with music by Bach!



Elephants


Preliminary Result: Video created per-frame without any temporal information.





Acknowledgements

We thank the authors of the COCO, COCO Panoptic Segmentation, Cityscapes, and Facades datasets for their work. Nothing would have worked had they not put tremendous effort into curating these datasets. We thank David Forsyth for valuable suggestions to improve this work. We also thank the authors of Colorful Image Colorization for this webpage design.