Dec 6, 2021
We introduce SceneGAN, an iterative object-oriented transformer for generative modeling. The network incorporates strong, explicit structural priors to reflect the compositional nature of visual scenes, and synthesizes images through a sequential process. It operates in two stages: a fast, lightweight sketching phase, in which the model plans a high-level scene layout, followed by an attention-based painting phase, in which the layout is honed and refined into a rich, detailed output image. The design moves away from conventional black-box GAN architectures with a flat, monolithic latent space, toward a transparent architecture that encourages efficiency, controllability, and interpretability. We demonstrate the model's strengths through careful evaluation over a range of datasets, from multi-object CLEVR scenes on one end to challenging COCO images on the other, showing that it achieves state-of-the-art performance in image quality, consistency, and diversity. Further quantitative and qualitative experiments illustrate the approach's additional merits in adaptability, versatility, and disentanglement, and provide deeper insight into the generation process as it proceeds step by step: from a rough initial sketch, to a detailed layout that accounts for objects' depths and dependencies, and up to the final high-resolution depiction of vibrant, intricate real-world scenes.
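To make the two-stage pipeline concrete, here is a minimal toy sketch in NumPy of the flow the abstract describes: a "sketching" stage that plans a coarse layout from per-object latents, followed by a "painting" stage that expands it into a larger image. All function names, shapes, and the attention form are illustrative assumptions for exposition, not the paper's actual architecture.

```python
# Toy illustration of a two-stage, object-oriented generation pipeline.
# Stage 1 ("sketching") plans a coarse scene layout from per-object latents
# via soft attention; stage 2 ("painting") upsamples it into a larger image.
# Names, shapes, and the attention form are assumptions, not the real model.
import numpy as np

rng = np.random.default_rng(0)

def sketch_layout(object_latents, grid=8):
    """Stage 1: distribute object latents over a coarse spatial grid using
    softmax attention between random stand-in positional queries and latents."""
    n, d = object_latents.shape
    positions = rng.normal(size=(grid * grid, d))       # stand-in positional queries
    logits = positions @ object_latents.T / np.sqrt(d)  # (grid^2, n) similarity scores
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)       # softmax over objects
    layout = weights @ object_latents                   # (grid^2, d) coarse layout
    return layout.reshape(grid, grid, d)

def paint(layout, upscale=4):
    """Stage 2: collapse layout features to one channel and upsample; a real
    painting stage would refine with attention layers rather than averaging."""
    img = layout.mean(axis=-1)                          # (grid, grid) toy "image"
    return np.kron(img, np.ones((upscale, upscale)))    # nearest-neighbor upsample

latents = rng.normal(size=(5, 16))  # 5 objects, 16-dim latents
coarse = sketch_layout(latents)     # coarse 8x8 layout plan
image = paint(coarse)               # refined 32x32 output
print(coarse.shape, image.shape)    # (8, 8, 16) (32, 32)
```

The split mirrors the abstract's design rationale: the cheap planning stage fixes global structure first, so the heavier refinement stage only needs to fill in local detail consistent with that plan.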
Neural Information Processing Systems (NeurIPS) is a multi-track machine learning and computational neuroscience conference that includes invited talks, demonstrations, symposia and oral and poster presentations of refereed papers. Following the conference, there are workshops which provide a less formal setting.