Compositional Transformers for Scene Generation

Dec 6, 2021



We introduce SceneGAN, an iterative object-oriented transformer for generative modeling. The network incorporates strong, explicit structural priors that reflect the compositional nature of visual scenes, and it synthesizes images through a sequential process in two stages: a fast, lightweight sketching phase, which plans a high-level scene layout, followed by an attention-based painting phase, which refines that layout into a rich and detailed output image. The model moves away from conventional black-box GAN architectures that feature a flat, monolithic latent space, towards a transparent design that encourages efficiency, controllability and interpretability. We evaluate the model over a range of datasets, from multi-object CLEVR scenes on one end to the challenging COCO images on the other, and show that it achieves state-of-the-art performance in terms of image quality, consistency and diversity. Further quantitative and qualitative experiments illustrate the additional merits of our approach in terms of adaptability, versatility and disentanglement, and provide deeper insight into its generation process, as it proceeds step by step from a rough initial sketch, to a detailed layout that accounts for objects' depths and dependencies, and up to the final high-resolution depiction of vibrant and intricate real-world scenes.


About NeurIPS 2021

Neural Information Processing Systems (NeurIPS) is a multi-track machine learning and computational neuroscience conference that includes invited talks, demonstrations, symposia and oral and poster presentations of refereed papers. Following the conference, there are workshops which provide a less formal setting.
