Nov 28, 2022
We are witnessing a surge of works on building and improving 3D-aware generators. To induce a 3D-aware bias, such models rely on volumetric rendering, which is expensive to employ at high resolutions. The dominant strategy to address the scaling issue is to train a separate 2D decoder to upsample a low-resolution volumetrically rendered representation. But this solution comes at a cost: not only does it break multi-view consistency (e.g., shape and texture change when the camera moves), but it also learns the geometry only in low fidelity. In this work, we take a different route to 3D synthesis and develop a non-upsampler-based generator that achieves state-of-the-art image quality, learns high-resolution geometry, and trains 2.5× faster. For this, we revisit and improve patch-based optimization in two ways. First, we design a location- and scale-aware discriminator by modulating its filters with a hypernetwork. Second, we modify the patch sampling strategy based on an annealed beta distribution to stabilize training and accelerate convergence. We train on four datasets (two introduced in this work) at 256^2 and 512^2 resolutions directly, without a 2D upsampler, and our model attains better or comparable FID and higher-fidelity geometry than the current SotA. Code/data/visualizations: https://rethinking-3d-gans.github.io
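To make the patch-sampling idea concrete, here is a minimal sketch of drawing a patch scale from an annealed Beta distribution; it is an illustration under stated assumptions, not the authors' implementation. The annealing schedule, the parameter names (`anneal_steps`, `beta_start`, `min_scale`), and the direction of the schedule (favoring full-image-scale patches early and spreading toward smaller scales later) are assumptions for this sketch.

```python
import torch

def sample_patch_scale(step: int,
                       anneal_steps: int = 10_000,
                       min_scale: float = 0.125,
                       beta_start: float = 100.0) -> float:
    """Draw a patch scale s in [min_scale, 1] from an annealed Beta (illustrative)."""
    progress = min(step / anneal_steps, 1.0)
    # Concentration decays from beta_start to 1; Beta(beta_t, 1) with beta_t = 1
    # is uniform on [0, 1]. (Assumed geometric annealing schedule.)
    beta_t = beta_start ** (1.0 - progress)
    # Beta(beta_t, 1) puts most of its mass near 1 when beta_t is large,
    # so early samples correspond to (almost) full-image patches.
    u = torch.distributions.Beta(beta_t, 1.0).sample().item()
    return min_scale + u * (1.0 - min_scale)

# Example: scales near 1.0 early in training, spread over [min_scale, 1] later.
print(sample_patch_scale(step=0), sample_patch_scale(step=10_000))
```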