Generating High-Fidelity Images with Subscale Pixel Networks and Multidimensional Upscaling
Abstract
The unconditional generation of high-fidelity images is a longstanding benchmark
for testing the performance of image decoders. Autoregressive image models have
been able to generate small images unconditionally, but the extension of these
methods to large images where fidelity can be more readily assessed has remained
an open problem. Among the major challenges are the capacity to encode the vast
previous context and the sheer difficulty of learning a distribution that preserves
both global semantic coherence and exactness of detail. To address the former
challenge, we propose the Subscale Pixel Network (SPN), a conditional decoder
architecture that generates an image as a sequence of sub-images of equal size. The
SPN compactly captures image-wide spatial dependencies and requires a fraction
of the memory and the computation required by other fully autoregressive models.
To address the latter challenge, we propose to use Multidimensional Upscaling
to grow an image in both size and depth via intermediate stages utilising distinct
SPNs. We evaluate SPNs on the unconditional generation of CelebAHQ of size
256 and of ImageNet from size 32 to 256. We achieve state-of-the-art likelihood
results in multiple settings, establish new benchmark results in previously unexplored
settings and are able to generate very high-fidelity, large-scale samples on the basis
of both datasets.
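
To make the idea of generating an image as a sequence of equally sized sub-images concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say how the sub-images are formed; the sketch assumes they are interleaved slices obtained by taking every s-th pixel along each axis. The names subscale_slices, merge_slices and the scaling factor s are hypothetical and introduced only for illustration.

```python
def subscale_slices(image, s):
    """Split a 2-D image (a list of rows) into s*s interleaved sub-images.

    Assumption for illustration: slice (i, j) keeps the pixels at rows
    i, i+s, i+2s, ... and columns j, j+s, j+2s, ...  All slices have
    equal size when the height and width are divisible by s.
    """
    height, width = len(image), len(image[0])
    if height % s or width % s:
        raise ValueError("height and width must be divisible by s")
    slices = []
    for i in range(s):          # row offset of the slice
        for j in range(s):      # column offset of the slice
            slices.append([row[j::s] for row in image[i::s]])
    return slices


def merge_slices(slices, s):
    """Inverse of subscale_slices: interleave the sub-images back."""
    sub_h, sub_w = len(slices[0]), len(slices[0][0])
    image = [[None] * (sub_w * s) for _ in range(sub_h * s)]
    for k, sub in enumerate(slices):
        i, j = divmod(k, s)     # recover the (row, column) offset
        for r, row in enumerate(sub):
            for c, value in enumerate(row):
                image[r * s + i][c * s + j] = value
    return image


# A 4x4 toy "image" whose pixel values are their raster-scan indices.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
slices = subscale_slices(image, 2)   # four 2x2 sub-images
restored = merge_slices(slices, 2)   # lossless round trip
```

Under this assumption, every sub-image spans the full spatial extent of the original at a lower resolution, which is one way a decoder conditioned on earlier sub-images could see image-wide context while only ever processing inputs of the smaller size.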