Unleashing the Power of GANs: The Dawn of Interactive Image Manipulation

How Drag Your GAN is changing the game.

Tan Han Wei


Quick Demo of Drag Your GAN

Generative Adversarial Networks (GANs) have been making waves in the tech industry, and for good reason: they have the potential to change how we interact with digital content. A recent development in the field shows why. It is an interactive, point-based method for manipulating images on the generative image manifold, detailed in the SIGGRAPH ’23 Conference Proceedings.

The Power of Interactive Point-Based Manipulation

The method, aptly named “Drag Your GAN,” lets users manipulate generated images interactively by dragging points on the image. For instance, users can edit the pose, hair, shape, and expression of a face. This fine-grained, point-level control opens up new possibilities for digital content creation and editing.

The Evolution of GANs

As GANs have evolved, several methods have been proposed for editing unconditional GANs by manipulating the input latent vectors. Some approaches find meaningful latent directions via supervised learning from manual annotations or prior 3D models. Others compute the important semantic directions in the latent space in an unsupervised manner.
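To make the idea of a "latent direction" concrete, here is a minimal sketch in Python with NumPy. It is not taken from Drag Your GAN or any specific paper: the `mapping` function is a hypothetical stand-in for a trained GAN's mapping network, and PCA is used as one simple example of finding directions without supervision. The names `principal_directions` and `edit_latent` are illustrative assumptions as well.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a GAN mapping network (z -> w).
# A real model would use a learned network here.
A = rng.normal(size=(8, 8))

def mapping(z):
    return np.tanh(z @ A)

def principal_directions(w_samples, k):
    """Return the top-k PCA directions (unit vectors) of sampled latents."""
    centered = w_samples - w_samples.mean(axis=0)
    # Rows of vt are orthonormal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]

def edit_latent(w, direction, alpha):
    """Move a latent code alpha units along a (normalized) direction."""
    direction = direction / np.linalg.norm(direction)
    return w + alpha * direction

# Unsupervised case: sample many latents and take their principal directions.
w_samples = mapping(rng.normal(size=(1000, 8)))
directions = principal_directions(w_samples, k=3)

# Editing: shift one latent code along the first direction.
w = w_samples[0]
w_edited = edit_latent(w, directions[0], alpha=2.0)
```

In a real pipeline, `w_edited` would be passed to the generator to render the edited image. A supervised method would replace `principal_directions` with a direction learned from labeled examples, while the `edit_latent` step would stay the same.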

More recent work controls coarse object position by introducing intermediate “blobs” or heatmaps. All of these approaches enable editing of either image-aligned semantic attributes, such as appearance, or coarse geometric attributes, such as object position and pose.

The Future of GANs

The future of GANs is promising: several methods modify the GAN architecture to enable 3D control. These models generate 3D representations that can be rendered using a physically based analytic renderer. However, control is currently limited to global pose or lighting.

Diffusion models have also enabled high-quality image synthesis. These models start from randomly sampled noise and iteratively denoise it to create a photorealistic image. Recent models have shown expressive image synthesis conditioned on text inputs…
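The iterative denoising loop can be sketched in a few lines of Python with NumPy. This is a toy illustration of DDPM-style ancestral sampling, one standard formulation, and not necessarily the procedure used by the models mentioned above. The function names, the linear noise schedule, and the `oracle_noise` predictor are all assumptions made for the example. In a real model, the noise predictor is a trained neural network.

```python
import numpy as np

def ddpm_sample(predict_noise, shape, betas, rng):
    """Toy DDPM-style sampler: start from noise and denoise step by step."""
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    x = rng.normal(size=shape)  # start from pure Gaussian noise
    for t in range(len(betas) - 1, -1, -1):
        eps = predict_noise(x, t)  # estimated noise present in x at step t
        # Remove the predicted noise to get the mean of the previous step.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # Re-inject a little fresh noise on every step except the last.
        noise = rng.normal(size=shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

# Assumed linear noise schedule with 50 steps.
betas = np.linspace(1e-4, 0.02, 50)
alpha_bars = np.cumprod(1.0 - betas)

# Toy "dataset" consisting of a single point, so the exact noise is known.
target = np.array([1.0, -2.0, 0.5])

def oracle_noise(x, t):
    """Hypothetical perfect predictor; stands in for a trained network."""
    return (x - np.sqrt(alpha_bars[t]) * target) / np.sqrt(1.0 - alpha_bars[t])

sample = ddpm_sample(oracle_noise, target.shape, betas, np.random.default_rng(0))
```

Because the toy predictor knows the exact noise, the loop recovers `target`. With a learned predictor, the same loop instead produces a new sample from the distribution the network was trained on.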