Use of Generative Adversarial Networks for Image Synthesis and Manipulation
Convolutional neural networks (CNNs) are widely used in image processing, in tasks such as object detection and recognition, semantic segmentation, and facial recognition. Furthermore, CNNs combined with adversarial training strategies, the so-called DCGANs, have reached new performance levels in image synthesis and manipulation, drawing the attention of the research and development community towards this class of models.
Applications such as creating synthetic faces indistinguishable from real ones, altering the age or gender of a face, generating natural scenes from semantic maps, interpolating between distinct images to obtain a hybrid, and domain transformation, such as turning a horse into a zebra, are examples of tasks in which DCGANs constitute the state of the art.
In this work we study the use of Deep Convolutional Generative Adversarial Networks (DCGANs) in domain transformation and intra-domain transformation tasks.
We performed domain transformation by turning car sketches into real-life car pictures, and human-like sketches into colored, textured cartoons. With this kind of application, artists may be able to sketch their prototypes while seeing, in real time, their creations rendered with a more realistic appearance.
The chosen intra-domain task was the manipulation of the latent space of supervised and unsupervised GANs, using one database of faces and another of insect sketches. Through this latent space it is possible to transform a face into an older version, change its gender, reduce its hair, and alter other attributes featured in the training set images. With the insect dataset, for example, such a model can help in the study of how the features of two different species may have evolved from a single common ancestor.
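One common way to perform such attribute edits, sketched below under the assumption that latent codes for images with and without an attribute are available (the dataset, dimensions, and the "age" attribute here are purely illustrative), is simple vector arithmetic in the latent space: estimate an attribute direction from the two groups of codes and move a sample along it.

```python
import numpy as np

# Illustrative sketch of latent-space attribute editing by vector arithmetic.
# All data here is random; in practice the latent codes would come from a
# trained GAN (e.g. via an encoder or GAN inversion).

rng = np.random.default_rng(0)
latent_dim = 64

# Hypothetical latent codes for images with and without the attribute.
z_old_faces = rng.normal(size=(100, latent_dim))    # e.g. "older" faces
z_young_faces = rng.normal(size=(100, latent_dim))  # e.g. "younger" faces

# Attribute direction: difference of the group means, normalized.
age_direction = z_old_faces.mean(axis=0) - z_young_faces.mean(axis=0)
age_direction /= np.linalg.norm(age_direction)

# Edit a single latent code: push it toward "older" with strength alpha,
# then feed z_edited to the generator to render the edited image.
z = rng.normal(size=latent_dim)
alpha = 3.0
z_edited = z + alpha * age_direction

print(z_edited.shape)  # (64,)
```

The strength alpha controls how pronounced the attribute change is; too large a value typically pushes the code off the data manifold and degrades image quality.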
In our preliminary analysis, transforming synthetic sketches into car pictures with the supervised architecture Pix2Pix produced good images, but the model was not able to generalize to sketches made by other artists. The unsupervised architecture CycleGAN achieved better results than Pix2Pix, including better generalization. Unfortunately, this approach was not effective on the human-like cartoon task, and a discussion of the main reasons follows.
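The key difference is that CycleGAN dispenses with the aligned sketch/photo pairs Pix2Pix requires, replacing them with a cycle-consistency loss: two generators G: X -> Y and F: Y -> X are trained so that F(G(x)) ≈ x and G(F(y)) ≈ y. A minimal sketch of that loss, with trivial linear placeholders standing in for the real generator networks:

```python
import numpy as np

# Sketch of CycleGAN's cycle-consistency loss. The linear "generators"
# below are placeholders chosen to be exact inverses, not real networks.

def G(x):   # placeholder forward mapping, e.g. sketch -> photo
    return 2.0 * x + 1.0

def F(y):   # placeholder inverse mapping, e.g. photo -> sketch
    return (y - 1.0) / 2.0

def cycle_consistency_loss(x_batch, y_batch):
    # L1 reconstruction error in both directions, as in the CycleGAN paper;
    # during training this term is added to the adversarial losses.
    loss_x = np.abs(F(G(x_batch)) - x_batch).mean()
    loss_y = np.abs(G(F(y_batch)) - y_batch).mean()
    return loss_x + loss_y

x = np.random.rand(4, 8)   # toy batch from domain X
y = np.random.rand(4, 8)   # toy batch from domain Y
print(cycle_consistency_loss(x, y))  # near zero: F inverts G exactly here
```

With real networks the two mappings are only approximate inverses, and the cycle term is what keeps the translation content-preserving in the absence of paired supervision.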
For the intra-domain transformation task, we evaluated the use of the Pix2Pix framework as an autoencoder to create a vector representation of the image in the latent space. This training did not converge, and again the main hypotheses are discussed. However, with an unsupervised, progressive approach (ProGAN), we were able to represent the inputs as latent vectors. We now aim to understand whether it is possible to manipulate the image directly in the feature space by using the GAN Inversion technique.
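In its optimization-based form, GAN inversion searches for the latent vector z* that minimizes the reconstruction error ||g(z) - x||^2 between the generator output and a target image. The toy sketch below uses a frozen linear map as a stand-in for a real (e.g. ProGAN) generator, so the gradient can be written in closed form; with a real network one would backpropagate through the generator instead.

```python
import numpy as np

# Toy sketch of optimization-based GAN inversion with a linear "generator".
# The weights W are frozen, mimicking a trained generator.

rng = np.random.default_rng(1)
latent_dim, img_dim = 8, 32
W = rng.normal(size=(img_dim, latent_dim))  # frozen "generator" weights

def g(z):
    return W @ z

# Build a target image from a known latent code so recovery can be checked.
z_true = rng.normal(size=latent_dim)
x_target = g(z_true)

# Gradient descent on 0.5 * ||g(z) - x_target||^2; its gradient w.r.t. z
# for the linear map is W^T (W z - x_target).
z = np.zeros(latent_dim)
lr = 0.01
for _ in range(2000):
    residual = g(z) - x_target
    z -= lr * (W.T @ residual)

print(np.allclose(g(z), x_target, atol=1e-3))  # True: target reconstructed
```

Once a faithful z* is found, the attribute-direction edits described earlier can be applied to it and the result re-rendered, which is precisely the manipulation in feature space we intend to investigate.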