Have you ever looked at yourself in the mirror and wished you could accentuate, reduce, eliminate, or change some part of yourself? We all have. How about when looking at one of the many selfies you’ve taken? Don’t you wish you could just tweak it a little bit? Some days, I bet, you even wish you could go further than that and change yourself dramatically. Researchers at Facebook have just given us the ability to do that to our pictures.
Guillaume Lample and company have created a deep neural network (a class of artificially intelligent models) that can modify photos along particular parameters. That is, they can take a photograph of a person’s face, manipulate its gender, age, or expression, and turn that person into something like a twin: someone who looks very similar but has a few key features changed, accentuated, or reversed.
What if you could do this?
So how can we manipulate photographs?
The team used an encoder-decoder architecture trained to identify particular features of a person in an image: wavy hair, glasses, old or young, male or female. The model can pick out how strongly an attribute is present in a particular photo. With this information, the researchers could dial up one aspect while dialing back another; they could dial up the femininity present in an image, for example.
The first part of the model (the encoder) takes an input image (x) and maps it to a latent (hidden) representation (z). The second part (the decoder) reconstructs the image given that latent representation and the desired attributes (z, y). They call this architecture Fader Networks as a nod to the sliders (faders) on an audio mixing console: you can dial different attributes up or down.
Fader Networks can generate different realistic versions of images by modifying attributes such as gender or age group. They can swap multiple attributes at a time, and continuously interpolate between each attribute value.
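The encode/decode split and the attribute “slider” can be sketched in a few lines. Everything here is a toy stand-in — random linear maps and made-up dimensions, not the paper’s convolutional networks — but the interface is the same: the encoder sees only the image, and the attributes enter at the decoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (hypothetical); the real model uses conv nets on full images.
IMG_DIM, LATENT_DIM, N_ATTRS = 64, 16, 2

W_enc = rng.normal(scale=0.1, size=(LATENT_DIM, IMG_DIM))
W_dec = rng.normal(scale=0.1, size=(IMG_DIM, LATENT_DIM + N_ATTRS))

def encode(x):
    """E(x) -> z: a latent code trained (in the paper) to be attribute-invariant."""
    return np.tanh(W_enc @ x)

def decode(z, y):
    """D(z, y) -> reconstructed image, conditioned on the attribute vector y."""
    return W_dec @ np.concatenate([z, y])

x = rng.normal(size=IMG_DIM)  # a flattened "photo"
z = encode(x)

# The fader: sweep one attribute (say, glasses) from 0 to 1 while holding
# the other fixed, producing a continuous interpolation between versions.
fades = [decode(z, np.array([alpha, 0.0])) for alpha in (0.0, 0.5, 1.0)]
```

Because the decoder takes y as an explicit input, swapping or interpolating attributes at test time is just a matter of changing y; no pixel of the original image is edited directly.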
The data that makes this possible
These researchers were able to work with a large database of celebrity images.
CelebFaces Attributes Dataset (CelebA) is a large-scale face attributes dataset with more than 200K celebrity images, each with 40 attribute annotations. The images in this dataset cover large pose variations and cluttered backgrounds. CelebA has great diversity, large quantities, and rich annotations, including
- 10,177 identities,
- 202,599 face images, and
- 5 landmark locations and 40 binary attribute annotations per image.
This rich dataset can be used to help models identify faces, attributes of faces, and highlight parts of individual faces.
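Those binary annotations ship as a plain-text file (`list_attr_celeba.txt`): a count line, a line of the 40 attribute names, then one row of ±1 values per image. A minimal parser, run here on a made-up three-attribute excerpt rather than the real 40-column file:

```python
def parse_celeba_attrs(text):
    """Parse CelebA-style attribute annotations into {filename: {attr: bool}}."""
    lines = text.strip().splitlines()
    n_images = int(lines[0])       # header line 1: number of annotated images
    attr_names = lines[1].split()  # header line 2: attribute names
    table = {}
    for row in lines[2:]:
        fields = row.split()
        fname, values = fields[0], fields[1:]
        # CelebA encodes each attribute as 1 (present) or -1 (absent).
        table[fname] = {a: int(v) == 1 for a, v in zip(attr_names, values)}
    assert len(table) == n_images
    return table

# Made-up excerpt with three of the forty attributes, for illustration only.
sample = """\
2
Eyeglasses Male Smiling
000001.jpg -1  1 -1
000002.jpg  1 -1  1
"""

attrs = parse_celeba_attrs(sample)
print(attrs["000002.jpg"]["Eyeglasses"])  # -> True
```

These per-image booleans are exactly the attribute vector y that a model like Fader Networks conditions on during training.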
The results are stunning
Enough with the description of what they did; show us. To understand how incredible this really is, let’s upload one of our selfies. There.
Ok, we’re a young blonde woman. What if we want to see what we look like with glasses…
Neat. We can play with age…
Woah. Ok, we’re smiling too much; let’s close that up.
Incredible! Note that no one physically photoshopped this picture of us. The model learned what masculinity looks like, and we dialed that up in our picture. The team has a range of other attributes and expressions, which you can see below.
The faces across the top row are originals. You can see they represent a mix of men and women, old and young. Down the vertical side of this image are different attributes that are being applied to each. For example, each image on the second row is of someone having their gender reversed.
Again, THEY TOOK AN ORIGINAL IMAGE OF SOMEONE’S FACE AND GAVE THEM GLASSES THEY HAD NEVER WORN BEFORE AND THIS WAS NOT PHOTOSHOPPED.
Visions of the future
OK, that’s cool and all but after all these are still images. They couldn’t do this with video, could they? It’s already been done:
Here researchers used a relatively new class of models called Generative Adversarial Networks (GANs) to translate the style of a zebra onto a horse. They say they now have the ability to start…
…capturing special characteristics of one image collection and figuring out how these characteristics could be translated into the other image collection, all in the absence of any paired training examples
We don’t need a picture of a horse-zebra pair to generate an image of a zebra-striped horse. We only need the horse and the style of a zebra to apply the zebra-ness to the horse. In this way you can accomplish artistic style transfer, fill in rough sketches of images, and more.
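The trick that makes unpaired training work in CycleGAN-style models is a cycle-consistency loss: translate horse → zebra → horse and penalize any drift from the original. A toy numpy sketch, with an invertible matrix and its inverse standing in for the two learned generators (hypothetical; the real G and F are convolutional networks):

```python
import numpy as np

# Hypothetical stand-ins for the two generators:
# G maps horse -> zebra, F maps zebra -> horse. Using a matrix and its
# inverse makes the round trip exact, so the cycle loss is ~0 here.
A = np.array([[1.0, 0.2],
              [0.0, 1.0]])
G = lambda x: A @ x
F = lambda y: np.linalg.inv(A) @ y

def cycle_loss(x):
    """L1 penalty on x vs. F(G(x)); CycleGAN adds this to the adversarial losses."""
    return np.abs(x - F(G(x))).sum()

x = np.array([0.5, -1.0])  # a tiny "horse" feature vector
loss = cycle_loss(x)
```

During training this term is what forces G to change only the zebra-ness of the horse: any other change couldn’t be undone by F, and would be penalized.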
Take an item in your everyday life and mesh it together with something else. We have all imagined it. Now it can be done algorithmically.
Conclusion and other thoughts
Here are two things to consider: (1) there is an ever-increasing abundance of data; (2) these techniques are still in their infancy. The truth of (1) should be evident to anyone who is part of a social media platform. People willingly deliver their data to public places. We hope that someone will pick our face out of a crowd. We desire recognition. The second assertion is not as obvious. Researchers created the fundamental concepts behind these technologies years ago; it’s only recently that the data and the computing horsepower have become available to make applications like this a reality. But now that they are available, researchers are rushing in, trying new applications and building new theories. It’s a very exciting time for the artificial intelligence field generally.
What does this mean, where is it going?
First, we are all going to have to reevaluate our conception of truth. These techniques go beyond photoshopping; they are of a different kind entirely. You are not manipulating an underlying image so much as generating an entirely new image based on characteristics present in other images. You are able to generate a new version of reality.
The internet will become crowded with more, and faker, people. Even now you might complain that people are not presenting the most realistic versions of themselves. Once this technology ripens, it will be much more difficult to pick out what is genuine and what was produced by a model.
It also means that any picture or video of you that is floating around the internet could be manipulated. Your face can be transformed into a version of you that you may not recognize. This could happen with or without your permission. You could be placed into scenes and contexts that you never consented to be a part of — and the people you love might not be able to tell it’s a fake.
Enough with the negativity. The techniques that the researchers at Facebook used have broader applications across fields:
GANs have been used to produce photorealistic sample images for visualizing new interior and industrial designs, shoes, bags, and clothing items, or items for computer game scenes. Recently, GANs have modeled patterns of motion in video. They have also been used to reconstruct 3D models of objects from images and to improve astronomical images. In 2017, a fully convolutional feedforward GAN was used for image enhancement using automated texture synthesis in combination with a perceptual loss. The system focused on realistic textures rather than pixel accuracy, and the result was higher image quality at high magnification.
What do you think? Are you excited about this new world and its possible applications? Are you afraid of how it might be abused? Leave a comment below and let us know.