Artificial Intelligence Uses Audio to Recreate a Talking Obama


This should spark intrigue among conspiracy lovers across the world.

Researchers from the University of Washington have created an artificial intelligence that produces photorealistic video footage of Barack Obama speaking. The system was fed many hours of audio and video footage of the former president speaking or making conversation, and it matches that audio to a set of digital lips.

The three researchers will present their findings at SIGGRAPH 2017, an annual conference on computer graphics and interactive techniques. But since the paper is publicly available, we can already take a look at the production process. First, the recurrent neural network receives input audio taken from online video footage. It then turns that audio into mouth movements and places them onto a textured mouth. After that, the neural network makes sure that head movements and lip-syncing are matched, based on the audio feed. Less audio means less movement, and vice versa.

At times you can notice clear discrepancies between the original video footage and the footage that results from the neural network’s interpretation of the audio. But it’s clear that this is a huge leap forward in terms of creating realistic footage based on audio.

For now, the technology still comes with some limitations. The footage requires a target video, meaning that the neural network takes the positions of certain facial features from another clip. However, don’t mistake this for simply manipulating the original footage. The target video can be a completely different clip, so using audio from another clip will still result in a photorealistic look. Recognizing where parts of the face are located allows the neural network to more easily manipulate the relevant parts of the video. An example of what happens when this goes wrong can be found in the research group’s paper, where a failure to properly match the position of the lower half of the face to the target video resulted in a double chin.

Provided that the technology becomes even better, which it undoubtedly will considering that it is still in its early stages, there will be some interesting ethical debates about what it means to appear in video footage in the future.