Over the past few days, a software package called Deep-Live-Cam has been going viral on social media because it can take the face of a person extracted from a single photo and apply it to a live webcam video source while following pose, lighting, and expressions performed by the person on the webcam. While the results aren’t perfect, the software shows how quickly the tech is developing—and how the capability to deceive others remotely is getting dramatically easier over time.
The Deep-Live-Cam software project has been in the works since late last year, but it drew wide attention only recently, as example videos that show a person imitating Elon Musk and Republican Vice Presidential candidate J.D. Vance (among others) in real time began making the rounds online. The avalanche of attention briefly made the open source project leap to No. 1 on GitHub’s trending repositories list (it’s at No. 4 as of this writing), where it is available for download for free.
“Weird how all the major innovations coming out of tech lately are under the Fraud skill tree,” wrote illustrator Corey Brickley in an X thread reacting to an example video of Deep-Live-Cam in action. In another post, he wrote, “Nice remember to establish code words with your parents everyone,” referring to the potential for similar tools to be used for remote deception—and the concept of using a safe word, shared among friends and family, to establish your true identity.
Face-swapping technology is not new. The term “deepfake” itself originated in 2017 from a Reddit user called “deepfakes” (combining the terms “deep learning” and “fakes”), who posted pornography that swapped a performer’s face with the face of a celebrity. At that time, the technology was expensive and slow and did not operate in real time. However, due to projects like Deep-Live-Cam, it’s getting easier for anyone to use this technology at home with a regular PC and free software.
The dangers of deepfakes aren’t new, either. In February, we covered an alleged heist in Hong Kong where someone impersonated a company’s CFO over a video call and walked off with over $25 million. Audio deepfakes have led to other financial fraud or extortion schemes. We might expect instances of remote video fraud to increase with easily available real-time deepfake software, and it’s not just celebrities or politicians who might be affected.
Using face-swapping software, someone could take a photo of you from social media and impersonate you to someone not entirely familiar with how you look and act—given the current need to imitate similar mannerisms, voice, hair, clothing, and body structure. Techniques to clone those aspects of appearance and voice also exist (using voice cloning and video image-to-image AI synthesis) but have not yet reached reliable photorealistic real-time implementations. But given time, that technology will likely also become readily available and easy to use.
How does it work?
Like many open source GitHub projects, Deep-Live-Cam wraps together several existing software packages under a new interface (and is itself a fork of an earlier project called “roop”). It first detects faces in both the source and target images (such as a frame of live video). It then uses a pre-trained AI model called “inswapper” to perform the actual face swap and another model called GFPGAN to improve the quality of the swapped faces by enhancing details and correcting artifacts that occur during the face-swapping process.
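To make the order of those stages concrete, here is a minimal Python sketch of the detect, swap, and enhance flow for a single frame. It is not Deep-Live-Cam's actual code: the function swap_frame, the stage signatures, and the stub stages are all hypothetical stand-ins, and plain dictionaries take the place of real images. In a real tool, the detect, swap, and enhance slots would be filled by a face detector, the inswapper model, and GFPGAN, respectively.

```python
from typing import Any, Callable, Optional

# Hypothetical stage signatures (assumptions for illustration only).
Detector = Callable[[Any], Optional[Any]]  # image -> face, or None if no face found
Swapper = Callable[[Any, Any, Any], Any]   # (frame, target_face, source_face) -> frame
Enhancer = Callable[[Any], Any]            # frame -> cleaned-up frame


def swap_frame(source_image: Any, target_frame: Any,
               detect: Detector, swap: Swapper, enhance: Enhancer) -> Any:
    """Run detect -> swap -> enhance on one frame."""
    source_face = detect(source_image)
    target_face = detect(target_frame)
    if source_face is None or target_face is None:
        # No face in the photo or the frame: pass the frame through untouched.
        return target_frame
    swapped = swap(target_frame, target_face, source_face)
    return enhance(swapped)


# Stub stages that operate on dictionaries instead of pixels.
def stub_detect(image: dict) -> Optional[str]:
    return image.get("face")


def stub_swap(frame: dict, target_face: str, source_face: str) -> dict:
    # Replace the face but keep the pose of the live frame.
    return {**frame, "face": source_face}


def stub_enhance(frame: dict) -> dict:
    return {**frame, "enhanced": True}


photo = {"face": "person A", "pose": "neutral"}
webcam_frame = {"face": "person B", "pose": "smiling"}
result = swap_frame(photo, webcam_frame, stub_detect, stub_swap, stub_enhance)

empty_frame = {"pose": "none"}  # a frame with no detectable face
passthrough = swap_frame(photo, empty_frame, stub_detect, stub_swap, stub_enhance)

print(result)
print(passthrough)
```

In a live setting, a loop would call something like swap_frame once per webcam frame, which is why the speed of each stage matters for real-time use.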
The inswapper model, developed by a project called InsightFace, can guess what a person (in a provided photo) might look like using different expressions and from different angles because it was trained on a vast dataset containing millions of facial images of thousands of individuals captured from various angles, under different lighting conditions, and with diverse expressions.
During training, the neural network underlying the inswapper model developed an “understanding” of facial structures and their dynamics under various conditions, including learning the ability to infer the three-dimensional structure of a face from a two-dimensional image. It also became capable of separating identity-specific features, which remain constant across different images of the same person, from pose-specific features that change with angle and expression. This separation allows the model to generate new face images that combine the identity of one face with the pose, expression, and lighting of another.
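As a toy illustration of that separation, the Python sketch below represents a face as two lists of numbers: one for identity and one for pose, expression, and lighting. This is a deliberate simplification and not the inswapper architecture. The FaceCode class, the recombine function, and the example values are all hypothetical, and a real model learns its internal representations from data rather than storing them in hand-written fields. Cosine similarity is used here only as a common way to compare two feature vectors.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FaceCode:
    """Hypothetical two-part description of a face (illustration only)."""
    identity: Tuple[float, ...]    # stays constant across photos of one person
    attributes: Tuple[float, ...]  # pose, expression, lighting; changes per frame


def cosine_similarity(a: Tuple[float, ...], b: Tuple[float, ...]) -> float:
    """Return 1.0 for vectors pointing the same way, lower for different ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def recombine(source: FaceCode, target: FaceCode) -> FaceCode:
    """Take identity from the source photo and everything else from the target."""
    return FaceCode(identity=source.identity, attributes=target.attributes)


# Made-up numbers standing in for learned features.
source = FaceCode(identity=(0.9, 0.1, 0.4), attributes=(0.0, 0.0))
target = FaceCode(identity=(0.2, 0.8, 0.5), attributes=(0.7, -0.3))

swapped = recombine(source, target)

# The swapped face matches the source's identity more closely than the target's.
print(cosine_similarity(swapped.identity, source.identity))
print(cosine_similarity(swapped.identity, target.identity))
```

The point of the sketch is only the recombination step: identity comes from the single photo, while pose, expression, and lighting come from each live frame.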
Deep-Live-Cam is far from the only face-swapping software project out there. Another GitHub project, called facefusion, uses the same face-swapping AI model with a different interface. Most of these projects rely heavily on a nested web of Python and deep learning libraries like PyTorch, so Deep-Live-Cam isn’t as easy as a one-click install yet. But this kind of face-swapping capability will likely become easier to install over time and improve in quality as people iterate and build on each other’s work in the open source AI development space.