Adil Khan 9 months ago
AdiKhanOfficial #FYP Ideas

Animazing


Project Title

Animazing

Project Area of Specialization

Artificial Intelligence

Project Summary

Animazing aims to create deepfake videos. Deepfakes are synthetically generated videos in which the original face of an actor is replaced by the face of the user. The user's face is taken as input and is called the source face. The video in which the face swap is performed is provided by the system and is called the destination video. The user will not have the option to upload a destination video, because arbitrary videos can have legal and ethical implications. Furthermore, the best results come from videos in which the actors look directly into the camera and do not move around much. The destination videos are therefore chosen after careful research.

Currently, most existing face-swapping software works only on images or single-face videos. We aim to swap two input faces with two characters in the destination video, and the user will be able to choose which input face is swapped with which destination face. This is the final goal of Animazing: to create deepfakes in videos with two faces. Once that is done, the project can be extended to face swapping in videos with multiple faces.

Cutting out the user's face and pasting it on top of the actor's face is not the only objective of this project. The actor's original expressions (including lip and eye movements) and head movements (turning left and right, etc.) should be preserved. The original soundtrack and background of the video should also stay the same. Only the actor's face will be swapped; the rest of the body, including the hair, will remain unchanged.

Project Objectives

The objective of Animazing is to be a user-friendly and interactive web application that will:

  • Replace the face of a character in a video with the user’s face
  • Keep the background and the body of the destination video character intact
  • Keep the head movements and facial expressions of the destination video character intact
  • Keep the soundtrack and background of the destination video intact
  • Create deepfake for videos with a single character
  • Create deepfake for videos with two characters
  • Be easy to use and intuitive
  • Allow users to generate and download their deepfake video in only a few hours

Project Implementation Method

Animazing uses an existing system, DeepFaceLab (DFL), for face swapping. DFL is a tool that allows expert users to swap faces in videos. It was researched extensively, and understanding and implementing DFL was the major task in FYP-I. DFL requires GPU resources, which is why this project runs on Google Colab. With the help of DFL, we have achieved face swapping in destination videos with a single face. However, for destination videos with two faces, DFL produces poor results even while taking twice as long. The results of our implementation of DFL can be viewed here:

https://drive.google.com/drive/folders/1DsXWdl6v80GfcRw-XG4c8MxiUarWhmIt?usp=sharing

The fact that DFL produces acceptable results for single-face videos will be used to improve the results for two-face videos. The strategy is essentially to run DFL twice. Using face detection, the user will select a face, A, to swap with the first source face. The other face, B, in the destination video will temporarily be covered by a black rectangle. Since DFL will then detect only face A, the results will be satisfactory. Next, the swapped face A will be covered by a black rectangle so that DFL detects only face B. A second input face will be taken from the user and swapped with destination face B. Finally, the black rectangle over face A will be removed and both swapped faces will be displayed. This process takes twice as long as a single-face video, but the results are much better than running DFL once over both faces.
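The masking step in this two-pass strategy can be sketched in a few lines of Python/NumPy. This is an illustration, not DFL code: the face bounding boxes are assumed to come from a separate detector (DFL ships its own), and `mask_other_faces` is a hypothetical helper name.

```python
import numpy as np

def mask_other_faces(frame, boxes, keep_index):
    """Return a copy of the frame in which every detected face except
    boxes[keep_index] is covered by a black rectangle, so that a
    single-face tool such as DFL sees only one face."""
    out = frame.copy()
    for i, (x, y, w, h) in enumerate(boxes):
        if i != keep_index:
            out[y:y + h, x:x + w] = 0  # black rectangle over the other face
    return out

# Usage: a dummy white 100x100 frame with two face boxes (x, y, w, h)
frame = np.full((100, 100, 3), 255, dtype=np.uint8)
boxes = [(10, 10, 20, 20), (60, 60, 20, 20)]
first_pass = mask_other_faces(frame, boxes, keep_index=0)   # face B hidden
second_pass = mask_other_faces(frame, boxes, keep_index=1)  # face A hidden
```

Running DFL on `first_pass` frames swaps face A only; repeating on the merged output with `keep_index=1` handles face B.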

The most important modules of DFL are as follows:

  • Preprocessing: for each destination and source video, the frames are extracted and a face detection algorithm is applied to each frame. If no face is found, the frame is skipped; if a face is found, the face region is cropped and saved. Preprocessing two 60-second videos (destination and source) takes about 20 minutes.
  • Training: training is a time-consuming process that takes a few hours. For a 60-second destination video, about 3 hours of training produces satisfactory results. The trained model is saved every hour, so training can be stopped and resumed at any time. The trainer in DFL is essentially a generator: it takes images of the source and destination faces in different expressions and head poses, and for each expression/head pose of the destination face it finds the corresponding source face and generates a new face. The new face has the identity of the source face and the expression/head pose of the destination face.
  • Merging: once the user is satisfied with the training, they merge the newly generated face with the background in each frame. All frames are then joined together to produce the deepfake video.
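The preprocessing stage above can be sketched with two small helpers. This is a hedged illustration rather than DFL's actual code: `ffmpeg_extract_cmd` builds a standard ffmpeg command line for dumping frames, and `detect_face` is a placeholder for whatever face detector is plugged in.

```python
def ffmpeg_extract_cmd(video_path, out_dir, fps=30):
    """Build the ffmpeg command that dumps one PNG per frame,
    i.e. the frame-extraction step of preprocessing."""
    return ["ffmpeg", "-i", video_path,
            "-vf", f"fps={fps}",
            f"{out_dir}/frame_%05d.png"]

def keep_frames_with_faces(frame_paths, detect_face):
    """Apply a face detector to each frame and skip frames where no
    face is found, mirroring DFL's extraction behaviour."""
    return [p for p in frame_paths if detect_face(p) is not None]

# Usage with a stub detector that fails to find a face in f2.png only
cmd = ffmpeg_extract_cmd("dst.mp4", "dst_frames")
kept = keep_frames_with_faces(
    ["f1.png", "f2.png", "f3.png"],
    lambda p: (0, 0, 10, 10) if p != "f2.png" else None)
```

In the real pipeline the kept face regions would then be cropped and saved for the training stage.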

Benefits of the Project

Animazing will mainly be an entertainment application, as it will allow people to see themselves as their favorite characters in live-action mini-clips. Anyone will be able to generate their own deepfake in just a few hours without any background in artificial intelligence or image manipulation. That is the real benefit of this software, because deepfake generation is not an easy task even for people with such a background. No training will be required to use the Animazing web application, because it will have a simple, intuitive, step-by-step interface.

Furthermore, the software and documentation will be open source, which means anyone can learn the process by which a deepfake is generated. Since the process is explained in detail and in simple words in the documentation, it can be of great benefit to students and academics working in the area of deepfake generation.

Technical Details of Final Deliverable

The final product of this FYP is a web-based application with which the user will interact. The user will upload the input face(s), choose a destination video, and start the deepfake generation process. At the end, the user can download the resulting deepfake video as well.

The web application will be built with Anvil, a platform for building and hosting full-stack web applications. The backend of this web application is essentially a Google Colab notebook, and an Anvil application will be connected to this notebook. However, the application will be live only as long as the corresponding Colab notebook is connected to a GPU; once the GPU disconnects or the notebook is closed, the application will stop working.
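The Colab-side backend can be sketched using Anvil's Uplink, which is the mechanism an Anvil app uses to call into external Python code such as a notebook. Only the `anvil.server` calls (`connect`, `callable`, `wait_forever`) are the real Uplink API; the function name and body are placeholders, and the package is installed in Colab with `pip install anvil-uplink`.

```python
def start_backend(uplink_key):
    """Connect this Colab notebook to the Anvil app as its backend.
    Runs until the notebook (and its GPU session) disconnects."""
    import anvil.server  # from the anvil-uplink package

    @anvil.server.callable
    def generate_deepfake(source_face_blob, destination_video_id):
        # Placeholder: run the DFL preprocess/train/merge pipeline here
        # and return the finished video for the user to download.
        raise NotImplementedError

    anvil.server.connect(uplink_key)  # key from the Anvil app's Uplink settings
    anvil.server.wait_forever()       # keep serving until the session ends
```

Because `wait_forever` blocks inside the notebook session, the app goes offline exactly when the Colab GPU session ends, as described above.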

Even if only the project team had access to a GPU, that would not be enough, because then only the team could use the software. Google Colab provides everyone with free GPU access, which is why it is the only viable option for this project. However, Colab is not an unlimited resource: free GPU access can be disconnected at any time, and a single session can run for at most 12 hours.

However, the paid version of Google Colab provides benefits that could ease the development of this project. With Colab Pro:

  • A notebook can stay connected for up to 24 hours instead of 12 hours
  • Idle timeouts (random disconnects) are less frequent than in the free version
  • Almost 5 GB more GPU memory is provided, which means the neural network training process should be faster
  • Twice the RAM is provided, which may be useful given that working with videos is a resource-intensive task

Final Deliverable of the Project

Software System

Core Industry

Education

Other Industries

Media

Core Technology

Artificial Intelligence(AI)

Other Technologies

Others

Sustainable Development Goals

Industry, Innovation and Infrastructure

Required Resources

Item Name                      Type           No. of Units   Per Unit Cost (in Rs)   Total (in Rs)
Google Colab Pro Subscription  Miscellaneous  5              1600                    8000
Total (in Rs)                                                                        8000
If you need this project, please contact me on contact@adikhanofficial.com