To get started with LipSick on Windows, follow these steps to set up your environment. This branch has been tested with Anaconda using Python 3.10 and CUDA 11.6 & CUDA 11.8, with only 4 GB of VRAM. Using a different CUDA version can cause speed issues.
See the other branches for Linux, HuggingFace GPU / CPU, or Google Colab.
Install
- Clone the repository:

```bash
git clone https://github.com/Inferencer/LipSick.git
cd LipSick
```

- Create and activate the Anaconda environment:

```bash
conda env create -f environment.yml
conda activate LipSick
```
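As an optional sanity check (not part of the repo), you can confirm from inside the activated environment that PyTorch sees your GPU and was built against one of the tested CUDA versions:

```python
# Optional sanity check: confirm the activated environment sees the GPU
# and that PyTorch was built against one of the tested CUDA versions.
import torch

print("PyTorch:", torch.__version__)
print("CUDA build:", torch.version.cuda)          # expect 11.6 or 11.8 per the note above
print("GPU available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")  # 4 GB is enough
```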
Download Links
Please download pretrained_lipsick.pth using this link and place the file in the folder ./asserts.
Then download output_graph.pb using this link and place the file in the same folder.
Finally, download shape_predictor_68_face_landmarks.dat using this link and place the file in the folder ./models.

The expected folder layout is shown below:
```
.
├── ...
├── asserts
│   ├── examples                # A place to store inputs if not using the Gradio UI
│   ├── inference_result        # Results will be saved to this folder
│   ├── output_graph.pb         # The DeepSpeech model you manually download and place here
│   └── pretrained_lipsick.pth  # The pre-trained model you manually download and place here
│
├── models
│   ├── Discriminator.py
│   ├── LipSick.py
│   ├── shape_predictor_68_face_landmarks.dat  # The Dlib landmark-tracking model you manually download and place here
│   ├── Syncnet.py
│   └── VGG19.py
└── ...
```
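Before launching, a quick way to confirm the three downloads landed in the right places is a short pre-flight check like this (a convenience sketch, not part of the repo):

```python
# Pre-flight check: verify the manually downloaded model files are in place.
from pathlib import Path

required = [
    Path("./asserts/output_graph.pb"),
    Path("./asserts/pretrained_lipsick.pth"),
    Path("./models/shape_predictor_68_face_landmarks.dat"),
]

for f in required:
    print(f"{'OK' if f.is_file() else 'MISSING':8} {f}")
```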
- Run the application:

```bash
python app.py
```

Or use the new autorun tool by double-clicking run_lipsick.bat.
This will launch a Gradio interface where you can upload your video and audio files to process them with LipSick.
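For a sense of how such a UI is wired up, here is a minimal Gradio sketch; it is illustrative only, and `run_inference` is a hypothetical stand-in for LipSick's actual inference entry point in app.py:

```python
# Minimal, hypothetical sketch of a Gradio lip-sync UI; the real app.py differs.
import gradio as gr

def run_inference(video_path: str, audio_path: str) -> str:
    # Stand-in: the real pipeline would lip-sync `video_path` to `audio_path`
    # and return the result saved under ./asserts/inference_result.
    return video_path

demo = gr.Interface(
    fn=run_inference,
    inputs=[gr.Video(label="Input video"), gr.Audio(label="Driving audio", type="filepath")],
    outputs=gr.Video(label="Lip-synced result"),
    title="LipSick",
)

if __name__ == "__main__":
    demo.launch(inbrowser=True)  # open the UI in the default browser
```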
To-Do List
- Add support for macOS.
- Add upscale reference frames with masking.
- Add alternative option for face tracking model SFD (likely best results, but slower than Dlib).
- Examine CPU speed upgrades.
- Reintroduce persistent folders for frame extraction as an option, with existing-frame checks for faster extraction on commonly used videos. 😷
- Provide a HuggingFace Space CPU (free usage but slower). 😷
- Release a tutorial on manual masking using DaVinci. 😷
- Image to MP4 conversion so a single image can be used as input.
- Automatic audio conversion to WAV regardless of input audio format. 🤒
- Clean README.md & provide command line inference.
- Remove the 25 fps input video requirement.
- Upload cherry-picked input footage for user download & use.
- Create a Discord to share results, get faster help, make suggestions, & share cherry-picked input footage.
- Multi-face lipsync on large scene changes / cut scenes.
- Multi-face lipsync support for more than one person in a video.
- Skippable frames when no face is detected.
- Close the mouth fully on silence.
- Add visualization for custom reference frames & print correct values. 🤮
- Add auto masking to remove the common bounding box around mouths. 🤢
- Provide a Google Colab .ipynb. 🤮
- Add support for Linux. 🤢
- Looped original video generated as an option for faster manual masking. 🤮
- Upload a montage of results footage to GitHub so new users can see what LipSick is capable of. 🤮
- Add a custom reference frame feature. 🤮
- Auto git pull updater .bat file. 🤢
- Add auto persistent crop_radius to prevent mask flickering. 🤮
- Auto run the UI with a .bat file. 🤮
- Auto open the UI in the default browser. 🤮
- Add a custom crop radius feature to stop flickering (Example). 🤮
- Provide a HuggingFace Space GPU. 🤮
- Remove warning messages in the command prompt that don't affect performance. 🤢
- Move frame extraction to temp folders. 🤮
- Stop results with the same input video name from overwriting existing results. 🤮
- Remove the OpenFace CSV requirement. 🤮
- Detect accepted media input formats only. 🤮
- Upgrade to Python 3.10. 🤮
- Add UI. 🤮
- 🤮 = Completed & published
- 🤢 = Completed & published but requires community testing
- 😷 = Tested & working but not published yet
- 🤒 = Tested but not ready for public use
Acknowledgements
This project, LipSick, is heavily inspired by and based on DINet; specific components are borrowed and adapted to enhance LipSick.
We express our gratitude to the authors and contributors of DINet for their open-source code and documentation.