
A Real-Time Video Portrait Matting Method Based on Large Kernel Convolution

[Teaser figure]

English | 中文

The GitHub repository for the paper A Real-Time Video Portrait Matting Method Based on Large-Kernel Convolution. LKM is designed for high-precision, real-time video portrait matting. The paper proposes a video portrait matting network that combines large-kernel convolutions, wavelet transforms, and re-parameterization. Large-kernel convolutions strengthen the encoding and decoding of image features; the wavelet transform passes image information to the decoder, supplying it with additional useful detail; and re-parameterization fuses multiple parallel large-kernel convolution branches into a single branch, reducing the computational burden and speeding up inference. The model processes HD video at 103 fps and 4K video at 94 fps on an NVIDIA RTX 3090 GPU, enabling real-time video matting.
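
To illustrate the re-parameterization step, the sketch below fuses a parallel small-kernel branch and its BatchNorm layers into a single large-kernel convolution at inference time. It is a minimal sketch of the general technique, not the repository's implementation; the module, kernel sizes, and names are hypothetical.

# Hypothetical sketch of structural re-parameterization: a large-kernel conv and a
# parallel small-kernel conv, each followed by BatchNorm, are fused into one
# equivalent large-kernel conv for faster inference. Fuse after training, in eval mode.
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_conv_bn(conv, bn):
    # Fold BatchNorm running statistics into the convolution weight and bias.
    std = (bn.running_var + bn.eps).sqrt()
    weight = conv.weight * (bn.weight / std).reshape(-1, 1, 1, 1)
    bias = bn.bias - bn.running_mean * bn.weight / std
    return weight, bias

class RepLargeKernelConv(nn.Module):
    def __init__(self, channels, large_k=13, small_k=3):
        super().__init__()
        self.large = nn.Conv2d(channels, channels, large_k, padding=large_k // 2, bias=False)
        self.large_bn = nn.BatchNorm2d(channels)
        self.small = nn.Conv2d(channels, channels, small_k, padding=small_k // 2, bias=False)
        self.small_bn = nn.BatchNorm2d(channels)
        self.fused = None                    # filled in by reparameterize()

    def forward(self, x):
        if self.fused is not None:           # single-branch path after fusion
            return self.fused(x)
        return self.large_bn(self.large(x)) + self.small_bn(self.small(x))

    @torch.no_grad()
    def reparameterize(self):
        # Merge both branches into one large-kernel convolution.
        w_l, b_l = fuse_conv_bn(self.large, self.large_bn)
        w_s, b_s = fuse_conv_bn(self.small, self.small_bn)
        pad = (self.large.kernel_size[0] - self.small.kernel_size[0]) // 2
        w = w_l + F.pad(w_s, [pad] * 4)      # zero-pad small kernel to the large size
        self.fused = nn.Conv2d(w.shape[1], w.shape[0], w.shape[2],
                               padding=w.shape[2] // 2, bias=True)
        self.fused.weight.copy_(w)
        self.fused.bias.copy_(b_l + b_s)

Once reparameterize() has been called, inference runs through a single convolution, which is where the reduced computational burden and faster inference come from.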

News



Environment Setup

  1. Unzip the package and create a virtual environment:
unzip LKM-main.zip
cd LKM-main
conda create -n LKM python=3.7
conda activate LKM
  2. Install the PyTorch build that matches your CUDA version. Here we use CUDA 11.1:
pip install torch==1.10.1+cu111 torchvision==0.11.2+cu111 torchaudio==0.10.1 -f https://download.pytorch.org/whl/cu111/torch_stable.html
  3. Install the libraries required by UniRepLKNet:
pip install -U openmim
mim install mmcv-full==1.5.0
mim install mmsegmentation==0.27.0
pip install timm==0.6.11 mmdet==2.28.1
  4. Install the remaining dependencies:
pip install -r requirements.txt
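
After installation, a quick sanity check (a minimal sketch, assuming the environment above) confirms that PyTorch sees the GPU and that the pinned mm* packages import correctly:

python - <<'EOF'
import torch, mmcv, mmseg, mmdet, timm
print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("mmcv", mmcv.__version__, "| mmseg", mmseg.__version__, "| mmdet", mmdet.__version__)
EOF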

Training and Evaluation

Please refer to the training documentation to train and evaluate your own model.


Demo

video:

python inference.py \
  --variant unireplknet \
  --checkpoint "./pretrained/model.pth" \
  --device cuda \
  --input-source "input.mp4" \
  --output-type video \
  --output-composition "out_com.mp4" \
  --output-alpha "out_pha.mp4" \
  --output-foreground "out_fgr.mp4" \
  --output-video-mbps 4 \
  --seq-chunk 12
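
The alpha and foreground videos written by the command above can also be composited over a custom background offline. The snippet below is a minimal sketch using OpenCV, not part of the repository; the input file names match the command above, and background.jpg is a hypothetical background image.

# Composite: out = alpha * foreground + (1 - alpha) * background.
import cv2
import numpy as np

pha = cv2.VideoCapture("out_pha.mp4")   # alpha matte produced above
fgr = cv2.VideoCapture("out_fgr.mp4")   # foreground produced above
bg_img = cv2.imread("background.jpg")   # hypothetical background image

fps = pha.get(cv2.CAP_PROP_FPS)
w = int(pha.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(pha.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("custom_com.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
bg = cv2.resize(bg_img, (w, h)).astype(np.float32)

while True:
    ok_a, a = pha.read()
    ok_f, f = fgr.read()
    if not (ok_a and ok_f):
        break
    alpha = a.astype(np.float32) / 255.0            # 3-channel alpha in [0, 1]
    frame = alpha * f.astype(np.float32) + (1.0 - alpha) * bg
    out.write(frame.astype(np.uint8))

out.release()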

webcam:

python inference_wecab.py

Speed

Speed is measured with inference_speed_test.py for reference.

GPU        dType   HD (1920x1080)   4K (3840x2160)
RTX 3090   FP16    103 FPS          94 FPS
  • Note 1: HD uses downsample_ratio=0.25 and 4K uses downsample_ratio=0.125. All tests use batch size 1 and frame chunk 1.
  • Note 2: Following RVM, speed is measured as pure tensor throughput. Feeding frames directly from a camera may lower the measured speed, but inference remains real-time. Hardware video encoding and decoding could raise throughput further; see PyNvCodec for details.
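
The kind of tensor-throughput timing described in Note 2 looks roughly like the sketch below. It is a minimal sketch, not the repository's inference_speed_test.py; the small convolution stack is a stand-in for the LKM network, and the input shape follows the HD setting above.

# Rough tensor-throughput timing: push random HD frames through the network
# and report frames per second, excluding all video decoding and camera I/O.
import time
import torch
import torch.nn as nn

model = nn.Sequential(                      # stand-in for the matting network
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1),
).cuda().half().eval()

frames = 100
x = torch.randn(1, 3, 1080, 1920, device="cuda", dtype=torch.half)  # HD frame, batch size 1

with torch.no_grad():
    for _ in range(10):                     # warm-up iterations
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(frames):
        model(x)
    torch.cuda.synchronize()                # wait for queued GPU work to finish

print(f"{frames / (time.time() - start):.1f} FPS")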

