v0.2.0
floe committed Jan 4, 2021
1 parent 097cb3f commit fa37e39
Showing 3 changed files with 54 additions and 23 deletions.
75 changes: 53 additions & 22 deletions README.md
@@ -34,6 +34,10 @@ I've heard good things about this deep learning stuff, so let's try that. I firs

I had a look at the corresponding [Python example](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/examples/python/label_image.py), [C++ example](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/examples/label_image), and [Android example](https://github.com/tensorflow/examples/tree/master/lite/examples/image_segmentation/android), and based on those, I first cobbled together a [Python demo](https://github.com/floe/deepbacksub/blob/master/deepseg.py). That was running at about 2.5 FPS, which is really excruciatingly slow, so I built a [C++ version](https://github.com/floe/deepbacksub/blob/master/deepseg.cc) which manages 10 FPS without too much hand optimization. Good enough.

I've also tested a TFLite-converted version of the [Body-Pix model](https://blog.tensorflow.org/2019/11/updated-bodypix-2.html), but the results haven't been much different to DeepLab for this use case.

More recently, Google has released a model specifically trained for [person segmentation that's used in Google Meet](https://ai.googleblog.com/2020/10/background-features-in-google-meet.html). This has way better performance than DeepLab, both in terms of speed and of accuracy, so this is now the default. It needs one custom op from the MediaPipe framework, but that was quite easy to integrate. Thanks to @jiangjianping for pointing this out in the [corresponding issue](https://github.com/floe/deepbacksub/issues/28).

## Replace Background

This is basically one line of code with OpenCV: `bg.copyTo(raw,mask);` Told you that's the easy part.
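
As a rough illustration of what that one-liner does: OpenCV's `copyTo` with a mask copies only those source pixels where the mask is non-zero, which implies the mask marks the pixels to be replaced. The sketch below mimics this in plain Python on nested lists; the function name `copy_with_mask` and the tiny example images are made up for illustration and are not part of the project.

```python
def copy_with_mask(src, dst, mask):
    """Copy src pixels into dst wherever mask is non-zero (modifies dst in place)."""
    for y, mask_row in enumerate(mask):
        for x, m in enumerate(mask_row):
            if m:
                dst[y][x] = src[y][x]
    return dst


# Hypothetical 2x3 single-channel images: 0 = background image, 9 = camera frame.
bg = [[0, 0, 0], [0, 0, 0]]
raw = [[9, 9, 9], [9, 9, 9]]
# Non-zero mask entries mark the pixels that get replaced by the background.
mask = [[0, 0, 255], [255, 0, 255]]

result = copy_with_mask(bg, raw, mask)
```

Pixels with a zero mask entry keep their camera value; all others take the value from the background image.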
@@ -48,46 +52,73 @@ The dataflow through the whole program is roughly as follows:

- init
- load background.png, convert to YUYV
- load DeepLab v3+ network, initialize TFLite
- initialize TFLite, register custom op
- load Google Meet segmentation model
- setup V4L2 Loopback device (w,h,YUYV)
- loop
- grab raw YUYV image from camera
- extract square ROI in center
- downscale ROI to 257 x 257 (*)
- extract portrait ROI in center
- downscale ROI to 144 x 256 (*)
- convert to RGB (*)
- run DeepLab v3+
- convert result to binary mask for class "person"
- run Google Meet segmentation model
- convert result to binary mask using softmax
- denoise mask using erode/dilate
- upscale mask to raw image size
- copy background over raw image with mask (see above)
- `write()` data to virtual video device

(*) these are required input parameters for DeepLab v3+
(*) these are required input parameters for this model
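
Two of the loop steps above, extracting a centered ROI and turning the model output into a binary mask via softmax, can be sketched in plain Python. This is a minimal sketch under assumptions, not the project's code: it reads "144 x 256" as width x height (matching "portrait"), assumes the model emits a (background, person) score pair per pixel, and marks background pixels with 255 so the mask can drive the background copy. The helper names `center_roi` and `softmax_mask` are hypothetical.

```python
import math


def center_roi(frame_w, frame_h, model_w, model_h):
    """Largest centered rectangle in the frame with the model's aspect ratio."""
    roi_h = frame_h
    roi_w = round(frame_h * model_w / model_h)
    if roi_w > frame_w:
        # Frame is narrower than the model's aspect ratio: limit by width instead.
        roi_w = frame_w
        roi_h = round(frame_w * model_h / model_w)
    x = (frame_w - roi_w) // 2
    y = (frame_h - roi_h) // 2
    return x, y, roi_w, roi_h


def softmax_mask(scores, threshold=0.5):
    """Map rows of (background, person) scores to 255 (background) or 0 (person)."""
    mask = []
    for row in scores:
        mask_row = []
        for bg_score, person_score in row:
            # Numerically stable two-class softmax.
            top = max(bg_score, person_score)
            e_bg = math.exp(bg_score - top)
            e_person = math.exp(person_score - top)
            p_bg = e_bg / (e_bg + e_person)
            mask_row.append(255 if p_bg > threshold else 0)
        mask.append(mask_row)
    return mask


# ROI for the hardcoded 640x480 camera frame and an assumed 144x256 model input.
roi = center_roi(640, 480, 144, 256)
# One row with a clear background pixel and a clear person pixel.
mask = softmax_mask([[(2.0, 0.0), (0.0, 2.0)]])
```

The resulting mask would still need the erode/dilate denoising and the upscaling to the raw image size described in the list.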

## Requirements

Tested with the following dependencies:

- Ubuntu 20.04, x86-64
- Linux kernel 5.6 (stock package)
- OpenCV 4.2.0 (stock package)
- V4L2-Loopback 0.12.5 (stock package)
- Tensorflow Lite 2.4.0 (from [repo](https://github.com/tensorflow/tensorflow/tree/v2.4.0/tensorflow/lite))
- Ubuntu 18.04.5, x86-64
- Linux kernel 4.15 (stock package)
- OpenCV 3.2.0 (stock package)
- V4L2-Loopback 0.10.0 (stock package)
- Tensorflow Lite 2.1.0 (from [repo](https://github.com/tensorflow/tensorflow/tree/v2.1.0/tensorflow/lite))
- Ultra-short build guide for Tensorflow Lite C++ library: clone repo above, then...
- run `./tensorflow/lite/tools/make/download_dependencies.sh`
- run `./tensorflow/lite/tools/make/build_lib.sh`
- Linux kernel 4.15 (stock package)
- OpenCV 3.2.0 (stock package)
- V4L2-Loopback 0.10.0 (stock package)
- Tensorflow Lite 2.1.0 (from [repo](https://github.com/tensorflow/tensorflow/tree/v2.1.0/tensorflow/lite))

Tested with the following software:

- Firefox
- 74.0.1 (works)
- 84.0 (works)
- 76.0.1 (works)
- 74.0.1 (works)
- Skype
- 8.58.0.93 (works)
- 8.67.0.96 (works)
- 8.60.0.76 (works)
- guvcview 2.0.5 (works with parameter `-c read`)
- Microsoft Teams 1.3.00.5153 (works)
- Chrome 81.0.4044.138 (works)
- Zoom 5.0.403652.0509 (works - yes, I'm a hypocrite, I tested it with Zoom after all :-)

- 8.58.0.93 (works)
- guvcview
- 2.0.6 (works with parameter `-c read`)
- 2.0.5 (works with parameter `-c read`)
- Microsoft Teams
- 1.3.00.30857 (works)
- 1.3.00.5153 (works)
- Chrome
- 87.0.4280.88 (works)
- 81.0.4044.138 (works)
- Zoom - yes, I'm a hypocrite, I tested it with Zoom after all :-)
- 5.4.54779.1115 (works)
- 5.0.403652.0509 (works)

## Building

Install dependencies (`sudo apt install libopencv-dev build-essential v4l2loopback-dkms`).

Run `make` to build everything (should also clone and build Tensorflow Lite).

If the automatic clone and build of Tensorflow Lite doesn't work, do it manually:
- Clone the https://github.com/tensorflow/tensorflow/ repo into the `tensorflow/` folder
- Check out tag `v2.4.0`
- run `./tensorflow/lite/tools/make/download_dependencies.sh`
- run `./tensorflow/lite/tools/make/build_lib.sh`

## Usage

First, load the v4l2loopback module (extra settings needed to make Chrome work):
@@ -106,13 +137,13 @@ As usual: pull requests welcome.
- Resolution is currently hardcoded to 640x480 (lowest common denominator).
- Only works with Linux, because that's what I use.
- Needs a webcam that can produce raw YUYV data (but extending to the common YUV420 format should be trivial)
- CPU hog: maxes out two cores on my 2.7 GHz i5 machine for just VGA @ 10 FPS.
- Uses stock Deeplab v3+ network. Maybe re-training with only "person" and "background" classes could improve performance?

## Fixed

- Should probably do an erosion (+ dilation?) operation on the mask.
- Background image size needs to match camera resolution (see issue #1).
- CPU hog: maxes out two cores on my 2.7 GHz i5 machine for just VGA @ 10 FPS. Fixed via Google Meet segmentation model.
- Uses stock Deeplab v3+ network. Maybe re-training with only "person" and "background" classes could improve performance? Fixed via Google Meet segmentation model.

## Other links

2 changes: 1 addition & 1 deletion deepseg.cc
@@ -112,7 +112,7 @@ void *grab_thread(void *arg) {

int main(int argc, char* argv[]) {

printf("deepseg v0.1.0\n");
printf("deepseg v0.2.0\n");
printf("(c) 2020 by floe@butterbrot.org\n");
printf("https://github.com/floe/deepseg\n");

File renamed without changes.
