Does this work in Windows? #92
Hopefully someone will get time to port the core library to Windows, along with my OBS plugin that uses it; that is the reason we have split it out 😄
@phlash The core library part should mostly run on Windows as is (with maybe minor fixes). Where I currently see the biggest open question is the camera interface. Once we have image data for processing, we should be fine even on Windows, but pushing the processed image into some camera device may be tricky.
I've done single image replacement on Windows, although there seem to be issues with the models. Probably my mucking with the code to make it build on Windows. I'm trying to build the complete package on Windows now. I thought I had an issue with pthreads, but the issue is with the pthreadpool package. A different, now-edited version of this post was incorrect on that matter. I think there may be a versioning issue with TFLite: pthreadpool builds with a different version I have, but the CMake pull in this project fetches a version that doesn't build with Windows Visual Studio.
Thanks for digging in @OmarJay1! I presume you are attempting to build the complete package? This ticket (tensorflow/tensorflow#47166) seems to indicate TFLite builds work for x64 binaries from the v2.4.1-tagged Tensorflow source. I might have a go with a cross-compiler (mingw64) locally myself.
I got something to build and with the default settings it seems to work. It's very messy at this point; I'll try to clean it up a bit. Build-wise the main issues were pthreadpool, which TFLite needs, and pthreads, which deepseg.cc needs. I had to get the latest version of pthreadpool and build against that. For pthreads, I hacked in overrides to the functions using std::thread and std::mutex. I'm not sure what the best way to present a PR is. Right now it's a messy bunch of #if !_WINDOWS / #else / #endif. Pthreadpool would need to be changed in the CMake file. Here's what I have right now. Sorry for the formatting problems; I keep forgetting how to format in Markdown. I used the "insert code" feature, but it still seems to be hiding #include links.

#if !_WINDOWS
#include <unistd.h>
#else
// C:\temp\pthreadpool\include
// pthreadpool\Debug\pthreadpool.lib
#include <chrono>
#include <thread>
#include <io.h>
#include <mutex>
void usleep(int usec)
{
std::this_thread::sleep_for(std::chrono::microseconds(usec));
}
void pthread_mutex_lock(std::mutex *m)
{
m->lock();
}
void pthread_mutex_unlock(std::mutex* m)
{
m->unlock();
}
#define pthread_mutex_t std::mutex
#define pthread_t std::thread
#endif
#include <cstdio>
#include <chrono>
#include <string>
// additional standard headers used below (std::array, std::vector, std::find, strstr, expf, toupper)
#include <array>
#include <vector>
#include <algorithm>
#include <cstring>
#include <cmath>
#include <cctype>
#include "tensorflow/lite/interpreter.h"
#include "tensorflow/lite/kernels/register.h"
#include "tensorflow/lite/model.h"
#include "tensorflow/lite/optional_debug_tools.h"
#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/types_c.h>
#include <opencv2/videoio/videoio_c.h>
#if !_WINDOWS
#include "loopback.h"
#endif
#include "transpose_conv_bias.h"
int fourCcFromString(const std::string& in)
{
if (in.empty())
return 0;
if (in.size() <= 4)
{
// fourcc codes are up to 4 bytes long, right-space-padded and upper-case
// c.f. http://ffmpeg.org/doxygen/trunk/isom_8c-source.html and
// c.f. https://www.fourcc.org/codecs.php
std::array<uint8_t, 4> a = {' ', ' ', ' ', ' '};
for (size_t i = 0; i < in.size(); ++i)
a[i] = ::toupper(in[i]);
return cv::VideoWriter::fourcc(a[0], a[1], a[2], a[3]);
}
else if (in.size() == 8)
{
// Most people seem to agree on 0x47504A4D being the fourcc code of "MJPG", not the literal translation
// 0x4D4A5047. This is also what ffmpeg expects.
return std::stoi(in, nullptr, 16);
}
return 0;
}
// OpenCV helper functions
cv::Mat convert_rgb_to_yuyv( cv::Mat input ) {
cv::Mat tmp;
cv::cvtColor(input,tmp,CV_RGB2YUV);
std::vector<cv::Mat> yuv;
cv::split(tmp,yuv);
cv::Mat yuyv(tmp.rows, tmp.cols, CV_8UC2);
uint8_t* outdata = (uint8_t*)yuyv.data;
uint8_t* ydata = (uint8_t*)yuv[0].data;
uint8_t* udata = (uint8_t*)yuv[1].data;
uint8_t* vdata = (uint8_t*)yuv[2].data;
for (unsigned int i = 0; i < yuyv.total(); i += 2) {
uint8_t u = (uint8_t)(((int)udata[i]+(int)udata[i+1])/2);
uint8_t v = (uint8_t)(((int)vdata[i]+(int)vdata[i+1])/2);
outdata[2*i+0] = ydata[i+0];
outdata[2*i+1] = v;
outdata[2*i+2] = ydata[i+1];
outdata[2*i+3] = u;
}
return yuyv;
}
// Tensorflow Lite helper functions
using namespace tflite;
#define TFLITE_MINIMAL_CHECK(x) \
if (!(x)) { \
fprintf(stderr, "Error at %s:%d\n", __FILE__, __LINE__); \
exit(1); \
}
std::unique_ptr<Interpreter> interpreter;
cv::Mat getTensorMat(int tnum, int debug) {
TfLiteType t_type = interpreter->tensor(tnum)->type;
TFLITE_MINIMAL_CHECK(t_type == kTfLiteFloat32);
TfLiteIntArray* dims = interpreter->tensor(tnum)->dims;
if (debug) for (int i = 0; i < dims->size; i++) printf("tensor #%d: %d\n",tnum,dims->data[i]);
TFLITE_MINIMAL_CHECK(dims->data[0] == 1);
int h = dims->data[1];
int w = dims->data[2];
int c = dims->data[3];
float* p_data = interpreter->typed_tensor<float>(tnum);
TFLITE_MINIMAL_CHECK(p_data != nullptr);
return cv::Mat(h,w,CV_32FC(c),p_data);
}
// deeplabv3 classes
const std::vector<std::string> labels = { "background", "aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "dining table", "dog", "horse", "motorbike", "person", "potted plant", "sheep", "sofa", "train", "tv" };
// label number of "person" for DeepLab v3+ model
const size_t cnum = labels.size();
const size_t pers = std::distance(labels.begin(), std::find(labels.begin(),labels.end(),"person"));
// timing helpers
typedef std::chrono::high_resolution_clock::time_point timestamp_t;
typedef struct {
timestamp_t bootns;
timestamp_t lastns;
timestamp_t waitns;
timestamp_t lockns;
timestamp_t copyns;
timestamp_t openns;
timestamp_t tfltns;
timestamp_t maskns;
timestamp_t postns;
timestamp_t v4l2ns;
// these are already converted to ns
long grabns;
long retrns;
} timinginfo_t;
timestamp_t timestamp() {
return std::chrono::high_resolution_clock::now();
}
long diffnanosecs(timestamp_t t1, timestamp_t t2) {
return std::chrono::duration_cast<std::chrono::nanoseconds>(t1-t2).count();
}
// threaded capture shared state
typedef struct {
cv::VideoCapture *cap;
cv::Mat *grab;
cv::Mat *raw;
int64 cnt;
timinginfo_t *pti;
pthread_mutex_t lock;
} capinfo_t;
enum class modeltype_t {
Unknown,
BodyPix,
DeepLab,
GoogleMeetSegmentation,
MLKitSelfie,
};
struct normalization_t {
float scaling;
float offset;
};
typedef struct {
const char *modelname;
modeltype_t modeltype;
normalization_t norm;
size_t threads;
size_t width;
size_t height;
int debug;
std::unique_ptr<tflite::FlatBufferModel> model;
cv::Mat input;
cv::Mat output;
cv::Rect roidim;
cv::Mat mask;
cv::Mat mroi;
cv::Mat raw;
cv::Mat ofinal;
cv::Mat element;
float ratio;
} calcinfo_t;
// capture thread function
void *grab_thread(void *arg) {
capinfo_t *ci = (capinfo_t *)arg;
bool done = false;
// while we have a grab frame.. grab frames
while (!done) {
timestamp_t ts = timestamp();
ci->cap->grab();
long ns = diffnanosecs(timestamp(),ts);
pthread_mutex_lock(&ci->lock);
ci->pti->grabns = ns;
if (ci->grab!=NULL) {
ts = timestamp();
ci->cap->retrieve(*ci->grab);
ci->pti->retrns = diffnanosecs(timestamp(),ts);
} else {
done = true;
}
ci->cnt++;
pthread_mutex_unlock(&ci->lock);
}
return NULL;
}
modeltype_t get_modeltype(const char* modelname) {
if (strstr(modelname, "body-pix")) {
return modeltype_t::BodyPix;
}
else if (strstr(modelname, "deeplab")) {
return modeltype_t::DeepLab;
}
else if (strstr(modelname, "segm_")) {
return modeltype_t::GoogleMeetSegmentation;
}
else if (strstr(modelname, "selfie")) {
return modeltype_t::MLKitSelfie;
}
return modeltype_t::Unknown;
}
normalization_t get_normalization(modeltype_t type) {
// TODO: This should be read out from actual mode metadata instead
switch (type) {
case modeltype_t::DeepLab:
#if !_WINDOWS
return normalization_t{.scaling = 1/127.5, .offset = -1};
#else
{
normalization_t norm;
norm.scaling = 1 / 127.5;
norm.offset = -1;
return norm;
}
#endif
case modeltype_t::BodyPix:
case modeltype_t::GoogleMeetSegmentation:
case modeltype_t::MLKitSelfie:
case modeltype_t::Unknown:
default:
#if !_WINDOWS
return normalization_t{.scaling = 1/255.0, .offset = 0};
#else
{
normalization_t norm;
norm.scaling = 1 / 255.0;
norm.offset = 0;
return norm;
}
#endif
}
}
void init_tensorflow(calcinfo_t &info) {
// Load model
info.model = tflite::FlatBufferModel::BuildFromFile(info.modelname);
TFLITE_MINIMAL_CHECK(info.model != nullptr);
// Build the interpreter
tflite::ops::builtin::BuiltinOpResolver resolver;
// custom op for Google Meet network
resolver.AddCustom("Convolution2DTransposeBias", mediapipe::tflite_operations::RegisterConvolution2DTransposeBias());
InterpreterBuilder builder(*info.model, resolver);
builder(&interpreter);
TFLITE_MINIMAL_CHECK(interpreter != nullptr);
// Allocate tensor buffers.
TFLITE_MINIMAL_CHECK(interpreter->AllocateTensors() == kTfLiteOk);
// set interpreter params
interpreter->SetNumThreads(info.threads);
interpreter->SetAllowFp16PrecisionForFp32(true);
// get input and output tensor as cv::Mat
info.input = getTensorMat(interpreter->inputs ()[0],info.debug);
info.output = getTensorMat(interpreter->outputs()[0],info.debug);
info.ratio = (float)info.input.cols/(float) info.input.rows;
// initialize mask and square ROI in center
info.roidim = cv::Rect((info.width-info.height/info.ratio)/2,0,info.height/info.ratio,info.height);
info.mask = cv::Mat::ones(info.height,info.width,CV_8UC1);
info.mroi = info.mask(info.roidim);
// erosion/dilation element
info.element = cv::getStructuringElement( cv::MORPH_RECT, cv::Size(5,5) );
// create Mat for small mask
info.ofinal = cv::Mat(info.output.rows,info.output.cols,CV_8UC1);
}
void calc_mask(calcinfo_t &info, timinginfo_t &ti) {
// map ROI
cv::Mat roi = info.raw(info.roidim);
// resize ROI to input size
cv::Mat in_u8_bgr, in_u8_rgb;
cv::resize(roi,in_u8_bgr,cv::Size(info.input.cols,info.input.rows));
cv::cvtColor(in_u8_bgr,in_u8_rgb,CV_BGR2RGB);
// TODO: can convert directly to float?
// bilateral filter to reduce noise
if (1) {
cv::Mat filtered;
cv::bilateralFilter(in_u8_rgb,filtered,5,100.0,100.0);
in_u8_rgb = filtered;
}
// convert to float and normalize to values expected by model
in_u8_rgb.convertTo(info.input,CV_32FC3,info.norm.scaling,info.norm.offset);
ti.openns=timestamp();
// Run inference
TFLITE_MINIMAL_CHECK(interpreter->Invoke() == kTfLiteOk);
ti.tfltns=timestamp();
float* tmp = (float*)info.output.data;
uint8_t* out = (uint8_t*)info.ofinal.data;
switch (info.modeltype) {
case modeltype_t::DeepLab:
// find class with maximum probability
for (unsigned int n = 0; n < info.output.total(); n++) {
float maxval = -10000; size_t maxpos = 0;
for (size_t i = 0; i < cnum; i++) {
if (tmp[n*cnum+i] > maxval) {
maxval = tmp[n*cnum+i];
maxpos = i;
}
}
// set mask to 0 where class == person
uint8_t val = (maxpos==pers ? 0 : 255);
out[n] = (val & 0xE0) | (out[n] >> 3);
}
break;
case modeltype_t::BodyPix:
case modeltype_t::MLKitSelfie:
// threshold probability
for (unsigned int n = 0; n < info.output.total(); n++) {
// FIXME: hardcoded threshold
uint8_t val = (tmp[n] > 0.65 ? 0 : 255);
out[n] = (val & 0xE0) | (out[n] >> 3);
}
break;
case modeltype_t::GoogleMeetSegmentation:
/* 256 x 144 x 2 tensor for the full model or 160 x 96 x 2
* tensor for the light model with masks for background
* (channel 0) and person (channel 1) where values are in
* range [MIN_FLOAT, MAX_FLOAT] and user has to apply
* softmax across both channels to yield foreground
* probability in [0.0, 1.0]. */
for (unsigned int n = 0; n < info.output.total(); n++) {
float exp0 = expf(tmp[2*n ]);
float exp1 = expf(tmp[2*n+1]);
float p0 = exp0 / (exp0+exp1);
float p1 = exp1 / (exp0+exp1);
uint8_t val = (p0 < p1 ? 0 : 255);
out[n] = (val & 0xE0) | (out[n] >> 3);
}
break;
case modeltype_t::Unknown:
fprintf(stderr, "Unknown model type\n");
break;
}
ti.maskns=timestamp();
// denoise
cv::Mat tmpbuf;
cv::dilate(info.ofinal,tmpbuf,info.element);
cv::erode(tmpbuf,info.ofinal,info.element);
// scale up into full-sized mask
cv::resize(info.ofinal,info.mroi,cv::Size(info.raw.rows/info.ratio,info.raw.rows));
}
int main(int argc, char* argv[]) {
printf("deepseg v0.2.0\n");
printf("(c) 2021 by floe@butterbrot.org\n");
printf("https://github.com/floe/deepbacksub\n");
timinginfo_t ti;
ti.bootns = timestamp();
int debug = 0;
bool showProgress = false;
size_t threads= 2;
size_t width = 640;
size_t height = 480;
const char *back = nullptr; // "images/background.png";
const char *vcam = "/dev/video0";
const char *ccam = "/dev/video1";
bool flipHorizontal = false;
bool flipVertical = false;
int fourcc = 0;
#if !_WINDOWS
const char* modelname = "models/segm_full_v679.tflite";
#else
const char* modelname = "../models/segm_full_v679.tflite";
#endif
bool showUsage = false;
for (int arg=1; arg<argc; arg++) {
bool hasArgument = arg+1 < argc;
if (strncmp(argv[arg], "-?", 2)==0) {
showUsage = true;
} else if (strncmp(argv[arg], "-d", 2)==0) {
++debug;
} else if (strncmp(argv[arg], "-p", 2)==0) {
showProgress = true;
} else if (strncmp(argv[arg], "-H", 2)==0) {
flipHorizontal = !flipHorizontal;
} else if (strncmp(argv[arg], "-V", 2)==0) {
flipVertical = !flipVertical;
} else if (strncmp(argv[arg], "-v", 2)==0) {
if (hasArgument) {
vcam = argv[++arg];
} else {
showUsage = true;
}
} else if (strncmp(argv[arg], "-c", 2)==0) {
if (hasArgument) {
ccam = argv[++arg];
} else {
showUsage = true;
}
} else if (strncmp(argv[arg], "-b", 2)==0) {
if (hasArgument) {
back = argv[++arg];
} else {
showUsage = true;
}
} else if (strncmp(argv[arg], "-m", 2)==0) {
if (hasArgument) {
modelname = argv[++arg];
} else {
showUsage = true;
}
} else if (strncmp(argv[arg], "-w", 2)==0) {
if (hasArgument && sscanf(argv[++arg], "%zu", &width)) {
if (!width) {
showUsage = true;
}
} else {
showUsage = true;
}
} else if (strncmp(argv[arg], "-h", 2)==0) {
if (hasArgument && sscanf(argv[++arg], "%zu", &height)) {
if (!height) {
showUsage = true;
}
} else {
showUsage = true;
}
} else if (strncmp(argv[arg], "-f", 2)==0) {
if (hasArgument) {
fourcc = fourCcFromString(argv[++arg]);
if (!fourcc) {
showUsage = true;
}
} else {
showUsage = true;
}
} else if (strncmp(argv[arg], "-t", 2)==0) {
if (hasArgument && sscanf(argv[++arg], "%zu", &threads)) {
if (!threads) {
showUsage = true;
}
} else {
showUsage = true;
}
}
}
if (showUsage) {
fprintf(stderr, "\n");
fprintf(stderr, "usage:\n");
fprintf(stderr, " deepseg [-?] [-d] [-p] [-c <capture>] [-v <virtual>] [-w <width>] [-h <height>]\n");
fprintf(stderr, " [-t <threads>] [-b <background>] [-m <modell>]\n");
fprintf(stderr, "\n");
fprintf(stderr, "-? Display this usage information\n");
fprintf(stderr, "-d Increase debug level\n");
fprintf(stderr, "-p Show progress bar\n");
fprintf(stderr, "-c Specify the video source (capture) device\n");
fprintf(stderr, "-v Specify the video target (sink) device\n");
fprintf(stderr, "-w Specify the video stream width\n");
fprintf(stderr, "-h Specify the video stream height\n");
fprintf(stderr, "-f Specify the camera video format, i.e. MJPG or 47504A4D.\n");
fprintf(stderr, "-t Specify the number of threads used for processing\n");
fprintf(stderr, "-b Specify the background image\n");
fprintf(stderr, "-m Specify the TFLite model used for segmentation\n");
fprintf(stderr, "-H Mirror the output horizontally\n");
fprintf(stderr, "-V Mirror the output vertically\n");
exit(1);
}
printf("debug: %d\n", debug);
printf("ccam: %s\n", ccam);
printf("vcam: %s\n", vcam);
printf("width: %zu\n", width);
printf("height: %zu\n", height);
printf("flip_h: %s\n", flipHorizontal ? "yes" : "no");
printf("flip_v: %s\n", flipVertical ? "yes" : "no");
printf("threads:%zu\n", threads);
printf("back: %s\n", back ? back : "(none)");
printf("model: %s\n\n", modelname);
cv::Mat bg;
if (back) {
bg = cv::imread(back);
}
if (bg.empty()) {
if (back) {
printf("Warning: could not load background image, defaulting to green\n");
}
bg = cv::Mat(height,width,CV_8UC3,cv::Scalar(0,255,0));
}
cv::resize(bg,bg,cv::Size(width,height));
#if !_WINDOWS
int lbfd = loopback_init(vcam,width,height,debug);
if(lbfd < 0) {
fprintf(stderr, "Failed to initialize vcam device.\n");
exit(1);
}
#endif
#if !_WINDOWS
cv::VideoCapture cap(ccam, CV_CAP_V4L2);
#else
cv::VideoCapture cap;
int deviceID = 0; // 0 = open default camera
int apiID = cv::CAP_ANY; // 0 = autodetect default API
cap.open(deviceID, apiID);
#endif
TFLITE_MINIMAL_CHECK(cap.isOpened());
cap.set(CV_CAP_PROP_FRAME_WIDTH, width);
cap.set(CV_CAP_PROP_FRAME_HEIGHT, height);
if (fourcc)
cap.set(CV_CAP_PROP_FOURCC, fourcc);
cap.set(CV_CAP_PROP_CONVERT_RGB, true);
auto modeltype = get_modeltype(modelname);
auto norm = get_normalization(modeltype);
if (modeltype_t::Unknown == modeltype) {
fprintf(stderr, "Unknown model type '%s'.\n", modelname);
exit(1);
}
calcinfo_t calcinfo = { modelname, modeltype, norm, threads, width, height, debug };
init_tensorflow(calcinfo);
// kick off separate grabber thread to keep OpenCV/FFMpeg happy (or it lags badly)
#if !_WINDOWS
pthread_t grabber;
cv::Mat buf1;
cv::Mat buf2;
int64 oldcnt = 0;
capinfo_t capinfo = { &cap, &buf1, &buf2, 0, &ti, PTHREAD_MUTEX_INITIALIZER };
if (pthread_create(&grabber, NULL, grab_thread, &capinfo)) {
perror("creating grabber thread");
exit(1);
}
#else
cv::Mat buf1;
cv::Mat buf2;
int64 oldcnt = 0;
capinfo_t capinfo = { &cap, &buf1, &buf2, 0, &ti};
std::thread grabber(grab_thread, &capinfo);
#endif
ti.lastns = timestamp();
printf("Startup: %ldns\n", diffnanosecs(ti.lastns,ti.bootns));
bool filterActive = true;
// mainloop
for(bool running = true; running; ) {
// wait for next frame
while (capinfo.cnt == oldcnt) usleep(10000);
oldcnt = capinfo.cnt;
int e1 = cv::getTickCount();
ti.waitns=timestamp();
// switch buffer pointers in capture thread
pthread_mutex_lock(&capinfo.lock);
ti.lockns=timestamp();
cv::Mat *tmat = capinfo.grab;
capinfo.grab = capinfo.raw;
capinfo.raw = tmat;
pthread_mutex_unlock(&capinfo.lock);
// we can now guarantee capinfo.raw will remain unchanged while we process it..
calcinfo.raw = *capinfo.raw;
ti.copyns=timestamp();
if (calcinfo.raw.rows == 0 || calcinfo.raw.cols == 0) continue; // sanity check
if (filterActive) {
// do background detection magic
calc_mask(calcinfo, ti);
// copy background over raw cam image using mask
bg.copyTo(calcinfo.raw,calcinfo.mask);
} // filterActive
if (flipHorizontal && flipVertical) {
cv::flip(calcinfo.raw,calcinfo.raw,-1);
} else if (flipHorizontal) {
cv::flip(calcinfo.raw,calcinfo.raw,1);
} else if (flipVertical) {
cv::flip(calcinfo.raw,calcinfo.raw,0);
}
ti.postns=timestamp();
#if !_WINDOWS
// write frame to v4l2loopback as YUYV
calcinfo.raw = convert_rgb_to_yuyv(calcinfo.raw);
int framesize = calcinfo.raw.step[0]*calcinfo.raw.rows;
while (framesize > 0) {
int ret = write(lbfd,calcinfo.raw.data,framesize);
TFLITE_MINIMAL_CHECK(ret > 0);
framesize -= ret;
}
#else
cv::imshow("Live", calcinfo.raw);
if (cv::waitKey(5) >= 0)
break;
#endif
ti.v4l2ns=timestamp();
if (!debug) {
if (showProgress) {
printf(".");
fflush(stdout);
}
continue;
}
// timing details..
printf("wait:%9ld lock:%9ld [grab:%9ld retr:%9ld] copy:%9ld open:%9ld tflt:%9ld mask:%9ld post:%9ld v4l2:%9ld ",
diffnanosecs(ti.waitns,ti.lastns),
diffnanosecs(ti.lockns,ti.waitns),
ti.grabns,
ti.retrns,
diffnanosecs(ti.copyns,ti.lockns),
diffnanosecs(ti.openns,ti.copyns),
diffnanosecs(ti.tfltns,ti.openns),
diffnanosecs(ti.maskns,ti.tfltns),
diffnanosecs(ti.postns,ti.maskns),
diffnanosecs(ti.v4l2ns,ti.postns));
int e2 = cv::getTickCount();
float t = (e2-e1)/cv::getTickFrequency();
printf("FPS: %5.2f\e[K\r",1.0/t);
fflush(stdout);
ti.lastns = timestamp();
if (debug < 2) continue;
cv::Mat test;
cv::cvtColor(calcinfo.raw,test,CV_YUV2BGR_YUYV);
cv::imshow("output.png",test);
auto keyPress = cv::waitKey(1);
switch(keyPress) {
case 'q':
running = false;
break;
case 's':
filterActive = !filterActive;
break;
case 'h':
flipHorizontal = !flipHorizontal;
break;
case 'v':
flipVertical = !flipVertical;
break;
}
}
pthread_mutex_lock(&capinfo.lock);
capinfo.grab = NULL;
pthread_mutex_unlock(&capinfo.lock);
printf("\n");
return 0;
}
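(Regarding the pthreadpool version mentioned above: a hedged sketch of how a newer revision could be pinned with CMake's FetchContent. The repository URL is pthreadpool's upstream; the tag is a placeholder, and whether backscrub's or TFLite's own CMake picks up such an override depends on how they fetch the dependency.)

# Sketch only: pin pthreadpool to a specific revision instead of whatever the
# default fetch pulls in. Declare this before the dependency is first requested.
include(FetchContent)
FetchContent_Declare(
  pthreadpool
  GIT_REPOSITORY https://github.com/Maratyszcza/pthreadpool.git
  GIT_TAG        master          # placeholder: pin a known-good commit hash instead
)
FetchContent_MakeAvailable(pthreadpool)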
Looking at your code changes, this looks like you are compiling with MSVC in pre-C++11 mode, as

#if !_WINDOWS
return normalization_t{.scaling = 1/255.0, .offset = 0};
#else
{
normalization_t norm;
norm.scaling = 1 / 255.0;
norm.offset = 0;
return norm;
}
#endif

uses the old-style assignments. Apart from that, you may want to split out the platform-dependent stuff into separate implementation files, with one common header providing the interface for these functions (see the sketch below). As you mentioned problems with pthread: there is some work on getting the whole source code up to C++11 and thus also using the STL thread library everywhere. If you didn't have a look at the experimental branch yet, it may be worth doing so. NB: knowing the OpenSSL source code, I got to loathe negative preprocessor conditionals.
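A minimal sketch of what such a common header could look like, backed by the C++11 standard library so the Linux and Windows builds share one interface. The file, namespace and function names here are illustrative, not from the repository.

// compat.h -- hypothetical common header: one portable threading interface.
#pragma once
#include <chrono>
#include <mutex>
#include <thread>

namespace compat {
    using mutex = std::mutex;
    using lock_guard = std::lock_guard<std::mutex>;
    using thread = std::thread;

    // portable replacement for POSIX usleep()
    inline void sleep_us(long usec) {
        std::this_thread::sleep_for(std::chrono::microseconds(usec));
    }
}

deepseg.cc would then use compat::thread and compat::lock_guard instead of pthread_create()/pthread_mutex_lock(), and the #if !_WINDOWS blocks around threading would disappear.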
Thanks. I'll look into enabling full C++11 compliance in Visual Studio. The only thing is that if it has to be done at the application level as opposed to the project level, it could cause confusion for users who would have to change settings in their Visual Studio setup to get it to build. I guess I should have used the experimental branch in the first place. I'll do a Windows build of that and see how it works. At first glance it seems that besides the pthreadpool version, the only thing necessary would be a separate Windows viewer. I think the X Windows stuff wouldn't work on Windows, although it looks like you're using OpenGL to render, and that's possible on Windows. One thing that confuses me: I don't understand the necessity of converting to and from YUV when displaying frames.
The default pixel format for the virtual camera device is YUV, but some parts of the code operate on RGB data.
Almost correct 😺 We are using OpenCV to render a monitor/debug video stream on screen, which would work on Windows; however, applications (e.g. Zoom, Teams, etc.) are expected to consume our processed video via a virtual camera. The work required to produce a virtual camera on Windows is non-trivial (and gross); it's much easier for us to plug into someone who has done the hard work, hence the separation into a library and wrapper app in the experimental branch, and my OBS Studio plug-in that uses the library as a demo (plus thinking about GStreamer and PipeWire plugins, and whatever macOS has). FYI here's the OBS Studio code (which uses DirectShow to create a virtual camera), and the relevant SO thread 😄 https://github.com/obsproject/obs-studio/tree/master/plugins/win-dshow, compared to the code I needed to write for OBS Studio (233 LoC, one file): https://github.com/phlash/obs-backscrub.
Yep, after some experimentation by @floe to find out what works in most consumer applications (turns out, they don't like RGB at all). The v4l2loopback module will transport almost any format as long as it can identify frame boundaries. [edit] to note that OBS Studio uses NV12 video format for the Windows virtual camera (likely for similar reasons).
So you would want obs-backscrub to build on Windows as well? The Stack Overflow article says a virtual camera on Windows is a kernel-mode driver, which, beyond building and testing, can be complex to install. Maybe I misread it. If it's being targeted at OBS users who already have it installed, then it's probably useful to them. I'm not saying it's not otherwise worthwhile, just trying to clarify what needs to be done. Thanks.
AIUI, OBS Studio's code and the SO post's more detailed answer use DirectShow to create a virtual camera on Windows without a kernel driver, but there is still a lot of code to create a COM object and register it, etc. Other operating systems will differ again, hence we thought it wise to avoid repeating the work others have done (in OBS Studio and other media processing frameworks), and concentrate on the unique/valuable aspect here - using a TFLite model to scrub off the background - allowing others to connect that into their chosen video processing workflow/tools, while providing one implementation for Linux via a v4l2loopback virtual camera as it's easy. Thus the separation into a library and wrapper app. Regarding targeting - I chose OBS Studio as earlier commenters mentioned how popular it is amongst the streaming community, and it lacks the feature we have here. If you have a different use case, then by all means address that itch first!
I'm going to have to play around with some of the virtual camera examples to see what's possible. This https://github.com/Fenrirthviti/obs-virtual-cam apparently works by making the output of OBS a DirectShow virtual device. Since you already have an OBS plugin, that means people could use OBS + your plugin implemented on Windows to have a virtual camera. If you want to have your viewer also output as a virtual camera with OBS, that's more complex. On a slightly different topic, have you done any tests with, say, 1920x1080 video to see what it looks like? I'm curious as to how a 244x160 (? or whatever it is) mask performs and what kind of filters might make it look better with broadcast-quality video. If there's going to be a generic lib, that may be an issue. Thanks.
Have fun! 😄 - just before you hop down that rabbit hole, would you mind sharing your current build environment info that compiles TFLite, as I have not been able to build anything so far with VS2019 build tools and Microsoft supplied CMake?
Yep, this was my expectation - I think the later OBS (26+) pulled the parent fork (https://github.com/CatxFish/obs-virtual-cam) into their distribution (the code I referenced above looks very similar, and is in the core repo). There is also mention of a MacOS virtual camera in the README for the parent fork.
I don't think this is necessary, although I did wonder if we could load/reuse the OBS virtual camera plugin DLL for ourselves? That does seem like an odd use case though - if someone has installed OBS they probably want to use all of it, not just have us steal a bit of it...
This is a good question - I haven't tried it myself (only having a cheapo webcam!); @floe might have some thoughts on HD+ video processing? There are some non-real-time HD+ projects that get a mention in the 'other code bases' thread #58 too.
I have two PCs, each with an HD webcam, that I successfully managed to use as a 1280x720 MJPG video source for backscrub. The result is okay-ish, but you notice some blockish artifacts at the border of the mask. There are also issues on this subject, cf. #72 for smoothing and #65 for multiple segmentation passes per image …
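As an illustrative aside (a sketch of the general idea, not the approach tracked in #72): the blockiness comes from upscaling a small mask (e.g. 256x144) to HD with hard edges, so feathering the mask and alpha-blending, instead of the binary copyTo() used in deepseg.cc, softens the border. Function and variable names below are made up for the example; the mask convention (255 = background, 0 = person) follows the code above.

// Sketch only: feather the low-res mask and alpha-blend background over the frame.
#include <opencv2/opencv.hpp>

cv::Mat blend_with_background(const cv::Mat& frame,      // CV_8UC3 camera frame
                              const cv::Mat& small_mask, // CV_8UC1, e.g. 256x144
                              const cv::Mat& background) // CV_8UC3, same size as frame
{
    // upscale smoothly, then blur to feather the person/background border
    cv::Mat mask;
    cv::resize(small_mask, mask, frame.size(), 0, 0, cv::INTER_LINEAR);
    cv::GaussianBlur(mask, mask, cv::Size(15, 15), 0);

    // per-pixel alpha in [0,1], replicated to 3 channels
    cv::Mat alpha, alpha3;
    mask.convertTo(alpha, CV_32FC1, 1.0 / 255.0);
    cv::cvtColor(alpha, alpha3, cv::COLOR_GRAY2BGR);

    // out = alpha * background + (1 - alpha) * frame
    cv::Mat frame_f, bg_f, out;
    frame.convertTo(frame_f, CV_32FC3);
    background.convertTo(bg_f, CV_32FC3);
    cv::Mat inv_alpha3 = cv::Scalar::all(1.0) - alpha3;
    out = bg_f.mul(alpha3) + frame_f.mul(inv_alpha3);
    out.convertTo(out, CV_8UC3);
    return out;
}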
Thanks. I'm trying to build the experimental branch and I'm having some trouble with CMake. The main branch downloaded TensorFlow + Lite for me, and I had to fix a few issues, but the experimental branch gives me an error:

CMake Error at CMakeLists.txt:17 (add_subdirectory):

I'm completely clueless about CMake. Is there somewhere in CMakeLists.txt where I should look to fix this problem? Thank you.
That looks like the Tensorflow source tree is not present? I usually do the following:

% cd backscrub
% git submodule update --init --recursive
% cd tensorflow
% git log

to ensure I have the Tensorflow source at the expected version before running any CMake steps. Note that the GNU …
That worked, thanks. I mentioned an issue on obs-backscrub concerning libobs. The experimental branch of backscrub builds on Windows, except for pthreads issues with deepseg.cc. Are pthreads going to be taken out in favor of std::thread and std::mutex? Also, I'm still a bit unclear on the DirectShow implementation and displaying video in deepseg.cc. Displaying the results of webcam input with background replacement in an OpenCV video window is trivial on Windows without any need for the loopback mechanism. If there's going to be any DirectShow virtual camera implementation in conjunction with the deepseg.cc sample app, it will definitely complicate the project significantly.
Yes. The goal is to replace pthreads with std::thread and std::mutex.
I can't see any direct references to

^ https://github.com/microsoft/STL/blob/main/stl/src/cthread.cpp
My apologies. I was cloning the wrong branch. I think what I'll do is modify the code to what I think will work on Windows and post it as pull requests. One issue that will take some experimenting is getting CMake to add a compiler option for Visual Studio: https://stackoverflow.com/questions/64889383/how-to-enable-stdclatest-in-cmake
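For reference, a minimal sketch of the two usual ways to do this in CMake (the target usage here is a placeholder, not backscrub's actual CMakeLists.txt):

# Option 1: request a standard portably; CMake translates this to the right MSVC flag
set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Option 2: pass the MSVC-specific flag directly, as in the Stack Overflow answer
add_executable(deepseg deepseg.cc)
if(MSVC)
  target_compile_options(deepseg PRIVATE /std:c++latest)
endif()

Either way it is a project-level setting, so users building with Visual Studio would not need to change anything in their own IDE configuration.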
OK, so I've spent some time with a fresh Azure development VM: starting with the standard Microsoft template: Windows Server 2019 plus VS2019 Community image. Here's what works:
% cmake -B build -D CMAKE_PREFIX_PATH=C:\Packages\opencv\build
% cmake --build build -t backscrub
% cmake -B build -D CMAKE_PREFIX_PATH=C:\Packages;C:\Packages\opencv\build
% cmake --build build
^ https://github.com/phlash/backscrub/tree/windows-build
Hi, so I tried to use the static lib instead, but my application is compiled as MT and TFLite is compiled as MD, and I can't compile it as MT. Did some of you already try to do this and could help me? (I posted a message on TensorFlow's GitHub, but no answer since.)
@JVpltr - I didn't look very deeply into the TFLite Windows build, as it 'just worked'. From here it seems CMake defaults to MD. I'm surprised there is any material performance difference between a DLL and a static library version of TFLite, unless your application is loading TFLite each time you use it. [edited to add] Does your DLL build of TFLite have XNNPACK enabled? This was a huge performance improvement (2x) for us.
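For what it's worth, a hedged sketch of forcing the static (/MT) runtime with modern CMake; the project/target names are placeholders, and whether TFLite's own CMake files honour the setting for every sub-dependency would need checking:

# Requires CMake >= 3.15 (policy CMP0091): selects /MT (or /MTd for Debug builds)
# for targets created after this point, instead of the default /MD.
cmake_minimum_required(VERSION 3.15)
project(myapp CXX)

set(CMAKE_MSVC_RUNTIME_LIBRARY "MultiThreaded$<$<CONFIG:Debug>:Debug>")

add_executable(myapp main.cpp)

The same value can also be passed on the command line as -D CMAKE_MSVC_RUNTIME_LIBRARY=... without editing any CMakeLists.txt.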
Hi, yes, I have tried to force the MT compilation in the TFLite CMake file (and all the dependencies), and even if the compilation is (it seems like) in MT, I get some errors with dependency libraries like:

Error LNK2001 Unresolved external symbol "void __cdecl ruy::KernelFloatAvx(struct ruy::KernelParamsFloat<8,8> const &)" (?KernelFloatAvx@ruy@@YAXAEBU?$KernelParamsFloat@$07$07@1@@z)peerconnection_client .......\tensorflow-lite.lib(fully_connected.obj)

I configure my interpreter only once and the invoke is called for each frame in the same way (more or less as in deepseg). It surprised me too, so I checked by putting the DLL library I compiled into the deepseg program, and the invoke also takes more time there.
I've got some news. The compilation in MT works fine, but I had to link the TFLite static lib and all the dependencies in my project (which wasn't obvious to me, since I assumed a static lib was supposed to contain all the symbols).
Yay - well done 😄 As you have discovered, static libraries don't get merged together as you build a tree of them, so you need all the transitive dependencies when linking the final binary. I wrote a macro in https://github.com/floe/backscrub/blob/experimental/CMakeLists.txt to collect all these and export them for callers to link with, feel free to borrow that one!
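To illustrate the design choice (a sketch; "myapp" and the library target name are assumptions, not the exported names from backscrub's CMakeLists.txt): when everything is built inside one CMake tree, linking against the library target propagates its transitive static dependencies automatically, and hand-listing every .lib is only needed when consuming prebuilt binaries outside CMake.

# Consumer built in the same CMake tree (e.g. via add_subdirectory(backscrub)):
add_executable(myapp main.cpp)
# Assumed target name; CMake pulls in the transitive static deps (ruy, XNNPACK, ...)
# recorded on the target, so no manual list of .lib files is needed.
target_link_libraries(myapp PRIVATE backscrub)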
Sorry for not following this thread. Are there any remaining Windows issues that need to be dealt with? Thanks.
@OmarJay1 - Aside from the compatibility fixes I had to make in my fork (to avoid having to pass …
Right! We have now merged everything together, so this is next 😄 |
Never heard of v4l2loopback. Is it hard to make this thing work on Windows? Because I want to be able to just go to a computer, any computer, which 99% of the time is Windows, and run this :)