Lots of database connections created by transcribe_documents? #20

Open
kaedroho opened this issue Jul 23, 2018 · 0 comments

kaedroho commented Jul 23, 2018

Just had a quick browse of the code and noticed that it uses asyncio (via `run_in_executor`) to run the document fetching/text extraction in background threads.

Is it likely that Django would start handling the next request before the background thread has finished running? If so, and the same database connection is used by both the text extraction and the new request at the same time, this could cause issues, as database connections are not thread-safe.

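EDIT: looks like Django has this covered: it keeps a separate connection for each thread, so the background threads won't share a connection with the request thread (https://github.com/django/django/blob/master/django/db/utils.py#L142).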
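
To illustrate the per-thread behaviour, here is a simplified sketch (not Django's actual code). With a thread-local handler, each worker thread lazily opens its own connection and reuses it afterwards, so nothing is shared between threads, but the number of open connections grows with the number of threads. `FakeConnection`, `PerThreadConnections` and `extract_text` are hypothetical stand-ins.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeConnection:
    """Hypothetical stand-in for a DB connection; counts how many get opened."""
    opened = 0
    _lock = threading.Lock()

    def __init__(self):
        with FakeConnection._lock:
            FakeConnection.opened += 1


class PerThreadConnections:
    """Simplified thread-local connection handler (illustrative only)."""

    def __init__(self):
        self._local = threading.local()

    def get(self):
        # Each thread creates its own connection on first use, then reuses it.
        conn = getattr(self._local, "conn", None)
        if conn is None:
            conn = self._local.conn = FakeConnection()
        return conn


connections = PerThreadConnections()


def extract_text(doc_id):
    # Hypothetical worker: grabs "its" connection, as ORM queries would.
    return id(connections.get())


with ThreadPoolExecutor(max_workers=4) as pool:
    conn_ids = list(pool.map(extract_text, range(20)))

# At most 4: one connection per worker thread that actually ran a task.
print(FakeConnection.opened)
```

So the more worker threads the pool has, the more connections can end up open at once.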

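That per-thread behaviour might cause another issue, though: asyncio's default executor is a thread pool of 5 * num_cpus workers (on Python 3.5 to 3.7; later versions cap the default at 32), and since each thread gets its own connection, this might create too many database connections for some users (e.g. on shared hosting). Maybe we should add a "concurrency" parameter to the "transcribe_documents" command that lets the user limit the number of worker threads? (You can do this by passing a `ThreadPoolExecutor` with `max_workers` set to `run_in_executor` instead of using the default executor.)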
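
A rough sketch of what a concurrency limit could look like, assuming the command gathers one `run_in_executor` call per document. `transcribe_document` and `transcribe_all` are hypothetical names, not the project's actual code; the worker here just returns its thread name so the number of threads used can be counted.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor


def transcribe_document(doc_id):
    # Hypothetical blocking worker: would fetch/extract text for one document.
    # Returns the worker thread's name so we can count the threads used.
    return threading.current_thread().name


async def transcribe_all(doc_ids, concurrency):
    loop = asyncio.get_running_loop()
    # An explicit executor caps the worker threads (and therefore the
    # per-thread database connections) at `concurrency`, instead of
    # relying on the default executor's size.
    with ThreadPoolExecutor(max_workers=concurrency) as executor:
        futures = [
            loop.run_in_executor(executor, transcribe_document, doc_id)
            for doc_id in doc_ids
        ]
        return await asyncio.gather(*futures)


thread_names = asyncio.run(transcribe_all(range(50), concurrency=2))
print(len(set(thread_names)))  # never more than 2
```

A hypothetical `--concurrency` option on the management command could pass its value straight through as `concurrency`; the default could stay fairly small to be safe on shared hosting.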

kaedroho changed the title from "Thread safety?" to "Lots of database connections created by transcribe_documents?" on Jul 23, 2018