Parallelize TextCategorizer training #3828
Replies: 2 comments
-
As stated here on Stack Overflow, spaCy wasn't built to run on multiple CPUs but to be efficient on one. You can run some spaCy tasks in parallel, as commented in this issue, but training doesn't seem to be included. Could you quantify your "awfully long long time" and describe the dataset you're using? I am working with the TextCategorizer as well and don't really face long training times on my not-state-of-the-art CPU.
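For the parallelizable part, here's a minimal sketch of multi-process inference, assuming spaCy v2.2.2 or later (where `nlp.pipe` accepts an `n_process` argument); the model name and texts are placeholders:

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # placeholder model
texts = ["First document ...", "Second document ..."] * 1000

# Inference can fan out across worker processes; training cannot.
for doc in nlp.pipe(texts, n_process=4, batch_size=100):
    print(doc.cats)  # TextCategorizer scores, if the pipeline has one
```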
-
In my case, with 120K lines in the dataset, training takes about 4 hours (I'm talking about NER here, so maybe a different issue?). As I'm still at the beginning, I'd like to iterate faster (my CPU sits at around 12% during training, so it can do more work) and experiment with parameters, different data layouts, etc. Is there any way, even a hacky one, to make training faster? Maybe the training set is too large? But then I see that if I make it smaller, it doesn't learn that well :) Unless I go crazy with the learning rate, which seems to backfire, too.
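One hack that usually helps iteration speed is to tune on a random subset first and only run on the full 120K at the end. A sketch using the spaCy v2 training loop; `TRAIN_DATA` and `load_my_annotations` are hypothetical, standing in for your list of `(text, {"entities": [...]})` pairs:

```python
import random

import spacy
from spacy.util import minibatch, compounding

# Hypothetical data: [(text, {"entities": [(start, end, label), ...]}), ...]
TRAIN_DATA = load_my_annotations()  # placeholder loader

random.shuffle(TRAIN_DATA)
subset = TRAIN_DATA[:10000]  # iterate on ~10K examples before scaling up

nlp = spacy.blank("en")
ner = nlp.create_pipe("ner")
nlp.add_pipe(ner)
for _, annotations in subset:
    for _, _, label in annotations["entities"]:
        ner.add_label(label)

optimizer = nlp.begin_training()
for epoch in range(10):
    random.shuffle(subset)
    losses = {}
    # Compounding batch sizes: small batches early, larger ones later.
    for batch in minibatch(subset, size=compounding(4.0, 32.0, 1.001)):
        texts, annotations = zip(*batch)
        nlp.update(texts, annotations, sgd=optimizer, drop=0.35, losses=losses)
    print(epoch, losses)
```

Hyperparameter choices found on the subset tend to carry over reasonably well, so the 4-hour full-set runs can wait until the end.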
-
Is there a way to do this? It's using only 1 CPU core (by design, it seems), but it's taking an awfully long long time; it's a shame to have 15 other threads sitting idle.
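One thing worth checking, though this is speculative and not a spaCy feature per se: the BLAS library backing numpy may parallelize the matrix multiplications inside training if its thread count is raised before numpy is first imported. Gains vary and may be zero:

```python
# Must run before numpy/spacy are imported anywhere in the process.
import os
os.environ.setdefault("OPENBLAS_NUM_THREADS", "8")
os.environ.setdefault("MKL_NUM_THREADS", "8")

import spacy  # noqa: E402  (deliberately imported after setting the env vars)
```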