Slow training speed for textcat pipeline #4633
Replies: 4 comments
-
Sparse categories are definitely a problem with the current training format. I haven't tried using the […]. It's hard to guess what's going on from the code provided here. I would suggest profiling with something like […].
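The name of the profiler suggested above did not survive. One option, not necessarily the one meant here, is Python's built-in cProfile module. The sketch below is a minimal example, assuming a hypothetical `train_step` function that stands in for a single `nlp.update` call on one batch. It prints the slowest call chains sorted by cumulative time.

```python
import cProfile
import io
import pstats


def train_step(batch):
    # Hypothetical placeholder for one nlp.update(...) call on a batch;
    # swap in the real update code when profiling an actual run.
    return sum(len(text) for text in batch)


def profile_steps(batches, n_top=10):
    """Profile train_step over several batches and return a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    for batch in batches:
        train_step(batch)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(n_top)
    return buffer.getvalue()


if __name__ == "__main__":
    batches = [["some text"] * 64 for _ in range(100)]
    print(profile_steps(batches))
```

Profiling a few hundred real batches is usually enough to show whether time goes to the model itself or to data loading and conversion.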
-
@adrianeboyd Hi! Thanks for your reply! Sorry, the […]. I would need to investigate the default model's implementation further. But currently, you can only pass in […]. I will update the ticket later with the profiling results. Thanks for your suggestion!
-
Hi @adrianeboyd. I got some […]
My training script is really simple. The training loop looks like this: […]
Also, it seems like the process is still using […]. Thank you for your time!
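The training loop from this comment was not preserved. The sketch below shows a minimal minibatch loop of the kind described in the question. It assumes a hypothetical `update_fn(texts, annotations)` hook, which with spaCy v2 would wrap something like `nlp.update(texts, annotations, sgd=optimizer)`. spaCy ships its own `spacy.util.minibatch` helper, but a plain version is defined here so that the sketch runs without spaCy. The loop also reports throughput in documents/second, the figure quoted in the question.

```python
import random
import time
from itertools import islice


def minibatch(items, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def train_epoch(examples, update_fn, batch_size=64, seed=0):
    """Run one epoch over (text, annotations) pairs and return docs/second.

    `update_fn(texts, annotations)` is a hypothetical hook standing in for
    the real update call (e.g. a wrapper around nlp.update).
    """
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    n_docs = 0
    start = time.perf_counter()
    for batch in minibatch(examples, batch_size):
        texts, annotations = zip(*batch)
        update_fn(list(texts), list(annotations))
        n_docs += len(batch)
    elapsed = time.perf_counter() - start
    return n_docs / elapsed if elapsed > 0 else float("inf")
```

As a sanity check on the numbers in the question, at 2 to 3 documents/second one pass over ~300k documents takes 100,000 to 150,000 seconds, or roughly 28 to 42 hours. That is consistent with the "around two days" per epoch estimate.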
-
That's an interesting analysis. I am not an expert on the Thinc internals, so I think @honnibal might need to take a look to see if he knows what might be going on.
-
Hi!
I am trying to train a textcat pipeline with over 6000 classes. The training data consists of around 300k documents. I tried to convert my training data to the correct jsonl format, but that would result in a file size of over 100 GB, and the initialization of GoldCorpus would take forever writing the message packs. Therefore, I wrote the following TextcatGoldCorpus class: […]
Then I wrote a regular training loop that calls nlp.update with a batch size of 64. However, training is very slow. I am using an Nvidia V100 GPU, and the average update speed is around 2-3 documents/second. At that rate, one epoch for my task would take around two days. I also noticed that training on the GPU gives no significant speedup over the CPU.
I previously trained a convolutional model (with PyTorch) on exactly the same task, and each epoch took around 3 to 4 hours. I have also fine-tuned a BERT-base model on a classification task, and the entire training finished in around one day with 3 epochs.
I have no idea what could be causing this slowdown. Any suggestions would be appreciated. Thanks!
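The question's TextcatGoldCorpus class was not preserved. A common way to avoid a 100 GB training file is to store only each document's positive labels on disk and expand them to a full category dict while reading. The sketch below is a minimal example of that approach. It assumes a hypothetical JSONL layout with one `{"text": ..., "labels": [...]}` object per line, which is not spaCy's own training format. It also assumes, in line with the first reply's point about sparse categories, that every label needs an explicit 0.0/1.0 value. Each iteration yields a `(text, {"cats": {...}})` pair that a loop calling `nlp.update` could consume.

```python
import io
import json


def iter_dense_examples(lines, all_labels):
    """Yield (text, {"cats": {...}}) pairs with every label filled in.

    Each input line is assumed to be a JSON object such as
    {"text": "...", "labels": ["LABEL_A", "LABEL_B"]} listing only the
    positive labels; every other label in all_labels gets 0.0.
    Positive labels not present in all_labels are ignored.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        positives = set(record["labels"])
        cats = {label: (1.0 if label in positives else 0.0) for label in all_labels}
        yield record["text"], {"cats": cats}


if __name__ == "__main__":
    sample = io.StringIO(
        '{"text": "first doc", "labels": ["B"]}\n'
        "\n"
        '{"text": "second doc", "labels": ["A", "C"]}\n'
    )
    for text, annotations in iter_dense_examples(sample, ["A", "B", "C"]):
        print(text, annotations)
```

Because the generator reads one line at a time, an open file handle can be passed in directly. With 6000 labels, the full dict is still built for every document. This approach therefore saves disk space and conversion time, but it does not reduce the per-update cost.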