Best settings for speed in transformer/NER model in cloud GPU #9039
Hi everyone,
I would like to know which options in my config file I can change to speed up training. I'm not very familiar with the parameters (such as batch_size), so I would appreciate some guidance on what the best options are here. I'm also not sure how I/O-intensive spaCy training is - would using SSD storage improve the speed as well? This is my config file:
Sorry for the delayed reply on this, it's kind of a tricky question to answer.

The easiest way to make training on GPU faster is to reduce the number of training iterations. Obviously that also decreases the quality of your model.

Without sacrificing training iterations, your best bet is to maximize use of memory at any given time, so that you can cover more training examples with fewer batches. For that you should use the largest batch size you can get away with. Other parameters can make a difference too, but how they interact and affect speed is harder to pin down.

An SSD should not make a significant difference - usually the volume of data is small and it'll just be loaded into memory at the start anyway, and writes are simple and occasional.

If speed is really important in training, usually your best bet is to not use transformers, but instead use smaller CPU-based models. This can be a good idea if you need to update models frequently (maybe you train on daily news or something) or if you're iterating on data or an app design (and need to frequently change model settings). In the latter case you can also swap in transformers for the CPU-based model after you've figured out the other details.
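For reference, these are the settings in the training config that the above mostly comes down to. This is just a sketch using the defaults from a typical transformer config, not recommended values - in particular, raise `size` in `[training.batcher]` until you start hitting GPU memory limits, then back off:

```ini
[training]
# Fewer steps/epochs = faster training, usually at the cost of model quality
max_steps = 20000
max_epochs = 0
patience = 1600
eval_frequency = 200

[training.batcher]
# batch_by_padded measures batch size in padded tokens per batch;
# larger "size" means fewer, bigger batches and better GPU utilization
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
size = 2000
buffer = 256
get_length = null
```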