Best settings for speed in transformer/NER model in cloud GPU #9039
Hi everyone,
I would like to know which options in my config file I can change to speed up training. I'm not very familiar with the parameters (such as batch_size), so I would appreciate some guidance on what the best options are here. I'm also not sure how I/O-intensive spaCy training is - would using SSD storage improve the speed as well? This is my config file:
Sorry for the delayed reply on this, it's kind of a tricky question to answer.

The easiest way to make training on GPU faster is to reduce the number of training iterations. Obviously that also decreases the quality of your model.

Without sacrificing training iterations, your best bet is to maximize use of memory at any given time, so that you can cover more training examples with fewer batches. For that you should use the largest batch size you can get away with. Other parameters can make a difference too, but how they interact and affect speed is harder to pin down.

An SSD should not make a significant difference - usually the volume of data is small and it'll just be loaded into memory at the start anyway, and writes are simple and occasional.

If speed is really important in training, usually your best bet is to not use transformers, but instead use smaller CPU-based models. This can be a good idea if you need to update models frequently (maybe you train on daily news or something) or if you're iterating on data or an app design (and need to frequently change model settings). In the latter case you can also swap in transformers for the CPU-based model after you've figured out the other details.
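For reference, these are the settings in the training config that the above mostly comes down to. This is just a sketch using the defaults from a typical transformer config, not recommended values - in particular, raise `size` in `[training.batcher]` until you start hitting GPU memory limits, then back off:

```ini
[training]
# Fewer steps/epochs = faster training, usually at the cost of model quality
max_steps = 20000
max_epochs = 0
patience = 1600
eval_frequency = 200

[training.batcher]
# batch_by_padded measures batch size in padded tokens per batch;
# larger "size" means fewer, bigger batches and better GPU utilization
@batchers = "spacy.batch_by_padded.v1"
discard_oversize = true
size = 2000
buffer = 256
get_length = null
```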