Very high losses when training a custom NER in SpaCy v3.2 #9971

faouinti · 2022-01-03T12:11:03Z

Hi,

I am trying to train a blank model from scratch for medical NER in spaCy v3.2. I have around 717 texts with 46 labels (18,816 annotated entities). Even after all epochs, the NER loss does not decrease and the model still does not predict the entities correctly. Here are the results from the last epoch:
E     #        LOSS TOK2VEC   LOSS NER   ENTS_F   ENTS_P   ENTS_R   SCORE
499   141400   160940.15      1768.19    24.21    29.10    20.73    0.24

And here are the training parameters in the config file:
patience = 0
max_epochs = 500
max_steps = 0
eval_frequency = 200

Why does this happen, and how do I train the model properly?

Thanks in advance,
FA

Answered by ljvmiranda921

ljvmiranda921 · 2022-01-04T04:38:06Z

Without knowing much about your dataset or use case, it is tricky to say how best to train your model.
However, here are some sanity checks that I usually do:

- Do you have data imbalance? You can use spaCy's debug data command to check that.
- Are you using a spaCy NER config that was optimized for accuracy? You can generate your own using init config or refer to the Quickstart.
- Perhaps you can use medspacy as your baseline? If you have access to a GPU, it may be worth trying the en_core_web_trf model as well.
- You can probably do some hyperparameter search on dropout, batch size, and learning rate. You can refer to the WandB sweep project to get you started.
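
As a rough complement to debug data, label imbalance can also be inspected by hand. The sketch below is only an illustration: it assumes the annotations are available as (text, {"entities": [(start, end, label), ...]}) tuples, and the helper name count_entity_labels and the toy sample are hypothetical, not taken from the thread.

```python
from collections import Counter


def count_entity_labels(data):
    """Count annotated entities per label in (text, {"entities": [...]}) tuples."""
    counts = Counter()
    for _text, annotations in data:
        for _start, _end, label in annotations.get("entities", []):
            counts[label] += 1
    return counts


# Hypothetical toy data, only to show the expected input shape.
sample = [
    ("A 54 year old woman", {"entities": [(2, 13, "age"), (14, 19, "gender")]}),
    ("aged 7", {"entities": [(5, 6, "age")]}),
    ("no entities here", {"entities": []}),
]

counts = count_entity_labels(sample)
# Print labels from most to least frequent to spot rare ones quickly.
for label, n in counts.most_common():
    print(label, n)
```

Labels that show up only a handful of times in such a count are the first candidates to check when a model never predicts them.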

8 replies

ljvmiranda921 Jan 13, 2022

There are some things you might want to inspect:

- What do your entities look like? Perhaps this can be solved with two stages of NER, or with a combination of text classification and NER. Dividing the problem space may solve your problem better.
- If you can label more data, try to have at least 100-200 samples per entity. What I'd probably recommend, though, is to keep their distributions the same.
- What does the per-label P/R/F look like? There you can probably see which labels are being detected correctly (more or less the ones with "good" distributions) and which suffer from low precision and recall.
- Perhaps you have overlapping and very long entities? You can check out SpanCat; here's a good comparison between the two.
- I also recommend checking out the HealthSea blog post. It uses NER on a similar domain, so maybe you can find some strategies to solve your own problem.
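
To make the per-label P/R/F check concrete, here is a minimal sketch of exact-match scoring per label. It assumes gold and predicted entities have already been collected as sets of (doc_id, start, end, label) tuples; the function name per_label_prf and the toy spans are hypothetical. spaCy's own evaluation reports comparable per-type numbers, so this is meant to show what those numbers mean rather than to replace them.

```python
from collections import defaultdict


def per_label_prf(gold, pred):
    """Exact-match precision, recall and F-score per label.

    gold and pred are sets of (doc_id, start, end, label) tuples.
    """
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for span in pred:
        if span in gold:
            tp[span[3]] += 1
        else:
            fp[span[3]] += 1
    for span in gold:
        if span not in pred:
            fn[span[3]] += 1

    scores = {}
    for label in set(tp) | set(fp) | set(fn):
        predicted = tp[label] + fp[label]
        expected = tp[label] + fn[label]
        p = tp[label] / predicted if predicted else 0.0
        r = tp[label] / expected if expected else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        scores[label] = {"p": p, "r": r, "f": f}
    return scores


# Hypothetical toy spans: "dose" is annotated twice but never predicted.
gold = {(0, 0, 2, "age"), (0, 5, 6, "dose"), (1, 3, 4, "dose")}
pred = {(0, 0, 2, "age"), (1, 3, 4, "age")}

scores = per_label_prf(gold, pred)
for label in sorted(scores):
    print(label, scores[label])
```

A label that is annotated in the gold data but never predicted ends up with P, R and F all equal to 0 under this scoring.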

faouinti Jan 21, 2022
Author

Thank you, I have taken your remarks into consideration.

I noticed that when I train my model on 2 entities (gender, age), I get:
gender
p: 0.7837837838
r: 0.4142857143
f: 0.5420560748
age
p: 0.9770992366
r: 0.9142857143
f: 0.9446494465

On the other hand, when I train the same model but with 2 different entities (dose, mode), I get:
dose
p: 0.8611111111
r: 0.4769230769
f: 0.6138613861
mode
p: 0.8623853211
r: 0.4585365854
f: 0.5987261146

But when I train the same model on these 4 entities at the same time (gender, age, dose, mode), P/R/F are all 0 for dose and mode:
gender
p: 0.9044585987
r: 0.5
f: 0.6439909297
age
p: 0.9781021898
r: 0.9273356401
f: 0.9520426288
dose
p: 0
r: 0
f: 0
mode
p: 0
r: 0
f: 0

Is this normal? Where does this problem come from?

ljvmiranda921 Jan 21, 2022

This might be unusual. As a sanity-check, are you setting the entities properly in your training / dev data? Also, do your entities overlap? NER does not work effectively with overlapping entities (a quick glance at your label names suggests that they don't, but just to be sure).
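
A quick way to answer the overlap question is to scan the character offsets of each text. The following is a sketch under the assumption that entities are stored as (start, end, label) tuples with half-open offsets; find_overlaps and the example spans are hypothetical names and data.

```python
def find_overlaps(entities):
    """Return pairs of (start, end, label) spans that share at least one character."""
    ordered = sorted(entities, key=lambda e: (e[0], e[1]))
    overlaps = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second[0] >= first[1]:
                # Spans are sorted by start, so no later span can overlap `first`.
                break
            overlaps.append((first, second))
    return overlaps


# Hypothetical annotations for one text: "mode" is nested inside "dose".
entities = [(12, 15, "age"), (0, 10, "dose"), (4, 10, "mode")]
overlaps = find_overlaps(entities)
print(overlaps)
```

Running this over every annotated text in both the training and dev sets gives a list of conflicting spans to review, since doc.ents cannot hold overlapping spans.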

Pravin770 Feb 15, 2023

@ljvmiranda921 I tried to train a new spaCy model on company names, but I get a very high loss during training. How can I fix it?

The training data has more than 1000 samples.

Example training data:
[('AGS', {'entities': [(0, 3, 'CUST')]}), ('YML SERVICOS LTD', {'entities': [(0, 16, 'CUST')]}), ('BORG GROUP', {'entities': [(0, 10, 'CUST')]}), ('GRABCRANEX', {'entities': [(0, 10, 'CUST')]}), ('GREEN SHIP', {'entities': [(0, 10, 'CUST')]}),

Here is my code:

import random

import spacy
from spacy.training import Example
from spacy.util import minibatch, compounding


def train_spacy(data, iterations, nlp):  # <-- Add model as nlp parameter
    TRAIN_DATA = data
    # Reuse the existing NER component or add a new one. In spaCy v3,
    # nlp.add_pipe takes the registered name and returns the component that
    # is actually in the pipeline, so labels must be added to that object.
    is_new_ner = 'ner' not in nlp.pipe_names
    if is_new_ner:
        ner = nlp.add_pipe('ner', last=True)
    else:
        ner = nlp.get_pipe('ner')

    # add labels
    for _, annotations in TRAIN_DATA:
        for ent in annotations.get('entities'):
            ner.add_label(ent[2])

    # get names of other pipes to disable them during training
    other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
    with nlp.select_pipes(disable=other_pipes):  # only train NER
        sizes = compounding(4.0, 500.0, 1.001)
        if is_new_ner:
            # a new component has to be initialized before the first update
            optimizer = nlp.initialize(
                lambda: [Example.from_dict(nlp.make_doc(text), annotations)
                         for text, annotations in TRAIN_DATA])
        else:
            # keep the existing weights when retraining a saved model
            optimizer = nlp.resume_training()
        for itn in range(iterations):
            print("Starting iteration " + str(itn))
            random.shuffle(TRAIN_DATA)
            batches = minibatch(TRAIN_DATA, size=sizes)
            losses = {}
            for batch in batches:
                examples = [Example.from_dict(nlp.make_doc(text), annotations)
                            for text, annotations in batch]
                nlp.update(
                    examples,  # one update per minibatch, not per example
                    drop=0.5,  # dropout - make it harder to memorise data
                    sgd=optimizer,  # callable to update weights
                    losses=losses)
            # losses are summed over all batches of the iteration
            print(losses, 'Iteration Number: ' + str(itn))
            # save a checkpoint after every 10 completed iterations
            if (itn + 1) % 10 == 0:
                nlp.to_disk("Spacy_CUST_NAME_Model_" + str(itn + 1) + "epochs")
    return nlp

nlp = spacy.blank('en')  # create blank Language class  # Train new model.
# nlp = spacy.load("/content/content/Spacy_CUST_NAME_Model") #Retrain the old model.
# nlp = spacy.load('en_core_web_sm')
nlp.max_length = 150000000000
start_training = train_spacy(train_data, 10, nlp)

tamish-jain Jul 30, 2024

The example training data you've provided seems to have entities inside of entities. Could you clarify this?
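
One way to look into this is to check the offsets of the samples directly. The sketch below uses the five samples quoted in the post above; the helper check_offsets is a hypothetical name. It flags spans that fall outside the text or include surrounding whitespace, and then counts how many texts consist of a single entity that covers the whole string.

```python
def check_offsets(data):
    """Return (text, span, problem) for spans that are out of bounds or padded."""
    problems = []
    for text, annotations in data:
        for start, end, label in annotations.get("entities", []):
            if not (0 <= start < end <= len(text)):
                problems.append((text, (start, end, label), "out of bounds"))
            elif text[start:end] != text[start:end].strip():
                problems.append((text, (start, end, label), "surrounding whitespace"))
    return problems


# The five samples quoted in the post above.
samples = [
    ('AGS', {'entities': [(0, 3, 'CUST')]}),
    ('YML SERVICOS LTD', {'entities': [(0, 16, 'CUST')]}),
    ('BORG GROUP', {'entities': [(0, 10, 'CUST')]}),
    ('GRABCRANEX', {'entities': [(0, 10, 'CUST')]}),
    ('GREEN SHIP', {'entities': [(0, 10, 'CUST')]}),
]

problems = check_offsets(samples)
# Texts whose only annotation spans the entire string.
whole_text = sum(
    1 for text, ann in samples
    if ann['entities'] == [(0, len(text), 'CUST')]
)
print(problems, whole_text)
```

For these five samples the check reports no offset problems, and each text carries exactly one entity that spans the entire string, so the quoted samples contain no nested spans. Whether that holds for the rest of the data cannot be told from the excerpt.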
