NER Overfitting word position in sentence #9998

annis-souames · 2022-01-06T12:41:15Z


I am trying to build a brand detector for product titles from scratch using spaCy. For example:

It should detect Apple as the brand in "Smart Watch Apple X-160 32 GB". I have curated a large dataset of 98k product titles with their corresponding brands. I also converted the dataset to the spaCy format, split into a training set (88k records), a dev set (10k records), and a test set (3k records).

After only a few epochs (4) with a batch size of 32, I get an F1 score of 0.86, which in theory is amazing. However, when I test the model on real-world cases (my own examples), it almost always fails to predict the brand if it is not the first word in the product title.

Example:
"Apple Smart Watch 32GB": it detects Apple successfully.
"Smart Watch Apple 32GB": it detects Smart as the brand 😢.

This is mainly due to the fact that 89% of product titles in my dataset have the brand as their first word. So the model got quite biased toward predicting the first word of a sentence as a brand.
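A figure like the 89% above can be checked with a few lines of plain Python. The following is a minimal sketch, not the author's actual script: it assumes the data is available as `(text, spans)` pairs with character offsets, and the function name `brand_token_positions` and the sample titles are made up for illustration.

```python
from collections import Counter


def brand_token_positions(examples):
    """Count the token index at which each brand span starts.

    `examples` is assumed to be a list of (text, [(start, end, label), ...])
    pairs with character offsets.
    """
    counts = Counter()
    for text, spans in examples:
        for start, _end, _label in spans:
            # With whitespace tokenisation, the token index of the span is
            # the number of whitespace-separated tokens that precede it.
            counts[len(text[:start].split())] += 1
    return counts


# Illustrative titles only, not taken from the real dataset.
examples = [
    ("Apple Smart Watch 32GB", [(0, 5, "BRAND")]),
    ("Smart Watch Apple X-160 32 GB", [(12, 17, "BRAND")]),
    ("Samsung Galaxy S21", [(0, 7, "BRAND")]),
]

positions = brand_token_positions(examples)
first_share = positions[0] / sum(positions.values())
print(f"brand is the first token in {first_share:.0%} of examples")
```

Looking at the full distribution in `positions`, rather than only the share at index 0, also shows which positions are barely represented in the training data.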

Are there any techniques to avoid this overfitting? I'm already using dropout with a value of 0.6. The same behaviour has been observed with other models, such as the Flair tagging model.

Thank you for your suggestions.

Answered by polm

polm · 2022-01-07T05:27:36Z


Sorry you're having trouble with this, we have never seen a report of this before. I have worked on a similar model before and observed the same pattern in the data, though it was long enough ago that I was using a CRF or something.

The spaCy NER model does not explicitly encode position as a parameter, so it's hard to point at one thing as the cause. My best guesses are:

Because the tok2vec uses a CNN, even with convolutions, signals that are always on the left edge will distort the model, since they will always reach the same weights in the network.
Because the NER model is transition-based, it is overfitting the NULL → BRAND transition at the start.

You might be able to modify hyperparameters to improve this, but I'm not really sure what to recommend... maybe increasing the depth would help? I don't think dropout would help with this.

The most surefire approach is definitely going to be augmenting the input data to provide more variety in the structure of your Docs, whether by adding tokens that should be ignored to the start, or augmenting training data to shuffle the position of the brand.
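The two augmentation ideas above could be sketched roughly as follows. This is only an illustration under stated assumptions: each example is a `(text, (start, end, label))` pair with a single brand span aligned to whitespace token boundaries, and the helper names (`move_brand`, `prepend_filler`, `augment`) and the `FILLERS` list are hypothetical, not part of spaCy.

```python
import random

# Hypothetical filler words that should not be tagged as a brand.
FILLERS = ["New", "Original", "Genuine", "Sale"]


def move_brand(text, span, rng):
    """Move the brand to a random token slot and recompute its offsets."""
    start, end, label = span
    brand = text[start:end]
    # All tokens except the brand, using whitespace tokenisation.
    rest = text[:start].split() + text[end:].split()
    slot = rng.randint(0, len(rest))  # inclusive on both ends
    new_text = " ".join(rest[:slot] + [brand] + rest[slot:])
    # The brand starts after the joined prefix plus one separating space.
    new_start = len(" ".join(rest[:slot])) + (1 if slot else 0)
    return new_text, (new_start, new_start + len(brand), label)


def prepend_filler(text, span, rng):
    """Add an ignorable token at the start and shift the brand offsets."""
    start, end, label = span
    prefix = rng.choice(FILLERS) + " "
    return prefix + text, (start + len(prefix), end + len(prefix), label)


def augment(examples, seed=0):
    """Return the originals plus one moved and one prefixed copy of each."""
    rng = random.Random(seed)
    augmented = []
    for text, span in examples:
        augmented.append((text, span))
        augmented.append(move_brand(text, span, rng))
        augmented.append(prepend_filler(text, span, rng))
    return augmented


examples = [
    ("Apple Smart Watch 32GB", (0, 5, "BRAND")),
    ("Smart Watch Apple X-160 32 GB", (12, 17, "BRAND")),
]
augmented = augment(examples)
for text, (start, end, label) in augmented:
    print(f"{text!r} -> {text[start:end]!r} ({label})")
```

The augmented pairs would then be converted to the spaCy format in the same way as the original data. Randomly moving the brand can produce titles that no seller would write, so it may be worth mixing augmented and original examples rather than replacing the originals.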

4 replies

annis-souames Jan 7, 2022
Author

Thank you for this great insight. I will look into the spaCy architecture and try playing with some parameters, such as depth and width.
Based on your experience, does a CRF model give good, unbiased results in this case?

polm Jan 7, 2022

It has been many years since I trained that model, and I didn't work on it long and it never saw production. I would not go out of my way to use a CRF these days.

svlandeg Jan 7, 2022
Maintainer

My personal two cents: in general, seeing high performance on train/dev but decreased practical performance on actual applications or test data often means that the training/dev set is not representative enough of your challenge. It might very well be that the model is overfitting on position, in which case it should hopefully be straightforward to do some data augmentation as Paul mentioned, or annotate some more realistic data points. Let us know how you go!
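One way to see whether a dev set hides a position problem is to split the score by where the gold brand sits. The sketch below is an assumption-laden illustration, not a spaCy API: `recall_by_position` is a hypothetical helper, and gold and predicted annotations are assumed to be per-title sets of `(start, end, label)` character spans in the same order.

```python
def recall_by_position(gold, predicted):
    """Recall for gold spans at the start of the title versus later ones."""
    hits = {"first": 0, "later": 0}
    totals = {"first": 0, "later": 0}
    for gold_spans, pred_spans in zip(gold, predicted):
        for span in gold_spans:
            bucket = "first" if span[0] == 0 else "later"
            totals[bucket] += 1
            # A hit requires an exact match on offsets and label.
            hits[bucket] += span in pred_spans
    return {b: (hits[b] / totals[b] if totals[b] else None) for b in totals}


# The two titles from the question: "Apple Smart Watch 32GB" and
# "Smart Watch Apple 32GB", where the model tags "Smart" in the second one.
gold = [{(0, 5, "BRAND")}, {(12, 17, "BRAND")}]
predicted = [{(0, 5, "BRAND")}, {(0, 5, "BRAND")}]

scores = recall_by_position(gold, predicted)
print(scores)
```

If most dev titles fall in the "first" bucket, an overall F1 can look high while the "later" bucket stays poor, which matches the behaviour described in the question.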

annis-souames Jan 7, 2022
Author

I tried a larger depth, but it didn't solve the problem as I had hoped. I will go for data augmentation, since it's the most logical and most reliable approach.
Thank you for the great advice!
