NER Overfitting word position in sentence #9998
I am trying to build a brand detector for product titles from scratch using spaCy. For example, it should detect Apple as the brand in "Smart Watch Apple X-160 32 GB".

I have curated a large dataset of 98k product titles with their corresponding brands, and converted it to spaCy format: a training set (88k records), a dev set (10k records), and a test set (3k records).

After only a few epochs (4) with a batch size of 32, I get an F1 score of 0.86, which looks great on paper. However, when I test the model on real-world cases (my own examples), it almost always fails to predict the brand if it is not the first word in the product title.

This is mainly due to the fact that 89% of product titles in my dataset have the brand as their first word, so the model has become quite biased toward predicting the first word of a sentence as a brand.

Are there any techniques to avoid this overfitting? I'm already using dropout with a value of 0.6. The same behaviour has been noticed with other models, such as the Flair tagging model.

Thank you for your suggestions.
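One way to make the position bias visible, rather than hidden inside a single F1 number, is to break the data and the recall down by the word position of the brand. The sketch below is only illustrative: it assumes the examples are stored as `(text, {"entities": [(start, end, label)]})` tuples with character offsets, and `predict` is a hypothetical callable that wraps whatever trained model is being evaluated and returns a set of `(start, end, label)` tuples.

```python
from collections import Counter


def brand_word_index(text, start):
    """0-based index of the whitespace-separated word starting at char offset `start`."""
    return len(text[:start].split())


def position_histogram(examples):
    """Count how often the annotated brand begins at each word position."""
    counts = Counter()
    for text, annotations in examples:
        for start, _end, _label in annotations["entities"]:
            counts[brand_word_index(text, start)] += 1
    return counts


def recall_by_position(examples, predict):
    """Recall of exact-match predictions, grouped by the brand's word position.

    `predict` is a hypothetical wrapper around the trained model:
    predict(text) -> iterable of (start, end, label) tuples.
    """
    hits, totals = Counter(), Counter()
    for text, annotations in examples:
        predicted = set(predict(text))
        for entity in annotations["entities"]:
            index = brand_word_index(text, entity[0])
            totals[index] += 1
            hits[index] += int(tuple(entity) in predicted)
    return {index: hits[index] / totals[index] for index in totals}
```

With a spaCy pipeline, `predict` could for instance return `{(ent.start_char, ent.end_char, ent.label_) for ent in nlp(text).ents}`. If recall at position 0 is high and recall elsewhere is near zero, the overall score mostly reflects the 89% of titles that start with the brand.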
Replies: 1 comment 4 replies
Sorry you're having trouble with this; we have never seen a report of this before. I have worked on a similar model before and observed the same pattern in the data, though it was long enough ago that I was using a CRF or something.

The spaCy NER model does not explicitly encode position as a parameter, so it's hard to point at one thing as the cause. My best guesses are:

… NULL → BRAND transition at the start.

You might be able to modify hyperparameters to improve this, but I'm not really sure what to recommend... maybe increasing the …

The most surefire approach is definitely going to be augmenting the input data to provide more variety in the structure of your Docs, whether by adding tokens that should be ignored to the start, or by augmenting the training data to shuffle the position of the brand.
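The two augmentations suggested in the reply (moving the brand to a different position, and adding ignorable tokens to the start) could be sketched roughly as follows. This is a minimal illustration and not a spaCy API: the function names, the `fraction` parameter, and the `(text, {"entities": [(start, end, label)]})` example format with character offsets are all assumptions made for the example.

```python
import random


def move_brand(text, span, rng):
    """Move the brand to a random word slot; return the new text and new span.

    Whitespace is normalised to single spaces as a side effect.
    """
    start, end, label = span
    brand = text[start:end]
    # Remaining words of the title once the brand has been cut out.
    rest = (text[:start] + " " + text[end:]).split()
    slot = rng.randint(0, len(rest))  # inclusive on both ends
    words = rest[:slot] + [brand] + rest[slot:]
    new_start = len(" ".join(words[:slot])) + (1 if slot > 0 else 0)
    return " ".join(words), (new_start, new_start + len(brand), label)


def prepend_filler(text, span, filler):
    """Add an unlabelled token in front of the title and shift the span."""
    start, end, label = span
    shift = len(filler) + 1  # filler plus one separating space
    return filler + " " + text, (start + shift, end + shift, label)


def augment(examples, fraction=0.5, seed=0):
    """Return the original examples plus shuffled copies of roughly `fraction` of them.

    Only examples with exactly one annotated entity are augmented.
    """
    rng = random.Random(seed)
    augmented = list(examples)
    for text, annotations in examples:
        entities = annotations["entities"]
        if len(entities) != 1 or rng.random() >= fraction:
            continue
        new_text, new_span = move_brand(text, entities[0], rng)
        augmented.append((new_text, {"entities": [new_span]}))
    return augmented
```

Shuffled titles will not always read naturally, so it is probably worth augmenting only part of the training set and checking the result on real titles where the brand is not the first word.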