Poor performance of Spacy Model #12685
Unanswered
turulix
asked this question in
Help: Model Advice
Replies: 1 comment 2 replies
-
Could you tell us how you are preprocessing the training data? The presence of HTML tags in the model's input could be affecting its performance. Additionally, please post the output of the |
-
Hey!
I'm trying to develop a model that can extract various types of features or technical details from product descriptions. Currently, I have approximately 34 different features that I need to extract, and I have a dataset of around 10,000 labeled documents.
The problem I'm facing is that when I train the model, it consistently returns a score of 0 for 33 of the labels, and only performs decently for one of them. I'm completely puzzled as to what could be causing this issue.
While it is true that I have limited examples for a few features, with only around 50 or so, I find it perplexing that even features with over 1000 examples are not being recognized at all. It's becoming increasingly challenging to determine the underlying cause of this problem.
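A minimal sketch of how the per-label example counts could be double-checked before training. The data layout here (a list of dicts with `text` and `entities` as `(start, end, label)` tuples), the label names, and the helper `count_labels` are all assumptions for illustration; the real dataset may be structured differently.

```python
from collections import Counter


def count_labels(examples):
    """Count how often each label occurs across all annotated examples."""
    counts = Counter()
    for example in examples:
        for _start, _end, label in example["entities"]:
            counts[label] += 1
    return counts


# Hypothetical toy data, not taken from the real training set.
examples = [
    {"text": "Tisch aus Edelstahl", "entities": [(10, 19, "MATERIAL")]},
    {
        "text": "Regal aus Holz, 80 cm breit",
        "entities": [(10, 14, "MATERIAL"), (16, 21, "WIDTH")],
    },
]

label_counts = count_labels(examples)
for label, n in label_counts.most_common():
    print(label, n)
```

If the counts that actually reach the training step are far lower than expected for a label, that would point at the data conversion rather than at the model.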
I'm currently using this base config:
where the callback in before_init is:
Maybe the training data itself is at fault?
Here's a small excerpt from the training data itself: https://haste.turulix.de/osixocisof.json
Another issue I'm currently facing, for which I'm unsure about the best solution, is handling cases where I'm only interested in a specific part of a word. For example, consider the word Edelstahlmöbel (which means stainless steel furniture in German). When tokenized, it becomes a single token, but I'm primarily interested in the Edelstahl part, which represents the material it is made out of. However, tokenizing the word at the character level doesn't seem like an optimal approach.
While this is just a side note, the main problem I mentioned earlier is the one I'm primarily seeking assistance with. I'm uncertain if there might be a correlation between these two issues, but any insights or suggestions related to either problem would be greatly appreciated.
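One way the two issues could be connected: in spaCy, `Doc.char_span` returns `None` by default when the character offsets do not fall on token boundaries, so annotations that cover only part of a token may not survive conversion into training data. The sketch below is a rough check for such spans. It uses plain whitespace tokens as a stand-in for the real tokenizer, which is an assumption; the helper `misaligned_spans`, the sample sentence, and the `MATERIAL` label are hypothetical.

```python
import re


def misaligned_spans(text, spans):
    """Return the spans whose start or end is not on a token boundary."""
    # Whitespace-delimited tokens are only an approximation of spaCy's tokenizer.
    starts, ends = set(), set()
    for match in re.finditer(r"\S+", text):
        starts.add(match.start())
        ends.add(match.end())
    return [
        (start, end, label)
        for start, end, label in spans
        if start not in starts or end not in ends
    ]


# Hypothetical example: the first span covers only "Edelstahl" inside
# "Edelstahlmöbel"; the second covers the standalone word "Edelstahl".
text = "Hochwertige Edelstahlmöbel aus Edelstahl"
spans = [(12, 21, "MATERIAL"), (31, 40, "MATERIAL")]

bad = misaligned_spans(text, spans)
print(bad)
```

Running a check like this with the actual spaCy tokenizer over the full dataset would show how many annotations per label are affected.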
Thank you in advance for your assistance. Best regards,
turulix