
Possible ORG misidentification #13438


The pretrained spaCy pipelines use probabilistic models trained on sets of annotated examples. As a result, a model will make errors because:

  • The training sets are not complete enough to cover all the entities or contexts that would help predict an entity (or the absence of an entity).
  • The model capacity may be too limited to capture all the entities or contexts that would help predict an entity (or the absence of an entity).
  • Training dynamics: what a model ends up learning also depends on how training happened to proceed, not only on the data and the architecture.

Accuracy will never be 100% on unseen data. That said, there are several ways in which you can improve the predictions:

  • Use a larger pipeline than en_core_web_sm, for instance en_core_web_lg or en_core_web_trf. The larger pipelines generally have better prediction accuracy.
  • Trai…
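
The first suggestion above can be tried out with a few lines of Python. The following is only a minimal sketch: the helper name `compare_entities`, its `load` parameter, and the sample sentence are made up for illustration and are not part of spaCy or of the original report. It runs the same text through several pipelines and collects the entities each one predicts, skipping any pipeline that is not installed.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def compare_entities(
    text: str,
    pipeline_names: Iterable[str],
    load: Optional[Callable] = None,
) -> Dict[str, List[Tuple[str, str]]]:
    """Return {pipeline name: [(entity text, entity label), ...]}.

    `load` is a hypothetical hook that defaults to spacy.load; it exists
    only so the helper can be exercised without downloading any pipeline.
    """
    if load is None:
        import spacy  # imported lazily, only when the real loader is needed

        load = spacy.load

    results: Dict[str, List[Tuple[str, str]]] = {}
    for name in pipeline_names:
        try:
            nlp = load(name)
        except OSError:
            # spacy.load raises OSError when the pipeline package is missing.
            continue
        doc = nlp(text)
        results[name] = [(ent.text, ent.label_) for ent in doc.ents]
    return results


if __name__ == "__main__":
    # Hypothetical sample sentence; replace it with the text that was mislabeled.
    sample = "The invoice was sent to Explosion in Berlin last week."
    names = ["en_core_web_sm", "en_core_web_lg", "en_core_web_trf"]
    for name, ents in compare_entities(sample, names).items():
        print(name, ents)
```

Each pipeline has to be installed first, for example with `python -m spacy download en_core_web_lg`. Comparing the outputs side by side shows whether a larger pipeline already fixes the ORG prediction for your text; no particular result is guaranteed.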


Answer selected by svlandeg
Labels: lang / en (English language data and models), feat / ner (Feature: Named Entity Recognizer), perf / accuracy (Performance: accuracy)

This discussion was converted from issue #13437 on April 15, 2024 08:28.