Custom Sentencer causes poor ner training performance #12873

The ner component has been developed for traditional named entities, which are typically short noun phrases that never cross sentence boundaries. There's a hard-coded constraint in the ner component to not predict any entities across sentence boundaries.

Your spans don't sound like named entities, so ner might not be the best choice. If it's working fine otherwise, a simple solution is to reorder the pipeline components so that the sentence boundaries are set after ner runs. You might also consider testing other components like spancat, which is more flexible at handling longer spans that don't look like short noun phrases.
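The reordering above can be sketched like this (a minimal example assuming spaCy v3; the built-in `sentencizer` stands in for the custom sentencer from the question, and `my_custom_sentencer` is a hypothetical component name):

```python
import spacy

# Minimal sketch: build a pipeline where sentence boundaries are set
# only *after* ner has run, so ner is not constrained by them.
nlp = spacy.blank("en")
nlp.add_pipe("ner")
nlp.add_pipe("sentencizer")  # stand-in for the custom sentencer

# For an already-assembled pipeline, the same effect can be achieved
# with the positional arguments of add_pipe, e.g.:
# nlp.add_pipe("my_custom_sentencer", after="ner")

print(nlp.pipe_names)
```

Because ner now runs before any component sets `Token.is_sent_start`, its hard-coded constraint against crossing sentence boundaries no longer applies during prediction.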
