Trying to understand spancat vs NER use-cases #8930
I'm trying to understand the situations where I am better off using spancat rather than NER.
Description from the doc
Are overlapping entities the only scenario where I should opt for spancat?
Say my use case is the WORK_OF_ART entity. Currently, if a span such as "Harry Potter and the Chamber of Secrets" is identified as a WORK_OF_ART, "Harry Potter" is no longer identified as a name. Suppose I am currently using two separate NER models, one to identify "Harry Potter and the Chamber of Secrets" as WORK_OF_ART and one to identify "Harry Potter" as PERSON. Would I be able to train a single spancat model that does this task better than NER?
I'm having trouble working out when I should not use spancat, and why spancat is not a replacement for NER.
Replies: 1 comment 5 replies
When you use the default NER model, it has the constraint that a single token can't have more than one label, so it learns tradeoffs between different labels. In contrast, the spancat component can't make that assumption, so it has more limited information to draw from.
To give a concrete if somewhat contrived example, consider these sentences.
In 1, XXX could be a GPE (country, state, city) or a LOC (non-GPE location). In 2 it would not be a GPE ("John lives at Spain" does not work) but could be a LOC ("the North Pole").
In spancat these associations have to be learned separately for each label type: the fact that "lives at" is followed by a LOC doesn't rule out a GPE, because multiple labels are possible and the decisions are independent. In the default NER model, when it sees that "lives at" is often followed by a LOC, it automatically takes part of the probability space away from GPE and the other labels (because it has to). Since this happens in a single step in the basic NER model, there are fewer invalid or useless states for it to get stuck in.
The result is that the basic NER model should generally have better accuracy. Because of that, I would generally recommend the NER model over spancat unless you specifically need some of the spancat features.
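The "takes probability space away" point can be illustrated with a toy calculation. This is only a sketch under stated assumptions, not how either spaCy component is implemented: a softmax over labels stands in for "exactly one label per token", an independent sigmoid per label stands in for "each label is decided separately", and the label list, the scores, and the helper names `exclusive_probs` and `independent_probs` are all made up for illustration.

```python
import math

# Hypothetical label set, for illustration only.
LABELS = ["GPE", "LOC", "ORG"]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(score):
    """Turn one raw score into an independent yes/no probability."""
    return 1.0 / (1.0 + math.exp(-score))


def exclusive_probs(scores):
    """Stand-in for 'one label per token': labels compete for one unit of probability."""
    return dict(zip(LABELS, softmax(scores)))


def independent_probs(scores):
    """Stand-in for 'one decision per label': each score is judged on its own."""
    return {label: sigmoid(s) for label, s in zip(LABELS, scores)}


# Made-up raw scores for the text after "lives at", before and after the
# model has learned that this context favours LOC. Only the LOC score changes.
before = [1.0, 1.0, 0.0]
after = [1.0, 3.0, 0.0]

for name, fn in [("exclusive", exclusive_probs), ("independent", independent_probs)]:
    b, a = fn(before), fn(after)
    print(f"{name:12s} GPE: {b['GPE']:.3f} -> {a['GPE']:.3f}   LOC: {b['LOC']:.3f} -> {a['LOC']:.3f}")
```

In the exclusive setup, raising the LOC score lowers the GPE probability without touching the GPE score at all. In the independent setup the GPE probability does not move, so evidence against GPE in this context would have to be learned separately.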