divergence in token POS and TAG #11730
I'm seeing a divergence between a token's POS and TAG, and I'm wondering which one is authoritative. For example, in a certain sentence starting with

`{'text': 'get', 'lemma': 'get', 'pos': 'VERB', 'dep': 'ROOT', 'tag': 'VB'}`

POS and TAG are in agreement, but in a slightly different sentence also starting with

`{'text': 'get', 'lemma': 'get', 'pos': 'AUX', 'dep': 'aux', 'tag': 'VB'}`

POS and TAG have diverged. Is it safer to rely on TAG rather than POS?

I've been experimenting with different model sizes; this behavior has been observed with
Replies: 1 comment 2 replies
Quick markdown note: JSON doesn't allow single quotes, so if you mark your blocks as JSON they show up highlighted completely in red as invalid. This looks like Python `repr` output, so I changed the blocks to Python. It would also be fine to not specify a language.

Anyway, about your question: `.pos_` and `.tag_` are related but different things. It's not a question of one being "better" or "authoritative". POS uses Universal Dependencies tags, which are coarse-grained and designed to be transferable between languages. The values in `.tag_` are language-specific tags, which are more fine-grained and typically unique to a given language.

Which one you should rely on depends on what you're using them for.
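To make the relationship concrete, here is a minimal sketch in plain Python (no spaCy calls) that groups the two records from the question by their fine-grained tag and lists the coarse POS values seen with each. The helper name `pos_by_tag` is made up for illustration, and the records are just the dicts posted above; in a real pipeline they would presumably be built from `token.pos_` and `token.tag_`.

```python
from collections import defaultdict

# The two token records quoted in the question, copied as posted.
tokens = [
    {"text": "get", "lemma": "get", "pos": "VERB", "dep": "ROOT", "tag": "VB"},
    {"text": "get", "lemma": "get", "pos": "AUX", "dep": "aux", "tag": "VB"},
]


def pos_by_tag(records):
    """Map each fine-grained tag to the sorted coarse POS values seen with it.

    Hypothetical helper for illustration only; not part of spaCy.
    """
    seen = defaultdict(set)
    for rec in records:
        seen[rec["tag"]].add(rec["pos"])
    return {tag: sorted(values) for tag, values in seen.items()}


print(pos_by_tag(tokens))  # {'VB': ['AUX', 'VERB']}
```

In this data the single tag `VB` appears with both `VERB` and `AUX`, so the two attributes answer different questions rather than contradicting each other. If you only care about the word form, `tag` is the one to look at; if you care about the role the word plays in the sentence, `pos` (together with `dep`) is probably more useful.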