divergence in token POS and TAG #11730

Nov 1, 2022 · 1 comment · 2 replies
Quick markdown note: JSON doesn't allow single quotes, so if you mark your blocks as JSON they show up highlighted completely in red as invalid. This looks like Python repr output, so I changed the blocks to Python. It would also be fine to not specify a language.

Anyway, about your question: .pos_ and .tag_ are related but different things - it's not a question of one being "better" or "authoritative". The values in .pos_ are Universal Dependencies part-of-speech tags, which are coarse-grained and designed to be transferable between languages. The values in .tag_ are language-specific tags, which are more fine-grained and typically unique to a given language.
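To make the coarse vs. fine-grained distinction concrete, here is a minimal sketch in plain Python. It does not call spaCy. The `FINE_TO_COARSE` dictionary and the `group_by_coarse` helper are made up for illustration, and the table is a small hand-picked subset of English Penn Treebank-style tags rather than spaCy's internal mapping. It only shows that several fine-grained tags collapse into one coarse tag.

```python
from collections import defaultdict

# Illustrative subset only (an assumption for this sketch, NOT spaCy's
# internal table): a few English Penn Treebank-style tags, the kind of
# value you would see in token.tag_, paired with the Universal POS tag,
# the kind of value you would see in token.pos_.
FINE_TO_COARSE = {
    "NN": "NOUN",    # singular or mass noun
    "NNS": "NOUN",   # plural noun
    "NNP": "PROPN",  # singular proper noun
    "NNPS": "PROPN", # plural proper noun
    "JJ": "ADJ",     # adjective
    "JJR": "ADJ",    # comparative adjective
    "JJS": "ADJ",    # superlative adjective
    "RB": "ADV",     # adverb
    "RBR": "ADV",    # comparative adverb
    "RBS": "ADV",    # superlative adverb
}


def group_by_coarse(mapping):
    """Invert a fine -> coarse mapping into coarse -> sorted list of fine tags."""
    groups = defaultdict(list)
    for fine, coarse in mapping.items():
        groups[coarse].append(fine)
    return {coarse: sorted(fine_tags) for coarse, fine_tags in groups.items()}


groups = group_by_coarse(FINE_TO_COARSE)
for coarse, fine_tags in sorted(groups.items()):
    print(f"{coarse:6} <- {', '.join(fine_tags)}")
```

The fine-grained tag keeps distinctions (singular vs. plural, comparative vs. superlative) that the coarse tag drops. Verb tags are left out of the sketch on purpose, because whether a verb form ends up as VERB or AUX depends on context and not on the fine-grained tag alone.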

Which one you should rely on depends on what you're using th…

Answer selected by asquare
Labels
usage General spaCy usage feat / tagger Feature: Part-of-speech tagger
3 participants