Age as date? #8944
I took a text example from https://opennlp.apache.org
# pip install -U spacy
# python -m spacy download en_core_web_sm
import spacy
# Load English tokenizer, tagger, parser and NER
nlp = spacy.load("en_core_web_sm")
# Process whole documents
text = "Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29. Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group. Rudolph Agnew, 55 years old and former chairman of Consolidated Gold Fields PLC, was named a director of this British industrial conglomerate."
doc = nlp(text)
# Find named entities, phrases and concepts
for entity in doc.ents:
    print(entity.label_, " => ", entity.text)
Output:
Should age not be represented as something other than a date?
https://corenlp.run/ is able to suggest "61 years old" as DURATION. Close, but still not age. I am new to spaCy. Please help! Thanks
Replies: 1 comment
The spaCy NER models are not rule-based, and you cannot "tweak" them. It's not like a function in code where you can go in and change a line.
The models have a bunch of numbers they use to make decisions. We give them example documents, and they learn the numbers that best reproduce the labels we put on those documents.
For English we use OntoNotes for the training data. Looking at the OntoNotes documents, it looks like they use the DATE label for dates or periods of time, so that's how the spaCy models work. (In the PDF, look for "absolute or relative dates or periods".)
If you want to label things as AGE, you can try training your own model, or using rules to re-label DATE entities …
Also see #3052 about the model making errors.
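As a rough illustration of the rule-based option, here is a minimal sketch that re-labels DATE entities as AGE when their text looks like an age. The names `AGE_PATTERN`, `looks_like_age`, `relabel_ages` and `demo` are made up for this example, and the regular expression is only an assumption about what an age expression looks like. It will not cover every phrasing, and it only helps if the model has already tagged the span as DATE.

```python
import re

# Hypothetical pattern (an assumption, not part of spaCy): matches entity
# text such as "61 years old" or "55-year-old".
AGE_PATTERN = re.compile(r"^\d{1,3}[\s-]+years?[\s-]+old$", re.IGNORECASE)


def looks_like_age(label, text):
    """Return True when a DATE entity's text reads like an age."""
    return label == "DATE" and AGE_PATTERN.match(text.strip()) is not None


def relabel_ages(doc):
    """Replace age-like DATE entities in a spaCy Doc with AGE spans."""
    # Imported here so that looks_like_age can be used without spaCy.
    from spacy.tokens import Span

    doc.ents = [
        Span(doc, ent.start, ent.end, label="AGE")
        if looks_like_age(ent.label_, ent.text)
        else ent
        for ent in doc.ents
    ]
    return doc


def demo():
    """Run the re-labelling on a short text (needs en_core_web_sm installed)."""
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = relabel_ages(nlp("Pierre Vinken, 61 years old, will join the board."))
    for ent in doc.ents:
        print(ent.label_, " => ", ent.text)
```

Calling `demo()` prints the entities after re-labelling. Whether "61 years old" ends up as AGE depends on the model predicting it as a single DATE entity in the first place, so it is worth checking the output on your own text before relying on the rule.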