Noun Phrase Chunking - Improvements in recall. #4906
Replies: 10 comments
-
@honnibal would you happen to have any inputs on this?
-
Did you analyse these cases on the development set? If so, could you also include performance numbers on the test set (assuming you haven't looked manually at those)? That would ensure we're not overfitting on our evaluation, which is pretty important.
-
@svlandeg By development set, do you mean the training set for that task? There are two sets for the CoNLL 2000 dataset: the training set and the test set. Here are the results for the training set (8,936 sentences).
Current spaCy noun chunker:
Updated spaCy noun chunker:
This is a very similar result to the one observed on the test set, as published in the previous comment.
-
@svlandeg shall I go ahead and open a PR if everything looks good to go?
-
Yes, like I said before, proposed changes to code are easier to review as a PR than as an issue, because otherwise it's hard to judge what exactly the changes would be. Additionally, to be honest, I do worry a bit about your evaluation. If you have consistent accuracy improvements, that is of course wonderful. But from your first post, I deduce that you looked at the errors on the test set and went ahead and addressed those. This is what you would usually use a development set for (a portion held out from your training set), because otherwise you may end up overfitting on the dataset you are tuning on. Ideally, you would have an independent test set that you hadn't looked at yet, not even run on once during the design of your methods, which can then be used to get a realistic view of the final performance. Considering these evaluation issues, an analysis "in the wild" (like I explained before) could be useful for us to understand the difference between the two implementations.
-
I think we need to take a step back and start with a clearer definition of noun chunk. I suggested the CoNLL data as a place to start and didn't necessarily mean for it to become the gold standard. There are cases from the CoNLL data that I don't think we want to have as noun chunks for spaCy, like all the cases with possessive 's. Not because I disagree with the linguists who did this conversion from the PTB, but because I think these kinds of phrases would be too unintuitive for typical users (and harder to extract from the dependency trees, which is also a factor). Even the example in the docs includes this kind of phrase.
-
@adrianeboyd the existing implementation of spaCy does take noun chunks with possessive 's as a whole, so nothing has changed there. There are just two places I have focused on at the moment, as in the previous example.
Example sentence 1: "I have $ 30 thousand with me"
Example sentence 2: "I have 30 cents with me"
Example sentence 3: "I have 30 million with me"
And regarding the example of "world's largest tech fund": spaCy considers the entire phrase an NP, and so does the CoNLL data, so I am not sure what is being implied in that case. Would you like spaCy to break it down further and treat "the world" as a separate NP?
-
One more angle that I am currently working on is a BiLSTM + CRF implementation for the task of NP chunking. It is based on https://arxiv.org/pdf/1508.01991.pdf. Further work on this in recent years has brought F-scores of around 96% on noun chunking tasks. I was wondering if I could try a similar implementation with thinc and explore how it performs in spaCy.
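For context on the CRF half of that architecture: the CRF layer scores whole tag sequences, and the best BIO sequence is recovered with Viterbi decoding. Below is a minimal sketch of just that decoding step in plain Python, with hypothetical tag names and scores; it is not thinc or spaCy code, only an illustration of the algorithm.

```python
def viterbi(emissions, transitions, tags):
    """Best-scoring tag sequence given per-token emission scores
    (one list per token, aligned with `tags`) and pairwise transition
    scores `transitions[prev][curr]`; higher scores are better."""
    score = {t: emissions[0][i] for i, t in enumerate(tags)}
    backpointers = []
    for step in emissions[1:]:
        new_score, ptr = {}, {}
        for j, t in enumerate(tags):
            # best previous tag to arrive at t
            prev = max(tags, key=lambda s: score[s] + transitions[s][t])
            new_score[t] = score[prev] + transitions[prev][t] + step[j]
            ptr[t] = prev
        score = new_score
        backpointers.append(ptr)
    # trace the best path back from the highest-scoring final tag
    best = max(tags, key=lambda t: score[t])
    path = [best]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

Because the transition table can heavily penalise illegal moves such as O followed by I-NP, the decoder produces well-formed BIO sequences even when the per-token scores alone would not.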
-
I'm saying that (as far as I know) we don't have a formal definition of noun chunk beyond what's in the code, and evaluating spaCy's output I definitely do see some problems related to complex NPs. As an example, the second sentence in the CoNLL training data includes the NP
and on the other, spaCy's results are pretty unsatisfactory:
The rules look for a noun chunk head in
I think spaCy is unlikely to move away from a fast rule-based approach based on the existing tagger/parser output, but the next version of thinc (nearly ready for release) should be much easier to use for experiments like this!
-
@adrianeboyd
-
Feature description
Improvements to noun phrase chunking precision and recall.
Could the feature be a custom component or spaCy plugin?
It would be a simple modification to the existing noun phrase chunker: https://github.com/explosion/spaCy/blob/master/spacy/lang/en/syntax_iterators.py#L7
@adrianeboyd had pointed me to https://www.clips.uantwerpen.be/conll2000/chunking/ to experiment with noun phrase chunking in spaCy. Using the test.txt file in the CoNLL 2000 dataset as the ground truth for NP chunking, spaCy's NP chunker had the performance below.
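For reproducibility, performance on the CoNLL 2000 task is conventionally measured as exact-span precision/recall/F1 over chunks read off the B-NP/I-NP/O tags. Here is a small self-contained sketch of that scoring (my own simplified code, not the official conlleval script, and restricted to NP chunks):

```python
def np_spans(tags):
    """Collect (start, end) token spans of NP chunks from a BIO tag sequence."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B-NP":
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I-NP":
            if start is None:
                start = i  # lenient: I-NP without a B-NP still opens a chunk
        else:
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def chunk_prf(gold, pred):
    """Exact-span precision, recall and F1 over NP chunks."""
    g, p = set(np_spans(gold)), set(np_spans(pred))
    tp = len(g & p)
    prec = tp / len(p) if p else 0.0
    rec = tp / len(g) if g else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```

A chunk only counts as correct if both its start and end match the gold span exactly, which is why missed short NPs like "yesterday" hurt recall directly.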
After simple modifications to the spaCy NP iterator, following an analysis of the cases that were responsible for the low recall, the results are as follows.
Some of the phrases that the chunker couldn't handle are listed below.
A company spokesman said yesterday that Coca-Cola Enterprises sticks by its 1989 forecast.
"yesterday" is a valid noun phrase that was missed. spaCy misses occurrences of nouns with the npadvmod dependency; fixed by including it in the labels.
Net loss : $ 1.7 million vs. net income : $ 21.2 million ; or 12 cents a share
"12 cents" is picked up as a valid NP but "$ 1.7 million" is not. Fixed by including NUM in the POS tags that initiate the search for NP phrases.
When bank financing for the buy-out collapsed last week, so did UAL's stock.
"last week" is a valid NP that is not identified. Same npadvmod issue as in example 1.
Mr. Wolf owns 75,000 UAL shares and has options to buy another 250,000 at $ 83.3125 each.
"another 250,000" is missed by spaCy. The change in example 2 fixes the issue here too.
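To make the two fixes concrete, here is a stripped-down sketch of the head-selection step of the noun chunk iterator with both changes applied. The token representation and label sets are simplified stand-ins for what spacy/lang/en/syntax_iterators.py actually does (the real iterator walks Doc/Token objects and builds full spans), so treat this as an illustration rather than the exact patch:

```python
from collections import namedtuple

# Toy stand-in for a spaCy token.
Tok = namedtuple("Tok", ["text", "pos", "dep"])

# Dependency labels that may mark a chunk head; "npadvmod" is the
# addition from example 1 ("yesterday", "last week").
NP_DEPS = {"nsubj", "dobj", "nsubjpass", "pcomp", "pobj", "dative",
           "appos", "attr", "ROOT", "npadvmod"}

# POS tags that may head a chunk; "NUM" is the addition from example 2
# ("$ 1.7 million", "another 250,000").
HEAD_POS = {"NOUN", "PROPN", "PRON", "NUM"}

def chunk_heads(tokens):
    """Return the texts of tokens that would start an NP search."""
    return [t.text for t in tokens
            if t.pos in HEAD_POS and t.dep in NP_DEPS]
```

With these sets, a token like "yesterday" (NOUN, npadvmod) or "million" (NUM, dobj) now triggers a chunk search, while modifiers such as a nummod "30" still only join a chunk headed elsewhere.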
Case that needs a discussion
The ground truth considers "its total loan and real estate reserves" as one single NP; spaCy splits it into two. I am unsure which of the two is more desirable. This has to do with a coordinating conjunction between two NPs.
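The difference between the two readings can be framed as whether the chunker extends a span across a conj dependency. A hypothetical sketch, with chunks as (start, end) token offsets and conj_pairs linking the indices of coordinated chunks (toy data, not spaCy's API):

```python
def merge_coordinated(chunks, conj_pairs):
    """Merge chunk spans whose heads are linked by a conj dependency."""
    chunks = sorted(chunks)
    merged, absorbed = [], set()
    for i, (start, end) in enumerate(chunks):
        if i in absorbed:
            continue
        for j in range(i + 1, len(chunks)):
            if (i, j) in conj_pairs:
                end = chunks[j][1]  # extend across the conjunction
                absorbed.add(j)
        merged.append((start, end))
    return merged
```

Over the toy tokens ["its", "total", "loan", "and", "real", "estate", "reserves"], the spans (0, 3) and (4, 7) merge into (0, 7) when linked, matching the CoNLL reading, and stay separate otherwise, matching spaCy's current behaviour.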
I can go ahead and open a PR with the changes and the new test cases that will have to be included, if someone can review these cases and let me know whether this was an intended spaCy behavioral deviation from the CoNLL 2000 ground truth.