Noun Phrase Chunking - Improvements in recall. #4906
Replies: 10 comments
-
@honnibal would you happen to have any inputs on this?
-
Did you analyse these cases on the development set? If so, could you also include performance numbers on the test set (assuming you haven't looked manually at those)? That would ensure we're not overfitting on our evaluation, which is pretty important.
-
@svlandeg By development set, do you mean the training set for that task? There are two sets for the CoNLL 2000 dataset: the training set and the test set. Here are the results for the training set (8,936 sentences).
Current spaCy noun chunker:
Updated spaCy noun chunker:
This is a very similar result to the one observed on the test set, as published in the previous comment.
-
@svlandeg shall I go ahead and open a PR if everything looks good to go?
-
Yes, like I said before, proposed changes to code are easier to review as a PR than as an issue, because otherwise it's hard to judge what exactly the changes would be. Additionally, to be honest, I do worry a bit about your evaluation. If you have consistent accuracy improvements, that is of course wonderful. But from your first post, I deduce that you looked at the errors on the test set and went ahead and addressed those. This is what you would usually use a development set for (a portion held out from your training set), because otherwise you may end up overfitting on the dataset you are tuning on. Ideally, you would have an independent test set that you hadn't looked at yet, not even run on once during the design of your methods, which can then be used to get a realistic view of the final performance. Considering these evaluation issues, an analysis "in the wild" (like I explained before) could be useful for us to understand the difference between the two implementations.
-
I think we need to take a step back and start with a clearer definition of noun chunk. I suggested the CoNLL data as a place to start and didn't necessarily mean for it to become the gold standard. There are cases from the CoNLL data that I don't think we want to have as noun chunks for spaCy, like all the cases with possessive 's. Not because I disagree with the linguists who did this conversion from the PTB, but because I think these kinds of phrases would be too unintuitive for typical users (and harder to extract from the dependency trees, which is also a factor). Even the example in the docs includes this kind of phrase.
-
@adrianeboyd the existing implementation of spaCy does take noun chunks with possessive 's as a whole, so nothing has changed there. There are just two places I have focused on at the moment, as in the previous example.
Example sentence 1: "I have $ 30 thousand with me"
Example sentence 2: "I have 30 cents with me"
Example sentence 3: "I have 30 million with me"
And regarding the example of "world's largest tech fund": spaCy considers the entire phrase an NP, and so does the CoNLL data, so I am not sure what is being implied in that case. Would you like spaCy to break it down further and treat "the world" as a separate NP?
-
One more angle that I am currently working on is a BiLSTM + CRF implementation for the task of NP chunking. It is based on https://arxiv.org/pdf/1508.01991.pdf. Further work on this in recent years has brought F-scores of around 96% on noun chunking tasks. I was wondering if I could try a similar implementation with thinc and explore how it performs in spaCy.
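For context on the CRF half of that architecture: the CRF layer scores whole tag sequences, and the best BIO sequence is recovered with Viterbi decoding. Below is a minimal sketch of just that decoding step in plain Python, with hypothetical tag names and scores; it is not thinc or spaCy code, only an illustration of the algorithm.

```python
def viterbi(emissions, transitions, tags):
    """Best-scoring tag sequence given per-token emission scores
    (one list per token, aligned with `tags`) and pairwise transition
    scores `transitions[prev][curr]`; higher scores are better."""
    score = {t: emissions[0][i] for i, t in enumerate(tags)}
    backpointers = []
    for step in emissions[1:]:
        new_score, ptr = {}, {}
        for j, t in enumerate(tags):
            # best previous tag to arrive at t
            prev = max(tags, key=lambda s: score[s] + transitions[s][t])
            new_score[t] = score[prev] + transitions[prev][t] + step[j]
            ptr[t] = prev
        score = new_score
        backpointers.append(ptr)
    # trace the best path back from the highest-scoring final tag
    best = max(tags, key=lambda t: score[t])
    path = [best]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

Because the transition table can heavily penalise illegal moves such as O followed by I-NP, the decoder produces well-formed BIO sequences even when the per-token scores alone would not.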
-
I'm saying that (as far as I know) we don't have a formal definition of noun chunk beyond what's in the code, and evaluating spaCy's output I definitely do see some problems related to complex NPs. As an example, the second sentence in the CoNLL training data includes the NP
and on the other, spaCy's results are pretty unsatisfactory:
The rules look for a noun chunk head in
I think spaCy is unlikely to move away from a fast rule-based approach based on the existing tagger/parser output, but the next version of thinc (nearly ready for release) should be much easier to use for experiments like this!
-
@adrianeboyd
-
Feature description
Improvements to noun phrase chunking precision and recall.
Could the feature be a custom component or spaCy plugin?
It would be a simple modification to the existing noun phrase chunker: https://github.com/explosion/spaCy/blob/master/spacy/lang/en/syntax_iterators.py#L7
@adrianeboyd had pointed me to https://www.clips.uantwerpen.be/conll2000/chunking/ to experiment with noun phrase chunking in spaCy. Using the test.txt file in the CoNLL 2000 dataset as the ground truth for NP chunking, spaCy's NP chunker had the performance below.
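For reproducibility, performance on the CoNLL 2000 task is conventionally measured as exact-span precision/recall/F1 over chunks read off the B-NP/I-NP/O tags. Here is a small self-contained sketch of that scoring (my own simplified code, not the official conlleval script, and restricted to NP chunks):

```python
def np_spans(tags):
    """Collect (start, end) token spans of NP chunks from a BIO tag sequence."""
    spans, start = [], None
    for i, tag in enumerate(tags):
        if tag == "B-NP":
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I-NP":
            if start is None:
                start = i  # lenient: I-NP without a B-NP still opens a chunk
        else:
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


def chunk_prf(gold, pred):
    """Exact-span precision, recall and F1 over NP chunks."""
    g, p = set(np_spans(gold)), set(np_spans(pred))
    tp = len(g & p)
    prec = tp / len(p) if p else 0.0
    rec = tp / len(g) if g else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```

A chunk only counts as correct if both its start and end match the gold span exactly, which is why missed short NPs like "yesterday" hurt recall directly.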
After simple modifications to the spaCy NP iterator, following an analysis of the cases that were responsible for the low recall, the results are as follows.
Some of the phrases that the chunker couldn't handle are listed below.
A company spokesman said yesterday that Coca-Cola Enterprises sticks by its 1989 forecast.
"yesterday" is a valid noun phrase that was missed. spaCy misses occurrences of nouns with the npadvmod dependency; fixed by including it in the labels.
Net loss : $ 1.7 million vs. net income : $ 21.2 million ; or 12 cents a share
"12 cents" is picked up as a valid NP but "$ 1.7 million" is not. Fixed by including NUM in the POS tags that initiate the search for NP phrases.
When bank financing for the buy-out collapsed last week, so did UAL's stock.
"last week" is a valid NP that is not identified. Same npadvmod issue as in example 1.
Mr. Wolf owns 75,000 UAL shares and has options to buy another 250,000 at $ 83.3125 each.
"another 250,000" is missed by spaCy. The change in example 2 fixes the issue here too.
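To make the two fixes concrete, here is a stripped-down sketch of the head-selection step of the noun chunk iterator with both changes applied. The token representation and label sets are simplified stand-ins for what spacy/lang/en/syntax_iterators.py actually does (the real iterator walks Doc/Token objects and builds full spans), so treat this as an illustration rather than the exact patch:

```python
from collections import namedtuple

# Toy stand-in for a spaCy token.
Tok = namedtuple("Tok", ["text", "pos", "dep"])

# Dependency labels that may mark a chunk head; "npadvmod" is the
# addition from example 1 ("yesterday", "last week").
NP_DEPS = {"nsubj", "dobj", "nsubjpass", "pcomp", "pobj", "dative",
           "appos", "attr", "ROOT", "npadvmod"}

# POS tags that may head a chunk; "NUM" is the addition from example 2
# ("$ 1.7 million", "another 250,000").
HEAD_POS = {"NOUN", "PROPN", "PRON", "NUM"}

def chunk_heads(tokens):
    """Return the texts of tokens that would start an NP search."""
    return [t.text for t in tokens
            if t.pos in HEAD_POS and t.dep in NP_DEPS]
```

With these sets, a token like "yesterday" (NOUN, npadvmod) or "million" (NUM, dobj) now triggers a chunk search, while modifiers such as a nummod "30" still only join a chunk headed elsewhere.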
Case that needs a discussion
The ground truth considers "its total loan and real estate reserves" as one single NP; spaCy splits it into two. I am unsure which of the two is more desirable. This has to do with a coordinating conjunction between two NPs.
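The difference between the two readings can be framed as whether the chunker extends a span across a conj dependency. A hypothetical sketch, with chunks as (start, end) token offsets and conj_pairs linking the indices of coordinated chunks (toy data, not spaCy's API):

```python
def merge_coordinated(chunks, conj_pairs):
    """Merge chunk spans whose heads are linked by a conj dependency."""
    chunks = sorted(chunks)
    merged, absorbed = [], set()
    for i, (start, end) in enumerate(chunks):
        if i in absorbed:
            continue
        for j in range(i + 1, len(chunks)):
            if (i, j) in conj_pairs:
                end = chunks[j][1]  # extend across the conjunction
                absorbed.add(j)
        merged.append((start, end))
    return merged
```

Over the toy tokens ["its", "total", "loan", "and", "real", "estate", "reserves"], the spans (0, 3) and (4, 7) merge into (0, 7) when linked, matching the CoNLL reading, and stay separate otherwise, matching spaCy's current behaviour.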
I can go ahead and open a PR with the changes and the new test cases that will have to be included, if someone can review these cases and let me know whether this was an intended spaCy behavioral deviation from the CoNLL 2000 ground truth.