Merging spans after creating custom attributes #4806
Replies: 4 comments
-
Notice that the example script sets all the entities before doing any merging. Otherwise the saved spans from the matcher would no longer line up with the tokens in the doc once merging starts.
The update for the example script would be:
Current version (note the warning in the comment):
Updated version:
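A minimal sketch of the updated pattern, not the example script itself: all entities are set first, and the spans are then merged in a single `doc.retokenize()` block instead of calling the deprecated `merge()` on each span. The sentence, the hard-coded token offsets and the `GPE` label are illustrative assumptions; in the real component the offsets would come from the matcher.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")  # tokenizer only, no trained model required
doc = nlp("I flew from New York to Costa Rica via Hong Kong")

# Hypothetical matches as (start, end) token offsets; a real component
# would take these from matcher(doc).
matches = [(3, 5), (6, 8), (9, 11)]

# 1. Create every span and set the entities while the offsets are still valid.
spans = [Span(doc, start, end, label="GPE") for start, end in matches]
doc.ents = list(doc.ents) + spans

# 2. Merge all spans inside one retokenize block. The merges are applied
#    together when the block exits, so one merge cannot shift the offsets
#    of a span that is still waiting to be merged.
with doc.retokenize() as retokenizer:
    for span in spans:
        retokenizer.merge(span)

merged_tokens = [token.text for token in doc]
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(merged_tokens)
print(entities)
```

The `spans` list should not be reused after the block exits, since its offsets refer to the tokenization from before the merge.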
-
Hi Adriane, Thank you for your response; however, I still seem to be having problems. Having followed your advice, the indices are incorrect. I've tried to work through the problem with numerous permutations; the latest version is here:
`matches = self.matcher(doc)`
This version, however, is returning an error during iteration. Do you know where I'm going wrong here please? I know the answer is going to be a simple one, but I just can't see it...
-
Ah, sorry, I missed the actual problem the first time around: you're adding the old spans to
This is tricky when you're merging because the span offsets will constantly shift, so it's better not to try to maintain a custom
Let's see, the main difference to the deprecated
But if there's some reason that that wouldn't work because of how you're processing the documents, another way is to add a custom attribute as you're merging and then use that to identify the merged tokens afterwards. Here's an example with custom span and token attributes that get merged:
Output:
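As a minimal sketch of this approach (not the original example), a custom flag can be written onto each merged token through the `attrs` argument of `retokenizer.merge` and then used to find the merged tokens afterwards. The `is_country` attribute name, the sentence and the span offsets are hypothetical and chosen only for illustration.

```python
import spacy
from spacy.tokens import Token

# Hypothetical attribute name; force=True avoids an error if the
# extension was already registered in the same session.
Token.set_extension("is_country", default=False, force=True)

nlp = spacy.blank("en")  # tokenizer only, no trained model required
doc = nlp("She moved from Costa Rica to New Zealand last year")

# Spans taken before any merging, while the token offsets are still valid.
spans = [doc[3:5], doc[6:8]]

with doc.retokenize() as retokenizer:
    for span in spans:
        # The "_" key sets custom extension attributes on the merged token.
        retokenizer.merge(span, attrs={"_": {"is_country": True}})

# After merging, look the tokens up by the flag rather than by stored offsets.
countries = [(token.i, token.text) for token in doc if token._.is_country]
print(countries)
```

Because the lookup goes through the attribute, no list of pre-merge offsets has to be kept in sync with the retokenized doc.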
-
Adriane, This is great, thank you for your comprehensive response. Let me take a look and I'll get back to you with the results. Have a great Christmas! Steve
-
I've been using the code examples from the documentation, particularly the 'Custom pipeline components and attribute extensions via a REST API' example. After creating the custom attribute extension there is a section of the code for creating the new spans as follows:
I see, however, that `merge()` has been deprecated, so have developed the following (rather inelegant) solution for the same task:
`spans` is a list object containing the spans derived from the `matcher`.
This code is throwing up the following error message:
IndexError: [E036] Error calculating span: Can't find a token starting at character offset 1992.
Have also tried this code:
...but have got the same result.
Is there a more elegant solution to allow the creation of custom extensions please?
I would use the Entity Ruler, but am looking to modify the spans as they are created.
Which page or section is this issue related to?
(https://spacy.io/usage/examples)