Using TextCategorizer on Span instead of Doc #5290

NixBiks · 2020-04-10T09:00:57Z

NixBiks
Apr 10, 2020

I have a pipeline that first splits my large document into sentences using the sentencizer. Then I want to classify each sentence. I could have a separate sentence pipeline (tokenize and predict) and run it on each sentence, but I assume I could skip the tokenization, since it's already done.

My current idea is to extend TextCategorizer, override __call__ to do nothing, add Span._.cats as an extension attribute, and add that component to my one and only pipeline. Not sure if I'm overcomplicating things, though?

Info about spaCy

spaCy version: 2.2.3
Platform: Linux-5.3.0-45-generic-x86_64-with-glibc2.2.5
Python version: 3.8.2


NixBiks · 2020-04-10T09:16:16Z

NixBiks
Apr 10, 2020
Author

Actually, now that I think about it, I guess I should do this:

from spacy.pipeline import TextCategorizer

text_cat = TextCategorizer(nlp.vocab)
text_cat.from_disk("/model/textcat", exclude=["vocab"])
text_cat.predict([span])  # and add some logic to link with labels


svlandeg · 2020-04-10T09:23:41Z

svlandeg
Apr 10, 2020
Maintainer

Can't you use span.as_doc()?


NixBiks · 2020-04-10T09:25:55Z

NixBiks
Apr 10, 2020
Author

Now that I know about span.as_doc(), yes (I hadn't noticed that one). So I'd change the last line to:

text_cat(span.as_doc()).cats

Right?


svlandeg · 2020-04-10T09:26:44Z

svlandeg
Apr 10, 2020
Maintainer

Yep!


NixBiks · 2020-04-10T09:27:21Z

NixBiks
Apr 10, 2020
Author

Thanks a bunch!
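
For readers landing on this thread, here is a minimal sketch of the approach agreed on above: split with the sentencizer, convert each sentence with span.as_doc() (which copies the existing tokens into a new Doc), run the classifier on that Doc, and store the scores on Span._.cats as suggested in the original question. The names stub_text_cat and categorize_sentences and the QUESTION label are hypothetical and exist only for illustration; stub_text_cat stands in for a trained TextCategorizer, which is not loaded here because the thread does not include a model. The version check is there because spaCy 2.x (used in this thread) and 3.x add the sentencizer differently.

```python
import spacy
from spacy.tokens import Span

# Register the custom attribute proposed in the question.
# force=True avoids an error if the snippet is run twice.
Span.set_extension("cats", default=None, force=True)

nlp = spacy.blank("en")
if spacy.__version__.startswith("2."):
    # spaCy 2.x: add_pipe takes a component object.
    nlp.add_pipe(nlp.create_pipe("sentencizer"))
else:
    # spaCy 3.x: add_pipe takes the registered component name.
    nlp.add_pipe("sentencizer")


def stub_text_cat(doc):
    """Hypothetical stand-in for a trained TextCategorizer: sets doc.cats."""
    is_question = doc.text.strip().endswith("?")
    doc.cats = {"QUESTION": 1.0 if is_question else 0.0}
    return doc


def categorize_sentences(doc, text_cat):
    """Classify every sentence of doc and store the scores on span._.cats."""
    results = []
    for sent in doc.sents:
        # as_doc() reuses the span's tokens, so nothing is tokenized again.
        sent_doc = text_cat(sent.as_doc())
        sent._.cats = dict(sent_doc.cats)
        results.append((sent.text, sent._.cats))
    return results


doc = nlp("Is this a question? This is a statement.")
results = categorize_sentences(doc, stub_text_cat)
for text, cats in results:
    print(text, cats)
```

With a real model, the loaded text_cat from earlier in the thread would be passed in place of stub_text_cat; the rest should stay the same, though that is untested here.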

