Using TextCategorizer on Span instead of Doc #5290
-
I have a pipeline that first splits my large document into sentences using the sentencizer. Then I want to classify each sentence. I could have a separate sentence pipeline (tokenize and predict) and run that on each sentence, but I assume I could skip the tokenization since it's already done. My current idea is to extend
Replies: 5 comments
-
Actually, now that I think about it, I guess I should do something like:
from spacy.pipeline import TextCategorizer
text_cat = TextCategorizer(nlp.vocab)
text_cat.from_disk("/model/textcat", exclude=["vocab"])
text_cat.predict([span])  # and add some logic to link with labels
-
Can't you use
-
Now that I know of span.as_doc(), then yes (I didn't notice that one). So I'd change the last line to text_cat(span.as_doc()).cats, right?
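For anyone landing here later, a rough sketch of the whole flow might look like the following. It assumes the spaCy v3 API (in v2 the sentencizer would be added with nlp.add_pipe(nlp.create_pipe("sentencizer")) instead). The names classify_sentences and dummy_text_cat are made up for illustration; dummy_text_cat is only a stand-in for a trained TextCategorizer so the snippet runs without a model on disk.

```python
import spacy

# Assumption: spaCy v3. A blank pipeline only tokenizes; the sentencizer
# then sets sentence boundaries based on punctuation.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")


def classify_sentences(doc, text_cat):
    """Apply a Doc-level component to each sentence without re-tokenizing."""
    results = []
    for span in doc.sents:
        # as_doc() copies the span's existing tokens into a standalone Doc.
        sent_doc = text_cat(span.as_doc())
        results.append((span.text, dict(sent_doc.cats)))
    return results


def dummy_text_cat(doc):
    # Hypothetical stand-in for a trained TextCategorizer: it just flags
    # sentences that end with a question mark.
    is_question = doc.text.strip().endswith("?")
    doc.cats = {"QUESTION": 1.0 if is_question else 0.0}
    return doc


doc = nlp("Is this a question? This is a statement.")
results = classify_sentences(doc, dummy_text_cat)
for sentence, cats in results:
    print(sentence, cats)
```

With a real model, a loaded text_cat component would be passed in place of dummy_text_cat. Note that as_doc() makes a copy of the span's data, so the categories end up on the per-sentence Doc rather than on the original document.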
-
Thanks a bunch!