Hi, I found a very interesting result when tokenizing a document. The example code is:
```python
import spacy

nlp = spacy.load("en_core_web_sm")

# doc = nlp("Apple is looking at. startup for $1 billion.")
# for token in doc:
#     print(token.text, token.pos_, token.dep_)

# Example text
text = '''Panel C: Gene Associations in LUAD and NATs
In LUAD tumors, ZNF71 is associated with JUN, SAMHD1, RNASEL, IFNGR1, IKKB, and EIF2A.
In non-cancerous adjacent tissues (NATs), the associated genes are OAS1, MP3K7, and IFNAR2.'''

# Process the text
doc = nlp(text)

out_sen = []
# Iterate over the sentences
for sent in doc.sents:
    if len(sent) != 0:
        print(sent.text)
        out_sen.append(sent)
```
The resulting `out_sen` has length 1: the whole text is treated as a single sentence. Is this a bug, or the default behavior? Thanks.
The spaCy version is 3.7.6.