spaCy NER Model Not Correctly Identifying Entities After Training #13517
aymankoo asked this question in Help: Coding & Implementations · Unanswered
I have trained a custom Named Entity Recognition (NER) model using spaCy to recognize specific entities in text queries. The entities I want to recognize are:

- `CFS` for table names.
- `CREATION_DATE` for date columns.
- `address` for location columns.
After training, I tested the model on the same sentences used in the training examples, but it does not identify the entities correctly. As a result, the generated SQL queries are wrong.
For example:
Expected SQL Query: SELECT CREATION_DATE, address FROM CFS WHERE address = 'Area A'
Generated SQL Query: SELECT incident_date, address FROM TrafficIncidents WHERE location = 'Area A'
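To see exactly where the model goes wrong on the training sentences, one option is to compare its predicted spans with the annotated ones, sentence by sentence. This is a minimal sketch assuming spaCy v3 and the `TRAIN_DATA` format used in this post; `compare_predictions` is a hypothetical helper, not a spaCy API. The demo runs on a blank pipeline with no NER component, so every annotated span is reported as missed.

```python
import spacy


def compare_predictions(nlp, train_data):
    """Hypothetical helper: for each (text, annotations) pair, return the text,
    the annotated spans and the predicted spans as (start, end, label) sets."""
    report = []
    for text, annotations in train_data:
        gold = set(annotations["entities"])
        doc = nlp(text)
        predicted = {(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents}
        report.append((text, gold, predicted))
    return report


if __name__ == "__main__":
    # Demo on a blank pipeline (no "ner" component), so nothing is predicted.
    demo_data = [
        ("Retrieve the CREATION_DATE and address from CFS",
         {"entities": [(13, 26, "CREATION_DATE"), (31, 38, "address"), (44, 47, "CFS")]}),
    ]
    for text, gold, predicted in compare_predictions(spacy.blank("en"), demo_data):
        print(text)
        print("  missed:  ", sorted(gold - predicted))
        print("  spurious:", sorted(predicted - gold))
```

With the trained model, the same call would be `compare_predictions(spacy.load("custom_ner_model"), TRAIN_DATA)`. Comparing offset sets, rather than eyeballing `doc.ents`, also exposes annotations placed at the wrong position: they show up as both missed and spurious when the model predicts the intended word.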
Training Data
```python
import random

import spacy
from spacy.training import Example

TRAIN_DATA = [
    ("Show me all traffic incidents in the TrafficIncidents table", {"entities": [(37, 53, "CFS")]}),
    ("I want the CREATION_DATE and address from CFS where address is Area A", {"entities": [(10, 23, "CREATION_DATE"), (28, 35, "address"), (41, 44, "CFS"), (52, 59, "address")]}),
    ("Retrieve the CREATION_DATE and address from CFS", {"entities": [(13, 26, "CREATION_DATE"), (31, 38, "address"), (44, 47, "CFS")]}),
    ("Give me the details from CFS where the address is Area A", {"entities": [(21, 24, "CFS"), (47, 54, "address")]}),
    ("List all entries in the CFS with CREATION_DATE", {"entities": [(20, 23, "CFS"), (29, 42, "CREATION_DATE")]}),
    ("Show me the CREATION_DATE from CFS where address is Area B", {"entities": [(12, 25, "CREATION_DATE"), (31, 34, "CFS"), (41, 48, "address")]}),
]

# Load a blank English model
nlp = spacy.blank("en")

# Create a new entity recognizer
if "ner" not in nlp.pipe_names:
    ner = nlp.add_pipe("ner")
else:
    ner = nlp.get_pipe("ner")

# Add the new labels to the entity recognizer
ner.add_label("CFS")
ner.add_label("CREATION_DATE")
ner.add_label("address")

# Convert the training data to spaCy's format
examples = [Example.from_dict(nlp.make_doc(text), ann) for text, ann in TRAIN_DATA]

# Initialize training (nlp.initialize() replaces the deprecated nlp.begin_training() in spaCy v3)
optimizer = nlp.initialize()
n_iter = 30  # Adjust this based on your observations
patience = 5  # Number of iterations to wait for improvement before stopping
best_loss = float("inf")
no_improvement_count = 0

# Start training
for i in range(n_iter):
    random.shuffle(examples)
    losses = {}
    for example in examples:
        nlp.update([example], sgd=optimizer, losses=losses, drop=0.35)
    print(f"Iteration {i+1}, Losses: {losses}")

    # Stop early once the summed NER training loss has not improved for `patience` iterations
    if losses["ner"] < best_loss:
        best_loss = losses["ner"]
        no_improvement_count = 0
    else:
        no_improvement_count += 1
        if no_improvement_count >= patience:
            print(f"No improvement for {patience} iterations; stopping early.")
            break

# Save the trained model
nlp.to_disk("custom_ner_model")
print("Model trained and saved successfully.")
```
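Because `Example.from_dict` works from character offsets, it is worth checking that each `(start, end, label)` tuple actually covers the intended word. As far as I know, spaCy v3 warns about spans that do not line up with token boundaries (warning W030) and ignores them during training, while spans that do line up but cover the wrong words are trained on as-is. The sketch below assumes spaCy v3; `audit_entities` and `entities_from_substrings` are hypothetical helper names, not spaCy APIs. It prints what each annotated span covers, then rebuilds the offsets by searching for the substrings instead of counting characters by hand.

```python
import spacy


def audit_entities(nlp, train_data):
    """Hypothetical helper: list every annotated span with the text it actually
    covers and whether its offsets line up with spaCy's token boundaries."""
    rows = []
    for text, annotations in train_data:
        doc = nlp.make_doc(text)
        for start, end, label in annotations["entities"]:
            # char_span (strict alignment by default) returns None when the
            # offsets cut through a token
            aligned = doc.char_span(start, end, label=label) is not None
            rows.append((label, text[start:end], aligned))
    return rows


def entities_from_substrings(text, spans):
    """Hypothetical helper: build an {"entities": [...]} dict by locating each
    (substring, label) pair in order, so offsets are never counted by hand."""
    entities = []
    cursor = 0
    for substring, label in spans:
        start = text.find(substring, cursor)
        if start == -1:
            raise ValueError(f"{substring!r} not found in {text!r} after position {cursor}")
        end = start + len(substring)
        entities.append((start, end, label))
        cursor = end  # continue after this match so repeated words get distinct spans
    return {"entities": entities}


if __name__ == "__main__":
    nlp = spacy.blank("en")
    text = "Give me the details from CFS where the address is Area A"
    posted = {"entities": [(21, 24, "CFS"), (47, 54, "address")]}
    for label, covered, aligned in audit_entities(nlp, [(text, posted)]):
        print(f"{label:<14} {covered!r:<12} aligned={aligned}")
    print(entities_from_substrings(text, [("CFS", "CFS"), ("address", "address")]))
```

For the fourth sentence in `TRAIN_DATA`, for instance, `(21, 24)` covers `'rom'` (not aligned to a token) and `(47, 54)` covers `'is Area'`, whereas `CFS` and `address` actually sit at `(25, 28)` and `(39, 46)`. Searching from the end of the previous match lets the helper handle words that appear twice, such as `address` in the second sentence.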