reproducibility for slot filling task #8
After changing the code, I ran slot filling on the full DSTC8 data and got 86.93 accuracy, lower than the 90.05 reported for the 'BERT' setting in your paper. I also ran the few-shot setting on DSTC8 and got 42.28, lower than the reported 45.05. I wonder why one line of code makes such a big difference. Also, in my previous experiments, using a different `vocab.txt` dropped performance by about 3 percent.
My script:
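The drop with a different `vocab.txt` is consistent with the label order changing between runs: `vocab.txt` is written as a JSON list, and presumably each label's position in that list is the index the tagging head predicts for it. A minimal sketch for checking whether two runs produced the same labels in the same order (the script and the names `load_vocab` and `compare_vocabs` are hypothetical, not part of DialoGLUE):

```python
from __future__ import annotations

import json
import sys


def load_vocab(path: str) -> list[str]:
    """Load a label vocabulary written with json.dump (a JSON list of strings)."""
    with open(path) as f:
        return json.load(f)


def compare_vocabs(vocab_a: list[str], vocab_b: list[str]) -> tuple[str, dict[str, tuple[int, int]]]:
    """Classify two vocabularies and list every label whose integer id differs."""
    if vocab_a == vocab_b:
        return "identical", {}
    if sorted(vocab_a) != sorted(vocab_b):
        return "different labels", {}
    # Same labels in a different order: map each moved label to (id in a, id in b)
    moved = {
        label: (i, vocab_b.index(label))
        for i, label in enumerate(vocab_a)
        if vocab_b[i] != label
    }
    return "same labels, different order", moved


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_vocab.py run1/vocab.txt run2/vocab.txt
    status, moved = compare_vocabs(load_vocab(sys.argv[1]), load_vocab(sys.argv[2]))
    print(status)
    for label, (old_id, new_id) in moved.items():
        print(f"{label}: id {old_id} -> {new_id}")
```

If the result is "same labels, different order", the two runs trained on the same data but with a different label-to-index mapping.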
My scikit-learn and numpy versions were not identical to yours. After updating them, I got 44.61 few-shot accuracy (45.05 reported), but only 85.8 on the full DSTC8 data. I'm confused.
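When package versions may differ from the authors' environment, it helps to list exactly which pins do not match. A minimal sketch using the standard library's `importlib.metadata`; the function name `check_pins` is hypothetical, and it only checks exact `pkg==version` pins, ignoring ranges such as `>=`:

```python
from __future__ import annotations

import sys
from importlib.metadata import PackageNotFoundError, version


def check_pins(requirements_text: str) -> dict[str, tuple[str, str | None]]:
    """Return {package: (pinned, installed)} for each exact pin that does not match."""
    mismatches: dict[str, tuple[str, str | None]] = {}
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()  # drop comments and env markers
        if "==" not in line:
            continue  # this sketch only checks exact `==` pins
        name, pinned = (part.strip() for part in line.split("==", 1))
        name = name.split("[", 1)[0]  # drop extras such as pkg[extra]
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # pinned but not installed
        if installed != pinned:
            mismatches[name] = (pinned, installed)
    return mismatches


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python check_pins.py requirements.txt
    with open(sys.argv[1]) as f:
        for pkg, (want, have) in check_pins(f.read()).items():
            print(f"{pkg}: pinned {want}, installed {have}")
```

Posting this output alongside a result makes version differences like the scikit-learn/numpy one easy to spot.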
Hey, we'll look into this discrepancy in the next few days. Thanks for your note.
Any idea?
Hey @zqwerty, with the original code, the default run scripts, and the packages from `requirements.txt`, my score is also much lower than the one reported in the paper.
Hello, apologies for the long delay in dealing with this issue. It seems that you are trying to reproduce our BERT result on the DSTC8 dataset. I pulled the training logs from our saved models; below are the exact hyperparameter values that were used:
Perhaps a different random seed is affecting the results.
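If the seed is the suspect, every RNG the run touches should be pinned. A minimal sketch, assuming a PyTorch-based fine-tuning script; the helper name `set_seed` is hypothetical here and DialoGLUE's own seeding code may differ. Note the last line: setting `PYTHONHASHSEED` from inside Python does not change the current process's string hashing, so seeding alone cannot fix the `set` ordering problem discussed later in this thread.

```python
import os
import random


def set_seed(seed: int) -> None:
    """Seed the RNGs a typical PyTorch fine-tuning run draws from (hypothetical helper)."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed: nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored if CUDA is unavailable
    except ImportError:
        pass  # PyTorch not installed: nothing to seed
    # Only child processes see this: the current interpreter's str hashing, and hence
    # the iteration order of a set of strings, was fixed when it started.
    os.environ["PYTHONHASHSEED"] = str(seed)
```

Even with every seed fixed, some GPU kernels are nondeterministic, so small run-to-run differences can remain.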
In `dialoglue/data_utils/process_slot.py` (lines 47 to 49 at commit 42737da), the slot BIO labels are collected in a Python `set` and then written into a Python `list` with a `for` loop. But a `set` is unordered: Python randomizes string hashing per process, so the iteration order of a set of slot names can change between runs. In my experiments, `vocab.txt` differed between two runs. So I changed the code to:

```python
import json

# train_data and dataset are defined earlier in the script
slots = {slot["slot"] for row in train_data for slot in row.get("labels", [])}
slots = sorted(slots)  # sort so the label order no longer depends on set iteration order
vocab = ["O"] + [prefix + slot for slot in slots for prefix in ["B-", "I-"]]
with open(dataset + "vocab.txt", "w") as f:
    json.dump(vocab, f)
```

and now get the same `vocab.txt` for every run.
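To see why sorting fixes this, here is a minimal, self-contained sketch (the slot names and helper names are made up for illustration, not taken from DSTC8 or DialoGLUE). It builds the BIO vocabulary in fresh interpreters started with different `PYTHONHASHSEED` values: the set-based order typically changes between seeds, while the sorted vocabulary is the same every time.

```python
from __future__ import annotations

import json
import os
import subprocess
import sys

# Hypothetical slot names, for illustration only
SLOTS = ["area", "city", "date", "number_of_seats", "restaurant_name", "time"]

# Child program: build the BIO vocab from a set (original approach) and from sorted(set) (the fix)
CHILD = """
import json, sys
slots = set(json.loads(sys.argv[1]))
set_vocab = ["O"] + [p + s for s in slots for p in ["B-", "I-"]]
sorted_vocab = ["O"] + [p + s for s in sorted(slots) for p in ["B-", "I-"]]
print(json.dumps([set_vocab, sorted_vocab]))
"""


def build_vocabs(hash_seed: int) -> tuple[list[str], list[str]]:
    """Run CHILD in a fresh interpreter whose string hashing uses the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(hash_seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD, json.dumps(SLOTS)],
        env=env, capture_output=True, text=True, check=True,
    )
    set_vocab, sorted_vocab = json.loads(result.stdout)
    return set_vocab, sorted_vocab


if __name__ == "__main__":
    runs = {seed: build_vocabs(seed) for seed in (1, 2, 3)}
    for seed, (set_vocab, _) in runs.items():
        print(f"PYTHONHASHSEED={seed}: set order {set_vocab[1::2]}")
    sorted_vocabs = [sorted_vocab for _, sorted_vocab in runs.values()]
    print("sorted vocab identical across runs:", all(v == sorted_vocabs[0] for v in sorted_vocabs))
```

Exporting a fixed `PYTHONHASHSEED` before launching training would also make the set order repeatable, but sorting is the more robust fix because it does not depend on how the script is started.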