
reproducibility for slot filling task #8

Open
libing125 opened this issue Oct 22, 2020 · 7 comments

libing125 commented Oct 22, 2020

In this code:

slots = set([slot['slot'] for row in train_data for slot in row.get('labels', [])])
vocab = ["O"] + [prefix + slot for slot in slots for prefix in ["B-", "I-"]]
json.dump(vocab, open(dataset + "vocab.txt", "w+"))

The slot names are collected in a Python set and then expanded into a list of BIO labels, but sets have no guaranteed iteration order, so in my experiments vocab.txt differed between two runs. I changed the code to

slots = set([slot['slot'] for row in train_data for slot in row.get('labels', [])])
slots = sorted(list(slots))
vocab = ["O"] + [prefix + slot for slot in slots for prefix in ["B-", "I-"]]
json.dump(vocab, open(dataset + "vocab.txt", "w+"))

and now get the same vocab.txt on every run.
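
For reference, here is a minimal standalone sketch (the slot names are made up, this is not the repo's code) of why the order changes: Python randomizes string hashing per process (PYTHONHASHSEED), so iterating a set of strings is not reproducible across interpreter runs, while sorting is.

# Minimal sketch, not from the repo: slot names are made up.
# Python randomizes string hashes per process (PYTHONHASHSEED),
# so the iteration order of a set of strings can differ between runs.
slots = {"restaurant_name", "date", "time", "party_size"}

# May print a different order the next time the interpreter starts:
print(list(slots))

# Deterministic regardless of hash randomization:
print(sorted(slots))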

libing125 (Author) commented:

After changing the code, I ran slot filling on the full DSTC8 data and got 86.93 accuracy, lower than the 90.05 (the 'BERT' setting) reported in your article. I also tried the few-shot setting on DSTC8 and got 42.28, lower than the reported 45.05. I wonder why one line of code makes such a big difference. Also, in my previous experiments, using a different vocab.txt dropped performance by about 3 percent (see the toy sketch below).
Thanks!
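
One way a changed vocab.txt ordering can matter (a toy sketch with made-up labels, not the repo's code): if the label-to-index mapping is taken from the position in vocab.txt, then a checkpoint or dumped predictions produced under one ordering will be misread when scored against another.

# Toy sketch with made-up labels, not the repo's code.
vocab_a = ["O", "B-time", "I-time", "B-date", "I-date"]
vocab_b = ["O", "B-date", "I-date", "B-time", "I-time"]

label2id_a = {label: i for i, label in enumerate(vocab_a)}
label2id_b = {label: i for i, label in enumerate(vocab_b)}

# The same label gets a different id under the two orderings, so ids
# produced under vocab_a decode to the wrong labels under vocab_b.
print(label2id_a["B-time"], label2id_b["B-time"])  # 1 vs 3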

libing125 (Author) commented:

My script:

export BERT_MODEL_DIR=pretrained_models/bert-base-uncased
export BERT_VOCAB_PATH=$BERT_MODEL_DIR/vocab.txt

CUDA_VISIBLE_DEVICES=3 python3 run.py \
        --train_data_path data_utils/dialoglue/dstc8_sgd/train.json \
        --val_data_path data_utils/dialoglue/dstc8_sgd/val.json \
        --test_data_path data_utils/dialoglue/dstc8_sgd/test.json \
        --token_vocab_path $BERT_MODEL_DIR/vocab.txt \
        --train_batch_size 64 --dropout 0.1 --num_epochs 100 --learning_rate 6e-5 \
        --model_name_or_path $BERT_MODEL_DIR --task slot --do_lowercase --max_seq_length 50 --dump_outputs

libing125 (Author) commented:

My scikit-learn and numpy versions were not identical to yours. After updating them I got 44.61 few-shot accuracy (45.05 reported), but only 85.8 on the full DSTC8 data. I'm confused.

mihail-amazon (Contributor) commented:

Hey, we'll look into this discrepancy in the next few days. Thanks for your note.

zqwerty commented Nov 10, 2020

> Hey, we'll look into this discrepancy in the next few days. Thanks for your note.

Any idea?

nlpist commented Jan 15, 2021

Hey @zqwerty
I wonder if you are able to reproduce the score of the original model (without your changes)?

The problem is that with the original code, the default running scripts, and the packages from requirements.txt, my score is much lower than the one reported in the paper.

Shikib commented May 2, 2021

Hello,

Apologies for the long delay in dealing with this issue. It seems that you are trying to reproduce our result with BERT on the DSTC8 dataset. I pulled the training logs from our saved models, and below are the exact hyperparameter values that were used:

 "model_name_or_path": "bert-base-uncased", "task": "slot", "mlm_pre": false, "mlm_during": false, "example": false, "use_observers": false, "repeat": 1, "grad_accum": 1, "train_batch_size": 64, "max_seq_length": 50, "num_epochs": 100, "patience": 20, "logging_steps": 100, "do_lowercase": true, "dropout": 0.1, "learning_rate": 6e-05, "adam_epsilon": 1e-08, "weight_decay": 0.0, "device": 0, "max_grad_norm": -1.0, "seed": 33

Perhaps the different random seed is impacting the results.
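
As a side note, matching a reported number usually also requires fixing every RNG involved. A generic seeding sketch (not the repo's exact code; run.py presumably handles this itself, and 33 is simply the seed from the log above):

# Generic seeding sketch, not the repo's exact code.
import random
import numpy as np
import torch

def set_seed(seed: int = 33) -> None:
    # Fix the Python, NumPy and PyTorch RNGs so runs are comparable.
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

set_seed(33)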
