This is the official repository of KoBBQ: Korean Bias Benchmark for Question Answering (TACL 2024).
- Our KoBBQ datasets and survey results can be found in KoBBQ/data and on Hugging Face Datasets.
Category | # of Templates | # of Samples |
---|---|---|
Age | 21 | 3,608 |
Disability Status | 20 | 2,160 |
Gender Identity | 25 | 768 |
Physical Appearance | 20 | 4,040 |
Race/Ethnicity/Nationality | 43 | 51,856 |
Religion | 20 | 688 |
Socio-Economic Status | 27 | 6,928 |
Sexual Orientation | 12 | 552 |
Domestic Area of Origin | 22 | 800 |
Family Structure | 23 | 1,096 |
Political Orientation | 11 | 312 |
Education Background | 24 | 3,240 |
Total | 268 | 76,048 |
- Our evaluation code and prompts can be found in KoBBQ/evaluation.
- Put the model outputs in the `prediction` column of KoBBQ/data/KoBBQ_test_samples.tsv and save the file as `KoBBQ/evaluation/outputs/KoBBQ_test/KoBBQ_test_evaluation_1_{$MODEL}.tsv`.
  - The model outputs should be one of the choices (as in the `choices` column). Otherwise, they will be regarded as out-of-choice answers.
- Run KoBBQ/evaluation/5_evaluation.py with the `test` option.

      cd evaluation
      python3 5_evaluation.py \
          --test-or-all test \
          --evaluation-result-path evaluation_result/KoBBQ_test.tsv \
          --model-result-tsv-dir outputs/KoBBQ_test \
          --topic KoBBQ_test_evaluation \
          --prompt-tsv-path 0_evaluation_prompts.tsv \
          --prompt-id 1 \
          --models $MODEL
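
The file-preparation step for the test set could be scripted roughly as follows. This is a minimal sketch, not part of the repository: `write_predictions` and the `predict` callable are hypothetical names, and the only details taken from this README are the `prediction` column and the output file name pattern. The remaining columns are copied through unchanged.

```python
import csv
from pathlib import Path


def write_predictions(samples_tsv, output_dir, model_name, predict, split="test"):
    """Copy a KoBBQ samples TSV and fill its `prediction` column.

    `predict` is a hypothetical callable: it receives one row as a dict
    and returns the model's answer as a string.
    """
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    # File name pattern from the README: KoBBQ_{split}_evaluation_1_{$MODEL}.tsv
    out_path = output_dir / f"KoBBQ_{split}_evaluation_1_{model_name}.tsv"

    with Path(samples_tsv).open(newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src, delimiter="\t")
        fieldnames = list(reader.fieldnames or [])
        if "prediction" not in fieldnames:
            # Assumption: add the column if the samples file does not have it yet.
            fieldnames.append("prediction")
        rows = []
        for row in reader:
            row["prediction"] = predict(row)
            rows.append(row)

    with out_path.open("w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(
            dst, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
    return out_path
```

With a real model, `predict` would wrap the model call, for example `write_predictions("data/KoBBQ_test_samples.tsv", "evaluation/outputs/KoBBQ_test", "my_model", my_predict)`, where `my_model` and `my_predict` are placeholders.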
- Put the model outputs in the `prediction` column of KoBBQ/data/KoBBQ_all_samples.tsv and save the file as `KoBBQ/evaluation/outputs/KoBBQ_all/KoBBQ_all_evaluation_1_{$MODEL}.tsv`.
  - The model outputs should be one of the choices (as in the `choices` column). Otherwise, they will be regarded as out-of-choice answers.
- Run KoBBQ/evaluation/5_evaluation.py with the `all` option.

      cd evaluation
      python3 5_evaluation.py \
          --test-or-all all \
          --evaluation-result-path evaluation_result/KoBBQ_all.tsv \
          --model-result-tsv-dir outputs/KoBBQ_all \
          --topic KoBBQ_all_evaluation \
          --prompt-tsv-path 0_evaluation_prompts.tsv \
          --prompt-id 1 \
          --models $MODEL
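
Before running the evaluation, it may help to check how many predictions would count as out-of-choice answers. The sketch below is a hypothetical helper, not part of the repository. It assumes that each `choices` cell is a Python-style list literal such as `['A', 'B', 'C']` and that a prediction matches a choice only when the strings are equal after trimming whitespace. The matching rule in 5_evaluation.py may differ, so pass a different `parse_choices` if the file uses another format.

```python
import ast
import csv


def find_out_of_choice(tsv_path, parse_choices=ast.literal_eval):
    """Return (row_number, prediction) pairs whose prediction is not a listed choice.

    Assumption: `parse_choices` turns the raw `choices` cell into a list;
    the default expects a Python-style list literal.
    """
    bad = []
    with open(tsv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        # Row numbers are 1-based and count data rows only (header excluded).
        for i, row in enumerate(reader, start=1):
            choices = [str(c).strip() for c in parse_choices(row["choices"])]
            if row["prediction"].strip() not in choices:
                bad.append((i, row["prediction"]))
    return bad
```

An empty result means every prediction matched one of its row's choices under these assumptions.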
- We do not condone any malicious use of our dataset. It must not be used as training data to automatically generate and publish biased language targeting specific groups. We strongly encourage researchers and practitioners to use this dataset in beneficial ways, such as mitigating bias in language models.
@article{10.1162/tacl_a_00661,
author = {Jin, Jiho and Kim, Jiseon and Lee, Nayeon and Yoo, Haneul and Oh, Alice and Lee, Hwaran},
title = "{KoBBQ: Korean Bias Benchmark for Question Answering}",
journal = {Transactions of the Association for Computational Linguistics},
volume = {12},
pages = {507-524},
year = {2024},
month = {05},
issn = {2307-387X},
doi = {10.1162/tacl_a_00661},
url = {https://doi.org/10.1162/tacl_a_00661},
eprint = {https://direct.mit.edu/tacl/article-pdf/doi/10.1162/tacl_a_00661/2369542/tacl_a_00661.pdf}
}