Add okapi task #3
base: dev
Datasets version 2.20.0 removes the default `trust_remote_code=True`, so the argument is now required for every dataset that ships custom code.
File "/scratch/users/nus/ytyeo/envs/testOkapi/lib/python3.10/site-packages/datasets/load.py", line 133, in resolve_trust_remote_code
raise ValueError(
ValueError: The repository for aisingapore/m_hellaswag contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/aisingapore/m_hellaswag.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
dataset_kwargs:
  revision: dev
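A minimal sketch of the fix the comment above asks for, assuming the task's `dataset_kwargs` are forwarded as keyword arguments to `datasets.load_dataset`. The helper names `with_trust_remote_code` and `load_m_hellaswag`, and the `config_name` parameter, are hypothetical and not part of this PR.

```python
from typing import Any, Dict, Optional


def with_trust_remote_code(dataset_kwargs: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Return a copy of dataset_kwargs with trust_remote_code set explicitly."""
    kwargs = dict(dataset_kwargs or {})
    # datasets >= 2.20.0 no longer assumes True, so opt in here
    # unless the caller has already made an explicit choice.
    kwargs.setdefault("trust_remote_code", True)
    return kwargs


def load_m_hellaswag(config_name: str):
    """Load one subset; config_name (e.g. a language code) is an assumption."""
    # Imported lazily so the helper above works without the datasets package.
    from datasets import load_dataset

    return load_dataset(
        "aisingapore/m_hellaswag",
        config_name,
        **with_trust_remote_code({"revision": "dev"}),
    )
```

In a task YAML the equivalent change would presumably be a `trust_remote_code: true` entry next to `revision: dev` under `dataset_kwargs`.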
Fixed
Add Okapi tasks:
for the following languages:
Implemented a custom sampler to keep `mlmm-evaluation`'s behavior of drawing `num_fewshot + 1` samples for fewshot examples.

Modification to `mlmm-evaluation`
`dev` branch
`lm-evaluation-harness` `--write_out` argument

Modifications to `lm-evaluation-harness`
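The custom sampler mentioned in the description could look roughly like the sketch below. It is not the code from this PR: the class name `PlusOneSampler`, the default seed, and the step that drops the evaluated document before trimming are all assumptions, modeled on how older few-shot samplers avoided leaking the test document into its own context.

```python
import random
from typing import Any, List, Sequence


class PlusOneSampler:
    """Hypothetical sampler: draw num_fewshot + 1 docs, then trim to num_fewshot."""

    def __init__(self, docs: Sequence[Any], seed: int = 1234) -> None:
        self.docs = list(docs)
        self.rnd = random.Random(seed)

    def get_examples(self, doc: Any, num_fewshot: int) -> List[Any]:
        if num_fewshot <= 0:
            return []
        # Draw one extra so the evaluated doc can be dropped if it was sampled.
        k = min(num_fewshot + 1, len(self.docs))
        drawn = self.rnd.sample(self.docs, k)
        # Remove the doc under evaluation, then keep at most num_fewshot examples.
        return [d for d in drawn if d != doc][:num_fewshot]
```

Drawing the extra sample changes how many values are consumed from the random number generator, so a sampler that draws only `num_fewshot` documents would select different examples even with the same seed. That is one plausible reason the behavior has to be kept to replicate results.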
Replicating results
m_hellaswag
Had to recreate the dataset at https://huggingface.co/datasets/aisingapore/m_hellaswag because the `zh` subset does not work in the existing Hugging Face dataset, https://huggingface.co/datasets/alexandrainst/m_hellaswag.
Command
lm-evaluation-harness
mlmm-evaluation
Output
m_arc
The task on the lm-evaluation-harness main branch does not have `num_fewshot` set.
Command
lm-evaluation-harness
mlmm-evaluation
Output
m_mmlu
The task on the lm-evaluation-harness main branch uses `first_n` few-shot sampling, which differs from mlmm-evaluation's implementation.
Command
lm-evaluation-harness
mlmm-evaluation
Output
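To illustrate why the `first_n` setting noted for m_mmlu matters, here is a small sketch contrasting it with seeded random sampling. The function names are hypothetical, and treating the alternative as seeded random sampling is an assumption; the note above says only that the two implementations differ.

```python
import random
from typing import Any, List, Sequence


def first_n_examples(docs: Sequence[Any], num_fewshot: int) -> List[Any]:
    """Always return the first num_fewshot docs, in dataset order."""
    return list(docs[:num_fewshot])


def random_examples(docs: Sequence[Any], num_fewshot: int, seed: int = 1234) -> List[Any]:
    """Return num_fewshot docs drawn without replacement from a seeded RNG."""
    rnd = random.Random(seed)
    return rnd.sample(list(docs), min(num_fewshot, len(docs)))
```

With `first_n`, every evaluated document sees the same few-shot context. With random sampling the context depends on the seed and on how many draws came before, so scores from the two strategies are not directly comparable.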