Right now only the scores of these models are available, but I'm very interested in how you evaluate these agents.
SilasTHU changed the title from "Will you release the benchmark dataset, evaluation metrics and methods?" to "Will you release the benchmark dataset samples, evaluation metrics and methods?" on Apr 2, 2024