
bug: mmlu evaluation scores are constant #1172

Open
jalling97 opened this issue Oct 1, 2024 · 0 comments
Labels
possible-bug 🐛 Something may not be working

Steps to reproduce

  1. Deploy LFAI
  2. Run evaluations against deployed instance
  3. View MMLU results

Expected result

  • Scores should vary across each topic category in MMLU

Actual Result

  • All categories report an identical score (approximately 0.697, which is also suspiciously high)

Visual Proof (screenshots, videos, text, etc)

```
INFO:root:MMLU task scores:
                            Task    Score
0  high_school_european_history  0.69697
1               business_ethics  0.69697
```

Additional Context

We may need to investigate the underlying implementation of MMLU in deepeval; a diagnostic sketch follows below.
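
A minimal sketch for isolating whether the constant scores come from deepeval itself or from how our pipeline logs the results. It assumes deepeval's documented benchmark API (`MMLU`, `MMLUTask`, `n_shots`, `evaluate`, and the `overall_score` / `task_scores` / `predictions` attributes); the `LFAIModel` wrapper is hypothetical and stands in for however we wrap the deployed LFAI instance.

```python
from deepeval.benchmarks import MMLU
from deepeval.benchmarks.tasks import MMLUTask

# Hypothetical wrapper around the deployed LFAI instance (a DeepEvalBaseLLM
# subclass); substitute whatever model wrapper the eval pipeline actually uses.
from our_evals.models import LFAIModel

model = LFAIModel()

benchmark = MMLU(
    tasks=[
        MMLUTask.HIGH_SCHOOL_EUROPEAN_HISTORY,
        MMLUTask.BUSINESS_ETHICS,
    ],
    n_shots=5,
)
benchmark.evaluate(model=model)

# If every row of task_scores equals overall_score, the per-task breakdown is
# being collapsed into (or logged as) the aggregate score. If the rows differ
# here but not in our pipeline output, the bug is on our side.
print("overall:", benchmark.overall_score)
print(benchmark.task_scores)

# Spot-check raw predictions to confirm the model is actually being asked
# different questions per task and is not returning a constant/cached answer.
print(benchmark.predictions.head())
```

Running this directly against the deployed instance, outside the full eval pipeline, should tell us whether to file the fix here or upstream in deepeval.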
