pass@k is sometimes not a smooth metric, since a sample only counts when it passes every test. We may also monitor the test-case pass rate to measure LLMs' ability to handle corner cases.
GPT-4 seems to use CoT by default. However, most open-source LLMs generate the code snippet first and then follow it with an explanation. We may list the results with CoT separately.
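One rough way to separate the two groups is to check how much prose comes before the first fenced code block. This is only a heuristic sketch: the function name, the labels, and the 15-word threshold are assumptions for illustration, and a short preamble such as "Here is the code:" is deliberately not counted as reasoning.

```python
FENCE = "`" * 3  # Markdown code fence marker


def response_order(response: str, min_words: int = 15) -> str:
    """Label a response by whether substantial prose precedes the first code block."""
    fence_pos = response.find(FENCE)
    if fence_pos == -1:
        return "no_code"
    words_before = len(response[:fence_pos].split())
    return "reasoning_first" if words_before >= min_words else "code_first"
```

Results could then be grouped by this label before computing pass rates for each group.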
Several LLMs changed the function signature in the generated results. This is not what we want in most cases.
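A signature check could be sketched as below. It assumes the prompt contains a long-form Julia definition (`function name(args)`) with simple arguments; short-form definitions, `where` clauses, non-ASCII names, and default values containing parentheses are not handled. The helper names are hypothetical.

```python
import re

# Matches long-form Julia definitions such as `function add_one(x::Int)`.
SIG_RE = re.compile(r"^\s*function\s+([A-Za-z_][\w!]*)\s*\(([^)]*)\)", re.MULTILINE)


def signatures(code: str) -> list[tuple[str, tuple[str, ...]]]:
    """Return (name, args) for every long-form function definition in code."""
    found = []
    for match in SIG_RE.finditer(code):
        args = tuple(a.strip() for a in match.group(2).split(",") if a.strip())
        found.append((match.group(1), args))
    return found


def keeps_signature(prompt: str, generated: str) -> bool:
    """True if the first signature in the prompt also appears in the generated code."""
    expected = signatures(prompt)
    if not expected:
        raise ValueError("no function signature found in prompt")
    return expected[0] in signatures(generated)
```

Comparing against every definition in the output, rather than only the first, avoids flagging answers that define helper functions before the requested one.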
Given that Python code still dominates the training data, we should monitor the percentage of Julia-specific and Python-specific characters in the generated code.
For Julia, we may be interested in do, |>, ∉, ÷, @, !, .+, .=, .(, function, and also many built-in functions.
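Counting these markers in generated code might look like the sketch below. The marker list is copied from the point above; `count_markers` is a hypothetical helper. It is a plain text scan, so markers inside strings or comments are counted too, and `!` also matches `!=`.

```python
import re
from collections import Counter

JULIA_MARKERS = ["do", "|>", "∉", "÷", "@", "!", ".+", ".=", ".(", "function"]


def count_markers(code: str, markers: list[str] = JULIA_MARKERS) -> Counter:
    """Count occurrences of each marker in a piece of generated code."""
    counts = Counter()
    for marker in markers:
        pattern = re.escape(marker)
        if marker.isalpha():
            # Keywords must match as whole words, so `do` does not match `done`.
            pattern = r"\b" + pattern + r"\b"
        counts[marker] = len(re.findall(pattern, code))
    return counts


sample = "xs .= f.(ys) .+ 1\nmap(xs) do x\n    x ÷ 2\nend |> sum\n"
print(count_markers(sample))
```

Dividing the number of samples with at least one marker by the total number of samples would give the percentage to track over time.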
Error types of different LLMs
Some LLMs learned to use external packages beyond built-in functions. We might also be interested in when/how to encourage such behaviors.