Is there any difference between the four tool-use capabilities? (API calling, API retrieval, API planning, general tool use) #5

Open
DryPilgrim opened this issue Nov 15, 2023 · 4 comments

Comments

@DryPilgrim

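As the title says:
1. Is there any difference between the four tool-use capabilities (API calling, API retrieval, API planning, general tool use)?
2. What are their respective test sets, test methods, and evaluation metrics?

Thanks for your answer :)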

@brightmart
Member

  1. The differences between the capabilities, with examples, are described here: https://github.com/CLUEbenchmark/SuperCLUE-agent#%E7%A4%BA%E4%BE%8B

@brightmart
Member

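  1. The Agent benchmark follows the OPEN benchmark: the model under test competes head-to-head against a representative international model, and a win rate is computed.
    Specifically, the model under test is matched against 3.5. Each win earns 3 points, each draw 1 point, and each loss 0 points; the points are summed into a total score, which is then normalized. In short, the result is a relative score measured against a single baseline model.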
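The scoring rule described in this comment can be sketched roughly as follows. This is only an illustration, not the benchmark's actual evaluation code, which is not published in this thread. The function names `normalized_score` and `win_rate` are hypothetical. The sketch reads the third outcome as a loss worth 0 points, and the normalization (share of the maximum attainable points, scaled to 0-100) is an assumption, since the comment does not say how the total is normalized.

```python
# Points per outcome against the baseline model, as stated in the comment.
# Reading the third outcome as a loss worth 0 points is an assumption.
POINTS = {"win": 3, "draw": 1, "loss": 0}


def normalized_score(outcomes):
    """Return a 0-100 score for a list of per-question outcomes.

    Assumed normalization: total points divided by the maximum attainable
    points (every question won), scaled to 100.
    """
    if not outcomes:
        raise ValueError("need at least one outcome")
    total = sum(POINTS[outcome] for outcome in outcomes)
    max_total = POINTS["win"] * len(outcomes)
    return 100.0 * total / max_total


def win_rate(outcomes):
    """Return the fraction of questions the model under test won outright."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    return sum(1 for outcome in outcomes if outcome == "win") / len(outcomes)


if __name__ == "__main__":
    # Hypothetical judgments for four questions, not real benchmark data.
    sample = ["win", "draw", "loss", "win"]
    print(normalized_score(sample), win_rate(sample))
```

Under this reading, a model that draws every question against the baseline scores about 33.3, so the number is only meaningful relative to that one baseline model.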

@zhangbaijin

Hello, could you open-source the evaluation code?

@goqw

goqw commented Apr 16, 2024

I could not understand the function calling evaluation method at all; perhaps there simply isn't one.


4 participants