Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Qwen2-7B测试过程中分数远远低于旧版本的分数 #1629

Open
2 tasks done
Alexyuanfun opened this issue Oct 22, 2024 · 2 comments
Open
2 tasks done

[Bug] Qwen2-7B测试过程中分数远远低于旧版本的分数 #1629

Alexyuanfun opened this issue Oct 22, 2024 · 2 comments
Assignees

Comments

@Alexyuanfun
Copy link

先决条件

问题类型

我正在使用官方支持的任务/模型/数据集进行评估。

环境

{'CUDA available': True,
'CUDA_HOME': '/usr/local/cuda',
'GCC': 'gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0',
'GPU 0,1,2,3,4,5,6,7': 'NVIDIA A800-SXM4-80GB',
'MMEngine': '0.10.5',
'MUSA available': False,
'NVCC': 'Cuda compilation tools, release 12.1, V12.1.105',
'OpenCV': '4.10.0',
'PyTorch': '2.1.0+cu121',
'PyTorch compiling details': 'PyTorch built with:\n'
' - GCC 9.3\n'
' - C++ Version: 201703\n'
' - Intel(R) oneAPI Math Kernel Library Version '
'2022.2-Product Build 20220804 for Intel(R) 64 '
'architecture applications\n'
' - Intel(R) MKL-DNN v3.1.1 (Git Hash '
'64f6bcbcbab628e96f33a62c3e975f8535a7bde4)\n'
' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
' - LAPACK is enabled (usually provided by '
'MKL)\n'
' - NNPACK is enabled\n'
' - CPU capability usage: AVX512\n'
' - CUDA Runtime 12.1\n'
' - NVCC architecture flags: '
'-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n'
' - CuDNN 8.9.2\n'
' - Magma 2.6.1\n'
' - Build settings: BLAS_INFO=mkl, '
'BUILD_TYPE=Release, CUDA_VERSION=12.1, '
'CUDNN_VERSION=8.9.2, '
'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
'-fabi-version=11 -fvisibility-inlines-hidden '
'-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
'-DLIBKINETO_NOROCTRACER -DUSE_FBGEMM '
'-DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK '
'-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
'-O2 -fPIC -Wall -Wextra -Werror=return-type '
'-Werror=non-virtual-dtor -Werror=bool-operation '
'-Wnarrowing -Wno-missing-field-initializers '
'-Wno-type-limits -Wno-array-bounds '
'-Wno-unknown-pragmas -Wno-unused-parameter '
'-Wno-unused-function -Wno-unused-result '
'-Wno-strict-overflow -Wno-strict-aliasing '
'-Wno-stringop-overflow -Wno-psabi '
'-Wno-error=pedantic -Wno-error=old-style-cast '
'-Wno-invalid-partial-specialization '
'-Wno-unused-private-field '
'-Wno-aligned-allocation-unavailable '
'-Wno-missing-braces -fdiagnostics-color=always '
'-faligned-new -Wno-unused-but-set-variable '
'-Wno-maybe-uninitialized -fno-math-errno '
'-fno-trapping-math -Werror=format '
'-Werror=cast-function-type '
'-Wno-stringop-overflow, LAPACK_INFO=mkl, '
'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
'PERF_WITH_AVX512=1, '
'TORCH_DISABLE_GPU_ASSERTS=ON, '
'TORCH_VERSION=2.1.0, USE_CUDA=ON, USE_CUDNN=ON, '
'USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, '
'USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, '
'USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, '
'USE_OPENMP=ON, USE_ROCM=OFF, \n',
'Python': '3.10.12 (main, Jul 5 2023, 18:54:27) [GCC 11.2.0]',
'TorchVision': '0.16.0+cu121',
'lmdeploy': "not installed:No module named 'lmdeploy'",
'numpy_random_seed': 2147483648,
'opencompass': '0.3.3+f390697',
'sys.platform': 'linux',
'transformers': '4.44.2'}

重现问题 - 代码/配置示例

模型 | ceval-circular-4 | mmlu-circular-4 | cmmlu-circular-4 | hellaswag-circular-4 | ARC-e-circular-4 | ARC-c-circular-4 | commonsense_qa-circular-4 | openbookqa_fact-circular-4 | race-middle-circular-4 | race-high-circular-4
使用旧版本opencompass 进行qwen2-7b-base测试,分数为:
Qwen2-7B-base | 21.41 | 42.07 | 52.7 | 42.44 | 64.02 | 60.34 | 65.77 | 83.4 | 78.69 | 73.4
使用新版本分数:
Qwen2-7B-base | 28.84 | 25.74 | 23.44 | 30.19 | 0.71 | 0.34 | 7.62 | 78.6 | 62.26 | 55.66

特别发现ARC-e、ARC-c的分数在base模型上分数很低

重现问题 - 命令或脚本

python run.py configs/eval_qwen2_base.py

重现问题 - 错误信息

分数不一致

其他信息

@tonysy
Copy link
Collaborator

tonysy commented Oct 22, 2024

Would you like to provide more information about the version of opencompass and transformers of different experiments?

@Alexyuanfun
Copy link
Author

Opencompass version is 0.3.3. Transformers version is 4.44.2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants