build failed on jetson agx orin (Error generating file: build/CMakeFiles/ctranslate2.dir/src/ops/flash-attention/./ctranslate2_generated_flash_fwd_split_hdim96_fp16_sm80.cu.o) #1771

Open
cyu021 opened this issue Sep 5, 2024 · 1 comment

cyu021 commented Sep 5, 2024

I got the following error message when running "make -j10":

CMake Error at ctranslate2_generated_flash_fwd_split_hdim96_fp16_sm80.cu.o.Release.cmake:280 (message):
  Error generating file
  /workspace/workbench/ctranslate2/build/CMakeFiles/ctranslate2.dir/src/ops/flash-attention/./ctranslate2_generated_flash_fwd_split_hdim96_fp16_sm80.cu.o

Here is the full log:

# make -j10
[  1%] Building NVCC (Device) object CMakeFiles/ctranslate2.dir/src/ops/flash-attention/ctranslate2_generated_flash_fwd_split_hdim96_fp16_sm80.cu.o
[  1%] Building NVCC (Device) object CMakeFiles/ctranslate2.dir/src/cuda/ctranslate2_generated_primitives.cu.o
[  2%] Building NVCC (Device) object CMakeFiles/ctranslate2.dir/src/cuda/ctranslate2_generated_random.cu.o
[  2%] Building NVCC (Device) object CMakeFiles/ctranslate2.dir/src/ops/ctranslate2_generated_alibi_add_gpu.cu.o
[  3%] Building NVCC (Device) object CMakeFiles/ctranslate2.dir/src/ops/ctranslate2_generated_bias_add_gpu.cu.o
[  3%] Building NVCC (Device) object CMakeFiles/ctranslate2.dir/src/ops/ctranslate2_generated_concat_split_slide_gpu.cu.o
[  4%] Building NVCC (Device) object CMakeFiles/ctranslate2.dir/src/ops/ctranslate2_generated_conv1d_gpu.cu.o
[  4%] Building NVCC (Device) object CMakeFiles/ctranslate2.dir/src/ops/ctranslate2_generated_flash_attention_gpu.cu.o
[  5%] Building NVCC (Device) object CMakeFiles/ctranslate2.dir/src/ops/ctranslate2_generated_dequantize_gpu.cu.o
[  6%] Building NVCC (Device) object CMakeFiles/ctranslate2.dir/src/ops/ctranslate2_generated_gather_gpu.cu.o
/workspace/workbench/ctranslate2/include/ctranslate2/ops/flash-attention/flash_fwd_launch_template.h(15): warning: attribute "__global__" does not apply here

/workspace/workbench/ctranslate2/include/ctranslate2/ops/flash-attention/flash_fwd_launch_template.h(15): error: incomplete type is not allowed

/workspace/workbench/ctranslate2/include/ctranslate2/ops/flash-attention/flash_fwd_launch_template.h(15): error: identifier "__grid_constant__" is undefined

/workspace/workbench/ctranslate2/include/ctranslate2/ops/flash-attention/flash_fwd_launch_template.h(15): error: expected a ")"

/workspace/workbench/ctranslate2/include/ctranslate2/ops/flash-attention/flash_fwd_launch_template.h(15): error: expected a ";"

4 errors detected in the compilation of "/workspace/workbench/ctranslate2/src/ops/flash-attention/flash_fwd_split_hdim96_fp16_sm80.cu".
CMake Error at ctranslate2_generated_flash_fwd_split_hdim96_fp16_sm80.cu.o.Release.cmake:280 (message):
  Error generating file
  /workspace/workbench/ctranslate2/build/CMakeFiles/ctranslate2.dir/src/ops/flash-attention/./ctranslate2_generated_flash_fwd_split_hdim96_fp16_sm80.cu.o


make[2]: *** [CMakeFiles/ctranslate2.dir/build.make:371: CMakeFiles/ctranslate2.dir/src/ops/flash-attention/ctranslate2_generated_flash_fwd_split_hdim96_fp16_sm80.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
/workspace/workbench/ctranslate2/src/ops/flash_attention_gpu.cu: In function 'void ctranslate2::ops::set_params_splitkv(Flash_fwd_params&, int, int, int, int, int, int, int, cudaDeviceProp*)':
/workspace/workbench/ctranslate2/src/ops/flash_attention_gpu.cu:162:1: warning: unused parameter 'head_size_rounded' [-Wunused-parameter]
  161 |                                    const int num_heads, const int head_size, const int max_seqlen_k, const int max_seqlen_q,
      |                                                                                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  162 |                                    const int head_size_rounded,
      | ^


make[1]: *** [CMakeFiles/Makefile2:98: CMakeFiles/ctranslate2.dir/all] Error 2
make: *** [Makefile:136: all] Error 2

I ran cmake as follows without problems:

# cmake -DWITH_MKL=OFF -DWITH_CUDA=ON -DWITH_CUDNN=ON -DOPENMP_RUNTIME=COMP -DBUILD_CLI=OFF -DCUDA_DYNAMIC_LOADING=ON ..
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Build spdlog: 1.10.0
-- Build type: Release
-- Compiling for multiple CPU ISA and enabling runtime dispatch
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- Using OpenMP: /usr/lib/gcc/aarch64-linux-gnu/9/libgomp.so;/usr/lib/aarch64-linux-gnu/libpthread.so
CMake Warning (dev) at CMakeLists.txt:433 (find_package):
  Policy CMP0146 is not set: The FindCUDA module is removed.  Run "cmake
  --help-policy CMP0146" for policy details.  Use the cmake_policy command to
  set the policy and suppress this warning.

This warning is for project developers.  Use -Wno-dev to suppress it.

-- Found CUDA: /usr/local/cuda (found suitable version "11.4", minimum required is "11.0")
-- Autodetected CUDA architecture(s):  8.7
-- NVCC host compiler: /usr/bin/c++
-- NVCC compilation flags: -std=c++17;-Xcompiler=-fopenmp;-gencode;arch=compute_87,code=sm_87;--expt-relaxed-constexpr;--expt-extended-lambda
-- Found cuDNN include directory: /usr/include
-- Found cuDNN libraries: /usr/lib/aarch64-linux-gnu/libcudnn.so
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_VISIBILITY - Success
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY
-- Performing Test COMPILER_HAS_HIDDEN_INLINE_VISIBILITY - Success
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR
-- Performing Test COMPILER_HAS_DEPRECATED_ATTR - Success
-- Configuring done (2.3s)
-- Generating done (0.1s)
-- Build files have been written to: /workspace/workbench/ctranslate2/build

minhthuc2502 (Collaborator) commented

Try updating to CUDA 11.7 or 12.
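
The reply does not explain the reasoning. The log shows nvcc from CUDA 11.4 rejecting the `__grid_constant__` identifier in flash_fwd_launch_template.h, which presumably needs a newer toolkit. As a rough illustration, the Python sketch below reads the "Found CUDA" line from the cmake output above and compares it with the suggested version. The 11.7 threshold is taken from the reply, and the helper names `parse_cuda_version` and `supports_grid_constant` are hypothetical, not part of CTranslate2 or CMake.

```python
import re

# Assumption: 11.7 is the minimum suggested in the reply above; it is not
# read from the CTranslate2 build files.
MIN_CUDA = (11, 7)


def parse_cuda_version(cmake_line):
    """Extract (major, minor) from a CMake 'Found CUDA' status line.

    Returns None when the line carries no version string.
    """
    match = re.search(r'found suitable version "(\d+)\.(\d+)"', cmake_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def supports_grid_constant(version, minimum=MIN_CUDA):
    """Return True when the (major, minor) tuple is at least the minimum."""
    return version >= minimum


# Status line copied from the cmake output in this issue.
line = ('-- Found CUDA: /usr/local/cuda (found suitable version "11.4", '
        'minimum required is "11.0")')
version = parse_cuda_version(line)
print(version, supports_grid_constant(version))  # prints: (11, 4) False
```

On the reported setup this prints `(11, 4) False`, which is consistent with the compile error. Tuple comparison is used so that a version such as 12.0 compares correctly against 11.7.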
