Build for gptj docker fails #14
Try downgrading.
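For illustration, if this refers to one of the Python packages inside the container, such as transformers (an assumption on my part; the comment does not say which package, and the version below is only a placeholder):

```
pip install "transformers==4.31.0"  # package and version are assumptions, not from this thread
```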
@ChristinaHsu0115 Please consider renaming the issue. AMD did not submit to v3.1. You are using NVIDIA's code.
@lapp0 Thanks for the help.
(mlperf) jay@mlperf-inference-jay-x86-64-19218:/work$ make run RUN_ARGS="--benchmarks=gptj --scenarios=offline"
I previously ran inference v3.0 successfully with two A100 PCIe GPU cards, and the gptj model is new in inference v3.1. I followed the link below:
https://github.com/mlcommons/inference_results_v3.1/tree/main/closed/NVIDIA#readme
Here is the procedure for your reference:
1. make prebuild to enter the container environment
2. make build
3. Download the gptj dataset
4. Download the gptj model
5. Preprocess the gptj data
6. Create a custom config file and set the correct parameters
7. Run the gptj benchmark with the Offline scenario
Step 7 is where I got the error message shown below. Does anyone know how to fix this problem? For reference, a sketch of the commands I used for steps 3-5 follows, then the full error output.
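These targets come from the README linked above; the exact names and flags may differ between releases, so treat this as a sketch rather than the authoritative invocation:

```
# Inside the container started by `make prebuild` (steps 3-5 above).
make download_data BENCHMARKS="gptj"      # step 3: fetch the evaluation dataset
make download_model BENCHMARKS="gptj"     # step 4: fetch the fine-tuned GPT-J 6B checkpoint
make preprocess_data BENCHMARKS="gptj"    # step 5: tokenize and pack the dataset
```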
(mlperf) test@mlperf-inference-test-x86-64-7440:/work$ make run RUN_ARGS="--benchmarks=gptj --scenarios=offline"
make[1]: Entering directory '/work'
[2024-01-22 10:34:01,320 main.py:230 INFO] Detected system ID: KnownSystem.K905_A100X2
[2024-01-22 10:34:02,953 generate_engines.py:172 INFO] Building engines for gptj benchmark in Offline scenario...
[01/22/2024-10:34:02] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 43, GPU 874 (MiB)
[01/22/2024-10:34:08] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1957, GPU +346, now: CPU 2105, GPU 1220 (MiB)
[2024-01-22 10:34:09,676 gptj6b.py:103 INFO] Building GPTJ engine in ./build/engines/K905_A100X2/gptj/Offline, use_fp8: False command: python build/TRTLLM/examples/gptj/build.py --dtype=float16 --use_gpt_attention_plugin=float16 --use_gemm_plugin=float16 --max_batch_size=32 --max_input_len=1919 --max_output_len=128 --vocab_size=50401 --max_beam_width=4 --output_dir=./build/engines/K905_A100X2/gptj/Offline --model_dir=build/models/GPTJ-6B/checkpoint-final --enable_context_fmha --enable_two_optimization_profiles
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/work/code/actionhandler/base.py", line 189, in subprocess_target
return self.action_handler.handle()
File "/work/code/actionhandler/generate_engines.py", line 175, in handle
total_engine_build_time += self.build_engine(job)
File "/work/code/actionhandler/generate_engines.py", line 166, in build_engine
builder.build_engines()
File "/work/code/gptj/tensorrt/gptj6b.py", line 115, in build_engines
raise RuntimeError(f"Engine build fails! stderr: {ret.stderr}. See engine log: {stdout_fn} and {stderr_fn}")
RuntimeError: Engine build fails! stderr: [01/22/2024-10:34:10] [TRT-LLM] [I] Loading HF GPTJ model from build/models/GPTJ-6B/checkpoint-final...
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:02<00:05, 2.71s/it]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:02<00:05, 2.71s/it]
Traceback (most recent call last):
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 460, in load_state_dict
return torch.load(checkpoint_file, map_location="cpu")
File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 868, in load
with _open_zipfile_reader(opened_file) as opened_zipfile:
File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 333, in init
super().init(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 464, in load_state_dict
if f.read(7) == "version":
File "/usr/lib/python3.8/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 128: invalid start byte
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "build/TRTLLM/examples/gptj/build.py", line 473, in
args = parse_arguments()
File "build/TRTLLM/examples/gptj/build.py", line 146, in parse_arguments
hf_gpt = AutoModelForCausalLM.from_pretrained(args.model_dir)
File "/home/test/.local/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
return model_class.from_pretrained(
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
) = cls._load_pretrained_model(
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3246, in _load_pretrained_model
state_dict = load_state_dict(shard_file)
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 476, in load_state_dict
raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for 'build/models/GPTJ-6B/checkpoint-final/pytorch_model-00002-of-00003.bin' at 'build/models/GPTJ-6B/checkpoint-final/pytorch_model-00002-of-00003.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
. See engine log: ./build/engines/K905_A100X2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.stdout and ./build/engines/K905_A100X2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.stderr
[2024-01-22 10:34:40,406 generate_engines.py:172 INFO] Building engines for gptj benchmark in Offline scenario...
[01/22/2024-10:34:40] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 43, GPU 874 (MiB)
[01/22/2024-10:34:46] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +1957, GPU +346, now: CPU 2105, GPU 1220 (MiB)
[2024-01-22 10:34:47,175 gptj6b.py:103 INFO] Building GPTJ engine in ./build/engines/K905_A100X2/gptj/Offline, use_fp8: False command: python build/TRTLLM/examples/gptj/build.py --dtype=float16 --use_gpt_attention_plugin=float16 --use_gemm_plugin=float16 --max_batch_size=32 --max_input_len=1919 --max_output_len=128 --vocab_size=50401 --max_beam_width=4 --output_dir=./build/engines/K905_A100X2/gptj/Offline --model_dir=build/models/GPTJ-6B/checkpoint-final --enable_context_fmha --enable_two_optimization_profiles
Process Process-2:
Traceback (most recent call last):
File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/work/code/actionhandler/base.py", line 189, in subprocess_target
return self.action_handler.handle()
File "/work/code/actionhandler/generate_engines.py", line 175, in handle
total_engine_build_time += self.build_engine(job)
File "/work/code/actionhandler/generate_engines.py", line 166, in build_engine
builder.build_engines()
File "/work/code/gptj/tensorrt/gptj6b.py", line 115, in build_engines
raise RuntimeError(f"Engine build fails! stderr: {ret.stderr}. See engine log: {stdout_fn} and {stderr_fn}")
RuntimeError: Engine build fails! stderr: [01/22/2024-10:34:48] [TRT-LLM] [I] Loading HF GPTJ model from build/models/GPTJ-6B/checkpoint-final...
Loading checkpoint shards: 0%| | 0/3 [00:00<?, ?it/s]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:02<00:05, 2.90s/it]
Loading checkpoint shards: 33%|███▎ | 1/3 [00:02<00:05, 2.90s/it]
Traceback (most recent call last):
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 460, in load_state_dict
return torch.load(checkpoint_file, map_location="cpu")
File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 868, in load
with _open_zipfile_reader(opened_file) as opened_zipfile:
File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 333, in init
super().init(torch._C.PyTorchFileReader(name_or_buffer))
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 464, in load_state_dict
if f.read(7) == "version":
File "/usr/lib/python3.8/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 128: invalid start byte
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "build/TRTLLM/examples/gptj/build.py", line 473, in
args = parse_arguments()
File "build/TRTLLM/examples/gptj/build.py", line 146, in parse_arguments
hf_gpt = AutoModelForCausalLM.from_pretrained(args.model_dir)
File "/home/test/.local/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 493, in from_pretrained
return model_class.from_pretrained(
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
) = cls._load_pretrained_model(
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 3246, in _load_pretrained_model
state_dict = load_state_dict(shard_file)
File "/home/test/.local/lib/python3.8/site-packages/transformers/modeling_utils.py", line 476, in load_state_dict
raise OSError(
OSError: Unable to load weights from pytorch checkpoint file for 'build/models/GPTJ-6B/checkpoint-final/pytorch_model-00002-of-00003.bin' at 'build/models/GPTJ-6B/checkpoint-final/pytorch_model-00002-of-00003.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.
. See engine log: ./build/engines/K905_A100X2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.stdout and ./build/engines/K905_A100X2/gptj/Offline/gptj-Offline-gpu-b32-fp16.custom_k_99_MaxP.stderr
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/work/code/main.py", line 232, in
main(main_args, DETECTED_SYSTEM)
File "/work/code/main.py", line 145, in main
dispatch_action(main_args, config_dict, workload_setting)
File "/work/code/main.py", line 203, in dispatch_action
handler.run()
File "/work/code/actionhandler/base.py", line 82, in run
self.handle_failure()
File "/work/code/actionhandler/base.py", line 186, in handle_failure
self.action_handler.handle_failure()
File "/work/code/actionhandler/generate_engines.py", line 183, in handle_failure
raise RuntimeError("Building engines failed!")
RuntimeError: Building engines failed!
make[1]: *** [Makefile:37: generate_engines] Error 1
make[1]: Leaving directory '/work'
make: *** [Makefile:31: run] Error 2
(mlperf) test@mlperf-inference-test-x86-64-7440:/work$
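A note on the root cause: the `PytorchStreamReader failed reading zip archive: failed finding central directory` line means PyTorch could not read `pytorch_model-00002-of-00003.bin` as a valid zip archive, which usually indicates a truncated or corrupted download rather than a TensorRT-LLM or engine-build problem. A quick way to confirm, as a minimal sketch with paths taken from the log above, is to try loading each shard directly:

```python
# Sanity check: attempt to load each GPT-J checkpoint shard on its own.
# model_dir is taken from the log above; adjust if yours differs.
import glob
import torch

model_dir = "build/models/GPTJ-6B/checkpoint-final"
for shard in sorted(glob.glob(f"{model_dir}/pytorch_model-*-of-*.bin")):
    try:
        torch.load(shard, map_location="cpu")
        print(f"OK      {shard}")
    except Exception as exc:  # a truncated or corrupt shard fails here
        print(f"CORRUPT {shard} -> {exc}")
```

If any shard prints CORRUPT, deleting it and re-running the model download (step 4) should let the engine build proceed.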