[Bug] SLM running serve on known-good chat model crashes on _print_kv_cache_metadata_in_json
#1921
Comments
@MasterJH5574 Recompiling the model with the latest nightly triggered the following error:
Thank you! This was fixed just an hour ago in 9df8f03.
OK - will try it out again after the next nightly 🙏
@MasterJH5574 I hand-patched the changes and it compiled the model fine! 🙏 However, when I invoke serve again with the newly compiled model, there is still the problem with "print kv cache in JSON" failing -- it seems to be an injected argument to my simpler invocation.
Ah, just noted that you are running Mistral. Mistral support in MLC serve is still ongoing. We will get it done within one week. Meanwhile, the fix mentioned above will address the issue for compiling Mistral for chat use in Python.
That's great. Thanks @MasterJH5574 Is the Llama2 serve completed now? (or which one can I test instead) 🙏
Hmm. A quick test of Llama2-7b shows the chat model won't compile at this time (thanks to the quick new flow). Thanks, I will track GitHub for progress.
Yeah, supporting JIT compilation in serve is something we are also working on. Ideally it should be finished within one week. At this moment we still need to manually compile the models first. The serve flow for Llama2 is completed. Here is a gist on how we can launch the server for Mixtral for your reference: https://gist.github.com/MasterJH5574/ea1ba901938338a27434c18cdcd3f935 We will update the documentation soon so that we don't need to share a gist every time 😂
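A rough, hedged sketch of that manual flow (compile first, then point the server at the compiled lib); the model path, quantization suffix, and flag names below are assumptions for illustration, and the gist above has the exact commands:

```bash
# Sketch only: paths, quantization, and flag names are assumptions,
# not the exact commands from the gist.

# 1. Compile the model library ahead of time (serve does not JIT-compile yet).
mlc_chat compile ./dist/Llama-2-7b-chat-hf-q4f16_1-MLC/mlc-chat-config.json \
  --device cuda -o ./dist/libs/Llama-2-7b-chat-hf-q4f16_1-cuda.so

# 2. Launch the server against the weights and the compiled lib.
python -m mlc_chat.serve.server \
  --model ./dist/Llama-2-7b-chat-hf-q4f16_1-MLC \
  --model-lib-path ./dist/libs/Llama-2-7b-chat-hf-q4f16_1-cuda.so
```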
Ah, that explains everything 🙏 I'll close the other issue and will wait for JIT inclusion in serve and the CLI update. I intend to update the Docker PR to use this latest flow as soon as it is ready - #1271
Just want to give a (late) update here. The serve CLI and JIT were finished in #2014, thanks to @shreygupta2809 and @Kartik14. I think we are good to conclude this issue now :-)
Awesome! Thanks @MasterJH5574 @shreygupta2809 and @Kartik14 This is amazing. I tested with Llama2, Mistral, and Gemma, and they all worked well. But it did crash with Phi as well as OpenFunction v1/v2 - tagged as a separate issue #2113
🐛 Bug
When I invoke serve.server with mistral-7b, supplying the two required arguments, it crashes when trying to start the async_engine within _print_kv_cache_metadata_in_json.
To Reproduce
Steps to reproduce the behavior:
1. Run mlc_chat chat to generate/discover the lib name for mistral-7b on your system.
2. Invoke serve.server with the two required arguments (a hedged command sketch follows these steps).
3. Serve will crash on _print_kv_cache_metadata_in_json.
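A hedged sketch of those steps, assuming local weight/lib paths and the --model / --model-lib-path flag names (the actual arguments used were not included in the report):

```bash
# Hypothetical reproduction sketch; paths and flag names are assumptions.

# 1. Chat once so the model lib for mistral-7b is generated/located.
mlc_chat chat ./dist/Mistral-7B-Instruct-v0.2-q4f16_1-MLC

# 2. Start the server with the same weights and the discovered lib;
#    this is where the crash in _print_kv_cache_metadata_in_json occurs.
python -m mlc_chat.serve.server \
  --model ./dist/Mistral-7B-Instruct-v0.2-q4f16_1-MLC \
  --model-lib-path ./dist/libs/Mistral-7B-Instruct-v0.2-q4f16_1-cuda.so
```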
Expected behavior
Server running and servicing requests.
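For example, a running server should answer an OpenAI-style chat completion request along these lines (the default port and the /v1/chat/completions route are assumptions, and the model name is a placeholder):

```bash
# Assumes the default host/port and the OpenAI-compatible route.
curl -s http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Mistral-7B-Instruct-v0.2-q4f16_1-MLC",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```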
Environment
- How you installed MLC-LLM (conda, source): conda
- How you installed TVM-Unity (pip, source): pip
- TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):

Additional context
This is the trace on the crashing run: