Could you give me some guidance on how I can adapt this to a VLM like LLaVA? #532

Yingshu-Li · 2024-09-28T04:27:01Z

Hello,

I’m currently exploring how to visualize heatmaps for LLaVA and other multimodal large language models to understand what the model focuses on during text generation. I am familiar with using Grad-CAM for single-target classification tasks. However, since LLaVA generates complete sentences, I’m unsure how to obtain a heatmap for each individual word. Could you provide any guidance on how to approach this?
