Hi, thank you for the awesome work. I was wondering if there is a quantized version of prismatic, or if I can at least quantize the LLM backbone. I saw that for inference the weights are loaded via `load_state_dict`, so I'm not sure how to approach quantization. Any insight would be helpful. Thanks!
show981111 changed the title from "Inference speed using pre-trained backbone, not from prismatic vlm checkpoint." to "Quantization support" on Apr 6, 2024
This is a good question -- I would love to support this, but I don't have much experience loading LLMs in 4-bit/8-bit precision. If you can link me to some code for loading, e.g., LLaMA-2 in 8-bit precision, I can see what would make sense!
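For reference, here is a minimal sketch of what loading LLaMA-2 in 8-bit typically looks like with Hugging Face `transformers` and `bitsandbytes`. This is not prismatic code: the model ID is illustrative (the meta-llama weights are gated and require an approved token), and it assumes `transformers`, `accelerate`, and `bitsandbytes` are installed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # gated repo; assumes HF access is granted

# 8-bit quantization: weights are stored as int8, and matmuls are routed
# through bitsandbytes' LLM.int8() kernels.
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",  # let accelerate place the quantized layers on GPU(s)
)

inputs = tokenizer("Quantization reduces memory by", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

For 4-bit, the same pattern applies with `BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4")`. The key difference from the current prismatic loading path is that quantization here happens inside `from_pretrained`, so the weights would need to be restored through that entry point (or quantized after `load_state_dict` with a post-hoc tool) rather than loaded as plain fp16/fp32 tensors.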