How to use Pixtral tokens & outputs? #46

Open
kanishkanarch opened this issue Sep 11, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@kanishkanarch

Python -VV

Python 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0]

Pip Freeze

kanishk@anarch[~/mistral] > pip freeze
annotated-types==0.7.0
appdirs==1.4.4
asttokens==2.4.1
attrs==24.2.0
certifi==2024.8.30
charset-normalizer==3.3.2
cityscapesScripts==2.2.2
coloredlogs==15.0.1
contourpy==1.2.0
cycler==0.12.1
decorator==5.1.1
executing==2.0.1
filelock==3.13.1
fonttools==4.49.0
fsspec==2024.2.0
graphviz==0.20.3
huggingface-hub==0.24.6
humanfriendly==10.0
idna==3.8
ipython==8.22.1
jedi==0.19.1
Jinja2==3.1.3
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
keyboard==0.13.5
kiwisolver==1.4.5
MarkupSafe==2.1.5
matplotlib==3.8.3
matplotlib-inline==0.1.6
mistral_common==1.4.0
mplcyberpunk==0.7.1
mpmath==1.3.0
networkx==3.2.1
numpy==1.26.4
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
opencv-python==4.9.0.80
packaging==23.2
parso==0.8.3
pexpect==4.9.0
pillow==10.4.0
progressbar==2.5
prompt-toolkit==3.0.43
ptyprocess==0.7.0
pure-eval==0.2.2
pydantic==2.9.1
pydantic_core==2.23.3
pygame==2.5.2
Pygments==2.17.2
pyparsing==3.1.2
pyquaternion==0.9.9
python-dateutil==2.9.0.post0
PyYAML==6.0.2
pyzmq==23.2.1
qbstyles==0.1.4
referencing==0.35.1
regex==2024.7.24
requests==2.32.3
rpds-py==0.20.0
sentencepiece==0.2.0
six==1.16.0
stack-data==0.6.3
sympy==1.12
tiktoken==0.7.0
torch==2.2.1
tqdm==4.66.2
traitlets==5.14.1
triton==2.2.0
typing==3.7.4.3
typing_extensions==4.12.2
urllib3==2.2.2
wcwidth==0.2.13
XPlaneApi==0.0.6
xplaneconnect @ file:///home/kanishk/X-Plane%2010/Resources/plugins/XPlaneConnect/XPlaneConnect
zmq==0.0.0

Reproduction Steps

  1. Run any one of the example code snippets given in the release documentation.

Expected Behavior

The Pixtral model should output some form of visualizable/interactive data, or the documentation should include additional code snippets showing how to use the output tokens.

Additional Context

The mistral_common.multimodal module doesn't seem to have any function to make sense of the data output by the tokenizer, unless I overlooked something. I tried to open the output image(s), but apparently they need a read function, judging by the selected open function below.
[screenshot]

TLDR: I have no clue how to use the output image

[screenshot]

Suggested Solutions

  1. Add modules to interact with multimodal data
  2. Provide a WebUI API, like Gradio
kanishkanarch added the bug label on Sep 11, 2024
@maoki109

I second this issue; I would also like to know how to get the output in a human-readable format.

@pandora-s-git
Contributor

Hi @maoki109 and @kanishkanarch,
Mistral Common does not perform inference; it primarily handles tokenization before the data is fed to the model. Essentially, it is the preliminary step before sending the request to the actual model: your text input and images are tokenized/encoded, ready to be used as input to the model. It is therefore completely normal that the outputs are hard to interpret, since they are only token IDs. Usually, you would then feed them to the model, which can be hosted with, for example, Mistral Inference. There is example code in this Hugging Face space that uses both Mistral Common and Mistral Inference to host and communicate with the model -> space
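To make this concrete, here is a minimal sketch of what the tokenization step produces for a multimodal request, following the Pixtral release examples; the tokenizer path and image URL below are placeholders:

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage, TextChunk, ImageURLChunk
from mistral_common.protocol.instruct.request import ChatCompletionRequest

tokenizer = MistralTokenizer.from_file("./pixtral-12b/tekken.json")  # placeholder path

completion_request = ChatCompletionRequest(
    messages=[
        UserMessage(
            content=[
                ImageURLChunk(image_url="https://example.com/photo.png"),  # placeholder URL
                TextChunk(text="Describe the image."),
            ]
        )
    ]
)

encoded = tokenizer.encode_chat_completion(completion_request)

# Token IDs only -- this is tokenizer output, not model output.
print(encoded.tokens[:20])
# Preprocessed image arrays, to be passed to the model alongside the tokens.
print(len(encoded.images))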

@maoki109

I see, thanks @pandora-s-git! I misunderstood.

Does anyone know where I can find documentation on how to use the token and image outputs from Mistral Common/Pixtral for Python inference? Something like the Instruction Following example below, but with multimodal inputs.

from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest


tokenizer = MistralTokenizer.from_file("./mistral-nemo-instruct-v0.1/tekken.json")  # change to extracted tokenizer file
model = Transformer.from_folder("./mistral-nemo-instruct-v0.1")  # change to extracted model dir

prompt = "How expensive would it be to ask a window cleaner to clean all windows in Paris. Make a reasonable guess in US Dollar."

completion_request = ChatCompletionRequest(messages=[UserMessage(content=prompt)])

# Tokenize the chat request into token IDs.
tokens = tokenizer.encode_chat_completion(completion_request).tokens

# Run inference, then decode the generated token IDs back into text.
out_tokens, _ = generate([tokens], model, max_tokens=1024, temperature=0.35, eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id)
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])

print(result)

@pandora-s-git
Contributor

You can take a look at the source code of the space I've mentioned previously 🙌
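For anyone who lands here later, here is a rough multimodal sketch combining the two libraries, modeled on the Pixtral example in the mistral-inference README; the model path and image URL are placeholders:

from mistral_inference.transformer import Transformer
from mistral_inference.generate import generate

from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_common.protocol.instruct.messages import UserMessage, TextChunk, ImageURLChunk
from mistral_common.protocol.instruct.request import ChatCompletionRequest

tokenizer = MistralTokenizer.from_file("./pixtral-12b/tekken.json")  # placeholder path
model = Transformer.from_folder("./pixtral-12b")  # placeholder path

url = "https://example.com/photo.png"  # placeholder image URL
prompt = "Describe the image."

completion_request = ChatCompletionRequest(
    messages=[UserMessage(content=[ImageURLChunk(image_url=url), TextChunk(text=prompt)])]
)

encoded = tokenizer.encode_chat_completion(completion_request)

# Both the token IDs and the preprocessed images go to the model.
out_tokens, _ = generate(
    [encoded.tokens],
    model,
    images=[encoded.images],
    max_tokens=256,
    temperature=0.35,
    eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id,
)

# Decode the generated token IDs back into text.
result = tokenizer.instruct_tokenizer.tokenizer.decode(out_tokens[0])
print(result)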

@kanishkanarch
Copy link
Author

Thanks a lot, @pandora-s-git. The online API works perfectly fine.

[screenshot]

I also tried the 'API code' but got the following error. The docs say that one needs to pass the "face tokens" when working locally, but where do I pass them?

[screenshot]
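In case "face tokens" refers to a Hugging Face access token (an assumption on my part), gradio_client lets you pass one when constructing the client; the space name, endpoint name, and argument order below are hypothetical and depend on the space's API page:

from gradio_client import Client

# hf_token is a real gradio_client parameter; the space name is hypothetical.
client = Client("some-user/pixtral-demo", hf_token="hf_...")  # placeholder token

# Endpoint name and arguments must match the space's documented API.
result = client.predict(
    "https://example.com/photo.png",  # placeholder image URL
    "Describe the image.",
    api_name="/predict",
)
print(result)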
