Skip to content

Commit

Permalink
[Inference API] Add image-text-to-text task and fix generate script (
Browse files Browse the repository at this point in the history
…#1440)

* Add image-text-to-text task and fix generate script

* Run generate script

* fix chat-completion and image-text-to-text docs

* fix typo

* Fix chat completion package reference links

* regenerate inference api docs
  • Loading branch information
hanouticelina authored Oct 16, 2024
1 parent 6a102ac commit 0bcbf66
Show file tree
Hide file tree
Showing 24 changed files with 332 additions and 86 deletions.
2 changes: 2 additions & 0 deletions docs/api-inference/_toctree.yml
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,8 @@
title: Image Segmentation
- local: tasks/image-to-image
title: Image to Image
- local: tasks/image-text-to-text
title: Image-Text to Text
- local: tasks/object-detection
title: Object Detection
- local: tasks/question-answering
Expand Down
10 changes: 5 additions & 5 deletions docs/api-inference/tasks/audio-classification.md
Original file line number Diff line number Diff line change
Expand Up @@ -29,8 +29,9 @@ For more details about the `audio-classification` task, check out its [dedicated

### Recommended models

- [ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition](https://huggingface.co/ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition): An emotion recognition model.

This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=audio-classification&sort=trending).
Explore all available models and find the one that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=audio-classification&sort=trending).

### Using the API

Expand All @@ -39,19 +40,18 @@ This is only a subset of the supported models. Find the model that suits you bes

<curl>
```bash
curl https://api-inference.huggingface.co/models/<REPO_ID> \
curl https://api-inference.huggingface.co/models/ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition \
-X POST \
--data-binary '@sample1.flac' \
-H "Authorization: Bearer hf_***"

```
</curl>

<python>
```py
import requests

API_URL = "https://api-inference.huggingface.co/models/<REPO_ID>"
API_URL = "https://api-inference.huggingface.co/models/ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition"
headers = {"Authorization": "Bearer hf_***"}

def query(filename):
Expand All @@ -71,7 +71,7 @@ To use the Python client, see `huggingface_hub`'s [package reference](https://hu
async function query(filename) {
const data = fs.readFileSync(filename);
const response = await fetch(
"https://api-inference.huggingface.co/models/<REPO_ID>",
"https://api-inference.huggingface.co/models/ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition",
{
headers: {
Authorization: "Bearer hf_***"
Expand Down
5 changes: 2 additions & 3 deletions docs/api-inference/tasks/automatic-speech-recognition.md
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@ For more details about the `automatic-speech-recognition` task, check out its [d
- [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3): A powerful ASR model by OpenAI.
- [pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1): Powerful speaker diarization model.

This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=automatic-speech-recognition&sort=trending).
Explore all available models and find the one that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=automatic-speech-recognition&sort=trending).

### Using the API

Expand All @@ -45,7 +45,6 @@ curl https://api-inference.huggingface.co/models/openai/whisper-large-v3 \
-X POST \
--data-binary '@sample1.flac' \
-H "Authorization: Bearer hf_***"

```
</curl>

Expand Down Expand Up @@ -108,7 +107,7 @@ To use the JavaScript client, see `huggingface.js`'s [package reference](https:/
| **inputs*** | _string_ | The input audio data as a base64-encoded string. If no `parameters` are provided, you can also provide the audio data as a raw bytes payload. |
| **parameters** | _object_ | Additional inference parameters for Automatic Speech Recognition |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return_timestamps** | _boolean_ | Whether to output corresponding timestamps with the generated text |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;generate** | _object_ | Ad-hoc parametrization of the text generation process |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;generation_parameters** | _object_ | Ad-hoc parametrization of the text generation process |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;temperature** | _number_ | The value used to modulate the next token probabilities. |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;top_k** | _integer_ | The number of highest probability vocabulary tokens to keep for top-k-filtering. |
| **&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;top_p** | _number_ | If set to float < 1, only the smallest set of most probable tokens with probabilities that add up to top_p or higher are kept for generation. |
Expand Down
106 changes: 97 additions & 9 deletions docs/api-inference/tasks/chat-completion.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,20 +14,23 @@ For more details, check out:

## Chat Completion

Generate a response given a list of messages.
This is a subtask of [`text-generation`](./text_generation) designed to generate responses in a conversational context.


Generate a response given a list of messages in a conversational context, supporting both conversational Language Models (LLMs) and conversational Vision-Language Models (VLMs).
This is a subtask of [`text-generation`](https://huggingface.co/docs/api-inference/tasks/text-generation) and [`image-text-to-text`](https://huggingface.co/docs/api-inference/tasks/image-text-to-text).

### Recommended models

#### Conversational Large Language Models (LLMs)

- [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it): A text-generation model trained to follow instructions.
- [meta-llama/Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct): Very powerful text generation model trained to follow instructions.
- [microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct): Small yet powerful text generation model.
- [HuggingFaceH4/starchat2-15b-v0.1](https://huggingface.co/HuggingFaceH4/starchat2-15b-v0.1): Strong coding assistant model.
- [mistralai/Mistral-Nemo-Instruct-2407](https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407): Very strong open-source large language model.

#### Conversational Vision-Language Models (VLMs)

- [meta-llama/Llama-3.2-11B-Vision-Instruct](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct): Powerful vision language model with great visual understanding and reasoning capabilities.
- [microsoft/Phi-3.5-vision-instruct](https://huggingface.co/microsoft/Phi-3.5-vision-instruct): Strong image-text-to-text model.

### Using the API

Expand All @@ -37,6 +40,8 @@ The API supports:
* Using grammars, constraints, and tools.
* Streaming the output

#### Code snippet example for conversational LLMs


<inferencesnippet>

Expand All @@ -59,18 +64,15 @@ curl 'https://api-inference.huggingface.co/models/google/gemma-2-2b-it/v1/chat/c
```py
from huggingface_hub import InferenceClient

client = InferenceClient(
"google/gemma-2-2b-it",
token="hf_***",
)
client = InferenceClient(api_key="hf_***")

for message in client.chat_completion(
model="google/gemma-2-2b-it",
messages=[{"role": "user", "content": "What is the capital of France?"}],
max_tokens=500,
stream=True,
):
print(message.choices[0].delta.content, end="")

```

To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion).
Expand All @@ -89,7 +91,93 @@ for await (const chunk of inference.chatCompletionStream({
})) {
process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
```
To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#chatcompletion).
</js>
</inferencesnippet>
#### Code snippet example for conversational VLMs
<inferencesnippet>
<curl>
```bash
curl 'https://api-inference.huggingface.co/models/meta-llama/Llama-3.2-11B-Vision-Instruct/v1/chat/completions' \
-H "Authorization: Bearer hf_***" \
-H 'Content-Type: application/json' \
-d '{
"model": "meta-llama/Llama-3.2-11B-Vision-Instruct",
"messages": [
{
"role": "user",
"content": [
{"type": "image_url", "image_url": {"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"}},
{"type": "text", "text": "Describe this image in one sentence."}
]
}
],
"max_tokens": 500,
"stream": false
}'

```
</curl>
<python>
```py
from huggingface_hub import InferenceClient

client = InferenceClient(api_key="hf_***")

image_url = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"

for message in client.chat_completion(
model="meta-llama/Llama-3.2-11B-Vision-Instruct",
messages=[
{
"role": "user",
"content": [
{"type": "image_url", "image_url": {"url": image_url}},
{"type": "text", "text": "Describe this image in one sentence."},
],
}
],
max_tokens=500,
stream=True,
):
print(message.choices[0].delta.content, end="")
```
To use the Python client, see `huggingface_hub`'s [package reference](https://huggingface.co/docs/huggingface_hub/package_reference/inference_client#huggingface_hub.InferenceClient.chat_completion).
</python>
<js>
```js
import { HfInference } from "@huggingface/inference";

const inference = new HfInference("hf_***");
const imageUrl = "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg";

for await (const chunk of inference.chatCompletionStream({
model: "meta-llama/Llama-3.2-11B-Vision-Instruct",
messages: [
{
"role": "user",
"content": [
{"type": "image_url", "image_url": {"url": imageUrl}},
{"type": "text", "text": "Describe this image in one sentence."},
],
}
],
max_tokens: 500,
})) {
process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
```
To use the JavaScript client, see `huggingface.js`'s [package reference](https://huggingface.co/docs/huggingface.js/inference/classes/HfInference#chatcompletion).
Expand Down
3 changes: 1 addition & 2 deletions docs/api-inference/tasks/feature-extraction.md
Original file line number Diff line number Diff line change
Expand Up @@ -31,7 +31,7 @@ For more details about the `feature-extraction` task, check out its [dedicated p

- [thenlper/gte-large](https://huggingface.co/thenlper/gte-large): A powerful feature extraction model for natural language processing tasks.

This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=feature-extraction&sort=trending).
Explore all available models and find the one that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=feature-extraction&sort=trending).

### Using the API

Expand All @@ -45,7 +45,6 @@ curl https://api-inference.huggingface.co/models/thenlper/gte-large \
-d '{"inputs": "Today is a sunny day and I will get some ice cream."}' \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer hf_***"

```
</curl>

Expand Down
3 changes: 1 addition & 2 deletions docs/api-inference/tasks/fill-mask.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,7 +27,7 @@ For more details about the `fill-mask` task, check out its [dedicated page](http
- [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased): The famous BERT model.
- [FacebookAI/xlm-roberta-base](https://huggingface.co/FacebookAI/xlm-roberta-base): A multilingual model trained on 100 languages.

This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=fill-mask&sort=trending).
Explore all available models and find the one that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=fill-mask&sort=trending).

### Using the API

Expand All @@ -41,7 +41,6 @@ curl https://api-inference.huggingface.co/models/google-bert/bert-base-uncased \
-d '{"inputs": "The answer to the universe is [MASK]."}' \
-H 'Content-Type: application/json' \
-H "Authorization: Bearer hf_***"

```
</curl>

Expand Down
4 changes: 2 additions & 2 deletions docs/api-inference/tasks/image-classification.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,8 +25,9 @@ For more details about the `image-classification` task, check out its [dedicated
### Recommended models

- [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224): A strong image classification model.
- [facebook/deit-base-distilled-patch16-224](https://huggingface.co/facebook/deit-base-distilled-patch16-224): A robust image classification model.

This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=image-classification&sort=trending).
Explore all available models and find the one that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=image-classification&sort=trending).

### Using the API

Expand All @@ -39,7 +40,6 @@ curl https://api-inference.huggingface.co/models/google/vit-base-patch16-224 \
-X POST \
--data-binary '@cats.jpg' \
-H "Authorization: Bearer hf_***"

```
</curl>

Expand Down
3 changes: 1 addition & 2 deletions docs/api-inference/tasks/image-segmentation.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ For more details about the `image-segmentation` task, check out its [dedicated p

- [nvidia/segformer-b0-finetuned-ade-512-512](https://huggingface.co/nvidia/segformer-b0-finetuned-ade-512-512): Semantic segmentation model trained on ADE20k benchmark dataset with 512x512 resolution.

This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=image-segmentation&sort=trending).
Explore all available models and find the one that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=image-segmentation&sort=trending).

### Using the API

Expand All @@ -39,7 +39,6 @@ curl https://api-inference.huggingface.co/models/nvidia/segformer-b0-finetuned-a
-X POST \
--data-binary '@cats.jpg' \
-H "Authorization: Bearer hf_***"

```
</curl>

Expand Down
Loading

0 comments on commit 0bcbf66

Please sign in to comment.