Skip to content
This repository has been archived by the owner on Apr 16, 2024. It is now read-only.

Claude 3 image query docs #234

Merged
merged 8 commits into from
Mar 25, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
98 changes: 96 additions & 2 deletions docs/griptape-framework/drivers/image-query-drivers.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,67 @@

Image Query Drivers are used by [Image Query Engines](../engines/image-query-engines.md) to execute natural language queries on the contents of images. You can specify the provider and model used to query the image by providing the Engine with a particular Image Query Driver.

!!! info
All Image Query Drivers default to a `max_tokens` of 256. It is recommended that you set this value to correspond to the desired response length.

## AnthropicImageQueryDriver

!!! info
To tune `max_tokens`, see [Anthropic's documentation on image tokens](https://docs.anthropic.com/claude/docs/vision#image-costs) for more information on how to relate token count to response length.

The [AnthropicImageQueryDriver](../../reference/griptape/drivers/image_query/anthropic_image_query_driver.md) is used to query images using Anthropic's Claude 3 multi-modal model. Here is an example of how to use it:

```python
from griptape.drivers import AnthropicImageQueryDriver
from griptape.engines import ImageQueryEngine
from griptape.loaders import ImageLoader

driver = AnthropicImageQueryDriver(
model="claude-3-sonnet-20240229",
max_tokens=1024,
)

engine = ImageQueryEngine(
image_query_driver=driver,
)

with open("tests/assets/mountain.png", "rb") as f:
image_artifact = ImageLoader().load(f.read())

engine.run("Describe the weather in the image", [image_artifact])
```

You can also specify multiple images with a single text prompt. This applies the same text prompt to all images specified, up to a max of 20. However, you will still receive one text response from the model currently.

```python
from griptape.drivers import AnthropicImageQueryDriver
from griptape.engines import ImageQueryEngine
from griptape.loaders import ImageLoader

driver = AnthropicImageQueryDriver(
model="claude-3-sonnet-20240229",
max_tokens=1024,
)

engine = ImageQueryEngine(
image_query_driver=driver,
)

with open("tests/assets/mountain.png", "rb") as f:
image_artifact1 = ImageLoader().load(f.read())

with open("tests/assets/cow.png", "rb") as f:
image_artifact2 = ImageLoader().load(f.read())

result = engine.run("Describe the weather in the image", [image_artifact1, image_artifact2])

print(result)
```

## OpenAiVisionImageQueryDriver

!!! info
This Driver defaults to using the `gpt-4-vision-preview` model. As other multimodal models are released, they can be specified using the `model` field. While the `max_tokens` field is optional, it is recommended to set this to a value that corresponds to the desired response length. Without an explicit value, the model will default to very short responses. See [OpenAI's documentation](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them) for more information on how to relate token count to response length.
To tune `max_tokens`, see [OpenAI's documentation](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them) for more information on how to relate token count to response length.

The [OpenAiVisionImageQueryDriver](../../reference/griptape/drivers/image_query/openai_vision_image_query_driver.md) is used to query images using the OpenAI Vision API. Here is an example of how to use it:

Expand All @@ -16,7 +73,7 @@ from griptape.loaders import ImageLoader

driver = OpenAiVisionImageQueryDriver(
model="gpt-4-vision-preview",
max_tokens=200,
max_tokens=256,
)

engine = ImageQueryEngine(
Expand All @@ -28,3 +85,40 @@ with open("tests/assets/mountain.png", "rb") as f:

engine.run("Describe the weather in the image", [image_artifact])
```

## AmazonBedrockImageQueryDriver

The [Amazon Bedrock Image Query Driver](../../reference/griptape/drivers/image_query/amazon_bedrock_image_query_driver.md) provides multi-model access to image query models hosted by Amazon Bedrock. This Driver manages API calls to the Bedrock API, while the specific Model Drivers below format the API requests and parse the responses.

### Claude

The [BedrockClaudeImageQueryModelDriver](../../reference/griptape/drivers/image_query_model/bedrock_claude_image_query_model_driver.md) provides support for Claude models hosted by Bedrock.

```python
from griptape.drivers import AmazonBedrockImageQueryDriver, BedrockClaudeImageQueryModelDriver
from griptape.engines import ImageQueryEngine
from griptape.loaders import ImageLoader
import boto3

session = boto3.Session(
region_name="us-west-2"
)

driver = AmazonBedrockImageQueryDriver(
image_query_model_driver=BedrockClaudeImageQueryModelDriver(),
model="anthropic.claude-3-sonnet-20240229-v1:0",
session=session
)

engine = ImageQueryEngine(
image_query_driver=driver
)

with open("tests/assets/mountain.png", "rb") as f:
image_artifact = ImageLoader().load(f.read())


result = engine.run("Describe the weather in the image", [image_artifact])

print(result)
```
7 changes: 5 additions & 2 deletions docs/griptape-framework/engines/image-query-engines.md
Original file line number Diff line number Diff line change
Expand Up @@ -2,17 +2,20 @@

The [Image Query Engine](../../reference/griptape/engines/image_query/image_query_engine.md) is used to execute natural language queries on the contents of images. You can specify the provider and model used to query the image by providing the Engine with a particular [Image Query Driver](../drivers/image-query-drivers.md).

All Image Query Drivers default to a `max_tokens` of 256. You can tune this value based on your use case and the [Image Query Driver](../drivers/image-query-drivers.md) you are providing.

```python
from griptape.drivers import OpenAiVisionImageQueryDriver
from griptape.engines import ImageQueryEngine
from griptape.loaders import ImageLoader
from griptape.loaders import ImageLoader

driver = OpenAiVisionImageQueryDriver(
model="gpt-4-vision-preview",
max_tokens=256
)

engine = ImageQueryEngine(
image_query_driver=driver,
image_query_driver=driver
)

with open("tests/assets/mountain.png", "rb") as f:
Expand Down
Binary file added tests/assets/cow.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading