
fix(anthropic): add instrumentation for Anthropic prompt caching #2175

Open · wants to merge 3 commits into main
Conversation

dinmukhamedm (Contributor)

This PR addresses #1838 and is an alternative to the stale (and, in its current form, non-working) #1858.

This is not a draft; it is a final, working implementation. However, there are a couple of open questions, which I will leave as comments on the relevant pieces of code.

Below is a screenshot of the new resulting attributes, flattened to YAML format:

[screenshot: anthropic_token_count_attributes]

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

@dosubot (bot) added the size:XXL (This PR changes 1000+ lines, ignoring generated files), new instrumentation, python (Pull requests that update Python code), and testing labels on Oct 20, 2024
  token_histogram.record(
-     prompt_tokens,
+     input_tokens,
Contributor Author

I am open to discussing whether this must be input tokens (i.e., all the tokens sent by the user) or prompt tokens (i.e., the new tokens that Anthropic has neither written to nor read from the cache).
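
(A minimal sketch of the distinction, assuming an Anthropic usage block with the cache counters discussed later in this thread; the histogram attributes are illustrative, not necessarily the ones this instrumentation sets:)

# Anthropic's response field `input_tokens` only counts the uncached part of the prompt.
prompt_tokens = usage.input_tokens
cache_read_tokens = usage.cache_read_input_tokens or 0
cache_creation_tokens = usage.cache_creation_input_tokens or 0

# "Input tokens" in the broader sense: everything the user sent, cached or not,
# so the metric stays comparable across cached and uncached requests.
input_tokens = prompt_tokens + cache_read_tokens + cache_creation_tokens

token_histogram.record(input_tokens, attributes={"gen_ai.system": "anthropic"})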

  token_histogram.record(
-     prompt_tokens,
+     input_tokens,
Contributor Author

ditto

@@ -74,6 +74,11 @@ class SpanAttributes:
    LLM_OPENAI_API_VERSION = "gen_ai.openai.api_version"
    LLM_OPENAI_API_TYPE = "gen_ai.openai.api_type"

    # Anthropic
Contributor Author

I am afraid this PR won't pass tests: pyproject.toml in the anthropic package points at 0.4.1 (basically the PyPI version), but my changes rely on the new attributes here. Let me know if I need to merge this first somehow.

Member

Yeah I need to manually publish the semconv package cause poetry doesn't support deeply nested local dependencies :/
python-poetry/poetry#2270

Comment on lines 78 to 80
LLM_ANTHROPIC_CACHE_CREATION_INPUT_TOKENS = "llm.anthropic.usage.cache_creation_input_tokens"
LLM_ANTHROPIC_CACHE_READ_INPUT_TOKENS = "llm.anthropic.usage.cache_read_input_tokens"
LLM_ANTHROPIC_TOTAL_INPUT_TOKENS = "llm.anthropic.usage.total_input_tokens"
Contributor Author (@dinmukhamedm), Oct 20, 2024

The logic I followed when naming these attributes was the following:

  1. this comment on Feature: Support Prompt Caching #1858
  2. llm is apparently less standardized than gen_ai, so we are free to experiment here
  3. I haven't found separate documentation for gen_ai.anthropic attributes like the one they have for gen_ai.openai here

So I thought, a good in-between would be llm.anthropic.usage.*, but I am open to additional thoughts.

Also, I am not sure if restricting this to anthropic is a good idea, because both google-generativeai and openai have at least some overlap with this.
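
(For illustration, a minimal sketch of how these constants could be applied to an Anthropic response; `span` is assumed to be the active OpenTelemetry span, `usage` the response usage block, and the import path is an assumption that may differ from the repo:)

from opentelemetry.semconv_ai import SpanAttributes  # import path assumed

usage = response.usage
cache_creation = usage.cache_creation_input_tokens or 0
cache_read = usage.cache_read_input_tokens or 0

span.set_attribute(SpanAttributes.LLM_ANTHROPIC_CACHE_CREATION_INPUT_TOKENS, cache_creation)
span.set_attribute(SpanAttributes.LLM_ANTHROPIC_CACHE_READ_INPUT_TOKENS, cache_read)
# Total = uncached prompt tokens + everything read from or written to the cache.
span.set_attribute(
    SpanAttributes.LLM_ANTHROPIC_TOTAL_INPUT_TOKENS,
    (usage.input_tokens or 0) + cache_read + cache_creation,
)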

Member

I'd argue these should be gen_ai and without anthropic. Most of the "official" semantic conventions came after OpenLLMetry and were inspired by the work we've done here - so we should set the preferred path moving forward!

Contributor Author

I agree generally, but given the context I provide in #2175 (comment) and the fact that what used to be called gen_ai.usage.prompt_tokens is now apparently called gen_ai.usage.input_tokens "officially", we need to decide what to call these, especially the TOTAL_INPUT_TOKENS one.

So, for context, OpenAI does not return cache_creation_input_tokens as they don't charge for the operation. Their usage block looks like this:

usage: {
  total_tokens: 2306,
  prompt_tokens: 2006,
  completion_tokens: 300,
  
  prompt_tokens_details: {
    cached_tokens: 1920,
    audio_tokens: 0,
  },
  completion_tokens_details: {
    reasoning_tokens: 0,
    audio_tokens: 0,
  }
}

Gemini returns the number of cache_creation_input_tokens from the cache.create call, and they charge for storage of cache tokens per hour.

So I think, overall the formula is this:

input_tokens = cached_tokens + uncached_tokens
cached_tokens = cache_read_tokens # generally
cached_tokens = cache_read_tokens OR cache_creation_tokens # for Anthropic

and cache_read_tokens and cache_creation_tokens are charged for at vastly different prices.
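
As a quick sanity check of the formula against the OpenAI usage block above:

cached_tokens = 1920                             # prompt_tokens_details.cached_tokens
input_tokens = 2006                              # what OpenAI calls prompt_tokens
uncached_tokens = input_tokens - cached_tokens   # 86, billed at the regular input price
assert input_tokens == cached_tokens + uncached_tokens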

With this in mind, the main open question is what gen_ai.usage.prompt_tokens and gen_ai.usage.input_tokens are meant to represent. Is it the total input tokens, or the number of uncached tokens charged at the regular price? While the former makes more sense to me, I think it would somewhat break any cost-calculation implementations that depend on this library.

Contributor Author

Done! @nirga please check my latest commit

@dinmukhamedm changed the title from fix(anthropic): add instrumentation for Anthropic to fix(anthropic): add instrumentation for Anthropic prompt caching on Oct 21, 2024
@dinmukhamedm (Contributor Author)

Also, I was surprised to learn that system messages are not instrumented in Anthropic, so I opened #2187. I am willing to contribute that as well, unless anybody else picks it up faster.

@nirga (Member) left a comment

Excellent work @dinmukhamedm! Left one comment, and I'll manually publish the semconv package so the CI will pass here.

else:
    cache_creation_tokens = 0

input_tokens = prompt_tokens + cache_read_tokens + cache_creation_tokens
Member

Aren't you double-counting the number of input tokens like this? From my understanding, cache_read_tokens + cache_creation_tokens should be exactly the number of tokens in the input. Or is it the case that either prompt_tokens is set (for non-cached requests) or cache_read_tokens + cache_creation_tokens?

Contributor Author

Nope, Anthropic fills all three in. If it is a cache write, cache_read_tokens == 0. If it is a cache read, cache_creation_tokens == 0. prompt_tokens are the tokens from the uncached parts of the messages; for cache writes and reads it is always about 3-4, which I assume is some control tokens or stop sequences. You can see some numbers I hard-coded in the tests here.

For example, if I send two text blocks of sizes 1200 and 100 tokens in one message and only direct Anthropic to cache the first one, the usage will be:

  • {"cache_read_input_tokens": 0, "cache_creation_input_tokens": 1200, "input_tokens": 104, "output_tokens": ...} for the first call, and
  • {"cache_read_input_tokens": 1200, "cache_creation_input_tokens": 0, "input_tokens": 104, "output_tokens": ...} for the second

Member

My intuition is that we should keep input_tokens constant across providers - so it should always be the total number of tokens in the input, regardless of whether some of them were cached and some weren't.

Member

(which is what I think you did - right?)

Contributor Author

Yes, except for one place where I accidentally added just uncached tokens. Fixed in the last commit now.

