
fix(anthropic): add instrumentation for Anthropic prompt caching #2175

Open · wants to merge 3 commits into main
Conversation

dinmukhamedm (Contributor)

This PR addresses #1838 and is an alternative to the stale (and, in its current form, non-working) #1858.

This is not a draft; it is a final, working implementation. However, there are a couple of open questions, which I will leave as comments on the relevant pieces of code.

Below is a screenshot of the new resulting attributes, flattened to YAML format:

[screenshot: anthropic_token_count_attributes]

  • I have added tests that cover my changes.
  • If adding a new instrumentation or changing an existing one, I've added screenshots from some observability platform showing the change.
  • PR name follows conventional commits format: feat(instrumentation): ... or fix(instrumentation): ....
  • (If applicable) I have updated the documentation accordingly.

@dosubot (bot) added the size:XXL (This PR changes 1000+ lines, ignoring generated files), new instrumentation, python (Pull requests that update Python code), and testing labels on Oct 20, 2024
  token_histogram.record(
-     prompt_tokens,
+     input_tokens,
Contributor Author

I am open to discussing whether this must be input tokens (i.e., all the tokens sent by the user) or prompt tokens (i.e., the new tokens that Anthropic has neither written to nor read from the cache).
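
(A minimal sketch of the distinction, assuming an Anthropic usage block with the cache counters discussed later in this thread; the histogram attributes are illustrative, not necessarily the ones this instrumentation sets:)

# Anthropic's response field `input_tokens` only counts the uncached part of the prompt.
prompt_tokens = usage.input_tokens
cache_read_tokens = usage.cache_read_input_tokens or 0
cache_creation_tokens = usage.cache_creation_input_tokens or 0

# "Input tokens" in the broader sense: everything the user sent, cached or not,
# so the metric stays comparable across cached and uncached requests.
input_tokens = prompt_tokens + cache_read_tokens + cache_creation_tokens

token_histogram.record(input_tokens, attributes={"gen_ai.system": "anthropic"})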

  token_histogram.record(
-     prompt_tokens,
+     input_tokens,
Contributor Author

ditto

@@ -74,6 +74,11 @@ class SpanAttributes:
    LLM_OPENAI_API_VERSION = "gen_ai.openai.api_version"
    LLM_OPENAI_API_TYPE = "gen_ai.openai.api_type"

    # Anthropic
Contributor Author

I am afraid this PR won't pass tests: pyproject.toml in the anthropic package points at 0.4.1 (basically the PyPI version), but my changes rely on the new attributes here. Let me know if I need to merge this first somehow.

Member

Yeah I need to manually publish the semconv package cause poetry doesn't support deeply nested local dependencies :/
python-poetry/poetry#2270

Comment on lines 78 to 80
LLM_ANTHROPIC_CACHE_CREATION_INPUT_TOKENS = "llm.anthropic.usage.cache_creation_input_tokens"
LLM_ANTHROPIC_CACHE_READ_INPUT_TOKENS = "llm.anthropic.usage.cache_read_input_tokens"
LLM_ANTHROPIC_TOTAL_INPUT_TOKENS = "llm.anthropic.usage.total_input_tokens"
Contributor Author (@dinmukhamedm), Oct 20, 2024

The logic I followed when naming these attributes was the following:

  1. this comment on Feature: Support Prompt Caching #1858
  2. llm is apparently less standardized than gen_ai, so we are free to experiment here
  3. I haven't found separate documentation for gen_ai.anthropic attributes like the one they have for gen_ai.openai here

So I thought, a good in-between would be llm.anthropic.usage.*, but I am open to additional thoughts.

Also, I am not sure if restricting this to anthropic is a good idea, because both google-generativeai and openai have at least some overlap with this.
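
(For illustration, a minimal sketch of how these constants could be applied to an Anthropic response; `span` is assumed to be the active OpenTelemetry span, `usage` the response usage block, and the import path is an assumption that may differ from the repo:)

from opentelemetry.semconv_ai import SpanAttributes  # import path assumed

usage = response.usage
cache_creation = usage.cache_creation_input_tokens or 0
cache_read = usage.cache_read_input_tokens or 0

span.set_attribute(SpanAttributes.LLM_ANTHROPIC_CACHE_CREATION_INPUT_TOKENS, cache_creation)
span.set_attribute(SpanAttributes.LLM_ANTHROPIC_CACHE_READ_INPUT_TOKENS, cache_read)
# Total = uncached prompt tokens + everything read from or written to the cache.
span.set_attribute(
    SpanAttributes.LLM_ANTHROPIC_TOTAL_INPUT_TOKENS,
    (usage.input_tokens or 0) + cache_read + cache_creation,
)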

Member

I'd argue these should be gen_ai and without anthropic. Most of the "official" semantic conventions came after OpenLLMetry and were inspired by the work we've done here - so we should set the preferred path moving forward!

Contributor Author

I agree generally, but given the context I provide in #2175 (comment) and the fact that what used to be called gen_ai.usage.prompt_tokens is now apparently called gen_ai.usage.input_tokens "officially", we need to decide what to call these, especially the TOTAL_INPUT_TOKENS one.

So, for context, OpenAI does not return cache_creation_input_tokens as they don't charge for the operation. Their usage block looks like this:

usage: {
  total_tokens: 2306,
  prompt_tokens: 2006,
  completion_tokens: 300,
  
  prompt_tokens_details: {
    cached_tokens: 1920,
    audio_tokens: 0,
  },
  completion_tokens_details: {
    reasoning_tokens: 0,
    audio_tokens: 0,
  }
}

Gemini returns the number of cache_creation_input_tokens from the cache.create call, and they charge for storage of cache tokens per hour.

So I think, overall the formula is this:

input_tokens = cached_tokens + uncached_tokens
cached_tokens = cache_read_tokens # generally
cached_tokens = cache_read_tokens OR cache_creation_tokens # for Anthropic

and cache_read_tokens and cache_creation_tokens are charged for at vastly different prices.
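
As a quick sanity check of the formula against the OpenAI usage block above:

cached_tokens = 1920                             # prompt_tokens_details.cached_tokens
input_tokens = 2006                              # what OpenAI calls prompt_tokens
uncached_tokens = input_tokens - cached_tokens   # 86, billed at the regular input price
assert input_tokens == cached_tokens + uncached_tokens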

With this in mind, the main open question is what gen_ai.usage.prompt_tokens and gen_ai.usage.input_tokens are meant to represent. Is it the total input tokens, or the number of uncached tokens charged at the regular price? While the former makes more sense to me, I think it would somewhat break any cost-calculation implementations that depend on this library.

Contributor Author

Done! @nirga please check my latest commit

@dinmukhamedm changed the title from fix(anthropic): add instrumentation for Anthropic to fix(anthropic): add instrumentation for Anthropic prompt caching on Oct 21, 2024
@dinmukhamedm (Contributor Author)

Also, I was surprised to learn that system messages are not instrumented in Anthropic, so I opened #2187. I am willing to contribute that as well, unless anybody else picks it up faster.

@nirga (Member) left a comment

Excellent work @dinmukhamedm! Left one comment, and I'll manually publish the semconv package so the CI will pass here.

else:
    cache_creation_tokens = 0

input_tokens = prompt_tokens + cache_read_tokens + cache_creation_tokens
Member

Aren't you double-counting the number of input tokens like this? From my understanding, cache_read_tokens + cache_creation_tokens should be exactly the number of tokens in the input. Or is it the case that either prompt_tokens is set (for non-cached requests) or cache_read_tokens + cache_creation_tokens?

Contributor Author

Nope, Anthropic fills all three in. If it is a cache write, cache_read_tokens == 0. If it is a cache read, cache_creation_tokens == 0. prompt_tokens are the tokens from the uncached parts of the messages; for cache writes and reads it is always about 3-4, which I assume is some control tokens or stop sequences. You can see some numbers I hard-coded in the tests here.

For example, if I send two text blocks of sizes 1200 and 100 tokens in one message and only direct Anthropic to cache the first one, the usage will be:

  • {"cache_read_input_tokens": 0, "cache_creation_input_tokens": 1200, "input_tokens": 104, "output_tokens": ...} for the first call, and
  • {"cache_read_input_tokens": 1200, "cache_creation_input_tokens": 0, "input_tokens": 104, "output_tokens": ...} for the second

Member

My intuition is that we should keep input_tokens constant across providers - so it should always be the total number of tokens in the input, regardless of whether some of them were cached and some weren't.

Member

(which is what I think you did - right?)

Contributor Author

Yes, except for one place where I accidentally added just uncached tokens. Fixed in the last commit now.

