Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

azure.monitor.opentelemetry logs exception even when the exception is handled in a try/catch #37747

Open
leopasta-enable opened this issue Oct 7, 2024 · 11 comments
Assignees
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Monitor - Distro Monitor OpenTelemetry Distro needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention Workflow: This issue is responsible by Azure service team.

Comments

@leopasta-enable
Copy link

  • Package Name: azure.monitor.opentelemetry
  • Package Version: 1.4.0
  • Operating System: Debian Linux (docker image python:3.11-bookworm)
  • Python Version: 3.11

Describe the bug
When using azure opentelemetry, a blob storage operation that errors inside a try/catch is still logged as an Exception

To Reproduce
Steps to reproduce the behavior:
Have a method like:

def main():
    configure_azure_monitor(
        logger_name="AzureTestLogger",
        connection_string="InstrumentationKey=....",
        credential=DefaultAzureCredential(),
    )

    logger = logging.getLogger("AzureTestLogger")
    log_handler = logging.StreamHandler()
    logger.addHandler(log_handler)
    logger.setLevel("DEBUG")

    logger.info(f"logger level is set to: {logging.getLevelName(logger.level)}")
  
    blob_service_client = set_storage()
    blob = blob_service_client.get_blob_client("temp", "blob_that_dont_exists.txt")

    try:
        blob.delete_blob()
    except Exception:
        print("That was fine, nothing to see here")

This logs an entry similar to:
"timestamp [UTC]",message,severityLevel,itemType,customDimensions,customMeasurements,"operation_Name","operation_Id","operation_ParentId","operation_SyntheticSource","session_Id","user_Id","user_AuthenticatedId","user_AccountId","application_Version","client_Type","client_Model","client_OS","client_IP","client_City","client_StateOrProvince","client_CountryOrRegion","client_Browser","cloud_RoleName","cloud_RoleInstance",appId,appName,iKey,sdkVersion,itemId,itemCount,"_ResourceId",problemId,handledAt,type,assembly,method,outerType,outerMessage,outerAssembly,outerMethod,innermostType,innermostMessage,innermostAssembly,innermostMethod,details
"10/7/2024, 2:14:05.173 PM",,,exception,"{""_MS.ResourceAttributeId"":""ce93da00-cb04-49ee-9f08-c54cc90c2c28""}",,,d81f7ad205600d5eaba48f98864ed236,f6decbb27f751b59,,,,,,,PC,Other,"#1 SMP Fri Mar 29 12:21:27 UTC 2024","0.0.0.0",Belper,Derbyshire,"United Kingdom",Other,"unknown_service",e28ae3742d83,"97337a28-7fba-4482-9a7f-847ca9fc4340","/subscriptions/ed5e47ec-141a-422f-abba-b69ef358c6fa/resourcegroups/dataplatform_uk_test_1/providers/microsoft.insights/components/appi-dataeng-test-ukwest-001","a3f4fd25-e33f-42df-a270-4bec07e84ce7","ulm_py3.11.9:otel1.24.0:ext1.0.0b24","73fc59f6-84b6-11ef-ac1d-000d3a86db37",1,"/subscriptions/ed5e47ec-141a-422f-abba-b69ef358c6fa/resourcegroups/dataplatform_uk_test_1/providers/microsoft.insights/components/appi-dataeng-test-ukwest-001",ResourceNotFoundError,,ResourceNotFoundError,Unknown,Unknown,ResourceNotFoundError,"The specified blob does not exist.
RequestId:af47a6b9-501e-004e-12c3-18e526000000
Time:2024-10-07T14:14:05.1863327Z
ErrorCode:BlobNotFound
Content: BlobNotFoundThe specified blob does not exist.
RequestId:af47a6b9-501e-004e-12c3-18e526000000
Time:2024-10-07T14:14:05.1863327Z",,,ResourceNotFoundError,"The specified blob does not exist.
RequestId:af47a6b9-501e-004e-12c3-18e526000000
Time:2024-10-07T14:14:05.1863327Z
ErrorCode:BlobNotFound
Content: BlobNotFoundThe specified blob does not exist.
RequestId:af47a6b9-501e-004e-12c3-18e526000000
Time:2024-10-07T14:14:05.1863327Z",,,"[{""outerId"":""0"",""message"":""The specified blob does not exist.\nRequestId:af47a6b9-501e-004e-12c3-18e526000000\nTime:2024-10-07T14:14:05.1863327Z\nErrorCode:BlobNotFound\nContent: BlobNotFoundThe specified blob does not exist.\nRequestId:af47a6b9-501e-004e-12c3-18e526000000\nTime:2024-10-07T14:14:05.1863327Z"",""type"":""ResourceNotFoundError"",""id"":""0"",""rawStack"":""Traceback (most recent call last):\n File ""/app/datalake_channel_extraction/.venv/lib/python3.11/site-packages/opentelemetry/trace/init.py"", line 570, in use_span\n yield span\n File ""/app/datalake_channel_extraction/.venv/lib/python3.11/site-packages/azure/core/tracing/decorator.py"", line 89, in wrapper_use_tracer\n return func(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^\n File ""/app/datalake_channel_extraction/.venv/lib/python3.11/site-packages/azure/storage/blob/_blob_client.py"", line 1211, in delete_blob\n process_storage_error(error)\n File ""/app/datalake_channel_extraction/.venv/lib/python3.11/site-packages/azure/storage/blob/_shared/response_handlers.py"", line 184, in process_storage_error\n exec(""raise error from None"") # pylint: disable=exec-used # nosec\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n File """", line 1, in \n File ""/app/datalake_channel_extraction/.venv/lib/python3.11/site-packages/azure/storage/blob/_blob_client.py"", line 1209, in delete_blob\n self._client.blob.delete(**options)\n File ""/app/datalake_channel_extraction/.venv/lib/python3.11/site-packages/azure/core/tracing/decorator.py"", line 89, in wrapper_use_tracer\n return func(*args, **kwargs)\n ^^^^^^^^^^^^^^^^^^^^^\n File ""/app/datalake_channel_extraction/.venv/lib/python3.11/site-packages/azure/storage/blob/_generated/operations/_blob_operations.py"", line 2116, in delete\n map_error(status_code=response.status_code, response=response, error_map=error_map)\n File ""/app/datalake_channel_extraction/.venv/lib/python3.11/site-packages/azure/core/exceptions.py"", line 164, in map_error\n raise error\nazure.core.exceptions.ResourceNotFoundError: The specified blob does not exist.\nRequestId:af47a6b9-501e-004e-12c3-18e526000000\nTime:2024-10-07T14:14:05.1863327Z\nErrorCode:BlobNotFound\nContent: BlobNotFoundThe specified blob does not exist.\nRequestId:af47a6b9-501e-004e-12c3-18e526000000\nTime:2024-10-07T14:14:05.1863327Z\n""}]"

Expected behavior
As the delete_blob have a try/catch and is handling the exception, I would expect it not to be logged in App insights.
As it is, those handled exceptions are triggering our exception alerts.

Screenshots
If applicable, add screenshots to help explain your problem.

Additional context

@github-actions github-actions bot added Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention Workflow: This issue is responsible by Azure service team. Storage Storage Service (Queues, Blobs, Files) labels Oct 7, 2024
Copy link

github-actions bot commented Oct 7, 2024

Thanks for the feedback! We are routing this to the appropriate team for follow-up. cc @jalauzon-msft @vincenttran-msft.

@pvaneck pvaneck added Monitor - Distro Monitor OpenTelemetry Distro and removed Storage Storage Service (Queues, Blobs, Files) labels Oct 7, 2024
@viktor-morin
Copy link

viktor-morin commented Oct 10, 2024

We are having same issue.
Been testing all sort of ways of disable the namespace azure without success
https://learn.microsoft.com/en-us/azure/developer/python/sdk/azure-sdk-logging

example:

logger = logging.getLogger("azure")
logger.setLevel(logging.NOTSET)

results:
Image

even tried to set the flag logging_enabled=False in the method without any difference.

    table_client = TableClient.from_connection_string(
        conn_str=connection_string, table_name=table_name,
    )

    try:
        result = table_client.get_entity(partition_key="1", row_key="1", logging_enable=False)
    except ResourceNotFoundError as e:
        print(e)
        pass

@nsphung
Copy link

nsphung commented Oct 22, 2024

We are running into the same issue as well. This is how it'll look like in Azure Application Inights

Image

Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/opentelemetry/trace/__init__.py", line 590, in use_span
    yield span
  File "/opt/venv/lib/python3.10/site-packages/azure/core/tracing/decorator_async.py", line 88, in wrapper_use_tracer
    return await func(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/azure/storage/blob/aio/_container_client_async.py", line 1035, in delete_blob
    await blob.delete_blob( # type: ignore
  File "/opt/venv/lib/python3.10/site-packages/azure/core/tracing/decorator_async.py", line 81, in wrapper_use_tracer
    return await func(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/azure/storage/blob/aio/_blob_client_async.py", line 618, in delete_blob
    process_storage_error(error)
  File "/opt/venv/lib/python3.10/site-packages/azure/storage/blob/_shared/response_handlers.py", line 189, in process_storage_error
    exec("raise error from None")   # pylint: disable=exec-used # nosec
  File "<string>", line 1, in <module>
  File "/opt/venv/lib/python3.10/site-packages/azure/storage/blob/aio/_blob_client_async.py", line 616, in delete_blob
    await self._client.blob.delete(**options)
  File "/opt/venv/lib/python3.10/site-packages/azure/core/tracing/decorator_async.py", line 88, in wrapper_use_tracer
    return await func(*args, **kwargs)
  File "/opt/venv/lib/python3.10/site-packages/azure/storage/blob/_generated/aio/operations/_blob_operations.py", line 691, in delete
    map_error(status_code=response.status_code, response=response, error_map=error_map)
  File "/opt/venv/lib/python3.10/site-packages/azure/core/exceptions.py", line 112, in map_error
    raise error
azure.core.exceptions.ResourceNotFoundError: The specified blob does not exist.
RequestId:2d9cf5c3-c01e-0029-2b4d-24633c000000
Time:2024-10-22T06:38:44.7113524Z
ErrorCode:BlobNotFound
Content: <?xml version="1.0" encoding="utf-8"?><Error><Code>BlobNotFound</Code><Message>The specified blob does not exist.
RequestId:2d9cf5c3-c01e-0029-2b4d-24633c000000
Time:2024-10-22T06:38:44.7113524Z</Message></Error>

We are using "azure-monitor-opentelemetry==1.6.2" version. If this could help.

@viktor-morin
Copy link

Also would like to mention following issue since they are related in some sense: #37919

@lzchen lzchen self-assigned this Oct 22, 2024
@lzchen
Copy link
Member

lzchen commented Oct 22, 2024

Spans in OpenTelemetry Python by default record an exception span event when the span ends if there was an exception thrown within the instrumentation (regardless of whether the exception was caught and handled). Azure monitor opentelemetry exporter exports span events that are exceptions to appinsights, so this is expected behavior. We allow turning off this behavior explicitly when starting a span through record_exception=False during initialization but it is up to the instrumentation (azure-core in this case) to allow configuration for that.

cc @pvaneck for this functionality

@viktor-morin
Copy link

Spans in OpenTelemetry Python by default record an exception span event when the span ends if there was an exception thrown within the instrumentation (regardless of whether the exception was caught and handled). Azure monitor opentelemetry exporter exports span events that are exceptions to appinsights, so this is expected behavior. We allow turning off this behavior explicitly when starting a span through record_exception=False during initialization but it is up to the instrumentation (azure-core in this case) to allow configuration for that.

cc @pvaneck for this functionality

The example shown above by me and @leopasta-enable, I don't see the explicit usage of any spans?

@lzchen
Copy link
Member

lzchen commented Oct 23, 2024

@viktor-morin

The spans are created automatically via azure-core-tracing. The azure monitor opentelemetry package takes a dependency on this to automatically capture telemetry related to azure-sdk client libraries.

@viktor-morin
Copy link

@viktor-morin

The spans are created automatically via azure-core-tracing. The azure monitor opentelemetry package takes a dependency on this to automatically capture telemetry related to azure-sdk client libraries.

@lzchen

So in our case when we catch the exception in a try-catch we still get two exceptions logged to Azure.
How do we solve this meaning:

  • If we would like to not log any exception (since we catch them), how do we configure it?
  • And we like to log once exception, how do we configure that?

@lzchen
Copy link
Member

lzchen commented Oct 24, 2024

@pvaneck could probably speak to more if azure-core-tracing can be configured on a per-span basis.

If there is no way to do it from azure-core-tracing side, on the OpenTelemetry side, you can configure a custom span processor to process the spans with span events coming in and drop the span events for specific exceptions you don't want. An example of how to create a custom span processor for a different use case (for adding a span attribute) can be found here. The idea would be the same, override the on_end method and instead of adding a new attribute to the span, check the span for span events with name="Exception" and configure as you'd like.

@pvaneck
Copy link
Member

pvaneck commented Oct 25, 2024

@pvaneck could probably speak to more if azure-core-tracing can be configured on a per-span basis.

Currently, per-span configuration is fairly limited for spans created through the azure-core and the azure-core-tracing-opentelemetry plugin, but this is probably something that can be extended to support more span configuration options like record_exception (looks like we are recording exceptions through here since we use use_span when using context managers). This would require changes in both azure-core and azure-core-tracing-opentelemetry to support this feature. Don't think I can differentiate an exception that is handled, since the span will be ended before the user sees the exception in their code. I'll need to look deeper into implementation details first.

Although maybe a bit cumbersome, the option of a custom span processor that @lzchen suggests is something that has been used in other cases, like filtering out certain spans based on name or URL. Custom processors probably provide the most control over what gets sent to Application Insights. For something like this, I think the following would work, but you'd need to know the associated span names beforehand. Typically, this is just <ClientClass>.<method_name> based on what is generating the exception event.

from opentelemetry.sdk.trace import SpanProcessor
from azure.monitor.opentelemetry import configure_azure_monitor


class CustomSpanProcessor(SpanProcessor):

    SPAN_NAMES = {"BlobClient.delete_blob"}

    def on_end(self, span: ReadableSpan) -> None:
        if span.name in self.SPAN_NAMES:
            # Clear events from the span (for these spans, if there is an event, it's likely just an exception) 
            # Requires access to private _events attribute, though. :/
            # Could also maybe loop through events and remove based on error type.
            span._events = tuple()
        super().on_end(span)


custom_span_processor = CustomSpanProcessor()

configure_azure_monitor(
    span_processors=[custom_span_processor]
)

@viktor-morin
Copy link

@pvaneck could probably speak to more if azure-core-tracing can be configured on a per-span basis.

Currently, per-span configuration is fairly limited for spans created through the azure-core and the azure-core-tracing-opentelemetry plugin, but this is probably something that can be extended to support more span configuration options like record_exception (looks like we are recording exceptions through here since we use use_span when using context managers). This would require changes in both azure-core and azure-core-tracing-opentelemetry to support this feature. Don't think I can differentiate an exception that is handled, since the span will be ended before the user sees the exception in their code. I'll need to look deeper into implementation details first.

Although maybe a bit cumbersome, the option of a custom span processor that @lzchen suggests is something that has been used in other cases, like filtering out certain spans based on name or URL. Custom processors probably provide the most control over what gets sent to Application Insights. For something like this, I think the following would work, but you'd need to know the associated span names beforehand. Typically, this is just <ClientClass>.<method_name> based on what is generating the exception event.

from opentelemetry.sdk.trace import SpanProcessor
from azure.monitor.opentelemetry import configure_azure_monitor

class CustomSpanProcessor(SpanProcessor):

SPAN_NAMES = {"BlobClient.delete_blob"}

def on_end(self, span: ReadableSpan) -> None:
    if span.name in self.SPAN_NAMES:
        # Clear events from the span (for these spans, if there is an event, it's likely just an exception) 
        # Requires access to private _events attribute, though. :/
        # Could also maybe loop through events and remove based on error type.
        span._events = tuple()
    super().on_end(span)

custom_span_processor = CustomSpanProcessor()

configure_azure_monitor(
span_processors=[custom_span_processor]
)

Thanks @pvaneck
I guess we can look into this, but sounds strange to me that there is no easy way to disable this.
To figure out each associated span names beforehand is quite troublesome.
As for example, this logging flag does nothing than?

result = table_client.get_entity(partition_key="1", row_key="1", logging_enable=False)

Related question in my other reported issue #37919, so I guess one exception here is log as expected and the other one is the span that we disucss in this thread?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. Monitor - Distro Monitor OpenTelemetry Distro needs-team-attention Workflow: This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Service Attention Workflow: This issue is responsible by Azure service team.
Projects
None yet
Development

No branches or pull requests

5 participants