Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeout related fixes #320

Merged
merged 1 commit into from
Jul 23, 2024
Merged

Conversation

olivier-matz-6wind
Copy link
Contributor

This pull request contains 3 small fixes related to timeouts, seen during a watch operation.
I don't know if they can all be applied, especially the one in rest.py since this file is generated.

@tomplus
Copy link
Owner

tomplus commented Jul 22, 2024

Thanks for the PR.

Can I ask what issue you want to resolve? _request_timeout and timeout_seconds are different - first is for the aiohttp client and second is provided by API itself.

@olivier-matz-6wind
Copy link
Contributor Author

Hi @tomplus ,

Thank you for your feedback. Sorry, you are right, I should have started with the problem.
I want to watch a custom resource using a dynamic client. The steps below describe how to reproduce the issue.

Here is the definition of a dummy custom resource (define-crd.yaml):

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: my-objects.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: my-objects
    singular: my-object
    kind: MyObject
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object

Here is a simple yaml file to add a custom resource called test1 (create-crd.yaml):

apiVersion: example.com/v1
kind: MyObject
metadata:
  name: test1

And here is a simple test application to watch these CRDs (watch_crd.py):

"""Watch K8s custom resource event stream using dynamic client."""

import asyncio
import logging
import sys

from kubernetes_asyncio import client, config, dynamic, watch

LOG = logging.getLogger(__name__)


async def watch_crd(api_version, kind):
    """Watch for custom resources events in all namespaces."""
    async with client.ApiClient() as api:
        dyn_client = await dynamic.DynamicClient(api)
        watcher = watch.Watch()
        hbr_net_api = await dyn_client.resources.get(api_version=api_version, kind=kind)
        async for event in dyn_client.watch(resource=hbr_net_api, watcher=watcher):
            etype, obj = event["type"], event["object"]
            LOG.debug(f"{etype} {kind} {obj.metadata.name}")


def task_done_cb(task):
    if task.cancelled():
        return
    exc = task.exception()
    if exc:
        LOG.error("exception %r", task, exc_info=exc)


async def async_main():
    await config.load_kube_config()
    task = asyncio.create_task(watch_crd("example.com/v1", "MyObject"))
    task.add_done_callback(task_done_cb)
    await task


def main():
    logging.basicConfig(
        stream=sys.stderr,
        level=logging.DEBUG,
        format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    )
    loop = asyncio.get_event_loop()
    loop.run_until_complete(async_main())
    loop.close()

if __name__ == "__main__":
    main()

Here is the result of the test:

# kubectl apply -f define-crd.yaml 
# { sleep 30; kubectl apply -f create-crd.yaml; } &
# python3 watch_crd.py 
[1] 17848
2024-07-22 22:10:05,675 - asyncio - DEBUG - Using selector: EpollSelector
2024-07-22 22:10:05,679 - kubernetes_asyncio.config.kube_config - DEBUG - kubeconfig loader - current-context kubernetes-admin@kubernetes, cluster kubernetes, user kubernetes-admin, provider None
2024-07-22 22:10:05,680 - kubernetes_asyncio.config.kube_config - DEBUG - Try to load user token
2024-07-22 22:10:05,680 - kubernetes_asyncio.config.kube_config - DEBUG - Try to use username and password
myobject.example.com/test1 created
2024-07-22 22:10:35,510 - __main__ - DEBUG - ADDED MyObject test1
2024-07-22 22:15:06,441 - __main__ - ERROR - exception <Task finished name='Task-2' coro=<watch_crd() done, defined at /root/watch_crd.py:12> exception=TimeoutError()>
Traceback (most recent call last):
  File "/root/watch_crd.py", line 18, in watch_crd
    async for event in dyn_client.watch(resource=hbr_net_api, watcher=watcher):
  File "/usr/local/lib/python3.10/dist-packages/kubernetes_asyncio/dynamic/client.py", line 217, in watch
    async for event in watcher.stream(
  File "/usr/local/lib/python3.10/dist-packages/kubernetes_asyncio/watch/watch.py", line 133, in __anext__
    return await self.next()
  File "/usr/local/lib/python3.10/dist-packages/kubernetes_asyncio/watch/watch.py", line 154, in next
    line = await self.resp.content.readline()
  File "/usr/local/lib/python3.10/dist-packages/aiohttp/streams.py", line 317, in readline
    return await self.readuntil()
  File "/usr/local/lib/python3.10/dist-packages/aiohttp/streams.py", line 351, in readuntil
    await self._wait("readuntil")
  File "/usr/local/lib/python3.10/dist-packages/aiohttp/streams.py", line 311, in _wait
    with self._timer:
  File "/usr/local/lib/python3.10/dist-packages/aiohttp/helpers.py", line 735, in __exit__
    raise asyncio.TimeoutError from None
asyncio.exceptions.TimeoutError

The backtrace appears 5mn after the program start, and is the same than the one observed in #259.

After applying the first patch, there is no TimeoutError exception anymore, but the behavior is not what expected. Every 5mn, a new "add" log is displayed, however the object is added only once:

# { sleep 30; kubectl apply -f create-crd.yaml; } &
# python3 watch_crd.py 
[1] 19819
2024-07-22 22:16:45,317 - asyncio - DEBUG - Using selector: EpollSelector
2024-07-22 22:16:45,322 - kubernetes_asyncio.config.kube_config - DEBUG - kubeconfig loader - current-context kubernetes-admin@kubernetes, cluster kubernetes, user kubernetes-admin, provider None
2024-07-22 22:16:45,322 - kubernetes_asyncio.config.kube_config - DEBUG - Try to load user token
2024-07-22 22:16:45,322 - kubernetes_asyncio.config.kube_config - DEBUG - Try to use username and password
myobject.example.com/test1 created
2024-07-22 22:17:15,149 - __main__ - DEBUG - ADDED MyObject test1
2024-07-22 22:21:45,453 - __main__ - DEBUG - ADDED MyObject test1
2024-07-22 22:26:46,452 - __main__ - DEBUG - ADDED MyObject test1
...

The next 2 patches aim to solve this new issue, but indeed I confused _request_timeout and timeout_seconds. Moreover, the problem disappears only if we explicitly pass timeout=0 to the watch() method.
Do you have an idea about how it can properly be fixed?

Thanks,

Copy link
Owner

@tomplus tomplus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, your example helped me to understand the root cause.

On this page you can find differences between 2 types of timeout: https://github.com/kubernetes-client/python/blob/master/examples/watch/timeout-settings.md

DynamicClient shouldn't set timeout_second if it's not provided. Let's leave _request_timout unchanged (rest.py, client.py).

If timeout_second is not set Watch() should wait forever and reconnect if necessary. You get duplicated events because there is a bug and resource_version is not handled properly. I'll commit a fix for it in another PR.

kubernetes_asyncio/watch/watch.py Outdated Show resolved Hide resolved
Currently, when a dynamic client watcher is used, timeout_seconds is
always passed to watcher.stream() from DynamicClient.watch(). Its value
is None when the timeout argument was not passed by the user.

The current check done on TimeoutError in the Watch.watch() method when
awaiting from self.resp.content.readline() prevents to reopen the
request in this case because it behaves differently if timeout_seconds
was set to None or if timeout_seconds was not passed to the function.

Do not pass the timeout_seconds argument in DynamicClient.watch() if
timeout was not specified by the user.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
@tomplus
Copy link
Owner

tomplus commented Jul 23, 2024

This has to be merged with your changes too: #321

@olivier-matz-6wind
Copy link
Contributor Author

Hi,

Thank you for the explanations and the reconnect fix.
I will retest the same use case with a patch in DynamicClient.

Copy link

codecov bot commented Jul 23, 2024

Codecov Report

Attention: Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.

Project coverage is 28.11%. Comparing base (03f4dd8) to head (5c5e6a3).

Files Patch % Lines
kubernetes_asyncio/dynamic/client.py 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #320      +/-   ##
==========================================
- Coverage   28.11%   28.11%   -0.01%     
==========================================
  Files         779      779              
  Lines       92119    92122       +3     
==========================================
  Hits        25899    25899              
- Misses      66220    66223       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@tomplus tomplus merged commit b7ec53c into tomplus:master Jul 23, 2024
7 of 8 checks passed
@tomplus
Copy link
Owner

tomplus commented Jul 23, 2024

I've merged my PR so you can test it from master branch. I'll release new version soon.

@olivier-matz-6wind
Copy link
Contributor Author

I've merged my PR so you can test it from master branch. I'll release new version soon.

Hi @tomplus,

I confirm my problem is solved with the master branch. Many thanks!

@tomplus
Copy link
Owner

tomplus commented Jul 23, 2024

I'm happy to hear that. Thank you for your contribution too!

tomplus pushed a commit that referenced this pull request Jul 31, 2024
Currently, when a dynamic client watcher is used, timeout_seconds is
always passed to watcher.stream() from DynamicClient.watch(). Its value
is None when the timeout argument was not passed by the user.

The current check done on TimeoutError in the Watch.watch() method when
awaiting from self.resp.content.readline() prevents to reopen the
request in this case because it behaves differently if timeout_seconds
was set to None or if timeout_seconds was not passed to the function.

Do not pass the timeout_seconds argument in DynamicClient.watch() if
timeout was not specified by the user.

Signed-off-by: Olivier Matz <olivier.matz@6wind.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants