Durable orchestrations take a long time to start their activities. #2813
Comments
I had a look at various diagnostics, and I think it has to do with the algorithm that is in place when the function scales down. I think this is different from standard (non-isolated) functions. When the instances scale up, everything is fine, but when they scale down I get a pause of 5 minutes in the process. This can also happen on .NET 6.0 with non-isolated functions, but the behaviour there is much smoother and I get far fewer 5-minute pauses.

It seems like the scale controller scales down while the instance is still picking up a lot of messages from the queue and putting them in table storage with status Pending. The pending instances then have to wait five minutes to be served. That would be logical for messages that are being processed, but an instance with status Pending is still waiting to be processed (so there should only be a lock while it is actually being processed). Maybe the underlying mechanism should release the locks on pending messages before killing the process? When I put in a bunch of messages, about 60 percent of them finish fast and 30 percent take a long time to complete. Maybe there is some KQL query I could run to get the information you need.
I am willing to give the developers who will work on this access to my account so they can trigger the functions and test for themselves.
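For anyone who wants to quantify the delay described above, here is a rough sketch that reads the TaskHub Instances table directly and reports how long instances have been sitting in Pending. The table name (TestHubNameInstances) and the column names (RuntimeStatus, CreatedTime) are assumptions based on the default Azure Storage backend layout; adjust them to your task hub and connection string.

```csharp
using Azure.Data.Tables;

// Sketch: list orchestration instances still in "Pending" and how long they
// have been waiting. Table and column names assume the default Azure Storage
// backend layout ("<TaskHub>Instances", RuntimeStatus, CreatedTime).
var table = new TableClient(
    Environment.GetEnvironmentVariable("AzureWebJobsStorage"),
    "TestHubNameInstances"); // replace with "<YourTaskHub>Instances"

await foreach (TableEntity row in table.QueryAsync<TableEntity>(
    filter: "RuntimeStatus eq 'Pending'"))
{
    DateTimeOffset created = row.GetDateTimeOffset("CreatedTime") ?? DateTimeOffset.UtcNow;
    TimeSpan pendingFor = DateTimeOffset.UtcNow - created;
    Console.WriteLine($"{row.PartitionKey} pending for {pendingFor.TotalMinutes:F1} min");
}
```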
Hi @well0549 - can you please confirm which DurableTask package versions you're using? It's interesting that you mentioned the scale controller, as I was just looking at another thread that made a connection between .NET isolated and scale controller issues (microsoft/durabletask-netherite#398), though that was in regards to failures. Still, I'm trying to see if there's a connection, or if it's a coincidence. On our end, what we usually need to investigate is: a time range in UTC when the issue was observed on Azure, the instance ID of the affected orchestrator, and the name of the app.
It's all C#, .NET 8.0, running on Windows Consumption.
I am currently setting up Application Insights again and will provide the details soon.
One other thing I noticed is that the mentioned issue also assigns an instance ID itself when calling ScheduleNewOrchestrationInstanceAsync, just like I do.
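For context, a minimal sketch of what assigning the instance ID yourself looks like with the isolated-worker client (Microsoft.DurableTask.Client); the orchestrator name and ID format below are placeholders, not the app's actual values.

```csharp
using Microsoft.DurableTask;
using Microsoft.DurableTask.Client;

// Sketch: schedule an orchestration with an explicitly chosen instance ID
// instead of letting the framework generate one. "ProcessMessage" and the
// "msg-..." ID format are placeholders.
async Task<string> StartWithExplicitIdAsync(DurableTaskClient client, string message)
{
    string instanceId = $"msg-{Guid.NewGuid():N}";

    await client.ScheduleNewOrchestrationInstanceAsync(
        "ProcessMessage",
        input: message,
        options: new StartOrchestrationOptions(InstanceId: instanceId));

    return instanceId;
}
```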
@davidmrdavid -> Regarding your question about the version of DurableTask: there is no reference anywhere to DurableTask. The references in my project are:
@davidmrdavid it seems the message is picked up twice at exactly the same moment? Dequeue count = 2?
Hi @well0549: I have yet to take a detailed look, but I want to follow up on this comment first: #2813 (comment). You mentioned that you have no reference to any DurableTask package.
Did several test runs with 1.1.3. Performance is more stable. I still see some 5-minute pauses, but I think this is on par with .NET 6 non-isolated. (I think it now only happens when the instances scale down.) The performance is much better. The reason to go to the preview was to clean up the orchestration history, but that works with 1.1.3 as well.
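As an aside on the history cleanup mentioned above: a minimal sketch of purging old orchestration history from code, assuming the PurgeAllInstancesAsync API in Microsoft.DurableTask.Client is what is being used; the cutoff and statuses are illustrative, not the author's actual settings.

```csharp
using Microsoft.DurableTask.Client;

// Sketch: purge history for finished orchestrations older than 7 days.
// The filter values are illustrative only.
async Task CleanUpHistoryAsync(DurableTaskClient client)
{
    var filter = new PurgeInstancesFilter(
        CreatedFrom: DateTimeOffset.MinValue,
        CreatedTo: DateTimeOffset.UtcNow.AddDays(-7),
        Statuses: new[]
        {
            OrchestrationRuntimeStatus.Completed,
            OrchestrationRuntimeStatus.Failed,
            OrchestrationRuntimeStatus.Terminated,
        });

    PurgeResult result = await client.PurgeAllInstancesAsync(filter);
    Console.WriteLine($"Purged {result.PurgedInstanceCount} instance(s).");
}
```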
Guess this is no longer relevant (as long as the preview is not released), so closed.
Hey @well0549, I am working on verifying the performance of the dotnet isolated worker with package 1.2.0-rc.1. I queried with the instance ID 8e6ada22112f43108bc079f7e752dbfd and the storage account eig-eigdemo-cleanup you provided, but I only found a few records, which is not enough for an investigation. I guess you might have deleted the history from your storage account, so I wonder whether you have some other examples for me to identify the root cause. Thank you!
Unfortunately I have to clean up a lot, because my Visual Studio subscription will fill up with all the storage costs (and all other costs), so I regularly delete everything and create it anew. Since I went back to the non-release package, everything works a lot smoother. I am currently running Microsoft.Azure.Functions.Worker.Extensions.DurableTask 1.1.4, and besides the occasional 5-minute hiccup, I don't see the behaviour I saw on the RC anymore. I am willing to switch back to the RC, but then I need to know when you will investigate, because of all the costs related to the tests I clean up regularly.
Description
I have 12 durable functions that process messages from a queue. The process is quite simple.
It's all Consumption plan, .NET 8, isolated.
Now the problem is that the messages get consumed by the queue trigger and put in the TaskHub Instances table with the status Pending, and some of them stay in that state for about 8 to 10 minutes; nothing is happening.
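To make the setup concrete, here is a minimal sketch of the pattern described above (queue trigger schedules an orchestration, which calls an activity). All function, queue, and type names are placeholders, and the queue binding assumes the Microsoft.Azure.Functions.Worker.Extensions.Storage.Queues package.

```csharp
using Microsoft.Azure.Functions.Worker;
using Microsoft.DurableTask;
using Microsoft.DurableTask.Client;

public static class MessagePipeline
{
    // Queue trigger: consumes the message and schedules an orchestration.
    // The new instance shows up in the TaskHub Instances table as Pending
    // until a worker starts running it.
    [Function("StartFromQueue")]
    public static async Task StartFromQueue(
        [QueueTrigger("incoming-messages")] string message,
        [DurableClient] DurableTaskClient client)
    {
        await client.ScheduleNewOrchestrationInstanceAsync("ProcessMessage", input: message);
    }

    // Orchestrator: the delay reported in this issue sits between scheduling
    // the instance above and this method actually starting to run.
    [Function("ProcessMessage")]
    public static async Task ProcessMessage(
        [OrchestrationTrigger] TaskOrchestrationContext context)
    {
        string message = context.GetInput<string>()!;
        await context.CallActivityAsync("HandleMessage", message);
    }

    [Function("HandleMessage")]
    public static Task HandleMessage([ActivityTrigger] string message)
    {
        // Real processing omitted.
        return Task.CompletedTask;
    }
}
```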
Expected behavior
If a message is put on a queue to be processed by a durable orchestration, the function could be asleep, so it could take some time to pick up the message. But once the message is consumed from the queue, the orchestration should kick in immediately and start processing the activities.
Compared to the current .NET 6.0 (non-isolated) solution, the processing is very slow.
Actual behavior
An orchestration is created and a record is inserted into the TaskHub Instances table with the status Pending. It can take up to 10 minutes before the message is picked up for processing.
Known workarounds
None.
App Details
I also have a question about this here: #2808
But I think this is a bug, because it behaves so much worse than on .NET 6.0.