Netherite does not impose any hard limits on the size of messages or histories. But of course the question remains: what happens as messages or histories get very large, i.e. what breaks first? I created this issue to track discussion, testing, and documentation around this question.
Some thoughts on this:
Everything in the system (i.e. not only the specific orchestrations that contain large messages or histories) is likely to slow down substantially when storage bandwidth or inter-partition bandwidth (e.g. Event Hubs throughput) becomes a bottleneck. The system should keep working under such circumstances, but may be too slow to serve its intended purpose.
All in-flight messages are kept in memory (in the outboxes and the session buffers on the source and destination partitions, respectively), so we may run out of memory (OOM) if large messages accumulate faster than they are processed.
Netherite keeps the in-memory instance cache within the specified cache limits. If workers need to process histories or messages that exceed the memory limits of the cache, the result is thrashing, which makes the system perform horribly. It is therefore important to increase the cache size when trying to handle such situations (see the sketch below).
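For reference, here is a minimal sketch of what raising the cache limit could look like in host.json. The `InstanceCacheSizeMB` setting name and the 4096 value are from memory and meant only as an illustration; please check the Netherite configuration docs for the exact name and for a value that fits your hosting plan's memory:

```json
{
  "extensions": {
    "durableTask": {
      "storageProvider": {
        "type": "Netherite",
        "InstanceCacheSizeMB": 4096
      }
    }
  }
}
```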
Page blobs have a maximum size of 1TB, and all of the data in a task hub partition, including all instance states, histories, and in-flight messages, has to fit into that. Also, the FASTER log can be quite a bit larger than the data it stores, because it may contain multiple versions of orchestration states. FASTER does run compaction periodically, but it remains to be determined what expansion factors are typical. I would guess something like 3x, which would leave roughly a third of that, ~340GB, of live data per partition.
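Since that limit applies per partition, total task hub capacity also scales with the number of partitions. A hedged host.json sketch (as far as I know, the partition count is fixed when the task hub is first created and cannot be changed afterwards, and 32 is the maximum; treat the snippet as illustrative, not authoritative):

```json
{
  "extensions": {
    "durableTask": {
      "storageProvider": {
        "type": "Netherite",
        "PartitionCount": 32
      }
    }
  }
}
```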
A lot of this is just guesswork; we need experiments to validate these statements.
Does the Event Hubs message size limit of 1 MB play a role for large inputs (entity states)?
In our application, it's not uncommon for the state to grow close to (or even beyond) 1 MB in size...
@sebastianburckhardt is correct in saying that large messages (the data serialized between Event Hubs and the orchestrators and activities) certainly affect latency.
In my tests with an actual application (not a benchmark, but real-world stuff), we trigger off changes in Cosmos DB, but for expediency we then have to combine several documents. Our messages can balloon to many megabytes.
That leaves us with a choice: either slow down because of the Event Hubs transit time, or go faster via the blob "lookaside" storage, which in turn increases the load on the Durable Functions Netherite storage account that contains the task hub.
We also have some experience with querying large histories, and there we have found that status and purge-history queries can take a good deal of time. To combat this, we had to move such queries outside the critical time windows of our system. We were also concerned about what a large status query, history query, or purge would do to the utilization of our task hub storage account, and we had some scant evidence suggesting that these operations interfere with the "real-time" work of running orchestrators and activities.