Tech-paper typo: missed footnote; and some rambling #485

Open
the-Arioch opened this issue Oct 5, 2024 · 1 comment

Comments


the-Arioch commented Oct 5, 2024

https://lmax-exchange.github.io/disruptor/disruptor.html

[screenshot: the missing footnote in the whitepaper]

Oops.


Also, could you extend the tech paper a bit on cache contention and on having multiple (and a variable number of) consumers and producers, and on why you feel Java would need tuples? While this may not be immediately part of the whitepaper, since you mention those concerns it would be nice if you expressed your views on them more verbosely there, or perhaps gave leads to some follow-up articles.

  1. Why do you feel a queue of values is better than a queue of pointers? Won't it necessarily increase memory copying to "insert" and to "grab" the job units, compared to an inherently atomic (usually register-sized) pointer operation? Won't it also either increase memory use (if the queue slot size is rounded up to the next closest 2^N) or complicate address calculation for array[i] access (if the slot size is only rounded up to memory-alignment granularity)? Granted, the mentioned head and tail variables do not have to be index variables, as the paper implies; they could be pointers themselves, almost always reducing the array access to a mere addition (pointer += slot-size) - yet only in "bare metal" old-school languages, not in Java-likes. IOW, I fail to see what benefit you could get from getting tuples into the language unless you also resurrect and reintroduce pointers, with all their dangers. Then, I can immediately see only one problem with pointer slots (as contrasted with value slots and the excessive memory copying they seem to enforce): pointer slots can be smaller than a cache line, e.g. 32-bit pointers/programs on a 64-bit CPU, or a 64-bit CPU with a 128-byte cache line. Then two or four "pointer slots" would sit in the same cache line, leading to contention between adjacent writers (or readers) anyway. But perhaps this can be avoided by interleaving the ring buffer, or by adding post-pointer slack to the slot size (see the slot-padding sketch after this list).
  2. It is a great observation that a ring would tend to be near-empty or near-full. It is obvious, but it is also somehow "hidden in plain sight". Basically you are saying that any such program/data structure should be mostly optimized for those two edge cases and for easy transitions between them. I guess many "dataflow orchestrators" try to manage this dynamically by rebalancing "consumers" and "producers" (which are class methods or first-class functions) within the boundaries of a fixed OS-thread pool (whose size is usually derived from the hardware threads available, though with Java green threads maybe you just cannot reach those). It would seem counter-intuitive to keep a queue configuration fixed when either the N readers or the N writers would always be just spinlocking. It would seem better to re-configure some OS threads from executing producer functions to consumer functions, or back, when the ring buffer approaches those extreme states. But this would probably need the multi-reader multi-writer buffers you seem to dislike, or periodically switching the buffer between 1R+MW <-> MR+1W configurations without losing data or incurring long delays - which in the end might be about as "complex a program" as MR+MW queues themselves. Or maybe your higher-level queues could internally hold two rings, configured for the 1W+MR and 1R+MW cases, and quickly switch between them? Dunno. I just feel uncomfortable thinking I would be confined to either enforcing a 1W design or enforcing a 1R design on every pipeline stage.
  3. However, I feel it would be good to see a more verbose description of what you meant by "write contention near the tail" or "read contention near the head" in the "2.5. The Problems of Queues" chapter. Because later you write about the "magic ring buffer" and somehow the cache contention is magically gone! Yet those tail/head/size variables would be allocated close to one another, probably in the same cache line, and hence suffer from the very same contention. Since you put that much emphasis on that contention, it would feel natural for chapters 3.2 and 3.3 to explain in detail why no cache contention happens with your invention - but they just seem to avoid the topic. Frankly, I believe that if the tail and head variables end up in the same cache line, then you have that problematic contention no matter how you manage the slots themselves. Maybe this is a misconception; if so, the whitepaper would do well to dispel it.
  4. With the emphasis you put on hardware threads having their own L1/L2 caches, it would seem natural to describe whether the Disruptor could provide an immediate "link" between a producer and a consumer within the same OS+CPU "worker thread"; otherwise you would often see a consumer and a producer belonging to different cores, doomed to relatively slow cache-to-cache copying of job units. It would feel natural to add to the design some "thread manager" that would make sure (in the 1W+MR case) the producer function directly passes control to the consumer function (with the "producer" role passing to another OS+CPU thread). Granted, with Java green threads it might be impossible to manipulate OS threads at all. This whole bullet point was inspired by some frameworks for non-managed languages (which naturally expose low-level threads and tail/head pointers), and it was refreshed by the recent "Scheduling threads like Thomas Jefferson" article, which also advocates such a "control-flow direct link" approach.
  5. the "3.4. Batching Effect" chapter lost me completely. One would think that every reasonble queue does have that property, of consumer fetching the next work unit without too much of "concurrency"? Maybe you could give some example of frequently used but badly designed queues where this is not true. Frankly that chapter, perhaps wrongly, sounded like bragging about something that everyoen has.

Thank you for the interesting and thought-provoking whitepaper.

@the-Arioch (Author) commented:

Your linked blog post says:

one major difference to the way we've implemented ours - we don't have a pointer to the end

and

Deciding whether it's OK to wrap or not is managed outside of the data structure itself

This implies you keep the head (end) pointer outside the ring buffer - or the size variable, which amounts to the same thing: size == (tail - head) mod N, so any two of the three variables are required and the third is derived.

This might indeed make head/tail cache contention impossible, and then I believe it is a must for the whitepaper - it is a crucial detail. Though it frightens me; I'd rather have just positioned both the head and tail variables still inside the data structure, just on opposite sides of the ring buffer itself :-) (A sketch of the layout I have in mind follows.)
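
Here is a hedged sketch (my own guess at a layout, not the Disruptor's actual code) of what I would hope for: the producer's and consumer's counters padded onto separate cache lines, with the fill level derived from them instead of being a third shared variable. The 64-byte line assumption and all field names are mine:

```java
// Hypothetical layout sketch: head and tail counters on separate (assumed 64-byte)
// cache lines, so the writer bumping tail does not invalidate the line head lives on.
// A real JVM may reorder plain fields; production code would use @Contended or
// class-hierarchy padding instead of these filler longs.
final class SplitCounters {
    volatile long head;                    // next sequence to consume
    long h1, h2, h3, h4, h5, h6, h7;       // ~56 bytes of slack after head

    volatile long tail;                    // next sequence to publish
    long t1, t2, t3, t4, t5, t6, t7;       // ~56 bytes of slack after tail

    // With monotonically increasing sequences there is no third "size" variable:
    long used()              { return tail - head; }           // occupied slots
    long free(long capacity) { return capacity - used(); }     // remaining room
}
```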
