Performance improvements for Random.Shuffle() #83305
Conversation
I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label.
Tagging subscribers to this area: @dotnet/area-system-runtime
@wainwrightmark are you still working on this one?
Sorry, I sort of forgot about this. I think I was happy with this (apart from the small changes people have suggested, which I'll do shortly). I think this does provide a significant benefit and I can't think of any further significant improvements. If I do those small changes, could it then be merged? I'm afraid I haven't contributed here before and I don't know what the protocol is / who the decision makers are.
@wainwrightmark no problem at all -- different PRs work at different speeds here. 😄
I haven't followed the PR and don't own the area. Speaking generally, we certainly often take changes of this level of complexity to get this kind of perf improvement in potentially perf-sensitive APIs. @stephentoub any take here?
switch (n)
{
    case 2:
        return (479001600, 11); // 12 factorial
would it be worth adding some debug asserts to calculate and self-document these magic numbers?
I have done this. I may have gone a bit overboard though - the simplest way to check the threshold numbers is to check that using the next number up results in an overflow exception. How critical is performance with debug symbols?
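For concreteness, the kind of self-documenting assert being discussed might look something like this (a sketch only, not the PR's actual code):

using System.Diagnostics;

// Recompute 12! at debug time to document the magic number; checked(...)
// makes any overflow in the recomputation throw rather than wrap silently.
int factorial = 1;
for (int i = 2; i <= 12; i++)
    factorial = checked(factorial * i);
Debug.Assert(factorial == 479001600, "479001600 must equal 12 factorial");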
private static (int, int) CalculateBound(int n)
{
    int count;
    switch (n)
can this switch use fancy new pattern matching syntax to be more compact?
I tried this but I don't think relational patterns allow return statements.
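For reference, a switch expression produces its value directly rather than via return statements, which is the compact shape being discussed; whether it actually fits the real method is what was tried above. In this sketch only the case from the earlier snippet is real and the rest is placeholder:

using System;

static (int, int) CalculateBound(int n) => n switch
{
    2 => (479001600, 11), // 12 factorial, as in the snippet above
    _ => throw new ArgumentOutOfRangeException(nameof(n)), // placeholder
};

Console.WriteLine(CalculateBound(2));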
I didn't read the paper you linked -- does this suggest that further improvement is possible for the Next methods, beyond #79790? (which was based on Lemire, which that paper references)
Co-authored-by: Theodore Tsirpanis <teo@tsirpanis.gr>
Co-authored-by: Günther Foidl <gue@korporal.at>
Co-authored-by: Dan Moseley <danmose@microsoft.com>
// This does not prove that the shuffle is fair but will at least fail if the shuffle is obviously unfair.
// This is a "chi-squared goodness of fit" problem.
// The number of degrees of freedom is (len - 1)*(len - 1).
// The significance is 0.01 so this test should naturally fail for 1 in 100 seeds.
that sounds like a flaky test? These tests run thousands of times a day someplace or another. I was using a critical value to give a 10^-8 chance of failure. That was still enough to be basically certain of failure if there was a serious bug.
I've used a fixed seed so this test will always produce the same result no matter how many times you run it. It might falsely fail if the rng or the shuffle implementation changes but if that happens, one can change the seed and 99% of seeds should work.
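To make the discussion concrete, a fairness check of this general shape (the sizes and seed here are hypothetical, not the PR's actual test) could look like:

using System;
using System.Linq;

const int Len = 4, Trials = 10_000;
var counts = new int[Len, Len];
var random = new Random(12345); // fixed seed keeps the outcome deterministic

for (int t = 0; t < Trials; t++)
{
    int[] values = Enumerable.Range(0, Len).ToArray();
    random.Shuffle(values);
    for (int pos = 0; pos < Len; pos++)
        counts[values[pos], pos]++; // tally where each element lands
}

// Chi-squared goodness of fit: compare observed placement counts to the
// uniform expectation of Trials / Len per cell.
double expected = Trials / (double)Len;
double chiSquared = 0;
foreach (int observed in counts)
    chiSquared += (observed - expected) * (observed - expected) / expected;

// Compare chiSquared against the critical value for (Len - 1)^2 = 9 degrees
// of freedom at the chosen significance level; exceeding it flags unfairness.
Console.WriteLine(chiSquared);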
{
    int index = chooser.NextIndex();

// This pattern is faster than tuple deconstruction for large structs
is that something the compiler could/should improve?
It was mentioned here; my understanding is that it will be fixed after C# 11.
Thanks. Area owner @michaelgsharp or @tannergooding sign off on correctness etc.
I pulled down this PR, rebased it on main, built it, and tried it out. I'm not seeing the cited perf gains; in fact on larger inputs it's registering as a regression. What is the benchmark you're running? Here's mine:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System;
using System.Linq;

BenchmarkSwitcher.FromAssembly(typeof(Tests).Assembly).Run(args);

public class Tests
{
    [Params(10, 100, 1000)]
    public int Length { get; set; }

    private int[] _values;

    [GlobalSetup]
    public void Setup() => _values = Enumerable.Range(0, Length).ToArray();

    [Benchmark]
    public void Shuffle() => Random.Shared.Shuffle(_values);
}
Is there any chance you could try with AggressiveInlining?
Thanks. I tried with AggressiveInlining on CalculateBounds, but it didn't change anything. I suspect what's changed between when you previously tested it and now is that dynamic PGO has been enabled by default. If I turn off dynamic PGO, I get this:
That suggests the main saving from this change was simply that it avoided a virtual call per Next, and dynamic PGO is now avoiding that virtual call anyway. But even if it weren't, it'd be simpler to avoid the virtual call per iteration in other ways, e.g. adding Shuffle to the internal Impl abstraction such that this could be a single virtual dispatch rather than one per item. Are you able to produce numbers that show this change has a significant benefit? If yes, great, can you share both the benchmark and the numbers you're seeing? If not, while I appreciate the efforts, it probably makes sense to instead close this.
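For anyone reproducing the comparison: assuming the DOTNET_TieredPGO environment variable is what controls dynamic PGO on the build under test, a BenchmarkDotNet config along these lines runs the same benchmark with it on and off:

using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;

public class PgoComparisonConfig : ManualConfig
{
    public PgoComparisonConfig()
    {
        AddJob(Job.Default.WithId("PGO on"));
        AddJob(Job.Default
                  .WithEnvironmentVariable("DOTNET_TieredPGO", "0")
                  .WithId("PGO off"));
    }
}

// Apply with [Config(typeof(PgoComparisonConfig))] on the Tests class above.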
Aha. This explains a lot. Running your benchmarks (which were basically the same as mine) on the latest version I get:
which I think are pretty comparable to your results. And the explanation for the change makes sense to me. Sorry for the huge waste of time. Feel free to close this. Though it might be worth re-exploring it if C# ever gets both efficient checked multiplication and something like …
Not a waste of time. Thanks for your efforts.
Closes #82838
This improves the performance of shuffling a span by using a different method.
Brief Explanation
Shuffling a span of length n requires sequentially swapping each element with a random element from the span up to and including itself. For example, to shuffle a span of length 3:
The first swap is trivial - the first element has a 100% chance of being swapped with itself.
The second element is swapped with a random element of the first two elements so it is either swapped with the first element or left in place with equal probability.
The third element is swapped with a random one of the first three elements so it has a 1/3 chance of being swapped with the first element, 1/3 chance of being swapped with the second element and 1/3 chance of not being swapped.
You should be able to convince yourself that each of the six possible orderings is equally likely to be produced.
Note that the old implementation did this backwards - doing the big swaps first and ending with the two-element swap. Both directions are equally good at shuffling the span but will produce different values for different seeds.
The old implementation generated a new random number for each swap - incurring the cost not only of advancing the rng but also of the bias checks and potential rejection. This article about this is very informative.
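As a sketch, the one-random-number-per-swap approach described above (what the old implementation effectively did, modulo the direction of iteration) looks like this; the names are illustrative:

using System;

static void ShufflePerSwap<T>(Random random, Span<T> values)
{
    // Element i swaps with a uniform index in [0, i], so every
    // ordering of the span is equally likely.
    for (int i = 1; i < values.Length; i++)
    {
        int j = random.Next(i + 1);
        (values[i], values[j]) = (values[j], values[i]);
    }
}

int[] data = { 1, 2, 3, 4, 5 };
ShufflePerSwap(Random.Shared, data);
Console.WriteLine(string.Join(", ", data));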
The new implementation in this PR reduces the number of random numbers generated by grouping the swaps and only generating one random number for each group.
For example, the first twelve swaps (including the trivial one where the first element is always 'swapped' with itself) can be grouped together and represented by one random number in the range [0, 479001600) (12 factorial). This makes sense, as there are 12! ways to order 12 elements, so each possible random number represents one of those orderings.
We can deconstruct the random number into swaps by doing successive divmod operations - the randomly generated number has an equal chance of being 0 or 1 mod 2; then, after division by 2, it has an equal chance of being 0, 1, or 2 mod 3, and so on.
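A minimal sketch of that deconstruction for the first group (assuming a span of exactly 12 elements; not the PR's actual code):

using System;

int[] values = new int[12];
for (int k = 0; k < values.Length; k++) values[k] = k;

// One random number in [0, 12!) encodes the eleven non-trivial swaps.
int r = Random.Shared.Next(479001600); // 479001600 == 12!
for (int i = 1; i < 12; i++)
{
    int j = r % (i + 1); // uniform in [0, i]: the swap target for element i
    r /= i + 1;          // peel that choice off before decoding the next one
    (values[i], values[j]) = (values[j], values[i]);
}

Console.WriteLine(string.Join(", ", values));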
For longer spans we create additional groups - the next seven swaps are represented by a random number in the range [0, 253955520), the upper bound of which is 19! / 12!. Note that the group sizes are chosen to be as large as possible whilst still fitting into a 32-bit integer. It would be possible to get larger groups by using 64-bit integers but I have found that this leads to worse performance.
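To picture how the group bounds fall out, here is an illustrative sketch (not the PR's CalculateBound) that grows each group while the running product of swap counts still fits in a signed 32-bit integer:

using System;

static (int Bound, int NextStart) NextGroup(int start, int length)
{
    long bound = 1;
    int i = start;
    while (i < length && bound * (i + 1) <= int.MaxValue)
    {
        bound *= i + 1; // element i has (i + 1) possible swap targets
        i++;
    }
    return ((int)bound, i);
}

var (first, next) = NextGroup(1, 1000);
Console.WriteLine(first);                       // 479001600 == 12!
Console.WriteLine(NextGroup(next, 1000).Bound); // 253955520 == 19! / 12!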
The division and modulo operations involved are expensive but this process seems to be about twice as fast as the old implementation. (See benchmark results below)
Changes
I have changed the Random.Shuffle code to use the new method. This is a value-breaking change - Shuffle will now produce a different but still random ordering. I have updated the tests to reflect this.
I've also added a new test, Shuffle_Array_Fairness, which checks that shuffling isn't obviously unfair, but this test is not necessarily useful or needed.
Issues
Random doesn't seem to have a method to produce random unsigned integers, so I have to use signed integers instead, which makes some of the group sizes smaller and may be affecting performance.
Performance
I have found some optimizations since initially posting the issue and now the performance gains seem to be about 2x.