Potential 1.5x performance improvement for Random.Shuffle() #82838

Closed

wainwrightmark opened this issue Mar 1, 2023 · 9 comments

@wainwrightmark

Description

Hi. This is not a performance problem per se, but there is potential for quite a big improvement.

I noticed that a Shuffle() method has been added to the Random class.

I recently did some performance optimizations in the Rust rand crate (library) which made it run about 1.5x faster. I've had a little go and gotten similar results in C# (see benchmarks below).

How it works

Shuffling an array of length n essentially involves making n random swaps between elements.
The way I've made it faster is, instead of generating a new 32-bit random number for each swap, to group the swaps into the largest groups possible and generate only one random number per group. (I'm simplifying here by counting calls to Random.Next(n), each of which may itself call _prng.Sample() more than once.)

For an array of length 12, instead of generating 12 different random numbers, you generate one random number in the range [0, 479001600) (that is, [0, 12!)) and use div and mod to work out which swaps to do, as sketched below. For longer arrays you use more groups and the groups get a bit smaller, but the end result is a lot fewer calls to the comparatively slow Next(n) function.
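
Roughly, the batching looks like the following (a simplified sketch to show the idea, not the exact code I benchmarked):

```csharp
using System;

static class BatchedShuffle
{
    // Standard Fisher-Yates draws one bounded random number per element; here we
    // draw one number per *group* of elements whose bounds multiply to something
    // below int.MaxValue, then split it back into individual swap indices via div/mod.
    public static void Shuffle<T>(Random random, T[] values)
    {
        int i = values.Length - 1;
        while (i > 0)
        {
            // Grow the group while the product of the bounds (i+1) * i * ... still
            // fits below int.MaxValue, so a single Next(bound) can cover the group.
            long bound = i + 1;
            int groupEnd = i - 1;
            while (groupEnd > 0 && bound * (groupEnd + 1) < int.MaxValue)
            {
                bound *= groupEnd + 1;
                groupEnd--;
            }

            // One bounded random number now stands in for the whole group of swaps.
            long r = random.Next((int)bound);

            for (; i > groupEnd; i--)
            {
                int j = (int)(r % (i + 1)); // uniform on [0, i] because r is uniform on [0, bound)
                r /= i + 1;
                (values[i], values[j]) = (values[j], values[i]);
            }
        }
    }
}
```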

Data

These are my benchmark results, testing the existing "Old" implementation against my proposed "New" implementation for int arrays of length 10, 100, and 1000:

| Method | Mean | Error | StdDev | Median |
|---|---|---|---|---|
| Shuffle_10_Old | 101.64 ns | 0.469 ns | 0.439 ns | 101.73 ns |
| Shuffle_10_New | 57.38 ns | 0.663 ns | 0.620 ns | 57.24 ns |
| Shuffle_100_Old | 906.13 ns | 9.934 ns | 9.292 ns | 910.72 ns |
| Shuffle_100_New | 594.07 ns | 11.824 ns | 12.142 ns | 587.11 ns |
| Shuffle_1000_Old | 9,189.39 ns | 84.756 ns | 79.280 ns | 9,196.53 ns |
| Shuffle_1000_New | 6,733.79 ns | 132.529 ns | 172.325 ns | 6,844.89 ns |

The new versions are all about 1.5x as fast as their equivalents.

Configuration

dotnet 8.0.100-preview.1.23115.2
Windows 11 Pro 22000.1574
x64
12th Gen Intel(R) Core(TM) i7-12700H 2.30 GHz

Breaking Changes

This change would not alter the API in any way, but it would be a value-breaking change: the results of shuffling with the same seed would be different (but still random!) and the PRNG would almost always be advanced by fewer steps. For this reason, if this is considered worth doing, I suggest doing it sooner rather than later to avoid annoying people who rely on value stability.

Pull Request

I'm happy to tidy up my code and make a pull request for this if it's considered worthwhile.

@wainwrightmark wainwrightmark added the tenet-performance label Mar 1, 2023
@ghost ghost added the untriaged label Mar 1, 2023
@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost

ghost commented Mar 1, 2023

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.


@EgorBo
Member

EgorBo commented Mar 1, 2023

> the results of shuffling with the same seed would be different (but still random!)

Presumably, it's fine to change the algorithm when the seed is not specified by the user, e.g. new Random() or Random.Shared.

@stephentoub
Member

This has overlap with #82286; both are effectively looking at reducing the number of random numbers generated as part of an operation that may need multiple generated. You might coordinate with @sakno on a single approach to try for both GetItems and Shuffle.

I am curious, though, regarding the approach outlined here, whether it would preserve the quality / properties of the random numbers being generated. We don't, for example, just use % as part of Next(int), because it skews the distribution inappropriately (in addition to the cost involved), and will instead use retries or more recently the fastrange algorithm.
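
(For reference, the fastrange mapping mentioned above boils down to a multiply-and-take-the-high-bits trick; this is a sketch of the general technique, not the actual System.Random internals.)

```csharp
// Lemire-style "fastrange": map a full-width random value onto [0, bound)
// without division, by taking the high 32 bits of a 64-bit product.
// On its own this has the same slight bias as %, so it is typically paired
// with a rejection step when exact uniformity is required.
static uint FastRange(uint randomValue, uint bound)
    => (uint)(((ulong)randomValue * bound) >> 32);
```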

@sakno
Contributor

sakno commented Mar 1, 2023

% can be replaced with & only if the length of choices is a power of 2, if we want unbiased selection of the items. This is why I chose NextBytes to pre-populate a vector of random values, instead of using %, when the length is not a power of 2. Anyway, fastrange is still applied to that vector. Also, this approach produces the same result for the same seed.
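
(To illustrate the power-of-two point with a sketch; this is not the PR code, just the general idea.)

```csharp
// Masking is only a fair replacement for % when the bound is a power of two:
// "value & 7" keeps exactly 3 random bits, so indexes 0..7 are equally likely.
// For a bound like 6 there is no mask that yields 0..5, and "value % 6" over raw
// 32-bit values slightly over-represents the smaller indexes (2^32 mod 6 != 0).
static int MaskedIndex(uint randomValue, int powerOfTwoLength)
{
    System.Diagnostics.Debug.Assert((powerOfTwoLength & (powerOfTwoLength - 1)) == 0);
    return (int)(randomValue & (uint)(powerOfTwoLength - 1));
}
```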

@wainwrightmark
Author

> This has overlap with #82286; both are effectively looking at reducing the number of random numbers generated as part of an operation that may need multiple generated. You might coordinate with @sakno on a single approach to try for both GetItems and Shuffle.

Thanks for the suggestion. GetItems is quite different from Shuffle (it samples the same uniform distribution repeatedly and the results are independent, so it can be parallelized), so there probably isn't much potential for direct code reuse. That said, the type of optimization being suggested in that PR could potentially be applied here. I am ultimately committing the same sin of generating 64-bit integers and then throwing away half the bits. Unfortunately, everything I've tried (including @sakno's code) to alleviate that problem has ended up making it a lot slower, probably because I'm already massively reducing the number of bits I need to generate. I could easily have missed something, though.

> I am curious, though, regarding the approach outlined here, whether it would preserve the quality / properties of the random numbers being generated. We don't, for example, just use % as part of Next(int), because it skews the distribution inappropriately (in addition to the cost involved), and will instead use retries or more recently the fastrange algorithm.

This approach is completely unbiased (assuming the underlying PRNG is).
I'm essentially using the existing Next method (using the single argument form where you provide an upper bound) to generate a random element of the symmetric group (the set of all possible permutations) and then applying that permutation to the array, so every possible result is equally likely.
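
To make the counting concrete: an array of length 3 has 3! = 6 permutations, a single Next(6) call returns each of 0–5 with probability 1/6, and the div/mod decomposition maps those six values one-to-one onto the six permutations, so each outcome has probability exactly 1/6. The same counting argument applies to each group in the general case.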

@stephentoub
Member

> I'm essentially using the existing Next method (using the single argument form where you provide an upper bound) to generate a random element of the symmetric group

I must be misunderstanding your suggestion then.

Please feel free to put up a PR with your suggested change and we can collectively evaluate it with the exact code in front of us.

@wainwrightmark
Author

> Please feel free to put up a PR with your suggested change and we can collectively evaluate it with the exact code in front of us.

I am working on this. I've just had a few personal issues come up in the last week...

@ghost ghost added the in-pr label Mar 11, 2023
@ghost ghost removed the in-pr label Jul 6, 2023
@stephentoub
Member

stephentoub commented Jul 6, 2023

Tried and decided against in #83305

@stephentoub stephentoub closed this as not planned Jul 6, 2023
@ghost ghost removed the untriaged label Jul 6, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Aug 5, 2023