Potential 1.5x performance improvement for Random.Shuffle() #82838

Closed

wainwrightmark opened this issue Mar 1, 2023 · 9 comments

@wainwrightmark

Description

Hi. This is not a performance problem per se, but there is potential for quite a big improvement.

I noticed that a Shuffle() method has been added to the Random class.

I recently did some performance optimizations in the Rust rand crate (library) which made it run about 1.5x faster. I've had a little go and gotten similar results in C# (see benchmarks below).

How it works

Shuffling an array of length n essentially involves making n random swaps between elements.
The way I've made it faster is, instead of generating a new 32-bit random number for each swap, to group the swaps into the largest groups possible and generate only one random number per group. (I'm simplifying here by counting calls to Random.Next(n), each of which may itself call _prng.Sample() more than once.)

For an array of length 12, instead of generating 12 different random numbers, you generate one random number in the range [0, 479001600) (that is, [0, 12!)) and use div and mod to work out which swaps to do, as sketched below. For longer arrays you use more groups and the groups get a bit smaller, but the end result is a lot fewer calls to the comparatively slow Next(n) function.
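
Roughly, the batching looks like the following (a simplified sketch to show the idea, not the exact code I benchmarked):

```csharp
using System;

static class BatchedShuffle
{
    // Standard Fisher-Yates draws one bounded random number per element; here we
    // draw one number per *group* of elements whose bounds multiply to something
    // below int.MaxValue, then split it back into individual swap indices via div/mod.
    public static void Shuffle<T>(Random random, T[] values)
    {
        int i = values.Length - 1;
        while (i > 0)
        {
            // Grow the group while the product of the bounds (i+1) * i * ... still
            // fits below int.MaxValue, so a single Next(bound) can cover the group.
            long bound = i + 1;
            int groupEnd = i - 1;
            while (groupEnd > 0 && bound * (groupEnd + 1) < int.MaxValue)
            {
                bound *= groupEnd + 1;
                groupEnd--;
            }

            // One bounded random number now stands in for the whole group of swaps.
            long r = random.Next((int)bound);

            for (; i > groupEnd; i--)
            {
                int j = (int)(r % (i + 1)); // uniform on [0, i] because r is uniform on [0, bound)
                r /= i + 1;
                (values[i], values[j]) = (values[j], values[i]);
            }
        }
    }
}
```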

Data

These are my benchmark results, testing the existing "Old" implementation against my proposed "New" implementation for int arrays of length 10, 100, and 1000:

| Method | Mean | Error | StdDev | Median |
|---|---|---|---|---|
| Shuffle_10_Old | 101.64 ns | 0.469 ns | 0.439 ns | 101.73 ns |
| Shuffle_10_New | 57.38 ns | 0.663 ns | 0.620 ns | 57.24 ns |
| Shuffle_100_Old | 906.13 ns | 9.934 ns | 9.292 ns | 910.72 ns |
| Shuffle_100_New | 594.07 ns | 11.824 ns | 12.142 ns | 587.11 ns |
| Shuffle_1000_Old | 9,189.39 ns | 84.756 ns | 79.280 ns | 9,196.53 ns |
| Shuffle_1000_New | 6,733.79 ns | 132.529 ns | 172.325 ns | 6,844.89 ns |

The new versions are all about 1.5x as fast as their equivalents.

Configuration

dotnet 8.0.100-preview.1.23115.2
Windows 11 Pro 22000.1574
x64
12th Gen Intel(R) Core(TM) i7-12700H 2.30 GHz

Breaking Changes

This change would not alter the API in any way, but it would be a value-breaking change: the results of shuffling with the same seed would be different (but still random!) and the PRNG would almost always be advanced by fewer steps. For this reason, if this is considered worth doing, I suggest doing it sooner rather than later to avoid annoying people who rely on value stability.

Pull Request

I'm happy to tidy up my code and make a pull request for this if it's considered worthwhile.

@wainwrightmark wainwrightmark added the tenet-performance label Mar 1, 2023
@ghost ghost added the untriaged label Mar 1, 2023
@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost

ghost commented Mar 1, 2023

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.


@EgorBo
Member

EgorBo commented Mar 1, 2023

> the results of shuffling with the same seed would be different (but still random!)

Presumably, it's fine to change the algorithm when the seed is not specified by the user, e.g. new Random() or Random.Shared.

@stephentoub
Member

This has overlap with #82286; both are effectively looking at reducing the number of random numbers generated as part of an operation that may need multiple generated. You might coordinate with @sakno on a single approach to try for both GetItems and Shuffle.

I am curious, though, regarding the approach outlined here, whether it would preserve the quality / properties of the random numbers being generated. We don't, for example, just use % as part of Next(int), because it skews the distribution inappropriately (in addition to the cost involved), and will instead use retries or more recently the fastrange algorithm.
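
(For reference, the fastrange mapping mentioned above boils down to a multiply-and-take-the-high-bits trick; this is a sketch of the general technique, not the actual System.Random internals.)

```csharp
// Lemire-style "fastrange": map a full-width random value onto [0, bound)
// without division, by taking the high 32 bits of a 64-bit product.
// On its own this has the same slight bias as %, so it is typically paired
// with a rejection step when exact uniformity is required.
static uint FastRange(uint randomValue, uint bound)
    => (uint)(((ulong)randomValue * bound) >> 32);
```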

@sakno
Contributor

sakno commented Mar 1, 2023

% can be replaced with & only if the length of choices is a power of 2, if we want unbiased selection of the items. This is why I chose NextBytes to pre-populate a vector of random values, instead of using %, when the length is not a power of 2. Anyway, fastrange is still applied to that vector. Also, this approach produces the same result for the same seed.
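
(To illustrate the power-of-two point with a sketch; this is not the PR code, just the general idea.)

```csharp
// Masking is only a fair replacement for % when the bound is a power of two:
// "value & 7" keeps exactly 3 random bits, so indexes 0..7 are equally likely.
// For a bound like 6 there is no mask that yields 0..5, and "value % 6" over raw
// 32-bit values slightly over-represents the smaller indexes (2^32 mod 6 != 0).
static int MaskedIndex(uint randomValue, int powerOfTwoLength)
{
    System.Diagnostics.Debug.Assert((powerOfTwoLength & (powerOfTwoLength - 1)) == 0);
    return (int)(randomValue & (uint)(powerOfTwoLength - 1));
}
```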

@wainwrightmark
Author

> This has overlap with #82286; both are effectively looking at reducing the number of random numbers generated as part of an operation that may need multiple generated. You might coordinate with @sakno on a single approach to try for both GetItems and Shuffle.

Thanks for the suggestion. GetItems is quite different from Shuffle (it samples the same uniform distribution repeatedly and the results are independent, so it can be parallelized), so there probably isn't much potential for direct code reuse. That said, the type of optimization being suggested in that PR could potentially be applied here. I am ultimately committing the same sin of generating 64-bit integers and then throwing away half the bits. Unfortunately, everything I've tried (including @sakno's code) to alleviate that problem has ended up making it a lot slower, probably because I'm already massively reducing the number of bits I need to generate. I could easily have missed something, though.

> I am curious, though, regarding the approach outlined here, whether it would preserve the quality / properties of the random numbers being generated. We don't, for example, just use % as part of Next(int), because it skews the distribution inappropriately (in addition to the cost involved), and will instead use retries or more recently the fastrange algorithm.

This approach is completely unbiased (assuming the underlying PRNG is).
I'm essentially using the existing Next method (using the single argument form where you provide an upper bound) to generate a random element of the symmetric group (the set of all possible permutations) and then applying that permutation to the array, so every possible result is equally likely.
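
To make the counting concrete: an array of length 3 has 3! = 6 permutations, a single Next(6) call returns each of 0–5 with probability 1/6, and the div/mod decomposition maps those six values one-to-one onto the six permutations, so each outcome has probability exactly 1/6. The same counting argument applies to each group in the general case.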

@stephentoub
Member

> I'm essentially using the existing Next method (using the single argument form where you provide an upper bound) to generate a random element of the symmetric group

I must be misunderstanding your suggestion then.

Please feel free to put up a PR with your suggested change and we can collectively evaluate it with the exact code in front of us.

@wainwrightmark
Author

> Please feel free to put up a PR with your suggested change and we can collectively evaluate it with the exact code in front of us.

I am working on this. I've just had a few personal issues come up in the last week...

@ghost ghost added the in-pr label Mar 11, 2023
@ghost ghost removed the in-pr label Jul 6, 2023
@stephentoub
Member

stephentoub commented Jul 6, 2023

Tried and decided against in #83305

@stephentoub stephentoub closed this as not planned Jul 6, 2023
@ghost ghost removed the untriaged label Jul 6, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Aug 5, 2023