
Added Cauchy distribution #474

Merged: 3 commits into rust-random:master, May 30, 2018
Conversation

@MaximoB (Contributor) commented May 23, 2018

This is for issue #368. Since this is my first contribution I limited the scope of the changes and didn't try to optimize the generation using a Ziggurat algorithm. The heavy tails of the Cauchy distribution seemed like a potential problem for a Ziggurat algorithm and .tan() is still reasonably fast.
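For readers following along, the inverse-CDF approach the PR describes can be sketched as a standalone function. This is an illustration written for this discussion, not the PR's actual code; the name `cauchy_from_uniform` and the `median`/`scale` parameter names are assumptions.

```rust
use std::f64::consts::PI;

/// Illustrative sketch: map a uniform sample x in [0, 1) to a
/// Cauchy(median, scale) value via the inverse CDF. When x is uniform,
/// tan(pi * x) is a standard Cauchy sample; the caller must reject
/// x == 0.5, which would hit the pole of tan.
fn cauchy_from_uniform(x: f64, median: f64, scale: f64) -> f64 {
    median + scale * (PI * x).tan()
}

fn main() {
    // Quantile checks: the standard Cauchy quartiles are at +1 and -1.
    // tan(pi * 0.25) = 1 and tan(pi * 0.75) = -1.
    assert!((cauchy_from_uniform(0.25, 0.0, 1.0) - 1.0).abs() < 1e-12);
    assert!((cauchy_from_uniform(0.75, 0.0, 1.0) + 1.0).abs() < 1e-12);
    // x = 0.0 returns the median exactly, since tan(0) = 0.
    assert_eq!(cauchy_from_uniform(0.0, 2.0, 3.0), 2.0);
}
```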

@MaximoB (Contributor, Author) commented May 23, 2018

Actually, I could use some clarification on whether rng.gen() generates numbers in [0, 1], (0, 1), or [0, 1).

@pitdicker (Contributor) commented May 23, 2018

Thank you, great first contribution!

You can use the Open01 distribution to sample from (0, 1). rng.gen() samples from [0, 1).
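For context, here is an illustrative bit-to-float conversion (not rand's actual implementation) showing why a [0, 1) sampler can reach 0 but never 1; an Open01-style (0, 1) sampler simply has to exclude the zero case.

```rust
/// Illustrative only: take the top 53 bits of a u64 and scale by 2^-53,
/// producing a float in [0, 1). The rand crate's real conversion differs
/// in detail, but the range behaviour is the same.
fn u64_to_unit_interval(bits: u64) -> f64 {
    (bits >> 11) as f64 * (1.0 / (1u64 << 53) as f64)
}

fn main() {
    // All-zero bits map to exactly 0.0, so 0 is reachable: the range is [0, 1).
    assert_eq!(u64_to_unit_interval(0), 0.0);
    // All-one bits give (2^53 - 1) / 2^53 = 1 - 2^-53, strictly below 1.
    assert!(u64_to_unit_interval(u64::MAX) < 1.0);
    // An Open01-style sampler for (0, 1) would reject (or remap) the 0.0 case.
}
```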

Otherwise looks good to me.
I wish I knew a bit more about the techniques for generating distributions; I'll find the time someday 😄. But this is a good start that at least offers the distribution, even though it could be made faster.

@dhardy (Member) left a comment:
Good job overall, but a few comments

impl Distribution<f64> for Cauchy {
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> f64 {
        // sample from [0, 1)
        let mut x: f64 = rng.gen::<f64>();
@dhardy (Member):

You don't need to qualify the type on both x and in gen. Still, it's okay, and the code is easy to read.

@vks (Collaborator), May 24, 2018:
Shouldn't this sample Open01 instead? On second thought, no: 0.0.tan() is fine.

@MaximoB (Contributor, Author), May 24, 2018:

Yeah, I noticed I did that right after I pushed the commit and didn't want to push another one just to remove the redundant type specification.

@dhardy (Member), May 24, 2018:

We sometimes prefer rebasing in PRs to keep the commits clean.

        // repeat the drawing until we are in the range of possible values
        if lresult >= 0.0 && lresult < float_n + 1.0 {
            break;
        }
@dhardy (Member):
Surely this clamp on the output is there for a reason, and removing it doesn't make sense? It traps the π/2 value, but also negative and overly large results. (I don't know what is needed here, but I do know this is not the same code.)

@MaximoB (Contributor, Author), May 24, 2018:

Sorry, I thought if it still passed the tests without the loop then it would be better to take it out.

@dhardy (Member), May 24, 2018:

Many things aren't exhaustively tested, though. I don't really understand how this code works, and if you don't either, I think we shouldn't change what it does. I think it's still possible to use the Cauchy code here, but I'm not sure whether it's worth it.

@MaximoB (Contributor, Author), May 24, 2018:

From some benchmarking I just did, it looks like using Cauchy (with the loop put back in) is ~9000 nanoseconds (0.009 milliseconds) slower than the existing code, because I check whether the rng produced 0.5 before using it, whereas the existing code does not. If I remove the check, performance is similar to the version without Cauchy.

Since the Cauchy distribution is getting used in more than one place in the codebase I think there is a benefit to standardizing the generation of it.

            x = rng.gen::<f64>();
        }
        // get standard cauchy random number
        let comp_dev = (PI * x).tan();
@dhardy (Member):

This method uses the standard 53 bits of precision. Since FP allows higher precision close to zero, we could consider directly constructing a float in the range (-π/2, π/2) with HighPrecision (#372) when available; it would be a little slower but may not be significantly so.

        // repeat the drawing until we are in the range of possible values
        if result >= 0.0 {
            break;
        }
@dhardy (Member):

Again, your code is not equivalent, since it allows sampling negative values.

@vks (Collaborator) commented May 24, 2018

Note that there is also the statrs implementation. The sampling is similar (they use a different parameter). I think it will be more interesting to look at their implementation if we decide to implement PDFs etc.

@dhardy (Member) commented May 24, 2018

The only significant difference in the statrs version is that it subtracts 0.5 (effectively a shift of π/2 after scaling), which has no effect on the resulting distribution because tan has period π.
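A quick standalone numeric check of this periodicity argument (written for this discussion, not code from either implementation):

```rust
use std::f64::consts::PI;

fn main() {
    // tan has period pi, so tan(pi * x) == tan(pi * (x - 1)) for any x.
    // Hence mapping x in [0, 1) through tan(pi * x) produces the same set
    // of values as mapping the shifted uniform x - 0.5 in [-0.5, 0.5);
    // subtracting 0.5 only relabels which draw yields which output.
    let x = 0.75;
    let a = (PI * x).tan();          // tan(3*pi/4) = -1
    let b = (PI * (x - 1.0)).tan();  // tan(-pi/4)  = -1
    assert!((a - b).abs() < 1e-12);
    assert!((a + 1.0).abs() < 1e-12);
}
```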

@MaximoB (Contributor, Author) commented May 24, 2018

Yeah, I went with the domain [0, π) instead of (-π/2, π/2) because I wanted to avoid subtraction if I could.

@vks (Collaborator) commented May 24, 2018 via email

@MaximoB (Contributor, Author) commented May 24, 2018

They are extremely close in performance, so it's hard to tell, but it does look like guarding against 0.5 is slightly faster than doing the subtraction. This comparison probably depends on the architecture the code is running on. The tiebreaker for me was that eliminating a subtraction also potentially avoids some floating-point error.

@dhardy (Member) commented May 25, 2018

There are two ways of generating in [0, 1); the method we used previously generated in [1, 2) then subtracted; in theory it should be possible to generate in (-π/2, π/2) with no performance loss (though 1 bit less precision I think).

@dhardy (Member) commented May 25, 2018

The Open01 method still uses this code, so π * (rng.sample(Open01) - 0.5) might do the trick (possibly the compiler can combine the subtractions, but due to rounding it may still produce -π/2).
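The rounding caveat can be demonstrated in a standalone snippet (illustrative only): for a small enough nonzero x, the subtraction x - 0.5 rounds to exactly -0.5, so the product lands exactly on the floating-point value of -π/2.

```rust
use std::f64::consts::{FRAC_PI_2, PI};

fn main() {
    // A nonzero x far below 0.5's precision, as Open01 can produce.
    let tiny = 2f64.powi(-60);
    // The exact result 2^-60 - 0.5 is not representable; round-to-nearest
    // gives exactly -0.5, erasing the "open" guarantee of Open01.
    assert_eq!(tiny - 0.5, -0.5);
    // Scaling by pi (whose f64 value halves exactly) then lands exactly on
    // the f64 value of -pi/2.
    assert_eq!(PI * (tiny - 0.5), -FRAC_PI_2);
    // Note tan at that float is a huge finite number, not an infinity,
    // since FRAC_PI_2 is not exactly pi/2.
    assert!((-FRAC_PI_2).tan().is_finite());
}
```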

@dhardy dhardy merged commit c4d1446 into rust-random:master May 30, 2018
@MaximoB MaximoB deleted the add_cauchy_distribution branch May 30, 2018 14:05