Added Cauchy distribution #474
Conversation
Actually, I could use some clarification on whether rng.gen() generates numbers in [0, 1], (0, 1), or [0, 1).
Thank you, great first contribution! You can use the […]. Otherwise looks good to me.
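As background for this exchange, here is a minimal standalone sketch (not PR code) of why the endpoints of that interval matter when mapping through tan; it assumes, as the thread later confirms, that `gen::<f64>()` samples `[0, 1)`:

```rust
use std::f64::consts::PI;

fn main() {
    // x == 0.0 is harmless for the inverse-CDF mapping: tan(0.0) == 0.0.
    println!("{}", (PI * 0.0_f64).tan()); // 0

    // x == 0.5 lands on the pole at PI/2. In f64 the result is not
    // infinite (PI/2 isn't exactly representable) but it is a huge
    // outlier (~1.6e16), which is why the PR redraws on 0.5.
    println!("{}", (PI * 0.5_f64).tan());
}
```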
Force-pushed from 00aa6b9 to 482633a
Force-pushed from 482633a to e956a48
Good job overall, but a few comments
src/distributions/cauchy.rs
Outdated
impl Distribution<f64> for Cauchy {
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> f64 {
        // sample from [0, 1)
        let mut x: f64 = rng.gen::<f64>();
You don't need to qualify the type both on `x` and in `gen`. Still, it's okay and the code is easy to read.
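For illustration, the two equivalent spellings the comment refers to, as a standalone sketch assuming rand's `thread_rng`:

```rust
use rand::{thread_rng, Rng};

fn main() {
    let mut rng = thread_rng();
    // Either annotation alone lets the compiler infer f64:
    let a: f64 = rng.gen();      // annotate the binding,
    let b = rng.gen::<f64>();    // or use the turbofish on gen.
    // Writing both, as in the diff above, compiles but is redundant.
    println!("{} {}", a, b);
}
```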
Shouldn't this sample `Open01` instead?
No, `0.0.tan()` is fine.
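For reference, a hypothetical sketch (not part of this PR) of how `Open01` would be sampled if the open interval were wanted:

```rust
use rand::{thread_rng, Rng};
use rand::distributions::Open01;

fn main() {
    let mut rng = thread_rng();
    // Open01 samples the open interval (0, 1): it excludes only the
    // endpoints, so it would not remove the x == 0.5 pole either.
    let x: f64 = rng.sample(Open01);
    println!("{}", x);
}
```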
Yeah, I noticed I did that right after I pushed the commit and didn't want to push another one just to remove the redundant type specification.
We sometimes prefer rebasing in PRs to keep the commits clean.
src/distributions/binomial.rs
Outdated
// repeat the drawing until we are in the range of possible values
if lresult >= 0.0 && lresult < float_n + 1.0 {
    break;
}
Surely this clamp on the output is there for a reason and removing it doesn't make sense? It traps for the π/2 value but also for negatives and large results. (I don't know what is needed here, but do know this is not the same code.)
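To make the discussion concrete, the removed lines implement plain rejection sampling. A sketch of that shape follows; the free-standing function and its name `draw_in_range` are mine, while `expected`, `sq`, and `float_n` are the diff's own variables:

```rust
use std::f64::consts::PI;
use rand::Rng;

// Rejection sampling: draw from the Cauchy (Lorentzian) comparison
// distribution until the shifted value lands in the binomial's
// support, i.e. 0.0 <= lresult < float_n + 1.0. This also discards
// negatives and the rare huge values produced near the tan() pole.
fn draw_in_range<R: Rng + ?Sized>(rng: &mut R, expected: f64, sq: f64, float_n: f64) -> f64 {
    loop {
        let comp_dev = (PI * rng.gen::<f64>()).tan();
        let lresult = expected + sq * comp_dev;
        if lresult >= 0.0 && lresult < float_n + 1.0 {
            return lresult;
        }
    }
}
```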
Sorry, I thought if it still passed the tests without the loop then it would be better to take it out.
Many things aren't exhaustively tested though. I don't really understand how this code works, and if you don't either I think we shouldn't adjust what it does. I think it's still possible to use the Cauchy code here but not sure whether it's worth it.
From some benchmarking I just did it looks like using Cauchy (with the loop put back in) is ~9000 nanoseconds (0.009 milliseconds) slower than the existing code because I check if the rng produced 0.5 before using it, whereas the existing code does not. If I remove the check it has similar performance as without Cauchy.
Since the Cauchy distribution is getting used in more than one place in the codebase I think there is a benefit to standardizing the generation of it.
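A sketch of what that standardization could look like, assuming this PR's `Cauchy` with a `Cauchy::new(median, scale)` constructor (the helper name is hypothetical):

```rust
use rand::Rng;
use rand::distributions::{Cauchy, Distribution};

// Instead of open-coding (PI * rng.gen::<f64>()).tan() at each call
// site, binomial/poisson could sample the shared Cauchy distribution
// and do the shift and scale themselves.
fn comparison_deviate<R: Rng + ?Sized>(rng: &mut R, expected: f64, sq: f64) -> f64 {
    let cauchy = Cauchy::new(0.0, 1.0); // standard Cauchy
    expected + sq * cauchy.sample(rng)
}
```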
    x = rng.gen::<f64>();
}
// get standard cauchy random number
let comp_dev = (PI * x).tan();
This method uses the standard 53 bits of precision. Since FP allows higher precision close to zero, we could consider directly constructing a float in the range (-π/2, π/2) with HighPrecision (#372) when available; it would be a little slower, but may not be significantly so.
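To make the precision point concrete, a small standalone illustration (not rand code) of how much finer f64 spacing is near zero than near one:

```rust
// The gap between adjacent f64 values (one ULP) shrinks with the
// exponent, so values near 0.0 carry far more absolute precision
// than values near 1.0; a direct high-precision construction of the
// angle could exploit this.
fn ulp(x: f64) -> f64 {
    f64::from_bits(x.to_bits() + 1) - x
}

fn main() {
    println!("ULP at 1.0:   {:e}", ulp(1.0));    // ~2.2e-16
    println!("ULP at 1e-10: {:e}", ulp(1e-10));  // ~1.3e-26
}
```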
src/distributions/poisson.rs
Outdated
// repeat the drawing until we are in the range of possible values
if result >= 0.0 {
    break;
}
Again, your code is not equivalent since it allows sampling from negative values.
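A sketch of the rejection loop the comment wants restored; the helper name and the exact shift/scale are illustrative, not the crate's precise transform:

```rust
use std::f64::consts::PI;
use rand::Rng;

// The removed check matters: the Cauchy comparison deviate can be
// negative, but a Poisson sample cannot. Redraw until result >= 0.
fn draw_nonnegative<R: Rng + ?Sized>(rng: &mut R, lambda: f64) -> f64 {
    loop {
        let comp_dev = (PI * rng.gen::<f64>()).tan();
        let result = lambda + lambda.sqrt() * comp_dev;
        if result >= 0.0 {
            return result;
        }
    }
}
```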
Note that there is also the […]
The only significant difference about the […]
Yeah, I went with the domain [0, π) instead of (-π/2, π/2) because I wanted to avoid subtraction if I could.
The subtraction you are saving is possibly not worth the additional branch you require to check for 0.5. Did you compare performance?
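For concreteness, the two variants under discussion, reconstructed as a sketch from the diff fragments and the thread's description:

```rust
use std::f64::consts::PI;
use rand::Rng;

// Variant A (as in this PR): map [0, 1) onto the angle domain [0, PI),
// redrawing x == 0.5 so the tan() pole at PI/2 is never approached.
fn standard_cauchy_a<R: Rng + ?Sized>(rng: &mut R) -> f64 {
    let mut x = rng.gen::<f64>();
    while x == 0.5 {
        x = rng.gen::<f64>();
    }
    (PI * x).tan()
}

// Variant B: one extra subtraction maps onto [-PI/2, PI/2) with no
// branch; x == 0.0 then lands near the -PI/2 pole and yields a huge
// (finite) negative outlier instead of a redraw.
fn standard_cauchy_b<R: Rng + ?Sized>(rng: &mut R) -> f64 {
    let x = rng.gen::<f64>();
    (PI * (x - 0.5)).tan()
}
```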
They are extremely close in performance, so it's hard to tell, but it does look like guarding against 0.5 is slightly faster than doing the subtraction. This comparison probably depends on the architecture the code is running on. The tiebreaker for me was that by eliminating a subtraction you also potentially get rid of some floating point errors.
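A rough harness for such a comparison, in the style of the crate's nightly benchmarks (hypothetical; `standard_cauchy_a`/`standard_cauchy_b` refer to the sketch above and are assumed in scope):

```rust
#![feature(test)]
extern crate rand;
extern crate test;

use rand::thread_rng;
use test::{black_box, Bencher};

// black_box keeps the optimizer from eliding the sampled values,
// so the two variants are timed doing real work.
#[bench]
fn bench_variant_a(b: &mut Bencher) {
    let mut rng = thread_rng();
    b.iter(|| black_box(standard_cauchy_a(&mut rng)));
}

#[bench]
fn bench_variant_b(b: &mut Bencher) {
    let mut rng = thread_rng();
    b.iter(|| black_box(standard_cauchy_b(&mut rng)));
}
```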
Force-pushed from 154c99c to cc377b2
There are two ways of generating in […]
This is for issue #368. Since this is my first contribution, I limited the scope of the changes and didn't try to optimize the generation using a Ziggurat algorithm. The heavy tails of the Cauchy distribution seemed like a potential problem for a Ziggurat algorithm, and `.tan()` is still reasonably fast.
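Assembled from the diff fragments above, a self-contained sketch of the distribution's likely shape (field names `median` and `scale` are assumptions):

```rust
extern crate rand;

use std::f64::consts::PI;
use rand::Rng;
use rand::distributions::Distribution;

/// Cauchy (Lorentzian) distribution with the given median and scale.
pub struct Cauchy {
    median: f64,
    scale: f64,
}

impl Cauchy {
    pub fn new(median: f64, scale: f64) -> Cauchy {
        assert!(scale > 0.0, "Cauchy::new called with scale <= 0");
        Cauchy { median, scale }
    }
}

impl Distribution<f64> for Cauchy {
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> f64 {
        // sample from [0, 1), redrawing 0.5 to avoid the tan() pole
        let mut x = rng.gen::<f64>();
        while x == 0.5 {
            x = rng.gen::<f64>();
        }
        // standard Cauchy via the inverse CDF on the domain [0, PI)
        let comp_dev = (PI * x).tan();
        // shift and scale into the requested distribution
        self.median + self.scale * comp_dev
    }
}
```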