Added Cauchy distribution #474
Conversation
Actually, I could use some clarification on whether rng.gen() generates numbers in [0, 1], (0, 1), or [0, 1).
Thank you, great first contribution! You can use the […]. Otherwise looks good to me.
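As background for this exchange, here is a minimal standalone sketch (not PR code) of why the endpoints of that interval matter when mapping through tan; it assumes, as the thread later confirms, that `gen::<f64>()` samples `[0, 1)`:

```rust
use std::f64::consts::PI;

fn main() {
    // x == 0.0 is harmless for the inverse-CDF mapping: tan(0.0) == 0.0.
    println!("{}", (PI * 0.0_f64).tan()); // 0

    // x == 0.5 lands on the pole at PI/2. In f64 the result is not
    // infinite (PI/2 isn't exactly representable) but it is a huge
    // outlier (~1.6e16), which is why the PR redraws on 0.5.
    println!("{}", (PI * 0.5_f64).tan());
}
```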
Force-pushed from 00aa6b9 to 482633a
Force-pushed from 482633a to e956a48
Good job overall, but a few comments
src/distributions/cauchy.rs
Outdated
impl Distribution<f64> for Cauchy {
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> f64 {
        // sample from [0, 1)
        let mut x: f64 = rng.gen::<f64>();
You don't need to qualify the type both on `x` and in `gen`. Still, it's okay and the code is easy to read.
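For illustration, the two equivalent spellings the comment refers to, as a standalone sketch assuming rand's `thread_rng`:

```rust
use rand::{thread_rng, Rng};

fn main() {
    let mut rng = thread_rng();
    // Either annotation alone lets the compiler infer f64:
    let a: f64 = rng.gen();      // annotate the binding,
    let b = rng.gen::<f64>();    // or use the turbofish on gen.
    // Writing both, as in the diff above, compiles but is redundant.
    println!("{} {}", a, b);
}
```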
Shouldn't this sample `Open01` instead?
No, `0.0.tan()` is fine.
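For reference, a hypothetical sketch (not part of this PR) of how `Open01` would be sampled if the open interval were wanted:

```rust
use rand::{thread_rng, Rng};
use rand::distributions::Open01;

fn main() {
    let mut rng = thread_rng();
    // Open01 samples the open interval (0, 1): it excludes only the
    // endpoints, so it would not remove the x == 0.5 pole either.
    let x: f64 = rng.sample(Open01);
    println!("{}", x);
}
```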
Yeah, I noticed I did that right after I pushed the commit and didn't want to push another one just to remove the redundant type specification.
We sometimes prefer rebasing in PRs to keep the commits clean.
src/distributions/binomial.rs
Outdated
// repeat the drawing until we are in the range of possible values
if lresult >= 0.0 && lresult < float_n + 1.0 {
    break;
}
Surely this clamp on the output is there for a reason and removing it doesn't make sense? It traps for the π/2 value but also for negatives and large results. (I don't know what is needed here, but do know this is not the same code.)
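To make the discussion concrete, the removed lines implement plain rejection sampling. A sketch of that shape follows; the free-standing function and its name `draw_in_range` are mine, while `expected`, `sq`, and `float_n` are the diff's own variables:

```rust
use std::f64::consts::PI;
use rand::Rng;

// Rejection sampling: draw from the Cauchy (Lorentzian) comparison
// distribution until the shifted value lands in the binomial's
// support, i.e. 0.0 <= lresult < float_n + 1.0. This also discards
// negatives and the rare huge values produced near the tan() pole.
fn draw_in_range<R: Rng + ?Sized>(rng: &mut R, expected: f64, sq: f64, float_n: f64) -> f64 {
    loop {
        let comp_dev = (PI * rng.gen::<f64>()).tan();
        let lresult = expected + sq * comp_dev;
        if lresult >= 0.0 && lresult < float_n + 1.0 {
            return lresult;
        }
    }
}
```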
Sorry, I thought if it still passed the tests without the loop then it would be better to take it out.
Many things aren't exhaustively tested though. I don't really understand how this code works, and if you don't either I think we shouldn't adjust what it does. I think it's still possible to use the Cauchy code here but not sure whether it's worth it.
From some benchmarking I just did it looks like using Cauchy (with the loop put back in) is ~9000 nanoseconds (0.009 milliseconds) slower than the existing code because I check if the rng produced 0.5 before using it, whereas the existing code does not. If I remove the check it has similar performance as without Cauchy.
Since the Cauchy distribution is getting used in more than one place in the codebase I think there is a benefit to standardizing the generation of it.
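A sketch of what that standardization could look like, assuming this PR's `Cauchy` with a `Cauchy::new(median, scale)` constructor (the helper name is hypothetical):

```rust
use rand::Rng;
use rand::distributions::{Cauchy, Distribution};

// Instead of open-coding (PI * rng.gen::<f64>()).tan() at each call
// site, binomial/poisson could sample the shared Cauchy distribution
// and do the shift and scale themselves.
fn comparison_deviate<R: Rng + ?Sized>(rng: &mut R, expected: f64, sq: f64) -> f64 {
    let cauchy = Cauchy::new(0.0, 1.0); // standard Cauchy
    expected + sq * cauchy.sample(rng)
}
```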
    x = rng.gen::<f64>();
}
// get standard cauchy random number
let comp_dev = (PI * x).tan();
This method uses the standard 53 bits of precision. Since FP allows higher precision close to zero, we could consider directly constructing a float in the range (-π/2, π/2) with HighPrecision (#372) when available; it would be a little slower, but may not be significantly so.
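To make the precision point concrete, a small standalone illustration (not rand code) of how much finer f64 spacing is near zero than near one:

```rust
// The gap between adjacent f64 values (one ULP) shrinks with the
// exponent, so values near 0.0 carry far more absolute precision
// than values near 1.0; a direct high-precision construction of the
// angle could exploit this.
fn ulp(x: f64) -> f64 {
    f64::from_bits(x.to_bits() + 1) - x
}

fn main() {
    println!("ULP at 1.0:   {:e}", ulp(1.0));    // ~2.2e-16
    println!("ULP at 1e-10: {:e}", ulp(1e-10));  // ~1.3e-26
}
```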
src/distributions/poisson.rs
Outdated
// repeat the drawing until we are in the range of possible values
if result >= 0.0 {
    break;
}
Again, your code is not equivalent since it allows sampling from negative values.
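A sketch of the rejection loop the comment wants restored; the helper name and the exact shift/scale are illustrative, not the crate's precise transform:

```rust
use std::f64::consts::PI;
use rand::Rng;

// The removed check matters: the Cauchy comparison deviate can be
// negative, but a Poisson sample cannot. Redraw until result >= 0.
fn draw_nonnegative<R: Rng + ?Sized>(rng: &mut R, lambda: f64) -> f64 {
    loop {
        let comp_dev = (PI * rng.gen::<f64>()).tan();
        let result = lambda + lambda.sqrt() * comp_dev;
        if result >= 0.0 {
            return result;
        }
    }
}
```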
Note that there is also the […]
The only significant difference about the […]
Yeah, I went with the domain [0, π) instead of (-π/2, π/2) because I wanted to avoid subtraction if I could.
The subtraction you are saving is possibly not worth the additional branch you require to check for 0.5. Did you compare performance?
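For concreteness, the two variants under discussion, reconstructed as a sketch from the diff fragments and the thread's description:

```rust
use std::f64::consts::PI;
use rand::Rng;

// Variant A (as in this PR): map [0, 1) onto the angle domain [0, PI),
// redrawing x == 0.5 so the tan() pole at PI/2 is never approached.
fn standard_cauchy_a<R: Rng + ?Sized>(rng: &mut R) -> f64 {
    let mut x = rng.gen::<f64>();
    while x == 0.5 {
        x = rng.gen::<f64>();
    }
    (PI * x).tan()
}

// Variant B: one extra subtraction maps onto [-PI/2, PI/2) with no
// branch; x == 0.0 then lands near the -PI/2 pole and yields a huge
// (finite) negative outlier instead of a redraw.
fn standard_cauchy_b<R: Rng + ?Sized>(rng: &mut R) -> f64 {
    let x = rng.gen::<f64>();
    (PI * (x - 0.5)).tan()
}
```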
They are extremely close in performance, so it's hard to tell, but it does look like guarding against 0.5 is slightly faster than doing the subtraction. This comparison probably depends on the architecture the code is running on. The tiebreaker for me was that by eliminating a subtraction you also potentially get rid of some floating point errors.
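A rough harness for such a comparison, in the style of the crate's nightly benchmarks (hypothetical; `standard_cauchy_a`/`standard_cauchy_b` refer to the sketch above and are assumed in scope):

```rust
#![feature(test)]
extern crate rand;
extern crate test;

use rand::thread_rng;
use test::{black_box, Bencher};

// black_box keeps the optimizer from eliding the sampled values,
// so the two variants are timed doing real work.
#[bench]
fn bench_variant_a(b: &mut Bencher) {
    let mut rng = thread_rng();
    b.iter(|| black_box(standard_cauchy_a(&mut rng)));
}

#[bench]
fn bench_variant_b(b: &mut Bencher) {
    let mut rng = thread_rng();
    b.iter(|| black_box(standard_cauchy_b(&mut rng)));
}
```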
Force-pushed from 154c99c to cc377b2
There are two ways of generating in […]
This is for issue #368. Since this is my first contribution, I limited the scope of the changes and didn't try to optimize the generation using a Ziggurat algorithm. The heavy tails of the Cauchy distribution seemed like a potential problem for a Ziggurat algorithm, and `.tan()` is still reasonably fast.
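Assembled from the diff fragments above, a self-contained sketch of the distribution's likely shape (field names `median` and `scale` are assumptions):

```rust
extern crate rand;

use std::f64::consts::PI;
use rand::Rng;
use rand::distributions::Distribution;

/// Cauchy (Lorentzian) distribution with the given median and scale.
pub struct Cauchy {
    median: f64,
    scale: f64,
}

impl Cauchy {
    pub fn new(median: f64, scale: f64) -> Cauchy {
        assert!(scale > 0.0, "Cauchy::new called with scale <= 0");
        Cauchy { median, scale }
    }
}

impl Distribution<f64> for Cauchy {
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> f64 {
        // sample from [0, 1), redrawing 0.5 to avoid the tan() pole
        let mut x = rng.gen::<f64>();
        while x == 0.5 {
            x = rng.gen::<f64>();
        }
        // standard Cauchy via the inverse CDF on the domain [0, PI)
        let comp_dev = (PI * x).tan();
        // shift and scale into the requested distribution
        self.median + self.scale * comp_dev
    }
}
```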