Skip to content

A Rust implementation of HyperLogLog trying to be parsimonious with memory.

License

Notifications You must be signed in to change notification settings

LucaCappelletti94/hyperloglog-rs

Repository files navigation

🧮 HyperLogLog-rs

downloads dependents CI license Latest version Documentation

This is a Rust library that provides an implementation of the HyperLogLog (HLL) algorithm, trying to be parsimonious with memory. It also provides an implementation based on the MLE algorithm, which is a more accurate version of the HyperLogLog algorithm but is slower.

You can use it to estimate the cardinality of large sets, and determine also the union and intersection of two sets.

Usage

Add this to your Cargo.toml:

[dependencies]
hyperloglog = "0.1"

Examples

use hyperloglog_rs::prelude::*;

let mut hll = PlusPlus::<Precision6, Bits5, <Precision6 as PackedRegister<Bits5>>::Array, twox_hash::XxHash>::default();
hll.insert(&1);
hll.insert(&2);

let mut hll2 = PlusPlus::<Precision6, Bits5, <Precision6 as PackedRegister<Bits5>>::Array, twox_hash::XxHash>::default();
hll2.insert(&2);
hll2.insert(&3);

let union = hll | hll2;

let estimated_cardinality: f64 = union.estimate_cardinality();
assert!(
estimated_cardinality >= 3.0_f64 * 0.9 &&
estimated_cardinality <= 3.0_f64 * 1.1,
"Expected cardinality to be around 3, got {}",
estimated_cardinality
);

let intersection_cardinality: f64 = hll.estimate_intersection_cardinality(&hll2);

assert!(
        intersection_cardinality >= 1.0_f64 * 0.9 &&
        intersection_cardinality <= 1.0_f64 * 1.1,
        "Expected intersection cardinality to be around 1, got {}",
        intersection_cardinality
);

Using MLE estimation

The MLE estimation for HyperLogLog counters by Otmar Ertl provides a more accurate estimation of the cardinality of a set, but it is slower than the standard HyperLogLog algorithm. Here is an example of how to use it:

#[cfg(feature = "mle")]
{
        use hyperloglog_rs::prelude::*;

        let mut hll1: MLE<PlusPlus::<Precision6, Bits5, <Precision6 as PackedRegister<Bits5>>::Array, twox_hash::XxHash>> = MLE::default();
        
        hll1.insert(&1);
        hll1.insert(&2);
        hll1.insert(&3);

        let mut hll2: MLE<PlusPlus::<Precision6, Bits5, <Precision6 as PackedRegister<Bits5>>::Array, twox_hash::XxHash>> = MLE::default();

        hll2.insert(&2);
        hll2.insert(&3);
        hll2.insert(&4);

        let estimated_cardinality: f64 = hll1.estimate_cardinality();
        assert!(
                estimated_cardinality >= 3.0_f64 * 0.9 &&
                estimated_cardinality <= 3.0_f64 * 1.1,
                "MLE: Expected cardinality to be around 3, got {}",
                estimated_cardinality
        );

        let estimate_intersection_cardinality: f64 = hll1.estimate_intersection_cardinality(&hll2);

        assert!(
                estimate_intersection_cardinality >= 2.0_f64 * 0.9 &&
                estimate_intersection_cardinality <= 2.0_f64 * 1.1,
                "MLE: Expected intersection cardinality to be around 2, got {}",
                estimate_intersection_cardinality
        );
}

No STD

This crate is designed to be as lightweight as possible and does not require any dependencies from the Rust standard library (std). As a result, it can be used in a bare metal or embedded context, where std may not be available. The only feature that requires std is the MLE estimation, which is optional.

Fuzzing

Fuzzing is a technique for finding security vulnerabilities and bugs in software by providing random input to the code. We make sure that our fuzz targets are continuously updated and run against the latest versions of the library to ensure that any vulnerabilities or bugs are quickly identified and addressed.

Learn more about how we fuzz here

Citations

Some relevant citations to learn more:

About

A Rust implementation of HyperLogLog trying to be parsimonious with memory.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published