Releases · apache/druid

We are pleased to announce a new stable Druid release, version 0.6.73. New features include:

A production-tested dimension cardinality estimation module

We recently open-sourced our HyperLogLog module, described in bit.ly/1fIEpjM and bit.ly/1ebLnNI. Documentation has been added on how to use this module as an aggregator and as part of post-aggregators.
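As a rough illustration of using the module as an aggregator and inside a post-aggregator, the sketch below builds a query body in Python. The `hyperUnique` aggregator and `hyperUniqueCardinality` post-aggregator types are the names used in the Druid documentation as far as we recall. The data source `events`, the HLL metric `user_id_hll`, and the `count` metric are hypothetical placeholders, so check the field names against the docs for your version.

```python
import json


def build_unique_users_query(data_source, interval, hll_metric):
    """Build a timeseries query body that estimates distinct values.

    Assumes `hll_metric` was ingested as a HyperLogLog metric and that a
    rolled-up row counter named "count" exists (both are hypothetical).
    """
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        "granularity": "day",
        "intervals": [interval],
        "aggregations": [
            {"type": "longSum", "name": "rows", "fieldName": "count"},
            # Aggregator: merges HLL sketches into one cardinality estimate.
            {"type": "hyperUnique", "name": "unique_users", "fieldName": hll_metric},
        ],
        "postAggregations": [
            {
                "type": "arithmetic",
                "name": "rows_per_user",
                "fn": "/",
                "fields": [
                    {"type": "fieldAccess", "name": "rows", "fieldName": "rows"},
                    # Post-aggregator: exposes the estimate for arithmetic.
                    {"type": "hyperUniqueCardinality", "fieldName": "unique_users"},
                ],
            }
        ],
    }


query = build_unique_users_query("events", "2014-02-01/2014-03-01", "user_id_hll")
print(json.dumps(query, indent=2))
```

The printed JSON is what would be POSTed to a broker. The result is an estimate, not an exact distinct count.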

Hash-based partitioning

We recently introduced a new sharding format for batch indexing. We use the HyperLogLog module to estimate the size of a data set and create partitions based on this size. In our tests, this hash-based partitioning is faster and produces more evenly sized segments.
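The general idea can be sketched in a few lines of Python. This is not Druid's indexer code: the function names, the SHA-256 hash, and the target shard size are all assumptions chosen for illustration, and the size estimate is passed in as a plain number where Druid would use its HyperLogLog estimate.

```python
import hashlib
import math


def num_shards(estimated_rows, target_rows_per_shard):
    """Pick a shard count from an approximate row-count estimate."""
    return max(1, math.ceil(estimated_rows / target_rows_per_shard))


def shard_for(row, dimensions, shards):
    """Assign a row to a shard by hashing its dimension values."""
    key = "\x01".join(str(row.get(d, "")) for d in dimensions)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % shards


# Hypothetical input: 10,000 rows with distinct "page" values.
rows = [{"page": f"page-{i}", "country": "US"} for i in range(10_000)]
shards = num_shards(estimated_rows=len(rows), target_rows_per_shard=2_500)

counts = [0] * shards
for r in rows:
    counts[shard_for(r, ["page", "country"], shards)] += 1
print(shards, counts)
```

Because a row's shard depends only on a hash of its dimension values, no sorted pass over the data is needed to find partition boundaries, and a well-mixed hash spreads rows roughly evenly across shards.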

Cross-tier replication

We can now replicate segments across different tiers. This means that you can create a “hot” tier that loads a single copy of the data on more powerful hardware and a “cold” tier that loads another copy of the data on less powerful hardware. This can lead to significant reductions in infrastructure costs.
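To make the hot/cold setup concrete, here is a toy Python sketch of a per-tier replication rule and a naive placement routine. The rule field names (`loadByPeriod`, `tieredReplicants`) follow the coordinator rule format as we recall it and should be confirmed against the documentation. The tier names, node names, and the placement logic are hypothetical and much simpler than what the coordinator actually does.

```python
# Hypothetical load rule: keep one copy of the last month of data in each tier.
# Field names are an assumption; verify them against the rule documentation.
rule = {
    "type": "loadByPeriod",
    "period": "P1M",
    "tieredReplicants": {"hot": 1, "cold": 1},
}

# Hypothetical cluster layout: node name -> tier.
nodes = {"h1": "hot", "h2": "hot", "c1": "cold", "c2": "cold", "c3": "cold"}


def place_segment(segment_id, rule, nodes):
    """Pick the nodes that should load one segment, tier by tier."""
    placement = []
    for tier, replicants in sorted(rule["tieredReplicants"].items()):
        candidates = sorted(n for n, t in nodes.items() if t == tier)
        if len(candidates) < replicants:
            raise ValueError(f"tier {tier!r} has too few nodes")
        # Rotate the starting node by segment id so load spreads over the tier.
        start = sum(segment_id.encode("utf-8")) % len(candidates)
        for i in range(replicants):
            placement.append(candidates[(start + i) % len(candidates)])
    return placement


placement = place_segment("events_2014-02-01", rule, nodes)
print(placement)
```

Each segment ends up with one replica on a hot node and one on a cold node, which is the arrangement described above.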

Nested GroupBy Queries

Thanks to an awesome contribution from Yuval Oren et al., we can now do multi-level aggregation with groupBys. More info here: https://groups.google.com/forum/#!topic/druid-development/8oL28iuC4Gw
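A nested groupBy uses one groupBy query as the data source of another. The Python sketch below builds such a query body. The `{"type": "query", "query": ...}` data source form is how we understand the feature to be expressed, while the data source `events`, the dimensions, and the `count` metric are hypothetical.

```python
import json

interval = "2014-02-01/2014-03-01"

# Inner query: events per (country, user) pair.
inner = {
    "queryType": "groupBy",
    "dataSource": "events",
    "granularity": "all",
    "intervals": [interval],
    "dimensions": ["country", "user"],
    "aggregations": [
        {"type": "longSum", "name": "events", "fieldName": "count"},
    ],
}

# Outer query: aggregate the inner result rows again, this time per country.
outer = {
    "queryType": "groupBy",
    "dataSource": {"type": "query", "query": inner},
    "granularity": "all",
    "intervals": [interval],
    "dimensions": ["country"],
    "aggregations": [
        # Counts inner rows, i.e. (country, user) groups, per country.
        {"type": "count", "name": "users"},
        {"type": "longSum", "name": "events", "fieldName": "events"},
    ],
}

print(json.dumps(outer, indent=2))
```

The outer aggregators refer to the output names of the inner query (`events` here), not to the metrics of the underlying data source.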

GroupBy memory improvements

We’ve improved how multi-threaded groupBy queries use memory. This should help reduce memory pressure on nodes running concurrent, expensive groupBy queries.

Real-time ingestion stability improvements

We’ve seen some stability issues in real-time ingestion under a high number of concurrent persists, and have added smarter throttling to handle this type of workload.

Additional features

multi-data center distribution (experimental)
request tracing
restore tasks (to restore archived segments)
memcached stability improvements
indexing service stability improvements
smarter autoscaling in the indexing service
numerous bug fixes
new documentation for production configurations

Things on our plate

Reducing CPU usage on the broker nodes when interacting with the cache (we are seeing query bottlenecks when merging too many results from memcached)
Having historical nodes populate memcached (so bySegment results are no longer returned and historical nodes can do their own local merging)
Consolidating batch and real-time ingestion schemas so we can move towards a simpler data ingestion model
Scaling groupBys with off-heap result merging
Improving real-time ingestion stability and performance by moving to more off-heap data structures
Autoscaling and sharding the real-time ingestion pipeline
Evaluating append-only updates for streaming data (https://github.com/metamx/druid/issues/418)


Druid 0.6.73 - Stable
