Releases: apache/druid

Druid 0.6.73 - Stable

18 Jun 20:37
@fjy fjy

We are pleased to announce a new stable release of Druid, version 0.6.73. New features include:

A production-tested dimension cardinality estimation module

We recently open-sourced our HyperLogLog module, described in bit.ly/1fIEpjM and bit.ly/1ebLnNI. Documentation has been added on how to use this module as an aggregator and as part of post aggregators.
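As a rough illustration of using the module at query time, the sketch below builds a timeseries query that combines a `hyperUnique` aggregator with a `hyperUniqueCardinality` post aggregator. The data source name `events`, the metric name `unique_users`, and the interval are hypothetical, and the sketch assumes `unique_users` was ingested as a hyperUnique metric. Check the field names against the Druid documentation for your version before relying on them.

```python
import json


def build_unique_users_query(data_source, interval):
    """Build a timeseries query dict that estimates distinct users per day.

    All names passed in or used below ("rows", "unique_users") are
    illustrative assumptions, not names defined by Druid itself.
    """
    return {
        "queryType": "timeseries",
        "dataSource": data_source,
        "granularity": "day",
        "intervals": [interval],
        "aggregations": [
            {"type": "count", "name": "rows"},
            # HyperLogLog-based cardinality estimate over a hyperUnique metric.
            {"type": "hyperUnique", "name": "unique_users", "fieldName": "unique_users"},
        ],
        "postAggregations": [
            {
                # Average number of rows per estimated distinct user.
                "type": "arithmetic",
                "name": "rows_per_user",
                "fn": "/",
                "fields": [
                    {"type": "fieldAccess", "name": "rows", "fieldName": "rows"},
                    {"type": "hyperUniqueCardinality", "fieldName": "unique_users"},
                ],
            }
        ],
    }


query = build_unique_users_query("events", "2014-06-01/2014-06-08")
print(json.dumps(query, indent=2))
```

The printed JSON is what would be posted to a broker node. The post aggregator is needed because the raw `hyperUnique` result cannot be used directly in arithmetic.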

Hash-based partitioning

We recently introduced a new sharding format for batch indexing. We use the HyperLogLog module to estimate the size of a data set and create partitions based on this size. In our tests, partitioning via this hash-based method is faster and leads to more evenly partitioned segments.
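The idea can be sketched in a few lines: derive a partition count from an estimated data set size, then route each row by a stable hash. This is a conceptual sketch only, not Druid's actual implementation. The function names, the use of MD5, the target partition size, and the row keys are all assumptions made for illustration.

```python
import hashlib


def num_partitions(estimated_rows, target_rows_per_partition):
    """Ceiling division, with a minimum of one partition."""
    return max(1, -(-estimated_rows // target_rows_per_partition))


def partition_for(row_key, partitions):
    """Map a row key to a partition using a stable (non-randomized) hash."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions


# Hypothetical numbers: an estimate such as HyperLogLog would provide,
# and a made-up target size per partition.
estimated_rows = 12_500_000
partitions = num_partitions(estimated_rows, 5_000_000)

# Count how many of 10,000 sample keys land in each partition.
counts = [0] * partitions
for i in range(10_000):
    counts[partition_for(f"row-{i}", partitions)] += 1

print(partitions, counts)
```

Because the hash spreads keys roughly uniformly, partition sizes stay close to each other without a separate pass to find range boundaries, which is consistent with the speed and evenness observations above.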

Cross-tier replication

We can now replicate segments across different tiers. This means that you can create a “hot” tier that loads a single copy of the data on more powerful hardware and a “cold” tier that loads another copy of the data on less powerful hardware. This can lead to significant reductions in infrastructure costs.
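To make the hot/cold idea concrete, here is a toy model of placing one copy of a segment in each tier. It is a conceptual sketch, not Druid's coordinator logic or rule syntax. The tier names, server names, data layout, and least-loaded selection policy are all assumptions for illustration.

```python
def assign_segment(segment_id, tiers, replicants):
    """Place `replicants[tier]` copies of a segment on servers in each tier.

    `tiers` maps a tier name to a list of server dicts; `replicants` maps a
    tier name to the number of copies wanted there. Both are hypothetical
    structures for this sketch.
    """
    placement = {}
    for tier, count in replicants.items():
        servers = tiers.get(tier, [])
        if count > len(servers):
            raise ValueError(f"tier {tier!r} has too few servers for {count} copies")
        # Prefer the servers currently holding the fewest segments.
        chosen = sorted(servers, key=lambda s: len(s["segments"]))[:count]
        for server in chosen:
            server["segments"].append(segment_id)
        placement[tier] = [server["name"] for server in chosen]
    return placement


tiers = {
    "hot": [{"name": "hot-1", "segments": []}, {"name": "hot-2", "segments": []}],
    "cold": [{"name": "cold-1", "segments": []}],
}
replicants = {"hot": 1, "cold": 1}

first = assign_segment("seg-a", tiers, replicants)
second = assign_segment("seg-b", tiers, replicants)
print(first)
print(second)
```

Each segment ends up with one copy on the more powerful tier and one on the cheaper tier, rather than two copies on the same class of hardware.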

Nested GroupBy Queries

Thanks to an awesome contribution from Yuval Oren et al., we can now do multi-level aggregation with groupBy queries. More info here: https://groups.google.com/forum/#!topic/druid-development/8oL28iuC4Gw
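A nested groupBy is expressed by using an inner query as the data source of an outer one. The sketch below builds such a query as a Python dict. The data source `events`, the dimensions `country` and `user_id`, the metric `count`, and the interval are hypothetical. See the linked thread and the Druid documentation for the exact syntax supported in this release.

```python
import json

INTERVAL = "2014-06-01/2014-06-08"  # hypothetical interval

# Inner query: event totals per (country, user_id) pair.
inner = {
    "queryType": "groupBy",
    "dataSource": "events",
    "granularity": "all",
    "intervals": [INTERVAL],
    "dimensions": ["country", "user_id"],
    "aggregations": [
        {"type": "longSum", "name": "events", "fieldName": "count"},
    ],
}

# Outer query: re-aggregate the inner rows per country.
outer = {
    "queryType": "groupBy",
    "dataSource": {"type": "query", "query": inner},
    "granularity": "all",
    "intervals": [INTERVAL],
    "dimensions": ["country"],
    "aggregations": [
        {"type": "longSum", "name": "total_events", "fieldName": "events"},
        # Each inner row is one user, so counting rows counts users.
        {"type": "count", "name": "users"},
    ],
}

print(json.dumps(outer, indent=2))
```

The outer query refers to `events`, the output name of the inner aggregator, rather than to a column of the underlying data source.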

GroupBy memory improvements

We’ve improved how multi-threaded groupBy queries utilize memory. This should help reduce memory pressure on nodes that run concurrent, expensive groupBy queries.

Real-time ingestion stability improvements

We’ve seen some stability issues in real-time ingestion under a high number of concurrent persists, and have added smarter throttling to handle this type of workload.

Additional features

  • multi-data center distribution (experimental)
  • request tracing
  • restore tasks (to restore archived segments)
  • memcached stability improvements
  • indexing service stability improvements
  • smarter autoscaling in the indexing service
  • numerous bug fixes
  • new documentation for production configurations

Things on our plate

  • Reducing CPU usage on the broker nodes when interacting with the cache (we are seeing query bottlenecks when merging too many results from memcached)
  • Having historical nodes populate memcached (so bySegment results are no longer returned and historical nodes can do their own local merging)
  • Consolidating batch and real-time ingestion schemas so we can move towards a simpler data ingestion model
  • Scaling groupBys with off-heap result merging
  • Improving real-time ingestion stability and performance by moving to more off-heap data structures
  • Autoscaling and sharding the real-time ingestion pipeline
  • Evaluating append-only style updates for streaming data (https://github.com/metamx/druid/issues/418)

Druid 0.6.52 - Stable

18 Jun 20:40
@fjy fjy
druid-0.6.52

[maven-release-plugin]  copy for tag druid-0.6.52