HyperLogLog
This website is designed to explore some of the concepts found in the following papers
- HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm (2007)
- HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm (2013)
- personally I found this paper a little easier to read
The site consists of the following sections
- A section detailing how items are added to a HyperLogLog instance
- A section detailing how the count is appoximated
- A section detailing how HyperLogLog supports merging
- A section that describes where to go next for implementations and uses
Elevator pitch
HyperLogLog is a cardinality estimator designed to solve the count-distinct problem.
It is:
- ✅ Fast
- ✅ Memory efficient
- ✅ Parallellisable
- ✅ Commutative
It might not:
- ⚠️ Give an accurate count - the result could be +/- the actual value by a small margin.
Possible uses:
- Seeing roughly how many unique users visited a website
- Time series data where distinct operation is required
- Database query planning
- Exploratory analysis