Skip to content

Releases: gakhov/pdsa

Implement frequency algorithms

27 Aug 14:09
Compare
Choose a tag to compare
Pre-release

NEW

  • Count Sketch algorithm implementation
  • Count-Min Sketch algorithm implementation

Count Sketch and Count–Min Sketch are simple space-efficient probabilistic data structures
that are used to estimate frequencies of elements in data streams and can address the Heavy hitters problem.

Count Sketch was proposed by Moses Charikar, Kevin Chen, and Martin Farach-Colton in 2002.
Count–Min Sketch was presented in 2003 by Graham Cormode and Shan Muthukrishnan and published in 2005.

Maintenance release

12 Jul 22:05
Compare
Choose a tag to compare
Maintenance release Pre-release
Pre-release

FIXES

  • Fix README to indicate that the minimal required Cython version is 0.28 (Thank you @jseabold)
  • Explicitly require Cython 0.28+ and Python 3.5+ in setup.py
  • Fix cardinality tests (or at least come one step closer to the correct test approach without actual big data)

Implemented HyperLogLog algorithm

09 May 09:50
Compare
Choose a tag to compare
Pre-release

NEW

  • HyperLogLog algorithm implementation

HyperLogLog algorithm was proposed by Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Frédéric Meunier in 2007. There is a number of modifications of the algorithm, but this implementation uses the original version with a 32-bit hash function.

FIXES

  • Fix cardinality estimation in Bloom filters
  • Correct the inline example for q-digest
  • Removed support for Python < 3.5

Implemented algorithms for cardinality and rank estimation

03 Nov 14:38
Compare
Choose a tag to compare

In this pre-release implemented the following algorithms:

Cardinality problem

  • Linear counter
  • Probabilistic counter (Flajolet–Martin algorithm)

Rank problem

  • Quantile Digest (q-digest)

Additionally, the overall code has been improved, added tests and examples how to use the implemented data structures.

Implement classical data structures for Membership queries

15 Aug 16:10
Compare
Choose a tag to compare

In this release first 2 classical data structures have been implemented:

  • Classical Bloom Filter
  • Counting Bloom Filter

In order to support memory-efficient representation we also developed the BitCounter and BitVector.