mitakas/flink-apriori-java

flink-apriori-java

Apriori Algorithm in Apache Flink

Algorithm

This project implements the Apriori algorithm as described in the 1994 paper "Fast Algorithms for Mining Association Rules" by Rakesh Agrawal and Ramakrishnan Srikant.
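For orientation, the level-wise idea behind Apriori can be sketched in plain Java, without Flink. This is a minimal illustration and not the code in this repository: the class name `AprioriSketch`, its method names, and the use of `Integer` item ids are assumptions made for the example. It counts support for candidate itemsets of size k, keeps the frequent ones, and builds size k+1 candidates by joining frequent itemsets and pruning any candidate that has an infrequent subset.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical class for illustration; not part of this repository.
class AprioriSketch {

    // Returns every itemset of size 1..maxSize whose support
    // (fraction of transactions containing it) is at least minSupport.
    static Map<Set<Integer>, Double> frequentItemsets(
            List<Set<Integer>> transactions, double minSupport, int maxSize) {
        Map<Set<Integer>, Double> result = new LinkedHashMap<>();

        // Level 1 candidates: every distinct item as a singleton set.
        Set<Set<Integer>> candidates = new HashSet<>();
        for (Set<Integer> t : transactions) {
            for (Integer item : t) {
                candidates.add(new TreeSet<>(List.of(item)));
            }
        }

        for (int k = 1; k <= maxSize && !candidates.isEmpty(); k++) {
            // Counting pass: keep candidates that reach the support threshold.
            Set<Set<Integer>> frequent = new HashSet<>();
            for (Set<Integer> c : candidates) {
                long count = transactions.stream().filter(t -> t.containsAll(c)).count();
                double support = (double) count / transactions.size();
                if (support >= minSupport) {
                    frequent.add(c);
                    result.put(c, support);
                }
            }
            candidates = generateCandidates(frequent, k + 1);
        }
        return result;
    }

    // Join step: union pairs of frequent itemsets into itemsets of the given size.
    // Prune step: drop any candidate that has an infrequent subset one item smaller.
    static Set<Set<Integer>> generateCandidates(Set<Set<Integer>> frequent, int size) {
        Set<Set<Integer>> next = new HashSet<>();
        List<Set<Integer>> list = new ArrayList<>(frequent);
        for (int i = 0; i < list.size(); i++) {
            for (int j = i + 1; j < list.size(); j++) {
                Set<Integer> union = new TreeSet<>(list.get(i));
                union.addAll(list.get(j));
                if (union.size() == size && allSubsetsFrequent(union, frequent)) {
                    next.add(union);
                }
            }
        }
        return next;
    }

    // True if removing any single item from the candidate yields a frequent itemset.
    static boolean allSubsetsFrequent(Set<Integer> candidate, Set<Set<Integer>> frequent) {
        for (Integer item : candidate) {
            Set<Integer> subset = new TreeSet<>(candidate);
            subset.remove(item);
            if (!frequent.contains(subset)) {
                return false;
            }
        }
        return true;
    }
}
```

Calling `AprioriSketch.frequentItemsets(transactions, 0.5, 3)` returns a map from each frequent itemset to its support. The sketch rescans all transactions for every candidate, which is fine for a toy example but is exactly the cost that a distributed implementation tries to spread across workers.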

Paper

AGRAWAL, Rakesh, et al. Fast Algorithms for Mining Association Rules. In: Proc. 20th Int. Conf. Very Large Data Bases, VLDB. 1994. pp. 487-499.

Project

Build the jar file using the following command:

mvn clean package -Pbuild-jar

This should produce a file called flink-apriori-java-1.0-SNAPSHOT.jar in the target directory.

Parameters

  • input: location of the BMS-POS.dat file
  • output: output location; prints to stdout if not set
  • min-support: a real number in the range (0, 1]
  • itemset-size: an integer in the range (1, Infinity), i.e. greater than 1
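The ranges listed above can be checked before a job is started. The following is a minimal sketch, not the project's actual argument handling: the class `AprioriParams`, the `fromMap` method, and the use of a plain `Map<String, String>` are assumptions for illustration. It also assumes that every parameter except `output` is required, which the list does not state explicitly.

```java
import java.util.Map;

// Hypothetical helper for illustration; not taken from this repository.
class AprioriParams {
    final String input;
    final String output;      // null means "print to stdout"
    final double minSupport;  // must lie in (0, 1]
    final int itemsetSize;    // must be greater than 1

    AprioriParams(String input, String output, double minSupport, int itemsetSize) {
        // The negated form also rejects NaN.
        if (!(minSupport > 0.0 && minSupport <= 1.0)) {
            throw new IllegalArgumentException("min-support must be in (0, 1]: " + minSupport);
        }
        if (itemsetSize <= 1) {
            throw new IllegalArgumentException("itemset-size must be greater than 1: " + itemsetSize);
        }
        this.input = input;
        this.output = output;
        this.minSupport = minSupport;
        this.itemsetSize = itemsetSize;
    }

    // Builds validated parameters from raw key/value strings.
    static AprioriParams fromMap(Map<String, String> raw) {
        return new AprioriParams(
                require(raw, "input"),
                raw.get("output"), // optional
                Double.parseDouble(require(raw, "min-support")),
                Integer.parseInt(require(raw, "itemset-size")));
    }

    // Assumption: a missing or empty value is an error for required keys.
    private static String require(Map<String, String> raw, String key) {
        String value = raw.get(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("missing parameter: " + key);
        }
        return value;
    }
}
```

Failing early with an `IllegalArgumentException` keeps an out-of-range threshold from surfacing later as an empty or oversized result.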

Dependencies

Data

Download the KDD Cup 2000 Dataset. More info about the data here.

Preparation

After downloading the data, unpack the BMS-POS.dat file. Included in this repository is a checksum file for verifying the integrity of the file.

Steps:

  1. unzip -j KDDCup2000.zip assoc/BMS-POS.dat.gz
  2. gunzip BMS-POS.dat.gz
  3. sha1sum -c BMS-POS.dat.sha1
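On systems without `sha1sum`, the same integrity check can be approximated in Java. This is a rough sketch and not part of the repository: the class `Sha1Check` and its methods are hypothetical, and it assumes the checksum file follows the usual `sha1sum` layout, with the hex digest as the first whitespace-separated token on the first line.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not part of this repository.
class Sha1Check {

    // Streams the file through SHA-1 and returns the digest as lowercase hex.
    static String sha1Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Assumption: the checksum file starts with "<hex digest>  <file name>".
    static boolean matches(Path dataFile, Path checksumFile)
            throws IOException, NoSuchAlgorithmException {
        String content = new String(Files.readAllBytes(checksumFile), StandardCharsets.UTF_8).trim();
        String expected = content.split("\\s+")[0];
        return expected.equalsIgnoreCase(sha1Hex(dataFile));
    }
}
```

With the files from the steps above, `Sha1Check.matches(Path.of("BMS-POS.dat"), Path.of("BMS-POS.dat.sha1"))` would return `true` when the unpacked data is intact.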

TODO

  • Tests
  • Implement the ItemSetCalculateFrequency RichMapFunction in a more efficient manner

License

Apache License 2.0

This project uses libraries licensed under the Apache License 2.0.
