
Major Bug Fix, Added Indexing 4-Grams, Better Labeling of Output

@prohippo prohippo released this 15 Sep 17:22
· 42 commits to master since this release
8727933

Fixes a major bug in which 2-grams and 3-grams were mapped into overlapping index ranges. The overlap added indexing noise, which would increase the number of clusters obtained from a text data set. With the Google News text, the new code produces about the same number of clusters but leaves fewer items unclustered. This release also adds 40 more indexing 4-grams and improves the labeling of output from the diagnostic tools.
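To illustrate the kind of overlap described above, here is a minimal sketch of mapping 2-grams and 3-grams into disjoint index ranges. It is not the project's actual code: the 26-letter alphabet, the base offsets, and the names `ngram_code` and `index_of` are all assumptions made for the example.

```python
import string

# Assumption: a lowercase a-z alphabet; the real indexing alphabet may differ.
ALPHABET = string.ascii_lowercase
N = len(ALPHABET)

# Each n-gram length gets its own block of indices.
# The 3-gram block starts right after the last possible 2-gram index,
# so the two ranges cannot overlap.
BIGRAM_BASE = 0
TRIGRAM_BASE = BIGRAM_BASE + N ** 2


def ngram_code(gram):
    """Encode an n-gram as a base-N number within its own block."""
    code = 0
    for ch in gram:
        code = code * N + ALPHABET.index(ch)
    return code


def index_of(gram):
    """Return a unique index for a 2-gram or 3-gram (hypothetical helper)."""
    if len(gram) == 2:
        return BIGRAM_BASE + ngram_code(gram)
    if len(gram) == 3:
        return TRIGRAM_BASE + ngram_code(gram)
    raise ValueError("only 2-grams and 3-grams are handled in this sketch")
```

If the 3-gram base were set too low (for example, equal to the 2-gram base), distinct n-grams such as `"ab"` and `"aab"` would share an index, which is one way unrelated items could appear to match and add noise to the index.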