Major Bug Fix, Added Indexing 4-Grams, Better Labeling of Output
Fixes a major bug in which 2-grams and 3-grams were mapped into overlapping index ranges. The overlap added indexing noise, which tends to increase the number of clusters obtained with a text data set. The new code produces about the same number of clusters with the Google News text, but leaves fewer items unclustered. Also adds 40 more indexing 4-grams and improves the labeling of output from the diagnostic tools.
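
To illustrate the kind of overlap described above, here is a minimal sketch in Python. It is not the project's actual code: the base-26 encoding, the offsets, and the function names (ngram_code, index_overlapping, index_disjoint) are all assumptions made for illustration. It shows how a 2-gram and a 3-gram can collide on one index when both ranges start at the same place, and how giving each n-gram length its own offset avoids that.

```python
# Illustrative only: the encoding and range sizes are assumed, not taken
# from the actual indexing code.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)


def ngram_code(gram):
    """Encode a lowercase n-gram as a base-26 integer in [0, 26**n)."""
    code = 0
    for ch in gram:
        code = code * BASE + ALPHABET.index(ch)
    return code


def index_overlapping(gram):
    """Buggy layout: 2-grams and 3-grams both start at index 0."""
    return ngram_code(gram)


# Disjoint layout: 2-grams occupy [0, 676); 3-grams start right after them.
OFFSET_2 = 0
OFFSET_3 = OFFSET_2 + BASE ** 2


def index_disjoint(gram):
    """Fixed layout: each n-gram length gets its own index range."""
    offset = {2: OFFSET_2, 3: OFFSET_3}[len(gram)]
    return offset + ngram_code(gram)


if __name__ == "__main__":
    for gram in ("ab", "aab"):
        print(gram, index_overlapping(gram), index_disjoint(gram))
```

In the overlapping layout, "ab" and "aab" share an index, so unrelated n-grams are counted as the same feature; that is the sort of indexing noise the fix removes.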