
Releases: prohippo/ActiveWatch

More Built-in 4-Gram Indices and Literal Indices

21 Jan 06:45

This release adds 60 alphabetic 4-grams to the AW built-in index set. In the AW demonstration, this finally pushed the total probability mass of 4-grams over that of 3-grams. It also restored the number of output clusters in that demonstration to its previous level of around 80.
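As a rough illustration of the comparison above, the following sketch groups index occurrence counts by n-gram length and reports each length's share of the total. It assumes that "probability mass" means the fraction of all index occurrences contributed by n-grams of a given length; the class name `NgramMassSketch`, the method `massByLength`, and the sample counts are hypothetical and are not taken from the AW source.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not part of the AW code base.
final class NgramMassSketch {

    // Returns, for each n-gram length, the fraction of all occurrences
    // contributed by n-grams of that length (assumed meaning of "mass").
    static Map<Integer, Double> massByLength(Map<String, Long> counts) {
        long total = 0;
        Map<Integer, Long> byLength = new TreeMap<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            byLength.merge(e.getKey().length(), e.getValue(), Long::sum);
            total += e.getValue();
        }
        Map<Integer, Double> mass = new TreeMap<>();
        for (Map.Entry<Integer, Long> e : byLength.entrySet()) {
            mass.put(e.getKey(), total == 0 ? 0.0 : (double) e.getValue() / total);
        }
        return mass;
    }

    public static void main(String[] args) {
        // Made-up counts, for illustration only.
        Map<String, Long> counts = Map.of("the", 3L, "and", 1L, "tion", 4L, "ment", 2L);
        massByLength(counts).forEach((n, m) ->
                System.out.printf("%d-grams: %.2f%n", n, m));
    }
}
```

With counts like these, the 4-grams would hold more of the mass than the 3-grams, which is the kind of crossover the release describes.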

Fix Noisy Clusters in Google News Demonstration

20 Jan 00:43

The growth in AW built-in 4- and 5-grams has increased the likelihood of visible noisy matches with common 2- and 3-grams that noise reduction efforts have so far missed.

Continuing to Extend Alphabetic 4- and 5-Gram Indexing

17 Jan 21:08

Putting more n-grams into an AW index set will generally reduce noise. The latest additions include mostly common English 5-letter words listed by an ESL web site, which became alphabetic 5-grams. This has a big effect on clustering.

General Cleanup

16 Jan 03:02

This fixes various small problems in inflectional and morphological stemming that showed up in high-frequency alphabetic 3-grams. More 4- and 5-grams were also added to AW indexing. I also learned how to use Markdown to make the README release history more readable.

Upload Missing Source Files, New Modules, Extended Indexing

11 Jan 22:18

The repository should now finally have all the Java source files to compile the entire AW demonstration system. Sorry about the slip-up here. The latest release also includes two new AW processing modules for user-defined classification profiles to go alongside cluster-defined profiles. The number of alphabetic 4-gram indices is now up to 2,320. Documentation was cleaned up.

Extend 4- and 5-Gram Indexing

08 Jan 06:39

Still adding alphabetic 4- and 5-grams for indexing. Now up to 2,290 4-grams and 630 5-grams.

Add N-Gram Indexing Options, More 4- and 5-Grams

06 Jan 16:53

This release allows experimental indexing without 5-grams or without both 4- and 5-grams. The total number of alphabetic 4- and 5-grams has increased again, bringing indexing entropy to 11.6 bits.
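The release notes do not define how indexing entropy is computed. A minimal sketch, assuming it is the Shannon entropy (in bits) of the distribution of index occurrences, might look like the following. The class name `IndexEntropySketch`, the method `entropyBits`, and the sample counts are hypothetical and do not come from the AW source.

```java
import java.util.Map;

// Hypothetical sketch, not part of the AW code base.
final class IndexEntropySketch {

    // Shannon entropy in bits of the distribution implied by occurrence counts.
    // Assumption: "indexing entropy" is measured this way.
    static double entropyBits(Map<String, Long> counts) {
        long total = 0;
        for (long c : counts.values()) {
            if (c > 0) total += c;   // ignore zero or negative counts
        }
        if (total == 0) return 0.0;
        double h = 0.0;
        for (long c : counts.values()) {
            if (c <= 0) continue;
            double p = (double) c / total;
            h -= p * (Math.log(p) / Math.log(2.0));   // log base 2
        }
        return h;
    }

    public static void main(String[] args) {
        // Made-up counts, for illustration only.
        Map<String, Long> counts = Map.of("the", 4L, "tion", 2L, "which", 1L, "ing", 1L);
        System.out.printf("entropy = %.3f bits%n", entropyBits(counts));
    }
}
```

Under this reading, entropy rises when occurrences are spread more evenly over more indices, which would be consistent with the figure growing as more 4- and 5-grams are added.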

Add Diagnostic Tools, Extend 4- and 5-Gram Indexing

01 Jan 22:46

This release adds the DPRO and DLST diagnostic tools, allowing users to view cluster profiles and match lists in more detail. These will make the AW demo more transparent. The AW built-in n-gram index set has again been extended to over 2,700 alphabetic 4- and 5-grams. The AW User Manual has been extended and cleaned up.

Big Extension of 4-Gram Indexing

30 Dec 15:06

This was to see how much closer we could move the total number of 4-gram occurrences in the AW demonstration to the total number of 3-gram occurrences. There is still a way to go here, but indexing entropy did increase again.

Minor Cleanup and Further Extension of Built-In 4- and 5-Grams

23 Dec 19:20

The number of AW built-in alphabetic 4-grams has reached 2,020. This probably will keep rising, but for philosophical and practical reasons, an AW upper limit for 4-grams has been set at 2,500. It is getting harder to find hundreds more alphabetic 4-grams that are common enough to make much difference in n-gram indexing.