Releases · prohippo/ActiveWatch

20 Feb 22:06

prohippo

v2.9.3

a9e8c43

Add 4-Gram Indices For Chemical Nomenclature

Latest

This mostly prepares for a new AW demonstration with scientific text.

07 Feb 01:12

prohippo

v2.9.2.1

58b3bfd

Minor Substitution of 4-Grams

We need IOUS as an indexing feature. It replaces OXYL to keep the AW 4-gram count at 2,960. The change from v2.9.2 to v2.9.2.1 should have only a minor effect on AW.

30 Jan 20:05

prohippo

v2.9.2

34342a1

Larger N-Gram Index Set

Another round of adding 4- and 5-grams for sharper finite indexing. This produces fewer clusters than before, but they should be less noisy. We are reaching the point of diminishing returns.
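
As a rough illustration of indexing against a finite n-gram set, the sketch below counts only those 4- and 5-grams of each word that appear in a fixed set. Everything here is an assumption made for illustration: the function names are hypothetical, the tiny sets stand in for the real built-in ones (only IOUS is taken from these notes), and counting overlapping matches such as TION inside ATION may not reflect how AW actually resolves them.

```python
from collections import Counter

# Hypothetical stand-ins for the built-in index sets; the real sets are far larger.
# IOUS is named in the v2.9.2.1 note; the other entries are made up for this sketch.
INDEX_SETS = {
    4: {"IOUS", "TION", "MENT"},
    5: {"ATION"},
}


def index_word(word, index_sets):
    """Count every n-gram of `word` that belongs to one of the finite index sets."""
    letters = word.upper()
    counts = Counter()
    for n, index_set in index_sets.items():
        # Slide a window of width n across the word.
        for start in range(len(letters) - n + 1):
            gram = letters[start:start + n]
            if gram in index_set:
                counts[gram] += 1
    return counts


def index_text(text, index_sets):
    """Build a sparse count vector for a text, keyed by n-gram."""
    counts = Counter()
    for token in text.split():
        # Keep alphabetic characters only (an assumption about tokenization).
        cleaned = "".join(ch for ch in token if ch.isalpha())
        counts.update(index_word(cleaned, index_sets))
    return counts


vector = index_text("Various nations made a statement.", INDEX_SETS)
```

N-grams outside the set are simply ignored, so the vector stays sparse and its dimensionality is bounded by the size of the index set, which is the sense of "finite" assumed here.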

01 Jan 03:52

prohippo

v2.9.1

0d220c8

v2.9.1

One more expansion of built-in AW indexing.

08 Oct 23:30

prohippo

v2.9

47f7880

More N-Grams, Extend Stemming Rules, Clean up Profile Generation

Miscellaneous improvements: add 40 4-grams and 10 5-grams, fix errors and omissions in morphological stemming rules, improve output of DCMS tool, simplify profile generation for keyword scanning and clean up source code.

21 Sep 02:22

prohippo

v2.8.4

6544c91

More 4- and 5-Grams, Generalize Sequential Scan of Vectors

This is a simple extension of v2.8.3, with 60 new 4-grams and 10 new 5-grams. It also upgrades the AW sequential scan of vectors to work more like a search engine.

15 Sep 23:25

prohippo

v2.8.3

d897496

Extend Indexing, Update Core Modules, New and Modified Tools

Miscellaneous upgrade of AW capabilities: 20 new 4-gram indices and 50 new 5-gram indices; a larger link buffer for CLUSTR; an extension of WATCHR to optionally show vectors with high-probability indices as well as low-probability ones; upgrades to DSSG and DSRV; and new DKYS, DKTG, and DTOP tools. The changes were in support of analysis of presidential tweets (2017-2021).

15 Sep 17:22

prohippo

v2.8.2

8727933

Major Bug Fix, Added Indexing 4-Grams, Better Labeling of Output

Fixes a major bug with mapping of 2- and 3-grams into overlapping ranges. This would add indexing noise, which would increase the number of clusters obtained with a text data set. The new code produces about the same number of clusters with the Google News text, but has fewer unclustered items. Also adds 40 more indexing 4-grams and improves the labeling of output from diagnostic tools.
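
The note does not describe AW's actual numbering scheme, so the following is only a sketch of how overlapping ranges can arise and how an offset avoids them. It assumes, purely for illustration, that every alphabetic 2-gram and 3-gram is encoded as a base-26 number; the names `ngram_code`, `index_number`, and the offsets are hypothetical. Without an offset, "AB" and "AAB" would both encode to 1 and share an index.

```python
import string

ALPHABET = string.ascii_uppercase
BASE = len(ALPHABET)  # 26 letters

# Assumed layout: 2-grams occupy [0, 676); 3-grams start immediately after.
OFFSET_2GRAM = 0
OFFSET_3GRAM = OFFSET_2GRAM + BASE ** 2


def ngram_code(gram):
    """Encode an alphabetic n-gram as a base-26 number (AA -> 0, AB -> 1, ...)."""
    code = 0
    for ch in gram.upper():
        code = code * BASE + ALPHABET.index(ch)
    return code


def index_number(gram):
    """Map a 2- or 3-gram to an index number in its own, non-overlapping range."""
    if len(gram) == 2:
        return OFFSET_2GRAM + ngram_code(gram)
    if len(gram) == 3:
        return OFFSET_3GRAM + ngram_code(gram)
    raise ValueError("only 2- and 3-grams are handled in this sketch")


# The raw codes collide, but the offset index numbers do not.
collides_without_offset = ngram_code("AB") == ngram_code("AAB")
distinct_with_offset = index_number("AB") != index_number("AAB")
```

When two different n-grams share an index number, unrelated text segments look spuriously similar or dissimilar, which is one plausible way such a bug would show up as indexing noise.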

29 Jun 03:43

prohippo

v2.8.1

593e5e8

v2.8.1

This release adds 60 more alphabetic 4-grams to the AW built-in index set. This finally nudges up indexing entropy a bit. A new tool DCMS now lets a user find out what profiles were matched by a given text segment. Documentation now refers to "user-defined n-grams" instead of "literal n-grams."

23 Mar 12:44

prohippo

v2.8

80e4e2d

Check Effect of Adding 100 Alphabetic 4-Gram Indices

We should be close to a point of diminishing returns for indexing of English text with more alphabetic 4-grams. Release v2.8 now has about 11 thousand 2-, 3-, 4-, and 5-gram built-in indices, plus up to another 2 thousand user-provided prefix and suffix indices (or literal n-grams). The entropy of indexing with the Google News sample seems to be stuck at about 11.60 bits, but defining another 100 4-grams seems doable to see how big a finite set of indices might grow.
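
For readers unfamiliar with the measure, here is a minimal sketch of one common way to compute an entropy in bits over index occurrences: the Shannon entropy of the relative frequencies of the indices. Whether AW computes its indexing entropy exactly this way is an assumption, and the function name `indexing_entropy` is hypothetical. Under this definition, a set of about 11 thousand indices used uniformly would give an upper bound of log2(11000), roughly 13.4 bits.

```python
import math


def indexing_entropy(index_counts):
    """Shannon entropy, in bits, of the distribution of index occurrences.

    `index_counts` maps each index (for example an n-gram) to the number
    of times it occurred in a text sample.
    """
    total = sum(index_counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in index_counts.values():
        if count > 0:
            p = count / total  # relative frequency of this index
            entropy -= p * math.log2(p)
    return entropy


# Four equally frequent indices carry exactly 2 bits.
uniform = indexing_entropy({"TION": 5, "MENT": 5, "IOUS": 5, "ATION": 5})

# A skewed distribution over the same four indices carries less.
skewed = indexing_entropy({"TION": 17, "MENT": 1, "IOUS": 1, "ATION": 1})
```

Adding new indices raises this value only if they occur reasonably often and spread the probability mass more evenly, which is consistent with the plateau described above.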
