Releases: prohippo/ActiveWatch

Add Diagnostic Tools and Clarify Command Line Usage

20 Feb 20:45

This brings the DKYW diagnostic tool from the original Java reimplementation into the new AW repository. DKYW was extracted from the KEYWDR main module to facilitate testing and experimentation. This release also adds some usage advisories for DKYW and DQBE.

Add Diagnostic Tools Showing Details in Scaling Similarity Scores

31 Jan 22:46

The DSIM and DSMX tools were in the original C implementation of AW. They have been used to test the computation of statistically scaled similarity, but can also help identify which n-grams contribute the most to a final match score. There are two tools because they use different noise models: DSIM works with the similarity expected for random pairs of text items, while DSMX works with the scores of random items against a single fixed profile.
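As a rough illustration of what such a diagnostic reports, the sketch below scales a raw similarity score against a noise model and ranks the n-grams by their contribution. It assumes the raw score is an inner product of n-gram weights and that scaling is a simple z-score against a noise mean and standard deviation. The class and method names are hypothetical, and the actual AW computation may differ.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not the AW implementation.
final class ScaledSimilaritySketch {

    // Raw similarity: inner product over n-grams shared by both sparse vectors.
    static double rawSimilarity(Map<String, Double> a, Map<String, Double> b) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                sum += e.getValue() * w;
            }
        }
        return sum;
    }

    // Assumed scaling: standard deviations above the mean of the noise model.
    static double scale(double raw, double noiseMean, double noiseStdDev) {
        return (raw - noiseMean) / noiseStdDev;
    }

    // Per-n-gram contributions to the raw score, largest first.
    static List<Map.Entry<String, Double>> contributions(
            Map<String, Double> a, Map<String, Double> b) {
        List<Map.Entry<String, Double>> out = new ArrayList<>();
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                out.add(new AbstractMap.SimpleEntry<>(e.getKey(), e.getValue() * w));
            }
        }
        out.sort((x, y) -> Double.compare(y.getValue(), x.getValue()));
        return out;
    }
}
```

Under these assumptions, the only difference between a DSIM-style and a DSMX-style calculation would be where the noise mean and standard deviation come from.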

Add Basic Search With Profiles to Testing Tools

22 Jan 15:31

The QSRV, DQBE, and DQBK test tools were in the original Java reimplementation of ActiveWatch over two decades ago. They implement a basic search capability helpful for checking newer AW main modules that contain code written since 2021. They can also provide a direct way of exploring an AW data set, though falling short of being a full-fledged search engine.

Add Analytic Capabilities

18 Jan 18:45

This release adds three modules (PLOTTR, RANKER, HUBBER) to support analysis of Twitter text data, and it reorganizes and cleans up various source code files. The new modules can also be applied more generally to non-Twitter data. They sit on top of the current AW automatic clustering capability and involve only a little new code.

Fix Inflectional Stemming Bugs

10 Jan 20:37

This corrects some long-standing problems with the code for inflectional stemming. These were in the original C source code for ActiveWatch and were carried over in its incomplete migration to Java two decades ago. Although the change makes little appreciable difference, it will help make reports easier to interpret.

Enlarge Batch Size for Clustering of Text Items

28 Dec 17:56

This is to facilitate analysis of Twitter data, which will usually have many items of short length. The previous batch limit of 8,192 has been increased to 16,384, and data structures have been expanded to accommodate more resulting clusters. Earlier batch limits reflect the capabilities of computing hardware decades ago.

Upgrade AW Clustering to Handle Twitter Data

17 Dec 21:45

Twitter data has short actual text and many different special markers that can add noise to indexing. This release extends default alphabetic 4-grams and literal n-grams to reduce noise in general and also makes it easier to control the AW clustering algorithm.

Fix Bug in Buffering UTF-8 Text Input

06 Dec 19:15

This release corrects a long-standing bug in the buffering of UTF-8 text data input for analysis. The original C code for AW was ASCII only, and its input handling had to be completely rewritten for the Java reimplementation. In v2.3, processing errors become evident for text with emoticons, but problems could also arise with non-ASCII punctuation.
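For readers unfamiliar with this class of bug, the sketch below shows one common way UTF-8 buffering goes wrong: decoding fixed-size byte chunks independently, so that a multibyte character straddling a chunk boundary is corrupted. This is only an illustration and is not necessarily the defect fixed in this release. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the AW input code.
final class Utf8BufferSketch {

    // Fragile: each byte chunk is decoded on its own, so a character whose
    // bytes span two chunks turns into U+FFFD replacement characters.
    static String decodeNaively(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i += chunkSize) {
            int n = Math.min(chunkSize, data.length - i);
            sb.append(new String(data, i, n, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Safer: the reader's decoder carries incomplete byte sequences over
    // from one read to the next.
    static String decodeWithReader(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            char[] buf = new char[chunkSize];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

ASCII-only input never triggers the problem because every character is a single byte, which is consistent with such bugs staying hidden until text with emoticons or non-ASCII punctuation is processed.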

Add 4-Gram Indices, Fix Vector Squeezing Bug for Clustering

20 Nov 21:51

This adds 80 alphabetic 4-grams that probably should have been included previously for text indexing. Indexing entropy increased slightly as a result, indicating we are on the right track. A bug in the squeezing of index vectors before clustering was discovered because of the change in indexing and was fixed.
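To make the entropy remark concrete, the sketch below counts how often each entry of a small 4-gram table occurs in some words and then computes the Shannon entropy of those counts. It assumes that "indexing entropy" means the entropy of the distribution of index hits, which may not match AW's exact definition. The table contents and all names are hypothetical.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not the AW indexing code.
final class IndexEntropySketch {

    // Count occurrences of each table 4-gram across the given words.
    static int[] countHits(List<String> words, List<String> table) {
        int[] counts = new int[table.size()];
        for (String word : words) {
            String w = word.toLowerCase(Locale.ROOT);
            for (int i = 0; i + 4 <= w.length(); i++) {
                int k = table.indexOf(w.substring(i, i + 4));
                if (k >= 0) {
                    counts[k]++;
                }
            }
        }
        return counts;
    }

    // Shannon entropy in bits of the distribution given by the counts.
    static double entropyBits(int[] counts) {
        long total = 0;
        for (int c : counts) {
            total += c;
        }
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }
}
```

Under this reading, entropy rises when hits are spread more evenly over more indices, so each index carries more information about an item.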

Fix Stopword Lookup Bug When Some Have Periods or Apostrophes

09 Nov 03:52

The bug was in the handling of periods or apostrophes in stopwords. In special cases, these could make ordinary stopwords unfindable in the AW stopword table. The bug first became noticeable in v2.0.
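The sketch below illustrates the general kind of fix involved: applying the same normalization to stopwords when the table is built and when a token is looked up, so that periods and apostrophes cannot cause a mismatch. The normalization rules shown (lowercasing, mapping a curly apostrophe to a straight one, dropping trailing periods) are assumptions for illustration and are not taken from the AW source. All names are hypothetical.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not the AW stopword table.
final class StopwordLookupSketch {
    private final Set<String> table = new HashSet<>();

    StopwordLookupSketch(String... words) {
        for (String w : words) {
            table.add(normalize(w)); // same normalization as at lookup time
        }
    }

    // Assumed rules: lowercase, straighten apostrophes, drop trailing periods.
    static String normalize(String s) {
        String t = s.toLowerCase(Locale.ROOT).replace('\u2019', '\'');
        int end = t.length();
        while (end > 0 && t.charAt(end - 1) == '.') {
            end--;
        }
        return t.substring(0, end);
    }

    boolean isStopword(String token) {
        return table.contains(normalize(token));
    }
}
```

For example, a table built from `"the"`, `"don't"`, and `"e.g."` would then match the tokens `The.` and `e.g.` as well as `don't` written with a curly apostrophe.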