Releases: prohippo/ActiveWatch

Allow Users To Choose Maximum N in N-Gram Indexing

01 Nov 23:23

This release adds a user option to set the maximum n in built-in n-gram indexing to 2, 3, or 4 instead of the default of 5. Lowering the maximum will introduce more noise into index vectors, which should give users a better appreciation of the impact of longer n-grams. All other AW processing remains the same.
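
To illustrate what a cap on n means, here is a minimal Java sketch that enumerates the alphabetic n-grams of one word up to a chosen maximum. The class name `NGramSketch` and the method `ngrams` are hypothetical and are not taken from the AW source. AW's own indexing may select only particular n-grams rather than every one, so treat this as an illustration of the idea only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration, not AW code: list every alphabetic
// n-gram of a word for n = 2 up to a user-chosen maximum.
class NGramSketch {

    static List<String> ngrams(String word, int maxN) {
        String w = word.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        for (int n = 2; n <= maxN; n++) {
            // slide a window of width n across the word
            for (int i = 0; i + n <= w.length(); i++) {
                String gram = w.substring(i, i + n);
                // keep only n-grams made up entirely of letters
                if (gram.chars().allMatch(Character::isLetter)) {
                    out.add(gram);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // a lower maximum leaves only the shorter, less specific n-grams
        System.out.println(ngrams("index", 3));
        System.out.println(ngrams("index", 5));
    }
}
```

With a maximum of 3, the word "index" yields only 2- and 3-grams. Raising the maximum to 5 adds the longer and more distinctive "inde", "ndex", and "index".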

Improved Phrase Extraction, Various Bug Fixes

28 Oct 22:22

The phrase extraction modules were added to AW in release v2.0 with minimal change to their original Java code. Release v2.1 is a major overhaul of the procedures for joining and splitting phrases in special cases. It also expands alphabetic 4-gram indexing and fixes bugs in stopword recognition and in the preservation of capitalization when normalizing text input.

AW Phrase Extraction Now Available

27 Sep 04:54

This release fixes bugs in updating an AW keyword HashTable object and in scoring a keyword with only a single lexical index. Previous AW implementations included phonetic indices; these have been replaced with built-in alphabetic 4- and 5-grams. The release is also an extensive reworking of AW phrase extraction to summarize clustering results. The ANALZR and PHRASR modules are now operational, although more tuning of algorithms is planned for a forthcoming v2.1 AW release. Stay tuned.

Major Cleanup Preparing for v2.0

15 Jul 07:39

This fixes issues with text segmentation and mapping of Unicode to a subset of ASCII. A typo was also fixed in the AW listing of alphabetic 4-grams. Documentation was updated. Version 2.0 will support phrase extraction to describe groups of similar text items.
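
As a rough picture of what mapping Unicode to a subset of ASCII can involve, the following Java sketch decomposes accented characters with the standard `java.text.Normalizer` class, drops the combining marks, and replaces anything else outside ASCII with a space. The class name `AsciiFoldSketch`, the method `toAscii`, and the choice of a space as the fallback are all assumptions made for illustration. The release note does not describe how AW performs its mapping.

```java
import java.text.Normalizer;

// Hypothetical illustration, not AW code: fold Unicode text to ASCII.
class AsciiFoldSketch {

    static String toAscii(String text) {
        // NFD splits an accented letter into base letter + combining mark
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < decomposed.length(); i++) {
            char c = decomposed.charAt(i);
            if (c < 128) {
                sb.append(c);          // already ASCII
            } else if (Character.getType(c) == Character.NON_SPACING_MARK) {
                continue;              // drop the detached accent
            } else {
                sb.append(' ');        // assumed fallback for everything else
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toAscii("caf\u00e9 na\u00efve"));
    }
}
```

A real mapping would probably need explicit rules for characters such as curly quotes or ligatures, which NFD does not decompose.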

Continue Testing and Debugging of AW Descriptive Phrases

06 Mar 06:23

Extraction of AW descriptive phrases requires deeper text analysis than n-gram indexing and many more Java classes. The original code, written from about 1998 to about 2002, has compiled poorly with the latest versions of Java. Major rewriting has been necessary for many modules.

Progress Toward Descriptive Phrase Extraction, Cleanup

21 Feb 06:41

The extraction of descriptive phrases for clusters was incomplete when work on AW was shelved two decades ago. Java code had been written for the principal classes required, but these were only partially tested at the component level. This functionality is not needed for the basic AW demonstration of clustering of text items, but it would be a priority in fielding a system for text analysis of data of probable current interest. This will involve a major extension of AW capabilities and processing modules, with more changes to come.

More Sources and Resources for AW Phrase Extraction

10 Feb 15:52

The implementation of phrase extraction as an extension of the AW clustering demonstration requires four new command-line modules, plus natural language processing support and associated resources. This is still incomplete and needs testing, but everything now compiles and runs without crashing. The basic AW clustering demonstration remains the same. More to come.

Support for AW Extraction of Descriptive Phrases for Clusters

03 Feb 03:57

The ANALZR module must preprocess text from which descriptive phrases are to be extracted by the PHRASR module. This is quite complicated, with many new source files involved. Documentation will come shortly. ANALZR + PHRASR are not required in the basic AW demonstration.

Reorganize Pages More Logically

27 Jan 03:43

This moves the source files Lines.java and Inputs.java into the aw package, since they will be required by two separate AW modules. Otherwise all functionality remains the same.

PHRASR Module In AW

25 Jan 15:31

PHRASR is a reporting module in the original Java implementation of ActiveWatch. It is an alternative to KEYWDR; the difference is that PHRASR generates descriptive phrases for clusters instead of keywords. This requires some definition files that were not saved on the CD-ROM backup for Java AW, and will have to be reconstructed before PHRASR is fully functional. Documentation to come later.