* src/
** annotate.py --> annotates the tokens in the articles
** cleaner.py --> converts articles and dicts to plain text
** scraper.py --> downloads articles from siempremujer.com
** utils.py --> helper functions: build a dict of the articles' categories and
compute the common/regional dictionaries
** text_local.py --> modified version of nltk's text module (nltk.text) that adds
return_concordances() to ConcordanceIndex
* src/dictionaries/ --> raw versions of the dictionaries
* src/data/
** class_articles --> file containing article ids sorted by categories
** articles/ --> articles in html format
** articles-raw/ --> backup of articles/
** articles-plain/ --> plaintext articles
** dictionaries-clean/ --> dictionaries without annotations (simple word lists)
** dictionaries-common/ --> processed dictionaries: the computed common and regional Spanish dictionaries
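
The common/regional split mentioned for utils.py and dictionaries-common/ could look roughly like the sketch below. This is not the code in utils.py. It assumes that a word counts as "common" when it appears in every regional word list, and that each regional dictionary keeps only its remaining words. The function name split_common_regional and the sample data are hypothetical.

```python
from typing import Dict, Set, Tuple


def split_common_regional(
    dictionaries: Dict[str, Set[str]],
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Split per-region word lists into a common set and regional remainders."""
    if not dictionaries:
        return set(), {}
    # Assumption: a word is "common" when every regional dictionary contains it.
    common = set.intersection(*dictionaries.values())
    # Each regional dictionary keeps only the words that are not common.
    regional = {region: words - common for region, words in dictionaries.items()}
    return common, regional


# Hypothetical sample data, not taken from the repository.
sample = {
    "mexico": {"casa", "carro", "chamba"},
    "spain": {"casa", "coche", "curro"},
}
common, regional = split_common_regional(sample)
```

Plain sets match the "simple word lists" stored in dictionaries-clean/. If the real definition of "common" is looser (for example, present in most regions), the intersection would be replaced by a count threshold.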