estatnet

Module for web scraping and semantic classification of Eurostat's online glossaries

About

This module will enable you to automatically scrape Eurostat's online "Statistics Explained" pages and index their contents into a knowledge graph. It builds a graph of the inter-relationships between the pages while extracting their existing semantic content (documentation, concepts, glossary, ...).
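The following is a minimal sketch of the page-linking step. It assumes the pages have already been fetched (for example by a Scrapy spider) and that Statistics Explained uses MediaWiki-style `index.php?title=...` URLs. The base URL, the `Glossary:` prefix rule and all function names are illustrative assumptions, not this module's actual code. It uses only the Python standard library: it collects `<a href>` targets from each page and keeps those that point back into Statistics Explained, yielding an adjacency mapping that NetworkX or Neo4j could ingest.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urljoin, urlparse

# Assumed base URL of Statistics Explained; the real site layout may differ.
BASE = "https://ec.europa.eu/eurostat/statistics-explained/index.php"


class LinkCollector(HTMLParser):
    """Collect the href targets of all <a> tags in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def page_title(url):
    """Return the 'title' query parameter of an internal URL, else None."""
    if not url.startswith(BASE):
        return None  # external link: not part of the page graph
    values = parse_qs(urlparse(url).query).get("title")
    return values[0] if values else None


def build_link_graph(pages):
    """Map each page title to the set of internal page titles it links to.

    `pages` is a hypothetical {title: html} dict produced by the scraper.
    """
    graph = {}
    for title, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        targets = set()
        for href in collector.links:
            # Resolve relative links against the assumed base URL.
            target = page_title(urljoin(BASE, href))
            if target and target != title:  # drop self-links
                targets.add(target)
        graph[title] = targets
    return graph


def kind(title):
    """Crude page classification by namespace prefix (an assumption)."""
    return "glossary" if title.startswith("Glossary:") else "article"
```

Relative, query-only and absolute hrefs are all resolved with `urljoin`, so the same filter handles every link style a page may use.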

Status: under construction (since 2018)
License: EUPL

Description

Notes

Resources

  • Scrapy framework for extracting data from websites.
  • Natural Language Toolkit (nltk) for working with human language data.
  • NetworkX package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
  • py2neo module for the Neo4j graph database, though the official Bolt driver neo4j-python-driver also does the job.
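The following is a hedged sketch of loading the page graph into Neo4j. The link graph, assumed here to be a `{title: set of linked titles}` mapping, can be turned into a single parameterised Cypher batch that either py2neo or neo4j-python-driver could execute. The `Page` label, the `LINKS_TO` relationship type and `link_graph_to_cypher` are hypothetical names, not an existing schema of this module.

```python
# Hypothetical Cypher batch: one UNWIND over all edges instead of one query per edge.
LINKS_QUERY = """
UNWIND $edges AS e
MERGE (a:Page {title: e.src})
MERGE (b:Page {title: e.dst})
MERGE (a)-[:LINKS_TO]->(b)
"""


def link_graph_to_cypher(graph):
    """Turn {title: set_of_titles} into a (query, parameters) pair.

    Edges are sorted so the parameters are deterministic. Pages with no
    links at all are not created by this query.
    """
    edges = [
        {"src": src, "dst": dst}
        for src in sorted(graph)
        for dst in sorted(graph[src])
    ]
    return LINKS_QUERY, {"edges": edges}


# With the official driver (pip install neo4j), execution would look roughly like:
#   from neo4j import GraphDatabase
#   driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
#   with driver.session() as session:
#       session.run(query, params)
#   driver.close()
```

Using `MERGE` rather than `CREATE` keeps repeated imports from duplicating nodes and relationships. Isolated pages would need a separate `UNWIND ... MERGE` over the page titles.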

References