An app that performs Keyword Extraction, Named Entity Recognition and Summarization of The New York Times Corpus using various statistical and Machine Learning algorithms

New-York-Times-Natural-Language-Processing-Toolkit

A standalone application that performs Keyword Extraction using KEA, Named Entity Recognition using Convolutional Neural Networks and Summarization of News Articles using TextRank

Further documentation can be found in docs.

Installation Instructions

Obtaining the files and dependencies

  1. The files can be obtained by cloning the repository from GitHub:
git clone https://github.com/nyeoWM/New-York-Times-Natural-Language-Processing-Toolkit.git
  1. Note: we use Pipenv to manage our dependencies. If you do not have Pipenv installed, install it through pip by running the following command in a terminal (macOS, Linux) or PowerShell (Windows). Pip installation instructions can be found here: Installing Pip
pip install pipenv
  1. The graphical user interface requires a Python 3 installation that includes tkinter. If you are unsure whether your Python 3 supports tkinter, the easiest way is to install Python using the binaries from https://www.python.org/. You can test whether tkinter is properly installed as follows.

Start your Python interpreter by running python3, then run the following command:

import tkinter

If it proceeds without error messages, you are good to go.

  1. Once you have Pipenv installed, you can install the dependencies directly from the Pipfile using:
pipenv install
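
The prerequisites listed above (Python 3, tkinter, and Pipenv) can also be checked in one go. The following is a minimal sketch, not part of the repository: the function names `has_tkinter` and `check_prerequisites` are hypothetical, and the script only reports what it finds without installing anything.

```python
import shutil
import sys


def has_tkinter():
    """Return True if tkinter can be imported by this interpreter."""
    try:
        import tkinter  # noqa: F401  (imported only to test availability)
    except ImportError:
        return False
    return True


def check_prerequisites():
    """Return a dict mapping each prerequisite name to True or False.

    Hypothetical helper for illustration; not part of the toolkit.
    """
    return {
        # The GUI is started with python3, so a Python 3 interpreter is needed.
        "python3": sys.version_info.major == 3,
        # Same check as running `import tkinter` by hand.
        "tkinter": has_tkinter(),
        # shutil.which returns None when the executable is not on PATH.
        "pipenv": shutil.which("pipenv") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Run it with the same python3 you intend to use for the GUI. Any entry reported as MISSING points back to the corresponding installation step above.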

Running the program

  1. Activate the environment using:
pipenv shell
  1. Run the GUI using:
python3 guiNews.py
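
The two steps above can be combined with `pipenv run`, which executes a command inside the project's environment without opening a shell first. The sketch below wraps that in a small Python launcher. It assumes it is called with the path to the cloned repository, and the names `build_launch_command` and `launch_gui` are hypothetical helpers, not part of the toolkit.

```python
import subprocess
from pathlib import Path


def build_launch_command(script="guiNews.py"):
    """Return the command that runs the script inside the Pipenv environment."""
    return ["pipenv", "run", "python3", script]


def launch_gui(repo_dir="."):
    """Start the GUI from repo_dir and return the process exit code.

    Hypothetical helper for illustration; assumes `pipenv install`
    has already been run in repo_dir.
    """
    repo = Path(repo_dir)
    # Fail early with a clear message if we are in the wrong directory.
    if not (repo / "guiNews.py").is_file():
        raise FileNotFoundError(f"guiNews.py not found in {repo.resolve()}")
    # cwd matters: Pipenv locates the Pipfile relative to the working directory.
    completed = subprocess.run(build_launch_command(), cwd=repo, check=False)
    return completed.returncode
```

For example, calling `launch_gui(".")` from the repository root should behave like running `pipenv shell` followed by `python3 guiNews.py`.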
