datasciencecampus/iati-partner-search

IATI Partner Search

A tool using NLP technology to match aid funders with potential implementers.

A more detailed break-down of the project can be found here.

Installation

To install the Python packages, activate your virtual environment and run the following:

pip install invoke
invoke install-all

This also installs all of the development and testing packages.

Pre-Commit

Pre-commit is a tool that catches file errors when you try to commit work. This keeps small bugs and typos from being pushed to GitHub, so we don't have to wait for our automated tests to find them.

This is optional, but to initialize pre-commit, run the following:

pre-commit install

Testing

To run tests:

invoke test

To run linting, formatting and tests:

invoke ci

Using Docker

This repo provides a Dockerfile (app.Dockerfile) that you can build on your machine to get an environment in which the code can run.

Python Pipeline Development

We do not currently publish our images to DockerHub. You must build them on your machine. Make sure that the Docker VM is running, then run:

docker build -t iati_partner_search -f ./app.Dockerfile .

Or, if you have invoke installed, run invoke build-dev-docker, which runs this command on your behalf.

The -t iati_partner_search means that we're telling Docker that we want the image to be called iati_partner_search. The -f ./app.Dockerfile tells Docker which Dockerfile to use.

Once the image has been built, we can run a container:

docker run --name=ips -it -v ${pwd}:/iati-partner-search -p 5000:5000 iati_partner_search bash

To break this down:

  • --name=ips: sets the name we will use for this container when we want to start and stop it again.
  • -it: runs the container interactively with a terminal attached (-i keeps STDIN open, -t allocates a pseudo-TTY).
  • -v ${pwd}:/iati-partner-search: tells Docker to share the files in your current directory with the container, mounted at /iati-partner-search. Note that ${pwd} is PowerShell syntax; in bash, use $(pwd) or ${PWD} instead.
  • -p 5000:5000: tells Docker that we want to map port 5000 on our machine to port 5000 of the container.
  • iati_partner_search: refers to the image that we want to build the container from.
  • bash: is the process that we want the container to run. In this case we're asking it to start a command-line shell. To start the web application instead, omit this argument and the application will start automatically.
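The flags described above can also be assembled from Python, which avoids shell-specific syntax for the current directory. This is only a minimal sketch: the function name docker_run_args and its parameters are hypothetical and not part of this repo, and it only prints the command rather than running it.

```python
import shlex
from pathlib import Path


def docker_run_args(host_dir, name="ips", image="iati_partner_search",
                    port=5000, command="bash"):
    """Build the `docker run` argument list described in this README.

    Hypothetical helper for illustration; not part of the repo.
    """
    args = [
        "docker", "run",
        f"--name={name}",              # container name used by docker start/stop
        "-it",                         # interactive, with a terminal attached
        "-v", f"{Path(host_dir).resolve()}:/iati-partner-search",  # share files
        "-p", f"{port}:{port}",        # map the host port to the container port
        image,                         # image to build the container from
    ]
    if command:
        # Omit the command to let the image start the web application instead.
        args.append(command)
    return args


if __name__ == "__main__":
    # Print the command for the current directory without executing it.
    print(shlex.join(docker_run_args(Path.cwd())))
```

Passing command=None leaves off the trailing bash, which matches the case where the web application should start automatically.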

You can then stop and start the container by running docker stop ips and docker start ips respectively.

You can read more about Docker containers and this process here.

Get the Data

To download the raw data, run:

invoke download-data

Note that the data is currently just over 1GB in size, so it could take some time to download.

If you're not working from within the Docker container, you will also need to download the NLTK data. Run the following:

invoke download-nltk-data

Run the Flask application

After adding the required data and installing the required packages you will be able to run the web application on your own computer.

In the /data directory, make sure you have:

- all_downloaded_records.csv
- processed_records.csv
- term_document_matrix.pkl
- vectorizer.pkl
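A quick way to confirm that these files are in place before building is a small check like the one below. It is a sketch only: the function name missing_data_files is hypothetical, and it assumes the data directory sits at ./data relative to where you run it.

```python
from pathlib import Path

# File names taken from the list in this README.
REQUIRED_FILES = [
    "all_downloaded_records.csv",
    "processed_records.csv",
    "term_document_matrix.pkl",
    "vectorizer.pkl",
]


def missing_data_files(data_dir):
    """Return the required file names that are not present in data_dir."""
    data_path = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (data_path / name).is_file()]


if __name__ == "__main__":
    # Assumption: the data directory is ./data in the current working directory.
    missing = missing_data_files("data")
    if missing:
        print("Missing data files:", ", ".join(missing))
    else:
        print("All required data files are present.")
```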

Then, using invoke, run

invoke build-docker

to build the Docker image, and then

invoke run-docker

to run it.

After a few seconds of startup time, it should be up and running. Navigate to localhost:5000 in your web browser to view the page.
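If you would rather check from a script than from the browser, something along these lines could poll the page until it responds. This is a sketch using only the Python standard library; the helper names is_up and wait_until_up are hypothetical, and the retry count and delay are arbitrary choices.

```python
import time
import urllib.request


def is_up(url, timeout=2.0):
    """Return True if a GET request to url succeeds with a non-error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except OSError:
        # URLError, HTTPError, timeouts and refused connections are OSError subclasses.
        return False


def wait_until_up(url, attempts=10, delay=1.0):
    """Poll url up to `attempts` times, sleeping `delay` seconds after each failure."""
    for _ in range(attempts):
        if is_up(url):
            return True
        time.sleep(delay)
    return False
```

For example, wait_until_up("http://localhost:5000") returns True once the application answers, or False if it has not responded after the given number of attempts.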