Skip to content

avisionh/sqlquerygraph

Repository files navigation

sqlquerygraph

build status codecov CodeFactor License: MIT Code style: black

Parse your SQL queries and represent their structure as a graph.

Currently, we implement the ability of representing how each of the tables in a set of SQL query scripts depend on each other.

MATCH p=(r:Reporting)-[:HAS_TABLE_DEPENDENCY]->()-[:HAS_TABLE_DEPENDENCY]->()
WHERE r.table_name='user_activity'
RETURN p

Requirements

To run the code in here, ensure your system meets the following requirements:

  • Unix-like operating system (macOS, Linux, ...) - though it might work on Windows;
  • Python 3.8 or above; and
  • Poetry installed.
  • direnv installed, including shell hooks;

Set-up

For quickstart set-up of the project, run the below in your shell/terminal:

# 1. read project-specific environment variables
direnv allow

# 2. activate virtual environment and install package dependencies
poetry shell
poetry install

# 3. check adherence to good standards on every commit
pre-commit install

To then extract the tables and their dependencies from the example SQL scripts in the sql/ directory, run the below in your shell/terminal. It will generate .csv files of the tables and their dependencies. It will also generate .cypher files to enable you to import the data into neo4j, after you have added the .csv files to the database.

python sqlquerygraph.py -sd 'sql' -ed 'neo4j' -rd '<datasets, individually quoted and separated by commas, of tables in sql/ scripts>'

Run neo4j graph database

We use neo4j for this project to visualise the dependencies between tables. To install neo4j locally using Docker Compose, follow the below instructions:

  1. Install and open Docker (if already installed, just open the program).

    • For Mac OSX, install Docker and Docker Compose together here.
    • For Linux, install Docker here and then follow these instructions to install docker-compose.
    • For Windows, install Docker and Docker Compose together here.
  2. Create a new file, .secrets, in the directory where this README.md file sits, and store the following in there. This allows you to set the password for your local neo4j instance without exposing it.

    export NEO4J_AUTH=neo4j/<your_password>
    export NEO4J_USERNAME=neo4j
    export NEO4J_PASSWORD=<your_password>
    

    Then update your .env file to take in the new .secrets file you created by entering the below in your shell/terminal:

    direnv allow
  3. Build the Docker image and launch the container. Within this directory that has the docker-compose.yml file, run the below in your shell/terminal:

    docker-compose up

    You will know when it's ready when you get the following message in your terminal:

    app      | [INFO  wait] Host [neo4j:7687] is now available!
    app      | [INFO  wait] --------------------------------------------------------
    app      | [INFO  wait] docker-compose-wait - Everything's fine, the application can now start!
    app      | [INFO  wait] --------------------------------------------------------
    

    Then launch neo4j locally via opening your web browser and entering the following web address:

    The username and password will those specified in your .secrets file.

  4. When you have finished playing with your local neo4j instance, remember to stop it running by executing the below in your shell/terminal:

    docker-compose down
    # option if you want to delete data too
    docker-compose down --volumes

Acknowledgements

This builds on the excellent moz-sql-parser package.

With thanks also to the Google Cloud Public Dataset Program for which the SQL queries in this repo are based off the program's GitHub repos dataset.