The overall architecture of the project is summarized in the following diagram. The parts shown in green are in the scope of this project. The red arrows indicate integrations between other parts of the proof of concept that are not in place yet.
The scope of this project comprises the following parts:
- Hyperledger Network (with 3 Organizations and chaincode for maintaining data ledger)
- A proof-of-concept REST API that each organization can use
- Jupyterhub on Kubernetes (k8s) where each organization can contribute nodes
- Kubernetes Cluster with Jupyterhub and Argo spun up
- DataExplorer Extension for Jupyterhub
- A customized JupyterFlow (aliakbarrehman/jupyterflow), forked from hongkunyoo/jupyterflow
- Data that is available for consumption has its metadata available on the blockchain
- Data itself can be hosted on k8s nodes or Azure Fileshares
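For the Azure Fileshare case, data is typically surfaced to pods through a Kubernetes `PersistentVolume`. The manifest below is a generic sketch, not this project's actual configuration; the volume name, share name, and secret name are placeholders:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: dataset-uis                    # placeholder name
spec:
  capacity:
    storage: 5Gi
  accessModes: [ReadOnlyMany]
  azureFile:
    secretName: azure-storage-secret   # Secret holding the storage account name and key
    shareName: uis                     # Azure file share containing the dataset
    readOnly: true
```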
- User logs in to JupyterHub on k8s
- The DataExplorer extension reads from the REST API, which in turn interacts with the blockchain network and chaincode
- When a dataset is selected for use:
  - The data is leased from the owning organization (the lease expires after some time, enforced through chaincode)
  - Encrypted metadata is read from the API (which also decrypts it)
  - A directory structure for experimentation is created as well
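For illustration only (the exact layout is created by the extension; the structure below is an assumption, using the `uis` dataset name from the example further below):

```
/home/jovyan/        # user home inside the notebook container
└── uis/             # leased dataset, mounted at the path declared in workflow.yaml
```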
- User writes code to consume data
- User writes a JupyterFlow `workflow.yaml` file of the following format:
  ```yaml
  jobs:
  - command: 'pwd'
  - command: 'ls /home/jovyan/'
  - command: 'ls -la /home/jovyan/uis'
    volumes:
    - id: '6de22e830ce4661cbeb6b9b2167c3f206dd946eb89beecd2f20bb17e85e64995'
      name: 'uis'
      path: '/home/jovyan/uis'
  - command: 'ls'

  # Job index starts at 1.
  dags:
  - 1 >> 2
  - 1 >> 3
  - 1 >> 4
  ```
- User runs the command:

  ```shell
  jupyterflow run -f workflow.yaml
  ```
- Jupyterflow mounts the necessary data as a volume, generates an Argo workflow and triggers it
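For orientation, an Argo `Workflow` expresses the `dags` entries as task dependencies. Below is a simplified sketch of what a generated workflow for a `1 >> 2` dependency might resemble; the names and image are illustrative, not JupyterFlow's exact output:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: jupyterflow-
spec:
  entrypoint: dag
  templates:
  - name: dag
    dag:
      tasks:
      - name: job-1
        template: job-1
      - name: job-2
        template: job-2
        dependencies: [job-1]          # from "1 >> 2"
  - name: job-1
    container:
      image: aliakbarrehman/jupyterhub # illustrative; JupyterFlow reuses the single-user image
      command: [sh, -c, 'pwd']
  - name: job-2
    container:
      image: aliakbarrehman/jupyterhub
      command: [sh, -c, 'ls /home/jovyan/']
```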
- Get a machine with a public IP that the k8s cluster can reach
- SSH into the machine
- Pre-requisites: Docker, Docker Compose, NodeJS and NPM
- Clone this repository:

  ```shell
  git clone https://github.com/aliakbarrehman/jupyterflow && cd master-thesis-uis
  ```

- Run:

  ```shell
  ./spinUpNetwork.sh
  ```

  This installs the pre-requisites, spins up the network with keys in `hyperledger-network/crypto-config`, and starts the REST API along with a Redis cache and a MySQL DB.
- Alternatively, for more granular control, you can install the requirements yourself and run:

  ```shell
  cd hyperledger-network
  ./start.sh
  ```

- Run:

  ```shell
  ./start.network.sh
  ```

- Run:

  ```shell
  cd ../rest-api
  docker-compose up -d
  ```

- For testing, import `resource/endpoints.json` and create some datablocks on the blockchain
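The actual request shapes are defined in `resource/endpoints.json`. Purely as a hypothetical sketch (the host, port, endpoint path, and field names below are invented for illustration, not taken from this project's API), creating a datablock could look like:

```shell
#!/bin/sh
# Hypothetical sketch only: real endpoints and payloads are in resource/endpoints.json.
API_HOST="localhost"   # replace with the public IP of the Hyperledger network machine
PAYLOAD='{"name":"uis-dataset","owner":"Org1","description":"example datablock"}'
echo "$PAYLOAD"        # inspect the payload before sending
# Uncomment once the REST API is reachable (path and port are placeholders):
# curl -X POST "http://$API_HOST:3000/datablocks" \
#   -H 'Content-Type: application/json' \
#   -d "$PAYLOAD"
```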
- On the k8s machine:
- Pre-requisites: Docker, NodeJS, NPM, Python, pip, Kubernetes and Helm
- Clone this repository:

  ```shell
  git clone https://github.com/aliakbarrehman/jupyterflow && cd master-thesis-uis
  ```

- Change directory to `jupyterhub`:

  ```shell
  cd jupyterhub
  ```

- Create a Docker registry account and log in to it:

  ```shell
  docker login hub.docker.com
  ```

- Run the build script with your own registry name, e.g.:

  ```shell
  ./buildJupyterhub aliakbarrehman/jupyterhub
  ```

  This script creates a k8s service account, packages the extension from `jupyterhub/extension/thesis_extension` (some useful commands for development are in `extension/usefulCommandsForDevelopment.sh`), packages the extension from `jupyterhub/jupyterflow`, builds a Docker image with all the previous packages installed and configured (this image is used for spinning up users' notebook / JupyterHub instances), and pushes that image
- Change `singleuser.image.name` and `singleuser.image.tag` in `jupyterhub/config.yaml` to your own values
- Change the IP in `jupyterhub/blockchain-service.yaml` to the public IP of the Hyperledger network
- Change the IP in `jupyterhub/extension/thesis_extension/src/utils/api.ts` to the public IP of the Hyperledger network
- Run:

  ```shell
  ./runJupyterhub.sh
  ```

- JupyterHub can be accessed at http://localhost and Argo at https://localhost:2746/workflows
- After logging in to JupyterHub (for testing you can use `user` as the username and `password` as the password):
- Explore datasets available on DataExplorer
- Use it by writing code and workflow.yaml and running the workflow as described above
- View the workflow status on Argo at https://localhost:2746/workflows
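Putting the user-facing steps together, a session inside a notebook terminal might look like the following sketch. The job commands and paths are placeholders based on the example workflow above, and the `jupyterflow run` line is commented out because it requires the JupyterHub environment from this repository:

```shell
#!/bin/sh
# Write a minimal two-job JupyterFlow workflow (paths are placeholders).
cat > workflow.yaml <<'EOF'
jobs:
- command: 'pwd'
- command: 'ls /home/jovyan/uis'
dags:
- 1 >> 2
EOF
# Submit the workflow (requires the JupyterFlow setup from this repository):
# jupyterflow run -f workflow.yaml
```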