Semi-Supervised Hierarchical Drug Embedding in Hyperbolic Space

This is the repository for the paper published in JCIM:
Semi-supervised Hierarchical Drug Embedding in Hyperbolic Space

Abstract

Learning accurate drug representations is essential for tasks such as computational drug repositioning and prediction of drug side-effects. A drug hierarchy is a valuable source that encodes human knowledge of drug relations in a tree-like structure where drugs that act on the same organs, treat the same disease, or bind to the same biological target are grouped together. However, its utility in learning drug representations has not yet been explored, and currently described drug representations cannot place novel molecules in a drug hierarchy.

Here, we develop a semi-supervised drug embedding that incorporates two sources of information: (1) underlying chemical grammar that is inferred from chemical structures of drugs and drug-like molecules (unsupervised), and (2) hierarchical relations that are encoded in an expert-crafted hierarchy of approved drugs (supervised). We use the Variational Auto-Encoder (VAE) framework to encode the chemical structures of molecules and use the knowledge-based drug-drug similarity to induce the clustering of drugs in hyperbolic space. The hyperbolic space is amenable for encoding hierarchical relations. Both quantitative and qualitative results support that the learned drug embedding can accurately reproduce the chemical structure and induce the hierarchical relations among drugs. Furthermore, our approach can infer the pharmacological properties of novel molecules by retrieving similar drugs from the embedding space. We demonstrate that the learned drug embedding can be used to find new uses for existing drugs and to discover side-effects. We show that it significantly outperforms baselines in both tasks.

By

Ke Yu
Shyam Visweswaran
Kayhan Batmanghelich

Bibtex

@article{doi:10.1021/acs.jcim.0c00681,
author = {Yu, Ke and Visweswaran, Shyam and Batmanghelich, Kayhan},
title = {Semi-supervised Hierarchical Drug Embedding in Hyperbolic Space},
journal = {Journal of Chemical Information and Modeling},
volume = {60},
number = {12},
pages = {5647-5657},
year = {2020},
doi = {10.1021/acs.jcim.0c00681},
note = {PMID: 33140969},
URL = {https://doi.org/10.1021/acs.jcim.0c00681},
eprint = {https://doi.org/10.1021/acs.jcim.0c00681}
}

To train a new model (shell script):


#!/usr/bin/env bash

set -x

EXPERIMENT="my_model"
DATA_DIR="./data/fda_drugs"
DATA_FILE="smiles_set_clean.smi"
FDA_FILE="all_drugs.smi"
LOG_DIR="./logs"

mkdir -p "${LOG_DIR}"

python main.py \
  --data_dir="${DATA_DIR}" \
  --data_file="${DATA_FILE}" \
  --fda_file="${FDA_FILE}" \
  --vocab_file="char_set_clean.pkl" \
  --atc_sim_file="drugs_sp_all.csv" \
  --checkpoint_dir="./experiments/EXP_TASK" \
  --experiment_name="${EXPERIMENT}" \
  --task="vae + atc" \
  --limit=0 \
  --batch_size=128 \
  --epochs=200 \
  --max_sequence_length=120 \
  --learning_rate=3e-4 \
  --max_norm=1e12 \
  --wd=0 \
  --manifold_type="Lorentz" \
  --prior_type="Standard" \
  --num_centroids=0 \
  --bidirectional=False \
  --num_layers=1 \
  --hidden_size=512 \
  --latent_size=64 \
  --word_dropout_rate=0.2 \
  --anneal_function="logistic" \
  --k=0.51 \
  --x0=29 \
  --C=1.0 \
  --num_workers=4 \
  --logging_steps=1 \
  --save_per_epochs=5 \
  --new_training=False \
  --new_annealing=False \
  --checkpoint="checkpoint_epoch000.model" \
  --trained_epochs=00 \
  --alpha=0.0 \
  --beta=0.015625 \
  --gamma=0.0 \
  --delta=11.0 \
  --nneg=11 \
  --fda_prop=0.2 >> "${LOG_DIR}/${EXPERIMENT}.log" 2>&1
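
The `--anneal_function="logistic"`, `--k`, and `--x0` flags point to a KL-weight annealing schedule. The snippet below is a minimal sketch of one common logistic form, `1 / (1 + exp(-k * (step - x0)))`, and is not taken from `main.py`. The function name `kl_anneal_weight` is hypothetical, and whether `step` counts epochs or iterations in this repository is an assumption to check against the code.

```python
import math


def kl_anneal_weight(step: float, k: float = 0.51, x0: float = 29.0) -> float:
    """Hypothetical logistic schedule for the KL term weight.

    The weight is close to 0 well before x0, exactly 0.5 at step == x0,
    and approaches 1 afterwards; k controls how sharp the transition is.
    """
    return 1.0 / (1.0 + math.exp(-k * (step - x0)))


if __name__ == "__main__":
    # Print the weight at a few steps using the values from the script above.
    for step in (0, 29, 60):
        print(step, kl_anneal_weight(step))
```

With the values passed in the script above (`k=0.51`, `x0=29`), such a schedule would keep the KL term nearly switched off at the start of training and reach half weight at step 29.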

Python files:

main.py: pipeline of the training procedure
drugdata.py: dataloader of SMILES strings and the ATC hierarchy
hvae.py: hyperbolic VAE functions
lorentz.py: Lorentz model functions
decode.py: SMILES reconstruction function
evae.py: classic VAE functions
metrics.py: dendrogram purity score
utils.py: utility functions
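
For readers unfamiliar with the Lorentz model used by `--manifold_type="Lorentz"`, the following is a minimal sketch of its standard geometry: points lie on the hyperboloid where the Lorentzian inner product of a point with itself is -1, and the geodesic distance is `arccosh(-<x, y>_L)`. It is written in plain Python for illustration and does not reproduce the API of `lorentz.py`; the function names `lift_to_hyperboloid`, `lorentz_inner`, and `lorentz_distance` are hypothetical.

```python
import math
from typing import List, Sequence


def lift_to_hyperboloid(v: Sequence[float]) -> List[float]:
    """Map v in R^n to the point (sqrt(1 + |v|^2), v) on the unit hyperboloid."""
    x0 = math.sqrt(1.0 + sum(c * c for c in v))
    return [x0, *v]


def lorentz_inner(x: Sequence[float], y: Sequence[float]) -> float:
    """Lorentzian inner product: -x0*y0 + sum over the remaining coordinates."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def lorentz_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Geodesic distance arccosh(-<x, y>_L).

    The argument is clamped to >= 1 to guard against rounding error.
    """
    return math.acosh(max(1.0, -lorentz_inner(x, y)))


if __name__ == "__main__":
    origin = lift_to_hyperboloid([0.0, 0.0])
    point = lift_to_hyperboloid([math.sinh(1.0), 0.0])
    print(lorentz_distance(origin, point))
```

Distances in this geometry grow quickly away from the origin, which is the property that makes hyperbolic space a natural fit for tree-like structures such as the ATC hierarchy.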

Data:

/DrugEmbedding/data/fda_drugs: SMILES strings and the ATC hierarchy table
/DrugEmbedding/data/repoDB: repoDB dataset
/DrugEmbedding/data/sider/deepchem: SIDER dataset
/DrugEmbedding/data/pdbbind: PDBbind dataset
/DrugEmbedding/data/tox21L: Tox21 dataset

Notebooks:

Sample code used for experiments in the paper
