
A simple, easy-to-follow tutorial on Drug Discovery with Machine Learning

sumone-compbio/drug-discovery

Drug-Discovery

A simple, easy-to-follow tutorial on Drug Discovery with Machine Learning. Here I chose the target protein PIK3CA (phosphatidylinositol-4,5-bisphosphate 3-kinase catalytic subunit alpha, human), which is overexpressed in breast cancer. First, I fetched data from ChEMBL on known, experimentally validated inhibitors of this target. From the SMILES of each compound I calculated RDKit descriptors, which serve as the features, while the corresponding pIC50 value is the label. Since pIC50 is continuous, I trained a regression model. Below are the regression plots for the train and test data, comparing real vs. predicted values together with the R² score.
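
ChEMBL commonly reports potency as IC50 in nanomolar (nM), and pIC50 is the negative base-10 logarithm of IC50 expressed in molar units, so pIC50 = 9 − log10(IC50 in nM). A minimal sketch of that conversion, assuming the raw values are in nM (the function name `ic50_nm_to_pic50` is hypothetical, not taken from this repository):

```python
import math


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in molar)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be a positive concentration")
    molar = ic50_nm * 1e-9  # assumption: input unit is nM, so convert nM -> M
    return -math.log10(molar)


if __name__ == "__main__":
    # Illustrative inputs only: a lower IC50 (more potent) gives a higher pIC50.
    for value in (1.0, 100.0, 10_000.0):
        print(f"IC50 = {value:g} nM -> pIC50 = {ic50_nm_to_pic50(value):.2f}")
```

Working on the log scale is one common reason for using pIC50 as the regression target: compounds spanning several orders of magnitude in IC50 end up on a compact, roughly linear scale.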

[Figure: regression plots of real vs. predicted pIC50 for the train and test sets, with R² scores (screenshot, 2024-05-30)]
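
Comparing the train and test R² scores is how overfitting shows up. R² is defined as 1 − SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot the total sum of squares around the mean of the true values. A minimal pure-Python sketch of that definition plus a train/test gap check; the helper names and the numbers in the usage part are made up for illustration, and in practice a library function such as scikit-learn's `r2_score` would be used:

```python
from statistics import fmean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_true = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true is constant, so R^2 is undefined")
    return 1.0 - ss_res / ss_tot


def overfitting_gap(train_true, train_pred, test_true, test_pred):
    """Hypothetical helper: train R^2 minus test R^2 (large positive = overfitting)."""
    return r2_score(train_true, train_pred) - r2_score(test_true, test_pred)


# Made-up pIC50 values, not results from this repository.
train_true, train_pred = [5.0, 6.0, 7.0, 8.0], [5.0, 6.0, 7.0, 8.0]
test_true, test_pred = [5.0, 6.0, 7.0, 8.0], [6.0, 6.0, 7.0, 7.0]
print(f"train R2 = {r2_score(train_true, train_pred):.2f}")
print(f"test R2  = {r2_score(test_true, test_pred):.2f}")
print(f"gap      = {overfitting_gap(train_true, train_pred, test_true, test_pred):.2f}")
```

There is no universal cutoff for an acceptable gap; it is a relative signal that the model has memorised the training compounds more than it generalises to unseen ones.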

The model seems to be overfitting, i.e. it performs noticeably better on the training data than on the test data. In the next tutorial, I will show how to train a graph convolutional network (GCN) for the same task. GCNs are often more expressive for molecules, because a graph, with atoms as nodes and bonds as edges, is a more natural representation of a molecule than sequences, images, or tabular data. Later on, I will search for novel compounds that might be potential inhibitors of this target, and calculate their drug-likeness properties to filter out the most promising candidates.
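
To make the graph argument concrete: in a GCN layer (the propagation rule of Kipf and Welling), each atom's feature vector is mixed with those of its bonded neighbours using the normalised adjacency D^-1/2 (A + I) D^-1/2, multiplied by a learned weight matrix W, and passed through a ReLU. A minimal pure-Python sketch of one such layer on the heavy-atom graph of ethanol; the molecule, the one-hot atom features, and the weights are hypothetical, and a real model would use a library such as PyTorch Geometric or DeepChem and learn W from the pIC50 data:

```python
import math


def normalized_adjacency(num_nodes, edges):
    """Return D^-1/2 (A + I) D^-1/2 as a dense list of lists."""
    a = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i in range(num_nodes):
        a[i][i] = 1.0  # self-loop so each atom keeps its own features
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0  # bonds are undirected
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(num_nodes)]
            for i in range(num_nodes)]


def matmul(x, y):
    """Dense matrix product of two lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(a_hat, h, w):
    """One GCN layer: ReLU(A_hat @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(a_hat, h), w)]


# Heavy-atom graph of ethanol (SMILES "CCO"): atoms 0=C, 1=C, 2=O.
edges = [(0, 1), (1, 2)]
h = [[1.0, 0.0],  # one-hot features: [is_carbon, is_oxygen]
     [1.0, 0.0],
     [0.0, 1.0]]
w = [[1.0, 0.0],  # hypothetical weights; a trained model would learn these
     [0.0, 1.0]]

a_hat = normalized_adjacency(3, edges)
h1 = gcn_layer(a_hat, h, w)
for atom, row in zip("CCO", h1):
    print(atom, [round(v, 3) for v in row])

# Mean pooling over atoms gives one vector per molecule, which a regression
# head could then map to a predicted pIC50.
graph_vector = [sum(col) / len(col) for col in zip(*h1)]
print("molecule vector:", [round(v, 3) for v in graph_vector])
```

After one layer, the middle carbon already carries an "oxygen" signal from its bonded neighbour, while the terminal carbon does not; stacking layers spreads information further along the bonds, which tabular descriptors cannot express directly.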

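The drug-likeness filtering mentioned above could start from Lipinski's rule of five: molecular weight ≤ 500 Da, logP ≤ 5, at most 5 hydrogen-bond donors, and at most 10 hydrogen-bond acceptors, with orally active compounds generally violating no more than one criterion. A minimal sketch on precomputed properties; the `Candidate` class, helper names, and compound values are hypothetical, and in an RDKit pipeline the properties could come from `Descriptors.MolWt`, `Crippen.MolLogP`, `Lipinski.NumHDonors`, and `Lipinski.NumHAcceptors`:

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    mol_weight: float  # molecular weight in Da
    logp: float        # octanol-water partition coefficient (e.g. Crippen cLogP)
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


def lipinski_violations(c: Candidate) -> int:
    """Count how many of the four rule-of-five criteria are violated."""
    rules = [
        c.mol_weight > 500,
        c.logp > 5,
        c.h_donors > 5,
        c.h_acceptors > 10,
    ]
    return sum(rules)  # True counts as 1, False as 0


def passes_rule_of_five(c: Candidate, max_violations: int = 1) -> bool:
    """Keep compounds with at most `max_violations` violations."""
    return lipinski_violations(c) <= max_violations


# Hypothetical candidates, not compounds from this repository.
candidates = [
    Candidate("cand_A", 420.5, 3.1, 2, 6),
    Candidate("cand_B", 610.7, 5.8, 4, 9),
    Candidate("cand_C", 530.0, 4.2, 3, 8),
]
for c in candidates:
    print(c.name, lipinski_violations(c), passes_rule_of_five(c))
```

Keeping `max_violations` as a parameter makes the filter easy to tighten to zero violations or relax; the rule of five is a heuristic for oral bioavailability, not a guarantee of it.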