
amirsalarsafaei/MHC-peptide-Binding-affinity


Solutions I tried

  • Fine-tuning prot_bert: the model was pre-trained like BERT, so I used the [SEP] token to separate the MHC and peptide sequences and fed the output of the [CLS] token at the beginning of the input into a classifier head. Unfortunately, due to limited resources I only managed to train the model for 1.5 epochs, because each epoch took 19 hours. Even so, I achieved an average precision of 90%, a reasonably good ROC curve, and an F1 score of about 80%, which could improve substantially with 4 to 5 more epochs of training.
  • The other solution, which I didn't have time to try because the first one took so long, was to use Facebook's ESM model to embed the sequences and then feed the embeddings to a neural network. Because the embeddings are very high-dimensional, I planned to use PCA to reduce the dimension while keeping the most important features of the data.
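The input layout from the first bullet follows BERT's sentence-pair convention. The sketch below builds it by hand in plain Python to make the layout explicit; it is not the notebook's code. The helper names `to_protbert_tokens` and `build_pair_input` are hypothetical, and the sketch assumes ProtBert's convention of one token per uppercase amino acid, with the rare residues U, Z, O and B mapped to X. In practice a BERT tokenizer from Hugging Face `transformers` produces the same `[CLS] A [SEP] B [SEP]` layout and segment ids when called with two texts.

```python
import re
from typing import List, Tuple


def to_protbert_tokens(sequence: str) -> List[str]:
    """Split a protein sequence into ProtBert-style residue tokens.

    Assumes ProtBert's convention: uppercase residues, one token per amino
    acid, rare residues (U, Z, O, B) replaced by X.
    """
    cleaned = re.sub(r"[UZOB]", "X", sequence.strip().upper())
    return list(cleaned)


def build_pair_input(mhc_seq: str, peptide: str) -> Tuple[List[str], List[int]]:
    """Hypothetical helper: lay out an MHC/peptide pair as
    [CLS] mhc... [SEP] peptide... [SEP] together with BERT segment ids."""
    mhc_tokens = to_protbert_tokens(mhc_seq)
    pep_tokens = to_protbert_tokens(peptide)
    tokens = ["[CLS]"] + mhc_tokens + ["[SEP]"] + pep_tokens + ["[SEP]"]
    # Segment 0: [CLS], the MHC residues and the first [SEP].
    # Segment 1: the peptide residues and the final [SEP].
    token_type_ids = [0] * (len(mhc_tokens) + 2) + [1] * (len(pep_tokens) + 1)
    return tokens, token_type_ids
```

For example, `build_pair_input("YFAMYQ", "SIINFEKL")` (toy sequences) yields 17 tokens starting with `[CLS] Y F A M Y Q [SEP]`; the classifier head then only reads the encoder output at position 0.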
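The second bullet planned to shrink ESM embeddings with PCA before training a neural network. A dependency-free sketch of that reduction step follows: it centers the embedding vectors, builds their covariance matrix, and finds the top principal components by power iteration with deflation. Obtaining the ESM embeddings is not shown, the function names are hypothetical, and a real pipeline would more likely use a library implementation such as scikit-learn's `PCA`.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def center(data: Matrix) -> Tuple[Matrix, Vector]:
    """Subtract the per-dimension mean from every row."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - mean[j] for j in range(d)] for row in data], mean


def covariance(centered: Matrix) -> Matrix:
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def top_components(cov: Matrix, k: int, iters: int = 500, seed: int = 0):
    """Leading k eigenvectors/eigenvalues of a covariance matrix,
    found by power iteration followed by deflation."""
    rng = random.Random(seed)
    d = len(cov)
    m = [row[:] for row in cov]
    components, variances = [], []
    for _ in range(k):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        n0 = math.sqrt(sum(x * x for x in v))
        v = [x / n0 for x in v]
        for _ in range(iters):
            w = matvec(m, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # remaining matrix is (numerically) zero
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(m, v)))  # Rayleigh quotient
        components.append(v)
        variances.append(lam)
        # Deflation: remove the found direction so the next pass finds the next one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return components, variances


def pca_transform(data: Matrix, k: int) -> Tuple[Matrix, Vector]:
    """Project rows onto the top-k principal components.
    Returns the projected rows and each component's explained-variance ratio."""
    centered, _ = center(data)
    cov = covariance(centered)
    components, variances = top_components(cov, k)
    total = sum(cov[i][i] for i in range(len(cov)))  # total variance = trace
    projected = [[sum(a * b for a, b in zip(row, c)) for c in components]
                 for row in centered]
    return projected, [v / total for v in variances]
```

The explained-variance ratios are what would guide the choice of `k`: keep adding components until the retained fraction of variance is acceptable.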

File Formats

In the EDA notebook I researched MHCs and extracted some features from the given MHC type, such as the allele group. Then I cleaned the data and used it in the bert notebook to tokenize the sequences, train, and finally test the model. For the classifier head I used a dense layer with ReLU activation and some dropout to prevent overfitting, followed by a sigmoid function that outputs the answer as a probability. Because the model state is large (1.8 GB), I didn't include it in the uploaded files. Finally, in the evaluate-model notebook I evaluated the model using the test predictions produced by the bert notebook, which are included with the solution.
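The README does not say how the allele group was extracted from the MHC type. A minimal sketch, assuming human alleles written in standard HLA nomenclature (gene, then allele group as the first field, then specific protein as the second field, e.g. `HLA-A*02:01`), is a single regular expression. The `HLAType` and `parse_hla` names are hypothetical, and the pattern also accepts the star-less spelling `HLA-A02:01` that some datasets use.

```python
import re
from typing import NamedTuple, Optional


class HLAType(NamedTuple):
    gene: str          # e.g. "A", "B", "DRB1"
    allele_group: str  # first field, e.g. "02"
    protein: str       # second field, e.g. "01"


# Matches "HLA-A*02:01" as well as "HLA-A02:01"; extra fields are ignored.
HLA_PATTERN = re.compile(r"^HLA-([A-Z]+[0-9]*)\*?(\d{2,3}):(\d{2,3})")


def parse_hla(name: str) -> Optional[HLAType]:
    """Split an HLA allele name into gene, allele group and protein fields."""
    match = HLA_PATTERN.match(name.strip())
    if match is None:
        return None  # e.g. non-human alleles such as "H-2-Kb"
    return HLAType(*match.groups())
```

Grouping rows by `parse_hla(name).allele_group` (together with the gene) gives a coarser categorical feature than the full allele name.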
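The classifier head described above (dense layer with ReLU, dropout, then a sigmoid over a single logit) can be sketched as a plain-Python forward pass over the [CLS] vector. The notebook would have used framework layers instead; the layer sizes, the dropout rate and the `classifier_head` name are assumptions. Dropout here is the common "inverted" form: during training the surviving units are scaled by 1 / (1 - p), so inference needs no rescaling.

```python
import math
import random
from typing import List, Optional


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classifier_head(
    cls_embedding: List[float],
    w_hidden: List[List[float]],  # one row of weights per hidden unit
    b_hidden: List[float],
    w_out: List[float],
    b_out: float,
    dropout_p: float = 0.1,       # assumed rate; the README doesn't give one
    training: bool = False,
    rng: Optional[random.Random] = None,
) -> float:
    """Dense + ReLU -> dropout -> single logit -> sigmoid; returns a binding probability."""
    # Dense layer with ReLU activation.
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, cls_embedding)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Inverted dropout: only active during training; survivors are rescaled.
    if training and dropout_p > 0.0:
        rng = rng or random.Random()
        keep = 1.0 - dropout_p
        hidden = [h / keep if rng.random() < keep else 0.0 for h in hidden]
    logit = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return sigmoid(logit)
```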
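The metrics quoted for the prot_bert run (average precision, ROC, F1) can be recomputed from the saved test predictions. The dependency-free sketch below shows what each one measures; scikit-learn's `average_precision_score`, `roc_auc_score` and `f1_score` are the usual library equivalents. The 0.5 decision threshold for F1 is an assumption, since the README does not state one, and `average_precision` assumes distinct scores (no tie grouping).

```python
from typing import Sequence


def f1_at_threshold(labels: Sequence[int], probs: Sequence[float], threshold: float = 0.5) -> float:
    """F1 score after turning probabilities into 0/1 predictions."""
    tp = sum(1 for y, p in zip(labels, probs) if y == 1 and p >= threshold)
    fp = sum(1 for y, p in zip(labels, probs) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(labels, probs) if y == 1 and p < threshold)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def average_precision(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Mean of precision@k taken at the rank k of each positive example."""
    ranked = sorted(zip(probs, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k
    return total / n_pos if n_pos else 0.0


def roc_auc(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Area under the ROC curve: the probability that a random positive
    scores higher than a random negative (ties count as half)."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))
```

Average precision and ROC AUC are threshold-free, which is why they can look good while F1 at a fixed threshold lags behind on an under-trained model.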
