Modeling Plasmodium falciparum Diagnostic Test Sensitivity using Machine Learning with Histidine-Rich Protein 2 Variants

colbyford/pfHRP_MLModel

Colby T. Ford, Gezahegn Solomon Alemayehu, Kayla Blackburn, Karen Lopez,
Cheikh Cambel Dieng, Eugenia Lo, Lemu Golassa, and Daniel Janies

Abstract

Malaria, predominantly caused by Plasmodium falciparum, poses one of the largest and most durable health threats in the world. Previously, simplistic regression-based models have been created to characterize malaria rapid diagnostic test performance, though these models often include only a couple of genetic factors. Specifically, the Baker et al., 2005 model uses two particular types of repeats in histidine-rich protein 2 (PfHRP2) to describe a P. falciparum infection, though the efficacy of this model has waned in recent years due to genetic mutations in the parasite.

In this work, we use a dataset of 100 P. falciparum PfHRP2 genetic sequences collected in Ethiopia and derive a larger set of motif repeat matches for use in generating a series of diagnostic machine learning models. Here we show that using additional and different motif repeats in more sophisticated machine learning methods proves effective in characterizing PfHRP2 diversity. Furthermore, we use machine learning model explainability methods to highlight which of the repeat types are most important with regard to rapid diagnostic test sensitivity, thereby showcasing a novel methodology for identifying potential targets for future versions of rapid diagnostic tests.

Important Supplementary Data

  • Model metrics for all trained models are in the /models folder. Note: The top-performing models' .pkl files are also available there.
  • PfHRP2 sample sequences, motif matches, and metadata are available in the pfHRP2_withMeta.csv file.
  • The histidine-based motif repeat finder is provided in the H_motif_finder.R R script.
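
To give a rough idea of what counting motif repeat matches in a PfHRP2 amino acid sequence can look like, here is a minimal Python sketch. It is not a port of H_motif_finder.R and may differ from it in important ways. The function name `count_motifs`, the dictionary `EXAMPLE_MOTIFS`, the motif labels, and the toy sequence are all assumptions made for illustration; the actual motif set and matching rules should be taken from the R script and the paper.

```python
import re

# Hypothetical motif set for illustration only; the real motifs used in this
# repository are defined by H_motif_finder.R, not by this sketch.
EXAMPLE_MOTIFS = {
    "motif_a": "AHHAHHAAD",
    "motif_b": "AHHAAD",
}


def count_motifs(sequence, motifs, overlapping=True):
    """Count occurrences of each motif in an amino acid sequence.

    With overlapping=True, a zero-width lookahead is used so that matches
    sharing residues are each counted. With overlapping=False, matches are
    counted left to right without reusing residues.
    """
    seq = sequence.upper()
    counts = {}
    for name, motif in motifs.items():
        pattern = re.escape(motif)
        if overlapping:
            pattern = f"(?={pattern})"
        counts[name] = sum(1 for _ in re.finditer(pattern, seq))
    return counts


if __name__ == "__main__":
    # Toy sequence made up for this example; not taken from the dataset.
    toy_sequence = "AHHAHHAADAHHAADAHHAHHAAD"
    print(count_motifs(toy_sequence, EXAMPLE_MOTIFS))
```

Note that a shorter motif can occur inside a longer one (here, the hypothetical `motif_b` is a suffix of `motif_a`), so per-motif counts are not mutually exclusive. Whether such nested matches should be counted is a design choice that this sketch does not settle.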

Paper and Citation

Frontiers in Tropical Diseases: frontiersin.org/articles/10.3389/fitd.2021.707313

@article {Ford2021,
	author = {Ford, Colby T. and Alemayehu, Gezahegn Solomon and Blackburn, Kayla and Lopez, Karen and Dieng, Cheikh Cambel and Lo, Eugenia and Golassa, Lemu and Janies, Daniel},
	title = {Modeling Plasmodium falciparum Diagnostic Test Sensitivity using Machine Learning with Histidine-Rich Protein 2 Variants},
	publisher = {Frontiers},
	journal = {Frontiers in Tropical Diseases},
	volume = {2},
	pages = {28},
	month = {October},
	year = {2021},
	paper = {707313},
	doi = {10.3389/fitd.2021.707313},
	url = {https://www.frontiersin.org/article/10.3389/fitd.2021.707313},
	issn = {2673-7515}
}