feat: Clinical trials lookup function #52

Merged: 13 commits, Aug 28, 2024
2 changes: 2 additions & 0 deletions src/dgipy/__init__.py
@@ -2,6 +2,7 @@

from .dgidb import (
    get_categories,
    get_clinical_trials,
    get_drug,
    get_drug_applications,
    get_gene,
@@ -20,4 +21,5 @@
    "get_gene_list",
    "get_drug_applications",
    "generate_app",
    "get_clinical_trials",
]
72 changes: 72 additions & 0 deletions src/dgipy/dgidb.py
@@ -1,5 +1,6 @@
"""Provides methods for performing different searches in DGIdb"""

import logging
import os

import pandas as pd
@@ -9,6 +10,8 @@

import dgipy.queries as queries

_logger = logging.getLogger(__name__)

API_ENDPOINT_URL = os.environ.get("DGIDB_API_URL", "https://dgidb.org/api/graphql")


@@ -220,6 +223,75 @@ def get_drug_applications(
    return result


def get_clinical_trials(
    terms: str | list,
) -> pd.DataFrame:  # TODO: Better error handling for new_row?, use_pandas=False
    """Perform a look up for clinical trials data for drug or drugs of interest

    :param terms: drug or drugs of interest
Member

Are there any guarantees about what form a given term takes? Are we hoping that the label provided by the normalizer is the term used by the trial? I just tried imatinib vs. Gleevec, and I think both return the same results.

Contributor Author

In its current state, yes: we are essentially hoping that the label the normalizer gives is the correct search term to use. I imagine whether it works on the clinical trials end (e.g. imatinib vs. Gleevec) will depend on how well the trial entries themselves have been tagged and organized for the appropriate interventions. For imatinib and Gleevec, I can see someone remembering to associate both names with any clinical trial because the drug is so well studied, but that may not always be the case, and some aliases may return different results.
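
One way to check the alias question above empirically is to compare the trial IDs that come back for each search term. The sketch below is a hypothetical helper (compare_alias_trials is not part of dgipy) that works on plain row dicts carrying the search_term and trial_id keys that get_clinical_trials builds. The NCT IDs in the example are made-up placeholders, not real lookup results.

```python
def compare_alias_trials(rows, term_a, term_b):
    """Compare the trial IDs recorded for two search terms.

    ``rows`` is a list of dicts with "search_term" and "trial_id" keys,
    the same shape as the rows collected by get_clinical_trials().
    """
    ids_a = {row["trial_id"] for row in rows if row["search_term"] == term_a}
    ids_b = {row["trial_id"] for row in rows if row["search_term"] == term_b}
    return {
        "shared": ids_a & ids_b,   # trials returned for both terms
        "only_a": ids_a - ids_b,   # trials returned only for term_a
        "only_b": ids_b - ids_a,   # trials returned only for term_b
    }


# Placeholder rows for illustration only; these IDs are not real results.
example_rows = [
    {"search_term": "imatinib", "trial_id": "NCT00000001"},
    {"search_term": "imatinib", "trial_id": "NCT00000002"},
    {"search_term": "gleevec", "trial_id": "NCT00000001"},
    {"search_term": "gleevec", "trial_id": "NCT00000003"},
]

if __name__ == "__main__":
    diff = compare_alias_trials(example_rows, "imatinib", "gleevec")
    for bucket, ids in diff.items():
        print(bucket, sorted(ids))
```

Rows from a real lookup could be fed in with get_clinical_trials(["imatinib", "gleevec"]).to_dict("records"). A non-empty only_a or only_b set would suggest that the two aliases are not interchangeable as search terms.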

    :return: all clinical trials data for drugs of interest in a DataFrame
    """
    base_url = "https://clinicaltrials.gov/api/v2/studies?format=json"
    rows_list = []

    if isinstance(terms, str):
        terms = [terms]

    for drug in terms:
        intr_url = f"&query.intr={drug}"
        full_uri = base_url + intr_url  # TODO: + cond_url + term_url
Member

What's the story here?

Member

Also, I wonder whether we can shrink the response with the fields param, in case it includes anything we aren't using.

Contributor Author

https://clinicaltrials.gov/data-api/api#:~:text=Studies-,Studies,-Returns%20data%20of

The ClinicalTrials.gov API lets you submit many different parameter types through the Studies endpoint. For this implementation, I used the Studies endpoint as the base URL and left room for building more specific queries later, basically by concatenating strings. So intr_url corresponds to the query.intr (intervention) parameter, but I could also see someone wanting to supply the query.cond (condition) parameter, or just additional terms via query.term.

Contributor Author

For now, only intr_url is working.

All that said, if there is a better way to do this, I'm open to it.
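
As one possible alternative to concatenating strings, the query could be assembled from a dict of parameters and URL-encoded in one step, which also leaves room for the condition, term, and fields parameters discussed in this thread. This is only a sketch: build_studies_url is a hypothetical helper, not part of dgipy, and the field names used in the example (NCTId, BriefTitle) are assumed to be valid values for fields and should be checked against the ClinicalTrials.gov API documentation.

```python
from urllib.parse import urlencode

STUDIES_URL = "https://clinicaltrials.gov/api/v2/studies"


def build_studies_url(intervention, condition=None, term=None, fields=None):
    """Build a Studies query URL from optional search parameters.

    Hypothetical helper: only query.intr is used by the code under review;
    the other parameters are optional extras.
    """
    params = {"format": "json", "query.intr": intervention}
    if condition:
        params["query.cond"] = condition
    if term:
        params["query.term"] = term
    if fields:
        # The API accepts a comma-separated list of fields to return.
        params["fields"] = ",".join(fields)
    # urlencode escapes spaces, ampersands, and other reserved characters.
    return f"{STUDIES_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_studies_url("imatinib"))
    print(build_studies_url("imatinib mesylate", fields=["NCTId", "BriefTitle"]))
```

The same dict could instead be passed to requests.get through its params argument, which performs the encoding itself. Either way, drug names containing spaces or ampersands no longer risk corrupting the query string.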

        try:
            r = requests.get(full_uri, timeout=20)
        except requests.exceptions.RequestException as e:
            _logger.error("Clinical trials lookup to URL %s failed: %s", full_uri, e)
            raise
        if r.status_code == 200:
            data = r.json()

            for study in data["studies"]:
                new_row = {}
                new_row["search_term"] = drug
                new_row["trial_id"] = study["protocolSection"]["identificationModule"][
                    "nctId"
                ]
                new_row["brief"] = study["protocolSection"]["identificationModule"][
                    "briefTitle"
                ]
                new_row["study_type"] = study["protocolSection"]["designModule"][
                    "studyType"
                ]
                try:
                    new_row["min_age"] = study["protocolSection"]["eligibilityModule"][
                        "minimumAge"
                    ]
                except KeyError:
                    new_row["min_age"] = None

                new_row["age_groups"] = study["protocolSection"]["eligibilityModule"][
                    "stdAges"
                ]
                new_row["Pediatric?"] = "CHILD" in new_row["age_groups"]

                new_row["conditions"] = study["protocolSection"]["conditionsModule"][
                    "conditions"
                ]
                try:
                    new_row["interventions"] = study["protocolSection"][
                        "armsInterventionsModule"
                    ]
                except KeyError:
                    new_row["interventions"] = None

                rows_list.append(new_row)
        else:
            _logger.error(
                "Received status code %s from request to %s -- returning empty dataframe",
                r.status_code,
                full_uri,
            )
    return pd.DataFrame(rows_list)


def _process_drug(results: dict) -> pd.DataFrame:
    drug_list = []
    concept_list = []
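
The TODO on get_clinical_trials asks about better error handling for new_row. One option, sketched here under the assumption that any module of a study record may be absent, is a small nested-lookup helper that returns a default instead of raising. Both dig and study_to_row are hypothetical names, not part of dgipy, and the sample study below is a trimmed placeholder rather than a real API response.

```python
def dig(mapping, *keys, default=None):
    """Follow ``keys`` through nested dicts, returning ``default`` on a miss."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def study_to_row(study, search_term):
    """Flatten one study record into the row layout used by the PR."""
    protocol = study.get("protocolSection", {})
    age_groups = dig(protocol, "eligibilityModule", "stdAges", default=[])
    return {
        "search_term": search_term,
        "trial_id": dig(protocol, "identificationModule", "nctId"),
        "brief": dig(protocol, "identificationModule", "briefTitle"),
        "study_type": dig(protocol, "designModule", "studyType"),
        "min_age": dig(protocol, "eligibilityModule", "minimumAge"),
        "age_groups": age_groups,
        "Pediatric?": "CHILD" in age_groups,
        "conditions": dig(protocol, "conditionsModule", "conditions"),
        "interventions": dig(protocol, "armsInterventionsModule"),
    }


# Placeholder record for illustration; it has no eligibilityModule at all.
sample_study = {
    "protocolSection": {
        "identificationModule": {"nctId": "NCT00000001", "briefTitle": "Example"},
        "designModule": {"studyType": "INTERVENTIONAL"},
        "conditionsModule": {"conditions": ["Example condition"]},
    }
}

if __name__ == "__main__":
    print(study_to_row(sample_study, "imatinib"))
```

Note the behavioral difference from the merged code: a missing identificationModule or conditionsModule yields None here instead of raising KeyError, so callers would need to decide whether silently incomplete rows are acceptable.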