chembl tranfsorm and dist #401

Open
wants to merge 6 commits into
base: develop

Conversation

@teslajoy (Member) commented May 8, 2024

No description provided.

@teslajoy changed the title from "updated transform" to "chembl tranfsorm and dist" on May 8, 2024
@teslajoy (Member, Author) commented May 8, 2024

The current code's transform.yml and dist_compute.plan passed on my local machine over the first 100 rows. Now executing over all data on the cloud.

@teslajoy (Member, Author)

Will add pubchem_id as an identifier:

SELECT 
    a.MOLREGNO,
    a.PREF_NAME,
    a.CHEMBL_ID,
    a.MAX_PHASE,
    a.STRUCTURE_TYPE,
    c.STANDARD_INCHI,
    c.STANDARD_INCHI_KEY,
    c.CANONICAL_SMILES,
    d.DOC_ID,
    d.PUBMED_ID,
    d.DOI,
    cr.SRC_ID,
    cr.SRC_COMPOUND_ID, 
    sr.SRC_SHORT_NAME, 
    sr.SRC_DESCRIPTION
FROM 
    MOLECULE_DICTIONARY as a
LEFT JOIN 
    COMPOUND_STRUCTURES as c ON a.MOLREGNO = c.MOLREGNO
LEFT JOIN 
    ACTIVITIES as p ON a.MOLREGNO = p.MOLREGNO
LEFT JOIN 
    DOCS as d ON p.DOC_ID = d.DOC_ID
LEFT JOIN 
    compound_records as cr ON a.MOLREGNO = cr.MOLREGNO
LEFT JOIN
    source as sr ON cr.SRC_ID = sr.SRC_ID;

If "PUBCHEM" is in SRC_SHORT_NAME: add SRC_COMPOUND_ID to the identifier list as a value with source "https://pubchem.ncbi.nlm.nih.gov".
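
A minimal Python sketch of the rule above, in case it helps review. This is an illustration only, not the code in this PR: the function name `pubchem_identifiers`, the `system`/`value` dictionary keys, the case-insensitive match, and the de-duplication step are all assumptions. De-duplication is included because the joins on ACTIVITIES and DOCS can repeat the same compound record across many rows.

```python
# Hypothetical helper: names and output shape are assumptions, not from the PR.
PUBCHEM_SYSTEM = "https://pubchem.ncbi.nlm.nih.gov"


def pubchem_identifiers(rows):
    """Collect unique PubChem identifiers from rows returned by the query.

    Each row is expected to be a mapping with at least the
    SRC_SHORT_NAME and SRC_COMPOUND_ID columns (either may be None
    because of the LEFT JOINs).
    """
    identifiers = []
    seen = set()
    for row in rows:
        short_name = row.get("SRC_SHORT_NAME") or ""
        compound_id = row.get("SRC_COMPOUND_ID")
        if compound_id is None or "PUBCHEM" not in short_name.upper():
            continue
        value = str(compound_id)
        if value in seen:
            # Same compound record repeated by the activity/doc joins.
            continue
        seen.add(value)
        identifiers.append({"system": PUBCHEM_SYSTEM, "value": value})
    return identifiers


# Example rows, made up for illustration only.
example_rows = [
    {"SRC_SHORT_NAME": "PUBCHEM_DOTF", "SRC_COMPOUND_ID": "123"},
    {"SRC_SHORT_NAME": "PUBCHEM_DOTF", "SRC_COMPOUND_ID": "123"},
    {"SRC_SHORT_NAME": "LITERATURE", "SRC_COMPOUND_ID": "X"},
    {"SRC_SHORT_NAME": None, "SRC_COMPOUND_ID": None},
]
print(pubchem_identifiers(example_rows))
```

If the transform needs a different identifier shape, only the dictionary built in the last step of the loop would change.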
