Handle outdated Entrez gene entries #1065

Open
susannasiebert opened this issue Jun 6, 2024 · 1 comment

Comments

@susannasiebert
Contributor

susannasiebert commented Jun 6, 2024

Entrez sometimes moves a gene to a new Entrez ID. For example, SMIM44 (Entrez ID 122152363) has been replaced with Entrez ID 122405565. We currently have both in the database, but they should probably be handled so that the replaced gene can no longer be used.

We should have CIViCbot handle these: deprecate the gene feature entry and remove deprecated gene features from the typeahead. Add a comment to the gene feature explaining why it was deprecated and, if possible, note which gene feature replaced it.

We should also flag variants (or evidence) under the gene feature that was deprecated.

@susannasiebert
Contributor Author

susannasiebert commented Jul 2, 2024

Some more technical details:

  • The relevant job is app/jobs/update_entrez_symbols.rb, which in turn calls out to app/lib/importer/entrez_symbols.rb. The job holds the FTP path to the Entrez file that we download and use to populate and update the gene table. Entrez entries that are no longer in use are dropped from that file, so app/lib/importer/entrez_symbols.rb needs to keep track of which CIViC gene entries it processed; once the whole file has been processed, the entries that weren't processed need to be deprecated.
  • Deprecating a gene should use the existing DeprecateFeature activity (app/models/activities/deprecate_feature.rb), with the gene passed as the feature, CIViCbot as the deprecating_user, and organization_id set to nil. CIViCbot is an actual user entry; the correct entry can be retrieved via the CIVICBOT_USER_ID constant kept in app/models/constants.rb. The note can be a sentence describing why the gene was deprecated. If we can figure it out, also note which gene feature replaced it.
  • Entrez provides an API that may help find the Entrez gene that replaced a deprecated gene. For example, https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=122152363 (where 122152363 is the Entrez ID of the deprecated version of SMIM44) returns 122405565 as a current-id entry. So it looks like you could use this API to find the Entrez ID of the "active" gene and, in turn, use that to retrieve its CIViC gene feature entry to mention in the activity note. `#FID<civic_feature_id>` is the format we use to display feature entity tags in notes and comments in the frontend, so the note could say something like `The gene was replaced with Entrez gene 122405565 (#FID61361)`.
  • Deprecation should happen even if the gene has linked un-deprecated variants. This differs from how we handle other feature types, where we simply auto-deprecate linked variants. For genes, we instead want to flag any un-deprecated variants linked to the gene. The simplest way to achieve this is to update the DeprecateFeature activity, give it a mode (?) that is either flag_variants or deprecate_variants, and have the activity take the appropriate action depending on the mode.
  • Additionally, with the information from the Entrez API about the "active" gene, we could propose revisions to the linked variants that move each variant from deprecated gene A to active gene B (only needed for variants that aren't already deprecated for a different reason). We will want to make this part of the existing DeprecateFeature activity, but it isn't currently set up to accept a superseding feature, so that would need to be added. This should be optional, though, since the same activity is used to deprecate other feature types. It might also make sense to create a new DeprecateGene activity since, combined with the variant handling mentioned above, gene deprecation now diverges more from how other feature types are deprecated.
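As a rough illustration of the first bullet, here is a plain-Ruby sketch of the processed/unprocessed bookkeeping. It is not the actual importer: the row shape (`:gene_id`) and the `stale_entrez_ids` helper are hypothetical, and it assumes the CIViC gene entries can be listed by their Entrez IDs.

```ruby
require 'set'

# Hypothetical helper (not part of CIViC): returns the Entrez IDs of CIViC
# genes that did not appear in the Entrez file and are therefore candidates
# for deprecation.
#
# entrez_rows      - enumerable of hashes parsed from the Entrez file,
#                    assumed to carry the gene ID under :gene_id
# civic_entrez_ids - Entrez IDs of the gene features currently in CIViC
def stale_entrez_ids(entrez_rows, civic_entrez_ids)
  processed = Set.new
  entrez_rows.each do |row|
    # The real importer would create/update the gene here; this sketch
    # only records that the ID was seen.
    processed << row[:gene_id]
  end

  # Decide only after the whole file has been processed.
  civic_entrez_ids.reject { |id| processed.include?(id) }
end

rows = [{ gene_id: 122405565, symbol: 'SMIM44' }, { gene_id: 7157, symbol: 'TP53' }]
p stale_entrez_ids(rows, [7157, 122152363, 122405565])
# prints [122152363]
```

Collecting the seen IDs in a `Set` keeps the final lookup cheap, and computing the difference only after the full file is read avoids deprecating genes that simply appear later in the file.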
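For the Entrez lookup in the third bullet, a sketch of how the current ID could be pulled out of the efetch response and turned into the activity note. Assumptions: the request adds `retmode=xml`, and the replacement ID sits in a `Gene-track_current-id` block as `Dbtag` entries with database `GeneID`; both should be checked against a real response before relying on them. The helper names are hypothetical, and the sample XML is a hand-trimmed illustration, not captured output.

```ruby
require 'net/http'
require 'uri'
require 'rexml/document'

EFETCH_URL = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'

# Builds the efetch URL from the issue, asking for XML output.
def efetch_uri(entrez_id)
  uri = URI(EFETCH_URL)
  uri.query = URI.encode_www_form(db: 'gene', id: entrez_id, retmode: 'xml')
  uri
end

# Network call; not exercised in the example below.
def fetch_gene_xml(entrez_id)
  Net::HTTP.get(efetch_uri(entrez_id))
end

# Assumed layout: Dbtag entries under Gene-track_current-id; we take the one
# whose database is "GeneID". Returns nil when no replacement is listed.
def current_entrez_id(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.each(doc, '//Gene-track_current-id/Dbtag') do |tag|
    next unless tag.elements['Dbtag_db']&.text == 'GeneID'

    id = tag.elements['Dbtag_tag/Object-id/Object-id_id']&.text
    return Integer(id.strip) if id
  end
  nil
end

# Formats the activity note using the #FID<civic_feature_id> tag syntax.
def deprecation_note(current_id, civic_feature_id)
  note = "The gene was replaced with Entrez gene #{current_id}"
  civic_feature_id ? "#{note} (#FID#{civic_feature_id})" : note
end

sample = <<~XML
  <Entrezgene-Set><Entrezgene><Entrezgene_track-info><Gene-track>
    <Gene-track_geneid>122152363</Gene-track_geneid>
    <Gene-track_current-id>
      <Dbtag><Dbtag_db>GeneID</Dbtag_db><Dbtag_tag><Object-id><Object-id_id>122405565</Object-id_id></Object-id></Dbtag_tag></Dbtag>
    </Gene-track_current-id>
  </Gene-track></Entrezgene_track-info></Entrezgene></Entrezgene-Set>
XML
puts deprecation_note(current_entrez_id(sample), 61361)
# prints: The gene was replaced with Entrez gene 122405565 (#FID61361)
```

Looking up the CIViC feature for the returned Entrez ID (61361 above, taken from the example note) is left out, since it depends on the gene model.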
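The mode idea in the fourth bullet could look roughly like this. It is a standalone sketch, not the DeprecateFeature activity itself: `Variant` is a hypothetical stand-in for the real model, and `flagged` stands in for however CIViC actually flags a variant.

```ruby
# Hypothetical stand-in for a variant record; the real CIViC models differ.
Variant = Struct.new(:id, :deprecated, :flagged, keyword_init: true)

VARIANT_MODES = %i[flag_variants deprecate_variants].freeze

# Applies the requested mode to every linked variant that is not already
# deprecated and returns the variants that were touched.
def handle_linked_variants(variants, mode)
  raise ArgumentError, "unknown mode: #{mode}" unless VARIANT_MODES.include?(mode)

  active = variants.reject(&:deprecated)
  active.each do |variant|
    case mode
    when :flag_variants then variant.flagged = true         # gene deprecation
    when :deprecate_variants then variant.deprecated = true # other feature types
    end
  end
  active
end

variants = [Variant.new(id: 1, deprecated: false, flagged: false),
            Variant.new(id: 2, deprecated: true, flagged: false)]
touched = handle_linked_variants(variants, :flag_variants)
p touched.map(&:id)       # prints [1]
p variants.map(&:flagged) # prints [true, false]
```

Validating the mode up front means an unknown value fails loudly instead of silently skipping the linked variants.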
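Similarly, a sketch of the optional superseding feature from the last bullet: when one is passed, a revision is proposed for each linked variant that isn't already deprecated; when it's omitted (as for other feature types), nothing is proposed. `LinkedVariant`, `Proposal`, and `revision_proposals` are hypothetical names, not existing CIViC classes.

```ruby
# Hypothetical variant record: the feature it belongs to and whether it is
# already deprecated. Not the actual CIViC model.
LinkedVariant = Struct.new(:id, :feature_id, :deprecated, keyword_init: true)

# Hypothetical revision proposal moving a variant to the superseding feature.
Proposal = Struct.new(:variant_id, :from_feature_id, :to_feature_id, keyword_init: true)

# superseded_by is optional so the same code path still works for feature
# types without a replacement; in that case no revisions are proposed.
def revision_proposals(variants, deprecated_feature_id, superseded_by: nil)
  return [] if superseded_by.nil?

  candidates = variants.select { |v| v.feature_id == deprecated_feature_id }
  # Variants already deprecated for another reason are left alone.
  candidates.reject(&:deprecated).map do |v|
    Proposal.new(variant_id: v.id, from_feature_id: deprecated_feature_id,
                 to_feature_id: superseded_by)
  end
end

variants = [LinkedVariant.new(id: 10, feature_id: 1, deprecated: false),
            LinkedVariant.new(id: 11, feature_id: 1, deprecated: true)]
p revision_proposals(variants, 1, superseded_by: 61361).map(&:variant_id) # prints [10]
p revision_proposals(variants, 1).size                                    # prints 0
```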

Once gene deprecation is enabled via the above, there are a few additional things to do:

  • We need to update the frontend to make sure that deprecated genes are displayed correctly. In most places, this should already be handled, because genes are really just a special type of feature and most places treat everything as a generic feature. Features as a whole already support deprecation and have the appropriate handling in place. Off the top of my head, the only place that will need updating is the gene summary, which will need to show the deprecated status and the additional deprecation details. Have a look at the factor summary to see how we display it there and mirror that for genes.
