VEuPathDB/stable_identifier_authority

The assignment and tracking of stable identifiers in the VEuPathDB project.

Project description

All new gene, transcript and translation stable identifiers must be generated by the OSID webservices, and changes must be tracked in the production session service database.

Features

When gene models are created, they must be assigned a stable identifier. If a new gene model arises from a split or merge of existing gene models, a history of the stable identifiers must be kept so that the new gene models can be traced back to the original gene models. Gene models that were edited should not receive new stable IDs, but the edit must be recorded in the history file to track the version.
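The split, merge and edit rules above can be sketched as follows. This is only an illustration, not the repository's implementation: the row layout `(old_id, new_id, event)`, the function `history_rows` and the fake allocator standing in for OSID are all hypothetical.

```python
import itertools
from typing import Callable, List, Tuple

# Hypothetical history row: (old_id, new_id, event)
HistoryRow = Tuple[str, str, str]


def history_rows(event: str, old_ids: List[str], new_count: int,
                 allocate: Callable[[], str]) -> List[HistoryRow]:
    """Build history rows linking original gene models to their successors."""
    if event == "edit":
        # Edited models keep their stable ID; the row only records the change.
        return [(old, old, "edit") for old in old_ids]
    # Splits and merges get freshly allocated IDs.
    new_ids = [allocate() for _ in range(new_count)]
    # Link every original model to every model derived from it.
    return [(old, new, event) for old in old_ids for new in new_ids]


def make_fake_allocator(prefix: str = "GENE", start: int = 1) -> Callable[[], str]:
    """Stand-in for the OSID webservice (assumption: IDs are prefix + counter)."""
    counter = itertools.count(start)
    return lambda: f"{prefix}{next(counter):06d}"


allocate = make_fake_allocator()
split_rows = history_rows("split", ["GENE_A"], 2, allocate)
merge_rows = history_rows("merge", ["GENE_B", "GENE_C"], 1, allocate)
edit_rows = history_rows("edit", ["GENE_D"], 0, allocate)
```

With rows of this shape, any new gene model can be traced back by following `new_id` to `old_id`, and an edit appears as a row whose old and new IDs are identical.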

Patchbuilds

The allocation service can extract gene model event information from the output of gene_model_diff and, using OSID, update a corresponding GFF file with new stable IDs. It also writes the stable ID history to a flat file. The OSID webservice is responsible for generating new stable IDs. The session service database is responsible for recording which pipelines created and deleted stable IDs in the core databases. The event_input and event_output modules can be extended to include gene model changes from other pipelines.

New Organisms

The allocation service can assign new stable IDs to a GFF file. It is possible to define the biotypes (e.g. gene, ncRNA) and a subset of features to update. The config file has two parameters, 'allowed_bio_types' and 'allowed_gene_models', each taking a file. The biotype file must contain one biotype per line. The name of the biotype must be the same as in field 2 of the GFF. The allowed_gene_models file must contain one top-level ID per line, i.e. only gene IDs, not mRNA or CDS IDs. Handy command to get IDs:

cat file.gff | perl -e 'while($line=<>){$line=~/ID=(.+?);/; print $1 . "\n"}'
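As an alternative to the one-liner, here is a minimal Python sketch that collects only top-level IDs for an allowed set of biotypes. It is not part of the repository. The function name `top_level_ids` is hypothetical, and the sketch assumes standard nine-column GFF3, where the feature type is the third tab-separated column (index 2 counting from zero) and top-level features have no `Parent` attribute.

```python
from typing import Iterable, List, Set


def top_level_ids(gff_lines: Iterable[str], allowed_types: Set[str]) -> List[str]:
    """Return IDs of features with an allowed type and no Parent attribute."""
    ids = []
    for line in gff_lines:
        # Skip blank lines, directives and comments.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Assumption: nine tab-separated columns, type in column index 2.
        if len(cols) != 9 or cols[2] not in allowed_types:
            continue
        # Parse the attributes column ("key=value;key=value").
        attrs = dict(
            pair.split("=", 1) for pair in cols[8].split(";") if "=" in pair
        )
        if "ID" in attrs and "Parent" not in attrs:
            ids.append(attrs["ID"])
    return ids


# Made-up example records, for illustration only.
example = [
    "##gff-version 3\n",
    "chr1\tsrc\tgene\t1\t100\t.\t+\t.\tID=gene1;Name=abc\n",
    "chr1\tsrc\tmRNA\t1\t100\t.\t+\t.\tID=mrna1;Parent=gene1\n",
    "chr1\tsrc\tCDS\t1\t100\t.\t+\t0\tID=cds1;Parent=mrna1\n",
    "chr1\tsrc\tncRNA\t200\t300\t.\t-\t.\tID=nc1\n",
]
gene_ids = top_level_ids(example, {"gene", "ncRNA"})
```

Writing `gene_ids` one per line would give a file in the shape the allowed_gene_models parameter expects. Unlike the Perl command, the sketch skips mRNA and CDS records rather than printing every ID it finds.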

Usage

To set up the allocation service, each organism needs an allocation_pipeline.conf. The session_service.conf needs connection information for the session service database.

To run the software:

python3 run_allocation_pipeline.py

python3 run_new_organism_allocation.py

  • schema: The database schema for the session service database.