Skip to content

compgenomics2019/Team2-GenePrediction

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

51 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Team2-GenePrediction

Developed By: Sachin Kumar, Mansi Gupta, Vineeth Aljapur, Manu Tej Sharma Arrojwala, Mingming Cao

Overview

This pipeline of gene prediction is part of outbreak detection project of Computational Genomics course. The goal of this pipeline is identify the coding regions and non-coding regions of the genome. To do this we tried to quantitatively evaluate state of the art tools for our purpose and come up with a pipeline which gives out the best results according to our analysis. Beware, our pipeline has been optimized to work best on the Salmonella enterica genomes.

This pipeline includes tools like Prodigal, GeneMarkS2 for identifying genes and tools like Aragon, Infernal and RNAmmer for identifying non-coding regions of RNA. You can find further information on the tools from the links provided below.

Requirements

We recommend installing these dependencies from the links provided.

We recommend using conda to install latest version of python and other python modules.\

Python3 To get python3
GeneValidator To validate the results
Prodigal To predict the genes
GeneMark-S2 To predict the genes
Glimmer To predict the genes
Blast To validate the results\

Perl To ger Perl
Aragorn: Aragorn for tRNA and tmRNA prediction
RNAmmer: RNAmmer for rRNA prediction
Infernal: Infernal for misc_RNA prediction
Further, for gff file operations, bedtools are highly recommended.

QuickStart

conda create --name gene_pred python=3.7
conda activate gene_pred
git clone https://github.gatech.edu/compgenomics2019/Team2-GenePrediction
chmod 755 Team2-GenePrediction/gene_prediction.py
export PATH=$PWD/Team2-GenePrediction:$PATH

gene_prediction.py -h


# Usage
# gene_prediction.py -i Input [-h] [-f Format] [-g] [-q] [-v]
# Required Arguments:
# 	-i 		--Input			Input folder containing genome assemblies
# Optional Arguments:
# 	-h 		--help			echos help message and exits
#	 -f 		--Format			Output format (gff, gbk, sqn, sco)
#	 -g 		--genemark			To inculde GeneMark-S2 results
#	 -q 		--quiet			To supress text on terminal
# 	-v 		--verbose			To display running commands

If you have a folder named 'assemblies' containing all fasta files, you can run the above pipeline as described in the following example.

# check the contents of the file
ls assemblies
# CGT2006_contigs.fasta
# CGT2010_contigs.fasta
# CGT2044_contigs.fasta
# CGT2049_contigs.fasta
# CGT2060_contigs.fasta

gene_prediction.py -i assemblies -q

# check the generated output
ls output
# CGT2006_contigs_final.gff
# CGT2010_contigs_final.gff
# CGT2044_contigs_final.gff
# CGT2049_contigs_final.gff
# CGT2060_contigs_final.gff
# out_prod
# out_rna

ls output/out_prod
# CGT2006_contigs.gff
# CGT2010_contigs.gff
# CGT2044_contigs.gff
# CGT2049_contigs.gff
# CGT2060_contigs.gff
# nucl
# prot

ls output/out_rna
# aragorn
# CGT2006_contigs.rna_merge.gff
# CGT2010_contigs.rna_merge.gff
# CGT2044_contigs.rna_merge.gff
# CGT2049_contigs.rna_merge.gff
# CGT2060_contigs.rna_merge.gff
# infernal
# rnammer

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published