reorientateCircGenomes is an R package for the processing, reorientation and visualization of files with genomic information (gff and fna) for circular genomes.

SonjaElena/reorientateCircGenomes

reorientateCircGenomes

reorientateCircGenomes is an R package for processing gff (genomic feature format) files obtained from NCBI or generated by Prokka, and for reorienting these and fna (nucleic acid fasta) files based on proteinID or base pair position. With this package, gff and fna files can be used to visualize a circular genome, including its GC skew and the locations of selected genes.

Dependencies

Installation

# Install reorientateCircGenomes from GitHub:
install.packages("devtools")
devtools::install_github("SonjaElena/reorientateCircGenomes")

Usage

Process a NCBI derived gff file

Process a gff file provided by NCBI into a data.frame.

processNCBIgff(path)
  • path Path to the unprocessed gff file downloaded from NCBI.

Return: The processed gff file as a data.frame, with an additional column containing the alternative end, which is used by the function circGenomePlot. The alternative end eliminates overlaps between genes.

Examples:

gff_unprocessed <- path_to_gff
gff <- processNCBIgff(gff_unprocessed)

Process a Prokka derived gff file

Process a gff file provided by Prokka into a data.frame.

processProkkagff(gff_object)
  • gff_object Path to the unprocessed gff file generated by Prokka.

Return: The processed gff file as a data.frame, with an additional column containing the alternative end, which is used by the function circGenomePlot. The alternative end eliminates overlaps between genes.

Examples:

gff_unprocessed <- path_to_gff
gff <- processProkkagff(gff_unprocessed)

Reorientate a gff file type

Reorient the start position of a processed gff file based on proteinID or base pair position. The new start and end locations are added in two additional columns called 'Ostart' and 'Oend'.

reorientgff(gff_object, proteinID = NA, bplocation = bp_location, replicon = NA, Rep_size = fasta)
  • gff_object A gff file, processed with functions processProkkagff or processNCBIgff.
  • proteinID ProteinID of the protein whose start position should become the new start of the file. Defaults to NA. If the proteinID is not found on the largest replicon, the replicon must be supplied as well.
  • bplocation Supplies the start position at which the file should be reoriented in base pairs and needs to be identical with the start position of one of the proteins.
  • replicon Replicon to be reoriented. This option defaults to the largest replicon.
  • Rep_size Indicates the size of the replicon to be used, in base pairs. Alternatively, a genomic fasta sequence in fna format can be supplied.

Return: The reoriented gff file with additional columns: 'Ostart' and 'Oend', containing the adjusted start and end base pair locations, and 'Oaltend', containing the reoriented end position based on the alternative end column, in case the function 'processNCBIgff' had been used beforehand.

Examples:

gff <- reorientgff(gff, "WP_012176686.1")
gff <- reorientgff(gff, bplocation = 1866, replicon = "CP000031.2")
gff <- reorientgff(gff, proteinID = "AAV97145.1", replicon = "CP000032.1")

# when no reorientation is required but the file should be used for the circular plot afterwards
gff <- reorientgff(gff, bplocation = 0, Rep_size = fna_path)
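The coordinate shift behind a reorientation can be illustrated with modular arithmetic on a circular replicon. The following is a minimal base R sketch, not the package's implementation: `shift_circular` is a hypothetical helper, the toy table and replicon size are made up, and coordinates are assumed to be 1-based.

```r
# Hypothetical helper, not part of reorientateCircGenomes: shift 1-based
# coordinates on a circular replicon so that `origin` becomes position 1.
shift_circular <- function(pos, origin, rep_size) {
  ((pos - origin) %% rep_size) + 1
}

# Toy feature table on a 10,000 bp replicon (made-up values).
genes <- data.frame(
  start = c(100, 1866, 9500),
  end   = c(900, 2500, 9990)
)
rep_size <- 10000

# Reorient so that the gene starting at 1866 becomes the first feature.
genes$Ostart <- shift_circular(genes$start, origin = 1866, rep_size = rep_size)
genes$Oend   <- shift_circular(genes$end,   origin = 1866, rep_size = rep_size)

# Features sorted by their new start position.
genes[order(genes$Ostart), ]
```

Because R's `%%` returns a non-negative result for a positive modulus, features located before the new origin wrap around to the end of the replicon instead of receiving negative coordinates.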

Reorientate a genomic fasta file

Adjust the start position of a fna file downloaded from NCBI based on a base pair location or a proteinID. Reorientation based on proteinID requires a processed gff file to be supplied as well.

reorientfna(fasta_object, replicon = NA, bplocation = NA, proteinID = NA, gff = NA)
  • fasta_object DNAStringSet of nucleotide sequences in fasta format.

  • replicon Replicon to be reoriented. This option defaults to the largest replicon.

  • bplocation Location in base pairs that should be used as new start position.

  • proteinID ProteinID indicating the protein based on which the file should be reoriented. Must be accompanied by a processed gff file and either be located on the largest replicon or also be accompanied by an indication of the replicon to be used.

  • gff A processed gff file (e.g. using functions processProkkagff or processNCBIgff). Must be supplied when option proteinID is used.

Return: The reoriented DNA string set.

Examples:

fasta <- reorientfna(fasta_object = dna_list, proteinID = "AAV93333.1", gff = gff3)
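Conceptually, reorienting a circular sequence means cutting it at the new start position and moving the leading part to the end. The sketch below shows this on a plain character string in base R to stay dependency-free. `rotate_sequence` is a hypothetical helper and not the package's implementation, which operates on a DNAStringSet.

```r
# Hypothetical helper, not the package's implementation: rotate a circular
# sequence so that the 1-based position `new_start` becomes position 1.
rotate_sequence <- function(dna, new_start) {
  n <- nchar(dna)
  stopifnot(new_start >= 1, new_start <= n)
  if (new_start == 1) return(dna)
  # Tail of the sequence first, followed by the part before the new start.
  paste0(substr(dna, new_start, n), substr(dna, 1, new_start - 1))
}

toy <- "AAACCCGGGTTT"               # made-up 12 bp "replicon"
rotated <- rotate_sequence(toy, 7)  # start at the first G
rotated
```

The rotated string has the same length and base composition as the input; only the position designated as the start changes.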

Generation of a circular genomic plot

Generates a circular genomic plot that indicates the locations of selected genes (e.g. regulators) and shows the GC skew, based on a gff file. Only one replicon should be supplied.

circGenomePlot(fasta_object, gff = gff, proteinID = proteinID, reorigff = FALSE)
  • fasta_object DNAStringSet of nucleotide sequences in fasta format.

  • gff Processed gff file. Must be the output of either processProkkagff or processNCBIgff, since these functions generate the alternative end column used here. The file may have been reoriented afterwards with the function reorientgff.

  • proteinID Vector of ProteinIDs to be indicated in the plot.

  • reorigff If TRUE, uses the columns 'Ostart' and 'Oend', generated by the reorientation functions above, to obtain the base pair locations. Defaults to FALSE, in which case the columns 'start' and 'end' are used.

Return: List containing the circular plot and a data.frame of regulator locations. The genome plot consists of four rings. The outer ring shows the position of the provided genes (black) and the location of the first gene (red). The third and second rings show the genes located on the plus and minus strand, respectively. The inner ring shows the GC skew, with locations of negative and positive GC skew values color coded in light and dark gray, respectively. A sliding window of 10,000 bp is used for the GC skew.

Examples:

vect <- c("AAV93333.1", "AAV93335.1")
plot <- circGenomePlot(fasta, gff3, vect, reorigff = TRUE)
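For readers unfamiliar with GC skew, it is commonly defined per window as (G - C) / (G + C). The following base R sketch computes it over windows of a toy sequence. `gc_skew` is a hypothetical helper, the window handling (step size, skipping an incomplete final window) is an assumption, and the code is not taken from circGenomePlot.

```r
# Hypothetical helper, not the package's implementation: GC skew
# (G - C) / (G + C) in windows of `window` bp, advancing by `step` bp.
gc_skew <- function(dna, window, step = window) {
  bases <- strsplit(toupper(dna), "")[[1]]
  stopifnot(length(bases) >= window)
  starts <- seq(1, length(bases) - window + 1, by = step)
  vapply(starts, function(s) {
    w <- bases[s:(s + window - 1)]
    g <- sum(w == "G")
    c_count <- sum(w == "C")
    # Windows without any G or C have an undefined skew.
    if (g + c_count == 0) NA_real_ else (g - c_count) / (g + c_count)
  }, numeric(1))
}

toy  <- "GGGGCCCCGGGCATAT"        # made-up sequence
skew <- gc_skew(toy, window = 4)  # a real genome would use e.g. 10000

# Shading analogous to the description above: negative light, positive dark.
shade <- ifelse(skew < 0, "light gray", "dark gray")
```

On a real replicon, the sign of the skew typically switches near the origin and terminus of replication, which is why it is a useful guide when choosing a reorientation point.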
