ASaiM/group_humann2_uniref_abundances_to_GO

Group abundances of UniRef50 gene families obtained with HUMAnN2 to Gene Ontology (GO) slim terms with relative abundances

Introduction

HUMAnN2 is a pipeline to profile the presence/absence and abundance of microbial pathways in a community from microbiota sequencing data. One of its outputs is a file with UniRef50 gene family abundances. HUMAnN2 provides a script to regroup UniRef50 gene families to GO terms, but the GO terms it uses are too specific to give a broad overview of the ontology content.

The tool described here contains scripts to group UniRef50 abundances obtained with the main HUMAnN2 script (gene families) into GO slim terms. A GO slim is a subset of the terms in the whole GO. This tool uses the metagenomics GO slim terms developed by Jane Lomax and the InterPro group.
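To illustrate the idea of grouping into GO slim terms, here is a minimal Python sketch on a toy ontology. It is not this tool's implementation: the term identifiers, the function names, the choice of mapping each GO term to its nearest slim ancestors, and the normalization to fractions that sum to 1 are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy ontology: each term maps to its direct parents.
PARENTS = {
    "GO:leaf1": ["GO:slim1"],
    "GO:leaf2": ["GO:mid"],
    "GO:mid": ["GO:slim2"],
    "GO:slim1": [],
    "GO:slim2": [],
}
# Hypothetical slim subset of the ontology above.
SLIM_TERMS = {"GO:slim1", "GO:slim2"}

# Hypothetical gene family abundances and gene family to GO mapping.
FAMILY_ABUNDANCE = {"UniRef50_X": 30.0, "UniRef50_Y": 10.0}
FAMILY_TO_GO = {
    "UniRef50_X": ["GO:leaf1"],
    "UniRef50_Y": ["GO:leaf1", "GO:leaf2"],
}


def nearest_slim_terms(term, parents, slim_terms):
    """Walk up from `term` and return the first slim term met on each path."""
    found, stack, seen = set(), [term], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in slim_terms:
            found.add(current)  # stop climbing on this path
        else:
            stack.extend(parents.get(current, []))
    return found


def group_abundances(family_abundance, family_to_go, parents, slim_terms):
    """Sum gene family abundances per slim term, then normalize to fractions."""
    totals = defaultdict(float)
    for family, abundance in family_abundance.items():
        slim = set()
        for term in family_to_go.get(family, []):
            slim |= nearest_slim_terms(term, parents, slim_terms)
        # Each family counts once per slim term it reaches (assumption).
        for term in slim:
            totals[term] += abundance
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {term: value / grand_total for term, value in totals.items()}


if __name__ == "__main__":
    result = group_abundances(FAMILY_ABUNDANCE, FAMILY_TO_GO, PARENTS, SLIM_TERMS)
    for term, fraction in sorted(result.items()):
        print(term, fraction)
```

In this sketch a gene family annotated with several GO terms contributes its full abundance to every slim term it reaches, which is one possible convention among others.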

Script in this tool calls:

Installation

Using conda

$ conda install -c bioconda group_humann2_uniref_abundances_to_GO

It will manage installation of all dependencies.

Using source code

Get the code

Clone the repository:

$ git clone https://github.com/ASaiM/group_humann2_uniref_abundances_to_GO.git
$ cd group_humann2_uniref_abundances_to_GO

Install the requirements

This tool needs:

  • Git
  • Mercurial
  • VirtualEnv
  • Python with pip

Once these tools are installed, you can run:

$ install_dependencies.sh

This script launches a virtual environment and installs the dependencies with:

$ pip install -r requirements.txt
$ git clone https://github.com/tanghaibao/goatools.git

Using Galaxy

A Galaxy wrapper is also available on the Galaxy ToolShed. It can be installed on any Galaxy instance.

Usage

$ ./group_humann2_uniref_abundances_to_GO.sh [OPTIONS] \
     -i humann2_gene_families_abundance \
     -m molecular_function_abundance \
     -b biological_process_abundance \
     -c cellular_component_abundance

To get more information about options:

$ ./group_humann2_uniref_abundances_to_GO.sh -h
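If the script needs to be driven from Python, for example in a batch over several samples, the command above could be wrapped as in the following sketch. Only the script name and the `-i`, `-m`, `-b` and `-c` flags come from the usage shown above; the function names `build_command` and `run_grouping` are hypothetical helpers introduced for illustration, and any file paths passed to them are up to the caller.

```python
import subprocess

# Script name taken from the usage section; assumed to be in the current directory.
SCRIPT = "./group_humann2_uniref_abundances_to_GO.sh"


def build_command(gene_families, molecular_function, biological_process,
                  cellular_component, script=SCRIPT):
    """Build the argument list for one run (hypothetical helper)."""
    return [
        script,
        "-i", str(gene_families),        # HUMAnN2 gene family abundances (input)
        "-m", str(molecular_function),   # molecular function abundances
        "-b", str(biological_process),   # biological process abundances
        "-c", str(cellular_component),   # cellular component abundances
    ]


def run_grouping(gene_families, molecular_function, biological_process,
                 cellular_component):
    """Run the script and raise CalledProcessError if it exits with an error."""
    command = build_command(gene_families, molecular_function,
                            biological_process, cellular_component)
    return subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.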

Tests

This tool is tested on every change to the GitHub repository using Travis CI.

In these tests, dependencies are installed and group_humann2_uniref_abundances_to_GO.sh is run on the test data available in the test-data directory:

  • A file with UniRef50 gene family abundances from HUMAnN2 (computed on gut microbiota data of lean women): humann2_gene_families.csv
  • A file with basic Gene Ontology, downloaded on 02/22/2016: go_02_22_2016.obo
  • A file with metagenomic slim Gene Ontology, downloaded on 02/22/2016: goslim_metagenomics_02_22_2016.obo
  • A file with the HUMAnN2 correspondence between UniRef50 and GO, downloaded on 02/22/2016: map_infogo1000_uniref50_02_22_2016.txt

Generated outputs are compared to expected ones:

  • expected_molecular_function_abundances.txt with expected abundance of GO related to molecular functions
  • expected_biological_process_abundances.txt with expected abundance of GO related to biological processes
  • expected_cellular_component_abundances.txt with expected abundance of GO related to cellular components
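The comparison step can be pictured with the short Python sketch below. How the repository's tests actually compare the files is defined in .travis.yml and is not reproduced here; the line-by-line equality check and the helper names `files_match` and `differing_outputs` are assumptions for illustration.

```python
from pathlib import Path


def files_match(generated, expected):
    """Return True when both files contain the same lines (line endings ignored)."""
    generated_lines = Path(generated).read_text().splitlines()
    expected_lines = Path(expected).read_text().splitlines()
    return generated_lines == expected_lines


def differing_outputs(pairs):
    """Return the generated files that differ from their expected counterpart.

    `pairs` is an iterable of (generated_path, expected_path) tuples.
    """
    return [generated for generated, expected in pairs
            if not files_match(generated, expected)]
```

An empty list returned by `differing_outputs` means every generated file matched its expected file.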

See the .travis.yml file for more information.

License

This tool is released under the Apache 2 License. See the LICENSE file for details.

Citation

So that this tool can be cited, a DOI is generated for each release using Zenodo.

Latest release DOI and corresponding BibTeX export:

@misc{berenice_batut_2016_50086,
  author       = {Bérénice Batut},
  title        = {{Group abundances of UniRef50 gene families 
                   obtained with HUMAnN2 to Gene Ontology (GO) slim
                   terms with relative abundances: release v1.2.0}},
  month        = apr,
  year         = 2016,
  doi          = {10.5281/zenodo.50086},
  url          = {http://dx.doi.org/10.5281/zenodo.50086}
}