Lucas Czech edited this page Jul 19, 2024 · 27 revisions


Grenepipe Wiki

Get started with grenepipe with either

and see the "Pages" bar on the right for the available documentation.

Layer Cake

Before running grenepipe, we recommend gaining basic familiarity with the tools involved. Unfortunately, bioinformatics is a big stack of layered abstractions, with tools built on top of other tools. As we cannot re-invent the wheel here, grenepipe is no exception to this. The Setup and Usage page will walk you through all steps to get started, but it will not explain what Snakemake or conda are, for example. Please use your favorite search engine to familiarize yourself with those as needed.

In short, the stack looks as follows:

  • grenepipe is a pipeline (or workflow) for variant calling from raw sample sequences. The bulk of the underlying grenepipe code describes which tools to run, in which order, with which settings, and which intermediate steps are needed to make the tools work together.
  • Snakemake is a workflow management system. Its job is to take the workflow description of grenepipe, install the actual tools to be run, execute all steps in the correct order, run them on the cluster (or locally), detect when steps fail, and more.
  • conda (and its siblings miniconda, anaconda, mamba, micromamba, etc.) is a package management system. It manages software tools in separate environments, to avoid dependency conflicts between tools. It is used by Snakemake to install the tools that we want to run, and it is also how we install Snakemake itself.
  • At the bottom of the stack are the actual tools we want in the pipeline (bwa, samtools, GATK, etc.). These are installed automatically via conda by Snakemake in the versions required by grenepipe.
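
To make the phrase "execute all steps in the correct order" a bit more concrete, here is a toy sketch in plain Python. It is not how Snakemake is implemented, and the step names are hypothetical placeholders rather than grenepipe's actual rule names. It only illustrates the general idea that a workflow manager knows which steps depend on which, and derives a valid execution order from that. Snakemake itself infers these dependencies from the input and output files declared by each rule.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical, simplified steps: each step maps to the set of steps it depends on.
# These names are illustrative assumptions, not grenepipe's real rule names.
STEPS = {
    "trim_reads": set(),
    "map_reads": {"trim_reads"},
    "sort_alignments": {"map_reads"},
    "call_variants": {"sort_alignments"},
}


def execution_order(steps):
    """Return an order in which every step comes after all of its dependencies."""
    return list(TopologicalSorter(steps).static_order())


if __name__ == "__main__":
    for name in execution_order(STEPS):
        print(name)
```

In a real run, the workflow manager additionally checks which outputs already exist, runs independent steps in parallel, and installs the required tools, none of which this sketch attempts to model.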

We also recommend reading the Snakemake documentation first, in particular the Snakemake tutorial or the Short tutorial, at least for the parts describing the general setup and workflow of Snakemake pipelines. A basic understanding of how Snakemake works will help you in the (likely) case that some analysis step has an issue that needs to be figured out (some tool complaining about the data, some task not finishing, your compute cluster having a hiccup, etc.).

Citation

When using grenepipe, please cite:

grenepipe: A flexible, scalable, and reproducible pipeline
to automate variant calling from sequence reads.

Lucas Czech and Moises Exposito-Alonso. Bioinformatics. 2022.
doi:10.1093/bioinformatics/btac600 [pdf]

Furthermore, please do not forget to cite all tools that you selected to be run for your analysis. See Citation and References for their references.
