griffithlab · susannasiebert · Jul 21, 2023 · Jun 29, 2023 · Jun 29, 2023 · Jun 29, 2023
diff --git a/01-intro.Rmd b/01-intro.Rmd
@@ -5,21 +5,60 @@ ottrpal::set_knitr_image_path()
 
 # Introduction
 
-This course is currently under development. The topics to be covered are outlined below.
+This course has been developed recently (Summer 2023). We welcome any feedback at help@pvactools.org or by submission of [GitHub issues](https://github.com/griffithlab/pVACtools_Intro_Course/issues).
 
 ## Motivation
 
-Identification of neoantigens is a critical step in predicting response to checkpoint blockade therapy and design of personalized cancer vaccines. This is a cross-disciplinary challenge, involving genomics, proteomics, immunology, and computational approaches. We have built a computational framework called pVACtools that, when paired with a well-established genomics pipeline, produces an end-to-end solution for neoantigen characterization. pVACtools supports identification of altered peptides from different mechanisms, including point mutations, in-frame and frameshift insertions and deletions, and gene fusions. Prediction of peptide:MHC binding is accomplished by supporting an ensemble of MHC Class I and II binding algorithms within a framework designed to facilitate the incorporation of additional algorithms. Prioritization of predicted peptides occurs by integrating diverse data, including mutant allele expression, peptide binding affinities, and determination whether a mutation is clonal or subclonal. Interactive visualization via a Web interface allows clinical users to efficiently generate, review, and interpret results, selecting candidate peptides for individual patient vaccine designs. Additional modules support design choices needed for competing vaccine delivery approaches. One such module optimizes peptide ordering to minimize junctional epitopes in DNA vector vaccines. Downstream analysis commands for synthetic long peptide vaccines are available to assess candidates for factors that influence peptide synthesis. All of the aforementioned steps are executed via a modular workflow consisting of tools for neoantigen prediction from somatic alterations (pVACseq and pVACfuse), prioritization, and selection using a graphical Web-based interface (pVACviz), and design of DNA vector–based vaccines (pVACvector) and synthetic long peptide vaccines. pVACtools is available at [https://www.pvactools.org](https://www.pvactools.org).
+Identification of neoantigens is a critical step in predicting response to checkpoint blockade therapy and design of personalized cancer vaccines.
+This is a cross-disciplinary challenge involving genomics, proteomics, immunology, and computational approaches. We have built a computational
+framework called pVACtools that, when paired with a well-established genomics pipeline, produces an end-to-end solution for neoantigen characterization.
+pVACtools supports identification of altered peptides from different mechanisms, including point mutations, in-frame and frameshift insertions and deletions,
+and gene fusions. Prediction of peptide:MHC binding is accomplished by supporting an ensemble of MHC Class I and II binding algorithms within a framework
+designed to facilitate the incorporation of additional algorithms. Prioritization of predicted peptides occurs by integrating diverse data, including mutant
+allele expression, peptide binding affinities, and determination of whether a mutation is clonal or subclonal. Interactive visualization via a Web interface allows
+users to efficiently generate, review, and interpret results, selecting candidate peptides for individual experiments or patient vaccine designs. Additional modules
+support design choices needed for competing vaccine delivery approaches. One such module optimizes peptide ordering to minimize junctional epitopes in DNA vector
+vaccines. Downstream analysis commands for synthetic long peptide vaccines are available to assess candidates for factors that influence peptide synthesis. All
+of the aforementioned steps are executed via a modular workflow consisting of tools for neoantigen prediction from somatic alterations (pVACseq, pVACfuse, and pVACbind),
+prioritization, and selection using a graphical Web-based interface (pVACview), and design of DNA vector–based vaccines (pVACvector) and synthetic long peptide
+vaccines. pVACtools is available at [http://www.pvactools.org](http://www.pvactools.org).
 
 ```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "pVACtools is a cancer immunotherapy tools suite"}
-ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g2491f283519_0_0")
+ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g22b1533a196_0_0")
 ```
 
-## Target Audience  
+## Background
 
-The course is intended for anyone seeking a better understanding of current best practices in cancer vaccine design and neoantigen prioritization using pVACtools. It assumes that the learner is familiar with basic biology, genetics and immunology concepts. 
+Neoantigens are unique peptide sequences generated from mutations acquired somatically in tumor cells. These antigens provide an avenue for tumor-specific immune
+cell recognition and have been found to be important targets for cancer immunotherapies [@Keskin2018; @Ott2017; @Hilf2018]. Effective neoantigens, presented by the
+major histocompatibility complex (MHC) and thus introduced to the patient’s immune system, can prime and activate CD8+ and CD4+ T cells for downstream signaling of
+cell death. Patients with high tumor mutation burden tend to have stronger responses to neoantigen-based immunotherapy treatments [@Brown2014; @Rizvi2015; @Schumacher2015].
+DNA and RNA sequencing technologies allow researchers and clinicians to computationally predict potential neoantigens based on tumor-specific mutations.
 
-## Curriculum  
+However, neoantigen generation and presentation are complex, and a host of factors must be evaluated to characterize each potential neoantigen.
+These include but are not limited to: somatic variant identification, tumor clonality assessment, RNA expression estimation, mRNA isoform selection, inference of
+translated tumor-specific peptides that arise from the somatic variant, and prediction of peptide processing, peptide transportation, peptide-MHC binding, peptide-MHC
+stability, and recognition by cytotoxic T cells [@Richters2019].
+
+pVACtools can be used as the final step in a well-established variant calling pipeline. It leverages existing tools with functionality related to variant annotation
+(Ensembl VEP [@McLaren2016]), identifying neoantigens from specific sources (e.g. fusions via STAR-Fusion [@Haas2019], AGFusion [@Murphy2016], and Arriba [@Uhrig2021]),
+HLA typing (OptiType [@Szolek2014], PHLAT [@Bai2018]), peptide-MHC binding prediction (IEDB [@Vita2018], NetMHCpan [@Jurtz2017], MHCflurry [@ODonnell2018],
+MHCnuggets [@Shao2020]), peptide-MHC stability (NetMHCstabpan [@Rasmussen2016]), peptide processing (NetChop [@Nielsen2005]), manufacturability
+metrics (vaxrank [@Rubinsteyn2017]), and reference proteome similarity (BLAST [@Altschul1990]). Each of these tools tackles specific tasks within the broader goal of
+antigen analysis and is utilized by pVACtools to provide an end-to-end integration of novel algorithms and established tools needed to discover, characterize, prioritize,
+and utilize tumor-specific neoantigens in basic research and clinical applications. Combining pVACtools with existing variant calling pipelines provides an end-to-end
+solution for neoantigen prediction and characterization.
+
+```{r, fig.align='center', out.width="100%", echo = FALSE, fig.alt= "Tumor neoantigen background"}
+ottrpal::include_slide("https://docs.google.com/presentation/d/1uz39zaObDGKhEVCGzO0JO35CTbC0oRAM0mxgLcMAA9Y/edit#slide=id.g22b1533a196_0_6")
+```
+
+## Target Audience
+
+The course is intended for anyone seeking a better understanding of current best practices in neoantigen identification and prioritization using pVACtools.
+It assumes that the learner is familiar with basic biology, genetics and immunology concepts.
+
+## Curriculum
 
 This course will teach learners to:
 

diff --git a/02-prerequisites.Rmd b/02-prerequisites.Rmd
@@ -0,0 +1,115 @@
+
+# Prerequisites
+
+```{r, include = FALSE}
+ottrpal::set_knitr_image_path()
+```
+
+## Learning Objectives
+
+This chapter will cover the prerequisites for this course, including:
+
+- Installing Docker
+- Installing R Studio
+- Downloading data files
+
+## Docker
+
+For the purpose of this course, we will be using Docker to run pVACseq and
+pVACfuse.
+Docker is a tool that is used to automate the deployment of applications
+in lightweight containers so that applications can work efficiently in
+different environments in isolation. We provide versioned Docker containers
+for all pVACtools [releases](https://github.com/griffithlab/pVACtools/releases) 
+via [Docker Hub using the griffithlab/pvactools image name](https://hub.docker.com/r/griffithlab/pvactools).
+
+In order to use Docker, you will need to download the [Docker Desktop software](https://www.docker.com/get-started/).
+Please ensure you select the correct install package for your operating
+system.
+
+## Terminal
+
+We will be running Docker from the command line on your preferred terminal
+using the Docker command line interface (CLI). The Docker CLI is already
+included with Docker Desktop. Most operating systems already
+come with a Terminal application. If yours doesn't, you will need to first
+install one.
+
+## R Studio and R package dependencies
+
+In order to use pVACview, you will need to install R (version 3.5 or above),
+which can be downloaded from [CRAN](https://cran.rstudio.com/). You may also
+take the additional step of [downloading
+RStudio](https://www.rstudio.com/products/rstudio/download/) if
+you are not familiar with launching R Shiny from the command line.
+
+Additionally, there are a number of packages you will need to install in R or RStudio:
+
+```{r, eval = FALSE}
+install.packages("shiny", dependencies=TRUE)
+install.packages("ggplot2", dependencies=TRUE)
+install.packages("DT", dependencies=TRUE)
+install.packages("reshape2", dependencies=TRUE)
+install.packages("jsonlite", dependencies=TRUE)
+install.packages("tibble", dependencies=TRUE)
+install.packages("tidyr", dependencies=TRUE)
+install.packages("plyr", dependencies=TRUE)
+install.packages("dplyr", dependencies=TRUE)
+install.packages("shinydashboard", dependencies=TRUE)
+install.packages("shinydashboardPlus", dependencies=TRUE)
+install.packages("fresh", dependencies=TRUE)
+install.packages("shinycssloaders", dependencies=TRUE)
+install.packages("RCurl", dependencies=TRUE)
+install.packages("curl", dependencies=TRUE)
+install.packages("stringr", dependencies=TRUE)
+```
+
+## Data
+
+For this course, we have put together a set of input data generated from the breast 
+cancer cell line HCC1395 and a matched normal lymphoblastoid cell line HCC1395BL.
+Data from this cell line is commonly used as test data in bioinformatics applications. 
+For more information on these lines and the generation of test data, please refer to 
+the [data section](https://pmbio.org/module-02-inputs/0002/05/01/Data/) of our precision
+medicine bioinformatics course.
+
+The input data consists of the following files:
+
+For pVACseq:
+
+- `annotated.expression.vcf.gz`: A somatic (tumor-normal) VCF and its tbi index file. The VCF has been
+  annotated with VEP and has coverage and expression information added. It has also been annotated with 
+  custom VEP plugins that provide wild type and mutant versions of the full length protein sequences 
+  predicted to arise from each transcript annotated with each variant.
+- `phased.vcf.gz`: A phased tumor-germline VCF and its tbi index file to provide information about
+  in-phase proximal variants that might alter the predicted peptide sequence around a somatic
+  mutation of interest
+- `optitype_normal_result.tsv`: An OptiType file with HLA allele typing predictions
+
+For more detailed information on how the variant input file is created, please refer to the
+[input file preparation](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep.html) 
+section of the pVACtools docs.
+
+For pVACfuse:
+
+- `agfusion_results`: An AGFusion output directory with annotated fusion calls
+- `star-fusion.fusion_predictions.tsv`: A STAR-Fusion prediction file with fusion read support
+  and expression information
+
+General:
+
+- `Homo_sapiens.GRCh38.pep.all.fa.gz`: A reference proteome peptide FASTA to use
+  for determining whether there are any reference matches of neoantigen candidates
+
+To download this data, please run the following commands:
+
+```{r, engine = 'bash', eval = FALSE}
+wget https://raw.githubusercontent.com/griffithlab/pVACtools_Intro_Course/main/HCC1395_inputs.zip
+unzip HCC1395_inputs.zip
+```
+
+This course will not cover the required pre-processing steps for the pVACtools
+input data, but extensive instructions on how to prepare your own data for use
+with pVACtools can be found at [pvactools.org](http://www.pvactools.org).
+
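Review note on the prerequisites added in `02-prerequisites.Rmd`: a learner may want to confirm that the Docker CLI is reachable and that the archive unpacked as expected before moving on. The sketch below is one possible check and is not part of the course material. It uses the six input names listed in the Data section; the helper names `check_docker` and `check_inputs` are hypothetical, and the search is recursive because the chapter does not state the archive's internal layout.

```shell
# Hypothetical helpers (not part of pVACtools or the course repository).

# Report whether the Docker CLI is available on PATH.
check_docker() {
  if command -v docker >/dev/null 2>&1; then
    echo "found:   docker CLI at $(command -v docker)"
  else
    echo "MISSING: docker CLI (install Docker Desktop first)"
    return 1
  fi
}

# Report which course inputs exist somewhere under the given directory.
# Assumption: the names below match the entries in HCC1395_inputs.zip.
check_inputs() {
  ci_root="${1:-.}"   # directory where the archive was unpacked
  ci_missing=0
  for ci_name in \
    annotated.expression.vcf.gz \
    phased.vcf.gz \
    optitype_normal_result.tsv \
    agfusion_results \
    star-fusion.fusion_predictions.tsv \
    Homo_sapiens.GRCh38.pep.all.fa.gz
  do
    # Recursive search, so the check does not depend on the archive layout.
    if [ -n "$(find "$ci_root" -name "$ci_name" 2>/dev/null | head -n 1)" ]; then
      echo "found:   $ci_name"
    else
      echo "MISSING: $ci_name"
      ci_missing=1
    fi
  done
  return "$ci_missing"
}
```

After defining both functions, running `check_docker && check_inputs .` from the directory where `HCC1395_inputs.zip` was unzipped prints one line per expected input and returns non-zero if anything is absent.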
diff --git a/02-running_pvactools.Rmd b/02-running_pvactools.Rmd