Incorporate @tmooney's suggestions from #31

griffithlab · Jul 21, 2023 · 0830b65
1 parent 72728e7

Showing 3 changed files with 21 additions and 21 deletions.
diff --git a/01-intro.Rmd b/01-intro.Rmd
@@ -10,7 +10,7 @@ This course has been developed recently (Summer 2023). We welcome any feedback a
 ## Motivation
 
 Identification of neoantigens is a critical step in predicting response to checkpoint blockade therapy and design of personalized cancer vaccines.
-This is a cross-disciplinary challenge, involving genomics, proteomics, immunology, and computational approaches. We have built a computational
+This is a cross-disciplinary challenge, which involves genomics, proteomics, immunology, and computational approaches. We have built a computational
 framework called pVACtools that, when paired with a well-established genomics pipeline, produces an end-to-end solution for neoantigen characterization.
 pVACtools supports identification of altered peptides from different mechanisms, including point mutations, in-frame and frameshift insertions and deletions,
 and gene fusions. Prediction of peptide:MHC binding is accomplished by supporting an ensemble of MHC Class I and II binding algorithms within a framework

diff --git a/02-prerequisites.Rmd b/02-prerequisites.Rmd
@@ -71,36 +71,36 @@ For this course, we have put together a set of input data generated from the bre
 cancer cell line HCC1395 and a matched normal lymphoblastoid cell line HCC1395BL.
 Data from this cell line is commonly used as test data in bioinformatics applications. 
 For more information on these lines and the generation of test data, please refer to 
-the data section of our precision medicine bioinformatics course: 
-[here](https://pmbio.org/module-02-inputs/0002/05/01/Data/).
+the [data section of our precision medicine bioinformatics course](https://pmbio.org/module-02-inputs/0002/05/01/Data/).
 
 The input data consists of the following files:
 
 For pVACseq:
 
 - `annotated.expression.vcf.gz`: A somatic (tumor-normal) VCF and its tbi index file. The VCF has been
   annotated with VEP and has coverage and expression information added. It has also been annotated with 
-  custom VEP plugins that provide wild type and mutant version of the full length protein sequences 
+  custom VEP plugins that provide wild type and mutant versions of the full length protein sequences 
   predicted to arise from each transcript annotated with each variant.
 - `phased.vcf.gz`: A phased tumor-germline VCF and its tbi index file to provide information about
   in-phase proximal variants that might alter the predicted peptide sequence around a somatic
-  mutation of interest
-- `optitype_normal_result.tsv`: A OptiType file with HLA allele typing predictions
+  mutation of interest.
+- `optitype_normal_result.tsv`: A OptiType file with HLA allele typing predictions.
 
 For more detailed information on how the variant input file is created, please refer to the
 [input file preparation](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep.html) 
-section of the pVACtools docs
+section of the pVACtools docs.
 
 For pVACfuse:
 
-- `agfusion_results`: A AGFusion output directory with annotated fusion calls
+- `agfusion_results`: An AGFusion output directory with annotated fusion
+  calls.
 - `star-fusion.fusion_predictions.tsv`: A STARFusion prediction file with fusion read support
-  and expression information
+  and expression information.
 
 General:
 
 - `Homo_sapiens.GRCh38.pep.all.fa.gz`: A reference proteome peptide FASTA to use
-  for determining whether there are any reference matches of neoantigen candidates
+  for determining whether there are any reference matches of neoantigen candidates.
 
 To download this data, please run the following commands:
 

diff --git a/04-outputs.Rmd b/04-outputs.Rmd
@@ -95,9 +95,9 @@ patient's RNA.
 
 For pVACseq, this generally relies on your VCF being annotated with coverage
 and expression data. In our example, the VCF has already been annotated with
-this data. For more information about how to add coverage and expression data
-to your own VCFs, please see [here](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep/readcounts.html)
-and [here](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep/expression.html).
+this data. For more information about how to add [coverage](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep/readcounts.html)
+and [expression data](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep/expression.html)
+to your own VCFs, please see our docs.
 Additionally, filtering on the normal DNA depth and variant allele frequency
 (VAF) requires your VCF to be a tumor-normal sample VCF and the normal sample
 to be identifies in your pVACseq run using the `--normal-sample-name`
@@ -130,7 +130,7 @@ The following thresholds are applied in pVACfuse by this filter:
 
 ### Transcript Support Level Filter
 
-The Transcript Support Level (TSL) Filter, removes neoantigen candidates for
+The Transcript Support Level (TSL) Filter removes neoantigen candidates for
 transcripts with a high TSL, as defined [by Ensembl](https://grch37.ensembl.org/info/genome/genebuild/transcript_quality_tags.html#tsl).
 The cutoff for this filter is set by the `--maximum-transcript-support-level`
 parameter. Transcripts with a TSL of NA will always be filtered out.
@@ -147,16 +147,16 @@ The Top Score Filter will attempt to determine the best neoantigen candidate
 for each variants.
 
 For pVACseq it works as follows. Given a set of neoantigen candidates for a
-variant we first group the transcripts into set where all transcripts in a set
+variant we first group the transcripts into sets where all transcripts in a set
 code for the same set of neoantigen candidates. For each transcript set we then
 determine the best neoantigen candidate as follows:
 
 - Pick all neoantigens with a variant transcript that have a protein_coding Biotype
 - Of the remaining candidates, pick the ones with a variant transcript having a
   TSL less then the `--maximum-transcript-support-level`.
-- Of the remaining candidates, pick the entries with no Problematic Positions
+- Of the remaining candidates, pick the entries with no Problematic Positions.
 - Of the remaining candidates, pick the ones passing the Anchor Criteria (explained in
-  more detail further below)
+  more detail further below).
 - Of the remaining candidates, pick the one with the lowest MT IC50 Score (Median or Best
   depending on the `--top-score-metric`), lowest TSL, and longest transcript.
 
@@ -185,10 +185,10 @@ are included in creating this report.
 
 In pVACseq, for each variant, all neoantigen candidates meeting the `--aggregate-inclusion-threshold` are evaluated as follows:
 
-- Pick all entries with a variant transcript that have a protein_coding Biotype
-- Of the remaining entries, pick the ones with a variant transcript having a Transcript Support Level <= `--maximum-transcript-support-level`
-- Of the remaining entries, pick the entries with no Problematic Positions
-- Of the remaining entries, pick the ones passing the Anchor Criteria (see Criteria Details section below)
+- Pick all entries with a variant transcript that have a protein_coding Biotype.
+- Of the remaining entries, pick the ones with a variant transcript having a Transcript Support Level <= `--maximum-transcript-support-level`.
+- Of the remaining entries, pick the entries with no Problematic Positions.
+- Of the remaining entries, pick the ones passing the Anchor Criteria (see Criteria Details section below).
 - Of the remaining entries, pick the one with the lowest MT IC50 score( Median or Best
   depending on the `--top-score-metric`), lowest Transcript Support Level, and longest transcript.
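
The tiered selection described in the last hunk can be sketched in Python as follows. This is only an illustration of the bullet list, not pVACtools code: the dictionary keys (`biotype`, `tsl`, `problematic_positions`, `anchor_pass`, `mt_ic50`, `transcript_length`) and the function names are hypothetical, `tsl` is assumed to be an integer (the NA case is not modeled), and the fallback to the previous pool when a tier would leave no entries is an assumption that the text does not state.

```python
from typing import Callable, Dict, List

Candidate = Dict[str, object]


def keep_if_any(pool: List[Candidate],
                predicate: Callable[[Candidate], bool]) -> List[Candidate]:
    """Apply one tier. ASSUMPTION: if no entry passes, keep the previous pool."""
    kept = [c for c in pool if predicate(c)]
    return kept if kept else pool


def pick_best(candidates: List[Candidate], max_tsl: int = 1) -> Candidate:
    """Pick one candidate per variant by walking the tiers in order.

    All field names are hypothetical stand-ins for the report columns.
    """
    pool = list(candidates)
    if not pool:
        raise ValueError("no candidates to choose from")

    # Tier 1: protein_coding Biotype.
    pool = keep_if_any(pool, lambda c: c["biotype"] == "protein_coding")
    # Tier 2: Transcript Support Level <= the maximum allowed.
    pool = keep_if_any(pool, lambda c: c["tsl"] <= max_tsl)
    # Tier 3: no Problematic Positions.
    pool = keep_if_any(pool, lambda c: not c["problematic_positions"])
    # Tier 4: passes the Anchor Criteria (precomputed flag in this sketch).
    pool = keep_if_any(pool, lambda c: c["anchor_pass"])

    # Final tier: lowest MT IC50, then lowest TSL, then longest transcript.
    return min(
        pool,
        key=lambda c: (c["mt_ic50"], c["tsl"], -c["transcript_length"]),
    )
```

Here `mt_ic50` stands for whichever of the Median or Best score `--top-score-metric` selects; the sketch takes that value as already chosen. Encoding the final tier as a single sort key keeps the tie-breaking order explicit.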