Merge pull request #34 from griffithlab/updates
Start filling in the pVACview chapter
susannasiebert authored Jul 23, 2023
2 parents df98f80 + 1dd47a5 commit e0deed3
Showing 7 changed files with 317 additions and 32 deletions.
2 changes: 1 addition & 1 deletion 01-intro.Rmd
@@ -10,7 +10,7 @@ This course has been developed recently (Summer 2023). We welcome any feedback a
## Motivation

Identification of neoantigens is a critical step in predicting response to checkpoint blockade therapy and design of personalized cancer vaccines.
-This is a cross-disciplinary challenge, involving genomics, proteomics, immunology, and computational approaches. We have built a computational
+This is a cross-disciplinary challenge, which involves genomics, proteomics, immunology, and computational approaches. We have built a computational
framework called pVACtools that, when paired with a well-established genomics pipeline, produces an end-to-end solution for neoantigen characterization.
pVACtools supports identification of altered peptides from different mechanisms, including point mutations, in-frame and frameshift insertions and deletions,
and gene fusions. Prediction of peptide:MHC binding is accomplished by supporting an ensemble of MHC Class I and II binding algorithms within a framework
18 changes: 9 additions & 9 deletions 02-prerequisites.Rmd
@@ -71,36 +71,36 @@ For this course, we have put together a set of input data generated from the bre
cancer cell line HCC1395 and a matched normal lymphoblastoid cell line HCC1395BL.
Data from this cell line is commonly used as test data in bioinformatics applications.
For more information on these lines and the generation of test data, please refer to
-the data section of our precision medicine bioinformatics course:
-[here](https://pmbio.org/module-02-inputs/0002/05/01/Data/).
+the [data section of our precision medicine bioinformatics course](https://pmbio.org/module-02-inputs/0002/05/01/Data/).

The input data consists of the following files:

For pVACseq:

- `annotated.expression.vcf.gz`: A somatic (tumor-normal) VCF and its tbi index file. The VCF has been
annotated with VEP and has coverage and expression information added. It has also been annotated with
-custom VEP plugins that provide wild type and mutant version of the full length protein sequences
+custom VEP plugins that provide wild type and mutant versions of the full length protein sequences
predicted to arise from each transcript annotated with each variant.
- `phased.vcf.gz`: A phased tumor-germline VCF and its tbi index file to provide information about
in-phase proximal variants that might alter the predicted peptide sequence around a somatic
-mutation of interest
-- `optitype_normal_result.tsv`: A OptiType file with HLA allele typing predictions
+mutation of interest.
+- `optitype_normal_result.tsv`: A OptiType file with HLA allele typing predictions.

For more detailed information on how the variant input file is created, please refer to the
[input file preparation](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep.html)
-section of the pVACtools docs
+section of the pVACtools docs.

For pVACfuse:

-- `agfusion_results`: A AGFusion output directory with annotated fusion calls
+- `agfusion_results`: An AGFusion output directory with annotated fusion
+calls.
- `star-fusion.fusion_predictions.tsv`: A STARFusion prediction file with fusion read support
-and expression information
+and expression information.

General:

- `Homo_sapiens.GRCh38.pep.all.fa.gz`: A reference proteome peptide FASTA to use
-for determining whether there are any reference matches of neoantigen candidates
+for determining whether there are any reference matches of neoantigen candidates.

To download this data, please run the following commands:

22 changes: 11 additions & 11 deletions 04-outputs.Rmd
@@ -95,9 +95,9 @@ patient's RNA.

For pVACseq, this generally relies on your VCF being annotated with coverage
and expression data. In our example, the VCF has already been annotated with
-this data. For more information about how to add coverage and expression data
-to your own VCFs, please see [here](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep/readcounts.html)
-and [here](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep/expression.html).
+this data. For more information about how to add [coverage](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep/readcounts.html)
+and [expression data](https://pvactools.readthedocs.io/en/latest/pvacseq/input_file_prep/expression.html)
+to your own VCFs, please see our docs.
Additionally, filtering on the normal DNA depth and variant allele frequency
(VAF) requires your VCF to be a tumor-normal sample VCF and the normal sample
to be identified in your pVACseq run using the `--normal-sample-name`
@@ -130,7 +130,7 @@ The following thresholds are applied in pVACfuse by this filter:

### Transcript Support Level Filter

-The Transcript Support Level (TSL) Filter, removes neoantigen candidates for
+The Transcript Support Level (TSL) Filter removes neoantigen candidates for
transcripts with a high TSL, as defined [by Ensembl](https://grch37.ensembl.org/info/genome/genebuild/transcript_quality_tags.html#tsl).
The cutoff for this filter is set by the `--maximum-transcript-support-level`
parameter. Transcripts with a TSL of NA will always be filtered out.
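
As a rough illustration of the rule described above (not pVACtools' actual code), the sketch below keeps a transcript only when its TSL is at or below a cutoff and always rejects a TSL of NA. The function name, the `tsl` dictionary key, the example cutoff of 1, and the inclusive comparison are all assumptions made for this example.

```python
from typing import Optional


def passes_tsl_filter(tsl: Optional[int], maximum_tsl: int = 1) -> bool:
    """Return True when a transcript's TSL is acceptable.

    `tsl` is None when the TSL is NA; such transcripts are always
    filtered out. The inclusive `<=` comparison is an assumption.
    """
    if tsl is None:
        return False
    return tsl <= maximum_tsl


# Hypothetical candidates; only the transcript name and TSL are modeled.
candidates = [
    {"transcript": "T1", "tsl": 1},
    {"transcript": "T2", "tsl": 3},
    {"transcript": "T3", "tsl": None},  # TSL of NA
]

kept = [c["transcript"] for c in candidates if passes_tsl_filter(c["tsl"], maximum_tsl=1)]
```

Here `kept` contains only `T1`: `T2` exceeds the cutoff and `T3` has no TSL, so raising the cutoff would admit `T2` but never `T3`.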
@@ -147,16 +147,16 @@ The Top Score Filter will attempt to determine the best neoantigen candidate
for each variant.

For pVACseq it works as follows. Given a set of neoantigen candidates for a
-variant we first group the transcripts into set where all transcripts in a set
+variant we first group the transcripts into sets where all transcripts in a set
code for the same set of neoantigen candidates. For each transcript set we then
determine the best neoantigen candidate as follows:

- Pick all neoantigens with a variant transcript that have a protein_coding Biotype
- Of the remaining candidates, pick the ones with a variant transcript having a
TSL less than the `--maximum-transcript-support-level`.
-- Of the remaining candidates, pick the entries with no Problematic Positions
+- Of the remaining candidates, pick the entries with no Problematic Positions.
- Of the remaining candidates, pick the ones passing the Anchor Criteria (explained in
-more detail further below)
+more detail further below).
- Of the remaining candidates, pick the one with the lowest MT IC50 Score (Median or Best
depending on the `--top-score-metric`), lowest TSL, and longest transcript.
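
The selection cascade in the list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pVACtools implementation: the `Candidate` fields, the `narrow` and `pick_best` helpers, and the example values are hypothetical. In particular, falling back to the previous set when a step would leave no candidates is an assumption of this sketch, as is the inclusive TSL comparison.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    name: str
    biotype: str
    tsl: int
    has_problematic_positions: bool
    passes_anchor_criteria: bool
    mt_ic50: float            # Median or Best MT IC50, depending on the metric chosen
    transcript_length: int


def narrow(cands: List[Candidate], keep: Callable[[Candidate], bool]) -> List[Candidate]:
    """Apply one 'of the remaining candidates, pick ...' step.

    Assumption: if nothing passes, keep the previous set so that a best
    candidate can still be reported.
    """
    kept = [c for c in cands if keep(c)]
    return kept or cands


def pick_best(cands: List[Candidate], maximum_tsl: int = 1) -> Candidate:
    remaining = narrow(cands, lambda c: c.biotype == "protein_coding")
    remaining = narrow(remaining, lambda c: c.tsl <= maximum_tsl)
    remaining = narrow(remaining, lambda c: not c.has_problematic_positions)
    remaining = narrow(remaining, lambda c: c.passes_anchor_criteria)
    # Final tie-break: lowest MT IC50, then lowest TSL, then longest transcript.
    return min(remaining, key=lambda c: (c.mt_ic50, c.tsl, -c.transcript_length))


example = [
    Candidate("A", "protein_coding", 1, False, True, 120.0, 900),
    Candidate("B", "protein_coding", 1, False, True, 80.0, 700),
    Candidate("C", "nonsense_mediated_decay", 1, False, True, 10.0, 500),
    Candidate("D", "protein_coding", 1, True, True, 20.0, 800),
]
best = pick_best(example)
```

In this example `best` is candidate `B`: `C` and `D` bind more strongly but are removed earlier by the Biotype and Problematic Positions steps, which shows that the order of the steps matters more than the raw IC50 ranking.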

@@ -185,10 +185,10 @@ are included in creating this report.

In pVACseq, for each variant, all neoantigen candidates meeting the `--aggregate-inclusion-threshold` are evaluated as follows:

-- Pick all entries with a variant transcript that have a protein_coding Biotype
-- Of the remaining entries, pick the ones with a variant transcript having a Transcript Support Level <= `--maximum-transcript-support-level`
-- Of the remaining entries, pick the entries with no Problematic Positions
-- Of the remaining entries, pick the ones passing the Anchor Criteria (see Criteria Details section below)
+- Pick all entries with a variant transcript that have a protein_coding Biotype.
+- Of the remaining entries, pick the ones with a variant transcript having a Transcript Support Level <= `--maximum-transcript-support-level`.
+- Of the remaining entries, pick the entries with no Problematic Positions.
+- Of the remaining entries, pick the ones passing the Anchor Criteria (see Criteria Details section below).
- Of the remaining entries, pick the one with the lowest MT IC50 score (Median or Best
depending on the `--top-score-metric`), lowest Transcript Support Level, and longest transcript.
