Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add reference paper for the reproduction number calculation #151

Closed
wants to merge 6 commits into from

Conversation

avallecam
Copy link
Member

@avallecam avallecam commented Jul 2, 2024

  • Please check if the PR fulfills these requirements
  • I have read the CONTRIBUTING guidelines
  • A new item has been added to NEWS.md
  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)
  • Checks have been run locally and pass
  • What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)

Fix #139

  • What is the current behavior? (You can also link to an open issue here)

README paragraph Quick start section relates contact distribution and probability of infection for the Ro calculation. We can add a reference to this relationship.

  • What is the new behavior (if this is a feature change)?

This proposes to add complementary text to the README to refer to the equation relating contact_distribution, prob_infection and infectious_period elements to calculate Ro, using as reference a paper cited in {epidemics} https://www.nature.com/articles/s41592-020-0822-z

  • Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)

No

  • Other information:

This still requires adding the reference to the paper: https://www.nature.com/articles/s41592-020-0822-z to make @bjornstad2020a work. This can reuse content in https://github.com/epiverse-trace/epidemics/

I wanted to include the infectious_period object to the Warning: ... paragraph, but not sure how to phrase it or if it would be appropriate given the emphasis on contact_distribution and prob_infection. So open for your assessment.

@avallecam
Copy link
Member Author

avallecam commented Jul 2, 2024

@joshwlambert few points

This still requires adding the reference to the paper: https://www.nature.com/articles/s41592-020-0822-z to make @bjornstad2020a work. This can reuse content in https://github.com/epiverse-trace/epidemics/

would you be able to do this? Willing to see how you are storing references.

I wanted to include the infectious_period object to the Warning: ... paragraph, but not sure how to phrase it or if it would be appropriate given the emphasis on contact_distribution and prob_infection. So open for your assessment.

I am open to discussing this further. I am able to add a line that suits the best if appropriate.

@joshwlambert
Copy link
Member

@avallecam I'm happy to add the reference to the bibliography.

Please feel free to add information to the "Warning..." paragraph if you think it will be informative for users. It's worth adding it as we can always revert to the current commit if we decide not to include it.

I'll let you add the text to the README first and then I'll add the reference.

@avallecam
Copy link
Member Author

hi @joshwlambert, welcome back, and thank you for your patience! I just finished the paragraph content.

This is ready for your review and to add the reference paper for @bjornstad2020 https://www.nature.com/articles/s41592-020-0822-z

Copy link
Member

@joshwlambert joshwlambert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @avallecam. It's always good to reference epi literature/theory in the packages.

However, in looking through this again I'm not uncertain about whether the descriptions of Bjornstad et al. directly correspond to the individual-based simulation used in {simulist} (see in-text comments).

@jamesmbaazam given your experience with epidemic modelling and branching processes, would you mind taking a look at this PR and letting us know whether the descriptions added to the README and the code are correct?

@@ -94,9 +94,9 @@ onset_to_death <- epiparameter::epidist_db(
)
```

To simulate a line list for COVID-19 with an Poisson contact distribution with a mean number of contacts of 2 and a probability of infection per contact of 0.5, we use the `sim_linelist()` function. The mean number of contacts and probability of infection determine the outbreak reproduction number, if the resulting reproduction number is around one it means we will likely get a reasonably sized outbreak (10 - 1,000 cases, varying due to the stochastic simulation).
To simulate a line list for COVID-19 with a Poisson contact distribution with a mean number of contacts of 2 per day and a probability of infection per contact of 0.5, we use the `sim_linelist()` function. As outlined in @bjornstad2020a, the contact rate ($k$) and probability of infection on contact ($\pi$) are combined into a transmission rate that, multiplied by the infectious period ($1/\gamma$), determines the outbreak reproduction number ($R_o$). If the resulting reproduction number is around one it means we will likely get a reasonably sized outbreak (10 - 1,000 cases, varying due to the stochastic simulation).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the addition of "per day" is incorrect, the contact distribution is independent of time, and only defines the number of contacts per case. It is the infectious period that defines the temporal aspect of transmission.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As outlined in @bjornstad2020a, the contact rate ($k$)

I'm now wondering if in fact the sim_linelist() model is not parameterised slightly differently from the description given here and in Bjornstad et al. Related to the above comment, the contact_distribution is not a rate and therefore I'm don't think it is equivalent to $k$.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the fact that $k$ is an average and the infectious_period is uniformly sampled in {simulist} mean that both processes give the same result (making an assumption about large sample sizes).

Copy link
Member

@jamesmbaazam jamesmbaazam Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the addition of "per day" is incorrect, the contact distribution is independent of time, and only defines the number of contacts per case. It is the infectious period that defines the temporal aspect of transmission.

The way the simulations are set up here assumes the number of contacts an individual has depends on time, i.e., generations of infections. In other words, it depends on how many generations the chain survives.

@jamesmbaazam
Copy link
Member

Thanks for tagging. I'll take a closer look tomorrow.

@@ -94,9 +94,9 @@ onset_to_death <- epiparameter::epidist_db(
)
```

To simulate a line list for COVID-19 with an Poisson contact distribution with a mean number of contacts of 2 and a probability of infection per contact of 0.5, we use the `sim_linelist()` function. The mean number of contacts and probability of infection determine the outbreak reproduction number, if the resulting reproduction number is around one it means we will likely get a reasonably sized outbreak (10 - 1,000 cases, varying due to the stochastic simulation).
To simulate a line list for COVID-19 with a Poisson contact distribution with a mean number of contacts of 2 per day and a probability of infection per contact of 0.5, we use the `sim_linelist()` function. As outlined in @bjornstad2020a, the contact rate ($k$) and probability of infection on contact ($\pi$) are combined into a transmission rate that, multiplied by the infectious period ($1/\gamma$), determines the outbreak reproduction number ($R_o$). If the resulting reproduction number is around one it means we will likely get a reasonably sized outbreak (10 - 1,000 cases, varying due to the stochastic simulation).
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking that this sentence may clarify the difference between the method (branching process) and the relationship between the variables (from a SIR model)

Suggested change
To simulate a line list for COVID-19 with a Poisson contact distribution with a mean number of contacts of 2 per day and a probability of infection per contact of 0.5, we use the `sim_linelist()` function. As outlined in @bjornstad2020a, the contact rate ($k$) and probability of infection on contact ($\pi$) are combined into a transmission rate that, multiplied by the infectious period ($1/\gamma$), determines the outbreak reproduction number ($R_o$). If the resulting reproduction number is around one it means we will likely get a reasonably sized outbreak (10 - 1,000 cases, varying due to the stochastic simulation).
To simulate a line list for COVID-19 with a Poisson contact distribution with a mean number of contacts of 2 per day and a probability of infection per contact of 0.5, we use the `sim_linelist()` function. This runs a branching process model to simulate a SIR model but with no depletion of susceptible individuals. As outlined in @bjornstad2020a, the contact rate ($k$) and probability of infection on contact ($\pi$) are combined into a transmission rate that, multiplied by the infectious period ($1/\gamma$), determines the outbreak reproduction number ($R_o$). If the resulting reproduction number is around one it means we will likely get a reasonably sized outbreak (10 - 1,000 cases, varying due to the stochastic simulation).

Copy link
Member

@jamesmbaazam jamesmbaazam Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This runs a branching process model to simulate a SIR model but with no depletion of susceptible individuals.

EDIT:

I think it's fine to just say it simulates infections from a simple branching process as mentioning SIR models here might distract from the main point of telling them what the underlying engine is.

@jamesmbaazam
Copy link
Member

jamesmbaazam commented Aug 8, 2024

Thanks for the contribution @avallecam. It's always good to reference epi literature/theory in the packages.

However, in looking through this again I'm not uncertain about whether the descriptions of Bjornstad et al. directly correspond to the individual-based simulation used in {simulist} (see in-text comments).

@jamesmbaazam given your experience with epidemic modelling and branching processes, would you mind taking a look at this PR and letting us know whether the descriptions added to the README and the code are correct?

Here is a quick answer that I will update tomorrow.

The transmission process in the SIR model described in Bjornstad's paper is not always the same as the result of the branching process defined here unless their assumptions match. The main difference here is that the SIR model assumes an exponentially distributed infectious period while the model here assumes an arbitrary infectious period (as far as I can tell from a quick look). The branching process also accounts for individual-level transmission heterogeneity whereas it is the same per capita in the SIR model.

A really good reference for this that I can think of is Mathematical models of infectious disease transmission by Nicholas C. Grassly and Christophe Fraser. The section on "offspring" distribution is really insightful. You can reference equation 1, which states the expected number of offspring (here, an entry in the linelist) in terms of the infectious period and probability of susceptibility.

@joshwlambert
Copy link
Member

@jamesmbaazam and @avallecam thank you both for inputing on this.

Given that we agree there are fundamental differences between the branching process and compartmental models, succinctly wording an explanation in README will be difficult, and we want to avoid confusing users. Instead I propose we close this PR without merging the changes, and instead aim to add a model description vignette as outlined in epiverse-trace/discussions#83. What do you both think?

This vignette can contain the main parts of this PR:

  • citing literature on epidemic modelling
  • making clear the branching process implementation in {simulist}
  • comparing branching process to compartmental models
  • providing a more mathematical explanation of the simulation

We can then add a line to the README that links to this vignette for users that would like a more detailed description of the model.

@jamesmbaazam
Copy link
Member

jamesmbaazam commented Aug 12, 2024

I'm fine with that.

There is quite a good amount of theory and mathematical formulations in epichains in the helpfile of simulate_chains(), the theory vignette, and the literature vignette. So, feel free to use any of that.

@jamesmbaazam and @avallecam thank you both for inputing on this.

Given that we agree there are fundamental differences between the branching process and compartmental models, succinctly wording an explanation in README will be difficult, and we want to avoid confusing users. Instead I propose we close this PR without merging the changes, and instead aim to add a model description vignette as outlined in epiverse-trace/discussions#83. What do you both think?

This vignette can contain the main parts of this PR:

  • citing literature on epidemic modelling

Here, I think the focus should be more on how and why to simulate case data: linelists, aggregated time series, etc., which then leads into the next bullet.

  • making clear the branching process implementation in {simulist}
  • comparing branching process to compartmental models

Why do you need to explain this here?

  • providing a more mathematical explanation of the simulation

I think this is the same as the second bullet.

We can then add a line to the README that links to this vignette for users that would like a more detailed description of the model.

@joshwlambert
Copy link
Member

Does the "Why do you need to explain this here?" refer to "comparing branching process to compartmental models" or "providing a more mathematical explanation of the simulation"? If the former then this relates to Andree's changes in this PR which make comparison to Bjornstad et al. which discusses SIR modelling. If the latter then the reasoning is the same as outlined in epiverse-trace/discussions/83.

@jamesmbaazam
Copy link
Member

Does the "Why do you need to explain this here?" refer to "comparing branching process to compartmental models" or "providing a more mathematical explanation of the simulation"? If the former then this relates to Andree's changes in this PR which make comparison to Bjornstad et al. which discusses SIR modelling. If the latter then the reasoning is the same as outlined in epiverse-trace/discussions/83.

Sorry, my bad. I was referring to the former. I don't see the need to do that comparison since this package doesn't use compartmental models.

@joshwlambert
Copy link
Member

I see your point. The reason I added this was the changes made in this PR add context to the model and similarities to compartmental models. I'll let @avallecam chip in and see how he thinks we can best resolve #139.

@jamesmbaazam
Copy link
Member

jamesmbaazam commented Aug 19, 2024

I see your point. The reason I added this was the changes made in this PR add context to the model and similarities to compartmental models. I'll let @avallecam chip in and see how he thinks we can best resolve #139.

I actually think you have a good point to explore their similarities and differences, so ignore my previous comment. I, however, think it should be done in the scope of describing the different ways that one can generate synthetic linelist. For example, LLsim, though abandoned or in early stages, uses a compartmental model whereas simulist uses branching processes.

A really good reference for these comparisons is Equivalence of the Erlang-Distributed SEIR Epidemic Model and the Renewal Equation. They do a nice comparison and alignment of compartmental models and renewal/branching processes.

@avallecam
Copy link
Member Author

avallecam commented Sep 3, 2024

The reason I added this was the changes made in this PR add context to the model and similarities to compartmental models. I'll let @avallecam chip in and see how he thinks we can best resolve #139.

Hi, both; thanks for your review and additional references. I agree to close this PR.

This agreement has two elements:

  • First, the aim of #139 is to provide a paper citation to the relationship of the contact rate, probability of infection, and infectious period with R0. Bjornstad et al. relate them but do not fit the package method. I agree with @jamesmbaazam proposal to refer to Grassly and Fraser (2008), which also offers a clearer reference to this relationship in the "offspring distribution" section, matching the package method. I also agree for this to be hosted in a methods vignette, as suggested by @joshwlambert.

  • Second, while writing this PR I also found it appropriate to include the infectious period in the paragraphs and rename some concepts:

    • from: contact distribution to: contact rate distribution (contact_distribution), defined as the distribution of the number of contacts per unit time (day),
    • from: the probability of infection to: the probability of infection on contact (prob_infection),
    • write that: the contact rate and probability of infection on contact are combined into a transmission rate,
    • write that: the transmission rate multiplied by the infectious period distribution (infectious_period) determines the outbreak reproduction number.

@joshwlambert Should these renaming proposals be part of a new and more specific PR to review?

@avallecam
Copy link
Member Author

avallecam commented Sep 12, 2024

From meeting with @joshwlambert the method vignette will be an appropriate place to register {simulist} methods. Then, this vignette can be referenced in the README paragraph that starts with "To simulate a line list for..." where the function arguments are related to determine the outbreak reproduction number. Thus, I agree to close this PR.

Additionally, here I draft some learning points from the meeting that clarify the renaming proposals mentioned above and, if appropriate, can be included in the methods vignette:

  • The contact distribution provides a set of contacts per iteration. Given the probability of infection (on contact), this will generate non-infected and infected contacts. These two groups are registered in the output.
  • The contact distribution may be different to the contact rate. The contact distribution is variable per iteration (as used in {simulist}), while the contact rate would be interpreted as fixed per iteration. It may be misleading to define it as contact rate distribution.
  • The iteration rate in {simulist} is given by the infectious period from which a maximum value is sampled to get the time of infection.
  • The basic reproduction number and transmission rate are not calculated in {simulist}. These emerge from the simulation.
  • There are key assumptions in the simulation that can be listed to users: the latent period and incubation period are equal to 0.

If any point needs to be rephrased, we can use the new PR that aims to include the methods for this.

@joshwlambert
Copy link
Member

Thanks for summarising our discussion @avallecam! I'll now close this PR and tackle the simulation algorithm vignette in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

about the relationship between contact distribution, probability of infection, and basic reproduction number
4 participants