add reference paper for the reproduction number calculation #151

avallecam · 2024-07-02T15:11:15Z

Please check if the PR fulfills these requirements

I have read the CONTRIBUTING guidelines
A new item has been added to NEWS.md
Tests for the changes have been added (for bug fixes / features)
Docs have been added / updated (for bug fixes / features)
Checks have been run locally and pass

What kind of change does this PR introduce? (Bug fix, feature, docs update, ...)

What is the current behavior? (You can also link to an open issue here)

README paragraph Quick start section relates contact distribution and probability of infection for the Ro calculation. We can add a reference to this relationship.

What is the new behavior (if this is a feature change)?

This proposes to add complementary text to the README to refer to the equation relating contact_distribution, prob_infection and infectious_period elements to calculate Ro, using as reference a paper cited in {epidemics} https://www.nature.com/articles/s41592-020-0822-z

Does this PR introduce a breaking change? (What changes might users need to make in their application due to this PR?)

No

Other information:

This still requires adding the reference to the paper: https://www.nature.com/articles/s41592-020-0822-z to make @bjornstad2020a work. This can reuse content in https://github.com/epiverse-trace/epidemics/

I wanted to include the infectious_period object to the Warning: ... paragraph, but not sure how to phrase it or if it would be appropriate given the emphasis on contact_distribution and prob_infection. So open for your assessment.

avallecam · 2024-07-02T16:13:42Z

@joshwlambert few points

This still requires adding the reference to the paper: https://www.nature.com/articles/s41592-020-0822-z to make @bjornstad2020a work. This can reuse content in https://github.com/epiverse-trace/epidemics/

would you be able to do this? Willing to see how you are storing references.

I wanted to include the infectious_period object to the Warning: ... paragraph, but not sure how to phrase it or if it would be appropriate given the emphasis on contact_distribution and prob_infection. So open for your assessment.

I am open to discussing this further. I am able to add a line that suits the best if appropriate.

joshwlambert · 2024-07-02T17:24:11Z

@avallecam I'm happy to add the reference to the bibliography.

Please feel free to add information to the "Warning..." paragraph if you think it will be informative for users. It's worth adding it as we can always revert to the current commit if we decide not to include it.

I'll let you add the text to the README first and then I'll add the reference.

avallecam · 2024-08-06T15:35:32Z

hi @joshwlambert, welcome back, and thank you for your patience! I just finished the paragraph content.

This is ready for your review and to add the reference paper for @bjornstad2020 https://www.nature.com/articles/s41592-020-0822-z

joshwlambert

Thanks for the contribution @avallecam. It's always good to reference epi literature/theory in the packages.

However, in looking through this again I'm not uncertain about whether the descriptions of Bjornstad et al. directly correspond to the individual-based simulation used in {simulist} (see in-text comments).

@jamesmbaazam given your experience with epidemic modelling and branching processes, would you mind taking a look at this PR and letting us know whether the descriptions added to the README and the code are correct?

joshwlambert · 2024-08-07T08:33:02Z

README.Rmd

@@ -94,9 +94,9 @@ onset_to_death <- epiparameter::epidist_db(
 )
 ```

-To simulate a line list for COVID-19 with an Poisson contact distribution with a mean number of contacts of 2 and a probability of infection per contact of 0.5, we use the `sim_linelist()` function. The mean number of contacts and probability of infection determine the outbreak reproduction number, if the resulting reproduction number is around one it means we will likely get a reasonably sized outbreak (10 - 1,000 cases, varying due to the stochastic simulation). 
+To simulate a line list for COVID-19 with a Poisson contact distribution with a mean number of contacts of 2 per day and a probability of infection per contact of 0.5, we use the `sim_linelist()` function. As outlined in @bjornstad2020a, the contact rate ($k$) and probability of infection on contact ($\pi$) are combined into a transmission rate that, multiplied by the infectious period ($1/\gamma$), determines the outbreak reproduction number ($R_o$). If the resulting reproduction number is around one it means we will likely get a reasonably sized outbreak (10 - 1,000 cases, varying due to the stochastic simulation). 


I think the addition of "per day" is incorrect, the contact distribution is independent of time, and only defines the number of contacts per case. It is the infectious period that defines the temporal aspect of transmission.

As outlined in @bjornstad2020a, the contact rate ($k$)

I'm now wondering if in fact the sim_linelist() model is not parameterised slightly differently from the description given here and in Bjornstad et al. Related to the above comment, the contact_distribution is not a rate and therefore I'm don't think it is equivalent to $k$.

Perhaps the fact that $k$ is an average and the infectious_period is uniformly sampled in {simulist} mean that both processes give the same result (making an assumption about large sample sizes).

I think the addition of "per day" is incorrect, the contact distribution is independent of time, and only defines the number of contacts per case. It is the infectious period that defines the temporal aspect of transmission.

The way the simulations are set up here assumes the number of contacts an individual has depends on time, i.e., generations of infections. In other words, it depends on how many generations the chain survives.

jamesmbaazam · 2024-08-07T17:55:19Z

Thanks for tagging. I'll take a closer look tomorrow.

avallecam · 2024-08-08T18:02:29Z

README.Rmd

@@ -94,9 +94,9 @@ onset_to_death <- epiparameter::epidist_db(
 )
 ```

-To simulate a line list for COVID-19 with an Poisson contact distribution with a mean number of contacts of 2 and a probability of infection per contact of 0.5, we use the `sim_linelist()` function. The mean number of contacts and probability of infection determine the outbreak reproduction number, if the resulting reproduction number is around one it means we will likely get a reasonably sized outbreak (10 - 1,000 cases, varying due to the stochastic simulation). 
+To simulate a line list for COVID-19 with a Poisson contact distribution with a mean number of contacts of 2 per day and a probability of infection per contact of 0.5, we use the `sim_linelist()` function. As outlined in @bjornstad2020a, the contact rate ($k$) and probability of infection on contact ($\pi$) are combined into a transmission rate that, multiplied by the infectious period ($1/\gamma$), determines the outbreak reproduction number ($R_o$). If the resulting reproduction number is around one it means we will likely get a reasonably sized outbreak (10 - 1,000 cases, varying due to the stochastic simulation). 


thinking that this sentence may clarify the difference between the method (branching process) and the relationship between the variables (from a SIR model)

Suggested change

To simulate a line list for COVID-19 with a Poisson contact distribution with a mean number of contacts of 2 per day and a probability of infection per contact of 0.5, we use the `sim_linelist()` function. As outlined in @bjornstad2020a, the contact rate ($k$) and probability of infection on contact ($\pi$) are combined into a transmission rate that, multiplied by the infectious period ($1/\gamma$), determines the outbreak reproduction number ($R_o$). If the resulting reproduction number is around one it means we will likely get a reasonably sized outbreak (10 - 1,000 cases, varying due to the stochastic simulation).

To simulate a line list for COVID-19 with a Poisson contact distribution with a mean number of contacts of 2 per day and a probability of infection per contact of 0.5, we use the `sim_linelist()` function. This runs a branching process model to simulate a SIR model but with no depletion of susceptible individuals. As outlined in @bjornstad2020a, the contact rate ($k$) and probability of infection on contact ($\pi$) are combined into a transmission rate that, multiplied by the infectious period ($1/\gamma$), determines the outbreak reproduction number ($R_o$). If the resulting reproduction number is around one it means we will likely get a reasonably sized outbreak (10 - 1,000 cases, varying due to the stochastic simulation).

This runs a branching process model to simulate a SIR model but with no depletion of susceptible individuals.

EDIT:

I think it's fine to just say it simulates infections from a simple branching process as mentioning SIR models here might distract from the main point of telling them what the underlying engine is.

jamesmbaazam · 2024-08-08T18:08:20Z

Thanks for the contribution @avallecam. It's always good to reference epi literature/theory in the packages.

However, in looking through this again I'm not uncertain about whether the descriptions of Bjornstad et al. directly correspond to the individual-based simulation used in {simulist} (see in-text comments).

@jamesmbaazam given your experience with epidemic modelling and branching processes, would you mind taking a look at this PR and letting us know whether the descriptions added to the README and the code are correct?

Here is a quick answer that I will update tomorrow.

The transmission process in the SIR model described in Bjornstad's paper is not always the same as the result of the branching process defined here unless their assumptions match. The main difference here is that the SIR model assumes an exponentially distributed infectious period while the model here assumes an arbitrary infectious period (as far as I can tell from a quick look). The branching process also accounts for individual-level transmission heterogeneity whereas it is the same per capita in the SIR model.

A really good reference for this that I can think of is Mathematical models of infectious disease transmission by Nicholas C. Grassly and Christophe Fraser. The section on "offspring" distribution is really insightful. You can reference equation 1, which states the expected number of offspring (here, an entry in the linelist) in terms of the infectious period and probability of susceptibility.

joshwlambert · 2024-08-12T10:17:18Z

@jamesmbaazam and @avallecam thank you both for inputing on this.

Given that we agree there are fundamental differences between the branching process and compartmental models, succinctly wording an explanation in README will be difficult, and we want to avoid confusing users. Instead I propose we close this PR without merging the changes, and instead aim to add a model description vignette as outlined in epiverse-trace/discussions#83. What do you both think?

This vignette can contain the main parts of this PR:

citing literature on epidemic modelling
making clear the branching process implementation in {simulist}
comparing branching process to compartmental models
providing a more mathematical explanation of the simulation

We can then add a line to the README that links to this vignette for users that would like a more detailed description of the model.

jamesmbaazam · 2024-08-12T12:57:24Z

I'm fine with that.

There is quite a good amount of theory and mathematical formulations in epichains in the helpfile of simulate_chains(), the theory vignette, and the literature vignette. So, feel free to use any of that.

@jamesmbaazam and @avallecam thank you both for inputing on this.

Given that we agree there are fundamental differences between the branching process and compartmental models, succinctly wording an explanation in README will be difficult, and we want to avoid confusing users. Instead I propose we close this PR without merging the changes, and instead aim to add a model description vignette as outlined in epiverse-trace/discussions#83. What do you both think?

This vignette can contain the main parts of this PR:

citing literature on epidemic modelling

Here, I think the focus should be more on how and why to simulate case data: linelists, aggregated time series, etc., which then leads into the next bullet.

making clear the branching process implementation in {simulist}

comparing branching process to compartmental models

Why do you need to explain this here?

providing a more mathematical explanation of the simulation

I think this is the same as the second bullet.

We can then add a line to the README that links to this vignette for users that would like a more detailed description of the model.

joshwlambert · 2024-08-12T13:44:43Z

Does the "Why do you need to explain this here?" refer to "comparing branching process to compartmental models" or "providing a more mathematical explanation of the simulation"? If the former then this relates to Andree's changes in this PR which make comparison to Bjornstad et al. which discusses SIR modelling. If the latter then the reasoning is the same as outlined in epiverse-trace/discussions/83.

jamesmbaazam · 2024-08-12T15:27:50Z

Does the "Why do you need to explain this here?" refer to "comparing branching process to compartmental models" or "providing a more mathematical explanation of the simulation"? If the former then this relates to Andree's changes in this PR which make comparison to Bjornstad et al. which discusses SIR modelling. If the latter then the reasoning is the same as outlined in epiverse-trace/discussions/83.

Sorry, my bad. I was referring to the former. I don't see the need to do that comparison since this package doesn't use compartmental models.

joshwlambert · 2024-08-12T15:51:50Z

I see your point. The reason I added this was the changes made in this PR add context to the model and similarities to compartmental models. I'll let @avallecam chip in and see how he thinks we can best resolve #139.

jamesmbaazam · 2024-08-19T11:59:46Z

I see your point. The reason I added this was the changes made in this PR add context to the model and similarities to compartmental models. I'll let @avallecam chip in and see how he thinks we can best resolve #139.

I actually think you have a good point to explore their similarities and differences, so ignore my previous comment. I, however, think it should be done in the scope of describing the different ways that one can generate synthetic linelist. For example, LLsim, though abandoned or in early stages, uses a compartmental model whereas simulist uses branching processes.

A really good reference for these comparisons is Equivalence of the Erlang-Distributed SEIR Epidemic Model and the Renewal Equation. They do a nice comparison and alignment of compartmental models and renewal/branching processes.

avallecam · 2024-09-03T10:07:55Z

The reason I added this was the changes made in this PR add context to the model and similarities to compartmental models. I'll let @avallecam chip in and see how he thinks we can best resolve #139.

Hi, both; thanks for your review and additional references. I agree to close this PR.

This agreement has two elements:

First, the aim of #139 is to provide a paper citation to the relationship of the contact rate, probability of infection, and infectious period with R0. Bjornstad et al. relate them but do not fit the package method. I agree with @jamesmbaazam proposal to refer to Grassly and Fraser (2008), which also offers a clearer reference to this relationship in the "offspring distribution" section, matching the package method. I also agree for this to be hosted in a methods vignette, as suggested by @joshwlambert.
Second, while writing this PR I also found it appropriate to include the infectious period in the paragraphs and rename some concepts:
- from: contact distribution to: contact rate distribution (contact_distribution), defined as the distribution of the number of contacts per unit time (day),
- from: the probability of infection to: the probability of infection on contact (prob_infection),
- write that: the contact rate and probability of infection on contact are combined into a transmission rate,
- write that: the transmission rate multiplied by the infectious period distribution (infectious_period) determines the outbreak reproduction number.

@joshwlambert Should these renaming proposals be part of a new and more specific PR to review?

avallecam · 2024-09-12T15:03:50Z

From meeting with @joshwlambert the method vignette will be an appropriate place to register {simulist} methods. Then, this vignette can be referenced in the README paragraph that starts with "To simulate a line list for..." where the function arguments are related to determine the outbreak reproduction number. Thus, I agree to close this PR.

Additionally, here I draft some learning points from the meeting that clarify the renaming proposals mentioned above and, if appropriate, can be included in the methods vignette:

The contact distribution provides a set of contacts per iteration. Given the probability of infection (on contact), this will generate non-infected and infected contacts. These two groups are registered in the output.
The contact distribution may be different to the contact rate. The contact distribution is variable per iteration (as used in {simulist}), while the contact rate would be interpreted as fixed per iteration. It may be misleading to define it as contact rate distribution.
The iteration rate in {simulist} is given by the infectious period from which a maximum value is sampled to get the time of infection.
The basic reproduction number and transmission rate are not calculated in {simulist}. These emerge from the simulation.
There are key assumptions in the simulation that can be listed to users: the latent period and incubation period are equal to 0.

If any point needs to be rephrased, we can use the new PR that aims to include the methods for this.

joshwlambert · 2024-09-12T15:24:19Z

Thanks for summarising our discussion @avallecam! I'll now close this PR and tackle the simulation algorithm vignette in the future.

avallecam and others added 2 commits July 2, 2024 16:03

add reference paper for the reproduction number calculation

374b8f8

Automatic readme update

ab7f0aa

avallecam requested a review from joshwlambert July 2, 2024 15:55

avallecam and others added 4 commits August 6, 2024 15:42

add infectious_period to warning paragraph

3b5206d

Automatic readme update

d5edc16

use the concept contact rate

7bc7bce

Automatic readme update

382eeca

joshwlambert reviewed Aug 7, 2024

View reviewed changes

avallecam commented Aug 8, 2024

View reviewed changes

joshwlambert closed this Sep 12, 2024

joshwlambert mentioned this pull request Sep 12, 2024

about the relationship between contact distribution, probability of infection, and basic reproduction number #139

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add reference paper for the reproduction number calculation #151

add reference paper for the reproduction number calculation #151

avallecam commented Jul 2, 2024 •

edited

Loading

avallecam commented Jul 2, 2024 •

edited

Loading

joshwlambert commented Jul 2, 2024

avallecam commented Aug 6, 2024

joshwlambert left a comment

joshwlambert Aug 7, 2024

joshwlambert Aug 7, 2024

joshwlambert Aug 7, 2024

jamesmbaazam Aug 8, 2024 •

edited

Loading

jamesmbaazam commented Aug 7, 2024

avallecam Aug 8, 2024

jamesmbaazam Aug 8, 2024 •

edited

Loading

jamesmbaazam commented Aug 8, 2024 •

edited

Loading

joshwlambert commented Aug 12, 2024

jamesmbaazam commented Aug 12, 2024 •

edited

Loading

joshwlambert commented Aug 12, 2024

jamesmbaazam commented Aug 12, 2024

joshwlambert commented Aug 12, 2024

jamesmbaazam commented Aug 19, 2024 •

edited

Loading

avallecam commented Sep 3, 2024 •

edited

Loading

avallecam commented Sep 12, 2024 •

edited

Loading

joshwlambert commented Sep 12, 2024

add reference paper for the reproduction number calculation #151

add reference paper for the reproduction number calculation #151

Conversation

avallecam commented Jul 2, 2024 • edited Loading

avallecam commented Jul 2, 2024 • edited Loading

joshwlambert commented Jul 2, 2024

avallecam commented Aug 6, 2024

joshwlambert left a comment

Choose a reason for hiding this comment

joshwlambert Aug 7, 2024

Choose a reason for hiding this comment

joshwlambert Aug 7, 2024

Choose a reason for hiding this comment

joshwlambert Aug 7, 2024

Choose a reason for hiding this comment

jamesmbaazam Aug 8, 2024 • edited Loading

Choose a reason for hiding this comment

jamesmbaazam commented Aug 7, 2024

avallecam Aug 8, 2024

Choose a reason for hiding this comment

jamesmbaazam Aug 8, 2024 • edited Loading

Choose a reason for hiding this comment

jamesmbaazam commented Aug 8, 2024 • edited Loading

joshwlambert commented Aug 12, 2024

jamesmbaazam commented Aug 12, 2024 • edited Loading

joshwlambert commented Aug 12, 2024

jamesmbaazam commented Aug 12, 2024

joshwlambert commented Aug 12, 2024

jamesmbaazam commented Aug 19, 2024 • edited Loading

avallecam commented Sep 3, 2024 • edited Loading

avallecam commented Sep 12, 2024 • edited Loading

joshwlambert commented Sep 12, 2024

avallecam commented Jul 2, 2024 •

edited

Loading

avallecam commented Jul 2, 2024 •

edited

Loading

jamesmbaazam Aug 8, 2024 •

edited

Loading

jamesmbaazam Aug 8, 2024 •

edited

Loading

jamesmbaazam commented Aug 8, 2024 •

edited

Loading

jamesmbaazam commented Aug 12, 2024 •

edited

Loading

jamesmbaazam commented Aug 19, 2024 •

edited

Loading

avallecam commented Sep 3, 2024 •

edited

Loading

avallecam commented Sep 12, 2024 •

edited

Loading