Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add reference paper for the reproduction number calculation #151

Closed
wants to merge 6 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions README.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -94,9 +94,9 @@ onset_to_death <- epiparameter::epidist_db(
)
```

To simulate a line list for COVID-19 with an Poisson contact distribution with a mean number of contacts of 2 and a probability of infection per contact of 0.5, we use the `sim_linelist()` function. The mean number of contacts and probability of infection determine the outbreak reproduction number, if the resulting reproduction number is around one it means we will likely get a reasonably sized outbreak (10 - 1,000 cases, varying due to the stochastic simulation).
To simulate a line list for COVID-19 with a Poisson contact distribution with a mean number of contacts of 2 per day and a probability of infection per contact of 0.5, we use the `sim_linelist()` function. As outlined in @bjornstad2020a, the contact rate ($k$) and probability of infection on contact ($\pi$) are combined into a transmission rate that, multiplied by the infectious period ($1/\gamma$), determines the outbreak reproduction number ($R_o$). If the resulting reproduction number is around one it means we will likely get a reasonably sized outbreak (10 - 1,000 cases, varying due to the stochastic simulation).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the addition of "per day" is incorrect, the contact distribution is independent of time, and only defines the number of contacts per case. It is the infectious period that defines the temporal aspect of transmission.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As outlined in @bjornstad2020a, the contact rate ($k$)

I'm now wondering if in fact the sim_linelist() model is not parameterised slightly differently from the description given here and in Bjornstad et al. Related to the above comment, the contact_distribution is not a rate and therefore I'm don't think it is equivalent to $k$.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps the fact that $k$ is an average and the infectious_period is uniformly sampled in {simulist} mean that both processes give the same result (making an assumption about large sample sizes).

Copy link
Member

@jamesmbaazam jamesmbaazam Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the addition of "per day" is incorrect, the contact distribution is independent of time, and only defines the number of contacts per case. It is the infectious period that defines the temporal aspect of transmission.

The way the simulations are set up here assumes the number of contacts an individual has depends on time, i.e., generations of infections. In other words, it depends on how many generations the chain survives.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thinking that this sentence may clarify the difference between the method (branching process) and the relationship between the variables (from a SIR model)

Suggested change
To simulate a line list for COVID-19 with a Poisson contact distribution with a mean number of contacts of 2 per day and a probability of infection per contact of 0.5, we use the `sim_linelist()` function. As outlined in @bjornstad2020a, the contact rate ($k$) and probability of infection on contact ($\pi$) are combined into a transmission rate that, multiplied by the infectious period ($1/\gamma$), determines the outbreak reproduction number ($R_o$). If the resulting reproduction number is around one it means we will likely get a reasonably sized outbreak (10 - 1,000 cases, varying due to the stochastic simulation).
To simulate a line list for COVID-19 with a Poisson contact distribution with a mean number of contacts of 2 per day and a probability of infection per contact of 0.5, we use the `sim_linelist()` function. This runs a branching process model to simulate a SIR model but with no depletion of susceptible individuals. As outlined in @bjornstad2020a, the contact rate ($k$) and probability of infection on contact ($\pi$) are combined into a transmission rate that, multiplied by the infectious period ($1/\gamma$), determines the outbreak reproduction number ($R_o$). If the resulting reproduction number is around one it means we will likely get a reasonably sized outbreak (10 - 1,000 cases, varying due to the stochastic simulation).

Copy link
Member

@jamesmbaazam jamesmbaazam Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This runs a branching process model to simulate a SIR model but with no depletion of susceptible individuals.

EDIT:

I think it's fine to just say it simulates infections from a simple branching process as mentioning SIR models here might distract from the main point of telling them what the underlying engine is.


***Warning***: the reproduction number of the simulation results from the contact distribution (`contact_distribution`) and the probability of infection (`prob_infection`); the number of infections is a binomial sample of the number of contacts for each case with the probability of infection (i.e. being sampled) given by `prob_infection`. If the average number of secondary infections from each primary case is greater than 1 then this can lead to the outbreak becoming extremely large. There is currently no depletion of susceptible individuals in the simulation model, so the maximum outbreak size (second element of the vector supplied to the `outbreak_size` argument) can be used to return a line list early without producing an excessively large data set.
***Warning***: the reproduction number of the simulation results from the infectious period distribution (`infectious_period`), the contact rate distribution (`contact_distribution`) and the probability of infection on contact (`prob_infection`); the number of infections is a binomial sample of the number of contacts for each case with the probability of infection (i.e. being sampled) given by `prob_infection`. If the average number of secondary infections from each primary case is greater than 1 then this can lead to the outbreak becoming extremely large. There is currently no depletion of susceptible individuals in the simulation model, so the maximum outbreak size (second element of the vector supplied to the `outbreak_size` argument) can be used to return a line list early without producing an excessively large data set.

```{r sim-linelist}
set.seed(1)
Expand Down
184 changes: 85 additions & 99 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -78,9 +78,6 @@ contact_distribution <- epiparameter::epidist(
prob_distribution_params = c(mean = 2)
)
#> Citation cannot be created as author, year, journal or title is missing
```

``` r

# create COVID-19 infectious period
infectious_period <- epiparameter::epidist(
Expand All @@ -90,9 +87,6 @@ infectious_period <- epiparameter::epidist(
prob_distribution_params = c(shape = 1, scale = 1)
)
#> Citation cannot be created as author, year, journal or title is missing
```

``` r

# get onset to hospital admission from {epiparameter} database
onset_to_hosp <- epiparameter::epidist_db(
Expand All @@ -107,9 +101,6 @@ onset_to_hosp <- epiparameter::epidist_db(
#> Case Data." _Journal of Clinical Medicine_. doi:10.3390/jcm9020538
#> <https://doi.org/10.3390/jcm9020538>..
#> To retrieve the citation use the 'get_citation' function
```

``` r

# get onset to death from {epiparameter} database
onset_to_death <- epiparameter::epidist_db(
Expand All @@ -126,25 +117,28 @@ onset_to_death <- epiparameter::epidist_db(
#> To retrieve the citation use the 'get_citation' function
```

To simulate a line list for COVID-19 with an Poisson contact
distribution with a mean number of contacts of 2 and a probability of
infection per contact of 0.5, we use the `sim_linelist()` function. The
mean number of contacts and probability of infection determine the
outbreak reproduction number, if the resulting reproduction number is
around one it means we will likely get a reasonably sized outbreak (10 -
1,000 cases, varying due to the stochastic simulation).
To simulate a line list for COVID-19 with a Poisson contact distribution
with a mean number of contacts of 2 per day and a probability of
infection per contact of 0.5, we use the `sim_linelist()` function. As
outlined in @bjornstad2020a, the contact rate ($k$) and probability of
infection on contact ($\pi$) are combined into a transmission rate that,
multiplied by the infectious period ($1/\gamma$), determines the
outbreak reproduction number ($R_o$). If the resulting reproduction
number is around one it means we will likely get a reasonably sized
outbreak (10 - 1,000 cases, varying due to the stochastic simulation).

***Warning***: the reproduction number of the simulation results from
the contact distribution (`contact_distribution`) and the probability of
infection (`prob_infection`); the number of infections is a binomial
sample of the number of contacts for each case with the probability of
infection (i.e. being sampled) given by `prob_infection`. If the average
number of secondary infections from each primary case is greater than 1
then this can lead to the outbreak becoming extremely large. There is
currently no depletion of susceptible individuals in the simulation
model, so the maximum outbreak size (second element of the vector
supplied to the `outbreak_size` argument) can be used to return a line
list early without producing an excessively large data set.
the infectious period distribution (`infectious_period`), the contact
rate distribution (`contact_distribution`) and the probability of
infection on contact (`prob_infection`); the number of infections is a
binomial sample of the number of contacts for each case with the
probability of infection (i.e. being sampled) given by `prob_infection`.
If the average number of secondary infections from each primary case is
greater than 1 then this can lead to the outbreak becoming extremely
large. There is currently no depletion of susceptible individuals in the
simulation model, so the maximum outbreak size (second element of the
vector supplied to the `outbreak_size` argument) can be used to return a
line list early without producing an excessively large data set.

``` r
set.seed(1)
Expand All @@ -156,20 +150,20 @@ linelist <- sim_linelist(
onset_to_death = onset_to_death
)
head(linelist)
#> id case_name case_type sex age date_onset date_admission outcome
#> 1 1 Rushdi al-Ishak probable m 35 2023-01-01 <NA> recovered
#> 2 2 Jeffrey Le confirmed m 43 2023-01-01 <NA> recovered
#> 3 3 Dominic Barringer suspected m 1 2023-01-01 <NA> recovered
#> 4 5 Tyler Kelley probable m 78 2023-01-01 <NA> recovered
#> 5 6 Carolyn Moore confirmed f 22 2023-01-01 <NA> recovered
#> 6 8 Cheyenne Sayavong confirmed f 28 2023-01-01 2023-01-04 died
#> id case_name case_type sex age date_onset date_admission outcome
#> 1 1 Wajdi al-Demian probable m 35 2023-01-01 <NA> recovered
#> 2 2 Raaid el-Diab confirmed m 43 2023-01-01 2023-01-07 recovered
#> 3 3 Nickolas Nault suspected m 1 2023-01-01 <NA> recovered
#> 4 5 Hee Kennedy confirmed m 78 2023-01-01 2023-01-03 died
#> 5 6 Hope Arshad suspected f 22 2023-01-01 2023-01-28 died
#> 6 8 Shanta Holiday probable f 28 2023-01-01 <NA> recovered
#> date_outcome date_first_contact date_last_contact ct_value
#> 1 <NA> <NA> <NA> NA
#> 2 <NA> 2022-12-30 2023-01-05 24
#> 2 <NA> 2022-12-30 2023-01-05 24.1
#> 3 <NA> 2022-12-30 2023-01-02 NA
#> 4 <NA> 2022-12-29 2023-01-02 NA
#> 5 <NA> 2023-01-01 2023-01-03 24
#> 6 2023-01-16 2023-01-03 2023-01-04 24
#> 4 2023-01-21 2022-12-29 2023-01-02 24.1
#> 5 2023-01-10 2023-01-01 2023-01-03 NA
#> 6 <NA> 2023-01-03 2023-01-04 NA
```

In this example, the line list is simulated using the default values
Expand All @@ -189,20 +183,20 @@ linelist <- sim_linelist(
outbreak_start_date = as.Date("2019-12-01")
)
head(linelist)
#> id case_name case_type sex age date_onset date_admission outcome
#> 1 1 Kiersten Matthews confirmed f 72 2019-12-01 <NA> recovered
#> 2 2 Omar Cruz confirmed m 85 2019-12-01 <NA> recovered
#> 3 3 Emilio Alvarado suspected m 24 2019-12-01 <NA> recovered
#> 4 4 Sonya Santiago confirmed f 67 2019-12-01 2019-12-03 recovered
#> 5 5 Michelle Brooks probable f 37 2019-12-02 <NA> recovered
#> 6 6 Anastasia Hamlin confirmed f 14 2019-12-02 <NA> recovered
#> id case_name case_type sex age date_onset date_admission outcome
#> 1 1 Kristiana Acheampong confirmed f 88 2019-12-01 <NA> recovered
#> 2 3 Jadeeda el-Abdalla confirmed f 8 2019-12-01 <NA> recovered
#> 3 4 Dominic Sandoval probable m 48 2019-12-01 <NA> recovered
#> 4 5 Zoe Johnson confirmed f 3 2019-12-01 <NA> recovered
#> 5 6 Breann Bruski probable f 25 2019-12-01 <NA> recovered
#> 6 7 Joseph Charley suspected m 57 2019-12-01 <NA> recovered
#> date_outcome date_first_contact date_last_contact ct_value
#> 1 <NA> <NA> <NA> 23.5
#> 2 <NA> 2019-11-30 2019-12-06 23.5
#> 3 <NA> 2019-11-30 2019-12-01 NA
#> 4 <NA> 2019-12-07 2019-12-09 23.5
#> 5 <NA> 2019-12-01 2019-12-04 NA
#> 6 <NA> 2019-12-01 2019-12-04 23.5
#> 1 <NA> <NA> <NA> 26.7
#> 2 <NA> 2019-11-30 2019-12-02 26.7
#> 3 <NA> 2019-11-30 2019-12-03 NA
#> 4 <NA> 2019-12-02 2019-12-04 26.7
#> 5 <NA> 2019-11-28 2019-12-04 NA
#> 6 <NA> 2019-12-05 2019-12-06 NA
```

To simulate a table of contacts of cases (i.e. to reflect a contact
Expand All @@ -215,26 +209,21 @@ contacts <- sim_contacts(
infectious_period = infectious_period,
prob_infection = 0.5
)
#> Warning: Number of cases exceeds maximum outbreak size.
#> Returning data early with 10176 cases and 20217 total contacts (including cases).
```

``` r
head(contacts)
#> from to age sex date_first_contact date_last_contact
#> 1 Brandon Byrnes Mustaba el-Wahba 14 m 2023-01-01 2023-01-03
#> 2 Brandon Byrnes Lonnie Williams 33 m 2022-12-31 2023-01-05
#> 3 Brandon Byrnes Robert Brown 34 m 2022-12-28 2023-01-02
#> 4 Brandon Byrnes Desiree Mabry 76 f 2023-01-03 2023-01-06
#> 5 Brandon Byrnes Salvador Trevino 73 m 2023-01-05 2023-01-07
#> 6 Desiree Mabry Gina Yu 90 f 2023-01-05 2023-01-06
#> was_case status
#> 1 N under_followup
#> 2 N lost_to_followup
#> 3 N under_followup
#> 4 Y case
#> 5 N under_followup
#> 6 N under_followup
#> from to age sex date_first_contact
#> 1 Munisa el-Kamal Alicia Topper 76 f 2023-01-05
#> 2 Munisa el-Kamal Donald Ramirez 24 m 2023-01-01
#> 3 Munisa el-Kamal Brittney Jarmond 7 f 2023-01-05
#> 4 Munisa el-Kamal Ramalaan el-Saadeh 57 m 2023-01-01
#> 5 Alicia Topper Cody Cohen 7 m 2023-01-02
#> 6 Alicia Topper Ashley Kohl 48 f 2022-12-31
#> date_last_contact was_case status
#> 1 2023-01-07 Y case
#> 2 2023-01-03 Y case
#> 3 2023-01-05 N under_followup
#> 4 2023-01-04 N unknown
#> 5 2023-01-04 Y case
#> 6 2023-01-03 Y case
```

If both the line list and contacts table are required, they can be
Expand All @@ -252,38 +241,35 @@ outbreak <- sim_outbreak(
onset_to_death = onset_to_death
)
head(outbreak$linelist)
#> id case_name case_type sex age date_onset date_admission outcome
#> 1 1 Annnees al-Nazir probable m 24 2023-01-01 <NA> recovered
#> 2 2 Muammar al-Miah probable m 46 2023-01-02 <NA> recovered
#> 3 4 Melina Tarin probable f 71 2023-01-02 <NA> recovered
#> 4 6 Alysh Lovejoy suspected f 13 2023-01-03 <NA> recovered
#> 5 9 Dylan Scanlon confirmed m 28 2023-01-03 <NA> recovered
#> 6 11 Nikolay Mcinnis suspected m 15 2023-01-03 <NA> recovered
#> id case_name case_type sex age date_onset date_admission outcome
#> 1 1 Okatomi Reish confirmed f 30 2023-01-01 <NA> recovered
#> 2 6 Angel Escalante confirmed m 83 2023-01-01 <NA> recovered
#> 3 8 Miqdaad el-Turay probable m 17 2023-01-01 <NA> recovered
#> 4 9 Jennifer Gonzalez confirmed f 74 2023-01-01 <NA> died
#> 5 10 Brendan Wu confirmed m 33 2023-01-01 2023-01-03 recovered
#> 6 12 Erminio Mandell probable m 24 2023-01-01 <NA> recovered
#> date_outcome date_first_contact date_last_contact ct_value
#> 1 <NA> <NA> <NA> NA
#> 2 <NA> 2023-01-04 2023-01-07 NA
#> 3 <NA> 2023-01-03 2023-01-05 NA
#> 4 <NA> 2023-01-04 2023-01-06 NA
#> 5 <NA> 2023-01-04 2023-01-07 25.6
#> 6 <NA> 2023-01-05 2023-01-06 NA
```

``` r
#> 1 <NA> <NA> <NA> 23.9
#> 2 <NA> 2023-01-01 2023-01-05 23.9
#> 3 <NA> 2022-12-31 2023-01-02 NA
#> 4 2023-01-21 2023-01-07 2023-01-08 23.9
#> 5 <NA> 2023-01-01 2023-01-01 23.9
#> 6 <NA> 2023-01-02 2023-01-05 NA
head(outbreak$contacts)
#> from to age sex date_first_contact date_last_contact
#> 1 Annnees al-Nazir Muammar al-Miah 46 m 2023-01-04 2023-01-07
#> 2 Annnees al-Nazir Siena Quimby 76 f 2022-12-27 2023-01-02
#> 3 Muammar al-Miah Melina Tarin 71 f 2023-01-03 2023-01-05
#> 4 Muammar al-Miah Destinee Embry 49 f 2023-01-01 2023-01-04
#> 5 Melina Tarin Alysh Lovejoy 13 f 2023-01-04 2023-01-06
#> 6 Alysh Lovejoy Andrew Knott 55 m 2023-01-03 2023-01-05
#> was_case status
#> 1 Y case
#> 2 N lost_to_followup
#> 3 Y case
#> 4 N under_followup
#> 5 Y case
#> 6 N under_followup
#> from to age sex date_first_contact
#> 1 Okatomi Reish Jesse Lynn 44 f 2023-01-04
#> 2 Okatomi Reish Joshua Rose 89 m 2023-01-05
#> 3 Okatomi Reish Wadee'a al-Rayes 11 f 2023-01-04
#> 4 Okatomi Reish Khaleel al-Hamady 19 m 2022-12-30
#> 5 Okatomi Reish Angel Escalante 83 m 2023-01-01
#> 6 Angel Escalante Burhaan el-Hashemi 87 m 2022-12-29
#> date_last_contact was_case status
#> 1 2023-01-05 N lost_to_followup
#> 2 2023-01-05 N unknown
#> 3 2023-01-05 N under_followup
#> 4 2023-01-03 N under_followup
#> 5 2023-01-05 Y case
#> 6 2023-01-02 N unknown
```

## Help
Expand Down