RK4 Method for Solving SIR Model

My object is to rewrite the 4th order Runge-Kutta (abbreviated for RK4) method for solving the absolute humidity-driven SIRS model developed by Yang et al. (2014) in R language. The details of the SIRS model are provided in the paper.

1 RK4

1.1 Preliminary

RK4 is one of the classic methods for numerical integration of ODE models. A brief introduction of RK4 refers to Wikipedia.

Consider the following initial value problem of ODE $$ \begin{equation} \begin{aligned} & \frac{dy}{dt} = f(t, y) \\ & y(t_0) = y_0 \end{aligned} \label{eq1-rk4} \end{equation} $$ where $y(t)$ is the unknown function (scalar or vector) which I would like to approximate.

The iterative formula of RK4 method for solving ODE $(\ref{eq1-rk4})$ is as follows $$ \begin{equation} \begin{aligned} & y_{n + 1} = y_n + \frac{\Delta t}{6}(k_1 + 2k_2 + 2k_3 + k_4) \\ & k_1 = f(t_n, y_n) \\ & k_2 = f(t_n + \frac{\Delta t}{2}, y_n + \frac{k_1\Delta t}{2}) \\ & k_3 = f(t_n + \frac{\Delta t}{2}, y_n + \frac{k_2\Delta t}{2}) \\ & k_4 = f(t_n + \Delta t, y_n + k_3\Delta t) \\ & t_{n + 1} = t_n + \Delta t \\ & n = 0, 1, 2, 3, \dots \end{aligned} \label{eq2} \end{equation} $$

For simplicity, here I use the simplest SIR model rather than the SIRS model used in the paper to examine whether the RK4 method has been implemented correctly. The SIR model is defined as follows $$ \begin{equation} \label{eq3} \begin{aligned} & \frac{dS}{dt} = -\frac{\beta SI}{N} \\ & \frac{dI}{dt} = \frac{\beta SI}{N} - \gamma I \\ & \frac{dR}{dt} = \gamma I \end{aligned} \end{equation} $$ where $S(t)$ is the number of susceptible people in the population at time $t$, $I(t)$ is the number of infectious people at time $t$, $R(t)$ is the number of recovered people at time $t$, $\beta$ is the transmission rate, $\gamma$ represents the recovery rate, and $N = S(t) + I(t) + R(t)$ is the fixed population.

According to the general iterative formula $(\ref{eq2})$, the iterative formulas for $S(t)$, $I(t)$ and $R(t)$ of SIR model can be written out. $$ \begin{equation} \begin{aligned} & S_{n + 1} = S_n + \frac{\Delta t}{6}(k_1^S + 2k_2^S + 2k_3^S + k_4^S) \\ & k_1^S = f(t_n, S_n, I_n) = -\frac{\beta S_nI_n}{N} \\ & k_2^S = f(t_n + \frac{\Delta t}{2}, S_n + \frac{k_1^S\Delta t}{2}, I_n + \frac{k_1^I\Delta t}{2}) = -\frac{\beta}{N}(S_n + \frac{k_1^S\Delta t}{2})(I_n + \frac{k_1^I\Delta t}{2}) \\ & k_3^S = f(t_n + \frac{\Delta t}{2}, S_n + \frac{k_2^S\Delta t}{2}, I_n + \frac{k_2^I\Delta t}{2}) = -\frac{\beta}{N}(S_n + \frac{k_2^S\Delta t}{2})(I_n + \frac{k_2^I\Delta t}{2}) \\ & k_4^S = f(t_n + \Delta t, S_n + k_3^S\Delta t, I_n + k_3^I\Delta t) = -\frac{\beta}{N}(S_n + k_3^S\Delta t)(I_n + k_3^I\Delta t) \end{aligned} \end{equation} $$

$$ \begin{equation} \begin{aligned} & I_{n + 1} = I_n + \frac{\Delta t}{6}(k_1^I + 2k_2^I + 2k_3^I + k_4^I) \\ & k_1^I = f(t_n, S_n, I_n) = \frac{\beta S_nI_n}{N} - \gamma I_n \\ & k_2^I = f(t_n + \frac{\Delta t}{2}, S_n + \frac{k_1^S\Delta t}{2}, I_n + \frac{k_1^I\Delta t}{2}) = \frac{\beta}{N}(S_n + \frac{k_1^S\Delta t}{2})(I_n + \frac{k_1^I\Delta t}{2}) - \gamma (I_n + \frac{k_1^I\Delta t}{2}) \\ & k_3^I = f(t_n + \frac{\Delta t}{2}, S_n + \frac{k_2^S\Delta t}{2}, I_n + \frac{k_2^I\Delta t}{2}) = \frac{\beta}{N}(S_n + \frac{k_2^S\Delta t}{2})(I_n + \frac{k_2^I\Delta t}{2}) - \gamma (I_n + \frac{k_2^I\Delta t}{2}) \\ & k_4^I = f(t_n + \Delta t, S_n + k_3^S\Delta t, I_n + k_3^I\Delta t) = \frac{\beta}{N}(S_n + k_3^S\Delta t)(I_n + k_3^I\Delta t) - \gamma (I_n + k_3^I\Delta t) \end{aligned} \end{equation} $$

$$ \begin{equation} \begin{aligned} & R_{n + 1} = R_n + \frac{\Delta t}{6}(k_1^R + 2k_2^R + 2k_3^R + k_4^R) \\ & k_1^R = f(t_n, I_n) = \gamma I_n \\ & k_2^R = f(t_n + \frac{\Delta t}{2}, I_n + \frac{k_1^I\Delta t}{2}) = \gamma (I_n + \frac{k_1^I\Delta t}{2}) \\ & k_3^R = f(t_n + \frac{\Delta t}{2}, I_n + \frac{k_2^I\Delta t}{2}) = \gamma (I_n + \frac{k_2^I\Delta t}{2}) \\ & k_4^R = f(t_n + \Delta t, I_n + k_3^I\Delta t) = \gamma (I_n + k_3^I\Delta t) \end{aligned} \end{equation} $$

Note that since the population $N = S(t) + I(t) + R(t)$ is constant, there will have $\frac{dS}{dt} + \frac{dI}{dt} + \frac{dR}{dt} = 0$. Therefore, only two of the three ODEs are independent and sufficient to solve the ODEs. Here, only iterative formulas for $S(t)$ and $I(t)$ are used and $R(t)$ is calculated by $R(t) = N - S(t) - I(t)$.

The specific simulation parameters for SIR model are from the study case of spread of Hong Kong flu in New York city. The inital values for the population variables are $$ \begin{aligned} & S(0) = 7,900,000 \\ & I(0) = 10 \\ & R(0) = 0 \\ & N = S(0) + I(0) + R(0) = 7,900,010 \end{aligned} $$ and the values for the parameters are $\beta = \frac{1}{2}$, $\gamma = \frac{1}{3}$. All the following numerical methods solve the SIR model with a step size $\Delta t = 1$ day and time steps $t$ ranging from 0 to 200 days.

1.2 `RK4SIR` function

Firstly, I need three helper functions to describe the dynamic of S, I and R compartments of SIR model. For variants of SIR model, these functions can be easily modified accordingly.

dSdt <- function(t, S, I) {
  return(-beta * S * I / N)
}

dIdt <- function(t, S, I) {
  return(beta * S * I / N - gamma * I)
}

dRdt <- function(t, I) {
  return(gamma * I)
}

Note that the dRdt function will not be called hereinafter. Here, it’s still defined just for the integrity of SIR model and easily understanding.

Secondly, as per the algorithm of RK4, I create the following RK4SIR function to solve the SIR model.

RK4SIR <- function(n, beta, gamma, S0, I0, R0 = 0, dt = 1) {
  N <<- S0 + I0 + R0  # fixed population
  
  S <- c(S0, rep(0, n))
  I <- c(I0, rep(0, n))
  R <- c(R0, rep(0, n))
  for (i in 1:n) {
    Si <- S[i]
    Ii <- I[i]
#     Ri <- R[i]
    
    S.k1 <- dSdt(i, Si, Ii)
    I.k1 <- dIdt(i, Si, Ii)
#     R.k1 <- dRdt(i, Ii)
    
    S.k2 <- dSdt(i + dt / 2, Si + dt / 2 * S.k1, Ii + dt / 2 * I.k1)
    I.k2 <- dIdt(i + dt / 2, Si + dt / 2 * S.k1, Ii + dt / 2 * I.k1)
#     R.k2 <- dRdt(i + dt / 2, Ii + dt / 2 * I.k1)
    
    S.k3 <- dSdt(i + dt / 2, Si + dt / 2 * S.k2, Ii + dt / 2 * I.k2)
    I.k3 <- dIdt(i + dt / 2, Si + dt / 2 * S.k2, Ii + dt / 2 * I.k2)
#     R.k3 <- dRdt(i + dt / 2, Ii + dt / 2 * I.k2)
    
    S.k4 <- dSdt(i + dt, Si + dt * S.k3, Ii + dt * I.k3)
    I.k4 <- dIdt(i + dt, Si + dt * S.k3, Ii + dt * I.k3)
#     R.k4 <- dRdt(i + dt, Ii + dt * I.k3)
    
    S[i + 1] <- Si + dt / 6 * (S.k1 + 2 * S.k2 + 2 * S.k3 + S.k4)
    I[i + 1] <- Ii + dt / 6 * (I.k1 + 2 * I.k2 + 2 * I.k3 + I.k4)
#     R[i + 1] <- Ri + dt / 6 * (R.k1 + 2 * R.k2 + 2 * R.k3 + R.k4)
  }
  R <- N - S - I
  return(data.frame(n = 0:n, S = S, I = I, R = R))
}

Last, call RK4SIR to simulate the spread of Hong Kong flu in New York city.

S0 <- 7900000
I0 <- 10
# R0 <- 0
# N <- S0 + I0 + R0
beta <- 1 / 2
gamma <- 1 / 3
n <- 200

r <- RK4SIR(n, beta, gamma, S0, I0)

Use ggplot2 package to plot the resulting $S(t)$, $I(t)$ and $R(t)$ curves.

library(reshape2)
r.plot <- melt(r, id = "n", measure = c("S", "I", "R"))
library(ggplot2)
p <- ggplot(r.plot, aes(x = n, y = value, group = variable, color = variable))
p + geom_line() + 
  ggtitle("Spread of Hong Kong Flu in New York City")

For the spread dynamic of the disease, I am interested in the time $t\_{max}$ when the number of infected people gets maximum and the corresponding maximum number of infected people $I\_{max}$. $t\_{max}$ and $I\_{max}$ are also key measures to evaluate the performance of the absolute humidity-driven SIRS model in Yang et al. (2014). The following codes give $t_{max}$ and $I_{max}$.

which.max(r$I)

## [1] 75

max(r$I)

## [1] 497473.6

Comparing to the results of the study case The SIR Model for Spread of Disease, the trends of $S(t)$, $I(t)$ and $R(t)$ curves are correct. Furthermore, $t_{max} = 75$ is the same to the approximate value of 75 identified from the figure.

1.3 `RK4SIR.Yang` function

Based on the propagateParSIR function from the code shipping with Yang et al. (2014), I just reproduce the algorithm for easily understanding and create the following function RK4SIR.Yang to solve the SIR model.

RK4SIR.Yang <- function(n, beta, gamma, S0, I0, R0 = 0, dt = 1) {
  N <<- S0 + I0 + R0  # fixed population
  
  S <- c(S0, rep(0, n))
  I <- c(I0, rep(0, n))
  R <- c(R0, rep(0, n))
  for (i in 1:n) {
    Si <- S[i]
    Ii <- I[i]
    S.k1 <- dSdt(i, Si, Ii)
    I.k1 <- dIdt(i, Si, Ii)
    
    Ts1 <- Si + dt / 2 * S.k1
    Ti1 <- Ii + dt / 2 * I.k1
    S.k2 <- dSdt(i + dt / 2, Ts1, Ti1)
    I.k2 <- dIdt(i + dt / 2, Ts1, Ti1)
    
    Ts2 <- Ts1 + dt / 2 * S.k2
    Ti2 <- Ti1 + dt / 2 * I.k2
    S.k3 <- dSdt(i + dt / 2, Ts2, Ti2)
    I.k3 <- dIdt(i + dt / 2, Ts2, Ti2)
    
    Ts3 <- Ts2 + dt * S.k3
    Ti3 <- Ti2 + dt * I.k3
    S.k4 <- dSdt(i + dt, Ts3, Ti3)
    I.k4 <- dIdt(i + dt, Ts3, Ti3)
    
    S[i + 1] <- Si + dt / 6 * (S.k1 + 2 * S.k2 + 2 * S.k3 + S.k4)
    I[i + 1] <- Ii + dt / 6 * (I.k1 + 2 * I.k2 + 2 * I.k3 + I.k4)
    R[i + 1] <- N - S[i + 1] - I[i + 1]
  }
  return(data.frame(n = 0:n, S = S, I = I, R = R))
}

Note that RK4SIR.Yang is a bit different from RK4SIR in calculating $k_1$, $k_2$, $k_3$ and $k_4$. The functionality of RK4SIR.Yang is also tested by the same case.

Call RK4SIR.Yang to simulate the spread of Hong Kong flu in New York city.

r <- RK4SIR.Yang(n, beta, gamma, S0, I0)

Plot $S(t)$, $I(t)$ and $R(t)$ curves.

r.plot <- melt(r, id = "n", measure = c("S", "I", "R"))
p <- ggplot(r.plot, aes(x = n, y = value, group = variable, color = variable))
p + geom_line() + 
  ggtitle("Spread of Hong Kong Flu in New York City")

Extract $t_{max}$ and $I_{max}$.

which.max(r$I)

## [1] 72

max(r$I)

## [1] 482429.6

Comparing to the results of RK4SIR, the results of RK4SIR.Yang show that the infected indiviuals reaching peak is 3 days earlier and the number of infected individuals is more than 15,000 less. I doubt the RK4SIR.Yang function gives a wrong solution. To figure out which function (RK4SIR or RK4SIR.Yang) implements the RK4 method correctly, I also solve the SIR model by directly calling ODE solvers from the existing software package, including deSolve package and MATLAB.

1.4 `deSolve` package

deSolve package provides general solvers for inital value problems of ODE, PDE, DAE and DDE. rk4 function from deSolve package is an implementation of the classical RK4 integration algorithm. The steps of invoking rk4 function to solve the SIR model are as follows.

Firstly, define the initial values and parameters used in the SIR model.

# initial (state) values for SIR model
y0 <- c(S = S0, I = I0, R = 0)
# vector of time steps
times <- 0:n
# vector of parameters used in SIR model
params <- c(beta = beta, gamma = gamma)

Secondly, define the SIR model.

SIR <- function(t, y, params) {
  with(as.list(c(params, y)), {
    dS <- -beta * S * I / N
    dI <- beta * S * I / N - gamma * I
    dR <- gamma * I
    list(c(dS, dI, dR))
  })
}

Last, call rk4 function to solve the SIR model and plot the results.

library(deSolve)
r <-  rk4(y0, times, SIR, params)
plot(r)

Similarly, extract $t_{max}$ and $I_{max}$.

which.max(r[, "I"])

## [1] 75

max(r[, "I"])

## [1] 497473.6

The results are the same to those of RK4SIR function. As deSolve is a mature package and has been used by numerous users, I believe the $t_{max} = 75$ and $I_{max} = 4.97474\times 10^{5}$ given by RK4SIR function are correct.

1.5 `ode45` function

Furthermore, I solve the SIR model by calling ode45 function from MATLAB to further confirm my judgement. The codes are organized in two MATLAB scripts SIR.m and simSIR.m. Since the whole process is very similar to the R code, I would not explain the codes in detail. Here, I just present the codes.

SIR.m defines the ODEs of the SIR model.

function dy = SIR(t, y, beta, gamma, N)
% only two ODEs of SIR model are independent
% solving three ODE together, MATLAB gives wrong solution
S = y(1);
I = y(2);
dS = - beta * S * I / N;
dI = beta * S * I / N - gamma * I;
dy = [dS, dI]';
end

simSIR.m runs the specific simulation of the spread of Hong Kong flu in New York city and gives the results of interest.

clear

t = 0:200;
S0 = 7900000;
I0 = 10;
R0 = 0;
N = S0 + I0 + R0;
y0 = [S0, I0];
beta = 1 / 2;
gamma = 1 / 3;
options = [];

[T, Y] = ode45(@SIR, t, y0, options, beta, gamma, N);
% plot S, I, R curves
S = Y(:, 1);
I = Y(:, 2);
R = N - S - I;
plot(T, S, '-r', T, I, '-g', T, R, '-b')
title('Spread of Hong Kong Flu in New York City')
xlabel('Days')
ylabel('Value')
legend('S','I', 'R')

% peak of the infected
[maxVal, ind] = max(Y(:, 2))
% add veritical line to indicate the peak
ylim = get(gca, 'ylim');  %  get y range
hold on
plot([ind - 1, ind - 1], [ylim(1), ylim(2)], 'LineStyle', '--')

The results of running simSIR.m are

>> simSIR

ind =

    75
    
maxVal =

   4.9755e+05

Note that by default MATLAB’s ode45 solver is Dormand–Prince method (alias dopri5), which is also a member of the Runge–Kutta family of ODE solvers. As ode45 is adaptive stepsize integration algorithm, it can give more accurate numerical solution than RK4 method.

rk45dp7 (alias ode45) method from deSolve package provides ode45 solver for ODEs in R. Solve the SIR model by using ode45 method in deSolve and extract $t_{max}$ and $I_{max}$.

r <-  ode(y0, times, SIR, params, method = rkMethod("ode45"))
# peak of the infected
which.max(r[, "I"])

## [1] 75

max(r[, "I"])

## [1] 497474.2

$I_{max}$ given by rk4 and ode45 from deSolve package are almost the same, but both are approximate 80 less than that of ode45 in MATLAB. The results of ode45 provided by R and MATLAB are both right as the discrepancy results from the inherent difference in numerical precision between R and MATLAB. For more details refer to the post.

The results given by both deSolve package and MATLAB show that $I_{max} = 75$ is the right answer. I am pretty sure that the results from RK4SIR function are correct; however, the RK4SIR.Yang function ${\color{red}{doesn't\ correctly}}$ implement RK4 method for solving SIR model. Since the RK4SIR.Yang function gives a bit ealier $I_{max}$, I wonder whether this wrong RK4 method has any influence on the conclusions of Yang et al. (2014)

2 Incidence

2.1 Preliminary

Instead of $S(t)$, $I(t)$ and $R(t)$ of SIR model, health workers in practice often collect daily number of newly infected people. This is the loosely expressed incidence. Besides incidence, cummulative incidence is also of interest. Let $C(t)$ denote incidence at time $t$, $P(t)$ denote cummulative incidence at time $t$. $C(t)$ and $P(t)$ can be calculated as follows $$ \begin{aligned} & C(t) = \frac{dP}{dt} = \frac{\beta SI}{N} \\ & C(t_0) = S_0 \\ & P(t) = \int_0^t{C(t)}dt = \int_0^t{\frac{\beta SI}{N}}dt \end{aligned} $$

In some references, cummulative incidence (also known as incidence proportion) refers to the number of new cases $P(T)$ within a specified time period $T$ divided by the size of the population initially at risk $N$, expressed as $P(T)$ cases per $N$ persons, or $\frac{P(T)}{N}\times 100\%$. The rigorous definition of incidence is incidence rate, which is expressed as $P(T)$ cases per $N$ persons per $T$ time, or $\frac{P(T)}{N\times T}\times 100\%$.

2.2 `RK4SIR` function

A logical parameter incidence is added to RK4SIR function to control whether or not incidence and cummulative incidence are returned.

RK4SIR <- function(n, beta, gamma, S0, I0, R0 = 0, dt = 1, incidence = FALSE) {
  N <<- S0 + I0 + R0
  
  S <- c(S0, rep(0, n))
  I <- c(I0, rep(0, n))
  R <- c(R0, rep(0, n))
  for (i in 1:n) {
    Si <- S[i]
    Ii <- I[i]
    
    S.k1 <- dSdt(i, Si, Ii)
    I.k1 <- dIdt(i, Si, Ii)
    
    S.k2 <- dSdt(i + dt / 2, Si + dt / 2 * S.k1, Ii + dt / 2 * I.k1)
    I.k2 <- dIdt(i + dt / 2, Si + dt / 2 * S.k1, Ii + dt / 2 * I.k1)
    
    S.k3 <- dSdt(i + dt / 2, Si + dt / 2 * S.k2, Ii + dt / 2 * I.k2)
    I.k3 <- dIdt(i + dt / 2, Si + dt / 2 * S.k2, Ii + dt / 2 * I.k2)
    
    S.k4 <- dSdt(i + dt, Si + dt * S.k3, Ii + dt * I.k3)
    I.k4 <- dIdt(i + dt, Si + dt * S.k3, Ii + dt * I.k3)
    
    S[i + 1] <- Si + dt / 6 * (S.k1 + 2 * S.k2 + 2 * S.k3 + S.k4)
    I[i + 1] <- Ii + dt / 6 * (I.k1 + 2 * I.k2 + 2 * I.k3 + I.k4)
  }
  R <- N - S - I
  
  if (!incidence) {
    return(data.frame(n = 0:n, S = S, I = I, R = R))
  } else {
    # newly infected per day (incidence)
    inc = c(I0, -diff(S))
    # cumulative incidence
    cum.inc = cumsum(inc)
    return(data.frame(n = 0:n, S = S, I = I, R = R, inc = inc, cum.inc = cum.inc))
  }
}

Use the updated RK4SIR function to track the time-varying incidence and cummulative incidence of the same study case.

r <- RK4SIR(n, beta, gamma, S0, I0, incidence = TRUE)
# # plot S, I, R, incidence, cummulative incidence curves
r.plot <- melt(r, id = "n", measure = c("S", "I", "R", "inc", "cum.inc"))
p <- ggplot(r.plot, aes(x = n, y = value, group = variable, color = variable))
p + geom_line() + 
  scale_colour_discrete(name = "Legend", 
                      breaks = c("S", "I", "R", "inc", "cum.inc"), 
                      labels = c("Susceptible", "Infected ", "Recovered", "Incidence", 
                               "Cummulative incidence")) + 
  ggtitle("Spread of Hong Kong Flu in New York City")

As shown in the above figure, the time when incidence reaches peak is a bit earlier than that of the infected people. The following codes give the exact results.

which.max(r$inc)

## [1] 73

max(r$inc)

## [1] 173488.5

2.3 `RK4SIR.Yang` function

I also update RK4SIR.Yang function to return cummulative incidence based on the algorithm from propagateParSIR function. The initial cummulative incidence $C_0$ is 0, which is $S_0$ in RK4SIR function; however, I think it’s more consistent to set $C_0 = S_0$. Notably, the RK4 method for RK4SIR.Yang hasn’t been corrected and the function doesn’t return incidence.

RK4SIR.Yang <- function(n, beta, gamma, S0, I0, R0 = 0, dt = 1, incidence = FALSE) {
  N <<- S0 + I0 + R0
  
  S <- c(S0, rep(0, n))
  I <- c(I0, rep(0, n))
  R <- c(R0, rep(0, n))
  cum.inc <- rep(0, n + 1)  # cumulative incidence
  for (i in 1:n) {
    Si <- S[i]
    Ii <- I[i]
    S.k1 <- dSdt(i, Si, Ii)
    I.k1 <- dIdt(i, Si, Ii)
    CI.k1 <- -S.k1
    
    Ts1 <- Si + dt / 2 * S.k1
    Ti1 <- Ii + dt / 2 * I.k1
    S.k2 <- dSdt(i + dt / 2, Ts1, Ti1)
    I.k2 <- dIdt(i + dt / 2, Ts1, Ti1)
    CI.k2 <- -S.k2
    
    Ts2 <- Ts1 + dt / 2 * S.k2
    Ti2 <- Ti1 + dt / 2 * I.k2
    S.k3 <- dSdt(i + dt / 2, Ts2, Ti2)
    I.k3 <- dIdt(i + dt / 2, Ts2, Ti2)
    CI.k3 <- -S.k3
    
    Ts3 <- Ts2 + dt * S.k3
    Ti3 <- Ti2 + dt * I.k3
    S.k4 <- dSdt(i + dt, Ts3, Ti3)
    I.k4 <- dIdt(i + dt, Ts3, Ti3)
    CI.k4 <- -S.k4
    
    S[i + 1] <- Si + dt / 6 * (S.k1 + 2 * S.k2 + 2 * S.k3 + S.k4)
    I[i + 1] <- Ii + dt / 6 * (I.k1 + 2 * I.k2 + 2 * I.k3 + I.k4)
    R[i + 1] <- N - S[i + 1] - I[i + 1]
    cum.inc[i + 1] <- cum.inc[i] + dt / 6 * (CI.k1 + 2 * CI.k2 + 2 * CI.k3 + CI.k4)
  }
  if (!incidence) {
    return(data.frame(n = 0:n, S = S, I = I, R = R))
  } else {
    return(data.frame(n = 0:n, S = S, I = I, R = R, cum.inc = cum.inc))
  }
}

2.4 Benchmark

microbenchmark package is a good option for benchmarking functions. I use microbenchmark function to compare rk4, RK4SIR and RK4SIR.Yang 1000 times.

library(microbenchmark)
compare <- microbenchmark(rk4(y0, times, SIR, params), 
                          RK4SIR(n, beta, gamma, S0, I0, incidence = TRUE), 
                          RK4SIR.Yang(n, beta, gamma, S0, I0, incidence = TRUE), 
                          times = 10)
# change expr for plot
compare$expr <- gsub("rk4(y0, times, SIR, params)", "rk4", 
                     compare$expr, fixed = TRUE)
compare$expr <- gsub("RK4SIR(n, beta, gamma, S0, I0, incidence = TRUE)", "RK4SIR", 
                     compare$expr, fixed = TRUE)
compare$expr <- gsub("RK4SIR.Yang(n, beta, gamma, S0, I0, incidence = TRUE)", 
                     "RK4SIR.Yang", compare$expr, fixed = TRUE)
compare

## Unit: milliseconds
##         expr       min        lq      mean    median        uq       max neval
##          rk4 10.522828 10.849218 12.314321 11.330950 14.672388 15.626585    10
##       RK4SIR  1.691710  2.008025  2.244679  2.157010  2.655575  2.672397    10
##  RK4SIR.Yang  1.627021  1.879434  6.647515  2.081856  2.122347 48.506175    10

Draw a comparison of distribution times.

p <- autoplot(compare)

## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.

p + geom_violin(aes(color = expr, fill = expr)) + 
  theme(legend.position = "none")

rk4 function from deSolve package runs much slower than both RK4SIR and RK4SIR.Yang fucntions as it takes much time to spin up for calling external FORTRAN or C functions. Though RK4SIR function performs the extra calculation of incidence, the benchmarking results show that RK4SIR function still runs faster than RK4SIR.Yang function. This is because RK4SIR.Yang function calculates cummulative incidence one by one per iterative formula while RK4SIR function directly apply diff function to the resulting vector S to accomplish the calculation. diff function is a vector operation and takes much less time.

Tony Tsai

May the force of science be with you

RK4 Method for Solving SIR Model

1 RK4

1.1 Preliminary

1.2 `RK4SIR` function

1.3 `RK4SIR.Yang` function

1.4 `deSolve` package

1.5 `ode45` function

2 Incidence

2.1 Preliminary

2.2 `RK4SIR` function

2.3 `RK4SIR.Yang` function

2.4 Benchmark

1 RK4

1.1 Preliminary

1.2 RK4SIR function

1.3 RK4SIR.Yang function

1.4 deSolve package

1.5 ode45 function

2 Incidence

2.1 Preliminary

2.2 RK4SIR function

2.3 RK4SIR.Yang function

2.4 Benchmark

1.2 `RK4SIR` function

1.3 `RK4SIR.Yang` function

1.4 `deSolve` package

1.5 `ode45` function

2.2 `RK4SIR` function

2.3 `RK4SIR.Yang` function