My object is to rewrite the 4th order Runge-Kutta (abbreviated for **RK4**) method for solving the absolute humidity-driven SIRS model developed by
Yang et al. (2014) in **R** language. The details of the SIRS model are provided in the paper.

## 1 RK4

### 1.1 Preliminary

**RK4** is one of the classic methods for numerical integration of ODE models. A brief introduction of **RK4** refers to
Wikipedia.

Consider the following initial value problem of ODE
$$
\begin{equation}
\begin{aligned}
& \frac{dy}{dt} = f(t, y) \\
& y(t_0) = y_0
\end{aligned}
\label{eq1-rk4}
\end{equation}
$$
where `\(y(t)\)`

is the unknown function (scalar or **vector**) which I would like to approximate.

The iterative formula of **RK4** method for solving ODE `\((\ref{eq1-rk4})\)`

is as follows
$$
\begin{equation}
\begin{aligned}
& y_{n + 1} = y_n + \frac{\Delta t}{6}(k_1 + 2k_2 + 2k_3 + k_4) \\
& k_1 = f(t_n, y_n) \\
& k_2 = f(t_n + \frac{\Delta t}{2}, y_n + \frac{k_1\Delta t}{2}) \\
& k_3 = f(t_n + \frac{\Delta t}{2}, y_n + \frac{k_2\Delta t}{2}) \\
& k_4 = f(t_n + \Delta t, y_n + k_3\Delta t) \\
& t_{n + 1} = t_n + \Delta t \\
& n = 0, 1, 2, 3, \dots
\end{aligned}
\label{eq2}
\end{equation}
$$

For simplicity, here I use the simplest
SIR model rather than the SIRS model used in the paper to examine whether the **RK4** method has been implemented correctly. The SIR model is defined as follows
$$
\begin{equation} \label{eq3}
\begin{aligned}
& \frac{dS}{dt} = -\frac{\beta SI}{N} \\
& \frac{dI}{dt} = \frac{\beta SI}{N} - \gamma I \\
& \frac{dR}{dt} = \gamma I
\end{aligned}
\end{equation}
$$
where `\(S(t)\)`

is the number of susceptible people in the population at time `\(t\)`

, `\(I(t)\)`

is the number of infectious people at time `\(t\)`

, `\(R(t)\)`

is the number of recovered people at time `\(t\)`

, `\(\beta\)`

is the transmission rate, `\(\gamma\)`

represents the recovery rate, and `\(N = S(t) + I(t) + R(t)\)`

is the fixed population.

According to the general iterative formula `\((\ref{eq2})\)`

, the iterative formulas for `\(S(t)\)`

, `\(I(t)\)`

and `\(R(t)\)`

of SIR model can be written out.
$$
\begin{equation}
\begin{aligned}
& S_{n + 1} = S_n + \frac{\Delta t}{6}(k_1^S + 2k_2^S + 2k_3^S + k_4^S) \\
& k_1^S = f(t_n, S_n, I_n) = -\frac{\beta S_nI_n}{N} \\
& k_2^S = f(t_n + \frac{\Delta t}{2}, S_n + \frac{k_1^S\Delta t}{2}, I_n + \frac{k_1^I\Delta t}{2}) = -\frac{\beta}{N}(S_n + \frac{k_1^S\Delta t}{2})(I_n + \frac{k_1^I\Delta t}{2}) \\
& k_3^S = f(t_n + \frac{\Delta t}{2}, S_n + \frac{k_2^S\Delta t}{2}, I_n + \frac{k_2^I\Delta t}{2}) = -\frac{\beta}{N}(S_n + \frac{k_2^S\Delta t}{2})(I_n + \frac{k_2^I\Delta t}{2}) \\
& k_4^S = f(t_n + \Delta t, S_n + k_3^S\Delta t, I_n + k_3^I\Delta t) = -\frac{\beta}{N}(S_n + k_3^S\Delta t)(I_n + k_3^I\Delta t)
\end{aligned}
\end{equation}
$$

$$ \begin{equation} \begin{aligned} & I_{n + 1} = I_n + \frac{\Delta t}{6}(k_1^I + 2k_2^I + 2k_3^I + k_4^I) \\ & k_1^I = f(t_n, S_n, I_n) = \frac{\beta S_nI_n}{N} - \gamma I_n \\ & k_2^I = f(t_n + \frac{\Delta t}{2}, S_n + \frac{k_1^S\Delta t}{2}, I_n + \frac{k_1^I\Delta t}{2}) = \frac{\beta}{N}(S_n + \frac{k_1^S\Delta t}{2})(I_n + \frac{k_1^I\Delta t}{2}) - \gamma (I_n + \frac{k_1^I\Delta t}{2}) \\ & k_3^I = f(t_n + \frac{\Delta t}{2}, S_n + \frac{k_2^S\Delta t}{2}, I_n + \frac{k_2^I\Delta t}{2}) = \frac{\beta}{N}(S_n + \frac{k_2^S\Delta t}{2})(I_n + \frac{k_2^I\Delta t}{2}) - \gamma (I_n + \frac{k_2^I\Delta t}{2}) \\ & k_4^I = f(t_n + \Delta t, S_n + k_3^S\Delta t, I_n + k_3^I\Delta t) = \frac{\beta}{N}(S_n + k_3^S\Delta t)(I_n + k_3^I\Delta t) - \gamma (I_n + k_3^I\Delta t) \end{aligned} \end{equation} $$

$$ \begin{equation} \begin{aligned} & R_{n + 1} = R_n + \frac{\Delta t}{6}(k_1^R + 2k_2^R + 2k_3^R + k_4^R) \\ & k_1^R = f(t_n, I_n) = \gamma I_n \\ & k_2^R = f(t_n + \frac{\Delta t}{2}, I_n + \frac{k_1^I\Delta t}{2}) = \gamma (I_n + \frac{k_1^I\Delta t}{2}) \\ & k_3^R = f(t_n + \frac{\Delta t}{2}, I_n + \frac{k_2^I\Delta t}{2}) = \gamma (I_n + \frac{k_2^I\Delta t}{2}) \\ & k_4^R = f(t_n + \Delta t, I_n + k_3^I\Delta t) = \gamma (I_n + k_3^I\Delta t) \end{aligned} \end{equation} $$

Note that since the population `\(N = S(t) + I(t) + R(t)\)`

is constant, there will have `\(\frac{dS}{dt} + \frac{dI}{dt} + \frac{dR}{dt} = 0\)`

. Therefore, only two of the three ODEs are independent and sufficient to solve the ODEs. Here, only iterative formulas for `\(S(t)\)`

and `\(I(t)\)`

are used and `\(R(t)\)`

is calculated by `\(R(t) = N - S(t) - I(t)\)`

.

The specific simulation parameters for SIR model are from the study case of
spread of Hong Kong flu in New York city. The inital values for the population variables are
$$
\begin{aligned}
& S(0) = 7,900,000 \\
& I(0) = 10 \\
& R(0) = 0 \\
& N = S(0) + I(0) + R(0) = 7,900,010
\end{aligned}
$$
and the values for the parameters are `\(\beta = \frac{1}{2}\)`

, `\(\gamma = \frac{1}{3}\)`

. All the following numerical methods solve the SIR model with a step size `\(\Delta t = 1\)`

day and time steps `\(t\)`

ranging from 0 to 200 days.

### 1.2 `RK4SIR`

function

Firstly, I need three helper functions to describe the dynamic of **S**, **I** and **R** compartments of SIR model. For variants of SIR model, these functions can be easily modified accordingly.

```
dSdt <- function(t, S, I) {
return(-beta * S * I / N)
}
dIdt <- function(t, S, I) {
return(beta * S * I / N - gamma * I)
}
dRdt <- function(t, I) {
return(gamma * I)
}
```

Note that the `dRdt`

function will not be called hereinafter. Here, it’s still defined just for the integrity of SIR model and easily understanding.

Secondly, as per the algorithm of **RK4**, I create the following `RK4SIR`

function to solve the SIR model.

```
RK4SIR <- function(n, beta, gamma, S0, I0, R0 = 0, dt = 1) {
N <<- S0 + I0 + R0 # fixed population
S <- c(S0, rep(0, n))
I <- c(I0, rep(0, n))
R <- c(R0, rep(0, n))
for (i in 1:n) {
Si <- S[i]
Ii <- I[i]
# Ri <- R[i]
S.k1 <- dSdt(i, Si, Ii)
I.k1 <- dIdt(i, Si, Ii)
# R.k1 <- dRdt(i, Ii)
S.k2 <- dSdt(i + dt / 2, Si + dt / 2 * S.k1, Ii + dt / 2 * I.k1)
I.k2 <- dIdt(i + dt / 2, Si + dt / 2 * S.k1, Ii + dt / 2 * I.k1)
# R.k2 <- dRdt(i + dt / 2, Ii + dt / 2 * I.k1)
S.k3 <- dSdt(i + dt / 2, Si + dt / 2 * S.k2, Ii + dt / 2 * I.k2)
I.k3 <- dIdt(i + dt / 2, Si + dt / 2 * S.k2, Ii + dt / 2 * I.k2)
# R.k3 <- dRdt(i + dt / 2, Ii + dt / 2 * I.k2)
S.k4 <- dSdt(i + dt, Si + dt * S.k3, Ii + dt * I.k3)
I.k4 <- dIdt(i + dt, Si + dt * S.k3, Ii + dt * I.k3)
# R.k4 <- dRdt(i + dt, Ii + dt * I.k3)
S[i + 1] <- Si + dt / 6 * (S.k1 + 2 * S.k2 + 2 * S.k3 + S.k4)
I[i + 1] <- Ii + dt / 6 * (I.k1 + 2 * I.k2 + 2 * I.k3 + I.k4)
# R[i + 1] <- Ri + dt / 6 * (R.k1 + 2 * R.k2 + 2 * R.k3 + R.k4)
}
R <- N - S - I
return(data.frame(n = 0:n, S = S, I = I, R = R))
}
```

Last, call `RK4SIR`

to simulate the spread of Hong Kong flu in New York city.

```
S0 <- 7900000
I0 <- 10
# R0 <- 0
# N <- S0 + I0 + R0
beta <- 1 / 2
gamma <- 1 / 3
n <- 200
r <- RK4SIR(n, beta, gamma, S0, I0)
```

Use `ggplot2`

package to plot the resulting `\(S(t)\)`

, `\(I(t)\)`

and `\(R(t)\)`

curves.

```
library(reshape2)
r.plot <- melt(r, id = "n", measure = c("S", "I", "R"))
library(ggplot2)
p <- ggplot(r.plot, aes(x = n, y = value, group = variable, color = variable))
p + geom_line() +
ggtitle("Spread of Hong Kong Flu in New York City")
```

For the spread dynamic of the disease, I am interested in the time `\(t\_{max}\)`

when the number of infected people gets maximum and the corresponding maximum number of infected people `\(I\_{max}\)`

. `\(t\_{max}\)`

and `\(I\_{max}\)`

are also key measures to evaluate the performance of the absolute humidity-driven SIRS model in
Yang et al. (2014). The following codes give `\(t_{max}\)`

and `\(I_{max}\)`

.

```
which.max(r$I)
```

```
## [1] 75
```

```
max(r$I)
```

```
## [1] 497473.6
```

Comparing to the results of the study case
The SIR Model for Spread of Disease, the trends of `\(S(t)\)`

, `\(I(t)\)`

and `\(R(t)\)`

curves are correct. Furthermore, `\(t_{max} = 75\)`

is the same to the approximate value of 75 identified from the figure.

### 1.3 `RK4SIR.Yang`

function

Based on the `propagateParSIR`

function from the code shipping with
Yang et al. (2014), I just reproduce the algorithm for easily understanding and create the following function `RK4SIR.Yang`

to solve the SIR model.

```
RK4SIR.Yang <- function(n, beta, gamma, S0, I0, R0 = 0, dt = 1) {
N <<- S0 + I0 + R0 # fixed population
S <- c(S0, rep(0, n))
I <- c(I0, rep(0, n))
R <- c(R0, rep(0, n))
for (i in 1:n) {
Si <- S[i]
Ii <- I[i]
S.k1 <- dSdt(i, Si, Ii)
I.k1 <- dIdt(i, Si, Ii)
Ts1 <- Si + dt / 2 * S.k1
Ti1 <- Ii + dt / 2 * I.k1
S.k2 <- dSdt(i + dt / 2, Ts1, Ti1)
I.k2 <- dIdt(i + dt / 2, Ts1, Ti1)
Ts2 <- Ts1 + dt / 2 * S.k2
Ti2 <- Ti1 + dt / 2 * I.k2
S.k3 <- dSdt(i + dt / 2, Ts2, Ti2)
I.k3 <- dIdt(i + dt / 2, Ts2, Ti2)
Ts3 <- Ts2 + dt * S.k3
Ti3 <- Ti2 + dt * I.k3
S.k4 <- dSdt(i + dt, Ts3, Ti3)
I.k4 <- dIdt(i + dt, Ts3, Ti3)
S[i + 1] <- Si + dt / 6 * (S.k1 + 2 * S.k2 + 2 * S.k3 + S.k4)
I[i + 1] <- Ii + dt / 6 * (I.k1 + 2 * I.k2 + 2 * I.k3 + I.k4)
R[i + 1] <- N - S[i + 1] - I[i + 1]
}
return(data.frame(n = 0:n, S = S, I = I, R = R))
}
```

Note that `RK4SIR.Yang`

is a bit different from `RK4SIR`

in calculating `\(k_1\)`

, `\(k_2\)`

, `\(k_3\)`

and `\(k_4\)`

. The functionality of `RK4SIR.Yang`

is also tested by the same case.

Call `RK4SIR.Yang`

to simulate the spread of Hong Kong flu in New York city.

```
r <- RK4SIR.Yang(n, beta, gamma, S0, I0)
```

Plot `\(S(t)\)`

, `\(I(t)\)`

and `\(R(t)\)`

curves.

```
r.plot <- melt(r, id = "n", measure = c("S", "I", "R"))
p <- ggplot(r.plot, aes(x = n, y = value, group = variable, color = variable))
p + geom_line() +
ggtitle("Spread of Hong Kong Flu in New York City")
```

Extract `\(t_{max}\)`

and `\(I_{max}\)`

.

```
which.max(r$I)
```

```
## [1] 72
```

```
max(r$I)
```

```
## [1] 482429.6
```

Comparing to the results of `RK4SIR`

, the results of `RK4SIR.Yang`

show that the infected indiviuals reaching peak is 3 days earlier and the number of infected individuals is more than 15,000 less. I doubt the `RK4SIR.Yang`

function gives a wrong solution. To figure out which function (`RK4SIR`

or `RK4SIR.Yang`

) implements the **RK4** method correctly, I also solve the SIR model by directly calling ODE solvers from the existing software package, including `deSolve`

package and **MATLAB**.

### 1.4 `deSolve`

package

`deSolve`

package provides general solvers for inital value problems of ODE, PDE, DAE and DDE. `rk4`

function from `deSolve`

package is an implementation of the classical **RK4** integration algorithm. The steps of invoking `rk4`

function to solve the SIR model are as follows.

Firstly, define the initial values and parameters used in the SIR model.

```
# initial (state) values for SIR model
y0 <- c(S = S0, I = I0, R = 0)
# vector of time steps
times <- 0:n
# vector of parameters used in SIR model
params <- c(beta = beta, gamma = gamma)
```

Secondly, define the SIR model.

```
SIR <- function(t, y, params) {
with(as.list(c(params, y)), {
dS <- -beta * S * I / N
dI <- beta * S * I / N - gamma * I
dR <- gamma * I
list(c(dS, dI, dR))
})
}
```

Last, call `rk4`

function to solve the SIR model and plot the results.

```
library(deSolve)
r <- rk4(y0, times, SIR, params)
plot(r)
```

Similarly, extract `\(t_{max}\)`

and `\(I_{max}\)`

.

```
which.max(r[, "I"])
```

```
## [1] 75
```

```
max(r[, "I"])
```

```
## [1] 497473.6
```

The results are the same to those of `RK4SIR`

function. As `deSolve`

is a mature package and has been used by numerous users, I believe the `\(t_{max} = 75\)`

and `\(I_{max} = 4.97474\times 10^{5}\)`

given by `RK4SIR`

function are correct.

### 1.5 `ode45`

function

Furthermore, I solve the SIR model by calling `ode45`

function from **MATLAB** to further confirm my judgement. The codes are organized in two **MATLAB** scripts `SIR.m`

and `simSIR.m`

. Since the whole process is very similar to the **R** code, I would not explain the codes in detail. Here, I just present the codes.

`SIR.m`

defines the ODEs of the SIR model.

```
function dy = SIR(t, y, beta, gamma, N)
% only two ODEs of SIR model are independent
% solving three ODE together, MATLAB gives wrong solution
S = y(1);
I = y(2);
dS = - beta * S * I / N;
dI = beta * S * I / N - gamma * I;
dy = [dS, dI]';
end
```

`simSIR.m`

runs the specific simulation of the spread of Hong Kong flu in New York city and gives the results of interest.

```
clear
t = 0:200;
S0 = 7900000;
I0 = 10;
R0 = 0;
N = S0 + I0 + R0;
y0 = [S0, I0];
beta = 1 / 2;
gamma = 1 / 3;
options = [];
[T, Y] = ode45(@SIR, t, y0, options, beta, gamma, N);
% plot S, I, R curves
S = Y(:, 1);
I = Y(:, 2);
R = N - S - I;
plot(T, S, '-r', T, I, '-g', T, R, '-b')
title('Spread of Hong Kong Flu in New York City')
xlabel('Days')
ylabel('Value')
legend('S','I', 'R')
% peak of the infected
[maxVal, ind] = max(Y(:, 2))
% add veritical line to indicate the peak
ylim = get(gca, 'ylim'); % get y range
hold on
plot([ind - 1, ind - 1], [ylim(1), ylim(2)], 'LineStyle', '--')
```

The results of running `simSIR.m`

are

```
>> simSIR
ind =
75
maxVal =
4.9755e+05
```

Note that by default **MATLAB**’s `ode45`

solver is
Dormand–Prince method (alias `dopri5`

), which is also a member of the Runge–Kutta family of ODE solvers. As `ode45`

is adaptive stepsize integration algorithm, it can give more accurate numerical solution than **RK4** method.

`rk45dp7`

(alias `ode45`

) method from `deSolve`

package provides `ode45`

solver for ODEs in **R**. Solve the SIR model by using `ode45`

method in `deSolve`

and extract `\(t_{max}\)`

and `\(I_{max}\)`

.

```
r <- ode(y0, times, SIR, params, method = rkMethod("ode45"))
# peak of the infected
which.max(r[, "I"])
```

```
## [1] 75
```

```
max(r[, "I"])
```

```
## [1] 497474.2
```

`\(I_{max}\)`

given by `rk4`

and `ode45`

from `deSolve`

package are almost the same, but both are approximate 80 less than that of `ode45`

in **MATLAB**. The results of `ode45`

provided by **R** and **MATLAB** are both right as the discrepancy results from the inherent difference in numerical precision between **R** and **MATLAB**. For more details refer to
the post.

The results given by both `deSolve`

package and **MATLAB** show that `\(I_{max} = 75\)`

is the right answer. I am pretty sure that the results from `RK4SIR`

function are correct; however, the `RK4SIR.Yang`

function `\({\color{red}{doesn't\ correctly}}\)`

implement **RK4** method for solving SIR model. Since the `RK4SIR.Yang`

function gives a bit ealier `\(I_{max}\)`

, I wonder whether this wrong **RK4** method has any influence on the conclusions of
Yang et al. (2014)

## 2 Incidence

### 2.1 Preliminary

Instead of `\(S(t)\)`

, `\(I(t)\)`

and `\(R(t)\)`

of SIR model, health workers in practice often collect daily number of newly infected people. This is the loosely expressed
incidence. Besides incidence, cummulative incidence is also of interest. Let `\(C(t)\)`

denote incidence at time `\(t\)`

, `\(P(t)\)`

denote cummulative incidence at time `\(t\)`

. `\(C(t)\)`

and `\(P(t)\)`

can be calculated as follows
$$
\begin{aligned}
& C(t) = \frac{dP}{dt} = \frac{\beta SI}{N} \\
& C(t_0) = S_0 \\
& P(t) = \int_0^t{C(t)}dt = \int_0^t{\frac{\beta SI}{N}}dt
\end{aligned}
$$

In some references,
cummulative incidence (also known as **incidence proportion**) refers to the number of new cases `\(P(T)\)`

within a specified time period `\(T\)`

divided by the size of the population initially at risk `\(N\)`

, expressed as `\(P(T)\)`

cases per `\(N\)`

persons, or `\(\frac{P(T)}{N}\times 100\%\)`

. The rigorous definition of incidence is
incidence rate, which is expressed as `\(P(T)\)`

cases per `\(N\)`

persons per `\(T\)`

time, or `\(\frac{P(T)}{N\times T}\times 100\%\)`

.

### 2.2 `RK4SIR`

function

A logical parameter `incidence`

is added to `RK4SIR`

function to control whether or not incidence and cummulative incidence are returned.

```
RK4SIR <- function(n, beta, gamma, S0, I0, R0 = 0, dt = 1, incidence = FALSE) {
N <<- S0 + I0 + R0
S <- c(S0, rep(0, n))
I <- c(I0, rep(0, n))
R <- c(R0, rep(0, n))
for (i in 1:n) {
Si <- S[i]
Ii <- I[i]
S.k1 <- dSdt(i, Si, Ii)
I.k1 <- dIdt(i, Si, Ii)
S.k2 <- dSdt(i + dt / 2, Si + dt / 2 * S.k1, Ii + dt / 2 * I.k1)
I.k2 <- dIdt(i + dt / 2, Si + dt / 2 * S.k1, Ii + dt / 2 * I.k1)
S.k3 <- dSdt(i + dt / 2, Si + dt / 2 * S.k2, Ii + dt / 2 * I.k2)
I.k3 <- dIdt(i + dt / 2, Si + dt / 2 * S.k2, Ii + dt / 2 * I.k2)
S.k4 <- dSdt(i + dt, Si + dt * S.k3, Ii + dt * I.k3)
I.k4 <- dIdt(i + dt, Si + dt * S.k3, Ii + dt * I.k3)
S[i + 1] <- Si + dt / 6 * (S.k1 + 2 * S.k2 + 2 * S.k3 + S.k4)
I[i + 1] <- Ii + dt / 6 * (I.k1 + 2 * I.k2 + 2 * I.k3 + I.k4)
}
R <- N - S - I
if (!incidence) {
return(data.frame(n = 0:n, S = S, I = I, R = R))
} else {
# newly infected per day (incidence)
inc = c(I0, -diff(S))
# cumulative incidence
cum.inc = cumsum(inc)
return(data.frame(n = 0:n, S = S, I = I, R = R, inc = inc, cum.inc = cum.inc))
}
}
```

Use the updated `RK4SIR`

function to track the time-varying incidence and cummulative incidence of the same study case.

```
r <- RK4SIR(n, beta, gamma, S0, I0, incidence = TRUE)
# # plot S, I, R, incidence, cummulative incidence curves
r.plot <- melt(r, id = "n", measure = c("S", "I", "R", "inc", "cum.inc"))
p <- ggplot(r.plot, aes(x = n, y = value, group = variable, color = variable))
p + geom_line() +
scale_colour_discrete(name = "Legend",
breaks = c("S", "I", "R", "inc", "cum.inc"),
labels = c("Susceptible", "Infected ", "Recovered", "Incidence",
"Cummulative incidence")) +
ggtitle("Spread of Hong Kong Flu in New York City")
```

As shown in the above figure, the time when incidence reaches peak is a bit earlier than that of the infected people. The following codes give the exact results.

```
which.max(r$inc)
```

```
## [1] 73
```

```
max(r$inc)
```

```
## [1] 173488.5
```

### 2.3 `RK4SIR.Yang`

function

I also update `RK4SIR.Yang`

function to return cummulative incidence based on the algorithm from `propagateParSIR`

function. The initial cummulative incidence `\(C_0\)`

is 0, which is `\(S_0\)`

in `RK4SIR`

function; however, I think it’s more consistent to set `\(C_0 = S_0\)`

. Notably, the `RK4`

method for `RK4SIR.Yang`

hasn’t been corrected and the function doesn’t return incidence.

```
RK4SIR.Yang <- function(n, beta, gamma, S0, I0, R0 = 0, dt = 1, incidence = FALSE) {
N <<- S0 + I0 + R0
S <- c(S0, rep(0, n))
I <- c(I0, rep(0, n))
R <- c(R0, rep(0, n))
cum.inc <- rep(0, n + 1) # cumulative incidence
for (i in 1:n) {
Si <- S[i]
Ii <- I[i]
S.k1 <- dSdt(i, Si, Ii)
I.k1 <- dIdt(i, Si, Ii)
CI.k1 <- -S.k1
Ts1 <- Si + dt / 2 * S.k1
Ti1 <- Ii + dt / 2 * I.k1
S.k2 <- dSdt(i + dt / 2, Ts1, Ti1)
I.k2 <- dIdt(i + dt / 2, Ts1, Ti1)
CI.k2 <- -S.k2
Ts2 <- Ts1 + dt / 2 * S.k2
Ti2 <- Ti1 + dt / 2 * I.k2
S.k3 <- dSdt(i + dt / 2, Ts2, Ti2)
I.k3 <- dIdt(i + dt / 2, Ts2, Ti2)
CI.k3 <- -S.k3
Ts3 <- Ts2 + dt * S.k3
Ti3 <- Ti2 + dt * I.k3
S.k4 <- dSdt(i + dt, Ts3, Ti3)
I.k4 <- dIdt(i + dt, Ts3, Ti3)
CI.k4 <- -S.k4
S[i + 1] <- Si + dt / 6 * (S.k1 + 2 * S.k2 + 2 * S.k3 + S.k4)
I[i + 1] <- Ii + dt / 6 * (I.k1 + 2 * I.k2 + 2 * I.k3 + I.k4)
R[i + 1] <- N - S[i + 1] - I[i + 1]
cum.inc[i + 1] <- cum.inc[i] + dt / 6 * (CI.k1 + 2 * CI.k2 + 2 * CI.k3 + CI.k4)
}
if (!incidence) {
return(data.frame(n = 0:n, S = S, I = I, R = R))
} else {
return(data.frame(n = 0:n, S = S, I = I, R = R, cum.inc = cum.inc))
}
}
```

### 2.4 Benchmark

`microbenchmark`

package is a good option for benchmarking functions. I use `microbenchmark`

function to compare `rk4`

, `RK4SIR`

and `RK4SIR.Yang`

1000 times.

```
library(microbenchmark)
compare <- microbenchmark(rk4(y0, times, SIR, params),
RK4SIR(n, beta, gamma, S0, I0, incidence = TRUE),
RK4SIR.Yang(n, beta, gamma, S0, I0, incidence = TRUE),
times = 10)
# change expr for plot
compare$expr <- gsub("rk4(y0, times, SIR, params)", "rk4",
compare$expr, fixed = TRUE)
compare$expr <- gsub("RK4SIR(n, beta, gamma, S0, I0, incidence = TRUE)", "RK4SIR",
compare$expr, fixed = TRUE)
compare$expr <- gsub("RK4SIR.Yang(n, beta, gamma, S0, I0, incidence = TRUE)",
"RK4SIR.Yang", compare$expr, fixed = TRUE)
compare
```

```
## Unit: milliseconds
## expr min lq mean median uq max neval
## rk4 10.522828 10.849218 12.314321 11.330950 14.672388 15.626585 10
## RK4SIR 1.691710 2.008025 2.244679 2.157010 2.655575 2.672397 10
## RK4SIR.Yang 1.627021 1.879434 6.647515 2.081856 2.122347 48.506175 10
```

Draw a comparison of distribution times.

```
p <- autoplot(compare)
```

```
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
```

```
p + geom_violin(aes(color = expr, fill = expr)) +
theme(legend.position = "none")
```

`rk4`

function from `deSolve`

package runs much slower than both `RK4SIR`

and `RK4SIR.Yang`

fucntions as it takes much time to spin up for calling external `FORTRAN`

or `C`

functions. Though `RK4SIR`

function performs the extra calculation of incidence, the benchmarking results show that `RK4SIR`

function still runs faster than `RK4SIR.Yang`

function. This is because `RK4SIR.Yang`

function calculates cummulative incidence one by one per iterative formula while `RK4SIR`

function directly apply `diff`

function to the resulting vector `S`

to accomplish the calculation. `diff`

function is a vector operation and takes much less time.