Introduction

Point data with cases coordinates vs. Polygon data with cases statistics

I. Point data with cases coordinates

1. Preparation

Load required packages and functions

Please make sure the following packages are installed for the functions to work. Matrix is required to handle sparse matrix in the calculation of individual reproductive numbers. ggplot2, sf, hexbin and their dependencies are required for plotting functions. gganimate, gifski, png, and transformr are used for producing .gif animation. dplyr is required for data manipulation, and tmap is required for plotting maps. Then, load the functions in Spatial_Reproductive_Number.R file.

source("https://raw.githubusercontent.com/wenlab501/Rt/main/Spatial_Reproductive_Number.R")
Load demo dataset: dengue outbreak

The demo data (dengue_outbreak.csv) provided consists of the onset date and coordinates of 1,848 individual observations. The plot.epi() can draw the epidemic curve (incident cases over time)of this outbreak.

dengue = read.csv("https://raw.githubusercontent.com/wenlab501/Rt/main/dengue_outbreak.csv", stringsAsFactors = F)
dengue$date = as.Date(dengue$date)  # integer format will work as well
head(dengue)
        date     long      lat
1 2007-06-07 120.2165 23.02999
2 2007-06-09 120.2184 23.03016
3 2007-06-10 120.2187 23.03065
4 2007-06-10 120.2165 23.02999
5 2007-06-14 120.2081 22.91810
6 2007-06-15 120.2144 22.97004

2. Descriptive Plots

Several plotting tools are provided to visualize disease surveillance data.

Epidemic curve

Epidemic curve (time series of incident cases) is the most common way to characterize temporal pattern of an outbreak. Timestamp (here, the onset date) for each case is the only arguent needed for plot.epi().

plot.epi(t = dengue$date)

Point pattern

Point pattern is the most direct and exact way to visualize spatial pattern of an outbreak. Coordinates of the cases are neccessary input for plot.points(). crs_pts specifies the coordinate reference system (CRS) of the input coordinates (x and y), the default (crs_pts=NULL) sets the CRS to EPSG:4326, and input coordinates should be longtitude and lattitude. base_map(optional) allows sf-multipolygon object to be inserted as base map.

base = st_read("https://raw.githubusercontent.com/wenlab501/Rt/main/Taiwan_town.geojson",quiet=T) #optional
bnd = c(120.1,22.9,120.4,23.1) #optional
plot.points(x = dengue$long, y= dengue$lat, crs_pts = NULL,base_map = base, bnd = bnd)

Note that, here, we manually specify the boundary of the figure (or it will be decided by sporadic cases in the remote area) to get appropriate view of the main outbreak.

Hexagon-bining plot

However, point pattern can be overwhelmingly unreadable when the data points growing large. A solution “bining”, aggregate the point into specified bins. Here, we provide hexagon-bining function plot.hex() for the job. One can set number of bins on the x-axis (automatically determined on y-axis) to tune the resolution, for example, nbin = 50.

plot.hex(x = dengue$long, y= dengue$lat, base_map = base, nbin = 50, bnd = bnd)

Kernel density estimation

Also, one might need a smoothed overview of the clustering pattern of an outbreak. plot.kde() calculates the 2-d kernel density estimation, and plots the contour. ngrid controls the resolution in calculation.

plot.kde(x = dengue$long, y= dengue$lat, base_map = base, ngrid = 200, bnd = bnd)

3. Estimating Reproductive Numbers

Specify the parameters

Estimating spatial-adjusted reproductive numbers involves two important likelihood components that determine the transmission likelihood of each case-pair. Both of them should be chosen carefully based on prior knowledge about the disease. First is the probability distribution of generation interval, defined as the time interval between the infection (or onset) time of infector and infectee. In the case of dengue, we apply gamma distribution with mean = 20 days and variance = 9 day^2. We need to specify this distribution as a single-argument, log probability densiy function for the subsequent calculation for spatial-adjusted reproductive numbers. Another key component is the spatial weighting function. Here, we use an exponential distribution with mean tranmission distance = 125 m.

lpdf_GI = function(dt) dgamma(dt, 
                               # translate gamma distributed mean-variance to shape-sale parameter
                               shape=(20^2)/9, scale=9/20, 
                               # use log probability to deal with small probabilities
                               log = T)
lpdf_SP = function(dd) dexp(dd,
                             # translate exponential distributed mean to rate
                             rate = 1/125, log = T)

Visualize these two probability density functions

g1 = ggplot(data.frame(x = seq(0, 50, 1)), aes(x))+
        stat_function(fun= ~ exp(lpdf_GI(.x)))+
        theme_classic()
g2 = ggplot(data.frame(x = seq(0, 1000, 10)), aes(x))+
        stat_function(fun= ~ exp(lpdf_SP(.x)))+
        theme_classic()
gridExtra::grid.arrange(g1, g2, ncol=2)

Calculate spatial-adjusted reproductive numbers

calc.Rj() function calculate individual reproductive numbers given data (n-length vectors : t, x, y) and prespecified likelihood functions (lpdf_GI, lpdf_SP). adj.sp controls whether to calculate spatial-adjusted reproductive number (default = True). This function returns a list containing the resulting individual reproductive numbers (n-length vector) and pair-wise transmission probability, an n*n upper triangular matrix (a sparse matrix of class “dtCMatrix”).

res_adj = calc.Rj(t = dengue$date, x = dengue$long, y = dengue$lat, 
                  lpdf_GI = lpdf_GI, lpdf_SP = lpdf_SP, adj.sp = T)
summary(res_adj$Rj) 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.1632  0.5240  0.9989  1.2493 30.2583 

Calculate non-adjusted reproductive numbers

We can use the same function calc.Rj(adj.sp = F) to calculate non-adjusted reproductive numbers, where only t and and lpdf_GI are required

res = calc.Rj(t = dengue$date, lpdf_GI = lpdf_GI, adj.sp = F)
summary(res$Rj)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.4659  0.8887  0.9995  1.2603 20.2298 

Visualize pair-wise transmission likelihood

The pair-wise transmission probabilities can be visualized by image() method. Comparing two method, we can find that spatial-adjusted method produced irregular pattern for accounting the distance of each case-pair. On the other hand, non-adjusted produced very smooth pattern that is solely governed by the distribution of generation interval.

image(res_adj$Pij[1:200, 1:200], main="Adjusted",colorkey = T, lwd =0, at = seq(0,.25,.0025))
image(res$Pij[1:200, 1:200], main="Non-adjusted",colorkey = T, lwd =0, at = seq(0,.25,.0025))

4. Visualizing Individual Reproductive Numbers

Time-varying reproductive numbers

plot.Rt() converts individual reproductive numbers into time-varying reproductive number which is frequently used to characterize the evolution in transmision potential of a whole outbreak. The shaded bars (only present in spatial-adjusted method) represent the variance in the transmissibility among individuals.

dengue$Rj = res$Rj
dengue$Rj_adj = res_adj$Rj
plot.Rt(t = dengue$date, Rjs = list("Non-adjusted"=dengue$Rj,"Adjusted"=dengue$Rj_adj))  

Spatial pattern

plot.points() and plot.hex() allow estimated individual reproductive numbers to be mapped to the color attribute of previous plots. Just specify the estimates in the Rj argument. Because the original scle of individual reproductive numbers is usually large, plot.points() will convert it to percentiles which is then mapped to the color and point size in the figure.

sdf = dengue[dengue$date > as.Date("2007-09-15")&dengue$date <  as.Date("2007-10-15"),]
plot.points(x = sdf$long, y= sdf$lat, Rj = sdf$Rj_adj, base_map = base, bnd = bnd)

plot.hex(), on the other hand, calculates the average individual reproductive numbers whithin each heaxgon, and maps them to color.

plot.hex(x = sdf$long, y= sdf$lat, Rj = sdf$Rj, nbin = 50, base_map = base, bnd = bnd)

Spatio-temporal pattern

We use animation to visualize the spatio-temporal pattern of an outbreak’s transmision potential. animate.points() generation animation of evolving point pattern while animate.hex() is used for Hexagon-bining pattern. dt controls the time step size in the animation

animate.points(t = dengue$date, x = dengue$long, y= dengue$lat, Rj = dengue$Rj_adj,dt = 14, base_map = base,bnd = bnd)

animate.hex(t = dengue$date, x = dengue$long, y= dengue$lat, Rj = dengue$Rj_adj,dt = 14, base_map = base,bnd = bnd)

II. Polygon data with cases statistics

1. Preparation

Load demo dataset: COVID-19 outbreak

The demo data (covid_outbreak.csv) provided consists of the onset date and administrative district (TOWN) of 11,348 individual observations. This data contains COVID-19 local cases in Taipei Metropolitan Area between May to June from Taiwan CDC. Due to the spatial resolution of this data is town (polygon data), the method in the previous part cannot be used directly.

Therefore, our first goal is to convert polygon data to point coordinates. We import the administrative district data, and create random points according to populations in advance (you can use more detailed polygon census data to get more accurate population distribution), to treat as the real population distribution (RandomPoints.csv). Further, through random selection, we can generate a set of case coordinates. If you are worried about the distortion of the sampling, you can use Monte Carlo method repeating sampling multiple times to get stable results.

covid=read.csv("https://raw.githubusercontent.com/wenlab501/Rt/main/covid_outbreak.csv")
covid$date=as.Date(covid$date) #convert to Date format
TPE_town=st_read("https://raw.githubusercontent.com/wenlab501/Rt/main/TPE_town.geojson",quiet=T)
RndPts=read.csv("https://raw.githubusercontent.com/wenlab501/Rt/main/RandomPoints.csv")
Random sampling

covid contains three columns. date stands for onset data and TOWN stands for where case happened. SID is consist of TOWN and a number, to ensure each observations have an independent SID for sampling.

head(covid)
        date         TOWN           SID
1 2021-05-01 台北市萬華區 台北市萬華區1
2 2021-05-01 台北市萬華區 台北市萬華區2
3 2021-05-01 台北市萬華區 台北市萬華區3
4 2021-05-01 台北市萬華區 台北市萬華區4
5 2021-05-01 台北市信義區 台北市信義區1
6 2021-05-01 台北市中正區 台北市中正區1

RndPts contains four columns. RndID is for independent RndID for sampling containing TOWN and a number, X and Y stand for coordinates of each random points. Here, due to processing data in Taiwan, these coordinates is based on CRS of TWD97/TM2 zone 121 (EPSG:3826).

head(RndPts)
          RndID         TOWN        X       Y
1 基隆市信義區1 基隆市信義區 328726.9 2780933
2 基隆市信義區2 基隆市信義區 328941.2 2780944
3 基隆市信義區3 基隆市信義區 328840.1 2780856
4 基隆市信義區4 基隆市信義區 328840.9 2780815
5 基隆市信義區5 基隆市信義區 328820.6 2780807
6 基隆市信義區6 基隆市信義區 328797.6 2780919

Calculate the number of cases and random points in each town.

TOWN.df=data.frame(xtabs(~TOWN,covid))
TOWN.df=left_join(TOWN.df,data.frame(xtabs(~TOWN,RndPts)),by="TOWN")
colnames(TOWN.df)=c("TOWN","CASE","RANDOM")
head(TOWN.df)
          TOWN CASE RANDOM
1 台北市士林區  375  14037
2 台北市大同區  289   6180
3 台北市大安區  266  15241
4 台北市中山區  262  11324
5 台北市中正區  272   7819
6 台北市內湖區  154  14215

Through random sampling to get case point coordinates.

RndID=unlist(apply(TOWN.df, 1, function(x) paste0(x[1],sample(as.numeric(x[3]),as.numeric(x[2])))))
CasePoint=data.frame(SID=sort(covid$SID),RndID=RndID)
CasePoint=left_join(CasePoint,RndPts,by="RndID")
covid.point=left_join(covid,CasePoint,by=c("SID","TOWN"))
head(covid.point)
        date         TOWN           SID            RndID        X       Y
1 2021-05-01 台北市萬華區 台北市萬華區1 台北市萬華區5878 300518.5 2770278
2 2021-05-01 台北市萬華區 台北市萬華區2 台北市萬華區2789 299810.5 2767981
3 2021-05-01 台北市萬華區 台北市萬華區3 台北市萬華區5414 299709.6 2769920
4 2021-05-01 台北市萬華區 台北市萬華區4 台北市萬華區7820 300922.4 2770587
5 2021-05-01 台北市信義區 台北市信義區1 台北市信義區7878 306847.2 2770619
6 2021-05-01 台北市中正區 台北市中正區1 台北市中正區6425 302761.2 2770402

After getting the point coordinates, we can use previous method to calculate reproductive numbers.

2. Descriptive Plots

Point pattern

Those graphs drawn in the first part can also be plotted in the same way. Notice that the parameter crs_pts must be set to 3826. Take plot.points() as an example.

plot.points(x = covid.point$X, y = covid.point$Y, crs_pts = 3826, base_map = base)

Choropleth map

Furthermore, joining the original town data (TPE_town) and case statistics (TOWN.df), we can plot choropleth map of case distribution with the functions in tmap package. For more information about tmap.

TPE_town=left_join(TPE_town,TOWN.df,by="TOWN")
tm_shape(TPE_town)+tm_polygons("CASE",n=10)

Temporal Choropleth map

We use animation to visualize the spatio-temporal pattern of choropleth map through tmap_animation() function.

df=as.data.frame.matrix(xtabs(~TOWN+date,covid))
time=colnames(df)
df$TOWN=rownames(df)
TPE_count=left_join(TPE_town,df,by="TOWN")
map_animate = lapply(time, function(x) {tm_shape(TPE_count)+tm_polygons(col=x,breaks=c(0,1,5,10,20,30,50,75,100,150))})
tmap_animation(map_animate, width = 400, height = 400, delay = 50)

3. Estimating Reproductive Numbers

Calculate spatial-adjusted reproductive numbers

In the same way, through specifying the parameters of generation interval and spatial weighting function, to calculate spatial-adjusted reproductive numbers. Here, we modified calc.Rj() for CRS of TWD97/TM2 to improve computing efficiency. Notice to set CRS_TWD97 to true.

lpdf_GI = function(dt) dgamma(dt, shape=(5.2^2)/1.5, scale=1.5/5.2,log = T)
lpdf_SP = function(dd) dexp(dd, rate = 1/125, log = T)
res_covid = calc.Rj(t=covid.point$date,x=covid.point$X,y=covid.point$Y,
                    lpdf_GI=lpdf_GI,lpdf_SP=lpdf_SP,adj.sp=T,CRS_TWD97=T)

And then, we can sum up individual reproductive numbers to town level through ReduceMatrix() function.

covid_TOWNij=ReduceMatrix(res_covid$Pij,covid.point$TOWN)
head(covid_TOWNij,c(5,5))
             台北市士林區 台北市大同區 台北市大安區 台北市中山區 台北市中正區
台北市士林區 3.404870e+02 4.269062e-01 6.526246e-14    0.9359748 3.134933e-11
台北市大同區 3.994247e+00 2.612638e+02 4.253592e-05   12.9231971 4.156046e+00
台北市大安區 8.499022e-14 4.153135e-06 2.061178e+02    3.4869809 1.347022e+01
台北市中山區 1.099083e+00 1.154398e+01 3.045853e+00  226.4820154 4.127749e+00
台北市中正區 2.671418e-10 2.987112e+00 2.592951e+01    5.0180925 1.981885e+02

4. Visualizing Town Level Reproductive Numbers

After summing up reproductive numbers to town level, the results mean the transitivity between towns. We can draw origin-destination (OD) map by following steps to visualize the transmission.

library(stplanr)
OD=reshape2::melt(covid_TOWNij,c("from","to"),value.name = "Tij")
OD=OD[OD$from!=OD$to,]
ODLine=od2line(OD,TPE_town,zone_code='TOWN')
ODLine=subset(ODLine,Tij>1) #select transmission more than 1 
ODLine$NAME=paste0(ODLine$from,"to",ODLine$to)
ODLine=ODLine[,c(4,1:3)] #change column order

Tii=data.frame(Tij=diag(covid_TOWNij),TOWN=rownames(covid_TOWNij))
TPE_town=left_join(TPE_town,Tii,by="TOWN")
TPE_town=TPE_town[,c(2,1,3:7)] #change column order
TPE_center=st_centroid(TPE_town)

tmap_mode("view")
tm_shape(TPE_town)+tm_polygons("COUNTY",palette="Blues",alpha=0.2,legend.show = F)+
  tm_shape(ODLine)+tm_lines(lwd="Tij",col ="Tij",legend.col.show = F,breaks = c(0:6)*10,scale = max(ODLine$Tij)/3)+
  tm_shape(TPE_center)+tm_symbols("Tij",col ="Tij",palette="Reds",legend.col.show = F)
[Back to top]