- Introduction
- Loading packages
- Importing Data
- Merging the Data
- Analyzing Data for a Specific Prefecture and Year
- Visualizing the Data
- Trends Over Time
- Linear Regression Analysis
- Comparing Two Prefectures
- Dispatches by Day of the Week
- Creating the Heat Index
- Visualizing Heat Index and Ambulance Dispatches
- Linear Regression with Heat Index
Introduction
In this encounter, we will explore the relationship between temperature and heatstroke-related ambulance dispatches using data from Japan. The data contains daily records of maximum temperature and relative humidity across all 47 prefectures from 2015 to 2019.
Loading packages
We will start by loading the tidyverse and rio packages. Ensure these packages are installed before running the code.
# install.packages("tidyverse")
# install.packages("rio")
library(tidyverse)
library(rio)
Importing Data
We will import two data sets: one for heatstroke-related ambulance dispatches (HSAD) and one for temperature data.
hsad <- import("https://github.com/ucrdatacenter/projects/raw/refs/heads/main/SCIBIOM303/2025h1/data/HSAD.csv")
temp <- import("https://github.com/ucrdatacenter/projects/raw/refs/heads/main/SCIBIOM303/2025h1/data/temperature.csv")
Merging the Data
To analyze the relationship between temperature and heatstroke-related ambulance dispatches, we will merge the two data sets using the left_join() function. We will join them on the columns Date and Prefecture. Next, we convert the Date column into a date format.
merged <- left_join(temp, hsad, by = c("Date", "Prefecture")) |>
mutate(Date = dmy(Date))
Analyzing Data for a Specific Prefecture and Year
We will focus on Hiroshima in 2015 to study the relationship between daily maximum temperature and heatstroke-related ambulance dispatches.
hiroshima_2015 <- merged |>
filter(Prefecture == "Hiroshima") |>
filter(Year == 2015)
Visualizing the Data
We will create a scatter plot to visualize the relationship between maximum temperature and heatstroke-related ambulance dispatches. A trend line is added to show the overall pattern.
ggplot(hiroshima_2015, aes(x = Tempmax, y = HSAD)) +
geom_point() +
geom_smooth(se = FALSE) +
labs(title = "Temperature and Heatstroke-related Ambulance Dispatches in Hiroshima in 2015",
x = "Maximum Temperature",
y = "Heatstroke-related Ambulance Dispatches") +
theme_minimal()
The number of heatstroke-related ambulance dispatches increases significantly when the temperature exceeds 30°C. This trend aligns with what we expect because higher temperatures can cause a rapid rise in core body temperature, leading to heatstroke (source).
Trends Over Time
We will create another plot to show how maximum temperature changes over time in Hiroshima in 2015.
ggplot(hiroshima_2015, aes(x = Date, y = Tempmax)) +
geom_point() +
geom_smooth(se = FALSE) +
labs(title = "Temperature Over Time in Hiroshima in 2015",
x = "Date",
y = "Maximum Temperature") +
theme_minimal()
The temperature rises in June, peaks in August (above 35°C), and then decreases in October. This seasonal pattern is typical for Hiroshima (source).
Linear Regression Analysis
To quantify the relationship between temperature and heatstroke-related ambulance dispatches, we perform a linear regression analysis.
lr <- lm(HSAD ~ Tempmax, data = hiroshima_2015)
summary(lr)
##
## Call:
## lm(formula = HSAD ~ Tempmax, data = hiroshima_2015)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.947 -5.163 -1.816 3.845 27.511
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -80.4723 6.5532 -12.28 <2e-16 ***
## Tempmax 3.1151 0.2248 13.86 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.083 on 118 degrees of freedom
## Multiple R-squared: 0.6193, Adjusted R-squared: 0.6161
## F-statistic: 192 on 1 and 118 DF, p-value: < 2.2e-16
The R-squared value of 0.6193 indicates that about 61.93% of the variability in ambulance dispatches can be explained by the maximum temperature. This suggests a strong positive relationship between HSAD and temperature.
Comparing Two Prefectures
Next, we will compare the relationship between temperature and ambulance dispatches in Hiroshima and Kyoto in 2015 using a line graph.
merged |>
filter(Prefecture == "Kyoto"| Prefecture == "Hiroshima") |>
filter(Year == 2015) |>
ggplot() +
geom_smooth(aes(x = Tempmax, y = HSAD, color = Prefecture), se = FALSE) +
labs(title = "Temperature and Heatstroke-related Ambulance Dispatches (2015)",
x = "Maximum Temperature",
y = "Heatstroke-related Ambulance Dispatches") +
theme_minimal()
Both prefectures show similar trends, with the number of dispatches increasing as temperature rises (source).
Dispatches by Day of the Week
We can also analyze the average number of dispatches by the day of the week.
hiroshima_2015 |>
group_by(Dow) |>
summarize(HSAD = mean(HSAD)) |>
ggplot(aes(x = Dow, y = HSAD)) +
geom_col() +
labs(title = "Heatstroke-related Ambulance Dispatches by Day of the Week (2015)",
x = "Day of the week",
y = "Heatstroke-related Ambulance Dispatches") +
theme_minimal()
There are more dispatches on days 1 and 7, which correspond to Sunday and Saturday, respectively. This is expected, as heatstroke cases are more frequent on weekends, especially on Sundays when people engage in outdoor or sporting activities (source).
Creating the Heat Index
The heat index, also known as the “feels-like” temperature, is calculated using temperature and humidity to reflect how hot it feels to the human body. This formula was developed by Robert G. Steadman in 1979 and later adapted by the National Weather Service. It accounts for the reduced ability of the body to cool itself through sweating in high-humidity conditions, making it a crucial indicator for heat-related health risks (source).
We first convert the temperature to Fahrenheit to calculate the heat index, then convert it back to Celsius.
hiroshima_lr_data <- hiroshima_2015 %>%
mutate(
Temp_F = Tempmax * 9 / 5 + 32,
Heat_Index_F = -42.379 +
2.04901523 * Temp_F +
10.14333127 * Rhumave -
0.22475541 * Temp_F * Rhumave -
0.00683783 * Temp_F^2 -
0.05481717 * Rhumave^2 +
0.00122874 * Temp_F^2 * Rhumave +
0.00085282 * Temp_F * Rhumave^2 -
0.00000199 * Temp_F^2 * Rhumave^2,
Heat_Index_C = (Heat_Index_F - 32) * 5 / 9)
Visualizing Heat Index and Ambulance Dispatches
We will create a plot to show the relationship between the heat index and ambulance dispatches.
ggplot(hiroshima_lr_data, aes(x = Heat_Index_C, y = HSAD)) +
geom_point() +
geom_smooth(se = FALSE) +
labs(title = "Heat Index and Heatstroke-related Ambulance Dispatches in Hiroshima in 2015",
x = "Heat Index",
y = "Heatstroke-related Ambulance Dispatches") +
theme_minimal()
The number of dispatches rises sharply after the heat index exceeds 30°C, confirming that the heat index is a strong predictor of heat-related emergencies (source).
Linear Regression with Heat Index
Finally, we perform a linear regression analysis using the heat index to predict ambulance dispatches.
lr2 <- lm(HSAD ~ Heat_Index_C, data = hiroshima_lr_data)
summary(lr2)
##
## Call:
## lm(formula = HSAD ~ Heat_Index_C, data = hiroshima_lr_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.4816 -4.0284 -0.9406 3.2193 21.7201
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -45.3048 3.2060 -14.13 <2e-16 ***
## Heat_Index_C 1.6585 0.0947 17.51 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.905 on 118 degrees of freedom
## Multiple R-squared: 0.7222, Adjusted R-squared: 0.7198
## F-statistic: 306.8 on 1 and 118 DF, p-value: < 2.2e-16
The R-squared value is 0.7222, meaning that 72.22% of the variability in dispatches can be explained by the heat index. This suggests the heat index is an even stronger predictor than temperature alone.