Overview
Acute malnutrition remains a critical public health concern, particularly in fragile and humanitarian settings, often requiring targeted interventions. Its spatial distribution is often uneven, with clusters of elevated burden. Understanding where these unusually high-burden areas exist is crucial for effective planning, resource allocation, early warning systems, and timely response to alleviate the plight of the affected population.
In this article, you will learn how to use the wowi R package to understand the spatial distribution of acute malnutrition in your data set. Before we start the walkthrough, it is important to have a few key concepts under your belt.
The spatial scan approach
To identify statistically significant spatial clusters, analysts frequently rely on SaTScan, a powerful tool for spatial and spatio-temporal scan statistics. SaTScan’s Bernoulli model is particularly suited for binary outcomes such as:
- “Case” = child is acutely malnourished
- “Control” = child is not acutely malnourished
It evaluates whether the observed clustering of cases exceeds what would be expected by chance (Martin Kulldorff, July 2022).
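As a rough illustration of the statistic behind this test, the sketch below computes the Bernoulli log likelihood ratio for one candidate circle in base R. The names `xlogxy()` and `bernoulli_llr()` are illustrative and are not part of wowi or rsatscan. The counts are taken from the two Kotido clusters shown in the text output later in this article. SaTScan also derives a p-value through Monte Carlo replication, which this sketch does not reproduce.

```r
# x * log(x / y), using the convention 0 * log(0) = 0
xlogxy <- function(x, y) ifelse(x == 0, 0, x * log(x / y))

# Log likelihood ratio for one candidate circle under the Bernoulli model.
# c_in:  cases inside the circle      n_in:  children inside the circle
# c_tot: cases in the whole data set  n_tot: children in the whole data set
bernoulli_llr <- function(c_in, n_in, c_tot, n_tot) {
  c_out <- c_tot - c_in
  n_out <- n_tot - n_in
  # Log likelihood when the rate inside differs from the rate outside
  alt <- xlogxy(c_in, n_in) + xlogxy(n_in - c_in, n_in) +
    xlogxy(c_out, n_out) + xlogxy(n_out - c_out, n_out)
  # Log likelihood when a single rate applies everywhere
  null <- xlogxy(c_tot, n_tot) + xlogxy(n_tot - c_tot, n_tot)
  alt - null
}

# Counts from the Kotido output: SaTScan reports 3.458213 and 3.013494
llr_1 <- bernoulli_llr(c_in = 6, n_in = 25, c_tot = 26, n_tot = 333)
llr_2 <- bernoulli_llr(c_in = 0, n_in = 35, c_tot = 26, n_tot = 333)
```

The circle with the largest ratio is the most likely cluster; the same formula covers both unusually high and unusually low rates.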
Using wowi to identify spatial clusters of high rates of acute malnutrition
The wowi R package provides a tidy, reproducible interface for running Bernoulli spatial scan analysis via SaTScan. It depends on the rsatscan R package, which enables the use of SaTScan software from within R. While rsatscan offers general-purpose functionality, wowi was specifically developed for acute malnutrition analysis, tailoring the tools to the needs of nutrition-focused spatial investigations.
What outputs does wowi provide?
The package returns the standard set of outputs generated by SaTScan:
- An interactive HTML map displaying the detected clusters. It opens in any web browser, where you can zoom in and out, click on the bubbles to see their statistics, and more.
- A text-based output file, with the extension .txt, containing the results.
- Additional GIS-based files (e.g., shapefiles), which can be useful for further geospatial manipulation or integration into other mapping workflows.
- Furthermore, it also returns a tidy data frame summarising the detected clusters, parsed from the text-based file described above. This summary enables easier manipulation and integration into an analyst’s downstream workflow.
Tip 💡
The tidy data frame described above offers an insight for IPC Acute Malnutrition users. It indicates whether the number of survey clusters or enumeration area IDs included in the detected cluster, and the total number of children, meet the IPC Acute Malnutrition criteria for disaggregated analysis, as recommended when the design effect exceeds 1.3 (IPC Global Partners 2021).
The results are presented in the ipc_amn column of the table and take one of two possible values:
- “yes”: the number of cluster IDs is ≥ 5 and the number of children with valid measurements is ≥ 100.
- “no”: the criteria above are not met.
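The rule above can be sketched in a few lines of base R. This is only an illustration of the two thresholds, not the code wowi uses internally. The function name `meets_ipc_amn()` is hypothetical, and the sketch assumes that the cluster IDs arrive as a comma-separated string and that the children count already reflects valid measurements.

```r
# Flag whether each detected cluster meets the two thresholds quoted above.
# location_ids: comma-separated cluster IDs, e.g. "10,9"
# children:     number of children in the detected cluster
meets_ipc_amn <- function(location_ids, children) {
  # Count how many IDs each string contains
  n_ids <- lengths(strsplit(location_ids, ",", fixed = TRUE))
  ifelse(n_ids >= 5 & children >= 100, "yes", "no")
}

# First entry mirrors a Kotido cluster; the second is made up for contrast
flags <- meets_ipc_amn(
  location_ids = c("10,9", "1,2,3,4,5"),
  children = c(25, 120)
)
```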
From data to clusters: the wowi analysis workflow
The analysis workflow with wowi begins with the standard anthropometric data processing steps: data wrangling and quality checks. For this purpose, wowi relies on the mwana package, which will be installed or updated automatically when you install wowi.
Further downstream in the analysis, the sf package is required to generate shapefile-related outputs. Note that sf is not installed with wowi, so you must install it separately.
Note ⓘ
wowi will not function unless the SaTScan software is installed on your machine. You can download it from: https://www.satscan.org. Once installed, copy the path to the SaTScan executable on your machine. This is needed for R to locate and run it.
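Before running anything, it can help to confirm both prerequisites from R. The helper below is a hypothetical convenience, not a wowi function: it reports whether sf is installed and whether the folder you believe contains SaTScan exists. The macOS path in the usage line is the one used later in this article; adjust it to your own installation.

```r
# Report whether the two external prerequisites appear to be in place.
# sslocation: folder that is expected to contain the SaTScan executable
check_prerequisites <- function(sslocation) {
  c(
    sf_installed = requireNamespace("sf", quietly = TRUE),
    satscan_dir_found = dir.exists(sslocation)
  )
}

# Example call with the macOS location quoted in this article
status <- check_prerequisites("/Applications/SaTScan.app/Contents/app")
```

If `sf_installed` is `FALSE`, `install.packages("sf")` installs it from CRAN.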
We will now begin the demonstration using the package’s built-in dataset, anthro.
head(anthro)
#> # A tibble: 6 × 11
#> district cluster sex age weight height oedema muac y x precision
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Kotido 15 2 48.9 15.9 108. n 128 2.93 34.1 8
#> 2 Kaabong 161 1 43.8 15 106. n 128 3.65 34.1 4
#> 3 Kaabong 161 1 37.0 13.6 91.9 n 149 NA NA NA
#> 4 Kaabong 161 1 NA 7 100 n 150 NA NA NA
#> 5 Kaabong 160 1 22.5 8.6 74.2 n 119 3.65 34.1 4
#> 6 Kaabong 160 1 32.1 13.4 88.1 n 163 NA NA NA
This is a SMART survey dataset, which includes geographical coordinates: the x and y columns, along with the corresponding precision recorded in the precision column.
Note ⓘ
Geographical coordinates are indispensable for spatial cluster detection — without precise location data, spatial analysis simply cannot be performed.
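A quick way to see how many records lack coordinates is sketched below. The toy data frame only mimics the first rows of anthro shown above, and the article does not state how wowi itself treats rows with missing coordinates, so treat this as a pre-check rather than a description of the package.

```r
# Toy data mirroring the first six rows of anthro (coordinates only)
toy <- data.frame(
  cluster = c(15, 161, 161, 161, 160, 160),
  x = c(34.1, 34.1, NA, NA, 34.1, NA),
  y = c(2.93, 3.65, NA, NA, 3.65, NA)
)

# A record is usable for spatial work only if both coordinates are present
has_coords <- !is.na(toy$x) & !is.na(toy$y)

n_missing <- sum(!has_coords)      # records without a usable location
with_coords <- toy[has_coords, ]   # records that can be placed on a map
```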
Data wrangling
You can scan for spatial clusters using acute malnutrition case definitions based on weight-for-height z-scores and/or oedema, mid-upper arm circumference (MUAC) and/or oedema, or a combined definition. The data wrangling workflow should be configured accordingly. In this article, we will demonstrate the first of these: weight-for-height z-scores and/or oedema.
library(mwana)

## Compute z-scores, define wasting, and rename the coordinate columns ----
a <- anthro |>
  mw_wrangle_wfhz(
    sex = sex,
    weight = weight,
    height = height,
    .recode_sex = FALSE
  ) |>
  define_wasting(
    zscores = wfhz,
    edema = oedema,
    .by = "zscores"
  ) |>
  dplyr::rename(
    longitude = x,
    latitude = y
  )
#> ================================================================================
At this point, you can check the quality of the data; the mwana package documentation explains how to do so.
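To make the case definition used in the wrangling step concrete, here is a minimal base R sketch. It assumes the usual definition of global acute malnutrition, that is a weight-for-height z-score below -2 and/or oedema. The values are invented, and the code is not how mwana implements `define_wasting()`.

```r
# Invented z-scores and oedema status for four children
toy <- data.frame(
  wfhz = c(-2.5, -1.2, -0.4, -3.1),
  oedema = c("n", "n", "y", "n")
)

# 1 = case (acutely malnourished), 0 = control (not acutely malnourished)
toy$gam <- as.integer(toy$wfhz < -2 | toy$oedema == "y")
```

The resulting 1/0 column is the kind of binary outcome the Bernoulli model works with.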
Running the spatial scan
This is handled by the ww_run_satscan() function. Read its full documentation by typing ?ww_run_satscan in your console.
You can run the analysis on either a single-area dataset or a multiple-area dataset. We will begin with the former, followed by the latter.
Single-area data set
## Keep a single district from the data set -----
d <- subset(a, district == "Kotido")

## Load wowi and rsatscan ----
library(wowi)
library(rsatscan) #<1>

## Set up the parameters ----
## `directory` must already hold the path to your output folder
results <- ww_run_satscan(
  .data = d,
  filename = "Kotido",
  dir = directory, #<2>
  sslocation = "/Applications/SaTScan.app/Contents/app", #<3>
  ssbatchfilename = "satscan", #<4>
  satscan_version = "10.3.2",
  .by_area = FALSE,
  .scan_for = "high-low-rates",
  .gam_based = "wfhz",
  area = NULL,
  cleanup = TRUE,
  verbose = FALSE
)
1. Although you will never use the rsatscan package explicitly, you must make it available in your environment. This is because it creates a specific environment object, ssenv, required for wowi to function. If this is not loaded, wowi will not run.
2. Specify the folder where you would like the output results to be saved for your analysis.
3. Provide the path to your SaTScan GUI installation. This varies depending on your operating system (OS). For macOS, it is typically “/Applications/SaTScan.app/Contents/app”; for Windows, “C:/Program Files/SaTScan”.
4. Indicate the name of the SaTScan batch file. For macOS, use “satscan”; for Windows, use “SaTScanBatch64”.
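The operating-system-specific values in notes 3 and 4 can be wrapped in a small helper so that the same script runs on both platforms. `satscan_defaults()` is a hypothetical function, not part of wowi or rsatscan, and it only knows the two typical locations quoted above; any other system or a custom install needs the values set by hand.

```r
# Return the typical SaTScan location and batch file name for an OS.
# os: value of Sys.info()[["sysname"]], e.g. "Darwin" (macOS) or "Windows"
satscan_defaults <- function(os = Sys.info()[["sysname"]]) {
  switch(os,
    Darwin = list(
      sslocation = "/Applications/SaTScan.app/Contents/app",
      ssbatchfilename = "satscan"
    ),
    Windows = list(
      sslocation = "C:/Program Files/SaTScan",
      ssbatchfilename = "SaTScanBatch64"
    ),
    # No typical location is given in this article for other systems
    stop("No default known for this OS; set the paths manually.")
  )
}

win <- satscan_defaults("Windows")
```

The two elements can then be passed to the `sslocation` and `ssbatchfilename` arguments of `ww_run_satscan()`.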
Getting acquainted with the files
The code above will generate nine files in the folder you specified using the dir argument. These files will share a common base name, as defined by the string provided in the filename argument.
#> [1] "Kotido.cas" "Kotido.clustermap.html" "Kotido.col.prj"
#> [4] "Kotido.ctl" "Kotido.geo" "Kotido.gis.prj"
#> [7] "Kotido.gis.shp" "Kotido.gis.shx" "Kotido.prm"
- The .cas, .ctl and .geo files are SaTScan input datasets representing cases, controls and geographical coordinates, respectively.
- The .clustermap.html file is an interactive HTML visualisation with an embedded Google Map displaying the detected clusters.
- The .prm file stores the SaTScan parameter settings used for the run.
- All other files are related to shapefiles and can be used for further GIS analysis and manipulation.
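If you want to locate these files programmatically, base R is enough. The sketch below recreates the file names listed above as empty placeholders in a temporary folder, so it runs without SaTScan; with real output you would point `out_dir` at the folder passed to `dir` instead.

```r
# Create empty placeholder files named like the real outputs
out_dir <- file.path(tempdir(), "wowi-demo")
dir.create(out_dir, showWarnings = FALSE)
demo_files <- c(
  "Kotido.cas", "Kotido.clustermap.html", "Kotido.col.prj",
  "Kotido.ctl", "Kotido.geo", "Kotido.gis.prj",
  "Kotido.gis.shp", "Kotido.gis.shx", "Kotido.prm"
)
invisible(file.create(file.path(out_dir, demo_files)))

# Everything that shares the base name given in `filename`
found <- list.files(out_dir, pattern = "^Kotido\\.")

# Pick out the SaTScan input files and the interactive map
inputs <- grep("\\.(cas|ctl|geo)$", found, value = TRUE)
map_file <- grep("\\.clustermap\\.html$", found, value = TRUE)

# With real output, utils::browseURL(file.path(out_dir, map_file))
# would open the map in your default browser.
```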
Furthermore, we can view the results in a text-based format. To do this, we access the object to which we assigned the results of ww_run_satscan(). In our demo, this object is called results.
To see the text-based results, we use results$.txt. As this file is quite long, for the purposes of this demo we will only display the first 48 rows.
#> [1] " _____________________________"
#> [2] ""
#> [3] " SaTScan v10.3.2"
#> [4] " _____________________________"
#> [5] ""
#> [6] "results"
#> [7] ""
#> [8] "Program run on: Sat Aug 2 16:20:17 2025"
#> [9] ""
#> [10] "Purely Spatial analysis"
#> [11] "scanning for clusters with high or low rates"
#> [12] "using the Bernoulli model."
#> [13] "_______________________________________________________________________________________________"
#> [14] ""
#> [15] "SUMMARY OF DATA"
#> [16] ""
#> [17] "Study period.......................: 2025/08/02 to 2025/08/02"
#> [18] "Number of locations................: 36"
#> [19] "Total population...................: 333"
#> [20] "Total number of cases..............: 26"
#> [21] "Percent cases......................: 7.8"
#> [22] "_______________________________________________________________________________________________"
#> [23] ""
#> [24] "CLUSTERS DETECTED"
#> [25] ""
#> [26] "1.Location IDs included.: 10, 9"
#> [27] " Coordinates / radius..: (34.113909 N, 3.087933 E) / 1.20 km"
#> [28] " Span..................: 1.20 km"
#> [29] " Population............: 25"
#> [30] " Number of cases.......: 6"
#> [31] " Expected cases........: 1.95"
#> [32] " Observed / expected...: 3.07"
#> [33] " Relative risk.........: 3.70"
#> [34] " Percent cases in area.: 24.0"
#> [35] " Log likelihood ratio..: 3.458213"
#> [36] " P-value...............: 0.55"
#> [37] ""
#> [38] "2.Location IDs included.: 2, 3, 1"
#> [39] " Coordinates / radius..: (33.930353 N, 3.172303 E) / 4.00 km"
#> [40] " Span..................: 5.88 km"
#> [41] " Population............: 35"
#> [42] " Number of cases.......: 0"
#> [43] " Expected cases........: 2.73"
#> [44] " Observed / expected...: 0"
#> [45] " Relative risk.........: 0"
#> [46] " Percent cases in area.: 0"
#> [47] " Log likelihood ratio..: 3.013494"
#> [48] " P-value...............: 0.68"
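Several of the figures reported for each cluster follow directly from four counts. As a worked check, the snippet below recomputes them for the first cluster from the totals in the summary section. Variable names are illustrative.

```r
# Counts read from the output above
n_tot <- 333   # total population
c_tot <- 26    # total number of cases
n_in <- 25     # population inside cluster 1
c_in <- 6      # cases inside cluster 1

# Expected cases if the overall rate applied inside the cluster
expected <- n_in * c_tot / n_tot

# Observed / expected
obs_exp <- c_in / expected

# Relative risk: rate inside divided by rate outside
rel_risk <- (c_in / n_in) / ((c_tot - c_in) / (n_tot - n_in))

# Percent cases in area
pct_in_area <- 100 * c_in / n_in

recomputed <- round(
  c(expected = expected, obs_exp = obs_exp,
    rel_risk = rel_risk, pct_in_area = pct_in_area),
  2
)
```

Rounded to two decimals these give 1.95, 3.07, 3.70 and 24, matching the values SaTScan prints for cluster 1.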
This file isn’t very convenient if we want to perform further manipulation. Don’t worry: ww_run_satscan() has got you covered ☂️. It parses the text-based results into a tidy data frame that you can easily work with 😎. To access it, simply subset the results object with results$.df:
#> # A tibble: 4 × 18
#> survey_area nr_EAs total_children total_cases `%_cases` location_ids geo
#> <chr> <int> <int> <int> <dbl> <chr> <chr>
#> 1 Kotido 36 333 26 7 10,9 34.11390…
#> 2 Kotido 36 333 26 7 2,3,1 33.93035…
#> 3 Kotido 36 333 26 7 15 34.11670…
#> 4 Kotido 36 333 26 7 28,30 34.05993…
#> # ℹ 11 more variables: radius <chr>, span <chr>, children <int>, n_cases <int>,
#> # expected_cases <dbl>, observedExpected <dbl>, relative_risk <dbl>,
#> # `%_cases_in_area` <dbl>, log_lik_ratio <dbl>, pvalue <dbl>, ipc_amn <chr>
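Once the results are in a data frame, ordinary subsetting applies. The sketch below rebuilds only the two clusters whose statistics appear in the text output, using three of the column names shown above, and then filters them. The 0.05 threshold is a conventional choice assumed here, not a value set by wowi.

```r
# Two Kotido clusters, with values copied from the text output
clusters <- data.frame(
  location_ids = c("10,9", "2,3,1"),
  relative_risk = c(3.70, 0),
  pvalue = c(0.55, 0.68)
)

alpha <- 0.05  # assumed significance threshold

# Clusters unlikely to have arisen by chance
significant <- clusters[clusters$pvalue < alpha, ]

# Clusters with a higher rate inside than outside
high_rate <- clusters[clusters$relative_risk > 1, ]
```

With real results, the same two lines work on `results$.df` directly. In this example neither of the two clusters reaches the 5% threshold.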
Multiple-area data set
We now use a dataset containing multiple areas. In our example dataset, there are several districts, so we want to run the spatial scan on a district-wise basis. To do this, we set the filename argument to NULL, then set .by_area = TRUE, and finally use the area argument to specify the column in our dataset that stores the areas, in this case district.
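Conceptually, a district-wise run is a split-apply-combine operation: the data are split by area, the same analysis is applied to each piece, and the results are stacked. The base R sketch below shows that pattern with invented data and a simple count standing in for the scan. The article does not show wowi's internals, so this is an analogy rather than its implementation.

```r
# Invented data: one row per child, 1 = case, 0 = control
toy <- data.frame(
  district = c("Kotido", "Kotido", "Abim", "Abim", "Abim"),
  gam = c(1, 0, 0, 1, 0)
)

# Split into one data frame per district
by_area <- split(toy, toy$district)

# Apply the same step to each district, then stack the results
summary_by_area <- do.call(rbind, lapply(names(by_area), function(area) {
  d <- by_area[[area]]
  data.frame(area = area, children = nrow(d), cases = sum(d$gam))
}))
```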
For this demo, we will keep just two districts.
## Keep two districts -----
ds <- subset(a, district %in% c("Kotido", "Abim"))

## Set up the parameters ----
multiple_area <- ww_run_satscan(
  .data = ds,
  filename = NULL,
  dir = directory,
  sslocation = "/Applications/SaTScan.app/Contents/app",
  ssbatchfilename = "satscan",
  satscan_version = "10.3.2",
  .by_area = TRUE,
  .scan_for = "high-low-rates",
  .gam_based = "wfhz",
  area = district,
  cleanup = TRUE,
  verbose = FALSE
)
This returns the following outputs:
#> [1] "Abim.cas" "Abim.clustermap.html" "Abim.col.prj"
#> [4] "Abim.ctl" "Abim.geo" "Abim.gis.prj"
#> [7] "Abim.gis.shp" "Abim.gis.shx" "Abim.prm"
#> [10] "Kotido.cas" "Kotido.clustermap.html" "Kotido.col.prj"
#> [13] "Kotido.ctl" "Kotido.geo" "Kotido.gis.prj"
#> [16] "Kotido.gis.shp" "Kotido.gis.shx" "Kotido.prm"
We can also obtain the tidy data frame:
#> # A tibble: 8 × 19
#> survey_area nr_EAs total_children total_cases `%_cases` location_ids geo
#> <chr> <int> <int> <int> <dbl> <chr> <chr>
#> 1 Kotido 36 333 26 7 10,9 34.1…
#> 2 Kotido 36 333 26 7 2,3,1 33.9…
#> 3 Kotido 36 333 26 7 15 34.1…
#> 4 Kotido 36 333 26 7 28,30 34.0…
#> 5 Abim 31 223 15 6 111,116,117,112… 33.6…
#> 6 Abim 31 223 15 6 135,136,137,133 33.9…
#> 7 Abim 31 223 15 6 127,129,128,130 33.8…
#> 8 Abim 31 223 15 6 126,143,131 33.7…
#> # ℹ 12 more variables: radius <chr>, span <chr>, children <int>, n_cases <int>,
#> # expected_cases <dbl>, observedExpected <dbl>, relative_risk <dbl>,
#> # `%_cases_in_area` <dbl>, log_lik_ratio <dbl>, pvalue <dbl>, ipc_amn <chr>,
#> # area <chr>