Improving forecast flood maps using earth observation data

In this article, Helen Hooker, a PhD researcher at the University of Reading, discusses how earth observation data can identify flooded areas. Helen has proposed a new method for validating flood maps using a scale-selective approach that enables a quantitative, location-specific measure of flood map accuracy.

In August 2022, devastating, widespread flooding in Pakistan affected 33 million people. This flood highlighted the importance of earth observation (EO) data for observing flood extent. Figure 1, for example, shows satellite images collected by the European Space Agency (ESA), from which an inundation area of 30% was estimated.

These flood maps give vital information to emergency response teams and aid operations. They also benefit flood forecasting agencies, who often use hydrometeorological models to predict the evolution of flooding extent. Flood forecasters can use observed flood maps derived from EO data to evaluate how well a forecasting system is performing and improve their predictions about how the flooding may evolve over time. EO data are particularly valuable over large areas where ground-based observations are limited in number, or are unavailable because the measurement infrastructure has been destroyed during a flood.

Figure 1: Observed flooding from optical (Sentinel-2, left panel) and Synthetic Aperture Radar (SAR, Sentinel-1, right panel) satellite data. Middle panel shows the Normalised Difference Water Index, which highlights areas of open water. Source: ESA.

Optical images from Sentinel-2 satellite data can be used to determine the flood extent by calculating the Normalised Difference Water Index (NDWI). Unfortunately, optical instruments need daylight and cannot see through cloud. But Synthetic Aperture Radar (SAR) sensors can detect flooding through cloud and at night, which means they are very useful for monitoring flood situations.
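
As a rough illustration of the NDWI calculation, the sketch below assumes the commonly used form of the index, (green - NIR) / (green + NIR), applied to reflectance arrays. For Sentinel-2 these would typically be band 3 (green) and band 8 (near infrared). The function name, the toy reflectance values and the zero threshold for water are illustrative assumptions, not details of the processing behind Figure 1.

```python
import numpy as np

def ndwi(green, nir):
    """NDWI = (green - nir) / (green + nir), set to 0 where the denominator is 0."""
    green = np.asarray(green, dtype=float)
    nir = np.asarray(nir, dtype=float)
    total = green + nir
    out = np.zeros_like(total)
    np.divide(green - nir, total, out=out, where=total != 0)
    return out

# Toy 2x2 reflectance values (hypothetical): open water reflects more
# green light than near-infrared light, so its NDWI is positive.
green = np.array([[0.10, 0.08], [0.05, 0.00]])
nir = np.array([[0.02, 0.30], [0.25, 0.00]])

index = ndwi(green, nir)
water_mask = index > 0.0  # assumed threshold; in practice it is tuned per scene
```

Here only the top-left pixel is classed as water. A real workflow would also need cloud masking, which this sketch leaves out.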

Measuring accuracy of forecast flood maps

Comparing forecast flood maps against observed, EO-derived flood maps is an important part of evaluating and improving model performance. For quantitative validation, skill scores can be calculated for an area of interest. The skill scores work by comparing each grid box or pixel in the modelled flood map with the corresponding pixel in the satellite-derived flood map. By applying a contingency table (a way of tabulating the frequency with which locations are correctly or incorrectly predicted), an overall measure of accuracy can be calculated.
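
The pixel-by-pixel comparison can be sketched as follows. This is a minimal example, assuming both maps are binary (flooded or dry) and already on the same grid. It uses the Critical Success Index (CSI), defined as hits / (hits + misses + false alarms), as the overall score. The function names and the small example maps are hypothetical.

```python
import numpy as np

def contingency_table(forecast, observed):
    """Count the four possible outcomes for two binary flood maps on one grid."""
    forecast = np.asarray(forecast, dtype=bool)
    observed = np.asarray(observed, dtype=bool)
    return {
        "hits": int(np.sum(forecast & observed)),            # flooded in both
        "misses": int(np.sum(~forecast & observed)),         # observed only
        "false_alarms": int(np.sum(forecast & ~observed)),   # forecast only
        "correct_negatives": int(np.sum(~forecast & ~observed)),
    }

def critical_success_index(table):
    """CSI = hits / (hits + misses + false alarms); NaN if nothing is flooded."""
    denom = table["hits"] + table["misses"] + table["false_alarms"]
    return table["hits"] / denom if denom else float("nan")

# Hypothetical 3x4 maps: 1 = flooded, 0 = dry.
observed = np.array([[1, 1, 0, 0],
                     [1, 1, 0, 0],
                     [1, 0, 0, 0]])
forecast = np.array([[1, 1, 1, 0],
                     [1, 0, 0, 0],
                     [1, 0, 0, 0]])

table = contingency_table(forecast, observed)
csi = critical_success_index(table)
```

For these toy maps there are 4 hits, 1 miss and 1 false alarm, giving a CSI of about 0.67. Correctly predicted dry cells do not enter the CSI, which is one reason it is popular for flood maps where most of the domain is dry.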

The skill scores will vary depending on the size of the flood, the size of the model grid box (spatial scale) and the accuracy of the forecast flood map. Figure 2 demonstrates how a smaller spatial grid scale in the data affects a commonly applied skill score, the Critical Success Index (CSI), also known as the Threat Score.

Where the forecast flood edge misses by one grid box (see purple arrow), the overall CSI value is much lower for the more detailed, 25-by-25 m grid scale, even though the distance between the forecast and observed inundation areas is smaller than in the coarser, 50-by-50 m mapping. This can make interpreting the skill score difficult, particularly when it is calculated as an average score across a region.

Figure 2: Impact of grid size or spatial scale on binary skill scores.

Using skilful scale to validate flood maps

To overcome these issues, we have recently published a paper in which we suggest validating the flood map using a scale-selective approach. This means computing a skilful scale (grid box size) rather than a skill score. The skilful scale can also be calculated for the flood edge location. This can then be interpreted as the average physical distance between the edge of the flooded areas observed in the EO map and the edges in the modelled flood map. We can also calculate a skilful scale at each location across the area to show locally how the flood map performs. The skilful scales can be plotted on a Categorical Scale Map (CSM), which colour-codes each location according to the degree of under- or over-prediction, as explained below.
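
The article does not spell out how a skilful scale is computed, so the sketch below is only one plausible reading. It assumes a neighbourhood approach based on the Fractions Skill Score (FSS) of Roberts and Lean (2008): the flooded fraction is computed in an n-by-n window around every cell of both maps, and the skilful scale is taken as the smallest window width at which the FSS reaches the commonly used target 0.5 + f_o / 2, where f_o is the observed flooded fraction of the domain. The function names, the treatment of cells outside the domain as dry, and the toy maps are all assumptions. The published paper should be consulted for the actual method.

```python
import numpy as np

def neighbourhood_fraction(mask, n):
    """Flooded fraction in the n-by-n window centred on each cell (n odd).
    Cells outside the domain are counted as dry (an assumption)."""
    r = n // 2
    padded = np.pad(np.asarray(mask, dtype=float), r)
    # Summed-area table with a leading row and column of zeros.
    s = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    s[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    window_sum = s[n:, n:] - s[:-n, n:] - s[n:, :-n] + s[:-n, :-n]
    return window_sum / (n * n)

def fss(forecast, observed, n):
    """Fractions Skill Score at neighbourhood width n (in grid cells)."""
    pf = neighbourhood_fraction(forecast, n)
    po = neighbourhood_fraction(observed, n)
    mse = np.mean((pf - po) ** 2)
    ref = np.mean(pf ** 2) + np.mean(po ** 2)
    return 1.0 - mse / ref if ref > 0 else float("nan")

def skilful_scale(forecast, observed, cell_size_m):
    """Smallest neighbourhood width (metres) with FSS >= 0.5 + f_o / 2,
    or None if no width up to the full domain reaches the target."""
    observed = np.asarray(observed, dtype=bool)
    target = 0.5 + observed.mean() / 2.0
    max_n = 2 * max(observed.shape) - 1  # window then spans the whole domain
    for n in range(1, max_n + 1, 2):
        if fss(forecast, observed, n) >= target:
            return n * cell_size_m
    return None

# Hypothetical 6x6 maps on a 25 m grid: the forecast flood strip is
# displaced by one cell relative to the observed strip.
observed = np.zeros((6, 6), dtype=bool)
observed[:, 2] = True
forecast = np.zeros((6, 6), dtype=bool)
forecast[:, 3] = True

scale_m = skilful_scale(forecast, observed, cell_size_m=25.0)
```

In this toy case the two strips never overlap, so the cell-by-cell comparison finds no agreement at all (FSS of 0 at a width of one cell). Widening the neighbourhood to three cells is enough to reach the target, so the sketch returns 75 m. Restricting the same calculation to cells near the flood edge, or to a window around a single location, would be one way to approach the edge-based and local skilful scales described above.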

These evaluation methods have been applied in a spatial flood forecasting project with the Start Network to establish Disaster Risk Financing schemes in northern Bangladesh. Figure 3 shows an example CSM. Correctly predicted flooding is shown in grey, under-prediction or ‘missed’ areas are red and over-prediction or ‘false alarm’ areas are blue. The shading relates to the skilful scale calculated for each grid cell. Lighter shades show a smaller skilful scale and a closer agreement. This evaluation led to updates in the flood forecasting system which improved the agreement between the forecast flood maps and the EO flood maps. This improvement means that the population impacted by flooding can be more accurately predicted for future flood events.


Figure 3: Categorical Scale Maps (CSM) for predictions of flood extent compared to EO observed flood extents, tested with data from Bangladesh in July 2020.

In summary, CSMs offer benefits beyond conventional binary performance measures because they show a quantitative, location-specific measure of flood map accuracy. This can be used to target specific model improvements, such as updating the ground height data (digital terrain model) used in the forecast or improving the calibration of parameters within the flood forecasting model that produces the flood map.


Helen Hooker, University of Reading:

Rob Lamb, JBA Trust:

John Bevington, JBA Consulting:


Hooker, H., Dance, S. L., Mason, D. C., Bevington, J., & Shelton, K. (2022). Spatial scale evaluation of forecast flood inundation maps. Journal of Hydrology, 128170.

Hooker, H., Dance, S. L., Mason, D. C., Bevington, J., & Shelton, K. (2022). A new skill score for ensemble flood maps: assessing spatial spread-skill with remote sensing observations. Natural Hazards and Earth System Sciences Discussions, 2022, 1–27.

