## Abstract

The catastrophic surge event of 1953 on the eastern UK and northern European coastlines led to widespread agreement on the necessity of a coordinated response to understand the risk of future oceanographic flood events and, so far as possible, to afford protection against such events. One element of this response was better use of historical data and scientific knowledge in assessing flood risk. The timing of the event also coincided roughly with the birth of extreme value theory as a statistical discipline for measuring risks of extreme events, and over the last 50 years, as techniques have been developed and refined, various attempts have been made to improve the precision of flood risk assessment around the UK coastline. In part, this article provides a review of such developments. Our broader aim, however, is to show how modern statistical modelling techniques, allied with the tools of extreme value theory and knowledge of sea-dynamic physics, can lead to further improvements in flood risk assessment. Our long-term goal is a coherent spatial model that exploits spatial smoothness in the surge process characteristics and we outline the details of such a model. The analysis of the present article, however, is restricted to a site-by-site analysis of high-tide surges. Nonetheless, we argue that the Bayesian methodology adopted for such analysis enables a risk-based interpretation of results that is most natural in this setting, and preferable to inferences that are available from more conventional analyses.

## 1. Introduction

The flood events of 1953 brought into sharp focus the urgent need for a combined scientific effort to quantify the flood risks from future events. What actions should be taken in light of such risk assessment is an ongoing debate, but a consensus has been maintained as to the importance of obtaining maps of flood risk that are as accurate as possible. Inevitably, the nature of such risk events means that some form of analysis based on the likelihood of future extreme sea surges is central to flood contingency planning.

At the time of the 1953 flood event, extreme value analysis was in its infancy as a statistical discipline. The basic mathematical limit results on which the subject rests had been known for some time, but little had been done in terms of converting these theoretical results into working statistical models. This was about to change, however, as later in the same decade E. J. Gumbel and A. F. Jenkinson laid the foundations for modern extreme value theory. At the heart of this theory is the following well-rehearsed argument. Let *X*_{1}, *X*_{2}, … be a series of independent and identically distributed random variables and let *M*_{n}=max{*X*_{1}, …, *X*_{n}}. Then, based on detailed limiting arguments as *n*→∞, and under quite broad conditions, the distribution function of *M*_{n} can be approximated by a member of the generalized extreme value (GEV) family of distribution functions

$$G(z) = \exp\left\{-\left[1 + \xi\left(\frac{z-\mu}{\sigma}\right)\right]_{+}^{-1/\xi}\right\}, \tag{1.1}$$

where *x*_{+}=max(*x*, 0) and the parameters *μ*, *σ* (>0) and *ξ* depend on the parent distribution of the *X*_{i}. The case *ξ*=0 is interpreted by taking the limit *ξ*→0 in equation (1.1). It is useful to note that when *ξ*<0 this definition implies an upper bound on the distribution of *M*_{n}, whereas when *ξ*≥0 the distribution upper limit is infinite.
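
The GEV distribution function in equation (1.1) is straightforward to evaluate numerically. The following sketch handles the Gumbel limit *ξ*=0 and the bounded-support cases explicitly; the function name and interface are our own illustrative choices, not part of the original studies.

```python
import math

def gev_cdf(z, mu, sigma, xi):
    """Distribution function G(z) of the GEV family, equation (1.1)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    s = (z - mu) / sigma
    if xi == 0.0:
        # Gumbel limit obtained by letting xi -> 0 in (1.1)
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0.0:
        # Outside the support: below the lower end point (xi > 0)
        # or above the upper end point (xi < 0)
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))
```

Note that for *ξ*<0 the function returns 1 for any *z* above the finite upper end point *μ*−*σ*/*ξ*, in line with the bounded-support remark above.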

A typical application of equation (1.1) is to assume its validity for a variable such as an annual maximum sea-level, *Z* say, and to estimate the parameters (*μ*, *σ*, *ξ*) of the model by fitting to an observed series of such annual maxima. Results are often displayed and interpreted in terms of extreme quantiles of this distribution. In standard jargon the quantity *z*_{p}, satisfying

$$\Pr\{Z > z_p\} = p, \tag{1.2}$$

is termed the 1/*p* year return level. By inversion of equation (1.1),

$$z_p = \mu - \frac{\sigma}{\xi}\left[1 - \{-\log(1-p)\}^{-\xi}\right]. \tag{1.3}$$

On average, the level *z*_{p} is expected to be exceeded by the annual maximum sea-level one year in every 1/*p* years, so, for example, *z*_{0.02} is the level expected to be exceeded one year in every 50 years. A more precise interpretation is that *z*_{p} is the level which will be exceeded in any one year with probability *p*. Substituting parameters with their estimates in equation (1.3) and plotting *z*_{p} against −log{−log(1−*p*)}, or equivalently *z*_{p} against *r*_{p}={−log(1−*p*)}^{−1} on a logarithmic scale, leads to a so-called return level plot. For small *p*, *r*_{p}≈1/*p*, so that *r*_{p} is essentially the return period. The return level plot has the effect of emphasizing the distributional tail for easy examination of model fit and extrapolation, and is linear in the limiting case of *ξ*=0. Confidence intervals can be added to the plot based on either an asymptotic or Monte Carlo approximation of the sampling variation of the return level estimates.
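
Return level curves follow directly by inverting the fitted model as in equation (1.3). The sketch below is a minimal implementation; the function name is an illustrative choice.

```python
import math

def return_level(p, mu, sigma, xi):
    """1/p year return level z_p, obtained by inverting the GEV
    distribution function as in equation (1.3)."""
    y = -math.log(1.0 - p)          # y_p = -log(1 - p); r_p = 1 / y_p
    if xi == 0.0:
        # Gumbel limit: z_p = mu - sigma * log(y_p)
        return mu - sigma * math.log(y)
    return mu - (sigma / xi) * (1.0 - y ** (-xi))
```

Plotting `return_level(p, ...)` against −log(*y*_{p}) over a grid of small *p* values reproduces the return level plot described above.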

The importance of taking an objective and scientifically credible view of extreme risk assessment cannot be overstated, and Gumbel, Jenkinson and others should truly be recognized for the pioneers they were. However, there are two lines of objection, each of which points to limitations in the details of the GEV analysis explained above for the analysis of extreme sea-levels at a single site. The first is that the sea-level process has complex dynamics which make the naive assumptions leading to the GEV family for annual maxima unrealistic. In particular, the sea-level is a complex mix of part stochastic (storm-driven) and part deterministic (tide-driven) components. Moreover, the surge component is seasonal in its behaviour and there is the possibility of tide-surge interaction in regions of shallow water. Thus, both the hourly sea-level and surge series violate the usual assumptions of independence and, more critically, time-homogeneity in distribution. Even without these complications, however, the second objection is that on statistical grounds alone the GEV analysis is wanting. The main argument here is that sea-level data are collected on a regular basis, generally hourly, while the GEV analysis utilises only the annual maximum data; all other data are discarded from the analysis. Consequently, other extreme sea-levels, some of which are likely to have been more extreme than annual maxima from other years that are included in the analysis, do not contribute to the inference. Whatever the mathematical justification for the GEV model, this suggests a statistical inefficiency in the use of available information.

The aim of this article is partly review: to set out the various developments in extreme value modelling that have taken place over the last 50 years and that have led to improvements in the way that risk assessment of sea-level flooding is carried out. Some of these developments relate to the theoretical characterization of extremal processes, while others exploit the specific characteristics of sea-level dynamics. In combination they provide a comprehensive suite of techniques that have the potential for much more precise risk assessment calculations than are achievable by a standard GEV annual maximum analysis. These arguments are set out in §2. Our further aim is to provide a new analysis of the England east coast sea-level data. In particular, we use Bayesian techniques to model the surge process at 11 sites along this coastline. There has been a recent resurgence in the use of Bayesian techniques within the general statistical community, and there are good reasons for setting extreme value risk assessments within a Bayesian framework, though there have been relatively few published works that do so. In §3 we develop the argument, focusing in particular on the importance of predictive inference in extreme value risk assessment and the facility a Bayesian framework provides for making such inferences. In §4 we illustrate how the Bayesian approach can be extended to handle more complex characteristics such as seasonality.

In §5 we consider spatial modelling. The analysis of data from a single site, however it is performed, ignores the natural spatial cohesion of the sea-level process. Options for exploiting this aspect include the use of hydrodynamical or spatial statistical models. Hydrodynamical models are basically transfer models that convert observed meteorological space-time-series into sea-level series on a spatial grid using established hydrostatic laws. Though they are remarkably accurate for many purposes, they tend to perform less reliably at high levels, leaving their utility for extreme value analyses questionable. Spatial statistical models, in contrast, have the potential to be adapted specifically to the unique features of an extreme value analysis. The potential advantages of a spatial model for extreme sea-levels are that information can transfer between sites leading to potentially improved inferences, and model estimates can also be extrapolated to intermediate sites. There are practical complications, however, as data records are temporally incomplete and the spatial coverage is poor. We examined these aspects in a previous invited paper to the Royal Society (Coles & Tawn 1990), but the model we proposed was limited in various ways. Now, due to powerful recent developments in computational techniques for statistical inference, large and complex statistical models can be built with relative ease. We have ideas on how to exploit these techniques for extreme surge risk assessment, and although our work on this aspect is at an early stage, in §5 we make some comments on the model structure and its potential for the inclusion of information from hydrodynamical model output.

## 2. Historical review

In this section we give an overview of the major developments in extreme value modelling since the 1953 flood that have led to improved methods of risk assessment. Some of these improvements are specific to sea-level modelling, deriving from a better understanding of sea-level dynamics. Others are more general, relating to improvements in the characterization and statistical inference of extremes of wide classes of stochastic processes.

As already explained, the foundations for extreme value modelling as a credible statistical discipline appropriate for extreme risk measurement were laid in the 1950s; Gumbel (1958) remains an essential monograph. Based on these ideas, Lennon (1963) and Suthons (1963) mapped return levels of the sea-level process around different parts of the UK coastline. These studies were revolutionary in providing systematic spatial maps of flood risk measures, but were limited both by data availability and the range of statistical models and inference techniques that were then available. The next major study of UK sea-levels was made by Graff (1981), who produced a detailed comprehensive mapping of return levels of the sea-level around the entire UK coastline using data obtained by systematic collection of archived historical data and by exploiting the expansion in the number of tide-gauges that had been installed following the 1953 event. These results revealed patterns of spatial smoothness in extreme risk characteristics, though the possibility of interpolating results to other sites was hindered by the fact that measures of estimation precision were not obtained. As in the earlier studies, the accuracy of inferences was also limited by the restriction to annual maxima data. This was not so much due to problems of data availability, but because of a limitation in the statistical models and techniques for modelling extremes available at that time.

In terms of the theory of extremes, the most relevant developments were in terms of the characterization of extreme events. Two contrasting models were developed: the first based on a limiting approximation to the distribution of events exceeding a high threshold (Pickands 1971, 1975), the second based on the limiting joint distribution of the several largest observations in a sequence, rather than just the single largest (Smith 1986; Tawn 1988*a*,*b*). Both of these characterizations enable an inference on extremes that exploits more of the relevant available information than the annual maxima. In a landmark move, the hydrological community adopted threshold-based analyses as standard working practice in the mid 1970s (NERC 1975). Also in the 1970s and 1980s, powerful theoretical results were derived that greatly enhanced the credibility of extreme value modelling for real data processes. In broad terms it was shown that the assumption of independence in the observations *X*_{1}, *X*_{2}, … for the GEV to be appropriate as the approximating distribution of *M*_{n} is far from necessary. Provided long-range dependence is weak in so far as it affects extreme levels of the process, the GEV approximation is likely to remain valid. Full mathematical details of these results can be found in Leadbetter *et al*. (1983).

The theoretical developments in extreme value characterization were complemented by advances in statistical techniques that enabled improved inferences to be drawn from such models when fitted to data. The studies by Prescott & Walden (1980, 1983) represent important milestones. They explored the possibility of using likelihood-based techniques in preference to the more naive methods of parameter estimation that had previously been used, so enabling extreme value modellers to draw from the wealth of results and techniques that had been developed more widely in the statistical community. In particular, off-the-shelf measures of estimation precision were immediately available, together with the opportunity of extending models to allow for non-stationary effects such as trends (Walden & Prescott 1983). Many of the remaining mathematical details concerning the validity of standard likelihood theory results for extreme value models were explained by Smith (1985). Techniques based on threshold-type models, but allowing covariate effects to handle non-stationarity, were developed by Smith (1989) and Davison & Smith (1990). Case studies on oceanographic datasets include Tawn (1988*a*) and de Haan (1990) for the UK and Dutch coastlines, respectively.

A further recent development of considerable importance has been the study of multivariate extreme value models. Information is at a premium in any extreme value analysis, and multivariate models have the advantage that they enable a sharing of information between sites. Statistical modelling techniques based on the multivariate analogues of the block maxima and threshold methods were set out by Tawn (1988*b*) and Coles & Tawn (1991), respectively. These developments led to a series of attempts, with increasing levels of refinement, to produce a map of spatial risk due to extreme sea-levels around the UK coastline (Coles & Tawn 1990; Dixon & Tawn 1992; Tawn 1993). The main features of these models, based in each case on the annual maxima data collated by Graff (1981), were: smoothly varying model parameters, allowance for trends, spatial dependence in extreme events, identification of the role of tidal distribution in extreme sea-level distribution and a study of the impact of climate change. The information transfer through such a model enables, in particular, much more precise estimation at sites with short data records than would otherwise have been available, and a greater reliability in the estimated effect of sea-level rise, after allowance for the known effects of land movements. Spatial analysis also facilitates the identification of outliers and enables a correction for biased sampling periods (Barão & Tawn 1999). The modelling issues related to both multivariate and spatial models for extremes are discussed by Coles (2001). Other advances include techniques for smoothing extreme value models and for the characterization of dependence in extreme value models, while the free availability of relevant software has made accessible the statistical modelling of extremes to the wider scientific community.

Returning to the specific context of sea-level extremes, the joint probability method developed by Pugh & Vassie (1979, 1980) deserves particular mention. Pugh & Vassie were aware of the inefficiencies in applying standard extreme value techniques to sea-levels as a consequence of the process's decomposition into deterministic tide and stochastic surge components. Their proposal comprised a separate analysis of the two components, followed by a convolution to obtain the probability distribution of the sum. This was revolutionary, and although the original version suffered a number of deficiencies—extreme surges were modelled empirically and temporal dependence was ignored—later refinements by Tawn & Vassie (1989, 1990) and Tawn (1992) resulted in an approach that combined the idea of process decomposition with the latest techniques for efficient modelling of extremes of stationary processes. These refinements led to a resolution of many of the apparent inconsistencies that were obtained by comparing results from the original joint probability method with those of conventional extreme value analyses (e.g. Alcock & Blackman 1985).

Many of the more recent improvements for the specific modelling of sea-level extremes have stemmed from a MAFF-funded project jointly undertaken by research teams at the Proudman Oceanographic Laboratory and Lancaster University. This project was broadly aimed at improving techniques for the spatial mapping of extreme value parameters of sea-levels around the UK coastline by embedding the revised joint probability method within a spatial modelling framework. There were three major outputs from this study. First, Dixon & Tawn (1999) showed that extreme value analyses based directly on sea-levels, rather than on a decomposed tide and surge series, tend to underestimate return levels for return periods of 20 years or more when surge variation is small in comparison with variation in high water levels. This effect is negligible for analyses on the east coast of the UK, but substantial for the west coast. Second, Dixon *et al*. (1998) applied a weighted local spatial smoother to extreme surge characteristics, including measures of tide-surge interaction, to improve estimation within the revised joint probability method and to map estimates to other locations along the UK east coast. This enabled estimation of risk measures at all locations for which tidal characteristics were available. Finally, Dixon & Tawn (1997) improved the spatial interpolation of surge characteristics by calibration with estimates generated from fine-grid hydrodynamical data (Flather 1987).

## 3. Bayesian analysis of extreme surges

Our review in the previous section is not quite complete. An essential further development has been the gradual introduction of Bayesian techniques. In different areas of statistics the Bayesian paradigm for inference is adopted for different reasons. Within an extreme value setting we see two major advantages for adopting a Bayesian framework: first, it provides an algebra for the management of uncertainties which, in particular, leads to probabilistic representations for return level estimates that account for model estimation uncertainty; second, it enables modular construction of complex models that would otherwise be intractable for inference. In §5 we discuss some work in progress that exploits the second of these opportunities to improve on the spatial models discussed in §2. In this section we limit ourselves to site-by-site Bayesian analyses of surge data.

The fundamental components of a Bayesian analysis are a prior distribution *f*(*θ*) and a likelihood *f*(*x*|*θ*), where *x* denotes historical data. Bayes' theorem provides the mechanism for balancing these two sources of information in a coherent way, leading to the posterior distribution *f*(*θ*|*x*). Simply,

$$f(\theta \mid x) \propto f(\theta)\, f(x \mid \theta). \tag{3.1}$$

See O'Hagan (1994) for example. In our setting, *x* will be some subset of an observed dataset that contains all of the relevant information on extreme events and *f*(*x*|*θ*) is some version of an extreme value model, where *θ*=(*μ*, *σ*, *ξ*) are the parameters of the annual maximum GEV distribution. The technical complication in a Bayesian analysis is the evaluation of the proportionality factor in equation (3.1), since this requires integration over the space of the parameter *θ*. Though this can be tackled in various ways, it has been the recent advances in simulation-based techniques, most notably the methodology of Markov chain Monte Carlo, that have revolutionized the subject, enabling the statistical inference of models that would otherwise be intractable. The output of a Markov chain Monte Carlo algorithm is a simulated series *θ*_{1}, …, *θ*_{m} which, though not independent, is approximately stationary with marginal distribution equal to the target posterior distribution *f*(*θ*|*x*). Consequently, empirical summaries of the {*θ*_{i}} series provide approximate posterior inferences. See Gilks *et al*. (1996) for a much more detailed explanation.
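
The mechanics can be illustrated with a simple random-walk Metropolis algorithm targeting the posterior in equation (3.1) for an annual-maximum GEV likelihood. This is a minimal sketch: the step sizes, starting values and the near-flat normal prior are illustrative assumptions, not the settings of our actual analysis.

```python
import numpy as np

def gev_loglik(theta, z):
    """Log-likelihood of annual maxima z under a GEV with
    theta = (mu, log sigma, xi); returns -inf outside the support."""
    mu, logsig, xi = theta
    sigma = np.exp(logsig)
    s = (z - mu) / sigma
    if abs(xi) < 1e-8:                        # Gumbel limit of the GEV
        return -z.size * logsig - np.sum(s + np.exp(-s))
    t = 1.0 + xi * s
    if np.any(t <= 0):
        return -np.inf
    return (-z.size * logsig
            - (1.0 + 1.0 / xi) * np.sum(np.log(t))
            - np.sum(t ** (-1.0 / xi)))

def log_prior(theta):
    """Illustrative proper, near-flat prior: independent zero-mean
    normals with large variances (an assumption of this sketch)."""
    return -0.5 * np.sum(theta ** 2 / np.array([1e4, 1e4, 1e2]))

def metropolis(z, n_iter=5000, step=0.05, seed=0):
    """Random-walk Metropolis sampler targeting the posterior (3.1)."""
    rng = np.random.default_rng(seed)
    theta = np.array([np.mean(z), np.log(np.std(z)), 0.1])
    lp = gev_loglik(theta, z) + log_prior(theta)
    out = np.empty((n_iter, 3))
    for i in range(n_iter):
        prop = theta + rng.normal(scale=step, size=3)
        lp_prop = gev_loglik(prop, z) + log_prior(prop)
        if np.log(rng.uniform()) < lp_prop - lp:   # Metropolis accept step
            theta, lp = prop, lp_prop
        out[i] = theta
    return out
```

Empirical summaries of the returned series (after discarding an initial burn-in) then approximate posterior means, medians and credibility intervals.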

The first serious attempt at a Bayesian extreme value analysis was by Smith & Naylor (1987). Subsequently, Coles & Powell (1996) and Coles & Tawn (1996) showed how Markov chain Monte Carlo algorithms could be applied to obtain a Bayesian inference in an extreme value setting.

In this section we set out the advantages of adopting a Bayesian inference of extreme sea-levels, particularly with reference to return levels. We develop the arguments with a novel application, modelling extreme surges at each of 11 sites on the eastern UK coastline, from Wick in the north of Scotland to Dover on the southeast coast of England. At each of these sites, hourly records of tide and sea-levels are available for some subset of the period 1960–1991. Though data are also available post-1991, we restrict attention to the earlier period to enable comparisons with previous results. As discussed in §2, it may be inappropriate to fit extreme value models directly to sea-levels because of the deterministic effect of the tidal component. On the other hand, tide-surge interaction complicates the interpretation of models fitted directly to surges. To offset these difficulties our analysis is based on the series of surges that occur at high tide and of which there are 705 events annually. The effects are twofold. First, temporal dependence in the series is reduced, since our observations are approximately twice-daily rather than hourly; second, the impact of surge-tide interaction is eliminated, with analysis based on the most damaging surge events, even if they are not the most extreme.

To maximize the information potential in the data we use a point process version of the threshold exceedance model of Smith (1989) discussed in §2. To avoid technicalities we omit giving full details of the model, which is based on a limiting Poisson process result for threshold exceedance behaviour. This model is now standard within the extreme value literature; for example, see Coles (2001, ch. 7) for a complete description. The model is based on a characterization for the pattern of exceedances of a high threshold, and has the advantage that the natural parametrization of the model is in terms of the GEV parameters of the corresponding annual maximum distribution. In the context of Bayes' theorem, equation (3.1), the model provides a likelihood *f*(*x*|*θ*), for which the relevant data *x* are more extensive than those of a classical annual maximum analysis. The use of the extra information has the scope to reduce estimation uncertainty, while the retention of *θ*=(*μ*, *σ*, *ξ*) as the model parameters leads to easy interpretability of results.
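
The point process likelihood itself is standard (see Coles 2001, ch. 7); a sketch of its negative log version, parametrized directly in the annual-maximum GEV parameters, is given below. The function name and interface are our own illustrative choices.

```python
import numpy as np

def pp_negloglik(theta, exc, u, n_years):
    """Negative log-likelihood of the point process threshold model,
    with theta = (mu, sigma, xi) the annual-maximum GEV parameters,
    exc the observed exceedances of the threshold u, and n_years the
    number of years of data (cf. Coles 2001, ch. 7)."""
    mu, sigma, xi = theta
    if sigma <= 0:
        return np.inf
    if abs(xi) < 1e-8:                       # Gumbel limit
        rate = n_years * np.exp(-(u - mu) / sigma)
        return rate + exc.size * np.log(sigma) + np.sum((exc - mu) / sigma)
    su = 1.0 + xi * (u - mu) / sigma
    sx = 1.0 + xi * (exc - mu) / sigma
    if su <= 0 or np.any(sx <= 0):
        return np.inf
    rate = n_years * su ** (-1.0 / xi)       # expected number of exceedances
    return rate + exc.size * np.log(sigma) + (1.0 + 1.0 / xi) * np.sum(np.log(sx))
```

Minimizing this function gives maximum likelihood estimates, while adding a log prior and negating gives the log posterior used in a Bayesian analysis of the same model.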

Subsequently, we apply the point process model to each of the 11 sites, but for exposition it is convenient to focus first on a single site. The high-tide surge series at Lowestoft is shown in figure 1. Two features are especially evident: first, a block of missing data in 1984 (and also in 1991); second, an apparent seasonality in the series. The missing period of data is dropped from the analysis and, temporarily, we also ignore the seasonality, though this is an issue we return to later. We are left with around 20 years of data, a record length that would be extensive for many statistical applications, but which is relatively short for extreme value analysis. This highlights the importance of accruing information through a threshold exceedance model rather than relying on an annual maximum analysis. The data record limitations are also partly offset by the fact that sea-level extrapolations are obtained by a convolution of the tidal and surge distributions, so that shorter extrapolation of the surge process is needed compared with what might be required in a standard extreme value analysis of this process. The problem could be offset further by the development of a fully spatial model, enabling a pooling of information across different locations. This is likely to be especially helpful when individual sites have blocks of missing data.

Threshold choice is a thorny issue that amounts to a bias-variance trade-off. The asymptotic basis of the point process model implies a lower bias with high thresholds, but very high thresholds imply few exceedances and, therefore, large sampling variability. Conversely, low thresholds generate many exceedances but bias is incurred by using a model that has only an asymptotic justification at low levels. In practice, it is usual to use as low a threshold as possible for which the exceedances seem consistent with the point process limit model. The mean residual life plot (Davison & Smith 1990; Smith 2004) is a useful diagnostic to identify such a threshold, though sensitivity of results to threshold choice should also be checked. Mean residual life plots of the high-tide surge data suggested we could reasonably use a constant threshold exceedance rate of 0.018 at each site. At Lowestoft, for example, this criterion led to a threshold of 0.6 m, of which there were 274 exceedances in the available series.
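The mean residual life diagnostic amounts to plotting the average excess over a range of candidate thresholds; approximate linearity in the threshold above some level *u*_{0} supports using *u*_{0} in the model. A minimal sketch, with illustrative names:

```python
import numpy as np

def mean_residual_life(x, thresholds):
    """Mean excess of x over each threshold u: average of (x - u)
    over the observations with x > u."""
    x = np.asarray(x)
    out = []
    for u in thresholds:
        exceedances = x[x > u] - u
        out.append(exceedances.mean() if exceedances.size else np.nan)
    return np.array(out)
```

For exponentially distributed data the mean excess is constant in the threshold (the memoryless property), which provides a simple sanity check on the implementation.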

The principal objection to a Bayesian analysis is that results may be sensitive to the specific choice of prior distribution *f*(*θ*), which at best is chosen subjectively and at worst arbitrarily. In our experience, however, Bayesian analyses of extremes are remarkably robust across a wide range of non-informative priors, that is, proper distributions with large variance. For each of our analyses of extreme surges we have used independent normal priors with large variances,

$$f(\theta) = f(\mu)\,f(\log\sigma)\,f(\xi), \qquad \mu \sim N(0, v_{\mu}),\quad \log\sigma \sim N(0, v_{\sigma}),\quad \xi \sim N(0, v_{\xi}). \tag{3.2}$$

Inference proceeds by calculation of the posterior distribution in equation (3.1) via a suitable Markov chain Monte Carlo algorithm. All of our results are virtually unchanged under order of magnitude changes to the variance specifications in equation (3.2), confirming the robustness to prior specification. In figure 2 we show the marginal posterior distributions for *μ*, *σ*, *ξ* and also, by transformation, the 50 year return level for the high-tide surge series at Lowestoft. Two features are especially notable: the relatively large variance in the posterior for *ξ* and the heavy skew in the distribution of the 50 year return level. The former is a consequence of the limited amount of data, while the latter reflects the greater uncertainty within the model for establishing upper limits of the return levels relative to lower limits.

The Bayesian calculation of a probability distribution for the parameter vector *θ* contrasts with classical analyses in which it is usual to calculate a point estimate, perhaps complemented by standard errors or confidence intervals. For example, the maximum likelihood estimator is the value of *θ* that maximizes the likelihood *f*(*x*|*θ*) with respect to *θ*. If a single estimate is required from the Bayesian inference, the mean, median or mode of the posterior distribution can be used. In particular, if the mode is used and the prior is flat as in equation (3.2) then the Bayesian estimate will be virtually coincident with the maximum likelihood estimate. It follows that in the absence of external information with which to construct an informative prior distribution—though there are situations where this can be achieved; for example, see Coles & Tawn (1996)—the distinctions between a classical and a Bayesian point estimator are slight, both conceptually and numerically.

The real distinction between classical and Bayesian analyses comes in the management of uncertainty. In any statistical analysis, measures of variability are generally as important as estimates themselves. This is especially true in an extreme value analysis since, for example, estimates of extreme quantiles have an unavoidably large sampling variability, which it is important to be honest about, and to take account of, in risk assessment. One possibility is to add confidence intervals to standard point estimates of, for example, return levels. However, there are difficulties of interpretation in such a presentation of results. We contend that a Bayesian framework is more natural for handling uncertainties in extreme value models, partly because inferences on model parameters are presented explicitly as a probability distribution, and partly because it is natural to recast the issue of return levels in predictive terms of a future annual maximum, *Z*. The predictive distribution function is defined by

$$\Pr\{Z \le z \mid x\} = \int G(z;\theta)\, f(\theta \mid x)\, \mathrm{d}\theta, \tag{3.3}$$

thereby averaging the GEV distribution for *Z* across the posterior distribution for the model parameters *θ*=(*μ*, *σ*, *ξ*). Accordingly, both model and process variability are incorporated, the former through the posterior term *f*(*θ*|*x*), and the latter through the conditional annual maximum GEV distribution function *G*(*z*; *θ*). By analogy with equation (1.2) we now define the quantity $\tilde{z}_p$, which satisfies

$$\Pr\{Z \le \tilde{z}_p \mid x\} = 1 - p, \tag{3.4}$$

as the predictive return level corresponding to a return period of 1/*p* years. Now the terminology of return level and return period is even looser, but it is useful in order to maintain comparison with the standard analogue. However, it is more precise to regard *z*_{p} and $\tilde{z}_p$ as appropriate quantiles of the estimative and predictive versions of the annual maximum distribution, respectively. Our contention is that the predictive return level has an easier interpretation for decision makers.
Compare, for example, the following two representative statements: (i) our best estimate of the level that will be exceeded once every 50 years is 1.4 m, and a 95% confidence interval for this level is [1.1, 1.7] and (ii) given all the available information, there is a probability *p*=0.02 that next year's maximum surge will exceed 1.55 m. The numbers here are fictitious, but it seems clear to us that the latter formulation, which corresponds to the predictive return level, provides the clearer information.
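
The predictive return level in equation (3.4) is straightforward to approximate from MCMC output: replace the integral in (3.3) by an average of *G*(*z*; *θ*) over the posterior samples *θ*_{1}, …, *θ*_{m}, and solve for *z* by bisection. A sketch, with illustrative names and search bounds:

```python
import math

def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function G(z; theta), equation (1.1)."""
    s = (z - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0.0:
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def predictive_return_level(p, samples, lo=-10.0, hi=10.0, tol=1e-8):
    """Predictive 1/p year return level: the z solving (3.4), with the
    integral in (3.3) approximated by an average of G(z; theta) over
    posterior samples theta_1, ..., theta_m."""
    def pred_cdf(z):
        return sum(gev_cdf(z, *th) for th in samples) / len(samples)
    while hi - lo > tol:            # bisection on the monotone pred_cdf
        mid = 0.5 * (lo + hi)
        if pred_cdf(mid) < 1.0 - p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

With a degenerate "posterior" of a single sample the predictive and standard return levels coincide; genuine posterior spread pushes the predictive level upwards.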

The contrast between standard and predictive return level estimates for the high-tide surge process for Lowestoft can be seen in figure 3. The predictive return levels are uniformly greater as they include a component of variability due to parameter uncertainty. However, at low levels there is little difference between the two, and both are in reasonable agreement with the empirical data. On extrapolation the parameter uncertainty has an increasingly large effect on return level calculation, and the differences between the two become exaggerated. These differences appear slighter once confidence intervals of the standard estimates are considered. Taking the view that the predictive return level is the correct way to combine the uncertainties due to estimation variability and process variability, the confidence intervals of the maximum likelihood estimator seem to provide a poor proxy for this calculation. In this particular analysis, designing to the upper 95% confidence interval limit would lead to a considerably over-conservative estimate if, for example, an annual flood risk of *p*=0.001 were countenanced.

The use of confidence intervals for handling uncertainty looks even more unreliable when the entire set of 11 sites is considered. The results can be viewed in various ways. We start with figure 4, which summarizes the posterior distribution of each parameter, site-by-site, by the median and central 95% credibility interval—a central region of 95% probability of the associated posterior distribution. After allowance for estimation uncertainty, there is a degree of spatial smoothness in each of the parameters. In particular, the location parameter *μ* grows steadily down the coast as far as Lowestoft. Around Harwich there is a dip, and then a decline down to Dover. The scale parameter *σ* behaves similarly, while the shape parameter *ξ* stays close to zero throughout.

Figure 5 shows predictive and standard return level estimates by site, with 95% confidence intervals in the latter case. Both sets of estimates are reasonably constant from Wick down to North Shields. Moving further south, levels rise consistently as far as Lowestoft, followed by a gradual decline as far as Dover and an additional drop around Harwich. These results are broadly consistent with previous spatial mappings of extreme surge behaviour that take no account of the tide: to the north of Immingham results are almost identical, but they are notably lower further south due to tide-surge interaction, which limits the simultaneous occurrence of large surges and high tides in this region. The comparison between predictive and standard return level estimates is generally more complicated than the Lowestoft analysis suggested. The predictive estimates are again uniformly greater than the standard estimates as a consequence of the parameter uncertainty that they account for. However, the differences are sometimes slight, but sometimes large, especially for a return period of 1000 years. Indeed, the predictive estimates for some sites lie outside of the 95% confidence interval of the standard estimate. This reaffirms the difficulty in the interpretation of standard estimates and their confidence intervals.

Our conclusions for this section are that, being probability based, Bayesian inference provides the most coherent and informative method for estimating extreme value behaviour. The computations are viable and lead to point estimates that are generally comparable with maximum likelihood estimates. Framing return level questions in predictive terms is the most natural approach for decision making, while alternative approaches based on confidence intervals of classical return level estimates can be misleading because they lack a formal probabilistic interpretation. Software to carry out analyses of this type is now freely available (Stephenson 2002). Bayesian models are also proving useful for tackling extreme value problems of greater complexity. For example, Casson & Coles (1999) propose a Bayesian analysis of spatial extremes, while Smith & Goodman (2000), in an analysis subsequently extended by Bottolo *et al*. (2001), use a Bayesian hierarchical model formulation to combine information across risk types in an analysis of extreme insurance losses. It seems likely, moreover, that the opportunities opened up by a Bayesian formulation of extreme value problems will lead to many more developments in the near future.

## 4. Seasonality

We have already commented that a weakness of the threshold model adopted in §3 is the assumption of stationarity in the surge process. This assumption has also been the norm in previous analyses of extremes, though it seems clearly at odds with the annual cycle evident in simple plots of the data, such as figure 1. Likelihood-based methods, including Bayesian inference, offer the opportunity to incorporate aspects such as seasonality into the model, to assess formally the importance of such components and to calculate corresponding risk measures.

One way to account for an annual cycle is the following. Assume that on day *t* there are parameters *μ*(*t*), *σ*(*t*) and *ξ*(*t*) such that, if the process were homogeneous through time, the annual maximum distribution would be GEV with parameters (*μ*(*t*), *σ*(*t*), *ξ*(*t*)). Now assume that each of *μ*(*t*), *σ*(*t*) and *ξ*(*t*) is a periodic function of *t*, with a period of 1 year. In our modelling we have taken

*μ*(*t*) = *μ*_{0} + *μ*_{1} cos{2*π*(*t*−*μ*_{2})/365.25},  (4.1)

*σ*(*t*) = exp[*σ*_{0} + *σ*_{1} cos{2*π*(*t*−*σ*_{2})/365.25}],  (4.2)

*ξ*(*t*) = *ξ*,  (4.3)

corresponding to a single harmonic in *μ*(*t*) and *σ*(*t*), with amplitudes *μ*_{1} and *σ*_{1} and phases *μ*_{2} and *σ*_{2}. The exponential transform in equation (4.2) ensures that *σ*(*t*) is positive for all parameter values and time points. In principle, the function *ξ*(*t*) could also be made harmonic, though it is rare that data have sufficient structure for such a form to be identifiable in practice. It is straightforward to extend the point process threshold excess model to this seasonal structure, leading to a likelihood function of the same form as in §3, but with the constant parameters replaced by (*μ*(*t*), *σ*(*t*), *ξ*(*t*)) evaluated at each observation time. However, the issue of threshold specification is even more complicated in this case. In our analysis we adopt a time-variable threshold, determined via pre-analysis smoothing, to give an approximately constant threshold exceedance rate. Figure 6 plots the Lowestoft data in order of within-year occurrence, ignoring year of occurrence. Superimposed is the time-varying threshold, which we have determined to have a near-uniform crossing rate; this seems a sensible criterion bearing in mind the bias-variance trade-off.
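Under a single-harmonic parametrization of the kind described (amplitude and phase for each of *μ*(*t*) and *σ*(*t*), constant *ξ*, with 365.25 days taken as the period; the parameter names and period here are illustrative assumptions), the seasonal parameter functions can be coded as:

```python
import numpy as np

PERIOD = 365.25  # days in a year, taken as the harmonic period (assumption)

def mu_t(t, mu0, mu1, mu2):
    # Location: single harmonic with amplitude mu1 and phase mu2 (in days)
    return mu0 + mu1 * np.cos(2.0 * np.pi * (t - mu2) / PERIOD)

def sigma_t(t, s0, s1, s2):
    # Scale: the exponential transform keeps sigma(t) > 0 for any parameters
    return np.exp(s0 + s1 * np.cos(2.0 * np.pi * (t - s2) / PERIOD))

def xi_t(t, xi):
    # Shape held constant: a harmonic form is rarely identifiable in practice
    return np.full_like(np.asarray(t, dtype=float), xi)
```

This adds four parameters (two amplitudes, two phases) relative to the stationary model.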

We continue to present results for Lowestoft, for which the results are typical of those obtained at other locations. The posterior marginal distributions for *μ*_{1}, *μ*_{2}, *σ*_{1} and *σ*_{2} are shown in figure 7. To simplify interpretation, parameters *μ*_{2} and *σ*_{2} have been rescaled to daily units, with day 0 corresponding to the boundary between 31 December and 1 January. The posterior distributions for each of *μ*_{1} and *σ*_{1} are concentrated away from zero, implying that the evidence for a seasonal effect in both *μ*(*t*) and *σ*(*t*) is strong. The strength of evidence for the seasonal model is confirmed further by a comparison of maximum log-likelihoods: 603.3 for the seasonal model compared with 498.2 for the stationary model, a difference that would be overwhelmingly significant by any standard log-likelihood ratio test for the validity of a model with four additional parameters. Finally, considering *μ*_{2} and *σ*_{2}, both *μ*(*t*) and *σ*(*t*) are likely to peak in late December or January, though the peak for *σ*(*t*) could plausibly extend into February.
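The model comparison quoted above is elementary to verify: the deviance is 2(603.3 − 498.2) = 210.2 on four degrees of freedom, far beyond any conventional chi-squared critical value.

```python
from scipy.stats import chi2

# Maximum log-likelihoods quoted in the text
ll_seasonal, ll_stationary = 603.3, 498.2

# Likelihood-ratio (deviance) statistic; the seasonal model has four extra
# parameters, so the reference distribution is chi-squared with 4 df
deviance = 2.0 * (ll_seasonal - ll_stationary)
p_value = chi2.sf(deviance, df=4)
```

The resulting *p*-value is vanishingly small, confirming the "overwhelmingly significant" description.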

The seasonal model can again be judged in terms of return levels. Assuming independence of extreme surges at high tide, predictive return levels can be calculated exactly as in equation (3.3), but now with

Pr(*Z*≤*z*_{p} | *ϕ*) = ∏_{j=1}^{n} *G*(*z*_{p}; *θ*_{j})^{1/n},  (4.4)

where *n* is the number of high tides in a year and *G*(*z*_{p}; *θ*_{j}) is the distribution function of the GEV distribution, with *θ*_{j}=(*μ*_{j}, *σ*_{j}, *ξ*_{j}) being the parameter vector for high-tide surge *j* determined from *ϕ* according to equations (4.1)–(4.3). As before, calculation of the integral in equation (3.3) is facilitated by averaging equation (4.4) across the Markov chain Monte Carlo output, which now consists of a series {*ϕ*_{i}} whose marginal distribution corresponds to the posterior distribution of *ϕ*. A comparison of the predictive return levels for the Lowestoft data with the corresponding standard and predictive estimates for the non-seasonal model is made in figure 8. In this case, the predictive return levels are slightly increased relative to the previous analysis, emphasizing the importance of taking the possible presence of seasonality into account in design considerations.
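A sketch of the seasonal predictive calculation, under two illustrative assumptions: a year contains roughly 706 high tides, and each tide *j* contributes a factor *G*(*z*; *θ*_{j})^{1/n} to the annual maximum distribution, consistent with the parameters being calibrated to the annual maximum scale. The toy parameter arrays stand in for values derived from MCMC draws of *ϕ* via the harmonic forms of §4.

```python
import numpy as np
from scipy.stats import genextreme

N_TIDES = 706  # assumed number of high tides in a year (roughly two per day)

def annual_cdf(z, theta):
    # Pr(annual maximum <= z) for one posterior draw.  theta has shape
    # (N_TIDES, 3), holding (mu_j, sigma_j, xi_j) for each high tide j.
    # Each tide contributes G(z; theta_j)**(1/n), where G is the GEV cdf
    # (scipy shape convention c = -xi), so the product is a geometric mean.
    mu, sigma, xi = theta.T
    log_g = genextreme.logcdf(z, c=-xi, loc=mu, scale=sigma)
    return np.exp(log_g.sum() / N_TIDES)

def predictive_exceedance(z, thetas):
    # Average the annual cdf over the MCMC draws (here represented directly
    # by the derived per-tide parameter arrays) and take the complement.
    return 1.0 - np.mean([annual_cdf(z, th) for th in thetas])

# Toy stand-in for the MCMC output: 20 draws of a seasonal parameter cycle
rng = np.random.default_rng(2)
days = np.arange(N_TIDES) * 365.25 / N_TIDES
thetas = [
    np.column_stack([
        1.0 + rng.normal(0.3, 0.02) * np.cos(2 * np.pi * days / 365.25),
        np.full(N_TIDES, 0.2),
        np.full(N_TIDES, rng.normal(0.05, 0.02)),
    ])
    for _ in range(20)
]

p_low, p_high = predictive_exceedance(1.5, thetas), predictive_exceedance(3.0, thetas)
```

Inverting `predictive_exceedance` in *z*, as in the non-seasonal case, yields the seasonal predictive return levels of figure 8.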

## 5. Spatial modelling

We conclude with some remarks about the utility and viability of an integrated spatial model for the high-tide surge series, and how such a model might be combined with outputs from hydrodynamical models to improve spatial interpolation.

As mentioned in §2, Coles & Tawn (1990) suggested a spatially Markov model for annual maximum sea-levels, with marginal parameters having a simple regression-type structure, modified to account for tidal effects, and joint distributions of successive pairs of sites having the bivariate extreme value logistic distribution (see, for example, Coles 2001). The benefits of a spatial model include the ability to interpolate and improved inferential precision due to information transfer, especially for sites with a scarcity of data. However, the original model has a number of limitations: data are restricted to annual maximum sea-levels; there is a model inconsistency when data are absent; the assumption of simple polynomial forms for the spatial variation in parameters is unrealistically restrictive; and uncertainty in the dependence structure is not accounted for in uncertainty measures for marginal characteristics.

We have quite precise ideas on how to overcome these limitations, but our work is at an early stage and it is not yet clear whether the complexity of the model structure we have in mind is computationally feasible. In substance, the model will remain spatially Markov, but will apply to all large surge events at the times of high tides rather than to annual maxima. This dependence structure is motivated by exploratory findings of Ancona-Navarrete & Tawn (2003) and is easily facilitated by a switch to the point process representation discussed in §3. To simplify likelihood calculations, which become complicated in the presence of missing data, a latent variable representation of the bivariate logistic distribution can be exploited within a Markov chain Monte Carlo setup for the model (Tawn 1990). Spatial smoothing of model parameters can also be incorporated as part of the modelling via a range of local smoothing approaches that have been developed for extreme values (Davison & Ramesh 1998; Hall & Tajvidi 2000). We propose, in particular, to use dynamic representations of non-Gaussian filters, estimating marginal and dependence parameters simultaneously so that all sources of parameter estimation uncertainty are accounted for in predictive return level calculations.
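For reference, the bivariate logistic distribution has, in its standard unit-Fréchet parametrization, distribution function G(x, y) = exp{−(x^{−1/α} + y^{−1/α})^α} with dependence parameter 0 < α ≤ 1, where α = 1 gives independence and α → 0 complete dependence. A minimal sketch:

```python
import numpy as np

def logistic_bev_cdf(x, y, alpha):
    # Bivariate extreme-value logistic distribution function with unit
    # Frechet margins; 0 < alpha <= 1.  alpha = 1 gives independence,
    # alpha -> 0 gives complete dependence.
    v = (x ** (-1.0 / alpha) + y ** (-1.0 / alpha)) ** alpha
    return np.exp(-v)
```

In the proposed model this form links successive pairs of sites, with α controlling the strength of spatial dependence.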

Once fully implemented, the proposed spatial model will overcome many of the deficiencies of the Coles & Tawn (1990) model. A residual difficulty, however, is that of resolution. Changes in the surge process with spatial location occur for a variety of reasons: for example, due to variations in bathymetry, coastal exposure or climate pattern. Some of these features vary only slightly, even over large distances, while others vary erratically over short distances. This implies that naive spatial smoothing may have some benefit in terms of information-transfer, but is likely to be partially confounded by short-scale and irregular process change. These notions were implicit in the Coles & Tawn (1990) model as spatial smoothing was applied only to the scale and shape parameters, which have the greatest role in extrapolation, while allowing the location parameter to absorb short-scale spatial variation in the surge process. A similar strategy can also be adopted with the Bayesian smoothing model, but interpolating results to other sites remains a difficulty because of the effect of irregular and erratic process change. To resolve this difficulty we intend to exploit hydrodynamical models, which generally respond well to local bathymetric characteristics. The difficulty is obtaining a calibration of such models with process extremes. Dixon & Tawn (1997) proposed a calibration of this type, applied separately for each of the extreme value parameters of the surge process, finding the calibration errors to change smoothly over the data sites. Focusing on high-tide surges is likely to further simplify this calibration process. Moreover, integration of the spatial smoothing and hydrodynamical models will ensure that calibration uncertainty is accounted for when interpolating predictive return levels to sites with no data.

As a final comment, we mention two additional benefits of a Bayesian spatial model that we see as having wider relevance. First, such models can immediately be extended to include a range of climate trend scenarios as potential covariates. Moreover, uncertainties in adopting climate scenarios that are consistent with observed data can be absorbed and reflected in the Bayesian inference. Second, through its ability to interpolate, the model provides a natural basis for calculating the coastal flood loss distribution estimates that are typically required by the insurance industry. Such applications are described more fully by Muir-Wood *et al*. (2005).

## Acknowledgments

We are grateful to an anonymous referee for many helpful comments on a preliminary version of this article. Our work was supported in part by grants connected to the projects ‘Statistics as an aid for environmental decisions: identification, monitoring and evaluation’ (2002134337) funded by the Italian Ministry for Education and ‘Methods for the analysis of extreme sea-levels and for coastal erosion’ (CPDA037217) funded by the University of Padova.

## Footnotes

One contribution of 14 to a Theme Issue ‘The Big Flood: North Sea storm surge’.

- © 2005 The Royal Society