## Abstract

Predicting or measuring the output of complex systems is an important and challenging part of many areas of science. If multiple observations are required for parameter studies and optimization, accurate, computationally intensive predictions or expensive experiments are intractable. This paper looks at the use of Gaussian-process-based correlations to correct simple computer models with sparse data from physical experiments or more complex computer models. In essence, physics-based computer codes and experiments are replaced by fast problem-specific statistics-based codes. Two aerodynamic design examples are presented. First, a cheap two-dimensional potential-flow solver is calibrated to represent the flow over the wing of an unmanned air vehicle. The rear wing of a racing car is then optimized using rear-wing simulations calibrated to include the effects of the flow over the whole car.

## 1. Statistical-model-based simulation

This paper is concerned with the problem of determining the effects of various parameters on the output of a system. This is a problem encountered in many, perhaps all, areas of science and lies at the heart of any optimization process. Systems for which effects are to be determined may be extremely complex and depend upon many parameters, for example, the human body. Scientists endeavour to capture a systems behaviour via physical and computer experiments. If experiments are quick and cheap to run, then wide-ranging parameter studies or lengthy optimization processes can be performed. However, cheap experiments are often unable to capture many of the complex effects that are likely to be of interest. For example, the performance of an aerodynamic device can be simulated in a fraction of a second by ignoring viscous effects. More accurate simulations may take weeks, while physical experiments require expensive models and wind-tunnel time. Parameter studies and optimization using such methods are either very expensive or cursory in nature.

Hard to obtain data can be approximated using a surrogate model that is fitted to a set of known data. Instead of relying on the results from experiments at all points of interest, we can fill in the gaps between a set of observed data by fitting a surrogate model, that is, a model that can be used in lieu of directly simulating or observing the system. Such models are, essentially, curve fits through known data and may take many forms, the most popular being polynomial regression (Box & Draper 1987). Other examples include, though are not limited to, neural networks, radial basis functions (Broomhead & Lowe 1988), support-vector regression (Smola & Schölkopf 2004) and Kriging (a more complex radial basis function method; Sacks *et al*. 1989). Once fitted to the data, a surrogate can be used in lieu of calls to computationally intensive codes or experiments to expedite design-space visualization, parameter studies, optimization, etc. The time savings afforded by using a surrogate are often at the expense of accuracy. Clearly, if a polynomial model is employed, we are assuming that the complex system’s response can be modelled as such. More complex methods, such as Kriging, which models the response as a sum of parametric Gaussian basis functions (with parameters chosen to maximize the likelihood of the data), make fewer assumptions, but may still require large amounts of data to predict the underlying result to the required accuracy. If data are expensive and so very sparse, the surrogate may be inaccurate to the point of being of little practical use. As an example of how powerful the Kriging method can be, given sufficient data, figure 1 shows the contours of the Branin function (an analytic function often used to test optimization algorithms) and a Kriging prediction based on 20 observations. The similarity between the two contour maps is remarkable; the surrogate has captured all the key features of the true Branin function. Clearly, an optimization process (or some other investigation) could save time and resources by calling on the surrogate model for data rather than the true function.

### (a) Co-Kriging

Often, simple physics-based simulations of systems are available, for example, in §2, a simple potential-flow solver will be used. We can therefore hope to obtain very accurate surrogates of the response of these codes, based on large quantities of data. We may also be able to simulate a small part of a system, which is of particular interest, at a reduced cost. For example, in §3, the rear wing of a race car is considered rather than the entire car geometry. If the error in these surrogates of simpler systems with respect to the true system response is simpler than the true system response itself, we can gain accuracy by fitting a second surrogate to the error, based on a small set of expensive data, and adding the two surrogates together. We then have a statistics-based code of predicted behaviour plus black-box complex effects.

This way of adding a surrogate based on plentiful cheap data to a hopefully more simple surrogate based on sparse expensive data could employ any surrogate-modelling method, but Kriging’s multi-output variant co-Kriging is particularly attractive, with its parametric Gaussian basis functions, which are able to interpolate or regress data to varying extents and emulate complex true functions, and the provision of error estimates that are used for global optimization in §3. The co-Kriging method originates in geostatistics (Krige 1951; Cressie 1993), where different ore grades found from core samples were correlated to give a better picture of gold distributions than possible by looking at gold in isolation. The idea being that factors that affect, say, zinc distribution are likely to affect the distribution of gold too.

More recently, Hevesi *et al*. (1992) employed co-Kriging to predict average annual precipitation values near a potential nuclear-waste disposal site using a sparse set of precipitation measurements from the region, along with the more easily obtainable elevation map of the area. Altitude will naturally be correlated with precipitation, but may display a complex relationship, with features such as valleys and ridges needing to be identified, making the co-Kriging method particularly attractive for this application.

Kennedy & O’Hagan (2000) used co-Kriging to correlate finite-element analyses based on a coarse mesh with those based on a fine mesh. They used a formulation that assumes the complex process is equal to the simple process, multiplied by some ratio, plus a difference process. The ratio and difference process are determined using a set of collocated data. In engineering design, this method has been cast in a global optimization context and used to optimize a transonic wing design by combining an empirical drag prediction code with computational fluid dynamics (CFD; Forrester *et al*. 2007). Wang *et al*. (2009) derived their work on validating computer models with physical experiments from Kennedy & O’Hagan’s method. Gaussian-process-based multi-fidelity models are also used, for example, by Qian & Wu (2008), and a good source of further information is available in Santner *et al*. (2003).

In this paper, we look at two particularly demanding calibration scenarios in the field of engineering design. First, in §2, we consider the calibration of simple CFD with wind-tunnel data to predict full-scale performance. In §3, we consider a novel and industrially relevant case where the cheaper design simulation uses a simplified geometry, rather than the more usual simplified analysis.

The mathematics behind the co-Kriging method has been omitted for clarity. More details and the Matlab code used to produce the results in this paper can be found in Forrester *et al*. (2008). Data from the examples presented are available as electronic supplementary material.

## 2. Low-speed unmanned air-vehicle wing design

Although the availability of accurate turbulent-flow simulations has improved in recent years, within an engineering design context time and computer resources dictating that simulations of design alternatives may be limited to Euler or even potential-flow simulations of simplified or incomplete geometries. Solutions of the Reynolds-averaged Navier–Stokes (RANS) equations may also be employed on simplified geometry, though run times preclude their use in extensive optimization processes. These, and more accurate simulations, are employed for more detailed design of components. Similarly, the expense of physical experiments, e.g. wind-tunnel tests, usually precludes their use in an optimization context.

In this example, we will consider the lift/drag (*L*/*D*) of the wing of a low-speed unmanned air vehicle (UAV) with a wing span of 1.4 m operating at 18 m s^{−1} with a Reynolds number of 1.5×10^{5} and a lift coefficient of 0.36. The wing has a constant aerofoil section throughout its span, and it is the design of this aerofoil section that we are concerned with. The aerofoil is defined by six parameters that describe the shape of an upper and lower Ferguson spline (see Sóbester & Keane 2007 for more details). We have 12 full-scale wind-tunnel tests at our disposal (chosen using a space filling Latin hypercube sampling plan; Morris & Mitchell 1995), which are unlikely to be sufficient to gain insight into the six-dimensional design landscape. A Kriging prediction of *L*/*D* based on these wind tunnel data is shown in figure 2. Each sub-tile shows a clear trend, with higher *L*/*D* for lower values of (a sharper lower leading edge), but there is little variation from sub-tile to sub-tile or tile to tile, indicating that, although a key trend has been identified, there are insufficient data to capture trends due to all the parameter variations.

To augment the wind-tunnel tests, we can also use a two-dimensional full-potential code with an empirical drag correction (VGK, short for ‘viscous Garabedian and Korn’; Freestone 1996). Each two-dimensional aerofoil simulation takes a few seconds, and 120 *L*/*D* values have been obtained. A Kriging prediction based on these data is shown in figure 3. Compared with figure 2, the trends in this plot are more complex, which we expect as there are more data. The trends are also starkly different, which we expect as the two-dimensional potential-flow model is a gross simplification of the flow over the three-dimensional wing.

We hope to calibrate this grossly simplified model using the wind-tunnel data and co-Kriging. If the Kriging model of the VGK data has captured at least some important effects, calibrating this model will prove more accurate than working with the wind-tunnel data alone. The resulting co-Kriging model, with complex trends afforded by the density of cheap data, but accuracy in these trends afforded by the wind-tunnel data, is shown in figure 4. Note how the key sub-tile trend from figure 2 has been retained (though slightly modified), but there are now more complex variations across the other variables. The leave-one-out cross-validation error^{1} of this model is a 21 per cent improvement compared with the Kriging model based on wind-tunnel data alone, and confirms the benefits of using the cheap data via co-Kriging.

Seemingly ‘ignoring’ the underlying physics of a system, as we have in figure 4, could be considered a little controversial. However, although the physics is not being modelled directly, it is taken into account via the black-box calibration process. In essence, figure 4 shows the results of a new, quick-running CFD code, specific to this problem, which takes into account all physical effects. The accuracy of the code is limited by the quantity and quality of underlying data (inevitably, there are experimental errors in figure 4), although the Gaussian-process-based method employed does provide us with error estimates, which will come into play in §3 in the context of global optimization.

## 3. Race-car wing design

In this study, we consider the design of the rear wing of an open cockpit hill-climb race car, similar in appearance to a Formula One car. We are interested in the effects of varying the geometry of the rear wing by changing the 12 design parameters shown in figure 5. The geometry of the car, along with boxes depicting areas of mesh refinement around the turbulent regions of the wheels, wings and wake for a 6 000 000 cell mesh, is shown in figure 6. RANS simulations, using the one equation Spalart–Allmaras model in Fluent, based on this mesh take between 12 and 15 h on an eight processor 16 GB Linux node. Cluster scheduling dictates that only two nodes can be used at any one time. Constraints such as these on computing resources are representative of those experienced throughout academia and industry.

We hope to expedite the search of this 12-dimensional design space by augmenting full-car simulations with rear-wing-only simulations. The rear-wing geometry and mesh-refinement zones for a 1 500 000 cell mesh are shown in figure 6. Spalart–Allmaras simulations of this geometry, using inflow conditions taken from the baseline full-car simulation, take approximately 1.5 h on the computer resources used for the full-car simulations.

Naturally, the simulated flow over such a simplified geometry (figure 7) is quite different from that which includes the rest of the car (figure 8). Nevertheless, by exploiting the fact that the difference between the effects of the two flows is simpler than the effects of the full-car flow itself, useful time savings can be had by employing the co-Kriging methodology.

Our first study of this problem compares a Kriging model built with *L*/*D* data from 20 full-car simulations with a co-Kriging model built using 10 full-car simulations and 120 rear-wing-only simulations. This represents a similar level of computing resource (CPU hours) for each method. The Kriging prediction of *L*/*D* based on 20 full-car simulations is shown in figure 9. For visualization purposes, only four variables are shown. These are those that the co-Kriging model indicates have the greatest discrepancy between the cheap and expensive data (this is determined by examining the widths of the difference-model basis functions). On the basis of such sparse data, the trends are very simple indeed, with only one variable leading to a significant variation in *L*/*D*.

Figure 10 shows a Kriging prediction based on the 120 rear-wing-only simulations. As in the UAV example, the higher density of data allows more complex trends to be derived. However, here the trends show some similarity to the full-car model in figure 9.

The co-Kriging prediction in figure 11 can be seen (visually) to combine the aspects of both figures 9 and 10. Now, there is a clear variation in *L*/*D* for all variables. This model will be used as a basis for the search for promising designs in a surrogate-model-based optimization process.

In this 12-dimensional design space, the true global optimum is likely to remain elusive. It is perhaps worth taking a moment to consider the magnitude of the problem we face. Given one design variable, we might need at least three *L*/*D* evaluations to gain some insight into how this variable affects *L*/*D*. To extend this sampling density to 12 variables, we would require 3^{12}=531 441 *L*/*D* evaluations. We have only 120 rear-wing and 10 full-car simulations, and so we can hope only to achieve a significant level of design improvement. A leave-one-out cross-validation correlation coefficient of *r*^{2}=0.90 is encouraging though.

To decide upon a new promising design, we maximize the expectation of improving on the best design so far (Schonlau 1997). The expected improvement can be calculated because a Kriging (and co-Kriging) prediction represents the mean of a Gaussian process, that is, for any combination of design variables, it is the most probable value at that point, but other values do have probabilities. By integrating the probabilities below the best value so far, we can then calculate the expected improvement that may be obtained at that point. Jones (2001) gives an excellent description of this method. This expected improvement, although a difficult, multi-modal function to search, can be computed in a fraction of a second, and so its maximum can be found using an extensive genetic algorithm followed by sequential quadratic programming.

Once the 12 variable values that correspond to the design with the maximum expected improvement in *L*/*D* have been found, new full-car and rear-wing simulations are performed and the *L*/*D* values are appended to the dataset, i.e. there are now data for 121 rear-wing and 11 full-car simulations. The co-Kriging model is then re-built, the maximum expected improvement found, new simulations performed, and the process repeated until a convergence criterion is met, which is often simply that time allocated for the design optimization is exhausted.

Here, five so-called *infill points* have been added to the original dataset, and their corresponding *L*/*D* values are shown in figure 12. A significant improvement over the designs in the original dataset has been achieved. The rear-wing-only counterparts to the full-car simulations are also shown. Note how the *L*/*D* values follow a similar trend, with the rear-wing-only tending to have slightly higher (worse) *L*/*D*. Exceptions to this are points seven and 10, where the trend is reversed. Note also that the difference between the simulations becomes wider as the optimization progresses and that the optimum predicted by the full-car simulations is not the best point according to the rear-wing-only simulations. Clearly, the black-box calibration process has captured the various unknown trends in order that design improvements can be made.

## 4. Conclusions

Two applications of black-box calibration for complex-system simulation have been presented. The first shows the calibration of a simple two-dimensional CFD code to predict full-scale three-dimensional experimental data. Far from being ignored, that which cannot be analysed and understood within permitted time and cost constraints is accounted for using a statistical model. Here, effects due to, among others, three-dimensional and turbulent flow and surface roughness have been embedded in the co-Kriging model. While efficient computer models that directly simulate complex systems are always desirable, this potential for constructing quick problem-specific codes which may incorporate the effects that are not understood is clearly attractive.

The second application tackled the case of component design—the rear wing of a race car—where the rest of a complex geometry—i.e. the car—may influence the result. Here, the complex effects include the upstream effect of changes in the wing geometry and the effect of the wake of the car on the flow over the wing. For this example, the calibrated model is successfully used in a surrogate-model-based optimization processes. Despite the large difference between the flow over a rear wing in isolation and the whole car, the co-Kriging model yields design improvements with very few additional simulations.

Although this paper has examined aerodynamic applications of co-Kriging, as the methodology sees systems as black boxes, it is naturally applicable to other areas of science. For example, the author is currently working on biomechanical models where musculoskeletal simulations using quick inverse dynamics are calibrated with experimental data. We have considered a single system output (*L*/*D*), however, correlating and optimizing multiple quantities is also possible, and further research in this area would be welcomed.

## Author Profile

**Alexander I. J. Forrester**

Dr Alexander I. J. Forrester was born and brought up in Wirksworth, Derbyshire in the north of England. He studied for a masters in aerospace engineering, followed by a PhD in computational engineering at the University of Southampton where he is now a lecturer, teaching engineering design and optimization. He is a member of the Computational Engineering and Design Research Group and the Rolls-Royce University Technology Centre for Computational Engineering. His research interests lie in the efficient use of simulation and experiments in design optimization. His techniques have been applied to wing aerodynamics, gas turbines, satellite structures, sports equipment and Formula One race car design. He is 31 years old, a husband and father of two, a competitive cyclist and mountain runner.

## Acknowledgements

The author gratefully acknowledges the work of Alistair Ward and Nishant Taluja in computing data for the example applications.

## Footnotes

↵1 Validation using an independent set of data is desirable, but rarely practical when conducting physical or very expensive computer experiments. Leave-one-out cross-validation, where a model is assessed by its ability to back-predict successively omitted points, is though an indicator of generalization properties.

One contribution of 18 to a Triennial Issue ‘Visions of the future for the Royal Society’s 350th anniversary year’.

- © 2010 The Royal Society