## Abstract

The predominant technical challenge of the upstream oil and gas industry has always been the fundamental uncertainty of the subsurface from which it produces hydrocarbon fluids. The subsurface can be detected remotely by, for example, seismic waves, or it can be penetrated and studied in the extremely limited vicinity of wells. Inevitably, a great deal of uncertainty remains. Computational sciences have been a key avenue to reduce and manage this uncertainty. In this review, we discuss at a relatively non-technical level the current state of three applications of computational sciences in the industry. The first of these is seismic imaging, which is currently being revolutionized by the emergence of full wavefield inversion, enabled by algorithmic advances and petascale computing. The second is reservoir simulation, also being advanced through the use of modern highly parallel computing architectures. Finally, we comment on the role of data analytics in the upstream industry.

This article is part of the themed issue ‘Energy and the subsurface’.

## 1. Introduction

In the upstream oil and gas industry, our aim is to identify and produce subsurface accumulations of oil and gas. From the earliest beginnings of our industry, our key scientific and engineering challenges have been related to our ignorance of the detailed structure of the subsurface. Our abilities to image the subsurface, thereby locating oil and gas fields, and to predict the flow of these fluids through subsurface porous rocks, are often unsatisfactory, given the large capital investments required in our business. While we do apply the most advanced technologies available to these problems, much uncertainty inevitably remains: one proverb of the industry is that we only truly understand an oil or gas field the day we close it down (and perhaps not even then!).

Because of our inability to directly measure important features of the subsurface, much of our decision-making is based on models. We use a variety of models to reconstruct seismic data to create three-dimensional images of the subsurface, and we construct large reservoir models based on these images and other information to predict the flow of multiphase fluids through the reservoir during the production phase of an asset. Successful construction of these models requires the skills of expert applied physicists, mathematicians, geoscientists and engineers; these are among the most advanced modelling challenges, in scope and technical depth, to be found in any industry.

In many cases, there are insufficient data available to uniquely calibrate these models, which have a number of parameters that scales at least linearly with the volume of the earth that is represented therein. Thus, the oil and gas industry has led in the development of techniques to address the highly under-determined calibration problem, typically termed ‘history matching’ in the reservoir modelling arena. This remains an extremely active area of research.

These models are invariably embedded in software. A large service industry has grown up to provide the software needed to construct and solve these models; in addition, many oil and gas operating companies have internal software development units that develop stand-alone or plug-in software used by technical decision-makers throughout their businesses. While equipment and hardware advances remain critical to the industry, much of the innovation has moved into the software space, with research methods moving seamlessly into software used throughout our businesses.

The size of the computational problems we face has made the oil and gas industry one of the largest private sector users of high-performance computing. Numerous oil and gas operating and service companies have advertised their rapidly improving capabilities in petascale computing, and these companies have developed sophisticated technical teams to deploy, maintain and develop specialized code for these systems.

In an academic setting, ‘computational science’ refers to algorithmic and numerical techniques to solve the classical equations of mathematical physics as well as more modern approaches to multiscale problems or machine-learning-based approaches to complex systems. In the oil and gas industry, this definition of computational science is too narrow, because a number of disciplines must be integrated to achieve business impact in this area. These are as follows.

(i) Modelling physics: the construction and validation of practical multi-physics models covering the phenomena that are key to business performance.

(ii) Computational and applied mathematics: the design and implementation of robust, accurate algorithms to solve models on the time scales required by operations.

(iii) Technical software engineering: the development of computer code suitable for solution of scientific and engineering problems within the constraints posed by an enterprise IT environment.

(iv) High-performance computing: the design, procurement, maintenance and specialized code development required to solve industry grand challenge problems on leading high-performance computing architectures.

Of course, it is important to maintain strong links between computational work and the geoscientists and engineers with whom we must collaborate closely to impact our business.

In addition to the subsurface element in our business, we also manage complex logistical operations and global supply chains in both our upstream and our refining and petrochemicals businesses. These activities give rise to many classical optimization problems; however, as these are similar to challenges in other industries, I will limit the scope of this article by focusing on the subsurface alone, because the uncertainty and inaccessibility of the subsurface strongly flavour the unique computational problems of the oil and gas industry.

In this review, we will present a high-level discussion of key computational sciences problems in our industry, focusing especially on the two classical problems of seismic imaging and reservoir simulation. We will discuss recent advances and outstanding problems in these two areas, leaving technical details to the references. We shall also discuss some recent efforts to combine physical modelling with data-science-based approaches, which represents a new frontier for computational work in our industry.

## 2. Seismic imaging and full wavefield inversion

For many decades, seismic imaging has been the key technology for oil and gas exploration [1]. In seismic imaging, sound waves are created at the surface of the Earth, which then propagate through the subsurface. As they propagate, they will reflect off the various geological layers and structures in the subsurface. Observing the scattered waves at the surface allows geophysicists to reconstruct a picture of the subsurface, and of the most likely locations of oil and gas reservoirs.

Reconstruction of the positions of the sound reflectors in the subsurface, and the linked problem of creating a model for the velocity of the sound in the subsurface, is an extremely difficult problem, which has preoccupied generations of mathematicians and geophysicists. Consider the reflections from a single horizontal reflector, shown in figure 1. We suppose that the reflector is at a depth of *d* beneath the surface, in which sound waves (for the moment we make no distinction between longitudinal and transverse waves) propagate at a velocity *v*. By using a sequence of geophones laid out on the surface, it is possible to determine both *d* and *v*, because, for a sound pulse emitted from the source at time *t*=0, the arrival time *t*(*x*) of the reflected wave a distance *x* from the source on the surface will be

$$t(x) = \frac{\sqrt{x^{2} + 4d^{2}}}{v}. \tag{2.1}$$
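
To make the single-reflector argument concrete, the following is a minimal sketch, assuming the standard constant-velocity travel-time relation *t*(*x*) = √(*x*² + 4*d*²)/*v* for a flat reflector. Squaring this relation gives a straight line in *x*², so *d* and *v* follow from a linear fit. The function names, geophone offsets and numerical values are illustrative assumptions, not taken from any survey.

```python
import numpy as np

def arrival_time(x, d, v):
    """Two-way travel time of a reflection off a flat reflector at depth d."""
    return np.sqrt(x**2 + 4.0 * d**2) / v

def fit_depth_and_velocity(x, t):
    """Recover (d, v) from geophone offsets x and arrival times t.

    Squaring the travel-time relation gives t^2 = x^2 / v^2 + 4 d^2 / v^2,
    a straight line in x^2 with slope 1/v^2 and intercept 4 d^2 / v^2.
    """
    slope, intercept = np.polyfit(x**2, t**2, 1)
    v = 1.0 / np.sqrt(slope)
    d = 0.5 * v * np.sqrt(intercept)
    return d, v

# Hypothetical survey: 20 geophones, reflector at 1500 m, velocity 2500 m/s.
offsets = np.linspace(100.0, 2000.0, 20)
times = arrival_time(offsets, d=1500.0, v=2500.0)
d_est, v_est = fit_depth_and_velocity(offsets, times)
print(f"estimated depth {d_est:.1f} m, velocity {v_est:.1f} m/s")
```

With noise-free synthetic times the fit recovers the assumed depth and velocity; real data would of course add noise, dipping reflectors and spatially varying velocity.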

A two-dimensional seismic reconstruction is shown in figure 2; clearly, the complexity of the subsurface, with many different acoustic reflectors embedded in a medium of constantly varying velocity, makes the generalization of this approach to practical algorithms extremely challenging. Further complexity arises from building not two-dimensional but three-dimensional images of the subsurface (which has been the industry standard approach for the last two decades), the existence of P and S waves with different subsurface velocities, whose reflection properties at subsurface interfaces are different, and the challenge of connecting the key properties of subsurface rocks, such as lithology and porosity, to their wave propagation properties.

It is especially frustrating that often the presence of oil and gas fluids cannot be directly detected by seismic imaging, but must be inferred from extremely subtle features of the seismic image (‘seismic attributes’). In addition, high-frequency sound waves are rapidly absorbed in the Earth’s crust, limiting practical seismic imaging frequencies to tens of hertz, thereby limiting resolution well above the scale at which many interesting geological transitions occur.

An alternative approach to the construction of the seismic image is ‘full wavefield inversion’ (FWI). Originally proposed in the 1980s [2], FWI has become practicable over the past decade with the emergence of petascale computing as a widespread resource in the oil and gas industry. Suppose that we describe the full subsurface model of properties controlling seismic propagation as *c*(**x**), where these properties include density, P-wave (longitudinal) and S-wave (transverse) seismic velocities, and, in more advanced applications, viscoelastic and also attenuation parameters. The vector **x** gives the position in the full three-dimensional subsurface region of interest, normally discretized. Suppose that the seismic wavefield resulting from a specified surface source *s* is *u*(*c*, *s*) (the dependence on spatial position **x** is implicit). We indicate the requirement that *u* satisfy the wave equation by *F*(*u*, *c*) = 0. Let us define an objective function *h* by

$$h(c) = \sum_{n} \left\| u(c, s_{n}) - d_{n} \right\|^{2}, \tag{2.2}$$

where the sum is over individual source points (‘shots’) for the seismic survey (the source characteristic is included in *s*_{n}) and *d*_{n} gives the full observational results (measured at the surface by geophones or hydrophones) corresponding to the *n*th shot. Note that *u* in this expression is normally evaluated at the surface, which will be the position of the data *d*_{n}. Full comparison of the seismic returns corresponding to each shot over time is implicit in the norm on the right-hand side of this equation.

Now the objective of FWI can be simply stated as determining

$$c^{\ast} = \arg\min_{c}\, h(c). \tag{2.3}$$

In FWI, the entire model of the Earth (specified by *c*(**x**)) and its properties in the zone of interest are mathematically optimized to give the best possible fit between the observed seismic reflections and those predicted from the model. This large-scale optimization problem has only become feasible in the last few years, due to the advent of new algorithms combined with the power of massively parallel modern computers [3].
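
The structure of this optimization can be illustrated with a deliberately tiny toy problem. The sketch below assumes a one-dimensional medium with a single unknown, spatially uniform velocity *c*, one shot and one receiver; the forward model is an explicit finite-difference solution of the wave equation, and the misfit is scanned over a handful of candidate velocities. The grid sizes, the Ricker source wavelet and the brute-force scan are all assumptions made for illustration. Production FWI optimizes millions of parameters with gradient-based methods rather than a scan.

```python
import numpy as np

def ricker(t, f0=15.0, t0=0.1):
    """Ricker wavelet, used here as a hypothetical source signature s."""
    arg = (np.pi * f0 * (t - t0)) ** 2
    return (1.0 - 2.0 * arg) * np.exp(-arg)

def forward(c, nx=201, nt=600, dx=5.0, dt=1e-3, src=20, rec=60):
    """Solve u_tt = c^2 u_xx explicitly and return the trace at the receiver."""
    assert c * dt / dx <= 1.0, "CFL stability limit violated"
    wavelet = ricker(np.arange(nt) * dt)
    u_prev, u = np.zeros(nx), np.zeros(nx)
    trace = np.zeros(nt)
    r2 = (c * dt / dx) ** 2
    for it in range(nt):
        u_next = 2.0 * u - u_prev
        u_next[1:-1] += r2 * (u[2:] - 2.0 * u[1:-1] + u[:-2])
        u_next[src] += dt**2 * wavelet[it]   # inject the source
        u_next[0] = u_next[-1] = 0.0         # fixed (reflecting) boundaries
        u_prev, u = u, u_next
        trace[it] = u[rec]
    return trace

def misfit(c, observed):
    """Objective for one shot: squared norm of predicted minus observed trace."""
    return float(np.sum((forward(c) - observed) ** 2))

true_c = 2000.0                       # m/s, the 'Earth' we pretend not to know
observed = forward(true_c)            # synthetic 'field' data d
candidates = np.arange(1500.0, 3001.0, 100.0)
h = np.array([misfit(c, observed) for c in candidates])
best_c = float(candidates[np.argmin(h)])
print(f"best-fitting velocity: {best_c:.0f} m/s")
```

Each evaluation of the misfit costs one full wave-equation solve, which is why the number of forward solves dominates the cost of realistic three-dimensional inversions.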

One promising algorithm is ‘encoded simultaneous source FWI’ contrasted with the ‘sequential source FWI’ described above [4]. In equation (2.2), there is a sum over individual shot records, each of which requires a forward integration of the wave equation to determine *u*(*c*,*s*_{n}) for that shot. It would be much more convenient if the various shots could be combined, so that only one forward integration of the wave equation (for a particular Earth model *c*) was needed. Of course, this would introduce spurious correlations in the ultimate record, which can be minimized using random encoding of the original seismic signature of the shots. This leads to the following objective function,
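
$$\tilde{h}(c) = \left\| u\!\left(c, \sum_{n} e_{n} \otimes s_{n}\right) - \sum_{n} e_{n} \otimes d_{n} \right\|^{2}, \tag{2.4}$$

where the operator *e*_{n} ⊗ represents the encoding of the original shot sequence (and the observed data) by an independent random operator for each shot.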

This problem must be solved by identifying the mismatch between the signal predicted by a particular Earth model *c* and the observed data *d*_{n} and using this mismatch to update *c*; typically, up to hundreds of iterations (each requiring an independent forward integration of the equations of motion) are required to converge. Adjoint methods can be used to determine the best directions in the high-dimensional phase space of *c* to update after each iteration; it is also key to vary the random encodings *e*_{n} ⊗ between iterations to ensure that spurious correlations are eliminated from the eventual converged result. Figure 3 summarizes the workflow.
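
Why fresh random encodings suppress the spurious cross-correlations can be checked numerically. The sketch below rests on two assumptions: the forward operator is linear in the source (represented by a hypothetical matrix `G`), and the encoding simply multiplies each shot by a random sign. Under these assumptions the encoded misfit, averaged over many independent encodings, approaches the sequential-source misfit, because the cross-terms between different shots average to zero. All sizes and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n_shots, n_src, n_data = 8, 5, 40

# Hypothetical linear forward operator: u(c, s) = G(c) @ s, so the wavefield
# of a sum of sources is the sum of the individual wavefields.
G_true = rng.normal(size=(n_data, n_src))                    # the 'real Earth'
G_trial = G_true + 0.1 * rng.normal(size=(n_data, n_src))    # current model
sources = rng.normal(size=(n_src, n_shots))                  # column n is s_n
data = G_true @ sources                                      # column n is d_n

def sequential_misfit(G):
    """Sum over shots of ||u(c, s_n) - d_n||^2: one forward solve per shot."""
    return float(np.sum((G @ sources - data) ** 2))

def encoded_misfit(G, codes):
    """||u(c, sum_n e_n s_n) - sum_n e_n d_n||^2: a single forward solve."""
    super_source = sources @ codes
    super_data = data @ codes
    return float(np.sum((G @ super_source - super_data) ** 2))

# Draw fresh random +/-1 codes many times, as one would between iterations.
draws = [encoded_misfit(G_trial, rng.choice([-1.0, 1.0], size=n_shots))
         for _ in range(5000)]
seq = sequential_misfit(G_trial)
enc_mean = float(np.mean(draws))
print(f"sequential misfit {seq:.3f}, mean encoded misfit {enc_mean:.3f}")
```

Any single encoding gives a noisy estimate of the sequential misfit; it is the change of encoding from iteration to iteration that lets the noise cancel over the course of the inversion.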

Notwithstanding its breathtaking computational complexity, FWI, enhanced by encoded simultaneous sources and many other ingenious algorithms, is emerging as a commercially practised technology in the oil and gas industry. Figure 4 shows a simple two-dimensional example of the algorithm in practice. Many enterprises are already using FWI to determine velocity models, which typically require relatively low frequency ranges to achieve acceptable accuracy; other companies are practising FWI to identify both the velocity model and the actual high-frequency seismic structure.

## 3. Reservoir simulation and development and depletion planning

Beyond the challenge of locating hydrocarbon resources lies the challenge of developing and producing them in an economical manner. Wells must be drilled into the hydrocarbon-bearing formations to produce oil and gas, and additional wells are often needed to inject water or gas to maintain pressure in the reservoir. Surface facilities to process the hydrocarbons (and produced water) must be sized and built, often on offshore oil platforms or floating production systems. The total investment to develop a significant oil or gas reservoir may be significantly more than a billion dollars, and much of this money must be spent even before production of oil and gas (and the associated revenue streams) begins. Given the uncertainty in our description of the subsurface, these investments bear significant risk, which is typically mitigated through thorough reservoir simulation studies of candidate scenarios for the reservoir geological description. These studies take place at all stages of the asset life cycle, from early-stage development planning through late-stage depletion and asset management. At later stages, these studies are an important method to increase the overall recovery from the field through careful positioning of infill wells and management of existing wells and facilities.

To simulate the flow of fluids through a permeable reservoir, one must first build a geological model of that reservoir, i.e. a three-dimensional representation of the local rock lithology, porosity and permeability throughout the region of interest. This model will include the positions of faults, including knowledge about the degree to which any particular fault may seal or facilitate fluid flow, and significant geological ‘horizons’ at which properties of the rocks change discontinuously (figure 5). The construction of these models integrates knowledge obtained through seismic imaging of the region with more detailed information obtained through the use of logging tools in whatever wells may have been drilled into the formation.

Unfortunately, these data sources are insufficient to identify key barriers to or facilitators of flow, which may be extensive but thin impermeable shale drapes within the reservoir, or alternatively high-permeability sand channels embedded within an overall low-permeability mudstone. Figure 6 shows the data sources that can be used to characterize reservoirs on both vertical and horizontal scales. It is notable that there is a ‘data gap’ at relatively small vertical resolutions within the reservoir but long horizontal resolutions; unfortunately, these are precisely the sizes and shapes of geological structures that are highly important in controlling flow inside the reservoir on production time scales. To build information into geological models on these length scales, modellers typically rely on geological concepts derived from the study of ‘analogues’, geological systems that have been carefully studied and measured, typically in outcrops. These analogues should have as many environmental features as possible in common with the reservoir of interest, and should ideally have been deposited under similar circumstances. Analogues can be supplemented with statistical models derived from known correlations in the subsurface between properties such as porosity and permeability.

The uncertainty in the underlying geological structure creates an interesting decision analysis problem for development planning. The overall uncertainty in any assessment of reservoir performance needs to be communicated to decision-makers, many of whom lack sophisticated statistical training (which anyway may create an illusory expertise in dealing with such uncertain problems). Traditionally, reservoir models have been varied parametrically, with statistical analysis then determining quantities such as ‘P50’ (the 50th percentile in the likely volume of hydrocarbons produced from the reservoir) or ‘P90’ (the volume exceeded by 90% of the likely scenarios). More recently, attention has turned to ‘scenario discovery’ as an approach to communicating uncertainty. This methodology, pioneered by Rand Corporation researchers, focuses on generating a variety of credible scenarios representing the potential reservoir structures, without generating specific (and non-credible) statistical descriptions of the likelihood of these scenarios [6].
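
The percentile convention is easy to get backwards, so a short sketch may help. It assumes a hypothetical ensemble of recoverable volumes drawn from a lognormal distribution (both the distribution and its parameters are invented for illustration). Because ‘P90’ is the volume exceeded by 90% of scenarios, it corresponds to the 10th percentile of the ensemble, not the 90th.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical ensemble: recoverable volumes (million barrels) from
# 10,000 parametric variations of a reservoir model.
volumes = rng.lognormal(mean=np.log(200.0), sigma=0.4, size=10_000)

# Exceedance convention: P90 is exceeded by 90% of outcomes,
# so it is the 10th percentile of the distribution.
p90 = np.percentile(volumes, 10)
p50 = np.percentile(volumes, 50)
frac_above_p90 = float(np.mean(volumes > p90))

print(f"P90 = {p90:.0f}, P50 = {p50:.0f} (million barrels)")
print(f"fraction of scenarios exceeding P90: {frac_above_p90:.2f}")
```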

Once a geological model has been specified, the simulation of flow through the system to wells is, in principle (although not in practice!), straightforward [7]. For single-phase flow in porous media, the fundamental governing equation, Darcy’s law, connects the pressure gradient **∇***P* with the flux **J**,

$$\mathbf{J} = -\frac{k}{\mu}\,\nabla P, \tag{3.1}$$

with *k* the permeability and *μ* the fluid viscosity. However, in hydrocarbon reservoirs there are usually three phases flowing: oil, gas and water. An additional complexity (which I shall not address here) is that the ‘oil’ and ‘gas’ phases comprise many different hydrocarbon components, and can evolve material one to the other as pressure and temperature change. Thus, in addition to responding to the pressure, the volumetric flows of the different fluids are determined by their saturations *ϕ*_{o,g,w} for the oil, gas and water components, respectively. Instead of one ‘absolute’ permeability, there are now three ‘relative’ permeabilities for the three components, each dependent on the saturations as well as the local pressure. There is no fundamental theory for relative permeabilities, which must be either estimated based on analogues or measured carefully on core samples from the field of interest.
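
As a worked example of Darcy’s law and its multiphase extension, the sketch below computes a single-phase flux in SI units and then scales the permeability by a relative permeability. The power-law (Corey-type) dependence on saturation is an assumption chosen only for illustration, since, as noted above, relative permeabilities must in practice be measured or estimated from analogues. All function names and numerical values are hypothetical.

```python
import numpy as np

MILLIDARCY = 9.869233e-16  # m^2

def darcy_flux(k, mu, grad_p):
    """J = -(k / mu) * grad P; with SI inputs the flux is in m/s."""
    return -(k / mu) * np.asarray(grad_p, dtype=float)

def corey_rel_perm(saturation, exponent=2.0):
    """Hypothetical power-law relative permeability (an assumed closure)."""
    return np.clip(saturation, 0.0, 1.0) ** exponent

def phase_flux(k, saturation, mu, grad_p):
    """Multiphase form: absolute permeability scaled by relative permeability."""
    return darcy_flux(k * corey_rel_perm(saturation), mu, grad_p)

k = 100.0 * MILLIDARCY            # a fairly permeable sandstone
mu_water = 1.0e-3                 # Pa s
grad_p = [-1.0e4, 0.0, 0.0]       # Pa/m: pressure falls 0.1 bar per metre in x

J_single = darcy_flux(k, mu_water, grad_p)
J_half = phase_flux(k, 0.5, mu_water, grad_p)
print(f"single-phase flux: {J_single[0]:.3e} m/s")
print(f"flux at 50% saturation: {J_half[0]:.3e} m/s")
```

The flux points down the pressure gradient, and with the assumed quadratic law a phase occupying half the pore space carries only a quarter of the single-phase flux.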

Given choices for these parameters, a set of equations for the time evolution of the pressure and the saturations can be constructed. To ensure stability of forward integrations of these equations, the practice in the industry has evolved towards the use of fully implicit methods to solve the equations, which require a matrix solve at each time step of the forward integration. This has slowed the application of modern, highly parallel computer architectures to the reservoir simulation problem; this problem is much harder to parallelize compared with the forward modelling of the wave equation mentioned above. The solution of relatively non-sparse matrices typically involves communication across large numbers of CPUs at each time step, making it difficult to design algorithms that are not significantly slowed by bandwidth and communication constraints within the parallel architecture. Recently, however, a number of investigators have made progress on this problem. Figure 7 shows a successful application of a highly parallel architecture to achieve dramatic improvements in the speed of reservoir simulation [8].
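
The stability argument for implicit methods can be seen in a one-dimensional caricature. The sketch below assumes single-phase pressure diffusion on a uniform grid with no-flow boundaries, and takes a time step one hundred times larger than the explicit stability limit. The implicit (backward Euler) update requires a linear solve at every step but stays bounded; the explicit update diverges. The grid, the diffusivity and the dense solver are illustrative assumptions; production simulators use sparse, preconditioned iterative solvers.

```python
import numpy as np

def laplacian_1d(n):
    """Second-difference matrix with no-flow (zero-gradient) boundaries."""
    L = np.zeros((n, n))
    for i in range(n):
        if i > 0:
            L[i, i - 1] = 1.0
            L[i, i] -= 1.0
        if i < n - 1:
            L[i, i + 1] = 1.0
            L[i, i] -= 1.0
    return L

n, dx, diffusivity = 50, 10.0, 1.0     # hypothetical grid; diffusivity in m^2/s
dt = 50.0 * dx**2 / diffusivity        # 100x the explicit limit dx^2 / (2 D)
A = laplacian_1d(n) * diffusivity * dt / dx**2

p0 = np.full(n, 200.0)                 # bar
p0[n // 2] = 100.0                     # one drawn-down cell mid-reservoir

def step_implicit(p):
    """Backward Euler: solve (I - A) p_new = p_old at every time step."""
    return np.linalg.solve(np.eye(n) - A, p)

def step_explicit(p):
    """Forward Euler: no solve needed, but unstable for this time step."""
    return p + A @ p

p_imp, p_exp = p0.copy(), p0.copy()
for _ in range(20):
    p_imp = step_implicit(p_imp)
    p_exp = step_explicit(p_exp)

print(f"implicit range: {p_imp.min():.2f} to {p_imp.max():.2f} bar")
print(f"explicit max |p|: {np.abs(p_exp).max():.3e} bar")
```

The linear solve couples every cell to every other cell within a single time step, which is the source of the communication cost discussed above.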

In actual field management, the choices made for flow in the wells and facilities connected to the field are key to optimize overall field performance (see below). It is thus important for reservoir simulators to model effectively both the subsurface flow and the flow in wells and facilities, especially insofar as the latter flow can be affected by engineering choices implemented on relatively short time scales. This adds considerably to the numerical challenge of designing stable reservoir simulation software, as the mismatch between well and facility and subsurface time scales leads to a very ‘stiff’ numerical problem. Nevertheless, it is worthwhile to be able to simulate these types of problems to provide guidance to day-to-day field operation decisions, which are critical to the late-life value of a field.

Of course, for a field under production, the fluid flow rates into wells predicted by reservoir simulation can be directly compared with those observed in operations, and any disagreements can be used to improve the underlying reservoir model. This discipline is referred to as ‘history matching’: a reservoir model, or perhaps an ensemble of models, is adjusted to match the history of production from a field, in the hope that the adjusted models will provide better guidance to future production [9]. This discipline has generated a great deal of interesting theoretical work, approaching the problem from a variety of points of view, but the application of these technologies to practical industry workflows is still challenging.

## 4. Data analytics and production operations

Seismic imaging and reservoir simulation are disciplines devoted to creating and exploiting three-dimensional models of the subsurface. Of course, many business decisions do not require such detailed information. There has been considerable interest in the industry in the past decade in the use of data-driven methods to support business decision-making; the general term for such technologies is the ‘digital oil field’.

While digital oil field techniques can be applied in a variety of oil and gas workflows, some of the most impactful applications have been in production operations. The operation of an oil and gas field requires constant daily adjustment of a large number of operational settings on pumps, chokes, valves and other equipment across a large area. It is paramount in making these choices to maintain the safety and environmental integrity of the field; it is also important to avoid fluid mechanical or machinery ‘upsets’ that can reduce production from its optimal level. The field must also be managed in such a way as to maximize the long-term value of the resource produced, which for a major field may involve time scales of several decades.

Promoters of data-based methods in this and other industries point out the rapidly declining cost and increasing capability of both sensing technologies and data storage and communication technologies; it is true that these developments create a strong *prima facie* case for the applicability of data-driven technologies. However, the development of practical data-based workflows introduces several considerations that go well beyond these issues.

(i) Although data are relatively cheap to generate and store, enterprises only devote effort to maintaining the quality and accessibility of data they see as valuable. This creates a ‘chicken-and-egg’ problem for the digital oil field, because the value of data-based workflows must be established before the effort to maintain high-quality data repositories can be well motivated.

(ii) Although a sensor *per se* may be inexpensive, placement or maintenance of a sensor may be extremely expensive. Often, wells are equipped with ‘downhole’ pressure and temperature sensors that measure these quantities in the subsurface reservoir depth interval. If these sensors fail, extremely expensive well workovers are required to replace them.

(iii) There are a wide variety of statistical methods to identify correlations in large datasets; this is becoming a rather mature area of science, generally referred to as ‘machine learning’. In the oil and gas industry, we typically have a grasp of the underlying physical processes we are trying to understand; it is calibration of these processes or extrapolation of models outside the domain of calibration that is most challenging. There has been less focus on effective integration of machine learning techniques with underlying physical modelling.

(iv) While seismic imaging and reservoir simulation are normally practised in office environments by highly educated geoscientists and engineers, production operations decisions are taken in a very different environment. In field operations, either onshore or offshore, staffing is highly constrained, and operators are typically not trained as engineers or scientists. They operate in a highly variable environment, and must take quick decisions, based on experience, heuristics and whatever guidance technology can give them. Safety is always the first priority in their decision-making. Clearly, this is a setting in which new technology must meet high standards of robustness and reliability in order to be trusted by operators.

An example of a digital oil field approach tailored to these constraints is shown in [10]. In many oil fields, especially offshore oil fields, the precise oil, gas and water fluid flow rates are not measured continuously on a well-by-well basis. Although the total amounts of economically valuable fluids produced over a field may be measured extremely precisely, often operators rely on indirect means to estimate the flows at each well. These include the use of pressure and temperature sensors combined with models of the flow in wellbores as a function of these parameters, as well as periodic ‘well tests’ in which an individual well is routed to a test separator so that its flow rate of the three fluid components can be directly determined.

Even with such fragmentary and discontinuous data, it is possible to build computational systems that predict, with some accuracy, the three-phase flow rates at any particular time in a field. This allows rapid identification of operational problems, such as unstable multiphase flow (‘slugging’) in a particular well. It also allows field operations to be more effectively optimized on a day-by-day basis to maximize production. Finally, accurate production rates are key to history matching workflows that update and extend our knowledge of the subsurface reservoir by comparing simulation results to actual production, as discussed above.

The fundamental approach is quite simple. In modelling an oil field, there are three types of features that must be combined to create a total picture of the fluid flow. First, there are permanent facilities: piping, equipment, wellbores, etc. Second, there are engineering variables, such as precise settings on valves or chokes. Finally, there are parameters describing the subsurface that are unknown: reservoir pressure, gas/oil ratio and water cut (water percentage) flowing into each well, and the ‘productivity index’ (PI), which parametrizes the dependence of flow into each well on the bottom-hole pressure in the reservoir interval of each well. Multiphase flow through wellbores and surface facilities can be modelled by standard oil industry techniques provided the bottom-hole pressures and fluid saturations of the fluids coming up the wellbores can be determined or estimated.

We can specify these (subsurface) unknowns as *x*_{i}(*t*). The subsurface unknowns will change over time relatively slowly as the reservoir pressures and fluid saturations near the wellbore change; in principle, surface engineering settings can be changed more rapidly. Were the *x*_{i}(*t*) known, we could compute all sensor pressures, temperatures and separator readings *y*_{j}(*x*(*t*)). To determine the values of the unknowns, we must then minimize the difference between these computed data and the actual readings *ŷ*_{j}(*t*), i.e. we should minimize

$$\sum_{t} \left[\mathbf{y}(\mathbf{x}(t)) - \hat{\mathbf{y}}(t)\right]^{\mathsf{T}} \Sigma^{-1} \left[\mathbf{y}(\mathbf{x}(t)) - \hat{\mathbf{y}}(t)\right] + \left\| \Lambda \mathbf{x} \right\|^{2}, \tag{4.1}$$

where the second term involving the operator *Λ* regularizes *x*_{i}(*t*), enforcing that it should vary slowly and not overreact to the inevitable noise in *ŷ*_{j}(*t*). The matrix *Σ* can be simply taken to be an identity matrix, or, more carefully, may be adjusted to account for the covariance of the *ŷ*_{j} as well as information about the confidence ascribed to the data streams from various sensors. Much of the art of making this type of algorithm work for real systems is the careful choice of *Σ* and *Λ*. There are a variety of algorithms for actually finding the minima of expressions such as equation (4.1) [11].
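
A regularized fit of this kind can be sketched for the simplest possible case. The example below assumes that the sensor readings depend linearly on the unknowns through a hypothetical matrix `A`, that *Σ* is diagonal (one confidence weight per sensor), and that *Λ* is a scaled first-difference operator in time; none of these choices comes from the method of [10]. Under these assumptions the minimization reduces to a single linear least-squares problem.

```python
import numpy as np

def estimate_unknowns(readings, A, weights, lam):
    """Minimize sum_t ||W (A x_t - y_t)||^2 + lam * sum_t ||x_{t+1} - x_t||^2.

    readings: (T, m) sensor data; A: (m, n) assumed linear sensor model;
    weights: (m,) per-sensor confidence, W = diag(weights); lam: smoothing.
    """
    T, m = readings.shape
    n = A.shape[1]
    W = np.diag(weights)
    data_block = np.kron(np.eye(T), W @ A)            # misfit at each time
    diff = np.eye(T, k=1)[:-1] - np.eye(T)[:-1]       # x_{t+1} - x_t
    smooth_block = np.sqrt(lam) * np.kron(diff, np.eye(n))
    lhs = np.vstack([data_block, smooth_block])
    rhs = np.concatenate([(readings * weights).ravel(),
                          np.zeros(smooth_block.shape[0])])
    x, *_ = np.linalg.lstsq(lhs, rhs, rcond=None)
    return x.reshape(T, n)

def roughness(x):
    """Sum of squared step-to-step changes in the estimated unknowns."""
    return float(np.sum(np.diff(x, axis=0) ** 2))

rng = np.random.default_rng(1)
T = 60
t = np.linspace(0.0, 1.0, T)
# Hypothetical slowly drifting unknowns: reservoir pressure (bar), water cut.
truth = np.column_stack([250.0 - 20.0 * t, 0.30 + 0.10 * t])
A = np.array([[1.0, 0.0], [0.8, -50.0], [0.0, 100.0]])   # three sensors
noise_sd = np.array([2.0, 2.0, 1.0])
readings = truth @ A.T + rng.normal(size=(T, 3)) * noise_sd

raw = estimate_unknowns(readings, A, 1.0 / noise_sd, lam=0.0)
smooth = estimate_unknowns(readings, A, 1.0 / noise_sd, lam=25.0)
print(f"roughness without regularization: {roughness(raw):.2f}")
print(f"roughness with regularization:    {roughness(smooth):.2f}")
```

Raising `lam` trades fidelity to the individual readings for smoothness in time, and the per-sensor weights play the role of the confidence information carried by *Σ*. Real wellbore and facility models are nonlinear, so in practice the same objective would be minimized iteratively.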

## 5. Conclusion

I have chosen three notable examples from a variety of applications of the computational sciences to the oil and gas industry. While, traditionally, this industry has been viewed as the domain of geoscientists and petroleum or chemical engineers, the advance of computer technology, combined with the economic challenges of the business, has increasingly created opportunities to apply mathematics, computer science and modelling techniques of considerable sophistication. Owing to the large financial size of the industry, there is an opportunity for these approaches to create considerable value. While this should encourage academic researchers and students to try to identify and solve key problems in this domain, true impact requires careful partnering with domain experts who understand the often subtle linkages between scientific advance and business impact.

There are many outstanding problems, even in the subjects I have covered in this review. FWI, while progressing rapidly, is still often inadequate to directly identify hydrocarbon fluids or contacts between different types of fluids. The discretization schemes used in practical reservoir simulators are relatively unsophisticated, and lead to significant inaccuracies in simulation. The multi-physics linkage of fluid flow to geomechanics, which is especially important for modern heavy oil and shale gas/tight oil developments, remains challenging to model accurately and stably. Underlying all of these problems is a software engineering challenge: we cannot afford to rebuild large and cumbersome software architectures every time an incremental advance is made in modelling physics. Thus the design of effective modular software architectures that can be nimbly updated as the physical modelling and numerical algorithms improve is vital to the continued advance of computational sciences in our industry.

## Data accessibility

All data included in this work should be accessed at the original site of publication, as indicated in the references.

## Competing interests

The author declares that he has no competing interests.

## Funding

This work was entirely funded by ExxonMobil.

## Acknowledgements

I am grateful to numerous colleagues in ExxonMobil for many conversations and instruction on the material covered in this review. I would especially like to thank Bret Beckner, Damian Burch, Jerry Krebs and David McAdow for discussions and for providing figures and examples.

## Footnotes

One contribution of 12 to a theme issue ‘Energy and the subsurface’.

- Accepted April 5, 2016.

- © 2016 The Author(s)

Published by the Royal Society. All rights reserved.