## Abstract

There is growing concern about the depletion of hydrocarbon resources and the risk of near-term peaks in production. These concerns hinge upon contested estimates of the recoverable resources of different regions and the associated forecasts of regional production. Beginning with Hubbert, an influential group of analysts have used *growth curves* both to estimate recoverable resources and to forecast future production. Despite widespread use, these ‘curve-fitting’ techniques remain a focus of misunderstanding and dispute. The aim of this paper is to classify and explain these techniques and to identify both their relative suitability in different circumstances and the expected level of confidence in their results. The paper develops a mathematical framework that maps curve-fitting techniques onto the available data for conventional oil and highlights the critical importance of the so-called ‘reserve growth’. It then summarizes the historical origins, contemporary application and strengths and weaknesses of each group of curve-fitting techniques and uses illustrative data from a number of oil-producing regions to explore the extent to which these techniques provide consistent estimates of recoverable resources. The paper argues that the applicability of curve-fitting techniques is more limited than adherents claim, the confidence bounds on the results are wider than commonly assumed and the techniques have a tendency to underestimate recoverable resources.

## 1. Introduction

There is growing concern about the depletion of hydrocarbon resources and the risk of near-term peaks in production. These concerns hinge upon contested estimates of the recoverable resources of different regions and the associated forecasts of regional production. Beginning with Hubbert [1], both resource estimates and production forecasts have repeatedly been derived through the use of growth curves fitted to historical data [2–6]. For example, a logistic curve may be statistically fitted to time-series data on cumulative oil production from a region and projected forward in time to estimate the *ultimately recoverable resources* (URR) of the region and to forecast the future rate of production. Similarly, a growth curve may be fitted to time-series data on cumulative oil discoveries in a region and used to estimate the URR and forecast the future rate of oil discovery. Although dismissed by critics as ‘trendology’ [7,8], these *curve-fitting* techniques have much in common with more sophisticated methods of estimating recoverable resources and forecasting future supply [9,10], as well as with the use of growth curves to forecast technological diffusion [11–16].

This paper classifies and explains these techniques and identifies both their relative suitability in different circumstances and the expected level of confidence in their results. The focus throughout is the application of these techniques to conventional oil^{1} resources because these are more depleted than other hydrocarbons. The paper updates and extends Hubbert’s classic presentation of curve-fitting techniques [18,19] and tests the reliability of these techniques using contemporary data. An earlier and less technical account of these issues was provided in Sorrell & Speirs [20].

The paper is structured as follows. Section 2 describes the concept and origin of curve-fitting techniques and the limited theory on which they are based. Section 3 identifies three groups and nine individual types of curve-fitting technique, develops a mathematical framework to describe these and highlights the nature, implications and importance of the so-called ‘reserve growth’. Section 4 explores the historical origins, contemporary application and strengths and weaknesses of each technique. Section 5 uses illustrative data from a number of oil-producing regions to assess the extent to which these techniques produce consistent and reliable results. The paper concludes by emphasizing the limitations of curve-fitting techniques and their tendency to underestimate recoverable resources.

## 2. Curve-fitting techniques

In a seminal paper, Hubbert [1] forecast future US oil production by fitting a curve^{2} to historical data on annual US production and projecting this forward in time under the assumptions that: first, production must eventually decline; and second, the area under the curve must equal the URR of the USA—or the amount of oil that is both technically possible and economically feasible to extract over the full production cycle. Using contemporary industry estimates for the US URR, 150–200 Gb (1 Gb=10^{9} barrels=1 billion barrels), Hubbert forecast that US oil production would peak some time between 1965 and 1970 (figure 1). Many commentators considered Hubbert’s approach to have been vindicated when US production subsequently peaked in 1970 (figure 2)—although this accuracy was partly fortuitous [21–24]. Hubbert went on to develop more formal curve-fitting methods using data on both oil production and oil discoveries [18], and these methods have since been applied to a variety of mineral resources and to multiple regions throughout the world [25–30]. When applied to conventional oil, curve-fitting techniques have repeatedly been used to forecast near-term peaks in regional and global production [30,31].

Typically, the rate of production from a resource-producing region increases to a peak, and then declines as the resource is depleted. Hubbert [1,18] popularized the notion of a symmetrical, ‘bell-shaped’ production cycle (a ‘Hubbert’ curve), which fits the historical experience of the USA remarkably well (figure 2). However, there is no theoretical reason for the production cycle to take this form and asymmetric cycles tend to be more common in practice (with some regions having multiple peaks) [32]. If the production cycle rises to a single peak and then declines, a plot of cumulative production over time will take the form of a sigmoidal (S-shaped) growth curve (figure 2). Hubbert modelled this cumulative production cycle with a *logistic* growth curve and the production cycle by the first derivative of the logistic curve (box 1). Other authors have employed different growth curves, such as the Gompertz, Bass and Weibull functions, and (to the extent that different curves are compared) have chosen between them on the basis of goodness of fit [33,34]. The asymptote of these growth curves corresponds to the integral of the production cycle and represents the regional URR. Hence, the URR for a region may be estimated by: (i) assuming a particular functional form for the cumulative production cycle; (ii) using nonlinear regression to fit this function to a time series of cumulative production; and (iii) deriving the URR (the asymptote) from the parameters of the fitted curve. A comparable analysis can be applied to data on cumulative discoveries (i.e. the sum of cumulative production and declared reserves), although there is no reason to assume that the discovery cycle will take the same shape as the production cycle.

### Box 1. Hubbert’s logistic model of the oil production cycle

The key features of Hubbert’s logistic model are that:

— cumulative production is modelled with a logistic function;

— production is modelled with the first derivative of the logistic function;

— the production profile is symmetric (i.e. maximum production occurs when the resource is half-depleted and its functional form is equivalent on both sides of the curve);

— production increases and decreases in a single cycle without multiple peaks.

Hubbert noted frequently that these were only simplifying assumptions to allow tractable analysis.

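Hubbert’s model for production is defined mathematically as follows:

$$Q(t)=\frac{Q_{\infty}}{1+\mathrm{e}^{-a(t-t_{\mathrm{m}})}},\qquad Q'(t)=\frac{a\,Q_{\infty}\,\mathrm{e}^{-a(t-t_{\mathrm{m}})}}{\left(1+\mathrm{e}^{-a(t-t_{\mathrm{m}})}\right)^{2}},$$

where *Q*(*t*) is cumulative production, *Q*^{′}(*t*) is production, *Q*_{∞} is the URR, *a* defines the ‘steepness’ of the cumulative production curve and *t*_{m} specifies the time when cumulative production reaches one-half of the URR.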

The diagrams show the resulting cumulative production cycle (left) and production cycle (right) for three different values of the regional URR.
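The behaviour described in the box can be checked numerically. The following is a minimal sketch, assuming the standard three-parameter logistic form; the function names and the parameter values (a URR of 200 Gb, a steepness of 0.07 per year and a midpoint of 1970) are illustrative assumptions and are not fitted to any dataset.

```python
import math

def cumulative_production(t, urr, a, t_m):
    """Logistic cumulative production Q(t); tends to `urr` as t grows."""
    return urr / (1.0 + math.exp(-a * (t - t_m)))

def production(t, urr, a, t_m):
    """Rate of production Q'(t): the first derivative of the logistic."""
    e = math.exp(-a * (t - t_m))
    return a * urr * e / (1.0 + e) ** 2

# Illustrative (not fitted) parameters: URR in Gb, a in 1/yr, t_m as a year.
URR, A, T_M = 200.0, 0.07, 1970.0

years = range(1850, 2101)
# Year in which annual production is highest.
peak_year = max(years, key=lambda y: production(y, URR, A, T_M))
# Summing annual production approximates the area under the production cycle.
area = sum(production(y, URR, A, T_M) for y in years)

print(peak_year, cumulative_production(T_M, URR, A, T_M), round(area, 1))
```

Under these assumptions the peak falls at the midpoint year, cumulative production at that date is half of the URR, and the summed annual production approaches the URR, which are the symmetry and area properties listed above.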

The curve-fitting technique is popular because it is simple to apply and uses data on oil production and reserves for aggregate regions (such as the USA) that are either available in the public domain or obtainable from commercial sources at relatively low cost. By contrast, the methods used by the US Geological Survey and others are complex, resource-intensive and require data on the production and reserves for individual fields, which are usually confidential and/or extremely expensive to obtain [35]. However, there are important similarities between the two approaches [36], with both assuming that there are diminishing returns to technological change and that physical depletion will ultimately cause production to decline.

Curve-fitting methods are informed by two well-established observations, namely: (i) in most oil-producing regions, the majority of oil tends to be located in a small number of large fields; and (ii) these fields tend to be discovered relatively early, with subsequent discoveries being progressively smaller and requiring more effort to locate.^{3} At some point, the additional production from small fields that are discovered relatively late will become insufficient to compensate for the decline in production from large fields that were discovered relatively early, leading to a regional peak in production [17,38]. This general pattern is illustrated by the production history of the UK continental shelf (UKCS) (figure 3) and should be reflected in the ‘shape’ of both the cumulative discovery and cumulative production cycles. For example, in the region illustrated in figure 4, the early discovery of large fields led to a rapid increase in cumulative discoveries, but, as the average size of new discoveries fell, the rate of discovery slowed. At some point, the average size of new discoveries will approach the limit of economic viability; then exploration will cease and the cumulative discovery curve will reach an asymptote representing the regional URR.

Curve-fitting is best applied to geologically homogeneous areas that have had a relatively unrestricted exploration history (e.g. without areas being closed to exploration for legal or political reasons). If this is not done, the mixing of different populations of fields or the opening up of new areas for exploration can lead to inconsistencies in the time series and undermine the basis for extrapolating historical trends [39]. But even if the region is geologically homogeneous, various technical, economic and political factors can lead to structural breaks in a time series, creating considerable difficulties in the use of fitted curves [40].^{4} Applications of curve-fitting techniques commonly neglect such factors [42,43].

The following section defines the primary variables used in curve-fitting techniques, clarifies the mathematical relationships between these variables, highlights the nature and importance of the so-called *reserve growth* and shows how the neglect of reserve growth can lead to error. The aim is to update and extend Hubbert’s analytical presentation of curve-fitting techniques [18,19].

## 3. Definitions, data sources and mathematical relationships

### (a) Definitions

Curve-fitting techniques regress an explained variable onto an explanatory variable with little or no use of covariates. The explained variable may be *oil* *production* or *oil* *discovery*, measured either in cumulative terms or as a rate of change, while the explanatory variable is either *time* or some measure of *exploratory* *effort*, for example the number of wells drilled. It is also possible to regress the rate of production onto cumulative production and the rate of discovery onto cumulative discovery. This gives nine types of curve-fitting technique, falling into three broad groups (table 1). The data needed to apply these techniques are available from both public and commercial sources, although the latter are expensive and subject to confidentiality requirements. These data sources also differ in their treatment of revisions to reserve estimates, making it important to distinguish between *current* and *backdated* estimates of oil discoveries. A mathematical notation to represent production and discovery data is summarized in table 2.

Oil production, discoveries and reserves are commonly reported on a volumetric basis (barrels of oil), although this can be misleading owing to variations in energy density [44]. Oil *reserves* are those quantities of oil in known fields which are considered to be technically possible and economically feasible to extract under defined conditions. Reserve estimates are inherently uncertain and are commonly quoted to two levels of confidence, namely *proved* reserves (1P) and *proved and probable* reserves (2P). These terms are defined and interpreted in different ways by different bodies, but commonly imply a 90% and 50% probability, respectively, of recovered resources exceeding the stated figure [45]. Only a subset of global reserves is subject to formal reporting requirements and this is largely confined to the reporting of highly conservative 1P data for aggregate regions. Country-level 2P reserve estimates are available at cost from commercial sources, but only 1P data are available for the USA, and subnational and field-level data are largely inaccessible. Most reserve estimates are neither audited nor reliable, so analysts must rely upon assumptions whose level of confidence is inversely proportional to their importance—being the lowest for those countries that hold the majority of the world’s reserves.^{5}

Reserve estimates of known fields commonly undergo repeated *revisions* as a result of cumulative production, better geological understanding, improvements in extraction technology, variations in economic conditions and changes in reporting practices. Changes in reserve estimates over time are commonly referred to as *reserve additions*, although the changes could be either positive or negative. Public-domain data sources, such as the BP Statistical Review, record reserve revisions in the year in which they are made and make no adjustment to the data for earlier years, whereas commercial data sources, such as IHS Energy, *backdate* the revisions to the year in which the relevant fields were discovered—thereby providing a more accurate indication of what was ‘actually’ found at that time. Both of these approaches have their merits, but the difference between them is not always appreciated (figure 5).

The *cumulative discoveries* in a region at a particular point in time may be estimated from the sum of cumulative production and declared reserves (figure 6). Cumulative discoveries are not changed by production because this merely transfers resources from one category (reserves) to another (cumulative production). However, cumulative discoveries may be either increased or reduced by revisions to the reserve estimates for known fields. This is commonly referred to as *reserve growth* because estimates are normally revised upwards rather than downwards [47]. However, a more accurate term is *cumulative discovery growth*, because reserves are continually being depleted by production. When backdated data are used, the estimated volume of discoveries made in a given year is constantly being revised, leading to an upward shift of the cumulative discovery curve over time (figure 7). This reserve growth is significant at both the regional and global levels: for example, using commercial data sources, we estimate that reserve growth added an average of 33 Gb yr^{−1} to global 2P reserves over the period 2000–2007, or more than twice that added through new discoveries (approx. 15 Gb yr^{−1}; figure 8) [17].

The estimated URR for an oil-producing region represents the sum of cumulative discoveries, the anticipated future reserve growth at known fields and the estimated undiscovered resources (figure 6). All of these are uncertain, but the level of uncertainty should reduce as the region is explored and as cumulative discoveries approach the URR. Production forecasts frequently rely upon estimates of regional and global URR, but these are an enduring focus of dispute. For example, we compared 14 contemporary forecasts of global oil production and found estimates of the global URR for conventional oil ranging from 1840 to 3577 Gb (compared to cumulative production of 1128 Gb through to 2008) [48].

### (b) The production and discovery cycles

Let *Q*(*t*) represent the *cumulative production* from a region as a function of time (*t*). The *rate of production* (*Q*′(*t*)), or more simply *production*, is given by

$$Q'(t)=\frac{\mathrm{d}Q(t)}{\mathrm{d}t}. \tag{3.1}$$

This is commonly measured on an annual basis. A plot of *Q*′(*t*) from when production begins to when it finally ends represents a full *production cycle* and must include one or more maxima. The URR for the region (*Q*_{∞}) is given by the integral of *Q*′(*t*) over this cycle:

$$Q_{\infty}=\int_{0}^{\infty}Q'(t)\,\mathrm{d}t. \tag{3.2}$$

Similarly, let *D*(*t*) represent the *cumulative discovery* in a region as a function of time. This is given by the sum of cumulative production (*Q*(*t*)) and reported reserves (*R*(*t*)): *D*(*t*)=*Q*(*t*)+*R*(*t*). The rate of discovery (*D*′(*t*)), or more simply *discovery*, is then given by

$$D'(t)=\frac{\mathrm{d}D(t)}{\mathrm{d}t}. \tag{3.3}$$

As defined here, *discovery* means the change in cumulative discoveries from one period to the next. This is not necessarily the same as the estimated resources contained in newly discovered fields owing to reserve growth at existing fields. Hence, discovery may be positive even if no new fields are found.
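The distinction between discovery and newly found fields can be illustrated with a small calculation. The sketch below uses hypothetical annual figures (in Gb) for a region in which no new fields are found and production is 1 Gb per year; the variable names and values are assumptions chosen for illustration only.

```python
# Hypothetical annual series (Gb) for a region with no new field discoveries.
cumulative_production = [10.0, 11.0, 12.0, 13.0]  # Q(t): 1 Gb produced each year
reported_reserves = [5.0, 4.5, 4.5, 3.2]          # R(t): reserves as reported each year

# Cumulative discovery D(t) = Q(t) + R(t).
cumulative_discovery = [
    q + r for q, r in zip(cumulative_production, reported_reserves)
]

# Discovery D'(t), approximated by the year-on-year change in D(t).
discovery = [
    later - earlier
    for earlier, later in zip(cumulative_discovery, cumulative_discovery[1:])
]

for change in discovery:
    print(round(change, 2))
```

In the first two intervals reserves fall by less than the volume produced, so the reserve estimates for existing fields must have been revised upwards and discovery is positive. In the last interval reserves fall by more than production, which implies a downward revision and negative discovery.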

A plot of *D*′(*t*) from when discovery begins to when it finally ends represents a full *discovery* *cycle*. This is unlikely to take the same shape as the production cycle, but must also include one or more maxima. As oil has to be discovered before it can be produced, the discovery cycle will precede the production cycle (*D*(*t*)≥*Q*(*t*)), but the interval between discovery and production will vary between fields and regions and over time. The URR (*Q*_{∞}) is given by

$$Q_{\infty}=\int_{0}^{\infty}D'(t)\,\mathrm{d}t. \tag{3.4}$$

Generally, reserve growth makes a bigger contribution to reserve additions in the later stages of a region’s development and cumulative discoveries may continue to grow for many years after the last field is discovered.

### (c) The backdated discovery cycle

Curve-fitting studies commonly use data from commercial sources, for example IHS Energy [49]. To highlight how such data differ from publicly available data, it is helpful to define a measure of *backdated cumulative discoveries* (*B*), which is a function of both the time of discovery (*t*_{d}) and the time at which the estimate was made (*t*): *B*(*t*_{d},*t*) with *t*≥*t*_{d}. Hence, *B*(*t*_{d},*t*) represents the cumulative discoveries up to time *t*_{d} as estimated at a later time *t*. These estimates are made with the benefit of hindsight and are usually larger than the estimates made at time *t*_{d}, with the estimates for each field typically being larger than those made when the field was discovered. Backdated cumulative discoveries up to time *t*_{d} can be written as

$$B(t_{\mathrm{d}},t)=Q(t_{\mathrm{d}},t)+R(t_{\mathrm{d}},t), \tag{3.5}$$

where *Q*(*t*_{d},*t*) represents cumulative production up to time *t* from the fields discovered before *t*_{d} and *R*(*t*_{d},*t*) represents the remaining reserves at those fields as estimated at time *t*.

Estimates of backdated cumulative discoveries (*B*(*t*_{d},*t*)) will increase with the time of discovery (*t*_{d}). The *backdated rate of discovery* (*B*′_{td}(*t*_{d},*t*)), or *backdated* *discovery*, is then given by

$$B'_{t_{\mathrm{d}}}(t_{\mathrm{d}},t)=\frac{\partial B(t_{\mathrm{d}},t)}{\partial t_{\mathrm{d}}}. \tag{3.6}$$

A plot of *B*′_{td}(*t*_{d},*t*) versus *t*_{d} for a particular value of *t* represents a *backdated discovery cycle*. As a consequence of reserve growth, backdated discovery estimates will typically increase with the time of the resource estimate (*t*): i.e. *B*(*t*_{d},*t*+*τ*)≥*B*(*t*_{d},*t*). Hence, the estimated discoveries in each time interval will depend upon the interval between discovery and the resource estimate (*τ*=*t*−*t*_{d}). As *t*→∞, reserve growth will cease, allowing the backdated discovery cycle to be interpreted as the URR that were discovered in each time interval (*B*′_{td}(*t*_{d},∞)). The regional URR (*Q*_{∞}) is then given by

$$Q_{\infty}=\int_{0}^{\infty}B'_{t_{\mathrm{d}}}(t_{\mathrm{d}},\infty)\,\mathrm{d}t_{\mathrm{d}}. \tag{3.7}$$

The rate of change of backdated discovery estimates with respect to the time of the estimate is given by

$$B'_{t}(t_{\mathrm{d}},t)=\frac{\partial B(t_{\mathrm{d}},t)}{\partial t}. \tag{3.8}$$

This can be normalized to the time interval between discovery and estimate (*τ*=*t*−*t*_{d}) to give a *growth function*

$$G(t_{\mathrm{d}},\tau)=\frac{B'_{t_{\mathrm{d}}}(t_{\mathrm{d}},t_{\mathrm{d}}+\tau)}{B'_{t_{\mathrm{d}}}(t_{\mathrm{d}},t_{\mathrm{d}})}. \tag{3.9}$$

For fields discovered at time *t*_{d}, a plot of *G*(*t*_{d},*τ*) versus *τ* represents the subsequent change in the estimated size of those fields relative to the initial estimate. As an illustration, figure 9 shows two growth functions estimated for onshore US oil fields using 1P reserve data [50,51]. Both functions show rapid growth in the years immediately following discovery and, although growth subsequently slows down, it is still continuing some 80 years later.^{6} We would generally expect cumulative discovery estimates to increase over time, but in some cases and/or time intervals they may decrease. For simplicity, reserve growth is commonly assumed to be independent of the time of discovery of a field (*G*(*τ*)), but in many cases this may be incorrect. Also, discovery estimates based upon 1P reserve data are likely to grow more than those based on 2P data, because the former represents an extremely conservative estimate of recoverable resources (e.g. with a 90% probability of actual recovery being greater).

Owing to future reserve growth, backdated discovery estimates (*B*′_{td}(*t*_{d},*t*)) will typically underestimate the URR found in any time interval. The amount of underestimation will depend upon the amount of growth remaining, and hence the interval since discovery (*τ*). Also, cumulative discovery estimates (*B*(*t*_{d},*t*)) represent the sum of estimates from fields that were discovered at different times and have experienced different amounts of reserve growth. As a result, plots of backdated discoveries (*B*′_{td}(*t*_{d},*t*)) and backdated cumulative discoveries (*B*(*t*_{d},*t*)) can be misleading because the sizes of fields discovered at different times (*t*_{d}) have not been estimated on a consistent basis. Hence, to provide an accurate estimate of URR, backdated discovery estimates (*B*′_{td}(*t*_{d},*t*)) need to be *adjusted* to allow for future reserve growth using an estimated growth function (e.g. figure 9). Assuming first that the growth function is independent of the time of discovery (*G*(*τ*)), and second that a sufficiently long time series is available to allow *G*(∞) to be estimated, the relevant formula is

$$B'_{t_{\mathrm{d}}}(t_{\mathrm{d}},\infty)=B'_{t_{\mathrm{d}}}(t_{\mathrm{d}},t)\,\frac{G(\infty)}{G(t-t_{\mathrm{d}})}. \tag{3.10}$$

Despite the importance of such adjustments, they are rarely made, owing to either insufficient data or an erroneous belief that they are unnecessary. Moreover, cumulative discovery estimates derived from public-domain data sources (*D*′(*t*)) cannot be adjusted in this way, which is one reason why these data provide a less reliable basis for estimating the URR.
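The adjustment can be illustrated with a short calculation. The sketch below assumes that each discovery-year estimate is scaled by the ratio of the long-run growth multiplier to the multiplier already reached at the time of the estimate, and that the growth function does not depend on the year of discovery. The tabulated multipliers, the assumed plateau after 40 years, the linear interpolation and the discovery volumes are all hypothetical.

```python
# Hypothetical growth function G(tau): estimated field size relative to the
# initial estimate, tau years after discovery. Values are illustrative only.
GROWTH = {0: 1.0, 10: 2.0, 20: 2.5, 30: 2.8, 40: 3.0}
G_INFINITY = 3.0  # assumed plateau: no further growth beyond 40 years

def growth_multiplier(tau):
    """G(tau), linearly interpolated between the tabulated decades."""
    if tau >= 40:
        return G_INFINITY
    lower = (tau // 10) * 10
    upper = lower + 10
    weight = (tau - lower) / 10
    return GROWTH[lower] * (1 - weight) + GROWTH[upper] * weight

def adjust(backdated, estimate_year):
    """Scale each discovery-year volume by G(infinity) / G(tau)."""
    return {
        year: volume * G_INFINITY / growth_multiplier(estimate_year - year)
        for year, volume in backdated.items()
    }

# Hypothetical backdated discoveries (Gb) by discovery year, as estimated in 2010.
backdated_2010 = {1970: 6.0, 1990: 2.5, 2005: 0.5}
adjusted = adjust(backdated_2010, 2010)

for year in sorted(adjusted):
    print(year, backdated_2010[year], round(adjusted[year], 2))
```

In this example the oldest discoveries are left unchanged because they are assumed to have finished growing, while the most recent ones are scaled up the most. The unadjusted series therefore understates recent discoveries relative to earlier ones.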

### (d) Discovery and exploratory effort

In practice, the rate of discovery will be influenced by a variety of economic and political factors that could invalidate the extrapolation of historical trends. For example, the rate of discovery may fall as a result of economic recession rather than through depletion of the resource. A preferable approach is therefore to measure discovery as a function of some measure of *exploratory* *effort* (*ε*)—for example, the cumulative number of wells drilled—which should be less sensitive to economic and political influences [54]. For example, a recession could reduce exploratory activity as well as the number of new discoveries, with the result that the rate of discovery per unit of exploratory effort could remain relatively unchanged.^{7}

Let *D*(*ε*) represent the total amount of oil discovered for a cumulative level of exploratory effort (*ε*). The rate of change of cumulative discovery with respect to cumulative exploratory effort is then given by

$$D'(\varepsilon)=\frac{\mathrm{d}D(\varepsilon)}{\mathrm{d}\varepsilon}. \tag{3.11}$$

As exploration proceeds, the rate of discovery should fall and cumulative discoveries should approach the URR for the region (*D*(*ε*)→*Q*_{∞} as *ε*→∞). However, a variety of factors will influence the observed trend, including technologies that increase the success rate of exploratory drilling.

In a similar manner, *backdated cumulative discoveries* (*B*) may be expressed as a function of the cumulative level of exploratory effort (*ε*_{d}) and the time (*t*) at which the estimate was made: *B*(*ε*_{d},*t*), where *t*≥*t*_{d} and *t*_{d} is coincident with *ε*_{d}.^{8} Hence, *B*(*ε*_{d},*t*) represents the cumulative discoveries contained in fields that were discovered through to cumulative exploratory effort *ε*_{d} as estimated at time *t*. The *backdated discovery rate with respect to cumulative exploratory effort* is then given by

$$B'_{\varepsilon_{\mathrm{d}}}(\varepsilon_{\mathrm{d}},t)=\frac{\partial B(\varepsilon_{\mathrm{d}},t)}{\partial \varepsilon_{\mathrm{d}}}. \tag{3.12}$$

A plot of *B*′_{εd}(*ε*_{d},*t*) versus *ε*_{d} for a particular value of *t* represents a *backdated discovery cycle with respect to exploratory effort*. The estimated size of *B*′_{εd}(*ε*_{d},*t*) will depend upon the interval between the time required for cumulative exploratory effort of *ε*_{d} (*t*_{d}) and the time at which the resource estimate is made (*t*), i.e. *τ*=*t*−*t*_{d}. As *t*→∞, the plot of *B*′_{εd}(*ε*_{d},*t*) versus *ε*_{d} can be interpreted as the URR that were discovered in each ‘exploratory effort interval’, commonly referred to as the *yield*. The backdated discovery cycle with respect to exploratory effort (*B*′_{εd}(*ε*_{d},*t*)) is then termed the *yield per effort* (YPE) curve [54], although this term is strictly only applicable as *t*→∞. To provide an accurate estimate of *yield*, the estimates of *B*′_{εd}(*ε*_{d},*t*) should be adjusted to allow for future reserve growth:

$$B'_{\varepsilon_{\mathrm{d}}}(\varepsilon_{\mathrm{d}},\infty)=B'_{\varepsilon_{\mathrm{d}}}(\varepsilon_{\mathrm{d}},t)\,\frac{G(\infty)}{G(t-t_{\mathrm{d}})}. \tag{3.13}$$

The URR for the region (*Q*_{∞}) is then given by

$$Q_{\infty}=\int_{0}^{\infty}B'_{\varepsilon_{\mathrm{d}}}(\varepsilon_{\mathrm{d}},\infty)\,\mathrm{d}\varepsilon_{\mathrm{d}}. \tag{3.14}$$

### (e) Summary

This section has defined the variables employed in the curve-fitting technique and clarified the relationships between them (table 3). While curve-fitting to production trends is relatively straightforward, the analysis of discovery trends is greatly complicated by competing reserve definitions, uncertain reserve estimates and reserve growth. In principle, backdated estimates of cumulative 2P discoveries (*B*(*t*_{d},*t*)) should provide a more reliable basis for estimating the regional URR than current estimates of cumulative 1P discoveries (*D*(*t*)) [49]. This is, first, because 2P data provide a more accurate estimate of remaining recoverable resources, and, second, because backdating allows a more accurate estimate of the resources found in each time (or exploratory effort) interval. As an illustration, figure 10 compares these two time series for the UKCS. The ‘backdated 2P’ curve indicates diminishing returns to exploration and appears to be trending towards an asymptote, which can be taken as an estimate of the regional URR. By contrast, the ‘current 1P’ curve shows no such trend, despite the diminishing returns to exploration, the rapidly decreasing size of newly discovered fields and the fact that the UK is well past its production peak (figure 3). This difference has led Bentley [49] and others to argue that only backdated 2P estimates are suitable for studying resource depletion. However, these estimates are only available from commercial sources, and to accurately estimate the URR, they should be corrected to allow for future reserve growth using an estimated growth function (equation (3.10) and figure 9).

The following section describes the historical origin and contemporary application of each group of curve-fitting techniques and examines their strengths and weaknesses in more detail.

## 4. Overview, application and evaluation of curve-fitting techniques

### (a) Production over time techniques

The simplest method of estimating URR uses nonlinear regression^{9} to fit a curve to time-series data on cumulative production (*Q*(*t*)). This curve may take a variety of forms, with its shape being defined by three or more parameters, one of which corresponds to the URR. In Hubbert’s logistic model (box 1),^{10} the curve is defined by three parameters, representing the URR, the ‘steepness’ of the curve and the midpoint of the growth trajectory. The URR may be estimated by fitting a curve to either cumulative or annual^{11} production, although the result may not be the same [58].
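The fitting step can be sketched without specialist software. The example below is one simple way of performing the least-squares fit and is not the procedure used in any of the cited studies: because the logistic is linear in the URR, the best URR for any trial steepness and midpoint has a closed form, so a grid search over the other two parameters is sufficient. The data are synthetic and noise-free, and the function names, grids and parameter values are assumptions made for illustration.

```python
import math

def logistic(t, urr, a, t_m):
    """Three-parameter logistic for cumulative production."""
    return urr / (1.0 + math.exp(-a * (t - t_m)))

def fit_logistic(years, cumulative, a_grid, tm_grid):
    """Least-squares estimates of (urr, a, t_m) by grid search.

    For each trial (a, t_m) the best URR is the closed-form least-squares
    scale factor, because the logistic is linear in URR.
    """
    best = None
    for a in a_grid:
        for t_m in tm_grid:
            shape = [logistic(t, 1.0, a, t_m) for t in years]
            urr = (sum(q * s for q, s in zip(cumulative, shape))
                   / sum(s * s for s in shape))
            sse = sum((q - urr * s) ** 2 for q, s in zip(cumulative, shape))
            if best is None or sse < best[0]:
                best = (sse, urr, a, t_m)
    return best[1:]

# Synthetic data from known parameters, ending 20 years after the midpoint.
years = list(range(1900, 1991))
data = [logistic(t, 200.0, 0.07, 1970.0) for t in years]

a_grid = [0.05 + 0.005 * i for i in range(11)]  # 0.050 to 0.100 per year
tm_grid = [1950.0 + i for i in range(41)]       # 1950 to 1990
urr_hat, a_hat, tm_hat = fit_logistic(years, data, a_grid, tm_grid)

print(round(urr_hat, 3), round(a_hat, 3), tm_hat)
```

With noise-free data that extend beyond the midpoint, the search recovers the generating parameters. Real production series are noisy and may not follow a logistic, so the recovered URR would be correspondingly less certain.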

Curve-fitting to production trends should be more reliable if production has passed its peak and is only viable if the rate of increase in production has passed its peak (i.e. the point of inflection on the rising production trend). US data fit the logistic model very well (figure 2) despite covering a period that includes two world wars, several recessions, two oil shocks, revolutionary changes in technology and the opening up of new oil-producing regions. By contrast, the logistic model provides a relatively poor approximation to the production cycle for most other oil-producing regions [32].

The logistic is one of a family of symmetric and asymmetric curves that are widely used to model bounded growth processes—where initial rapid growth subsequently slows down owing to some limiting factor [59,60]. Alternatives include the generalized logistic [61], Bass [62], Gompertz [63] and bi-logistic [15] as well as the cumulative lognormal, Cauchy and Weibull distributions [34,59]. Brandt [32] analysed 74 oil-producing regions and found that the production cycle was asymmetric to the left in over 90% of cases (i.e. production climbed rapidly to a peak and declined more slowly after the peak). The corresponding cumulative production cycle can be modelled with a Gompertz curve, among others, but when Moore [64] fitted this to US data he obtained a URR estimate that was almost twice as large as that from the logistic model for a comparable goodness of fit. Very similar results from US production data were later obtained by Wiorkowski [34]^{12} and Cleveland & Kaufmann [27].^{13} This highlights a generic weakness of curve-fitting techniques, namely: *different functional forms often fit the data comparably well but give very different estimates of the URR* [65].
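The difference in symmetry between the logistic and Gompertz curves can be made explicit by comparing the level of depletion at which production peaks. The derivation below assumes the standard three-parameter Gompertz form; the symbols *b* and *c* are notation introduced here for illustration, and *Q*_{∞} denotes the URR.

$$
\begin{aligned}
\text{Logistic:}\quad & Q'(t)=a\,Q\left(1-\frac{Q}{Q_{\infty}}\right)
&&\Rightarrow\quad \text{peak at } Q=\tfrac{1}{2}\,Q_{\infty},\\
\text{Gompertz:}\quad & Q(t)=Q_{\infty}\exp\!\left(-b\,\mathrm{e}^{-ct}\right),\quad
Q'(t)=c\,Q\,\ln\!\left(\frac{Q_{\infty}}{Q}\right)
&&\Rightarrow\quad \text{peak at } Q=\frac{Q_{\infty}}{\mathrm{e}}\approx 0.37\,Q_{\infty}.
\end{aligned}
$$

Under the Gompertz form, production peaks when only about 37% of the URR has been produced, so the decline is slower than the rise. A given level of cumulative production at the peak therefore implies a larger URR than it would under the logistic.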

In practice, production cycles often have more than one peak as a result of economic, technical or political changes or the opening up of a new region [3]. For example, Illinois experienced two production cycles as a consequence of early developments in exploration technology [1]. Laherrère [2] argues that most countries have several cycles of discovery and production and are best modelled by two or more curves. Several authors have followed this approach [29,30,66–68] and any cumulative production trend could in principle be decomposed into the sum of an arbitrary number of logistic curves [16]. But the better fit of a more complex model may not be justified statistically, owing to the risk of ‘over-fitting’ [5,69]. Also, the results will be unreliable if further cycles are expected in the future. This highlights the second generic weakness of curve-fitting techniques, namely *their inability to anticipate future cycles of discovery and production in aggregate regions*.

If cumulative production grows logistically, a plot of the ratio of production to cumulative production (*Q*′(*t*)/*Q*(*t*)) as a function of cumulative production (*Q*(*t*)) should be approximately linear [18]. If a linear regression is fitted to these data, the URR may be estimated by extrapolating and identifying the intersection with the cumulative production axis (figure 11). This ‘production decline curve’ (or ‘Hubbert linearization’) technique was popularized by Deffeyes [70] and is widely used because it only requires linear regression. However, as it is equivalent to using nonlinear regression to fit a logistic curve to cumulative production [36], it will be unreliable if (as is usually the case) cumulative production departs from the logistic model. In addition, because the ‘explained’ and explanatory variables are not independent, the errors cannot be normally distributed.
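The linearization follows from the logistic growth equation: if *Q*′ = *aQ*(1 − *Q*/URR), then *Q*′/*Q* = *a* − (*a*/URR)*Q*, which is a straight line that crosses the cumulative production axis at the URR. The sketch below applies ordinary least squares to synthetic data generated from an exact logistic; the helper function and parameter values are assumptions made for illustration.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

# Synthetic, noise-free logistic data (illustrative parameters only).
URR, A, T_M = 200.0, 0.07, 1970.0
years = range(1950, 1991)
cumulative = [URR / (1.0 + math.exp(-A * (t - T_M))) for t in years]
rate = [A * q * (1.0 - q / URR) for q in cumulative]  # exact logistic Q'(t)

# Regress Q'/Q on Q and extrapolate to the point where the ratio reaches zero.
ratio = [r / q for r, q in zip(rate, cumulative)]
intercept, slope = linear_fit(cumulative, ratio)
urr_estimate = -intercept / slope

print(round(urr_estimate, 3), round(intercept, 4))
```

Because the synthetic series is exactly logistic, the regression recovers the URR used to generate it. With observed data the points scatter around the line and may curve away from it, which is the source of the unreliability noted above.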

In summary, curve-fitting to production trends is straightforward and relies upon data that are readily available, relatively accurate and free from the complications of reserve growth. But while these techniques may sometimes provide reliable estimates in regions that are well past their peak of production, they have important drawbacks, including: the lack of a robust theoretical basis for the choice of functional form; the apparent sensitivity of the estimates to that choice despite comparable goodness of fit; the risk of ‘over-fitting’ multi-cycle models; the inability to anticipate future production cycles; and the neglect of economic, political and other variables that have shaped and will continue to shape the production cycle. These drawbacks are also shared by the discovery-based techniques described below.

### (b) Discovery over time techniques

Curve-fitting to discovery trends was first introduced by Hubbert [18,71,72] and has since been employed by other authors, most notably Laherrère [2,26,73,74]. These techniques have much in common with those described above and raise a comparable set of issues and concerns. In principle, the extrapolation of discovery trends should provide more reliable estimates of the URR because the discovery cycle is more advanced. However, discovery estimates are much less certain than production estimates, as reserves are estimated to different levels of confidence and are subject to the complications of reserve growth (§3).

Hubbert’s discovery projections were based upon the idealized life cycle model illustrated in figure 12 [57]. Hubbert assumed that both cumulative discovery (*D*(*t*)) and cumulative production (*Q*(*t*)) grew logistically, with the former preceding the latter by some time interval. As the peak rate of discovery precedes the peak in production, identification of the former could form a basis for predicting the latter.

Hubbert [71] fitted a logistic curve to US data on cumulative proved discoveries (*D*(*t*)) and used this to estimate a URR of 170 Gb. Subsequent studies confirmed this estimate [18,75–78], but several authors questioned Hubbert’s results. For example, Ryan [65,79] fitted logistic curves to US production and discovery data and found that they led to widely different estimates for the URR. He also showed that much larger estimates could be cited with equal justification and that the estimates increased rapidly with the addition of only a few more years of data. Cavallo [80] recreated Hubbert’s original dataset and found that the *R*^{2} for the best-fitting models changed only from 0.9946 to 0.9991, as the value of URR varied from 150 to 600 Gb. Similarly, Cleveland & Kaufmann [27] found Hubbert’s results to be highly sensitive to the length of data series chosen.
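The weak link between goodness of fit and the assumed URR can be illustrated with synthetic data. The sketch below does not use Hubbert's or Cavallo's dataset: it generates a pre-peak logistic discovery history with an assumed URR of 200, holds the URR at several trial values, fits the remaining two logistic parameters by a linearized least-squares regression (a simplification of nonlinear regression) and reports the *R*^{2} of each fit. All names and numbers are hypothetical.

```python
import math

def linear_fit(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(observed, fitted):
    mean = sum(observed) / len(observed)
    sse = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    sst = sum((o - mean) ** 2 for o in observed)
    return 1.0 - sse / sst

def fit_logistic_given_urr(t, d, urr):
    """With URR fixed, ln(URR/D - 1) = a + b*t is linear in t; return fitted D."""
    a, b = linear_fit(t, [math.log(urr / di - 1.0) for di in d])
    return [urr / (1.0 + math.exp(a + b * ti)) for ti in t]

# Synthetic history: true URR = 200, observed only up to about 18% of it.
t = list(range(46))
d = [200.0 / (1.0 + math.exp(-0.1 * (ti - 60.0))) for ti in t]

r2_by_urr = {
    urr: r_squared(d, fit_logistic_given_urr(t, d, urr))
    for urr in (150.0, 200.0, 300.0, 600.0)
}
for urr, r2 in r2_by_urr.items():
    print(f"URR fixed at {urr:5.0f}: R^2 = {r2:.4f}")
```

Because the sample stops well before the inflection point, a fourfold range of assumed URR values yields fits that are hard to tell apart on *R*^{2} alone.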

The assumption that the discovery cycle takes the same form as the production cycle appears neither necessary nor plausible—although it works fairly well for the USA when 1P data are used.^{14} The factors influencing discovery at different points in time are likely to be different from those influencing production at a later point in time and the skewed field-size distribution would be expected to (and frequently does) lead to a sharply rising cumulative discovery cycle and an asymmetric discovery cycle [81]. For similar reasons, there is unlikely to be a predictable time lag between the peaks of discovery and production.

While Hubbert used public-domain data on current 1P reserves to form his cumulative discovery estimates (*D*(*t*)), subsequent authors have used commercial data on backdated 2P discoveries (*B*(*t*_{d},*t*)) [31,49,82]. Since 2P estimates are generally larger than 1P estimates, they should lead to a larger estimate of the URR. A discovery cycle based upon backdated estimates will be of a different shape from one based upon current estimates and will have a different date for the peak in discoveries.

As described above, backdated 2P estimates should be more suitable for estimating the URR because the cumulative discovery curve is more likely to trend to an asymptote (figure 10). But they can also be misleading because the sizes of fields discovered at different times will not have been estimated on a consistent basis and both the height and shape of the cumulative discovery curve will change over time (figure 4). Hence, to provide reliable estimates of the URR, backdated discovery data should be adjusted to allow for future reserve growth [83,84].^{15} Despite this, adjustments appear to be the exception rather than the rule. Campbell & Laherrère claim that adjustments are unnecessary, because 2P estimates represent median estimates of recoverable resources and hence should be as likely to decrease as to increase following field discovery. However, this expectation is, first, inconsistent with the available evidence, which suggests that 2P estimates normally grow substantially (figure 8) [17], and, second, contradictory because if 2P estimates are relatively stable there should be no advantage in backdating.
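The adjustment for future reserve growth amounts to scaling each discovery vintage by the growth it is still expected to experience. A minimal sketch follows, assuming a purely hypothetical cumulative growth function in which a field's estimated size rises from its initial estimate towards `1 + g_max` times that estimate with time constant `tau`. The function, its parameters and the example volumes are placeholders introduced here; they are not Hubbert's growth function or any published one.

```python
import math

def growth_multiple(age, g_max=3.0, tau=15.0):
    """Hypothetical: estimated size at `age` years as a multiple of the initial estimate."""
    return 1.0 + g_max * (1.0 - math.exp(-age / tau))

def remaining_growth_factor(age, g_max=3.0, tau=15.0):
    """Factor by which an estimate of the given age is still expected to grow."""
    return (1.0 + g_max) / growth_multiple(age, g_max, tau)

def correct_backdated(discoveries, data_year, g_max=3.0, tau=15.0):
    """Scale backdated discoveries {discovery_year: volume as estimated in data_year}."""
    return {
        year: volume * remaining_growth_factor(data_year - year, g_max, tau)
        for year, volume in discoveries.items()
    }

# Illustrative backdated volumes by discovery year, as estimated in 2007.
backdated = {1950: 10.0, 1980: 6.0, 2005: 2.0}
corrected = correct_backdated(backdated, data_year=2007)

for year in sorted(backdated):
    print(year, backdated[year], round(corrected[year], 2))
```

Recent vintages receive the largest correction and old vintages almost none, so the adjustment changes the shape of the cumulative discovery curve as well as its height. The result is only as good as the growth function, which is the difficulty that Nehring's study illustrates.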

The complications introduced by reserve growth are illustrated by Nehring’s study of the Permian Basin (Texas and New Mexico) and San Joaquin Valley (California) in the USA—which have both been producing oil for more than 80 years [81,85,86]. Nehring employs backdated cumulative 1P discovery estimates (*B*(*t*_{d},*t*)) and corrects these with Hubbert’s [72] growth function to estimate the ultimate resources discovered in each time interval. When using data through to 1964, the corrected cumulative discovery curve for the Permian Basin suggests a URR of 27.5 Gb, compared with only 19 Gb with the uncorrected data. But when using data through to 2000, the URR estimate is 37% larger and the estimated date of peak discovery has moved back in time. While Hubbert’s growth function predicts substantial reserve growth, it nevertheless underestimates the growth that actually occurred—especially for the older fields. Nehring [85] comments:
‘… the continuous upward movement in the [corrected] cumulative discovery curve makes this curve useless as a tool for predicting the ultimate recovery. Estimates of ultimate recovery derived from cumulative discovery curves are only valid if one can guarantee that there will be no further increases in the ultimate recovery of discovered fields… no such guarantee can be made’.

As Nehring employs 1P reserve data, his conclusions may not be applicable to studies using 2P data. Also, Nehring relies upon a growth function that is nearly 40 years old, and he only applies this to the most recent 30 years of data—despite more recent growth functions being available [50,87]. Nevertheless, Nehring’s results provide a powerful illustration of the importance of reserve growth and demonstrate how it can lead to error even when attempts are made to correct for it.

### (c) Discovery over exploratory effort techniques

If data are available, exploratory effort should provide a better explanatory variable than time, because the corresponding rate of discovery should be less affected by economic and political factors. Hubbert [72] was one of the first to fit curves to discovery data as a function of exploratory effort, and variants of this approach have subsequently been employed by other authors [54,69,88,89].

There are a number of different ways of measuring cumulative exploratory effort,^{16} although the choice will be largely dictated by data availability. The most common metric (the cumulative number of ‘new field wildcat’ wells or NFWs) may not be the best, as much reserve growth derives from development rather than exploratory drilling [54]. There are also difficulties in accounting for the delays between drilling and reserve additions, in distinguishing between the search for oil and the search for gas resources, and in allowing for spatial and temporal variations in drilling patterns [92].

A ‘creaming curve’ is a plot of backdated cumulative discoveries against cumulative exploratory effort (*B*(*ε*_{d},*t*) versus *ε*_{d}) (figure 13), whereas a YPE curve is a plot of the backdated rate of discovery against cumulative exploratory effort (*B*′_{εd}(*ε*_{d},*t*) versus *ε*_{d}) (figure 14). Provided the ‘yield’ from drilling declines as exploration proceeds, an estimate of the URR may be derived from the asymptote of the former, the integral of the latter or the corresponding parameters in the fitted curves. Changes in yield represent the net effect of changes in the success rate (the fraction of exploratory wells drilled that yield commercially viable quantities of oil) and changes in the average size of discovered fields. Evidence suggests that the success rate in most regions has declined only relatively gently, if at all, indicating that improvements in exploration technology have partially or wholly offset the anticipated decline in the success rate as a result of the declining number of undiscovered fields [10,93].^{17} By contrast, the average size of discovered fields in many regions has fallen by an order of magnitude since the early days of exploration (figure 3). Hence, declining YPE is most likely to be the result of falling average field sizes.
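The equivalence between the asymptote of the creaming curve and the integral of the YPE curve can be checked with a worked case. As an illustration only, assume that yield declines as a negative exponential with initial yield $y_0$ and decline constant $k$ (both symbols are introduced here and do not appear in the source):

$$
\begin{aligned}
\text{YPE curve:}\quad & B'_{\varepsilon_d}(\varepsilon_d,t) = y_0\,e^{-k\varepsilon_d} \\
\text{creaming curve:}\quad & B(\varepsilon_d,t) = \int_0^{\varepsilon_d} y_0\,e^{-ks}\,\mathrm{d}s = \frac{y_0}{k}\left(1 - e^{-k\varepsilon_d}\right) \\
\text{URR:}\quad & \lim_{\varepsilon_d\to\infty} B(\varepsilon_d,t) = \int_0^{\infty} y_0\,e^{-ks}\,\mathrm{d}s = \frac{y_0}{k}
\end{aligned}
$$

Under this assumption both routes give the same value, $y_0/k$. If yield does not decline ($k \le 0$), the integral diverges and no finite URR can be read from either curve.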

Hubbert [72] based his forecast of future yield upon a detailed analysis of past trends in the USA, which showed a negative exponential decline. He therefore fitted a negative exponential curve to his estimates of YPE in the lower 48 US states and estimated a URR of approximately 170 Gb, consistent with his estimates from production and discovery projection. But Harris [40] showed that Hubbert’s method violated standard statistical procedures, placed excessive weight on the last (and the most uncertain) data point and led to systematically biased estimates. The only reason Hubbert’s estimate was consistent with his earlier work was that the discovery rate had increased—something that Hubbert considered to be both anomalous and temporary. Harris also showed that a YPE curve for an aggregate region, for example the USA, will not necessarily be exponential, even if the trends for individual regions are exponential. As a result, the URR estimated from a curve fitted to data from an aggregate region will be different from the sum of the estimates from curves fitted to data from the component subregions.
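Harris's aggregation point can be illustrated numerically. The sketch below assumes two hypothetical subregions whose YPE declines exactly exponentially, measured for simplicity against a common cumulative-effort axis (in practice each subregion has its own drilling history). It fits a negative exponential to each subregion and to their sum by regressing the logarithm of YPE on effort, and takes the URR as the integral of the fitted curve. All parameter values are invented for the example.

```python
import math

def linear_fit(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def urr_from_exponential_ype(effort, ype):
    """Fit ln(YPE) = ln(y0) - k*effort; the URR is the integral y0/k."""
    a, b = linear_fit(effort, [math.log(v) for v in ype])
    return math.exp(a) / -b

effort = [10.0 * i for i in range(51)]  # cumulative wells, 0 to 500

# Subregion 1: high initial yield, fast decline (true URR = 10/0.010 = 1000).
sub1 = [10.0 * math.exp(-0.010 * e) for e in effort]
# Subregion 2: low initial yield, slow decline (true URR = 2/0.001 = 2000).
sub2 = [2.0 * math.exp(-0.001 * e) for e in effort]
aggregate = [a + b for a, b in zip(sub1, sub2)]

urr_sum = urr_from_exponential_ype(effort, sub1) + urr_from_exponential_ype(effort, sub2)
urr_aggregate = urr_from_exponential_ype(effort, aggregate)

print(round(urr_sum), round(urr_aggregate))
```

The sum of two exponentials with different decline constants is not itself exponential, so the single curve fitted to the aggregate gives a different URR from the sum of the subregional estimates. In this particular construction the aggregate estimate is the lower of the two.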

More recent work has used creaming curves rather than YPE curves. Laherrère [69,82,95] has estimated creaming curves for all regions of the world and found that they tend to rise steeply in the early stages of exploration, reflecting the discovery of a small number of large fields. He fits ‘hyperbolas’ to these data, but rarely provides either the functional form or the goodness of fit. Smooth curves may be the exception rather than the rule, however. While diminishing returns to exploratory effort are widely observed at the ‘play’^{18} level, the same may not always be observed at larger geographical scales, because regions are frequently developed in order of ease of exploration and development rather than size [8]. For example, a combination of geological accessibility and improvements in exploration technology led to the largest play in the Michigan Basin being developed relatively late [8]. Similar phenomena are reported by Wendebourg & Lamiraux [39], who find two exploration cycles in the Paris Basin. While a creaming curve estimated using data through to 1986 leads to a URR estimate of 15 Mt, a similar curve estimated using data through to 1996 leads to a much larger estimate of 46 Mt. The larger the geographical region, the more significant this problem could become. Laherrère [2] addresses this through the use of multi-cycle models, but provides little statistical support for his choice of curves and in some cases the appropriate number is unclear. For example, Laherrère [26] models the oil and gas resources of the Middle East with two creaming curves, but in a subsequent paper this has increased to four [2].

The potential for further exploration cycles can only be established from a detailed evaluation of geological potential and exploration history. For small, geologically defined regions where exploration is well advanced, the probability of new cycles may be relatively low, while for large, politically defined regions that are partly unexplored (e.g. owing to the depth of drilling required, or geographical remoteness or political restrictions), the probability may be much higher. The reliability of curve-fitting therefore depends heavily on the assumption that any new exploration cycles will have only a small impact on aggregate resources—either because there will be no or few such cycles or because the discovered resources will be relatively small. Such an assumption may be informed by geological screening and may be justified for some regions; however, it remains problematic for key regions, such as Iraq. Unfortunately, these are precisely the regions that account for a significant proportion of the global URR for conventional oil.

In summary, while cumulative exploratory effort may provide a better explanatory variable than time, curve-fitting to discovery data must still be used with care. Key difficulties include the uncertainty of reserve estimates, the likelihood of future reserve growth, the apparent sensitivity of the results to the choice of functional form, the existence of multiple exploration cycles and the inability to anticipate new exploration cycles in the future. Overall, these difficulties appear more likely to lead to *underestimates* of the regional URR.

## 5. Consistency of curve-fitting techniques

This section uses illustrative data from a number of oil-producing regions to investigate the consistency of URR estimates from different curve-fitting techniques, and hence the level of confidence that can be placed in their results. It assesses

— consistency over time: whether estimates for a region that are made with one technique using data through to year *t* are consistent with estimates made by the same technique using data through to a later year *t*+*n*;

— consistency between functional forms: whether estimates for a region that are made assuming one functional form are consistent with the estimates made by assuming a different functional form that has a comparable goodness of fit;

— consistency over the number of curves: whether estimates for a region that are made by fitting a single curve to the time series are consistent with the estimates made by fitting two curves sequentially; and

— consistency between techniques: whether estimates for a region that are made using one technique are consistent with those made by another technique.

Our data source was the 2007 edition of the country-level PEPS database supplied by IHS Energy. This uniquely provides data on oil production, drilling, backdated discovery, 2P reserves and other variables for all producing countries. From this, we extracted annual data on cumulative oil production (*Q*(*t*)), backdated 2P discoveries (*B*′_{td}(*t*_{d},2007)) and the annual number of new field ‘wildcat’ wells (*ε*(*t*)) for 10 oil-producing regions (A–J). For reasons of confidentiality, these regions are not identified below and the associated discovery and production figures are not disclosed. They include both individual countries and groups of countries, with all but one (region B) past their peak of discovery and five (regions A, D, F, H and J) past their peak of production. As the objective was to test the reliability of curve-fitting techniques as currently used by the majority of authors in the field (e.g. [2,5]), we did *not* correct the discovery estimates to allow for future reserve growth. We therefore expect the results to underestimate the URR.

We chose three widely used techniques and conducted several illustrative tests on each (table 4). We judged two sets of results to be ‘consistent’ if the mean URR estimates differed by less than 20% of the cumulative production (*Q*_{2007}) or cumulative discoveries (*D*_{2007}) in the region through to 2007. Although somewhat arbitrary, a more or less stringent definition of consistency would not significantly change the results, because most estimates were found to be either broadly consistent or substantially different. The results are summarized below and reported in detail in [36].
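The consistency criterion stated above is simple enough to write down directly. The sketch below encodes it as a function; the function names and the example figures are illustrative assumptions, and only the 20% threshold and the choice of baseline come from the text.

```python
def difference_as_share(urr_a, urr_b, baseline):
    """Absolute gap between two mean URR estimates as a share of the baseline."""
    return abs(urr_a - urr_b) / baseline

def is_consistent(urr_a, urr_b, baseline, tolerance=0.20):
    """True if two mean URR estimates differ by less than `tolerance` of the baseline.

    `baseline` is the cumulative production (Q_2007) or cumulative
    discoveries (D_2007) of the region through to 2007.
    """
    return difference_as_share(urr_a, urr_b, baseline) < tolerance

# Hypothetical region with cumulative discoveries of 50 through to 2007.
print(is_consistent(66.0, 72.0, baseline=50.0))  # gap of 6 is 12% of 50
print(is_consistent(66.0, 90.0, baseline=50.0))  # gap of 24 is 48% of 50
```

Scaling the gap by cumulative production or discoveries, rather than by either estimate, keeps the test symmetric in the two estimates being compared.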

### (a) Consistency of production decline curves

This straightforward and popular technique involves plotting the ratio of annual to cumulative production (*Q*′(*t*)/*Q*(*t*)) as a function of cumulative production (*Q*(*t*)), taking a linear regression and estimating the URR from the intercept of this regression with the *Q*(*t*) axis (figure 11). As shown in [36], the regression will only be strictly linear if the cumulative production cycle fits the logistic model. For region A, the data points initially show considerable scatter, but subsequently settle into an approximately linear relationship that suggests a URR 32% larger than cumulative production (figure 15). However, the data only behaved in this approximately consistent fashion for four out of the 10 regions and failed to ‘settle’ into a single linear relationship for the remainder, even when the production cycle was well advanced (figure 15).

The behaviour illustrated in region J (figure 15) may result from two or more cycles of exploration and production. These may occur, for example, because different geographical areas were opened up to exploration at different times, or because technological developments allowed access to deeper or less accessible resources—such as offshore regions. The IHS database allows onshore and offshore resources to be separated, but this did not resolve the problem for the regions we tested, since most were found to have trend breaks in the time series for either onshore or offshore resources or both (figure 15).

The overall results for the production decline curve tests are summarized in table 5. For five regions, the URR estimates from aggregate data were broadly consistent with the sum of the estimates from onshore and offshore data. However, trend breaks were observed for six of the 10 regions using aggregate data and for all of the regions using either onshore or offshore data (or both). The most likely explanation is that the data include several discrete regions that were developed at different times and that the cumulative production cycle is only poorly approximated by the logistic model. The results raise serious concerns about the usefulness of production decline curves, especially for pre-peak regions, and suggest that the technique is likely to underestimate the regional URR.

### (b) Consistency of cumulative discovery projection

This technique involves plotting cumulative discovery estimates as a function of the date of discovery (*t*_{d}), using nonlinear regression to fit a growth curve to these data and estimating the URR from the value of the relevant parameter(s). While some authors have used an exponential function to reflect the early discovery of large fields, this proved a poor fit to the data for all of our regions. Hence, we compared the use of the logistic and Gompertz curves (table 6 and figure 16).

The results demonstrate that the technique is more reliable when the discovery cycle is well advanced. For many of the regions, the two functions fitted the data equally well but provided substantially different estimates of URR. For example, in region E, the difference between the *R*^{2} for each model was only 0.003 but the URR estimates differed by a third. The mean difference in the URR estimates from the two models was 59% of the cumulative discoveries through to 2007 (ranging from 1 to 362%), but the mean difference in *R*^{2} estimates was only 0.003. This serves to illustrate the frequent sensitivity of URR estimates to the essentially arbitrary choice of functional form. The Gompertz model also produced a higher URR estimate in all cases, indicating how the choice of functional form can influence the results.
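The sensitivity to functional form can be reproduced on synthetic data. The sketch below is an illustration under stated assumptions, not the estimation procedure used for table 6: the discovery series is generated from a Gompertz curve with an assumed URR of 100 and truncated before the asymptote is visible, so the Gompertz fit is exact by construction. For each trial URR on a grid, the other two parameters are obtained by a linearized least-squares regression (a simplification of full nonlinear regression), and the URR with the highest *R*^{2} is kept. All names and values are hypothetical.

```python
import math

def linear_fit(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def r_squared(observed, fitted):
    mean = sum(observed) / len(observed)
    sse = sum((o - f) ** 2 for o, f in zip(observed, fitted))
    sst = sum((o - mean) ** 2 for o in observed)
    return 1.0 - sse / sst

def logistic_given_urr(t, d, urr):
    """For a logistic curve, ln(URR/D - 1) is linear in t."""
    a, b = linear_fit(t, [math.log(urr / di - 1.0) for di in d])
    return [urr / (1.0 + math.exp(a + b * ti)) for ti in t]

def gompertz_given_urr(t, d, urr):
    """For a Gompertz curve, ln(ln(URR/D)) is linear in t."""
    a, b = linear_fit(t, [math.log(math.log(urr / di)) for di in d])
    return [urr * math.exp(-math.exp(a + b * ti)) for ti in t]

def best_urr(t, d, model, grid):
    """Return (URR, R^2) for the trial URR with the highest R^2."""
    r2, urr = max((r_squared(d, model(t, d, u)), u) for u in grid)
    return urr, r2

# Synthetic cumulative discoveries, observed up to about 69% of the true URR.
t = list(range(10, 31))
d = [100.0 * math.exp(-math.exp(-0.1 * (ti - 20.0))) for ti in t]
grid = [float(u) for u in range(70, 301)]  # every trial URR exceeds max(d)

urr_logistic, r2_logistic = best_urr(t, d, logistic_given_urr, grid)
urr_gompertz, r2_gompertz = best_urr(t, d, gompertz_given_urr, grid)
gap_share = (urr_gompertz - urr_logistic) / d[-1]  # gap relative to discoveries to date

print(urr_logistic, round(r2_logistic, 4))
print(urr_gompertz, round(r2_gompertz, 4))
print(round(gap_share, 2))
```

The logistic curve fits this series almost as closely as the curve that generated it, yet implies a smaller URR, which matches the direction of the difference reported above.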

The consistency over time of this technique was investigated by systematically shortening the time period through to the last recorded discovery (*t*_{d}) and re-estimating the curves. Each dataset (*B*(*t*_{d},2007)) represents the cumulative discoveries up to year *t*_{d} as estimated in 2007. This is likely to be greater than the cumulative discoveries estimated in year *t*_{d} (*B*(*t*_{d},*t*_{d})) owing to reserve growth in the intervening period. To estimate the latter, it would be necessary to obtain successive annual editions of the PEPS database. Our URR estimates are therefore likely to be both larger and more consistent than those that would have been obtained from successive annual data (*B*(*t*_{d},*t*_{d})) because they reflect the reserve growth over the intervening period (*t*_{d} to 2007).

As an illustration, figures 17 and 18 present the results for region E, which passed its discovery peak some decades ago. These figures show how the estimates become more consistent as the length of the time series increases. However, the estimates also fall as the length of the time series increases, suggesting that a cumulative discovery projection using a shorter time series would have overestimated the URR.

The full results of these tests are summarized in table 7. The URR estimates were relatively consistent for seven regions (A, C, D, G, H, I and J) using the logistic model and five regions (A, C, H, I and J) using the Gompertz model. Regions exhibiting consistency over time in their URR estimates did not necessarily have a ‘better’ fit in terms of *R*^{2} and the results did not suggest any systematic tendency to either underestimate or overestimate the URR. The results indicate that discovery projection should provide more reliable estimates than production decline curves, but the technique performs poorly for regions at an earlier stage in their discovery cycle and the degree of inconsistency, both between functional forms and over time, remains high for many of the regions examined. It is likely that consistency would be improved if discovery data could be obtained for lower levels of spatial aggregation and corrected for future reserve growth.

### (c) Consistency of creaming curves

This technique involves plotting cumulative discovery estimates as a function of exploratory effort (*ε*), using nonlinear regression to fit a curve to these data and estimating the URR from the value of the fitted parameter(s). The measure of exploratory effort (*ε*) used is the cumulative number of exploratory, or NFW, wells that have been drilled in the region.^{19} In this case, we examined the consistency of URR estimates from different functional forms and from using single and multiple curves.

Authors such as Campbell & Laherrère [96,97] are rarely explicit about the functional form used for creaming curves. For illustrative purposes, we took two functions that rise rapidly from zero and exhibit asymptotic behaviour, namely the rectangular hyperbola (equation (5.1)) and the negative exponential (equation (5.2)).
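One common way of writing these two functional forms is given below. The parameterization is an assumption made here for illustration: $U$ denotes the asymptote and $b$ and $k$ are shape parameters, none of which are symbols taken from the source.

$$
B(\varepsilon_d) = \frac{U\,\varepsilon_d}{b + \varepsilon_d}
\qquad\text{and}\qquad
B(\varepsilon_d) = U\left(1 - e^{-k\varepsilon_d}\right)
$$

Both curves pass through the origin and tend to $U$ as $\varepsilon_d \to \infty$, so in either case the fitted value of $U$ serves as the URR estimate. The hyperbola approaches its asymptote much more slowly than the exponential, which is why the two can imply different values of $U$ for the same data.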

Figure 19 illustrates both of these curves for region A. This region provides a ‘classic’ creaming curve, in that exploration initially leads to the discovery of several large fields but is soon subject to diminishing returns. While both curves provide a good fit to the data, the URR estimate from the hyperbola is 39% larger. Moreover, only six of the 10 regions exhibited asymptotic behaviour. For example, the time series for region C could be approximated by a linear regression (figure 20), despite this region being well past its peak of discovery (figure 21) and giving broadly consistent results with a cumulative discovery projection (figure 20).
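A comparison of this kind can be sketched on synthetic data. The example below assumes the parameterizations B = U·ε/(b + ε) for the rectangular hyperbola and B = U·(1 − exp(−k·ε)) for the negative exponential, generates a creaming curve from the hyperbola with an assumed asymptote of 100, and fits both forms. The fitting shortcuts (a reciprocal linearization for the hyperbola, a grid search over U for the exponential) are simplifications chosen for brevity rather than the nonlinear regression used in the paper, and all names and values are hypothetical.

```python
import math

def linear_fit(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def fit_hyperbola(effort, cum):
    """B = U*e/(b+e), linearized as 1/B = 1/U + (b/U)*(1/e); returns (U, b)."""
    a, s = linear_fit([1.0 / e for e in effort], [1.0 / c for c in cum])
    urr = 1.0 / a
    return urr, s * urr

def fit_neg_exponential(effort, cum, grid):
    """B = U*(1-exp(-k*e)); for each trial U, estimate k by regressing
    -ln(1 - B/U) on effort through the origin, then keep the U with the
    lowest sum of squared errors in B. Returns (U, k)."""
    best = None
    for urr in grid:
        y = [-math.log(1.0 - c / urr) for c in cum]
        k = sum(e * yi for e, yi in zip(effort, y)) / sum(e * e for e in effort)
        sse = sum((c - urr * (1.0 - math.exp(-k * e))) ** 2
                  for e, c in zip(effort, cum))
        if best is None or sse < best[0]:
            best = (sse, urr, k)
    return best[1], best[2]

# Synthetic creaming curve: cumulative wells 10..200, reaching 80% of the asymptote.
effort = [10.0 * i for i in range(1, 21)]
cum = [100.0 * e / (50.0 + e) for e in effort]

urr_hyp, b_hyp = fit_hyperbola(effort, cum)
grid = [81.0 + i for i in range(120)]  # trial U values, all above max(cum) = 80
urr_exp, k_exp = fit_neg_exponential(effort, cum, grid)

print(round(urr_hyp, 2), urr_exp)
```

Because the series is generated from the hyperbola, that fit is exact by construction. The exponential, which flattens more quickly, settles on a lower asymptote for the same points, in line with the ordering of the two estimates reported for region A.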

A possible reason for the ‘straightening’ of the creaming curve is that technical change is offsetting depletion. Modern seismic technologies permit a better understanding of the underlying geology and have contributed to an increasing success rate of exploratory drilling in many regions [93,94,98]. While there is still a trend towards declining field sizes, modern techniques may also allow field sizes to be estimated more accurately (thereby potentially reducing the amount of future reserve growth). Other contributory factors may include the varying delays between drilling and reserve additions, the lack of a distinction between exploring for oil and exploring for gas^{20} and spatial and temporal variations in drilling patterns [92]. But whatever the explanation, the results suggest that the ‘classic’ creaming curve (figure 19) may not necessarily be representative of aggregate regions.

A summary of the results is given in table 8. Three of the six regions exhibiting asymptotic behaviour provided consistent URR estimates, while all of the estimates from the exponential function were smaller than those from the hyperbolas—showing again how the choice of functional form can influence the results. Three of the six regions exhibiting asymptotic behaviour could also be approximated by two sequential creaming curves. This multi-cycle behaviour is a common occurrence when using data from aggregate regions and may result from discrete areas (e.g. distinguished by spatial location, depth, availability for exploration or some other factor) being explored sequentially [2,5]. We tested this by fitting two rectangular hyperbolas to regions E, H and J and comparing the resulting URR estimates with those obtained from a single curve. Following Laherrère [2], the appropriate ‘breakpoints’ were identified visually and the curves fitted separately to each time period. More sophisticated approaches are available that could simulate the simultaneous exploration of multiple areas, but these have not been widely used [16].

Figures 22 and 23 show the results for region E. Each curve has an *R*^{2} of 0.996, compared with 0.983 for the single-curve model. Whereas the two-curve model leads to a URR estimate that is nearly twice the cumulative discoveries through to 2007, the single-curve model gives an estimate that is half as large again. For the three regions where both one and two curves were fitted, the mean difference in the *R*^{2} was 0.017 while the mean difference in URR estimates was 103% of *D*_{2007} (table 9). In all cases, two curves provided a better fit and a smaller URR estimate. The choice between single or multiple curves can therefore have a significant influence on results, but, without a detailed knowledge of the exploration history of the region, it is difficult to justify one choice over the other.

### (d) Summary of consistency tests

These illustrative tests raise concerns about the reliability of the URR estimates from curve-fitting techniques, at least when (as is usually the case) they are applied at the country level with data that have not been corrected for future reserve growth. In only one of the regions examined were the mean URR estimates consistent (by our definition) among all three techniques (table 10). In addition, variations in the length of time series, functional forms and number of curves led to inconsistent results more often than consistent results; the degree of inconsistency in the URR estimates was frequently very large; inconsistent results were often obtained for regions where the production and discovery cycle was mature; and both different functional forms and different numbers of curves often fitted the data equally well, but provided substantially different estimates of URR.

A key reason for these inconsistent results is that the techniques are being applied to large and geologically diverse regions that lack a consistent exploration history. In addition, we did not always distinguish between onshore and offshore regions, the data source did not classify exploratory drilling as searching for either oil or gas, and most importantly we did not use a growth function to adjust the discovery data to allow for future reserve growth. Future applications of curve-fitting techniques should therefore address each of these problems as far as the available data permit. Nevertheless, the results demonstrate the limitations of curve-fitting as currently used and suggest that the associated URR estimates should be treated with caution.

## 6. Summary

Curve-fitting techniques to estimate recoverable resources are based upon the extrapolation of historical trends in cumulative production and cumulative discovery for aggregate regions. The techniques implicitly assume that the region is geologically homogeneous and that historical exploration has been relatively unconstrained and uninterrupted. In practice, this is rarely the case.

Many applications of curve-fitting take insufficient account of its weaknesses, including: the inadequate theoretical basis; the sensitivity of the estimates to the choice of functional form; the risk of over-fitting multi-cycle models; the inability to anticipate future cycles of production or discovery; and the relative neglect of economic, political and other variables. In general, these weaknesses appear more likely to lead to underestimates of the URR and have probably contributed to excessively pessimistic forecasts of oil supply.

Curve-fitting to discovery estimates introduces additional complications, including the uncertainty of those estimates and the likelihood of future reserve growth. Commercial databases provide a more useful source of discovery estimates, both because 2P data provide a more accurate estimate of remaining recoverable resources and because the practice of backdating reserve revisions allows a more accurate estimate of the resources found in each time interval. However, these data are both expensive and confidential and are rarely adjusted to allow for future reserve growth. This oversight has further contributed to underestimates of recoverable resources.

Our illustrative tests of curve-fitting techniques using data from a number of regions have shown how different techniques, functional forms, length of time series and numbers of curves can lead to significantly different estimates of the URR for the same region. This raises concerns about the reliability of such techniques as currently applied, although the degree of uncertainty declines as exploration matures.

These limitations do not mean that curve-fitting should be abandoned. But they do suggest that the applicability of these techniques is more limited than some adherents claim, the confidence bounds on the results are wider than is commonly assumed and the techniques are likely to underestimate recoverable resources. These drawbacks deserve wider appreciation.

## Funding statement

The authors thank the UK Research Councils for their financial support.

## Acknowledgements

This paper is based upon a systematic review of the curve-fitting literature by Sorrell & Speirs [36], which in turn formed part of a wide-ranging study of global oil depletion by the UK Energy Research Centre [17]. The authors thank IHS Energy for allowing the publication of data from their PEPS database, and Roger Bentley, Richard Miller, Jean Laherrère, Robert Kaufmann and two anonymous reviewers for their helpful comments on earlier drafts. The usual disclaimers apply. The authors have declared that they have no competing interest.

## Footnotes

One contribution of 13 to a Theme Issue ‘The future of oil supply’.

↵1 Defined here as crude oil, condensate and natural gas liquids (NGLs), but excluding kerogen oil, extra heavy oil, oil sands and tight oil. See [17] for definitions.

↵2 Hubbert’s original curve was hand-drawn, but his later papers used more formal techniques.

↵3 Since large fields generally occupy a larger surface area, they tend to be found relatively early even if drilling is random [37].

↵4 Exploration is very rarely unrestricted in aggregate regions. For example, the USA imposes far fewer restrictions than most countries, but the Arctic National Wildlife Refuge, the eastern Gulf of Mexico, much of the western offshore and many onshore areas in the Rockies are largely ‘off-limits’ for environmental reasons [41].

↵5 The common practice of adding 1P estimates to form a regional or global total is statistically incorrect and likely to significantly underestimate actual 1P reserves [17]. Aggregation of 2P estimates should introduce less error, but that error may be positive, negative or zero depending upon the statistical interpretation of the estimates and the shape of the underlying probability distributions, which is rarely available [45,46].

↵6 Such findings are typical for US 1P data: for example, Lore *et al.* [52] found that the estimated size of offshore fields in the Gulf of Mexico doubled within 6 years of discovery and quadrupled within 40 years, whereas a later study by Attanasi [53] suggested an eightfold growth in 50 years.

↵7 See Reynolds [55] for an alternative, economic interpretation of discovery trends.

↵8 Note that *B*(*ε*_{d}, *ε*) is an alternative, but is less useful because estimates of cumulative discoveries may continue to increase even in the absence of further exploration.

↵9 Nonlinear regression is straightforward with modern computer technology, but the earlier literature uses simpler methods such as the linear transformation of the functional form followed by a linear regression [18]. If the production or discovery cycle is well advanced, it is possible to estimate the URR through visual identification of the asymptote to which the curve is trending (figure 4) [56].

↵10 Hubbert [18] begins with an assumed parabolic relationship between production and cumulative production and uses this to derive a logistic equation for cumulative production over time. But this formal derivation came more than 20 years after he first referred to the logistic model [36,57].

↵11 Data from time intervals of less than 1 year are rarely available and tend to be erratic.

↵12 Wiorkowski [34] compared a ‘generalized Richards’ model (which can take an exponential, logistic, or Gompertz form depending upon the parameters chosen) with a cumulative Weibull and found that they fitted US cumulative production data equally well but led to significantly different URR estimates (445 Gb and 235 Gb, respectively).

↵13 Cleveland & Kaufmann [27] fitted a logistic curve to US production data through to 1988 and found that the adjusted *R*^{2} changed only from 0.9880 to 0.9909 as the value of URR varied from 160 to 250 Gb.

↵14 ‘…Because petroleum exploration in the US began very early, because the initial exploration and discoveries occurred in what has proved to be relatively minor basins, because early drilling technology was very limited in its drilling depth capabilities, and because discoveries in the major basins only hit their stride between 1910 and 1950, the US comes closest to a symmetric discovery curve of any major oil producing country or region’ [81].

↵15 Lynch [43], for example, likens the neglect of future reserve growth to: ‘… comparing old orchards with newly planted saplings and extrapolating to demonstrate declining tree size’.

↵16 Including the cumulative length of exploratory drilling [72], the total number of exploratory wells [90], the number of successful exploratory wells [64], the cumulative length of successful exploratory wells [91] or the cumulative length of all wells (i.e. both exploratory and development) [54]. A distinction may also be made between the first exploratory well to be drilled (‘new field wildcats’) and subsequent wells.

↵17 The International Energy Agency (IEA) [94] reports that, over the last 50 years, the global average success rate has increased from one in six exploratory wells to one in three. Similarly, Lynch [43] reports that the average success rate in the USA increased by 50% between 1992 and 2002, and Forbes & Zampelli [93] report that the US offshore success rate doubled between 1978 and 1995. In an econometric analysis, Forbes & Zampelli [93] estimate that, over the period 1986–1995, technological progress increased the US offshore success rate by 8.3%/year.

↵18 A ‘play’ is an area for petroleum exploration that has common geological attributes and lies within some well-defined geographical boundary.

↵19 The IHS database also contains information on the number of: (i) appraisal wells at existing fields, including new-pool wildcats, deeper-pool wildcats and shallow-pool wildcats; and (ii) development wells at existing fields, which are used to produce from, inject into, monitor or dispose of liquids from reservoirs. It appears sensible to exclude appraisal and development wells from the measure of exploratory effort, because they refer to drilling activity at known fields rather than exploration for new fields. But since appraisal and development activity contributes to reserve growth at known fields, it will necessarily affect the ‘explained’ variable of cumulative discoveries. This suggests that it may be interesting to explore the relationship between *total* drilling activity and cumulative discoveries as well as that between *exploratory* drilling activity and cumulative discoveries. However, all the published examples we are aware of solely use the latter approach.

↵20 For example, Cleveland & Kaufmann [99] found that offshore exploratory effort for natural gas in the USA had a YPE that was 2–20 times greater than that for onshore exploration.

© 2013 The Author(s) Published by the Royal Society. All rights reserved.