## Abstract

A mixed financial/physical partial differential equation (PDE) can optimize the joint earnings of a single wind power generator (WPG) and a generic energy storage device (ESD). Physically, the PDE includes constraints on the ESD’s capacity, efficiency and maximum speeds of charge and discharge. There is a mean-reverting daily stochastic cycle for WPG power output. Physically, energy can only be produced or delivered at finite rates. All suppliers must commit hourly to a finite rate of delivery *C*, which is a continuous control variable that is changed hourly. Financially, we assume heavy ‘system balancing’ penalties in continuous time, for deviations of output rate from the commitment *C*. Also, the electricity spot price follows a mean-reverting stochastic cycle with a strong evening peak, when system balancing penalties also peak. Hence the economic goal of the WPG plus ESD, at each decision point, is to maximize expected net present value (NPV) of all earnings (arbitrage) minus the NPV of all expected system balancing penalties, along all financially/physically feasible future paths through state space. Given the capital costs for the various combinations of the physical parameters, the design and operating rules for a WPG plus ESD in a finite market may be jointly optimizable.

This article is part of the themed issue ‘Energy management: flexibility, risk and optimization’.

## 1. Introduction

The large-scale use of wind power will pose unprecedented problems, because the supply wattage of a wind power generator (WPG) over a 24 h lead time is approximately a mean-reverting geometrical Brownian motion (GBM). Such processes are noisy at any lead time, inside and outside the range of seconds to 10^{5} s, and that noise can severely disrupt both system balancing and price arbitrage. As kindly pointed out by an anonymous referee, the average contribution of wind in Ireland has already reached 24% in 2015 and in Germany the contribution peaked at 36% in May 2016. Both these (and other) countries therefore face a tremendously difficult task managing large fluctuations in power. One possible way to make it run more efficiently would be the introduction of large-scale energy storage on the grid. The optimum use of an energy storage device (ESD) must hourly reset a committed rate of output, such as to allocate the ESD’s resources dynamically between the two conflicting priorities of system balancing, which minimizes balancing penalties, and price arbitrage, which maximizes income.

Mathematical models that can tackle variable flow rates into and out of a store have emerged in the literature over the last 10 years, with applications appearing in gas storage [1] as well as thermal storage [2]. However, many ‘real options’ models of electrical storage assume an infinite or a fixed rate of physical flow into or out of the ESD [3]. This infinite flow rate imitates financial trading in an inertia-free market. Such models set and solve an optimal stopping problem, of when to empty or fill the ESD instantaneously, when the selling price hits a suitable trigger level. Instantaneous emptying is not dynamically possible in a physical power system, but if ESDs were emptied at the largest physically feasible rate, this would create, rather than solve, system balancing problems for the physical system.

The present paper addresses the above problem in three innovative ways, by using a ‘proof of concept’ mathematical model. Its main features are as follows:

(i) The physical state variables include both a finite and instantaneously variable rate of in/out flow, and its integral. Using both variables makes the model physically realizable and allows it to balance two competing goals, namely system balancing (avoiding real-time penalties, by holding the rate of supply close to some committed rate

*C*) and arbitrage (which maximizes income from selling the maximum of energy at the daily peak of selling price—when balancing penalties also peak). The model assumes a market commitment process in which the joint owner of a WPG and an ESD can change*C*at hourly intervals, and there are continuous penalties in real time for deviating from the presently committed*C*. Hence*C*is a continuous control variable that is reset at discrete times. The time path of*C*has delayed and highly stochastic effects on future system balancing penalties and sales income.(ii) To imitate a future power system, having high wind penetration, the partial differential equation (PDE) model imposes the full economic cost of any system balancing needed for wind power supplied. This is done by a penalty function, which penalizes all divergences between supply delivered and commitment

*C*, continuously in real time. The penalty function can be adjusted to model the actual opportunity cost to the grid of ‘buying in’ system balancing from any source, which in turn sets the value to WPG/ESD of doing some of this balancing itself. In total, the PDE interacts with the market economy through three pricing functions: the cost of capital, the contracted sale value of committed electricity and the system balancing penalty.(iii) The model’s PDE includes stochastic dynamics for the wind supply and a stochastic electricity selling price (both have daily cycles), and it assumes simple deterministic dynamics for the ESD that are straightforward to generalize.

In the PDE, the stochastic, deterministic, financial and physical variables all interact instantaneously, and in the diffusion solution, all these variables, including the hourly changeable commitment decision *C*, also interact through space and time—an example of (joint) stochastic dynamic optimization (SDO). The hourly Bellman optimization of *C* maximizes the expected financial outcome along the infinity of dynamically realizable, continuous trajectories passing through every point in the model’s state space. As later results show, the Bellman solutions make unexpectedly subtle trade-offs between balancing penalties and selling price arbitrage, and the trade-offs vary strongly across the state space.

The structure of the paper is as follows. In §2 a brief introduction to the problem of stochastic storage for wind energy is given. In §3 the dynamics of the ESD are discussed. In §4 a ‘proof of concept’ SDO PDE is developed. Then §5 gives our results and §6 contains conclusions and prospects for future research.

## 2. Dynamics of wind and electricity price

The static distribution of wind power, relevant to long-term investments in WPG, is often assumed to be best approximated by a Weibull distribution [4,5]. However, this lacks a time-dependent behaviour, and it is not tractable over the time scales that are operationally important, which range from seconds (for system balancing) to multi-hour (for arbitrage). Helpfully Hill *et al*. [6] report finding a Gaussian time series beneath the daily seasonal patterns in wind data. As Sinden [7] notes, there are long-term seasonal variations over months (winter versus summer) and in some situations a weak diurnal cycle (e.g. stronger wind at night along coasts). There is also some ability to forecast the local wind strength rather approximately, within any 24 h period. This lead time is relevant for balancing within-day arbitrage gains against system balancing penalties. For ‘proof of mathematical concept’ purposes, we model wind strength by a stochastic differential equation (SDE), in which GBM mean-reverts towards a known weak diurnal cycle. The assumed SDE for wind strength *X* is
2.1Here *κ*_{X} is the constant local speed of mean-reversion of *X* towards its local mean *θ*_{X}(*t*) while *σ*_{X} is a constant volatility and d*W* is the increment of a Wiener variable. The parameter values assumed here are *κ*_{X}=0.1 h^{−1} and *σ*_{X}=0.2 h^{−1/2} and the daily seasonal forcing function for wind is
2.2where , *α*_{X}=0.375, *ψ*_{X}=2 h and *γ*=*π*/12 *h*^{−1}. A long-term static distribution of wind strength results from this SDE, which can be found by time-averaging over the solution to the associated Fokker–Planck equation. If we analyse the properties of the solution to the Fokker–Planck equation, we can derive the probability that *X*<0 under natural conditions, and we are able to artificially impose non-negativity if we choose, simply by using a reflecting (or no-flux) boundary condition at *X*=0 [8]. When solving the PDE later on we apply reflecting boundary conditions at *X*=0 that are outlined in §4c, meaning that even if negative values were possible we bound our domain by *X*=0. This distribution is plotted as the solid line in figure 1*a*, and the general similarity of this static distribution to the Weibull static distribution can be seen clearly. In figure 1*b*, the thick solid line represents the median of the distribution, the three dotted lines below it represent the 1%, 5% and 25% quantiles, and the three dotted lines above represent the 75%, 95% and 99% quantiles. The two distributions differ, but as noted above, the Weibull does not offer useful time dynamics for system balancing purposes, and as seen later, these different distributions of wind strength lead to rather similar distributions of electrical power output.

The main driver of arbitrage opportunities is the strong but also stochastic daily cycle of electricity price. We take both the wind strength *X* and the electricity selling price *Y* to be different GBMs over the following 24 h, and we assume that each mean-reverts towards its own daily cycle (strong for price and weak for wind), which is identical every day. In this analysis, we ignore the longer-term cycles of wind across the seasons in order not to overcomplicate matters. If a more detailed analysis were to be carried out, such effects could be taken into account most readily by solving the problem (in isolation) for representative seasonal days, and then taking the average value across them. Using a single-day cycle allows us to quickly converge to a periodic steady solution when solving the problem numerically.

We convert the wind strength *X* to a power output, through a smoothed nonlinear function *F*(*X*). We do not model changes in wind direction, because this variable is not present in most published data on WPG power output. Hence changes in wind direction, and possible engineering responses to these in WPG design, are not needed in a ‘proof of concept’ model of the optimal storage of stochastically arriving wind power. The output is scaled to a representative maximum power output of 1 MW (easily rescaled to model any desired wind power generation system or WPG), with a cut-in speed of approximately 4 m s^{−1}, a rated wind speed of approximately 13 m s^{−1} and a cut-out speed of approximately 25 m s^{−1} [9]; we show this power curve in figure 2*a*. Given the assumed daily trend and stochastics of wind described in (2.1), we can calculate the expected capacity factor at each time of the day, as shown in figure 2*b*.

Comparing these assumptions to the empirical analysis of UK wind farms in [7], the model’s assumptions give a peak output of around 50% of maximum capacity, which is typical of the empirical maximum seen in UK winter, and a trough output of 10% of maximum capacity, which resembles the empirical minimum in summer. The model’s grand mean output of 29.5% also resembles UK empirical values [9]. We note that the existence of cut-in and cut-out speeds reduces the difference between the GBM and the Weibull, as static distributions of actual power supply rates, in that the distributions differ most strongly at wind speeds that are too low or too high for generation.

We model the instantaneous market selling price of electricity as *Y* by the following (mean-reverting) SDE:
2.3where *κ*_{Y} is the constant local speed of mean-reversion of *Y* towards its local mean *θ*_{Y}(*t*), and *σ*_{Y} is its volatility. Here d*Z* is the increment of a Wiener variable, uncorrelated with d*W*. It should be noted that the linear mean-reversion term means that this is not a standard GBM model. Models similar to this have already been used many times in finance; for example, Lucia and Schwartz [10] fit the standard GBM mean-reverting model to data from the Nordic power exchange, whereas Brennan and Schwartz [11] use a variation on our SDE with linear mean-reversion to model interest rates. It is important to note again that negative prices are possible in the model since the mean-reversion coefficient could go negative. For our ‘proof of concept’ model, we choose to take an arbitrary set of parameters that demonstrate some properties of the UK market, such as an average price of around £40 MWh^{−1} and a daily wind cycle with a peak in the afternoon at 16.00,
2.4and the daily cyclical forcing function for *Y* is
2.5We choose the parameters as MWh^{−1}, *α*_{Y}=0.375, *ψ*_{Y}=14 h and *γ*=*π*/12 h^{−1}. We choose to bound possible values of *Y* ≥0 using a reflecting boundary condition at *Y* =0. Usefully, this treatment reduces the size of the computational task and simplifies the optimal strategy.

## 3. Dynamics of the energy storage device

The instantaneous energy content of the ESD is *Q*, where 0<*Q*<*Q*_{max} and *Q*_{max} is the designed capacity of the ESD. We define the charge or discharge rate from the ESD to be
3.1allowing to be both positive and negative. At any moment, the total system of the WPG plus ESD has a committed output rate of *C*. In the ideal case, the total system’s output equals *C*. However if *Q*=*Q*_{max} or *Q*=0, charge and discharge, respectively, are impossible. When *Q* is not at one of these bounds, the rates of charge and discharge are limited by the design of the ESD to a maximum charge rate *L*_{C} and a maximum discharge rate *L*_{D}. Richer ESD dynamics can be assumed, and in particular we limit our investigation to cases where *L*_{C} and *L*_{D} are equal even though there is no such physical restriction on ESDs. We assume here that there is a fixed percentage loss rate , such that a round-trip conversion into and out of the ESD will only return 70% of the initial energy. This is actually quite low for an ESD, with many ESD providers claiming up to 95% efficiency. However these figures should be offset by the potential degradation of the ESD through charge/discharge cycles, and since our model cannot currently take account of this we set our efficiency at the lower end. Mathematically, the above ESD is described by two scenarios. If the current WPG power output *F*(*X*)>*C* there is a surplus, so the ESD will if possible charge at up to its maximum rate, hence
3.2to ameliorate the discrepancy. If the WPG power output *F*(*X*)<*C* is in deficit, then the ESD will discharge at up to its maximum rate (after efficiency losses), so that
3.3The smoothing parameters, *ξ*_{C} and *ξ*_{D}, will limit the charging/discharging rate near full/empty to numerically smooth the solutions near the boundary, and they are chosen to be sufficiently large so that increasing them has little effect on the solution. This simplified model of an ESD has been previously used in [12].

Two alternative sets of parameter values are used to model two alternative designs of ESD. We define a ‘small’ ESD with *L*_{C}=*L*_{D}=250 kW, *Q*_{max}=1 MWh and we define a ‘large’ ESD with *L*_{C}=*L*_{D}=1 MW, *Q*_{max}=4 *MWh*. The ‘small’ ESD can charge or discharge at only a quarter of the WPG’s peak output rate, and it can store only one hour of the WPG’s peak output rate. The small ESD is expected to ‘smooth’ the WPG’s output over an hour or two, but not to ‘arbitrage’ the selling price of the WPG’s output, because this ESD can time-shift at most one hour of maximum wind power output towards the evening peak. In contrast, the large ESD can charge or discharge at the maximum output rate of the WPG, and it can also move 4 h of peak power output away from the early morning peak of physical supply towards the early evening peak, when physical demand and selling price both peak. Hence the large ESD is expected both to smooth the WPG’s power output and to time-shift it.

## 4. Modelling continuous-time penalties for over- and under-delivery

The smoothing of wind power is becoming an increasingly costly problem for networks around the world. At present there is a low penetration of such power in most of the world, and therefore, although each individual WPG produces very ‘noisy’ power, they are too few in aggregate to disturb the overall supply–demand balance, especially in richly connected regions like Germany. Accordingly, the scale of system balancing charges presently in force was historically designed to penalize occasional over- or under-supply by fossil generators (which are fairly stable over short periods). This scheme is often applied unchanged to WPGs, so that a WPG can deliver noisy power throughout an hour without penalty, provided that the cumulative total delivered matches the total committed.

At high levels of penetration, fluctuations of wind power have the potential to disrupt the total supply–demand balance on a historically unprecedented scale. Systems will then incur much higher balancing costs, and they are likely to pass these on as penalties to WPGs. This in turn should motivate WPGs towards some mix of under-committing their expected output (in the limit, discarding some power), and storing energy for later smoothing.

This paper’s model assumes a system of balancing charges in a generic form, which can be set sufficiently harshly (as a function of *F*(*X*)−*C*) to motivate WPGs to undertake some mix of under-commitment (of expected supply) and storage (of unexpected and/or uncommitted supply). The assumed system balancing charge penalizes both over- or under-supply in real time, so that the only penalty-free outcome is to deliver at exactly the committed wattage rate *C*. This earns the contracted selling price *Y* , a stochastic value which is fixed at the moment when *C* is fixed. Hence the delivery to the grid *D*(*X*,*Q*,*C*) at any given time is
4.1where the efficiency conversion means that only the quantity is provided from the ESD to the grid. Therefore, the instantaneous income function is defined by
4.2Here the parameter 0<*ζ*<1 defines the rate of price penalty for under- or over-delivery during d*t*. The rate of extra income earned on over-delivered power, at an instantaneous output rate *D*(*X*,*Q*,*C*)−*C*>0, is *Y* (1−*ζ*)<*Y* . Here the committed wattage rate *C* has been previously set to the system; therefore the opportunity cost of instantaneous over-delivery (e.g. after under-committing, when unable to store a surplus) is to lose the fraction *ζ* of the selling price of the wattage over-delivered. Since in our model we choose to bound *Y* ≥0 and *ζ*<1 for all *t*, the WPG has no incentive to curtail generation. It is often the case in real markets that either *Y* <0 or *ζ*≥1, causing excess power to be shed from the system. This situation could be included in future models by adjusting parameters and the relevant boundary conditions.

The corresponding penalty for instantaneous under-delivery (in addition to being unable to sell all of the wattage in the commitment *C*) is a cash ‘fine’ of the fraction *ζ* of the full spot sales value of whatever wattage fails to be delivered *C*−*D*(*X*,*Q*,*C*). A large enough under-delivery causes a negative instantaneous income from the wattage actually delivered.

### (a) Derivation of the partial differential equation for system value

Now define the contract *V* (*X*,*Y*,*Q*,*t*) in units [£] as the current value of the jointly owned WPG and ESD. This is the expectation under the real-world measure of all future discounted income, which the WPG plus ESD can make on sales in the electricity supply market. Ignoring the optimal commitment for the time being, we can write
4.3where and *C*=*C*(*X*,*Y*,*Q*,*s*) is the committed delivery amount at time *s*, which may be a function of all variables in the problem. We define *r* as a general discount rate, which can be the given risk-adjusted or risk-free value, as the nature of the problem requires. Changes to *r* will change the capital value of the WPG/ESD system over its lifetime, but should not change the optimal commitment policy *C** within any one day. The effects of discounting at a typical real rate of return (seldom outside the range 1–10% per annum) are far smaller over 24 h than the expected fluctuation of the real electricity price within that period.

Now let us define
4.4as the solution to the optimization problem at time *t*. Then to introduce our optimal commitment problem, we first assume that electricity supply sales (which commit us to a fixed rate of supply *C* for a fixed time interval Δ*T*) happen at discrete time points *t*=0,Δ*T*,2Δ*T*,… which are to be indexed by *k*. More precisely, at time *t*=*t*_{k}=*k*Δ*T* the owner must choose to commit to deliver at a rate of *C*∈[0,*C*_{max}] units of electricity supply, starting at time *t*_{k} and ending at *t*_{k+1}=*t*_{k}+Δ*T*. Then, the value of this contract given a commitment *C* is
4.5for *t*_{k}<*t*<*t*_{k+1}; here *V* * assumes that all future decisions are made optimally. We can simplify the notation at this stage by dropping variables that do not play an explicit part in the optimization, namely *X*, *Y* and *Q*, so that the quantities defined in (4.3)–(4.5) become *V* (*t*;*C*), *V* *(*t*) and *V* (*t*;*C*,*t*_{k+1}), respectively.

If we assume that the possible set of commitments is a finite set of values {*C*_{i}} where *C*_{i}∈[0,*C*_{max}], then at the decision time *t*_{k}, we must choose the optimal *C*_{i}. Once the decision is made, we must hold that choice of *C*_{i} constant over the period *t*_{k}<*t*<*t*_{k+1}, and the value of making that choice we will write as *V* _{i}(*t*;*t*_{k+1})=*V* (*t*;*C*_{i},*t*_{k+1}). Now the optimization becomes relatively simple and can be written as
4.6The optimal condition (4.6) can then be transformed into the terminal condition required to solve for the previous period [*t*_{k−1},*t*_{k}]. The appropriate condition at *t*_{k} is then
4.7for all *i*. The positive and negative superscripts take account of the fact that our choice of commitment is in the period [*t*_{k−1},*t*_{k}] or [*t*_{k},*t*_{k+1}].

Now if the market includes a delay, so that the owner must precommit to a delivery rate starting in the future, then at time *t*_{k} the owner must choose from the set {*C*_{i}} for the period starting at *t*_{k+1} and ending at *t*_{k+2}. We approximate the full set of admissible controls by a finite set of pairs, where each pair includes the commitment for the current period as well as the next. So we denote *C*_{i,j}=(*C*_{i},*C*_{j}) as the scenario in which *C*_{i} is the current commitment and *C*_{j} is the commitment at the next step, and *V*_{i,j} to denote the value of implementing that control strategy. We have
for *t*_{k}<*t*<*t*_{k+1}. Here *V* *_{j}(*t*_{k+1};*t*_{k+2}) indicates the value of committing *C*_{j} at *t*_{k+1} followed by the optimal commitments for *t*>*t*_{k+2}.

In order to optimally manage the operation of the wind farm, at time *t*=*t*_{k} we lock in the commitment at *t*_{k+1} by maximizing over all possible future commitment decisions *j* for each of the current possible commitments *i* (for which we are yet to make a decision). Mathematically, we can write the optimization as
4.8Now as we step back in time across the instant from just after *t*_{k} to just before *t*_{k}, we must take account of the fact that *C*_{i,j} refers to the current and future commitments and that the current one will become the future commitment for the previous period. So, if we write to denote the value the instant before the decision is made, and to denote the instant just after, we obtain the final condition when solving in the period *t*∈[*t*_{k−1},*t*_{k}]:
4.9

To value the contract in between agreement times *t*_{k}<*t*<*t*_{k+1}, with or without a delay before its commencement, we can use the Feynman–Kac formula (according to Itô calculus, see [13] for more details) to derive a PDE for *V*_{i,j}. It follows simply that we must solve the following PDE for all possible controls *C*_{i,j}:
4.10where *r* is the discount rate (per hour) and is the instantaneous income function as defined in (4.2). Note that the commitment that enters this equation is *C*_{i}, the current commitment, while the future commitments *C*_{j} enter through boundary conditions (4.9). When the problem to be solved does not include a time delay before the chosen commitment becomes effective, the *j* subscript can be omitted and the final condition (4.7) is used instead.

### (b) Boundary conditions

First we deal with the boundary conditions for *X* and *Y* , which are both dealt with in the same way. In order to derive boundary conditions at *X*=0 or *Y* =0, we simply set the term to zero in the PDE and solve the resulting equations. For example, on *X*=0 we solve
and a similar equation is derived for *Y* =0.

For the large *X* and large *Y* boundaries, we assume that the respective second-order differential term is small and can be neglected so that we may solve the resulting equation. So for large *X* the following equation is solved:
The boundary conditions in *Y* are derived in a similar way.

For the boundaries in *Q*, note that we have on *Q*=0, and on *Q*=*Q*_{max}, meaning that the characteristics of the solution do not travel outside the domain [0,*Q*_{max}] (helped in the discretization scheme by the terms *ξ*_{C} and *ξ*_{D}) when moving backwards in time. The result is that we are able to solve the full PDE on the boundaries.

We search for a perpetual solution to the problem which in this case does not mean that because our solution will be periodic. Observe that (2.1) and (2.3) both have 24 h periodic functions; then we might expect that any steady solution to the problem will also have a 24 h period. So we proceed by solving successive 24 h periods, using the solution from the last 24 h period as the final condition for the next period. We check on the difference between solutions at the end of each period, and once we have a solution that satisfies
for some small parameter *ϵ*, we stop solving. The convergence towards this periodic perpetual solution is laboriously slow, but we have developed some heuristic methods to speed matters up.

### (c) Numerical methods

The wind storage problem has the high dimensionality (for a PDE) of four state variables, including time. A further difficulty is introduced by the control set, meaning that the high-dimensional PDE must be solved for each and every possible control strategy. In the case of a trading delay, this means that an extra factor of *n*^{2} will be applied to the computation time, where *n* is the number of possible commitments at each time. Such calculations are time-consuming, but when their purpose is to value an investment proposition, a solution time of days, or even weeks, may be acceptable.

We apply standard finite-difference techniques, whose convergence and stability properties for this problem are described in [14]. A typical calculation will divide the domain (*X*,*Y*,*Q*,*t*,*C*) into a fixed grid comprising 251×51×51×401×21 points to solve for each contract period. We find that using a fine grid will start to take a substantial time, whereas coarser grids with, for example, 101×21×21×201×11 points provide comparable results in a much shorter time scale. Most of the results in this paper are calculated on the coarser grid. In terms of the *X* and *Y* grid, we choose *X*_{max}=100 and *Y* _{max}=100. A much smaller number of points in *Y* and *Q* can be used since there are no strong gradients in those directions.

## 5. Results

In this section, we define *C**(*t*) as the commitment that applies at time *t* and that has in general been set optimally at some time before *t*. In calculating the option’s value *V* over the four-dimensional space of *X*,*Y*,*Q*,*t*, and over all possible strategies for commitment *C*, the diffusion PDE’s numerical solution, the value *V* in £, is the result of a probability-weighted integral over infinitely many potential future trajectories for the continuous variables, between optimization at the successive discrete fixings of *C*.

Clearly the valid *C**, a scalar, contains no time-indexed information about the possible future time-domain trajectory of any of the variables *X*,*Y*,*Q* or of *V* or *C* (even though the physically and financially feasible trajectories for all these variables have been integrated over whilst calculating the *C** at each state point). The absence of time-domain output can have slightly different effects on WPG investors and on WPG operators.

Investors, in financial theory, are willing to spend any sum ≤*V* instantaneously to acquire the instantaneous value *V* of owning an (optimized) WPG plus ESD system; time-domain trajectories for any of these variables do not interest such investors. In contrast WPG operators live in the time domain and therefore need to understand how the optimized policy will drive the entire dynamic system in the time domain. They need to trust that it can and will remain economically optimized. A wiser (real-world) investor will also always check that the operators can understand, and will commit to achieving, the predicted optimality of any dynamic trajectories in the time domain which result from implementing *C**(*X*,*Y*,*Q*,*t*) at any point.

To assist intuition for operators and wiser investors, some sample plots are shown in figure 3 of the joint time trajectories of the following subset of optimized system variables: *Q*,*C**,*D*,*t*, where *D* is the instantaneous rate of power delivery from the WPG and ESD. Each plot was generated from a single 24 h realization of the joint stochastic paths of *Q*_{t},*X*_{t},*Y* _{t} and show the resulting *C**(*t*) as computed from the SDEs of the problem. The system balancing penalty payments and the resulting cash flows are omitted from the plots to avoid clutter. At the same time, the effects of day-specific stochastic realizations of *X* and *Y* , and of the other omitted variables, are partly visible within the behaviour of those variables actually plotted. The following results mostly use the default set of parameters shown in table 1.

### (a) Optimized dynamics of the physical variables: sample paths and summary statistics

Simulated paths of the physical dynamics are shown for both a large and a small ESD in figure 3. Taking first the large ESD in figure 3*a*, the fine dotted line plots the quantity of energy *Q* stored in the ESD. During the nightly peak of wind supply (hours 5–9) the optimal *C** completely fills the ESD. In this state, no more energy can be stored, so all wind power delivered (solid line of delivery *D*) has Brownian disturbances. The dashed line of the optimal commitment *C** balances the opposite risks of under-committing the Brownian output (causing a lower mean price for all amounts over-delivered) and of over-committing it (causing severe cash penalties for amounts under-delivered).

The commitment *C** is the locally horizontal dashed line, whose hourly jumps in level were optimally selected one hour previously. In hours 1–5 (from midnight) the ESD fills rapidly, and when the ESD is full (saturated) between hours 5 and 9, deliveries (solid) are free to deviate from commitment (dashed). Here the commitment does rise, but it remains low enough to ensure that almost all deliveries are over-deliveries. It is not in general optimal to avoid every risk, and close examination of the second half of hour 10 shows a brief threat of under-delivery, as a spike of demand causes a brief discharge from the ESD. Early in hour 11 the ESD is refilled swiftly, due to a low commitment. It is notable that this refilling is due to a prediction made during hour 9, not to error feedback of a stock fall-off during hour 10. This is because the demand spike arises late in hour 10, but the low optimized commitment *C** during hour 11, which swiftly recharges the ESD at the start of hour 11, was itself set at the start of hour 10.

From hour 12 to hour 24, the period of highest price, the commitment line *C* and the delivery line *D* coincide almost exactly. Because the optimal policy fills the ESD fully overnight (but not without some jumps in *C**), nothing more could have been done to increase the stock of energy available during the evening price peak.

Comparable results for the small ESD are shown in figure 3*b*. The small ESD was expected to do continuous smoothing (only) because its small total capacity (one hour of maximum output) and its small rate of charge/discharge (one quarter of peak output) both seem to forbid price arbitrage. There are interesting similarities and differences between the optimal *C** trajectories for the large and small ESDs. Like the large ESD, the small ESD takes the stored energy *Q* to higher levels during the night hours 5–9, but in this case not to saturation. Then, both the large ESD and the small ESD run *Q* down during the evening peak hours, 16–19.

The intuition here is precisely that the small ESD is so small. On inspection of figure 3*b*, the continuous delivery line *D* is mostly close to (but above) the stepped commitment line *C**. These facts, along with the high level of both *D* and *Q* at night, arise because, when using only a small ESD, the WPG must deliver most of its power during the nightly peak of wind strength. To minimize the balancing penalties on these low-priced sales, it is optimal for the small ESD to remain unsaturated all of the night, in order to absorb fluctuations in the majority of its deliveries—which must be made at night. Despite this, at the end of the night the optimal policy retains a fairly large quantity *Q* in store. This reserve, used up in the peak, allows the small ESD to deliver the small quantity of energy that it does sell at evening peak prices exactly on commitment.

Overall, both the small and large ESDs are ‘arbitraging the precision of control’ towards the evening price peak. During the price peak, both ESDs go from near full to near empty, and both deliver their cleanest power of the 24 h (closest to commitment) at this time, when both selling prices and balancing penalties are at their 24 h peak. Oddly, however, the large ESD does *not* as intuitively expected ‘both arbitrage and smooth’. This again is because of the actual size of this ESD. The large ESD ‘chooses not to smooth’ during the night because the balancing penalties, from selling un-smoothed power during the night, are less than the arbitrage value of selling almost all of the stored power (less any used up for later smoothing) during the evening peak of prices and penalties. This suggests two conjectures: (i) only an even larger store than this large ESD would both arbitrage and smooth during all 24 h and (ii) there must be diminishing returns to increasing the size of the ESD, because the optimal solution has already maximized the returns of a store of this size. These conjectures are explored in the next subsection, which briefly addresses the effect on value *V* of continuously improving the store capacity *Q*_{max} and the charge/discharge rate .

The two time plots for the different ESDs are clearly instructive and are long enough to show some statistically different variation within/between them. However, each is one realized sample path over 24 h, from an infinity of possible other paths from the same starting conditions, over all of which the diffusion solution has been integrated. It is however possible to compute the expected values (as opposed to sample values) of a wide range of statistical measures, when integrated over all the possible trajectories for a given optimized solution.

As Howell *et al*. [15] have pointed out, the optimal decision function *C**(*X*,*Y*,*Q*,*t*) has set the values of all feasible changes in *C** to maximize *V* . Here *V* is the integral (from any point in the state space) of the expected time-discounted future value of . So far has been used as an indicator for each state point of the actual rate of income earned/paid over d*t* when the system is at that state space point. The Bellman-optimized control policy *C** at each state point simply maximizes the probability-weighted (diffused) discounted value of along all trajectories from that state point.

Therefore, we are able to compute the expected and discounted value, under *C**(*X*,*Y*,*Q*,*t*), of any event or variable at each state point. To do so it is sufficient: (i) to retain the optimized *C**(*X*,*Y*,*Q*,*t*); (ii) to replace the cash flow indicator function at each state point with an indicator function for the event of interest (e.g. to record the qualitative fact of a physical over-delivery at any state point, set an indicator function 1_{C>D}; to record the size of a physical over-delivery set *C*−*D*; to record its square, set (*C*−*D*)^{2}, and so on); and (iii) to run the model to equilibrium backwards in time using the new and unchanged *C**. The resulting value in each *X*,*Y*,*Q*,*t* cell is the expected discounted value under *C** (over the time horizon being evaluated) of the outcome indicator in question. From this discounted value and the interest rate *r*, the undiscounted time average of the required qualitative or quantitative variable over all trajectories can be recovered.

### (b) Effects on annuity returns *AR*=*rV* due to varying the energy storage device design

We return to comparing different ESD designs as in the subsection above, and we use the same system balancing penalty as before, namely *ζ*=0.5, but we now examine the effects on *V* due to continuous variations in some of the important and expensive (design) parameters, namely *Q*_{max} and *L*_{C}, *L*_{D}.

Each solution is a hypercube of *V* (*X*,*Y*,*Q*,*t*) values, and £ year^{−1} annuity values are given by the simple formula *AR*=*r*_{a}*V* , where *r*_{a} is the annual interest rate so as to remove the effect of the interest rate from the valuation of a stochastic stream of cash flows. For intuitive clarity the hypercube of *V* values is averaged over a truncated state space, so reducing the value function *V* to a single representative scalar. This seems reasonable as a first approximation, because it is not likely that an investor would base a decision to buy or design an ESD based on its behaviour inside any small region of the state space, or of the design space.

The maximized value of the combined WPG and ESD is found by optimizing the commitment *C* as in (4.9) under the original forcing function (4.2) so as to calculate the net present value (NPV) of operating the ESD in perpetuity. In the case of a known limit on system life, this perpetuity value can be easily adjusted to a finite annuity. In general, there might be large positive or negative payments at the end of the asset’s life, but these are not considered here, as they are a well-understood problem in more conventional ‘real options’ analysis.

The total problem of engineering and economic design for a complete WPG/ESD system has three main steps: (i) calculate the NPV of the system’s maximized system earnings (income earned less system balancing penalties) for various combinations of design or operating parameters *Q*_{max}, *L*_{C} and *L*_{D}; (ii) calculate the capital costs of building an ESD with each set of design parameters, taking into account nonlinear economies of scale in building and technology readiness risks; and (iii) optimize the investment decision overall, using conventional real option tools to model any long-term random or non-random walks in the financial variables, also including realistic full costs of entry/exit.

In this paper, we briefly consider step (i) only, and only for the ESD, because this is where the present paper is more innovative. Hence we study the effects of changing the ESD’s operating parameters (*Q*_{max},*L*_{C}=*L*_{D}) on its income after balancing penalties. In figure 4, we vary two design parameters of the modelled class of ESD: the horizontal axis continuously varies ESD capacity *Q*_{max}, and the successively higher lines show the values of successively higher discrete rates of charge/discharge (*L*_{C}=*L*_{D}). All these curves share a near-vertical region close to the origin. This suggests that much of the value of any ESD comes from smoothing over a period of 10–15 min, where this value is near-uniform for all the designs plotted. This is because Brownian variance is linear in time, and hence any losses proportional to the standard deviation are proportional to the square root of the time interval.

Diminishing returns are seen in figure 4 to increase for both the capacity *Q*_{max} (continuous, left to right on the horizontal axis), and the charge/discharge rate *L*_{C}=*L*_{D} (successive vertical jumps in the vertical axis direction). Taking *Q*_{max}=1, *AR*_{ESD}=£11 500 *year*^{−1}, *L*_{C}=*L*_{D}=250 kW as an arbitrary datum point for comparison, more value is added by quadrupling the charge/discharge rate to *L*_{C}=*L*_{D}=1 MW than by quadrupling the capacity to *Q*_{max}=4. The nonlinear response of gross system value *V* to changes in *Q*_{max} and *L*_{C}=*L*_{D} permits quite complex economic trade-offs between *Q*_{max} and *L*_{C}=*L*_{D}. The optimum ESD design maximizes the (positive) NPV of ESD net earnings minus the capital cost of installing the ESD. The fact that there are sharply diminishing returns to increases in ESD capacity and ESD charge/discharge rate means that there can be (at most) one finite optimum size of ESD for a given WPG, if this pair is to be optimized in isolation from others, as seen easily in the unrealistically simple case where the capital cost of the ESD is linear in all of *Q*_{max},*L*_{C} and *L*_{D}.

Looking next along the lowest value curve (for the lowest discharge rate) in figure 4, ∂*AR*_{ESD}/∂*Q*_{max} is approximately zero after *Q*_{max}=2; therefore, there is little gain from storage capacity exceeding 8 h of the WPG’s peak output rate—at such a low discharge rate there is too little time for a larger ESD ever to empty or fill completely. In contrast for the higher discharge rates in figure 4, close to *L*_{D}=*L*_{C}=1 *MW*, the value is still increasing since ∂*AR*_{ESD}/∂*Q*_{max} is significantly larger than zero at the largest *Q*_{max} plotted, namely *Q*_{max}=4. This was foreseen in the previous subsection, in our discussion of figure 3. However, while there is still always a slight gain from a larger ESD (before its capital cost), the important feature of the top lines is that they crowd closer together. This must be due to the fixed size assumed for the WPG and to the fixed volatilities assumed for *X* and *Y* : eventually larger *L*_{*} rates add almost nothing to smoothing or arbitrage, and larger *Q*_{max} adds fewer and fewer opportunities for price arbitrage (due to moving deeper into the tail of the demand distribution, even before allowing for the capital costs of doing so). Within this logic the engineering design and the actual investment decision could be optimized simultaneously.

### (c) Effects of the one-hour time delay between fixing *C* and the start of its effect

In figure 5, the size of the system balancing penalty *ζ* varies left to right on the horizontal axis, and the effects of *ζ* on annuity return *AR* (vertical axis) are plotted for two states of ESD (present and absent) and for two rules of price setting (with and without an hour’s delay between setting the commitment *C* and the start of the hour committed). In all plots, an increase in *ζ* reduces *AR*. The plotted falls in *AR* are all initially nonlinear in *ζ*. Over the range 0<*ζ*<0.12 value falls steeply, but at a diminishing rate; after this the falls in *AR* in all plots tend towards linearity in *ζ*. This suggests some kind of saturation in the effects of higher *ζ* on the optimum policy itself: over the range 0<*ζ*<0.12, one can reduce penalties by making smaller commitments *C*. Losses tend to be linear in the remaining variable, the penalty itself. The limiting case is when the ESD’s capacity is zero, and when the commitment *C* must be set 1 h before it becomes operative; both apply to the lowest plot in figure 5. This plot shows the lowest level of *AR*, and also the earliest approach to linearity in *ζ*. Conversely, the highest of the value plots in this figure represents the case with the highest physical and pricing flexibility. These plots show how changes in the market arrangements (a delay in implementing *C*, and an increase in the system balancing penalty *ζ*) can change the optimum size of an ESD, as well as changing the optimum rules for operating it.

## 6. Conclusion and future work

In contrast to the classic Black–Scholes equation, which is purely financial and has no inertia or dynamic lags, the PDE of this paper incorporates three modifications to enable it to model a system which is partly physical and/or has ‘lagged’ deterministic dynamics:

(a) there is a forcing term, [£ h

^{−1}], which defines a rate of income or cost at every point in the state space (this ensures that the system’s value is decided not at the instant of its death, as in most financial options, but by the integration of income and expenditure over its complete lifetime);(b) there is a control variable

*C*which is continuous, but is reset only at discrete time intervals; and(c) there is an ‘integration term’ that can model deterministic physical and/or financial dynamics.

This integration term has the PDE’s units of [£ h^{−1}], but it is the product of a purely physical coefficient d*Q*/d*t* [MW], which describes the deterministic dynamics of the storage system, and a purely financial coefficient ∂*V* /∂*Q* [£ MWh^{−1}], which is calculated by standard backward Black–Scholes diffusion methods. This type of PDE is suited to the stochastic dynamic optimization of systems in which some of the physical and/or financial variables have deterministic dynamics. Results in the present ‘proof of mathematical concept’ model show that in a future power system which has high wind penetration, and therefore high system balancing penalties, a complex dynamic balance must be, and can be, maintained in real time between (i) present arbitrage profits, (ii) all future arbitrage profits, (iii) the present system balancing penalties and (iv) all future system balancing penalties. Because this model sets absolute physical bounds and/or absolute physical values, for both the WPG parameters and the ESD parameters, the model can in principle jointly optimize the engineering design parameters, the operating rules and the economical performance—with real options methods available to deal with longer-term stochastic price drifts.

Future work could include energy management within the supply grid and storage problems in general, not restricted to energy—e.g. financial system design/operation. Taking the energy opportunities would involve better estimates of the parameters for weather, prices, energy storage and trading, plus tests for the model’s robustness to errors in these estimates; wider opportunities within grid storage would need more realistic dynamic models of energy storage (e.g. electrical circuits and/or mechanical devices for energy storage) and more general ‘new modelling topics’ including the optimal design and/or operation of energy storage in aircraft, ships and cars.

## Data accessibility

All software used was written in C++ code; parameter inputs are provided in the text. The code is available to download from https://github.com/pjohno/WindFarmESDValuation.

## Authors' contributions

P.J. participated in the design of the model, wrote the numerical software to solve the problem, analysed the results and drafted the manuscript. S.H. conceived the model, analysed the results and helped draft the manuscript. P.D. helped design the numerical software and drafted the manuscript. All authors gave final approval for publication.

## Competing interests

We have no competing interests.

## Funding

P.J. was funded by the Engineering and Physical Sciences Research Council (EP/F027842/1).

## Acknowledgements

The authors thank two anonymous referees for providing extensive and constructive feedback that helped improve this work.

## Footnotes

One contribution of 13 to a theme issue ‘Energy management: flexibility, risk and optimization’.

- Accepted March 14, 2017.

- © 2017 The Author(s)

Published by the Royal Society. All rights reserved.