## Abstract

In recent work we uncovered intriguing connections between Otto’s characterization of diffusion as an entropic gradient flow on the one hand and large-deviation principles describing the microscopic picture (Brownian motion) on the other. In this paper, we sketch this connection, show how it generalizes to a wider class of systems and comment on consequences and implications. Specifically, we connect macroscopic gradient flows with large-deviation principles, and point out the potential of a bigger picture emerging: we indicate that, in some non-equilibrium situations, entropies and thermodynamic free energies can be derived via large-deviation principles. The approach advocated here is different from the established hydrodynamic limit passage but extends a link that is well known in the equilibrium situation.

## 1. Introduction

For systems in equilibrium, it is well known that the roles of energy and entropy can be understood rigorously in terms of large-deviation principles. We describe two examples below. Recently, we showed how large-deviation principles also allow us to understand the role of entropy in a specific *non-equilibrium* system [1]: the large-deviation behaviour of a system of independent Brownian particles connects rigorously to the entropy gradient-flow structure of the diffusion equation. We explain this connection in §3*a*.

The aim of this paper is to take this connection two steps further. The first step is to extend the connection of [1], which was studied in a discrete-time context, to the case of continuous time. The second step is to discuss a variety of examples that illustrate the breadth of this phenomenon and suggest a general principle that might hold across a wide range of systems.

In equilibrium systems, the connection is as follows. Let $X_i$, where $i=1,2,\dots$, be independent and identically distributed stochastic variables with distribution $\mu$ on a state space $\mathcal X$. We think of the $X_i$ as positions of particles in the space $\mathcal X$, so that their concentration is given by the *empirical measure* $\rho_n:=\frac1n\sum_{i=1}^n\delta_{X_i}$. Sanov’s theorem ([2], §6.2) states that the random measure $\rho_n$ satisfies the *large-deviation principle*
$$\mathrm{Prob}(\rho_n\approx\rho)\sim\exp[-nI(\rho)]\qquad\text{as } n\to\infty, \tag{1.1}$$
where the *rate function* $I\ge0$ is the *relative entropy* of $\rho$ with respect to $\mu$, which is
$$I(\rho)=H(\rho\,|\,\mu):=\begin{cases}\displaystyle\int_{\mathcal X}f\log f\,d\mu & \text{if } \rho\ll\mu \text{ with } f=\dfrac{d\rho}{d\mu},\\[6pt] +\infty & \text{otherwise.}\end{cases}$$
This property illustrates how the relative entropy $H(\rho|\mu)$ characterizes the probability of observing a state $\rho$: higher relative entropy means smaller probability, as described by (1.1). It also provides a rigorous version of the well-known thermodynamic principle that a system aims to maximize its entropy (which corresponds to minimizing $H(\rho|\mu)$, because the physical entropy carries the opposite sign). Indeed, in the limit of large $n$, the characterization (1.1) gives vanishing probability to all states $\rho$ except those for which $I(\rho)=0$ (here, only $\rho=\mu$); in other words, only the minimizers of $I$ have non-vanishing probability.
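On a finite state space, $\mathrm{Prob}(\rho_n=\rho)$ is an explicit multinomial probability, so (1.1) can be checked directly. The following Python sketch is only an illustration: the law `mu` and the target `rho` are arbitrary choices, not taken from the text. It compares $-\frac1n\log\mathrm{Prob}(\rho_n=\rho)$ with $H(\rho|\mu)=\sum_x\rho(x)\log\frac{\rho(x)}{\mu(x)}$; by Stirling's formula the gap decays like $(\log n)/n$.

```python
import math


def relative_entropy(rho, mu):
    """H(rho|mu) = sum_x rho(x) log(rho(x)/mu(x)) on a finite state space."""
    total = 0.0
    for r, m in zip(rho, mu):
        if r > 0.0:
            if m == 0.0:
                return math.inf  # rho is not absolutely continuous w.r.t. mu
            total += r * math.log(r / m)
    return total


def empirical_rate(counts, mu):
    """-(1/n) log Prob(rho_n = counts/n) for n i.i.d. samples from mu (multinomial law)."""
    n = sum(counts)
    log_p = math.lgamma(n + 1)
    for c, m in zip(counts, mu):
        log_p -= math.lgamma(c + 1)
        if c > 0:
            if m == 0.0:
                return math.inf  # impossible outcome
            log_p += c * math.log(m)
    return -log_p / n


mu = [0.5, 0.3, 0.2]   # law of each X_i (illustrative)
rho = [0.2, 0.3, 0.5]  # target empirical measure (illustrative)
for n in (10, 100, 1000, 10000):
    counts = [round(n * r) for r in rho]
    print(n, empirical_rate(counts, mu), relative_entropy(rho, mu))
```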

This connection between entropy and large-deviation principles extends to systems involving energy. In appendix A, we show, for instance, how coupling a system with energy $E$ to a heat bath with temperature $\theta$ changes the rate functional $I$ to the *free energy* $\mathcal F$:
$$\mathrm{Prob}(\rho_n\approx\rho)\sim\exp[-n\mathcal F(\rho)]\qquad\text{as } n\to\infty. \tag{1.2}$$
In the same way as (1.1) explains why relative entropy is minimized, (1.2) explains why systems coupled to a heat bath minimize their free energy: when $n$ is large, only states $\rho$ with near-minimal free energy have non-vanishing probability.

As mentioned above, the central aim of this paper is to show how this connection between entropy and free energies on the one hand and large-deviation principles on the other extends into the realm of non-equilibrium systems. We restrict our focus to the important class of *gradient flows*, where this connection explains many aspects of these systems. As the entropy appears as the driving force of the process, we will occasionally call this functional ‘energy’ to conform with the standard terminology for gradient flows.

The general philosophy is illustrated by:
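$$
\begin{array}{ccc}
\text{large-deviation rate functional } I_h \text{ or } I & \longleftrightarrow & \text{gradient-flow structure}\\[4pt]
\Big| & & \Big|\\[4pt]
\text{stochastic } n\text{-particle system} & \xrightarrow{\;n\to\infty\;} & \text{continuum equation}
\end{array}
\tag{1.3}
$$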
The bottom row in this diagram is the classical connection between a stochastic $n$-particle system and its hydrodynamic limit: the typical case is that, as $n\to\infty$, the particle system becomes deterministic, and the empirical measure of the particle system converges to the solution of the (deterministic) continuum equation. Note that this statement concerns only the typical behaviour of the particle system; large deviations are not captured.

In the left-hand column, a large-deviation principle characterizes the behaviour in the limit in a different manner, in terms of a functional *I* or *I*_{h} of the *time-dependent* system, as we shall see below. The right-hand column is the connection between an evolution equation and the corresponding gradient-flow structure, when it exists.

The central statement of this paper is the double-headed arrow at the top. It provides a connection between representations with more information on both sides: on the left-hand side, the rate functional contains more information than just the most probable behaviour, and on the right-hand side, the gradient-flow structure is an additional structure on top of the equation itself.

In the following sections, we illustrate the double-headed arrow in a number of concrete examples, first in the discrete-time approximation (§3) and then in continuous time (§4). Section 5 generalizes the argument to non-quadratic dissipations. As the implications of this connection are best appreciated once one has an overview of the breadth of the phenomenon, we postpone most of the discussion of the consequences to §6.

The mathematical results described in this paper are not new, and are mostly due to other authors, such as Freidlin & Wentzell [3], Dawson & Gärtner [4,5], Feng & Kurtz [6], Kipnis *et al.* [7] and others. Instead, we see the novelty of this paper in extracting from these results the suggestion of a general principle connecting the broad class of gradient flows with large deviations of stochastic processes. A particularly interesting aspect of this connection is that thermodynamic quantities are derived in a non-equilibrium context.

## 2. The Wasserstein metric

Much of this paper centres on the Wasserstein metric and Wasserstein gradient flows. The (quadratic) *Wasserstein distance* between two probability measures *ρ*_{0} and *ρ*_{1} with finite second moments is [8]
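$$d(\rho_0,\rho_1)^2:=\inf_q\int_{\mathbb R^d\times\mathbb R^d}|x-y|^2\,q(dx\,dy), \tag{2.1}$$
where the infimum is taken over all $q$ with marginals $\rho_0$ and $\rho_1$, i.e. over all probability measures $q$ on $\mathbb R^d\times\mathbb R^d$ satisfying
$$q(A\times\mathbb R^d)=\rho_0(A)\quad\text{and}\quad q(\mathbb R^d\times A)=\rho_1(A)\qquad\text{for all Borel sets } A\subset\mathbb R^d.$$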
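We also need a local metric tensor associated with the Wasserstein distance. The Benamou–Brenier formula [9] gives an alternative formulation of $d$ as an infimum over curves of measures $t\mapsto\rho(t)$ such that $\rho(0)=\rho_0$ and $\rho(1)=\rho_1$,
$$d(\rho_0,\rho_1)^2=\inf\Big\{\int_0^1\|\partial_t\rho(t)\|_{\rho(t),*}^2\,dt\;:\;\rho(0)=\rho_0,\ \rho(1)=\rho_1\Big\}. \tag{2.2}$$
Here, the local norm $\|\cdot\|_{\rho,*}$ at a given point $\rho$ is derived from an inner product (which is the local metric tensor) formally given by
$$(s_1,s_2)_{\rho,*}:=\int_{\mathbb R^d}\rho\,\nabla p_1\cdot\nabla p_2\,dx, \tag{2.3}$$
where $\nabla$ is the usual gradient in $\mathbb R^d$, and the $p_i$ solve the equation $\mathrm{div}(\rho\nabla p_i)=s_i$ in $\mathbb R^d$ (see [4,10] or [6], appendix D for a rigorous definition).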

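A *Wasserstein gradient flow* is a gradient flow of an energy $\mathcal E$ with respect to the Wasserstein metric structure. A curve of measures $t\mapsto\rho(t)$ is a solution of such a gradient-flow equation if its time derivative $\partial_t\rho$, in the sense of distributions, satisfies
$$(\partial_t\rho,s)_{\rho,*}=-\int_{\mathbb R^d}\frac{\delta\mathcal E}{\delta\rho}(\rho)\,s\,dx\qquad\text{for all } s. \tag{2.4}$$
Here, $\mathcal E$ can be any functional on the space of probability measures and $\frac{\delta\mathcal E}{\delta\rho}$ is its variational derivative. A straightforward calculation shows that this is equivalent to the equation^{1}
$$\partial_t\rho=\mathrm{div}\Big(\rho\nabla\frac{\delta\mathcal E}{\delta\rho}(\rho)\Big). \tag{2.5}$$
By analogy with gradients in Riemannian geometry, this suggests defining the *Wasserstein gradient* of a functional $\mathcal E$ as [11]
$$\mathrm{grad}_W\mathcal E(\rho):=-\mathrm{div}\Big(\rho\nabla\frac{\delta\mathcal E}{\delta\rho}(\rho)\Big), \tag{2.6}$$
so that (2.5) reads $\partial_t\rho=-\mathrm{grad}_W\mathcal E(\rho)$.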

Below, we shall also use more general versions of this structure. Replacing *ρ* above by a general diffusion matrix *D*(*ρ*), we define
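$$(s_1,s_2)_{D(\rho),*}:=\int_{\mathbb R^d}\nabla p_1\cdot D(\rho)\nabla p_2\,dx,\qquad\text{where } \mathrm{div}(D(\rho)\nabla p_i)=s_i. \tag{2.7}$$
Repeating the construction above, it follows that the *$D$-Wasserstein gradient* of a functional $\mathcal E$ is characterized by the equation
$$\mathrm{grad}_D\mathcal E(\rho)=-\mathrm{div}\Big(D(\rho)\nabla\frac{\delta\mathcal E}{\delta\rho}(\rho)\Big). \tag{2.8}$$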

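Gradient flows have natural time-discrete approximations, constructed in an iterative manner: given $\rho^{k-1}$ and a time step $h>0$, let
$$\rho^k\in\operatorname*{argmin}_\rho\Big[\frac1{2h}d(\rho,\rho^{k-1})^2+\mathcal E(\rho)\Big]. \tag{2.9}$$
This is essentially a backward-Euler discretization, as can be recognized by comparing it with the $\mathbb R^d$-gradient flow $\dot x=-\nabla E(x)$. For this equation, the backward-Euler discretization is constructed by solving
$$\frac{x_k-x_{k-1}}{h}=-\nabla E(x_k)$$
for $x_k$, which is the stationarity condition for minimizing
$$x\mapsto\frac1{2h}|x-x_{k-1}|^2+E(x). \tag{2.10}$$
Note the similarity between (2.10) and (2.9): in both expressions, the first term measures the distance between old and new states, whereas the second term favours a reduction of the functional $\mathcal E$, respectively $E$.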
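The equivalence between the implicit step and the minimization of $x\mapsto\frac1{2h}|x-x_{k-1}|^2+E(x)$ can be checked numerically for a single coordinate. In this sketch the energy $E(x)=x^4/4$ is an illustrative assumption; the backward-Euler equation is solved by Newton's method and, independently, the functional is minimized by golden-section search, which is valid here because the functional is strictly convex.

```python
import math


def energy(x):
    return x**4 / 4.0  # illustrative choice of E


def grad_energy(x):
    return x**3


def backward_euler_step(x_prev, h, tol=1e-14, max_iter=100):
    """Solve (x - x_prev)/h = -E'(x) for x with Newton's method."""
    x = x_prev
    for _ in range(max_iter):
        residual = x - x_prev + h * grad_energy(x)
        slope = 1.0 + 3.0 * h * x**2  # derivative of the residual, always >= 1
        step = residual / slope
        x -= step
        if abs(step) < tol:
            break
    return x


def minimizing_movement_step(x_prev, h, tol=1e-12):
    """Minimize |x - x_prev|^2/(2h) + E(x), the functional in (2.10), by golden-section search."""
    def phi(x):
        return (x - x_prev) ** 2 / (2.0 * h) + energy(x)

    # For this E the minimizer lies between 0 and x_prev.
    a, b = min(0.0, x_prev), max(0.0, x_prev)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if phi(c) < phi(d):
            b = d
        else:
            a = c
    return 0.5 * (a + b)


for x_prev in (2.5, 1.0, -0.7):
    print(x_prev, backward_euler_step(x_prev, 0.1), minimizing_movement_step(x_prev, 0.1))
```

The golden-section search never evaluates the gradient, so agreement with the Newton solution confirms that the minimizer of (2.10) is the backward-Euler step.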

## 3. Discrete time

We can now formulate the first example.

### (a) A system of independent Brownian particles

We consider $n$ independent Brownian particles $X_{n,i}(t)$ in $\mathbb R^d$, with deterministic initial positions $X_{n,i}(0)=x_{n,i}$, each hopping to a new position $X_{n,i}(h)$ at time $h>0$ according to a Gaussian distribution with mean $x_{n,i}$ and variance^{2} $2h$.

As in the equilibrium case discussed above, we describe this system by the empirical measure $\rho_n(t):=\frac1n\sum_{i=1}^n\delta_{X_{n,i}(t)}$ at a given time $t$, and we assume that the initial measure $\rho_n(0)$ converges to a given measure $\rho^0$ as $n\to\infty$. In the limit of large $n$, the probability of this jump process attaining any $\rho^1$ at time $t=h$ is again characterized in terms of a large-deviation principle,
$$\mathrm{Prob}(\rho_n(h)\approx\rho^1)\sim\exp[-nI_h(\rho^1;\rho^0)]\qquad\text{as } n\to\infty, \tag{3.1}$$
where the rate functional *I*_{h} has an explicit expression that can be derived from Stirling’s formula (see [1] for the expression; in [1], *I*_{h} is only the limit of a sequence of rate functionals, but can be shown to be a rate functional in its own right [12,13]).

The main result of [1] is that
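$$I_h(\cdot\,;\rho^0)\approx K_h(\cdot\,;\rho^0)\qquad\text{as } h\to0, \tag{3.2}$$
where
$$K_h(\rho;\rho^0):=\frac1{4h}d(\rho^0,\rho)^2+\frac12\mathrm{Ent}(\rho)-\frac12\mathrm{Ent}(\rho^0). \tag{3.3}$$
Here, $d$ is the Wasserstein distance defined above and
$$\mathrm{Ent}(\rho):=\int_{\mathbb R^d}\rho(x)\log\rho(x)\,dx$$
is the relative entropy of $\rho$ with respect to the Lebesgue measure on $\mathbb R^d$. The rigorous formulation of (3.2) is a Gamma-convergence result of $I_h$ to $K_h$ after both have been desingularized.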

The functional *K*_{h} has the same form as the functional in (2.9), because the term Ent(*ρ*^{0})/2 does not influence the minimization with respect to *ρ*^{1}. Therefore, the time-discrete approximation that one constructs with this *K*_{h} is an approximation of the Wasserstein gradient flow of the entropy Ent, which is the diffusion equation [14]
$$\partial_t\rho=\Delta\rho. \tag{3.4}$$
This is the connection referred to above: the large-deviation behaviour of the system of particles is represented by the rate functional *I*_{h}, and this functional is asymptotically equal to the functional *K*_{h} that defines the gradient-flow formulation of the diffusion equation. The approximation result (3.2) therefore creates a link between the gradient-flow structure of the deterministic limit equation on the one hand and the large-deviation behaviour of the system of particles on the other. The same result can be shown for Gaussian measures on the real line [15]. In the rest of this paper, we shall see many more versions of such connections.
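A hand-checkable instance of this connection is obtained by restricting the minimization of $K_h(\rho;\rho^0)=\frac1{4h}d(\rho^0,\rho)^2+\frac12\mathrm{Ent}(\rho)-\frac12\mathrm{Ent}(\rho^0)$ to centred Gaussians $\rho=N(0,s^2)$ on $\mathbb R$, an assumption made here only to reduce the problem to one variable. With the standard formulas $d(N(0,s_0^2),N(0,s^2))=|s-s_0|$ and $\mathrm{Ent}(N(0,s^2))=-\frac12\log(2\pi e s^2)$, the optimality condition becomes $s(s-s_0)=h$, and the resulting variance $s^2=s_0^2+2h+O(h^2)$ matches the diffusion equation, under which the variance grows by exactly $2h$ in time $h$. A Python sketch:

```python
import math


def k_h(s, s0, h):
    """K_h(rho; rho0) from (3.3) for rho = N(0, s^2), rho0 = N(0, s0^2) on the real line."""
    def ent(sigma):
        return -0.5 * math.log(2.0 * math.pi * math.e * sigma**2)
    return (s - s0) ** 2 / (4.0 * h) + 0.5 * ent(s) - 0.5 * ent(s0)


def jko_gaussian_step(s0, h):
    """Minimizer of K_h over centred Gaussians: the positive root of s^2 - s0*s - h = 0."""
    return 0.5 * (s0 + math.sqrt(s0**2 + 4.0 * h))


def jko_variance(s0, total_time, n_steps):
    """Iterate the step n_steps times with h = total_time / n_steps; return the final variance."""
    h = total_time / n_steps
    s = s0
    for _ in range(n_steps):
        s = jko_gaussian_step(s, h)
    return s**2


# Diffusion equation: the variance of N(0, 1) grows to 1 + 2T at time T.
print(jko_variance(1.0, 1.0, 1000), 1.0 + 2.0 * 1.0)
```

The per-step error is $-h^2/s_0^2+O(h^3)$, so the iterated scheme converges to the exact variance at rate $O(h)$.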

*Consequences*. While most of the discussion is deferred to §6, we mention here a few consequences of the fact (3.2) that the large-deviation rate functional *I*_{h} and the constructing functional *K*_{h} of the gradient flow are equal in the limit *h*→0.

First, the construction of a time-discrete approximation (2.9) to the diffusion equation (3.4) was motivated in [14] by analogy with the backward-Euler discretization (2.10). This is an indirect and purely mathematical motivation, which explains neither the reason for the appearance of the entropy and the Wasserstein distance in *K*_{h} nor the reason for minimizing just this combination.

The connection between $K_h$ and $I_h$, however, gives a direct motivation. By (3.1) and (3.2), $K_h(\rho;\rho^0)$ measures how unlikely it is to observe a state $\rho$ after time $h$. For large $n$, the characterization (3.1) implies that only the global minimizer of $I_h$, and therefore (for small $h$) of $K_h$, is observed with non-vanishing probability. The stochastic minimization (3.1) of $I_h$ is thus converted into a deterministic minimization of $K_h$.

Second, in the limit $h\to0$, the *proof* that $I_h\approx K_h$ explains the origin of the two terms of $K_h$. The entropy arises from the indistinguishability of the particles after transforming to an empirical measure. The origin of the Wasserstein cost function $|x-y|^2$ in (2.1) can be traced back to the exponent of the factor $e^{-|x-y|^2/4h}$ in the Gaussian transition probability of the Brownian particles. We return to this issue in §6.

## 4. Continuous time

The construction in the previous section is discrete in time: the rate function *I*_{h} describes the probability distribution of the state *ρ*_{n}(*h*) at time *h*>0. A continuous-time large-deviation principle, where one considers deviations from a whole path of empirical measures for a fixed terminal time, provides a different kind of insight and may be even closer to the gradient-flow formulation. We start with some preliminaries.

### (a) An alternative formulation of the gradient-flow structure

In a formal sense, Wasserstein gradient flows and many others can be written in the form
$$\partial_t\rho=-M_\rho\frac{\delta\mathcal E}{\delta\rho}(\rho), \tag{4.1}$$
where $\mathcal E$ is the ‘energy’ functional driving the evolution and $M_\rho$ is a $\rho$-dependent symmetric mapping.^{3} In the case of Wasserstein gradient flows, for instance,
$$M_\rho\xi=-\mathrm{div}(\rho\nabla\xi),$$
as follows by comparing (2.5) with (4.1). Taking this case of Wasserstein gradient flow as an example, we shall encounter equation (4.1) in a different form, connected to the functional $J$ given by
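$$J(\rho):=\mathcal E(\rho(T))-\mathcal E(\rho(0))+\frac12\int_0^T\Big[\|\partial_t\rho\|_{\rho,*}^2+\Big\|\frac{\delta\mathcal E}{\delta\rho}(\rho)\Big\|_\rho^2\Big]dt, \tag{4.2}$$
where
$$\|\xi\|_\rho^2:=\int_{\mathbb R^d}|\nabla\xi|^2\,\rho\,dx,$$
and $\|\cdot\|_{\rho,*}$ is the norm defined in (2.3). The norms $\|\cdot\|_\rho$ and $\|\cdot\|_{\rho,*}$ are dual norms, and $\|\cdot\|_{\rho,*}$ has the alternative characterization
$$\frac12\|s\|_{\rho,*}^2=\sup_\xi\Big\{\int_{\mathbb R^d}s\,\xi\,dx-\frac12\|\xi\|_\rho^2\Big\}.$$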
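By writing the energy difference as
$$\mathcal E(\rho(T))-\mathcal E(\rho(0))=\int_0^T\!\!\int_{\mathbb R^d}\frac{\delta\mathcal E}{\delta\rho}(\rho)\,\partial_t\rho\,dx\,dt=\int_0^T\Big(\partial_t\rho,\,M_\rho\frac{\delta\mathcal E}{\delta\rho}(\rho)\Big)_{\rho,*}dt,$$
using the inner product defined in (2.3), the functional $J$ in (4.2) can now be written as
$$J(\rho)=\frac12\int_0^T\Big\|\partial_t\rho+M_\rho\frac{\delta\mathcal E}{\delta\rho}(\rho)\Big\|_{\rho,*}^2dt.$$
This expression shows that $J$ is non-negative. It also implies that $J(\rho)=0$ if and only if equation (4.1) holds at almost every time $0<t<T$; therefore,
$$J(\rho)=0\quad\Longleftrightarrow\quad\partial_t\rho=-M_\rho\frac{\delta\mathcal E}{\delta\rho}(\rho)\ \text{ for almost every } t\in(0,T). \tag{4.3}$$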
In the examples in this paper, *J* is a large-deviation rate functional, and this equivalence is the connection between the large-deviation behaviour, given by *J*, and the gradient-flow structure of the limiting equation.
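The completion-of-squares argument behind (4.2) and (4.3) can be checked numerically in the simplest setting: a single scalar state $x(t)\in\mathbb R$ with a scalar mobility $M_x=m(x)>0$, so that (4.1) reads $\dot x=-m(x)E'(x)$ and
$J=E(x(T))-E(x(0))+\frac12\int_0^T[\dot x^2/m+mE'^2]\,dt=\frac12\int_0^T(\dot x+mE')^2/m\,dt$.
The energy $E(x)=x^4/4-x^2/2$ and mobility $m(x)=1+x^2$ below are illustrative assumptions. The sketch evaluates both forms on sampled curves: they agree up to quadrature error, $J>0$ for a curve that is not a solution, and $J\approx0$ along a numerically computed solution.

```python
import math


def E(x):
    return x**4 / 4.0 - x**2 / 2.0  # illustrative energy


def dE(x):
    return x**3 - x


def mobility(x):
    return 1.0 + x**2  # illustrative scalar mobility M_x > 0


def J_energy_form(xs, dt):
    """E(x(T)) - E(x(0)) + 1/2 int [xdot^2/m + m E'^2] dt; midpoint rule on each interval."""
    total = E(xs[-1]) - E(xs[0])
    for a, b in zip(xs, xs[1:]):
        x_mid, v = 0.5 * (a + b), (b - a) / dt
        total += 0.5 * (v**2 / mobility(x_mid) + mobility(x_mid) * dE(x_mid) ** 2) * dt
    return total


def J_square_form(xs, dt):
    """1/2 int (xdot + m E')^2 / m dt: non-negative, zero exactly on solutions of (4.1)."""
    total = 0.0
    for a, b in zip(xs, xs[1:]):
        x_mid, v = 0.5 * (a + b), (b - a) / dt
        total += 0.5 * (v + mobility(x_mid) * dE(x_mid)) ** 2 / mobility(x_mid) * dt
    return total


def gradient_flow_path(x0, T, n):
    """Solve xdot = -m(x) E'(x) with the classical RK4 method and return the sampled path."""
    def f(x):
        return -mobility(x) * dE(x)
    dt, xs = T / n, [x0]
    for _ in range(n):
        x = xs[-1]
        k1 = f(x)
        k2 = f(x + 0.5 * dt * k1)
        k3 = f(x + 0.5 * dt * k2)
        k4 = f(x + dt * k3)
        xs.append(x + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0)
    return xs


T, n = 2.0, 20000
dt = T / n
other_curve = [math.cos(i * dt) for i in range(n + 1)]  # not a solution of (4.1)
flow_curve = gradient_flow_path(1.5, T, n)
print(J_energy_form(other_curve, dt), J_square_form(other_curve, dt))
print(J_energy_form(flow_curve, dt), J_square_form(flow_curve, dt))
```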

If we take for the operator *M*_{ρ} in (4.1) not the Wasserstein operator but a general operator, then we find a similar statement:
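$$J(\rho)=0\quad\Longleftrightarrow\quad\partial_t\rho=-M_\rho\frac{\delta\mathcal E}{\delta\rho}(\rho)\ \text{ for almost every } t\in(0,T), \tag{4.4}$$
where now
$$J(\rho):=\mathcal E(\rho(T))-\mathcal E(\rho(0))+\frac12\int_0^T\Big[\|\partial_t\rho\|_{M_\rho^{-1}}^2+\Big\|\frac{\delta\mathcal E}{\delta\rho}(\rho)\Big\|_{M_\rho}^2\Big]dt, \tag{4.5}$$
and the two norms are defined, at least formally, by
$$\|\xi\|_{M_\rho}^2:=\int\xi\,M_\rho\xi\,dx$$
and
$$\|s\|_{M_\rho^{-1}}^2:=\int s\,M_\rho^{-1}s\,dx.$$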
We now discuss a number of examples.

### (b) Continuous-time large deviations for the diffusion equation

Taking the same system of particles as in §3*a*, the continuous-time large-deviation principle for that system of Brownian particles is as follows. Fix a terminal time *T*>0 and consider the whole path of empirical measures [0,*T*]∋*t*↦*ρ*_{n}(*t*). Then the probability that the *entire curve* *ρ*_{n}(⋅) is close to some other *ρ*(⋅) is characterized [4,10] as a *pathwise large-deviation principle*,
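$$\mathrm{Prob}\big(\rho_n\approx\rho\ \text{on } [0,T]\big)\sim\exp[-nI(\rho)]\qquad\text{as } n\to\infty,$$
where now
$$I(\rho):=\frac14\int_0^T\|\partial_t\rho-\Delta\rho\|_{\rho,*}^2\,dt. \tag{4.6}$$
This rate function $I$ has the structure of $J$ in (4.2). Using the fact that
$$\Delta\rho=\mathrm{div}(\rho\nabla\log\rho)=-M_\rho\frac{\delta\mathrm{Ent}}{\delta\rho}(\rho),$$
we find that
$$I(\rho)=\frac12J(\rho)\qquad\text{with } \mathcal E=\mathrm{Ent}.$$
Therefore, the Entropy–Wasserstein gradient flow is connected to the large-deviation behaviour of a system of stochastic particles, in the sense of (4.3). We discuss this further in §6.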
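Along a solution of the diffusion equation the rate functional vanishes, and written out for $\mathcal E=\mathrm{Ent}$ this is the entropy-dissipation identity $\frac{d}{dt}\mathrm{Ent}(\rho)=-\int|\nabla\rho|^2/\rho\,dx$ (the Fisher information). The following sketch checks this identity numerically; the periodic domain $[0,1)$, the explicit finite-difference scheme and the initial datum $\rho_0(x)=1+\frac12\sin 2\pi x$ are illustrative assumptions, not part of the original analysis.

```python
import math

N = 200                 # grid points on the periodic interval [0, 1)
dx = 1.0 / N
dt = 0.2 * dx**2        # the explicit scheme is stable for dt <= dx^2 / 2


def heat_step(r):
    """One explicit Euler step of dt rho = d_xx rho with periodic boundary conditions."""
    return [r[i] + dt * (r[(i + 1) % N] - 2.0 * r[i] + r[i - 1]) / dx**2 for i in range(N)]


def entropy(r):
    return sum(v * math.log(v) for v in r) * dx


def fisher_information(r):
    """int |d_x rho|^2 / rho dx, using centred differences."""
    return sum(((r[(i + 1) % N] - r[i - 1]) / (2.0 * dx)) ** 2 / r[i] for i in range(N)) * dx


rho = [1.0 + 0.5 * math.sin(2.0 * math.pi * i * dx) for i in range(N)]
for step in range(2001):
    if step % 500 == 0:
        rate = (entropy(heat_step(rho)) - entropy(rho)) / dt
        print(step * dt, rate, -fisher_information(rho))
    rho = heat_step(rho)
```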

Incidentally, the convergence result (3.2) can be derived from the pathwise large-deviation principle, as is worked out in detail in [17].

### (c) Diffusive particles with interactions

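We extend the previous example by including interaction of the particles with a background potential $\Psi$ and with each other via an interaction potential $\Phi$; the dynamics are modelled by Itô stochastic differential equations. Specifically, we take the microscopic system of $n$ particles to be described by
$$dX_i(t)=-\nabla\Psi(X_i(t))\,dt-\frac1n\sum_{j=1}^n\nabla\Phi(X_i(t)-X_j(t))\,dt+\sqrt2\,dW_i(t),\qquad i=1,\dots,n, \tag{4.7}$$
where the $W_i$ are independent Brownian motions in $\mathbb R^d$. The hydrodynamic limit of this system is the equation
$$\partial_t\rho=\Delta\rho+\mathrm{div}(\rho\nabla\Psi)+\mathrm{div}\big(\rho\nabla(\Phi*\rho)\big). \tag{4.8}$$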
The large-deviation rate functional describing fluctuations of the system is given by (see [6], theorem 13.37 and also [4] for weakly interacting diffusive particle systems)
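$$I(\rho)=\frac14\int_0^T\Big\|\partial_t\rho-\Delta\rho-\mathrm{div}(\rho\nabla\Psi)-\mathrm{div}\big(\rho\nabla(\Phi*\rho)\big)\Big\|_{\rho,*}^2\,dt, \tag{4.9}$$
which again can be written as
$$I(\rho)=\frac12J(\rho)\qquad\text{with } \mathcal E=\mathcal F,$$
where the free energy $\mathcal F$ is given by the sum of entropy and potential energy,
$$\mathcal F(\rho):=\mathrm{Ent}(\rho)+\int_{\mathbb R^d}\Psi\,d\rho+\frac12\iint_{\mathbb R^d\times\mathbb R^d}\Phi(x-y)\,\rho(dx)\,\rho(dy). \tag{4.10}$$
Indeed, for even $\Phi$, equation (4.8) is the Wasserstein gradient flow of the functional $\mathcal F$.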
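For $\Phi=0$ the free energy reduces to $\mathcal F(\rho)=\int\rho\log\rho\,dx+\int\Psi\rho\,dx$, whose minimizer among probability densities is the Gibbs density $\rho_*=e^{-\Psi}/Z$, the stationary state of the corresponding drift-diffusion equation; moreover $\mathcal F(\rho)-\mathcal F(\rho_*)=H(\rho|\rho_*)\ge0$ and $\mathcal F(\rho_*)=-\log Z$. The sketch below discretizes the free energy on a one-dimensional grid to check these identities. The potential $\Psi(x)=x^2/2+x^4/4$, the grid and the perturbation are illustrative assumptions; the interaction term is implemented for an even $\Phi$ but switched off in the check.

```python
import math

L, N = 6.0, 600
dx = 2.0 * L / N
xs = [-L + (i + 0.5) * dx for i in range(N)]  # cell midpoints on [-L, L]


def psi(x):
    return 0.5 * x**2 + 0.25 * x**4  # illustrative confining potential


def free_energy(rho, phi=None):
    """Riemann-sum version of Ent + int Psi rho + 1/2 iint Phi(x - y) rho rho (Phi assumed even)."""
    ent = sum(r * math.log(r) for r in rho if r > 0.0) * dx
    pot = sum(psi(x) * r for x, r in zip(xs, rho)) * dx
    inter = 0.0
    if phi is not None:
        inter = 0.5 * sum(phi(x - y) * rx * ry
                          for x, rx in zip(xs, rho) for y, ry in zip(xs, rho)) * dx * dx
    return ent + pot + inter


def normalize(rho):
    mass = sum(rho) * dx
    return [r / mass for r in rho]


Z = sum(math.exp(-psi(x)) for x in xs) * dx
gibbs = [math.exp(-psi(x)) / Z for x in xs]
perturbed = normalize([g * (1.0 + 0.3 * math.sin(x)) for g, x in zip(gibbs, xs)])
print(free_energy(gibbs), -math.log(Z), free_energy(perturbed))
```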

### (d) The symmetric simple exclusion process

The diffusion equation (3.4) is the continuum limit for various stochastic processes, one of which is the system of Brownian particles described above. Here, we briefly describe the symmetric simple exclusion process, which has the same limiting equation in a parabolic scaling. However, it has a different large-deviation behaviour, which gives rise to a different gradient flow.

Consider a periodic lattice $T_n=\{0,1/n,2/n,\dots,(n-1)/n\}$ and its continuum limit, the flat torus $\mathbb T=\mathbb R/\mathbb Z$. Each lattice site contains zero or one particle; each particle attempts to jump from its site to a neighbouring site with rate $n^2/2$, and the jump succeeds only if the target site is empty. We define the configuration $\rho_n:T_n\to\{0,1\}$ such that $\rho_n(k/n)=1$ if there is a particle at site $k/n$ and zero otherwise. For this system, the large deviations are characterized by the rate function [7]
$$I(\rho)=\frac14\int_0^T\|\partial_t\rho-\partial_{xx}\rho\|_{\rho(1-\rho),*}^2\,dt, \tag{4.11}$$
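where the norm $\|\cdot\|_{\rho(1-\rho),*}$ is given by (2.7) with $D(\rho)=\rho(1-\rho)$. This functional can be written as
$$I(\rho)=\frac12\Big[\mathrm{Ent}_{\mathrm{mix}}(\rho(T))-\mathrm{Ent}_{\mathrm{mix}}(\rho(0))\Big]+\frac14\int_0^T\Big[\|\partial_t\rho\|_{\rho(1-\rho),*}^2+\|\partial_{xx}\rho\|_{\rho(1-\rho),*}^2\Big]dt,$$
where the mixing entropy $\mathrm{Ent}_{\mathrm{mix}}$ is defined as
$$\mathrm{Ent}_{\mathrm{mix}}(\rho):=\int_{\mathbb T}\big[\rho\log\rho+(1-\rho)\log(1-\rho)\big]\,dx.$$
This holds because $-\partial_{xx}\rho$ is the ‘$\rho(1-\rho)$’-Wasserstein gradient of $\mathrm{Ent}_{\mathrm{mix}}$, by
$$-\partial_x\Big(\rho(1-\rho)\,\partial_x\frac{\delta\mathrm{Ent}_{\mathrm{mix}}}{\delta\rho}(\rho)\Big)=-\partial_x\Big(\rho(1-\rho)\,\partial_x\log\frac{\rho}{1-\rho}\Big)=-\partial_{xx}\rho$$
(compare this with (2.8)). Therefore, $I$ is one half of a functional $J$ of the form (4.5), with operator
$$M_\rho\xi=-\partial_x\big(\rho(1-\rho)\,\partial_x\xi\big),$$
and the equation $\partial_t\rho=\partial_{xx}\rho$ is (also) the gradient flow of $\mathrm{Ent}_{\mathrm{mix}}$ with respect to this ‘$\rho(1-\rho)$’-Wasserstein structure $\|\cdot\|_{\rho(1-\rho),*}$.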
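The identity $\partial_x\big(\rho(1-\rho)\,\partial_x\log\frac{\rho}{1-\rho}\big)=\partial_{xx}\rho$ holds pointwise because $\partial_x\log\frac{\rho}{1-\rho}=\frac{\partial_x\rho}{\rho(1-\rho)}$. A quick finite-difference check on the periodic interval follows; the profile $\rho(x)=\frac12+\frac3{10}\sin 2\pi x$ (with values in $(0,1)$) and the conservative half-point discretization are illustrative choices.

```python
import math

N = 400                      # grid points on the periodic interval [0, 1)
dx = 1.0 / N
rho = [0.5 + 0.3 * math.sin(2.0 * math.pi * i * dx) for i in range(N)]
# Variational derivative of Ent_mix, up to an additive constant: log(rho / (1 - rho)).
xi = [math.log(r / (1.0 - r)) for r in rho]


def half_point_flux(j):
    """rho(1 - rho) d_x xi at the half-point j + 1/2 (periodic indices), mobility averaged."""
    r_half = 0.5 * (rho[j % N] + rho[(j + 1) % N])
    return r_half * (1.0 - r_half) * (xi[(j + 1) % N] - xi[j % N]) / dx


# Left-hand side: d_x( rho(1 - rho) d_x xi ) in conservative form.
lhs = [(half_point_flux(i) - half_point_flux(i - 1)) / dx for i in range(N)]
# Right-hand side: the standard second difference of rho, and the exact second derivative.
rhs = [(rho[(i + 1) % N] - 2.0 * rho[i] + rho[i - 1]) / dx**2 for i in range(N)]
exact = [-0.3 * (2.0 * math.pi) ** 2 * math.sin(2.0 * math.pi * i * dx) for i in range(N)]
print(max(abs(a - b) for a, b in zip(lhs, rhs)), max(abs(a - b) for a, b in zip(lhs, exact)))
```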

## 5. Further generalizations

The arguments of the integrals in (2.2), (4.6), (4.9) and (4.11) are quadratic. This arises from a parabolic rescaling and the central limit theorem, and it leads to a gradient flow with a (formal) inner-product structure or, equivalently, to a linear operator $M_\rho$ in (4.1). Other types of randomness lead to non-quadratic gradient-flow structures, as we now describe.

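A close inspection of the arguments of §4*a* shows that they hinge on the inequality
$$-\int\frac{\delta\mathcal E}{\delta\rho}(\rho)\,\partial_t\rho\,dx\;\le\;\frac12\Big\|\frac{\delta\mathcal E}{\delta\rho}(\rho)\Big\|_{M_\rho}^2+\frac12\|\partial_t\rho\|_{M_\rho^{-1}}^2,$$
together with the observation that equality holds if and only if $\partial_t\rho=-M_\rho\frac{\delta\mathcal E}{\delta\rho}(\rho)$. This can be generalized by introducing a Legendre pair of convex functions $\psi_\rho$ and $\psi_\rho^*$, where the subscript $\rho$ serves to indicate that they may depend on $\rho$, in the same way as the operator $M_\rho$ does; in this context, $\psi_\rho$ is often called the dissipation potential. In terms of this pair, we then derive that
$$-\int\frac{\delta\mathcal E}{\delta\rho}(\rho)\,\partial_t\rho\,dx\;\le\;\psi_\rho\Big(-\frac{\delta\mathcal E}{\delta\rho}(\rho)\Big)+\psi_\rho^*(\partial_t\rho),$$
and equality holds if and only if
$$\partial_t\rho=\psi_\rho'\Big(-\frac{\delta\mathcal E}{\delta\rho}(\rho)\Big), \tag{5.1}$$
where $\psi_\rho'$ denotes the derivative of $\psi_\rho$. The $M$-gradient flow (4.1) is the special case of (5.1) corresponding to
$$\psi_\rho(\xi)=\frac12\|\xi\|_{M_\rho}^2,\qquad\psi_\rho^*(s)=\frac12\|s\|_{M_\rho^{-1}}^2.$$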

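The obvious generalization of (4.3) then is
$$J_\psi(\rho)=0\quad\Longleftrightarrow\quad\partial_t\rho=\psi_\rho'\Big(-\frac{\delta\mathcal E}{\delta\rho}(\rho)\Big)\ \text{ for almost every } t\in(0,T), \tag{5.2}$$
where $J_\psi$ is given by
$$J_\psi(\rho):=\mathcal E(\rho(T))-\mathcal E(\rho(0))+\int_0^T\Big[\psi_\rho\Big(-\frac{\delta\mathcal E}{\delta\rho}(\rho)\Big)+\psi_\rho^*(\partial_t\rho)\Big]dt. \tag{5.3}$$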

### (a) Birth–death processes

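A simple example of a stochastic process with non-quadratic dissipation $\psi$ and a corresponding generalized gradient flow is a birth–death process, which is a continuous-time jump process on $\mathbb Z$. The system may only jump to neighbours, from position $k$ with rate $a_k$ to $k+1$ and with rate $b_k$ to $k-1$. We construct a continuum limit by defining the new stochastic variable $U_n$ by rescaling time $t$ and position $k(t)$ with $n$,
$$U_n(t):=\frac1n\,k(nt).$$
A standard argument gives the large-deviation behaviour for $U_n$ in terms of the rate functional (see [18] for a finite-lattice proof of the claims made below). If we choose the jump rates so that
$$a_k=\alpha\exp\Big(-\tfrac12V'\big(\tfrac kn\big)\Big),\qquad b_k=\alpha\exp\Big(\tfrac12V'\big(\tfrac kn\big)\Big),$$
for $\alpha>0$ and some smooth function $V:\mathbb R\to\mathbb R$, then the rate functional is
$$I(u)=\int_0^TL\big(u(t),\dot u(t)\big)\,dt$$
with
$$L(u,v)=\sup_{p\in\mathbb R}\Big[pv-\alpha e^{-V'(u)/2}\big(e^{p}-1\big)-\alpha e^{V'(u)/2}\big(e^{-p}-1\big)\Big].$$
Writing
$$L(u,v)=\tfrac12V'(u)\,v+\psi^*(v)+\psi\big(-\tfrac12V'(u)\big),$$
it follows that $\psi(\xi)=\alpha(e^{\xi}+e^{-\xi})$, and $I$ can be written in the form (5.3), with energy $\mathcal E=\frac12V$.

The corresponding generalized gradient flow in $\mathbb R$, given by (5.1), reads
$$\dot u=\psi'\big(-\tfrac12V'(u)\big)=-2\alpha\sinh\big(\tfrac12V'(u)\big).$$
Observe how this differs from the standard (quadratic-dissipation) gradient flow, which is $\dot u=-\frac12V'(u)$; the non-quadratic dissipation preserves the sign of the velocity, but not its amplitude. Because of the preservation of sign, the energy is monotonic along a solution,
$$\frac{d}{dt}V(u(t))=V'(u)\,\dot u=-2\alpha\,V'(u)\sinh\big(\tfrac12V'(u)\big)\le0.$$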
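Both claims for the birth–death example, the splitting $L(u,v)=\frac12V'(u)v+\psi^*(v)+\psi(-\frac12V'(u))$ with $\psi(\xi)=\alpha(e^\xi+e^{-\xi})$ and the monotone decrease of $V$ along $\dot u=-2\alpha\sinh(\frac12V'(u))$, can be checked numerically. This sketch assumes macroscopic rates $a(u)=\alpha e^{-V'(u)/2}$, $b(u)=\alpha e^{V'(u)/2}$, the illustrative double well $V(u)=u^4/4-u^2/2$ and $\alpha=\frac12$. It uses the closed-form Legendre transform $L(u,v)=v\log\frac{v+\sqrt{v^2+4ab}}{2a}-\sqrt{v^2+4ab}+a+b$ and the conjugate $\psi^*(v)=v\,\mathrm{arsinh}\frac{v}{2\alpha}-\sqrt{v^2+4\alpha^2}$, and integrates the generalized flow next to the quadratic flow $\dot u=-\frac12V'(u)$ with the classical Runge–Kutta method.

```python
import math

ALPHA = 0.5  # illustrative alpha > 0


def V(u):
    return u**4 / 4.0 - u**2 / 2.0  # illustrative double-well potential


def dV(u):
    return u**3 - u


def lagrangian(u, v):
    """Closed form of sup_p [p v - a(e^p - 1) - b(e^-p - 1)] for the assumed rates a(u), b(u)."""
    a, b = ALPHA * math.exp(-0.5 * dV(u)), ALPHA * math.exp(0.5 * dV(u))
    root = math.sqrt(v**2 + 4.0 * a * b)
    return v * math.log((v + root) / (2.0 * a)) - root + a + b


def psi(xi):
    return ALPHA * (math.exp(xi) + math.exp(-xi))


def psi_star(v):
    """Legendre transform of psi."""
    return v * math.asinh(v / (2.0 * ALPHA)) - math.sqrt(v**2 + 4.0 * ALPHA**2)


def lagrangian_split(u, v):
    """1/2 V'(u) v + psi*(v) + psi(-V'(u)/2): the form (5.3) with energy V/2."""
    return 0.5 * dV(u) * v + psi_star(v) + psi(-0.5 * dV(u))


def generalized_flow(u):
    return -2.0 * ALPHA * math.sinh(0.5 * dV(u))  # psi'(-V'(u)/2)


def quadratic_flow(u):
    return -0.5 * dV(u)


def rk4_path(f, u0, T, n):
    """Classical fourth-order Runge-Kutta for du/dt = f(u); returns all n + 1 samples."""
    dt, us = T / n, [u0]
    for _ in range(n):
        u = us[-1]
        k1 = f(u)
        k2 = f(u + 0.5 * dt * k1)
        k3 = f(u + 0.5 * dt * k2)
        k4 = f(u + dt * k3)
        us.append(u + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0)
    return us


print(max(abs(lagrangian(u, v) - lagrangian_split(u, v))
          for u in (-1.3, 0.0, 0.4, 1.1) for v in (-2.0, -0.3, 0.0, 0.8, 2.5)))
gen = rk4_path(generalized_flow, 2.0, 10.0, 10000)
quad = rk4_path(quadratic_flow, 2.0, 10.0, 10000)
print(gen[-1], quad[-1], generalized_flow(2.0), quadratic_flow(2.0))
```

With $\alpha=\frac12$ the two flows share the same linearization at the minimum $u=1$, but far from it the velocities differ strongly (at $u=2$, $-\sinh 3$ versus $-3$), while both decrease $V$.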

This example shows how the connection between large-deviation principles and (generalized) gradient flows extends to the case of non-quadratic dissipations. Note that here the large deviations refer to a single process, and hence are not due to an averaging process as in the empirical-measure case.

### (b) Spin–flip processes

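For $n\in\mathbb N$, let $\mathbb T_n$ be the one-dimensional $n$-torus $\mathbb Z/n\mathbb Z$. An Ising spin at sites of $\mathbb T_n$ takes values in $\{-1,+1\}$ and is subjected to a rate-1 independent spin–flip dynamics. We consider the trajectory of the magnetization, i.e. $m_n(t):=\frac1n\sum_{i\in\mathbb T_n}\sigma_i(t)$, where $\sigma_i(t)$ is the spin at site $i\in\mathbb T_n$ at time $t$. The generator for the process $(m_n(t))_{t\ge0}$ is given by
$$(\mathcal A_nf)(m)=n\,\frac{1+m}{2}\Big[f\big(m-\tfrac2n\big)-f(m)\Big]+n\,\frac{1-m}{2}\Big[f\big(m+\tfrac2n\big)-f(m)\Big]$$
for $m\in\{-1,-1+2n^{-1},\dots,1\}$. The trajectory of the magnetization satisfies a large-deviation principle, i.e. for every trajectory $\gamma=(\gamma_t)_{t\in[0,T]}$,
$$\mathrm{Prob}(m_n\approx\gamma)\sim\exp\Big[-n\int_0^TL(\gamma_t,\dot\gamma_t)\,dt\Big]\qquad\text{as } n\to\infty,$$
where the Lagrangian $L$ can be computed following the scheme of Feng & Kurtz [6], example 1.5. We obtain
$$L(m,v)=\frac v2\log\frac{v+\sqrt{v^2+4(1-m^2)}}{2(1-m)}-\frac12\sqrt{v^2+4(1-m^2)}+1.$$
This can similarly be written as $L(m,v)=\mathcal E'(m)\,v+\psi_m^*(v)+\psi_m(-\mathcal E'(m))$, where
$$\psi_m(\xi)=\sqrt{1-m^2}\,\cosh(2\xi)$$
and
$$\psi_m^*(v)=\frac v2\,\mathrm{arsinh}\frac{v}{2\sqrt{1-m^2}}-\frac12\sqrt{v^2+4(1-m^2)};$$
the involved energy is
$$\mathcal E(m)=\frac14\big[(1+m)\log(1+m)+(1-m)\log(1-m)\big].$$
Then, the limiting equation (5.1) can be written as $\dot m=\psi_m'(-\mathcal E'(m))=-2\sqrt{1-m^2}\,\sinh(\operatorname{artanh}m)=-2m$. This is consistent with the optimal trajectory via the Euler–Lagrange equation, $m(t)=m_0e^{-2t}$.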
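The spin-flip formulas can be checked numerically: the Lagrangian $L(m,v)$, its splitting with $\psi_m(\xi)=\sqrt{1-m^2}\cosh 2\xi$ and $\mathcal E'(m)=\frac12\operatorname{artanh}m$, and the zero-cost velocity $v=-2m$. The grid of test points in this sketch is arbitrary.

```python
import math


def lagrangian(m, v):
    root = math.sqrt(v**2 + 4.0 * (1.0 - m**2))
    return 0.5 * v * math.log((v + root) / (2.0 * (1.0 - m))) - 0.5 * root + 1.0


def d_energy(m):
    return 0.5 * math.atanh(m)  # derivative of E(m) = [(1+m)log(1+m) + (1-m)log(1-m)]/4


def psi(m, xi):
    return math.sqrt(1.0 - m**2) * math.cosh(2.0 * xi)


def psi_star(m, v):
    """Legendre transform of psi(m, .) in its second argument."""
    c = math.sqrt(1.0 - m**2)
    return 0.5 * v * math.asinh(v / (2.0 * c)) - 0.5 * math.sqrt(v**2 + 4.0 * c**2)


def lagrangian_split(m, v):
    return d_energy(m) * v + psi_star(m, v) + psi(m, -d_energy(m))


def limiting_velocity(m):
    """Derivative of psi(m, .) at xi = -E'(m): the right-hand side of (5.1)."""
    return 2.0 * math.sqrt(1.0 - m**2) * math.sinh(-2.0 * d_energy(m))


for m in (-0.8, 0.0, 0.5, 0.9):
    print(m, limiting_velocity(m), lagrangian(m, -2.0 * m))
```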

## 6. Discussion

In the sections above, we have described a number of pairs of systems, each consisting of a stochastic process and its continuum limit. Each pair has the property that the large deviations of the stochastic process are closely linked to a gradient-flow structure of the limit equation. These links are time-dynamic versions of the equilibrium connection mentioned in the Introduction. We now comment on some issues that are shared among the different examples.

### (a) Example: Wasserstein gradient flows

There is a long tradition of work on variational structures for irreversible systems, going at least as far back as Rayleigh [19] and Onsager [20,21]; see, for example, [22] for a good review of the history and a discussion of the relationship with concepts such as minimal and maximal entropy production.

Most of the existing literature, however, treats the presence, or absence, of a gradient-flow structure as a simple fact. By contrast, we put forward the conjecture that the gradient-flow structure is not an accident; it arises from the fluctuation behaviour of the microscopic system. Therefore, its properties and its origin can be understood from that same fluctuation behaviour.

As an example, let us consider the large class of Wasserstein gradient flows. Upon its introduction in a variational setting [23,14], this structure raised many questions, one of which is: why does the Wasserstein metric figure as the measure of dissipation in this structure?

The examples above show how this can be understood. To be precise, we claim that *the Wasserstein metric characterizes the mobility of the empirical measure of a large number of Brownian particles.* Indeed, this claim can be made meaningful in a number of different ways:

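- In discrete time, letting $\rho_n$ be the empirical measure of a system of Brownian particles, we have
  $$\mathrm{Prob}\big(\rho_n(h)\approx\rho\,\big|\,\rho_n(0)\approx\rho^0\big)\sim\exp\Big[-\frac{n}{4h}\,d(\rho^0,\rho)^2\Big]\qquad\text{as } n\to\infty\text{ and then } h\to0,$$
  which follows from (3.3) and was proved independently in [12].
- In continuous time, for the whole path of empirical measures up to a fixed terminal time $T$, we have
  $$\mathrm{Prob}\big(\rho_n\approx\rho\ \text{on } [0,T]\big)\sim\exp[-nI(\rho)],$$
  where $I$, defined in (4.6), measures the size of the deviation by the Wasserstein metric tensor $\|\cdot\|_{\rho,*}$.
- When the particles also undergo a deterministic drift, the same statement holds with $I$ defined by (4.9), where again the size of the deviation is measured by the norm $\|\cdot\|_{\rho,*}$.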

The origin of this role of the Wasserstein metric as the mobility of Brownian particles can be understood by considering the geometric relationship between $(\mathbb R^d)^n$ and the space of measures endowed with the Wasserstein distance. Consider the embedding
$$e:(\mathbb R^d)^n\to\mathcal P(\mathbb R^d),\qquad e(x_1,\dots,x_n):=\frac1n\sum_{i=1}^n\delta_{x_i}.$$
Note that $e$ is not one-to-one, because the numbering of the particles is lost: the particles have become indistinguishable. Indeed, one can identify the set of empirical measures of the form $\frac1n\sum_{i=1}^n\delta_{x_i}$ with the space obtained by identifying all elements in $(\mathbb R^d)^n$ that are rearrangements of each other, i.e. the quotient space $(\mathbb R^d)^n/S_n$, where $S_n$ is the group of all permutations of $n$ elements.

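Now the Wasserstein metric on $\mathcal P(\mathbb R^d)$ makes the embedding of $(\mathbb R^d)^n/S_n$ in $\mathcal P(\mathbb R^d)$ *isometric* (up to the factor $1/\sqrt n$). This follows from the simple property that
$$d\Big(\frac1n\sum_{i=1}^n\delta_{x_i},\,\frac1n\sum_{i=1}^n\delta_{y_i}\Big)^2=\frac1n\min_{\sigma\in S_n}\sum_{i=1}^n|x_i-y_{\sigma(i)}|^2. \tag{6.1}$$
With this property, the role of the Wasserstein distance can be fully explained. The Freidlin–Wentzell theory for Brownian particles [3] shows that a *vector* $X=(X_1,\dots,X_n)$ of $n$ Brownian particles has a stochastic mobility given by the Euclidean norm $|x-y|^2=\sum_{i=1}^n|x_i-y_i|^2$, in the sense that
$$\mathrm{Prob}\big(X(h)\approx y\,\big|\,X(0)=x\big)\sim\exp\Big[-\frac1{4h}\sum_{i=1}^n|x_i-y_i|^2\Big]\qquad\text{as } h\to0.$$
The loss of information upon introducing indistinguishability, or equivalently upon transforming to empirical measures, implies by the contraction principle ([2], §4.2.1) that the exponent is replaced by its minimum under rearrangement,
$$\mathrm{Prob}\big(e(X(h))\approx e(y)\,\big|\,X(0)=x\big)\sim\exp\Big[-\frac1{4h}\min_{\sigma\in S_n}\sum_{i=1}^n|x_i-y_{\sigma(i)}|^2\Big].$$
This exponent is equal to $n/(4h)$ times (6.1). If we gloss over the approximations in different limits ($h\to0$ and $n\to\infty$), this explains how the Wasserstein distance is the natural measure of the mobility of an *empirical measure* of Brownian particles, through transformation of the original mobility of a single Brownian particle.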
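Identity (6.1) is easy to test in one space dimension, where, for equal-weight atoms and the quadratic cost, the optimal rearrangement pairs the points in sorted order. The sketch compares the brute-force minimum over $S_n$ with the sorted matching; the random points are illustrative, and the sorting shortcut is specific to $d=1$.

```python
import itertools
import random


def w2_squared_brute_force(xs, ys):
    """Right-hand side of (6.1): (1/n) min over permutations of sum |x_i - y_sigma(i)|^2."""
    n = len(xs)
    return min(
        sum((xs[i] - ys[s[i]]) ** 2 for i in range(n)) for s in itertools.permutations(range(n))
    ) / n


def w2_squared_sorted_1d(xs, ys):
    """Same quantity in one dimension: match the i-th smallest x with the i-th smallest y."""
    return sum((a - b) ** 2 for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(6)]
ys = [rng.gauss(0.5, 2.0) for _ in range(6)]
print(w2_squared_brute_force(xs, ys), w2_squared_sorted_1d(xs, ys))
```

The brute-force search costs $n!$ evaluations and is only meant for small $n$; in higher dimensions the minimum in (6.1) is an assignment problem.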

### (b) Consequences for modelling

Gradient flows can be thought of as overdamped systems, in the sense that any inertial effects are damped out quickly by the effects of viscous, frictional or other damping forces, and can therefore be neglected. One way of modelling such overdamped systems is therefore by assuming an abstract gradient-flow structure from the start and making it concrete by postulating an energy $\mathcal E$ and a dissipation potential $\psi$. These choices should be motivated, and in the case of Wasserstein and Wasserstein-like dissipations this motivation is non-trivial.

One area where this is particularly visible is in the modelling of lower-dimensional structures, such as threads and surfaces, moving through a viscous fluid. The biology of subcellular structures offers many such examples, including microtubules and lipid bilayers. The assumption of overdampedness is reasonable in this viscosity-dominated situation, but the interplay of geometry and mechanics makes the direct formulation of evolution equations complicated and error-prone (e.g. [24]). In this context, the construction of evolution equations through the postulation of energy and dissipation is often simpler and allows for clearer separation of the various assumptions. However, it remains necessary to motivate the choices made for the energy and the dissipation.

To take the Wasserstein metric as an example, its interpretation as the measure of mobility of empirical measures of Brownian particles provides such a motivation, and because of the connection to the Brownian mobility of the particles it also allows for generalization to other situations.

But similar arguments apply to other dissipations, coupled to other underlying stochastic processes. For instance, the symmetric simple exclusion process leads to *ρ*(1−*ρ*) mobility, implying that if such an exclusion process is one’s idea of the underlying system, then the *ρ*(1−*ρ*)-dissipation is the natural choice.

One might go even further. The diffusion equation (3.4) is known to be a gradient flow in many different ways: in addition to the two mentioned above, it is, for instance, the $L^2$-gradient flow of the Dirichlet integral $\frac12\int|\nabla\rho|^2\,dx$, the $H^{-1}$-gradient flow of $\frac12\int\rho^2\,dx$ (half the squared $L^2$-norm), and even the $H^{s-1}$-gradient flow of half the squared $H^s$-seminorm for each $s\in\mathbb R$. For the two structures that we have discussed, the different underlying stochastic processes provide clear reasons for the differing dissipations and energies. Here, we formulate:
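The last claim can be checked formally in Fourier variables, say on a torus, writing $\hat\rho(k)$ for the Fourier coefficients and assuming the homogeneous norms $\|f\|_{\dot H^r}^2=\sum_k|k|^{2r}|\hat f(k)|^2$ (a standard convention, assumed here for the sketch):

$$
\frac12\|\rho\|_{\dot H^s}^2=\frac12\sum_k|k|^{2s}|\hat\rho(k)|^2,
\qquad
(f,g)_{\dot H^{s-1}}=\sum_k|k|^{2(s-1)}\hat f(k)\,\overline{\hat g(k)}.
$$

The $\dot H^{s-1}$-gradient $G$ of $\frac12\|\rho\|_{\dot H^s}^2$ is defined by $(G,\eta)_{\dot H^{s-1}}=\mathrm{Re}\sum_k|k|^{2s}\hat\rho(k)\overline{\hat\eta(k)}$ for all $\eta$, so

$$
\hat G(k)=\frac{|k|^{2s}}{|k|^{2(s-1)}}\,\hat\rho(k)=|k|^2\hat\rho(k),
\qquad\text{i.e.}\qquad G=-\Delta\rho\quad\text{for every } s,
$$

and the gradient flow $\partial_t\rho=-G$ is $\partial_t\rho=\Delta\rho$. The cases $s=1$ and $s=0$ are the $L^2$-flow of the Dirichlet integral and the $H^{-1}$-flow of half the squared $L^2$-norm.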

### Conjecture 6.1

*Each gradient-flow structure can be connected to an appropriate stochastic process via a large-deviation principle.*

To the extent that this conjecture turns out to be true, it provides an explanation for the occurrence of multiple gradient-flow formulations of the same differential equation.

### (c) Geometry and reversibility

There are interesting connections between the geometry of the Brownian noise, the reversibility of the stochastic process and the question of whether the resulting evolution equation is a gradient flow or not.

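This becomes apparent when we modify the system of §4*c* by introducing a diffusion matrix $A\in\mathbb R^{d\times d}$ in the drift and replacing the scalar noise amplitude by a mobility matrix $\sigma\in\mathbb R^{d\times d}$, thus obtaining
$$dX_i(t)=-A\nabla\Psi(X_i(t))\,dt-\frac1n\sum_{j=1}^nA\nabla\Phi(X_i(t)-X_j(t))\,dt+\sqrt2\,\sigma\,dW_i(t). \tag{6.2}$$

The large-deviation rate functional of the system is similarly given by
$$I(\rho)=\frac14\int_0^T\Big\|\partial_t\rho-\mathrm{div}(\sigma\sigma^T\nabla\rho)-\mathrm{div}\big(\rho A\nabla(\Psi+\Phi*\rho)\big)\Big\|_{D(\rho),*}^2\,dt, \tag{6.3}$$
where the norm $\|\cdot\|_{D(\rho),*}$ is induced by (2.7) with $D(\rho)=\rho\sigma\sigma^T$. Formula (6.3) implies that the hydrodynamic limit of this system is the minimizer of $I$, satisfying
$$\partial_t\rho=\mathrm{div}(\sigma\sigma^T\nabla\rho)+\mathrm{div}\big(\rho A\nabla(\Psi+\Phi*\rho)\big). \tag{6.4}$$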
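With this additional parameter freedom, it is not always possible to write (6.3) in the form (4.5). This depends on whether the cross term in (6.3) is an exact differential, i.e. whether there exists a functional $\mathcal G$ such that
$$\Big(\partial_t\rho,\ \mathrm{div}(\sigma\sigma^T\nabla\rho)+\mathrm{div}\big(\rho A\nabla(\Psi+\Phi*\rho)\big)\Big)_{D(\rho),*}=-\frac{d}{dt}\mathcal G(\rho(t))\qquad\text{for every curve } t\mapsto\rho(t).$$
This is the case if and only if $\sigma\sigma^T$ is a positive multiple of $A$, a condition that is familiar from the fluctuation–dissipation theorem, also known as the Einstein relation. In that case, and writing $\sigma\sigma^T=kTA$ for some ‘temperature’ $T>0$ and the Boltzmann constant $k$,
$$\mathrm{div}(\sigma\sigma^T\nabla\rho)+\mathrm{div}\big(\rho A\nabla(\Psi+\Phi*\rho)\big)=-M_\rho\frac{\delta\mathcal F}{\delta\rho}(\rho),$$
where $M_\rho\xi$ is defined as $-\mathrm{div}\,(D(\rho)\nabla\xi)$ and the free energy $\mathcal F$ is a modification of (4.10),
$$\mathcal F(\rho):=\mathrm{Ent}(\rho)+\frac1{kT}\Big[\int\Psi\,d\rho+\frac12\iint\Phi(x-y)\,\rho(dx)\,\rho(dy)\Big].$$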
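Then the rate functional $I$ can be written in the form (4.2) as
$$I(\rho)=\frac12\Big[\mathcal F(\rho(T))-\mathcal F(\rho(0))\Big]+\frac14\int_0^T\Big[\|\partial_t\rho\|_{D(\rho),*}^2+\Big\|M_\rho\frac{\delta\mathcal F}{\delta\rho}(\rho)\Big\|_{D(\rho),*}^2\Big]dt,$$
and the evolution equation (6.4) is the (modified, $D$-) Wasserstein gradient flow of $\mathcal F$.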

Our freedom to choose *A* and *σ* separately gives us the insight that, for this system, the following statements are equivalent:

- $\sigma\sigma^T=kTA$ for some $T>0$;
- the evolution (6.4) is a $D(\rho)$-Wasserstein gradient flow of $\mathcal F$;
- the rate functional $I$ can be written in the form (4.2); and
- for any finite number $n$ of particles, the system (6.2) is reversible.

This equivalence, which holds for this specific system, suggests a much deeper connection between reversibility and gradient-flow structure. This is an interesting topic that we shall return to in a future publication.

### (d) Diffusion with decay

Yet another generalization concerns systems with decay, which is implemented as a jump process. Peletier & Renger [13] have derived a similar connection for the case of diffusing particles that are convected and may also decay, given by the equation (in one space dimension)
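$$\partial_t\rho=\partial_{xx}\rho+\partial_x(\rho\,\partial_x\Psi)-\lambda\rho, \tag{6.5}$$
with $\Psi:\mathbb R\to\mathbb R$ and $\lambda\ge0$.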

In [13], the particles perform a Brownian motion in the spatial dimension, augmented by a deterministic drift given by −∂_{x}*Ψ*. This part of the process gives rise to the two terms ∂_{xx}*ρ*+∂_{x}(*ρ*∂_{x}*Ψ*). In addition, the particles change their state from ‘normal’ to ‘decayed’, after an exponentially distributed time; this part gives rise to the term −*λρ*. The opposite transition is not allowed: decay is irreversible.

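An analysis similar to §3*a* then connects the large-deviation rate functional for this stochastic particle system to a corresponding minimization problem describing the time-discrete evolution, i.e. the equivalent of (2.9). In this case, the time-discrete minimization problem is
$$\rho^k\in\operatorname*{argmin}_\rho\ \inf_{\rho_{ND}}\Big[\frac1{4h}d(\rho^{k-1},\rho+\rho_{ND})^2+\frac12\mathcal F(\rho+\rho_{ND})-\frac12\mathcal F(\rho^{k-1})+\mathcal H\big(\rho\,\big|\,e^{-\lambda h}(\rho+\rho_{ND})\big)+\mathcal H\big(\rho_{ND}\,\big|\,(1-e^{-\lambda h})(\rho+\rho_{ND})\big)\Big], \tag{6.6}$$
where $\rho_{ND}$ describes the particles that change from ‘normal’ to ‘decayed’ during the time step, $\mathcal H(\cdot\,|\,\cdot)$ is the relative entropy, and the free energy is defined as
$$\mathcal F(\rho):=\int_{\mathbb R}\rho\log\rho\,dx+\int_{\mathbb R}\Psi\,\rho\,dx.$$
Peletier & Renger [13] explain how the structure of (6.6) can be understood: if we define
$$\mathcal K^1_h(\nu;\rho^{k-1}):=\frac1{4h}d(\rho^{k-1},\nu)^2+\frac12\mathcal F(\nu)-\frac12\mathcal F(\rho^{k-1})$$
and
$$\mathcal K^2_h(\rho,\rho_{ND}):=\mathcal H\big(\rho\,\big|\,e^{-\lambda h}(\rho+\rho_{ND})\big)+\mathcal H\big(\rho_{ND}\,\big|\,(1-e^{-\lambda h})(\rho+\rho_{ND})\big),$$
then the terms inside the infimum in (6.6) can be written as $\mathcal K^1_h(\rho+\rho_{ND};\rho^{k-1})+\mathcal K^2_h(\rho,\rho_{ND})$. In this decomposition, the first term describes diffusion and convection by $\Psi$ of the joint measure $\rho+\rho_{ND}$ starting from the previous state $\rho^{k-1}$, similar to (3.3) and (2.9). The second term describes the decay process, in which the joint diffused-and-convected measure $\rho+\rho_{ND}$ is split into a part $\rho$ that remains ‘normal’ and the remainder $\rho_{ND}$ that becomes decayed.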

While the structure of (6.6) is not the same as (2.9) and (6.6) does not represent a time discretization of a gradient flow, both are minimization problems that define the next step in the iteration, and in both cases one can identify a driving force (the free energy $\mathcal F$, in the case of (6.6)) and a mechanism that acts as a brake. In (2.9) the ‘brake’ is the Wasserstein term $\frac1{2h}d(\rho,\rho^{k-1})^2$, and in (6.6) it is the Wasserstein term $\frac1{4h}d(\rho^{k-1},\rho+\rho_{ND})^2$ together with the relative-entropy terms. In both cases, these terms restrict the movement away from the previous state $\rho^{k-1}$, and this restriction becomes more and more severe as $h\to0$.

### (e) General remarks on interacting particle systems

Section 4 explained how, once a large-deviation principle for the interacting particle system with rate functional $I(\rho)$ is established, different Wasserstein-type metrics occur in a natural way. Such large-deviation results are stronger than results on limit equations. Indeed, part of the standard proof of a large-deviation result involves modifying the process by adding a forcing such that a given path which does not solve the original limit equation solves the limit equation of the modified process. So the question arises whether the point of view advocated in this paper can be used to derive limit equations without relying on large-deviation results, which already contain limit results derived in the classical way. This open question is of particular importance because limit points of the implicit time discretization provide a weak notion of solution of the limit equation in cases where distributional solutions are not appropriate, e.g. for problems with a sharp interface such as the mean curvature flow (this is the concept of generalized minimizing movement, as used in, for example, [25,26]). In situations such as (4.7), where a particle interacts with the average of many others, the distribution of a family of initially independent particles stays close to a product measure (propagation of chaos), so a modification of the techniques for independent particles seems promising.

## 7. Conclusion

The examples in this paper illustrate how the two concepts of large-deviation principles for stochastic particle systems and gradient flows are closely entwined. Further examples are currently under study, such as Brownian particles with inertia, which lead to the Kramers equation, and rate-independent systems such as friction and fracture. We expect that many more examples of this kind will be uncovered.

## Funding statement

M.A.P. and J.Z. have received funding from the Initial Training Network ‘FIRST’ of the Seventh Framework Programme of the European Community (grant agreement no. 238702). S.A. was supported by EPSRC grant no. EP/I003746/1.

## Acknowledgements

The authors wish to thank Dejan Slepčev and Rob Jack for various interesting discussions. We thank the referees for valuable comments and advice on the first draft.

## Appendix A. Free energy and the Boltzmann distribution

In this appendix, we show how the *free energy*
(A.1)
arises from the coupling of a system of particles with a *heat bath*. Here, *θ*>0 (in kelvin) is the temperature of the heat bath, and the *Boltzmann constant* *k* has the value 1.38×10^{−23} J K^{−1}. The measure *ρ* is the probability distribution of the particles in a state space, and *E*(*ρ*) is the average energy of the particles, i.e. the integral with respect to *ρ* of a fixed function that we call the *energy* of a state.
We now construct an explicit system in which the free energy arises, up to a factor and an additive constant, as the large-deviation rate functional. This will allow us to interpret all these concepts in the context of large deviations.

We start by choosing a system *S* and its connection to a heat bath called *S*_{B}. Both are probabilistic systems of particles: *S* consists of *n* independent particles with a common probability law, and similarly *S*_{B} consists of *m* independent particles with a common law of its own. The total state space of the joint system is therefore the product of the state spaces of all *n*+*m* particles.

The *coupling* between these systems is done via an *energy constraint*. We assume that there are energy functions on the single-particle state spaces of *S* and of *S*_{B}, and we constrain the joint system to be in a state of fixed total energy, i.e. we only allow states of the joint system in which the energies of all *n*+*m* particles add up to a fixed constant:
(A.2)
The physical interpretation of this is that energy (in the form of heat) may flow freely from one system to the other, but no other form of interaction is allowed.

As in the example in the Introduction, we describe the total states of systems *S* and *S*_{B} by the empirical measures *ρ*_{n} and *ζ*_{m}. We define the average energies *E*(*ρ*_{n}) and *E*_{B}(*ζ*_{m}), so that the energy constraint (A.2) reads *nE*(*ρ*_{n})+*mE*_{B}(*ζ*_{m})=constant.

By Sanov’s theorem, each of the systems *separately* satisfies a large-deviation principle, with rate functions given by the relative entropies of *ρ* and *ζ* with respect to the respective single-particle laws; we write *I*_{B} for the rate function of the heat bath. However, instead of using the explicit formula for *I*_{B}, we assume that *I*_{B}(*ζ*) can be written as a function of the energy *E*_{B}(*ζ*) of the heat bath alone. For the coupled system, we derive a joint large-deviation principle by assuming that (i) *m*=*nN* for some large *N*>0 and (ii) the constant in (A.2) scales as *nN*, i.e. it equals *nN* times a fixed energy per heat-bath particle.
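Formally, the joint system then satisfies a large-deviation principle at speed *n*, with a rate functional *J*(*ρ*, *ζ*) equal to the rate function of *S* plus *N* times *I*_{B}, plus a constant, restricted to pairs (*ρ*, *ζ*) that satisfy the energy constraint.
Here, the constant is chosen to ensure that the infimum of *J* is zero.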
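Sanov’s theorem identifies the rate function of the empirical measure of *n* independent particles with the relative entropy *H*(*ρ*|*μ*) = Σ_{j} *ρ*_{j} log(*ρ*_{j}/*μ*_{j}) with respect to the single-particle law *μ*. On a finite state space this can be checked directly, because the probability that the empirical measure equals a given *ρ* (with every *nρ*_{j} an integer) is a multinomial probability. The sketch below assumes a three-point state space with an arbitrarily chosen law *μ* and target *ρ* (illustrative values, not taken from the text; the helper names are hypothetical) and compares −(1/*n*) log P(*ρ*_{n}=*ρ*) with *H*(*ρ*|*μ*).

```python
import math

def relative_entropy(rho, mu):
    """H(rho | mu) = sum_j rho_j log(rho_j / mu_j) on a finite space (with 0 log 0 = 0)."""
    return sum(r * math.log(r / m) for r, m in zip(rho, mu) if r > 0)

def log_prob_type(counts, mu):
    """log P(empirical measure has these counts) for n i.i.d. draws from mu (multinomial)."""
    n = sum(counts)
    log_coeff = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_coeff + sum(c * math.log(m) for c, m in zip(counts, mu) if c > 0)

mu = [0.5, 0.3, 0.2]   # single-particle law (illustrative)
rho = [0.2, 0.3, 0.5]  # target empirical measure (illustrative)
H = relative_entropy(rho, mu)

for n in [10, 100, 1000, 10000]:
    counts = [round(n * r) for r in rho]  # n * rho_j is an integer for these n
    rate = -log_prob_type(counts, mu) / n
    print(f"n={n:6d}  -(1/n) log P = {rate:.5f}   H(rho|mu) = {H:.5f}")
```

Working with log-factorials via `math.lgamma` avoids overflow for large *n*. By Stirling’s formula the gap between the two columns decays like (log *n*)/*n*, which is the sense in which the relative entropy is the exponential rate.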

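The functional *J* can be reduced to a functional of *ρ* alone by using the energy constraint to eliminate *ζ*: the heat-bath energy *E*_{B} equals the fixed energy per heat-bath particle minus *E*(*ρ*)/*N*.
In the limit of large *N*, one might approximate *N* times *I*_{B} at this energy by its first-order Taylor expansion around the fixed energy per heat-bath particle, which gives a constant term plus −*I*_{B}′ *E*(*ρ*).
The constant term is absorbed in the constant of *J*, and we find that *J*(*ρ*) equals the rate function of *S* plus −*I*_{B}′ *E*(*ρ*), up to an additive constant.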
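We expect that *I*_{B}′ is negative, because larger energies of the heat bath typically correspond to higher probabilities, and therefore to smaller values of *I*_{B}. Now we simply define the temperature *θ*>0 through 1/(*kθ*)=−*I*_{B}′, and we find that *kθ J*(*ρ*) equals *E*(*ρ*) plus *kθ* times the rate function of *S*, up to an additive constant.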
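This is the same expression as (A.1). Note that, up to an additive constant, the right-hand side can be written as *kθ* times the relative entropy of *ρ* with respect to the tilted distribution, obtained from the single-particle law of *S* by multiplying it with the Boltzmann factor exp(−energy/(*kθ*)) and normalizing.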
This derivation shows that the effect of the heat bath is to *tilt* the system *S*: a state *ρ* of *S* with larger energy *E*(*ρ*) implies a smaller energy *E*_{B} of *S*_{B}, which in turn reduces the probability of *ρ*. This is reflected in the approximation of *I*_{B}(*ζ*). The temperature *θ* plays the role of an exchange rate, because it determines the change in probability (as measured by the rate function *I*_{B}) per unit of energy. When *θ* is large, the exchange rate is low, and larger energies incur only a small probabilistic penalty. When *θ* is small, higher energies are very expensive and therefore rarer. From this point of view, the Boltzmann constant *k* is simply the conversion factor that converts our kelvin temperature scale for *θ* into the appropriate ‘exchange rate’ scale.
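The reduction of *J* and the tilting can be written out explicitly. The following is a formal sketch in notation assumed here for illustration rather than taken from the equations of this appendix: *ε* is the single-particle energy function of *S*, so that *E*(*ρ*) = ∫ *ε* d*ρ*; *μ* is the single-particle law of *S*, so that its rate function is *H*(*ρ*|*μ*) by Sanov’s theorem; *Ē* is the fixed total energy per heat-bath particle; and *C*, *C*′ are constants independent of *ρ*. The energy constraint gives *E*_{B} = *Ē* − *E*(*ρ*)/*N*, and the inverse temperature is read off as 1/(*kθ*) = −*I*_{B}′(*Ē*).

$$
\begin{aligned}
J(\rho) &= H(\rho\,|\,\mu) + N\, I_B\!\Big(\bar E - \tfrac{1}{N}\,E(\rho)\Big) + C \\
&\approx H(\rho\,|\,\mu) + N\, I_B(\bar E) - I_B'(\bar E)\,E(\rho) + C \qquad (N \to \infty) \\
&= H(\rho\,|\,\mu) + \frac{1}{k\theta}\,E(\rho) + C', \qquad \frac{1}{k\theta} := -I_B'(\bar E) > 0 .
\end{aligned}
$$

With the tilted (Boltzmann) distribution

$$
\tilde\rho(dx) = \frac{1}{Z}\, e^{-\varepsilon(x)/(k\theta)}\,\mu(dx), \qquad Z = \int e^{-\varepsilon(x)/(k\theta)}\,\mu(dx),
$$

one has

$$
H(\rho\,|\,\tilde\rho) = \int \log\frac{d\rho}{d\mu}\,d\rho + \frac{1}{k\theta}\int \varepsilon\,d\rho + \log Z
= H(\rho\,|\,\mu) + \frac{1}{k\theta}\,E(\rho) + \log Z ,
$$

so that

$$
k\theta\, J(\rho) = E(\rho) + k\theta\, H(\rho\,|\,\mu) + \text{const} = k\theta\, H(\rho\,|\,\tilde\rho) + \text{const}.
$$

If the constant is fixed by requiring the infimum of *J* to be zero, this gives *J*(*ρ*) = *H*(*ρ*|*ρ̃*), which vanishes exactly at *ρ* = *ρ̃*: in this formal limit the most likely state of *S* is the Boltzmann distribution.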

In thermodynamics, one often encounters the identity (or definition) 1/*θ*=d*S*/d*E*. This is formally the same as our definition of 1/(*kθ*) as −d*I*_{B}/d*E*, if one interprets −*kI*_{B} as an entropy, i.e. adopts the convention of multiplying the non-dimensional quantity *I*_{B} by −*k*.
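The relation 1/(*kθ*) = −d*I*_{B}/d*E* can be made concrete for a simple heat bath. As an illustrative assumption (this model does not appear in the text), let each bath particle be a two-level system with energies 0 and 1, in units where energies and *kθ* are measured in multiples of the level spacing, and with the uniform law on the two levels. If *I*_{B}(*E*) is taken to be the smallest relative entropy among bath distributions with average energy *E*, then on two levels the constraint fixes the distribution to (1−*E*, *E*), so *I*_{B}(*E*) is the relative entropy of Bernoulli(*E*) with respect to Bernoulli(1/2). The sketch checks numerically that *β* = −*I*_{B}′(*E*) is the inverse temperature at which the Boltzmann distribution has mean energy *E*, and that *N*[*I*_{B}(*E* − *ε*/*N*) − *I*_{B}(*E*)] approaches *βε* as *N* grows, which is the large-*N* approximation used in the derivation. All function names and parameter values are hypothetical.

```python
import math

def bath_rate(E):
    """I_B(E) = H(Bernoulli(E) | Bernoulli(1/2)) for a two-level bath with energies {0, 1}.

    On two levels the mean-energy constraint fixes the distribution (1 - E, E),
    so the rate contracted to the mean energy is this relative entropy (0 < E < 1).
    """
    return E * math.log(2 * E) + (1 - E) * math.log(2 * (1 - E))

def boltzmann_mean_energy(beta):
    """Mean energy of the distribution proportional to exp(-beta * e) on levels e in {0, 1}."""
    return math.exp(-beta) / (1 + math.exp(-beta))

def derivative(f, x, dx=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + dx) - f(x - dx)) / (2 * dx)

E_bar = 0.2                           # mean energy per bath particle (illustrative)
beta = -derivative(bath_rate, E_bar)  # inverse temperature 1/(k theta) = -I_B'(E_bar)
print(f"beta = {beta:.6f}, exact log((1-E)/E) = {math.log((1 - E_bar) / E_bar):.6f}")
print(f"Boltzmann mean energy at this beta: {boltzmann_mean_energy(beta):.6f}")

# Large-N approximation: N * [I_B(E_bar - eps/N) - I_B(E_bar)] -> beta * eps
eps = 0.5  # energy taken up by the system S (illustrative)
for N in [10, 100, 1000]:
    lhs = N * (bath_rate(E_bar - eps / N) - bath_rate(E_bar))
    print(f"N={N:5d}  N*[I_B(E - eps/N) - I_B(E)] = {lhs:.6f}  vs  beta*eps = {beta * eps:.6f}")
```

For *E* below 1/2 the derivative *I*_{B}′ is negative, so the temperature is positive, in line with the expectation stated above; as *E* approaches 1/2 the inverse temperature tends to zero, i.e. the exchange rate between energy and probability vanishes.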

## Footnotes

One contribution of 11 to a Theme Issue ‘Entropy and convexity for nonlinear partial differential equations’.

1 To reduce the use of parentheses, the operator ‘div’ will always be assumed to apply to the whole product that follows it.

2 In this paper, we consider Brownian particles with generator *Δ*, rather than (1/2)*Δ*, and therefore the transition kernel is the heat kernel (4*πt*)^{−d/2} exp(−|*x*−*y*|²/(4*t*)) on ℝ^{d}.

3 This way of writing the gradient flow highlights the fact that a gradient flow is an instance of a *GENERIC* evolution, in which the conservative evolution term is absent [16].

© 2013 The Author(s). Published by the Royal Society. All rights reserved.