## Abstract

We discuss a new class of approaches for simulating multiscale kinetic problems, with particular emphasis on applications related to small-scale transport. These approaches are based on a decomposition of the kinetic description into an equilibrium part, which is described deterministically (analytically or numerically), and the remainder, which is described using a particle simulation method. We show that it is possible to derive evolution equations for the two parts from the governing kinetic equation, leading to a decomposition that is dynamically and automatically adaptive, and a multiscale method that seamlessly bridges the two descriptions *without introducing any approximation*. Our discussion pays particular attention to stochastic particle simulation methods that are typically used to simulate kinetic phenomena; in this context, these decomposition approaches can be thought of as control-variate variance-reduction formulations, with the nearby equilibrium serving as the control. Such formulations can provide substantial computational benefits in a broad spectrum of applications because a number of transport processes and phenomena of practical interest correspond to perturbations from nearby equilibrium distributions. In many cases, the computational cost reduction is sufficiently large to enable otherwise intractable simulations.

## 1. Introduction

Computational methods that can efficiently but also accurately capture a wide range of length and time scales remain highly desirable and are the subject of considerable research effort. In the area of kinetic transport, multiscale problems are common in micro- and nanoscale gas dynamics [1], plasma physics [2], micro- and nanoscale solid-state heat transfer as mediated by phonon transport [3], electron transport [4,5] and neutron transport [6,7]. The primary challenge associated with such problems stems from the computational cost associated with capturing dynamics occurring over a wide range of scales.

One typical example of such problems is thermal response to laser irradiation: the impulsive nature of laser heating means that at early times (compared with the mean time between collisions associated with the carrier) transport is ballistic [8,9]; only at late times the Fourier description becomes appropriate. During the ballistic and transition [1] regimes, a kinetic description is necessary; on the other hand, a kinetic description of this problem for macroscopic times is typically intractable. Multiscale methods aim to provide means for solving such problems without introducing approximations; instead, they seek to combine a number of different descriptions into a formulation that is globally consistent and more computationally efficient than simulation of the problem at the highest level of fidelity (which is often intractable).

Multiscale problems abound in science and engineering. Reviewing methods and applications in this general context is a truly daunting task. For this reason, our discussion in this paper focuses on methods applicable to the primary topic of this review, namely kinetic transport. Moreover, in order to simplify the discussion, technical details will be presented in the context of dilute gases which have been the subject of a considerable part of recent method development efforts. The more general validity of the ideas presented and discussed here is demonstrated in §5*b*(ii), where their application to nanoscale solid-state heat transfer as mediated by phonons is discussed.

### (a) Background

Phenomena involving kinetic transport but in which a limiting description (e.g. a set of continuum equations with Navier–Stokes closure^{1}) may be used in part of the computational domain can be treated using hybrid atomistic-continuum methods [10–13]. These methods minimize the computational cost by using the kinetic description only in regions where it is needed, while the remainder of the computational domain is treated by the less expensive continuum description. In addition to a clear spatial separation of kinetic and continuum regions, these methods assume that the region where kinetic effects are important is sufficiently small that the kinetic part of the calculation is feasible. These formulations typically employ domain decomposition techniques [14] to separate the two regions in question and thus coupling of the two descriptions becomes central to the success of the method [15]. In this context, a region where the two descriptions are simultaneously valid—for matching to take place—is required; criteria for transition from one description to the other [12,16] are also necessary. Fortunately, continuum descriptions of kinetic systems are typically well characterized—e.g. in a dilute gas, the molecular distribution function is given by the Chapman–Enskog result in the Navier–Stokes (diffuse transport) limit [17–19]—and thus *rigorous* matching of the two descriptions is possible [12]. This has led to the development of fairly sophisticated hybrid simulation methods [12,13,20] in which the transition from one description to the other is automatic and, in some cases, seamlessly integrated within a mesh refinement process. The situation is not entirely resolved in the case of dense fluids [14,15,21] and solids [22] where the molecular distribution function corresponding to the continuum limit is not known, and thus constructing a microscopic (e.g. molecular) state that exactly corresponds to the continuum solution in the matching region is not *rigorously* possible yet [15].

Projective integration [23] also relies on a clear separation of scales. In this method, a microscopic (molecular) description is simulated for a small amount of time, and based on this simulation, the coarse dynamical behaviour can be extracted (e.g. through an eigenvalue analysis) and projected forward in time using timesteps appropriate for a numerical solver of the coarse, macroscopic, description. In direct analogy, one can perform homogeneous simulations of a microscopic description in small, *suitably terminated*, regions of space (e.g. located at the nodes of a macroscopic numerical solver) thus obtaining local closure (constitutive) information which informs a numerical solution of the macroscopic set of conservation equations [23–25]. The above approaches are strictly valid (and worthwhile) when a clear separation between the microscopic and macroscopic time scales and length scales, respectively, exists; under these conditions, treatment of the inner model in a local and/or quasistatic fashion is justified. Examples include formulations that require steady solutions from the microscopic solver (e.g. for steady problems), where the error owing to imprecise initialization of the microscopic solver may be avoided. Another example is the *special case* of kinetic problems in which only initialization of microscopic states corresponding to continuum fields is necessary; in such cases, the existence of Chapman–Enskog-type results [3,17–19] which provide analytical descriptions for the distribution function makes this rigorously possible, as in the case of hybrid Navier–Stokes/kinetic domain-decomposition-based methods discussed above, where coupling happens in a region where both descriptions are valid (see [11,12] for an example). Unfortunately, in the general (and most practically relevant) case where the distribution function is not known analytically, ad hoc initialization and domain termination introduces approximation error [26,27] which is expected to corrupt the closure information subsequently extracted from the microscopic solver.

For a large class of problems of practical interest, the separation of scales is not sufficiently large to justify the approaches described above. In such cases, methods which go beyond spatial decomposition or assumptions of local and/or quasistatic behaviour are needed. The approach reviewed in this paper instead relies on an *algebraic decomposition* of the distribution function. This approach, introduced in §3, is viable because, for kinetic problems, the governing equation is typically sufficiently tractable to allow rigorous descriptions of the evolution of the constituent parts into which the distribution function is decomposed.

The resulting methods are powerful and exhibit a number of desirable features. Let us consider the approach to the Navier–Stokes limit as an example, where it is well known [17] that the distribution can be decomposed into a (local) equilibrium distribution and a (small) correction which describes deviations therefrom. In contrast to standard particle simulation methods, which become increasingly more expensive as this limit is approached (larger length scales imply not only more simulation particles, but also longer evolution time scales), a formulation comprising an equilibrium part and a part simulated by particles becomes more efficient as this limit is approached, because it is able to relegate increasingly more of the description to the equilibrium part, thus *reducing* the number of particles required for the simulation. As we show in §5, such decompositions can also be thought of as variance reduction techniques, which result in simulation methods that exhibit significantly reduced statistical uncertainty [28] compared with traditional particle simulation methods.

We begin our review with an introduction to kinetic transport descriptions and typical methods of simulation.

## 2. Boltzmann equation

We consider kinetic transport as described by the Boltzmann equation in the absence of external fields, which can be written in the form
2.1
where *f*(**r**,**c**,*t*) is the velocity distribution function [17], **r**=(*x*,*y*,*z*) is the position vector in physical space, **c**=(*c*_{x},*c*_{y},*c*_{z}) is the molecular velocity vector, *t* is time and [d*f*/d*t*]_{coll} denotes the collision operator which takes different forms depending on the carrier modelled. Extension of this work to include external fields does not present any additional conceptual challenges.

The original Boltzmann collision operator was written for a hard-sphere gas [17] in the form
2.2
where is the gas-molecule effective diameter, is the differential collision cross-section for hard spheres and *f*^{′}=*f*(**r**,**c**^{′},*t*), *f*_{1}=*f*(**r**,**c**_{1},*t*), ; here, {**c**_{1},**c**} are the pre-collision velocities and are the post-collision velocities, related to the pre-collision velocities through the scattering angle ** Ω**. Integration in velocity space extends over all possible velocities, and the solid angle is integrated over the surface of the unit sphere ().

The global equilibrium distribution
2.3
for a gas at (constant) reference temperature *T*_{0} and number density *n*_{0} is a special case of a Maxwell–Boltzmann distribution
2.4
where in the latter **u**_{MB}=**u**_{MB}(**r**,*t*), , *T*_{MB}=*T*_{MB}(**r**,*t*) and *n*_{MB}=*n*_{MB}(**r**,*t*). A gas described by (2.3) has a mean free path , while the molecular collision time is given by , where is the most probable molecular speed, *R*=*k*_{b}/*m* is the gas constant, *k*_{b} is Boltzmann's constant and *m* is the molecular mass. Based on these quantities, we can define a Knudsen number *Kn*=*λ*/*L*, where *L* is the characteristic hydrodynamic length scale.

The hard-sphere operator predicts transport coefficients that vary with temperature according to *T*^{1/2}, while real gases exhibit a more general temperature dependence *T*^{ϖ}, where the temperature coefficient (*ϖ*) is typically closer to . For this reason, the variable hard sphere (VHS) model [29] has been developed. This model, which finds wide application in engineering studies, achieves the desired temperature dependence (*T*^{ϖ}) by introducing a velocity-dependent collision cross-section into the Boltzmann operator (2.2) through a hard-sphere cross-section with a velocity-dependent diameter , where and is the reference diameter at temperature *T*_{ref}.

Owing to the complexity associated with the above collision operators, simpler approximations have been proposed. The most well known, perhaps, is the Bhatnagar–Gross–Krook (BGK) model [17]
2.5
which, by construction, retains (2.3) as the global equilibrium distribution and also preserves mass, momentum and energy conservation (collisional invariants) [17]. This model has also found widespread application in many areas of physics where it is frequently used to model the transport of electrons [4,5], photons [18] and phonons [30], and is more generally referred to as the relaxation-time approximation. In this model, *τ* is the relaxation time and *f*^{loc} is the local equilibrium distribution; for a dilute gas, the latter is of the Maxwell–Boltzmann form
2.6
where
2.7
is the local gas number density,
2.8
denotes the local flow velocity, and , where
2.9
is the local gas temperature. The molecular mean free path associated with the BGK gas is given by .

### (a) Particle methods for solving the Boltzmann equation

Owing to the high dimensionality associated with *f*(**r**,**c**,*t*), as well as the need to accurately and *stably* capture the propagation of travelling discontinuities in the distribution function [31], particle methods are generally the method of choice for solving the Boltzmann equation [32]. The most prevalent Boltzmann solution method is a particle method known as direct simulation Monte Carlo (DSMC) [29], which in addition to satisfying the above requirements, combines simplicity with an intuitive formulation which naturally employs importance sampling, and very low memory usage.

DSMC solves [33] the Boltzmann equation by simulating molecular motion as a series of timesteps, each of length *Δt* (≪*λ*/*c*_{0}) [33,34], and during which collisionless advection and collision substeps are performed [29]. This can be thought of as a splitting scheme in which the collisionless advection substep integrates
2.10
while the collision substep integrates
2.11
During the advection substep, the positions of all particles are updated according to their velocities. During the collision substep, the distribution function is updated by processing binary collisions between collision partners chosen from the same computational cell of size *Δx* (≪*λ*) [33,35] using an acceptance–rejection procedure [36]. This splitting procedure leads to a second-order accurate in time algorithm provided the latter is appropriately symmetrized [34,37]. DSMC is also second-order accurate in space [35,37]. Detailed descriptions of the DSMC algorithm can be found in [36].

## 3. The Chapman–Enskog expansion

The Navier–Stokes limit can be recovered from the Boltzmann equation through the celebrated Chapman–Enskog procedure, which seeks to expand the distribution in a power series of the Knudsen number *Kn*. The essential result of this procedure is the expansion [18]
3.1
where the first term in the expansion is the *local equilibrium* which gives rise to the Euler set of hydrodynamic equations (no diffusive transport); keeping terms up to and including *O*(*Kn*) recovers the Navier–Stokes set of equations [18] (diffusive, linear-gradient transport). More details, as well as a complete mathematical treatment of the hard-sphere collision operator (2.2), can be found in [19]; a less detailed exposition can be found in [18].

## 4. Algebraic decomposition based on the Chapman–Enskog expansion

The Chapman–Enskog result (3.1) provides a natural framework for reducing the computational cost associated with approaching the *Kn*≪1 limit with particle simulations. Specifically, the algebraic decomposition
4.1
motivates a simulation scheme in which a particle method is used to simulate *g*(**r**,**c**,*t*) and a method more suited to the characteristics of *f*^{loc}(**r**,**c**,*t*) is used to capture the evolution of the latter [38]. In fact, because
4.2
the latter method only needs to update *f*^{loc} by only considering its advection (2.10). In contrast to traditional particle methods which become stiff in the *Kn*≪1 limit, methods based on this decomposition should, in theory, relegate increasingly more of the description to *f*^{loc}(**r**,**c**,*t*) as (since ) and require a smaller number of particles in this limit.

Unfortunately, decomposition (4.1) is no longer preserved by the advection operator. This can be illustrated by considering the form of the distribution function
4.3
following the action of (2.10) over a timestep *Δt*. In general, *f*^{loc}(**r**−**c***Δt*,**c**,*t*) is not a Maxwell–Boltzmann distribution and thus *f*^{loc}(**r**−**c***Δt*,**c**,*t*)≠*f*^{loc}(**r**,**c**,*t*+*Δt*). This implies that a scheme based on (4.1) will, in general, require some form of reconstruction process that restores
4.4
after every time the advection operator is applied (e.g. every timestep). Perhaps the simplest approach would be to generate particles by sampling *f*^{loc}(**r**−**c***Δt*,**c**,*t*)−*f*^{loc}(**r**,**c**,*t*+*Δt*) and add them to the particles which represent the distribution *g*(**r**−**c***Δt*,**c**,*t*). This requires the generalization [39] of the particle description by including negative particles, since *f*^{loc}(**r**−**c***Δt*,**c**,*t*)−*f*^{loc}(**r**,**c**,*t*+*Δt*) will, in general, be of indeterminate sign. Such approaches are discussed in §§5 and 6; in the next section, we discuss a number of approaches [38,40,41] where the decomposition (4.1) is instead constrained to be a convex combination of two positive functions.

### (a) Applications

In one class of approaches aimed at problems in the *Kn*≪1 limit [38,40,41], decomposition (4.1) is constrained to be of the form
4.5
with *w*(**r**,**c**,*t*)∈[0,1] determining the ‘fraction’ of the equilibrium distribution *f*^{loc} to be ‘separated out’. The existence of *w* is a result of the requirement that *g*(**r**,**c**,*t*)≥0, which allows the latter to be straightforwardly interpreted as a probability density of positive particles, but means that, in general, *g*(**r**,**c**,*t*)≠*Kn*⋅*h*(**r**,**c**,*t*)+*O*(*Kn*^{2}), i.e. the existence of a local equilibrium distribution is not fully taken advantage of. On the other hand, in the limit, the correct limiting distribution is recovered ( and ).

A number of implementations of this procedure have been reported. In [38], spatially homogeneous relaxation problems were considered. In [40,41], one-dimensional shock problems were studied. As discussed above, in spatially dependent problems, the advection operator does not preserve the form of (4.5), but instead produces
4.6
which requires some form of reconstruction. Reconstruction schemes are first discussed in [38], where it is proposed to convert *w*(**r**−**c***Δt*,**c**,*t*)*f*^{loc}(**r**−**c***Δt*,**c**,*t*) into particles and add those to the population representing *g*(**r**−**c***Δt*,**c**,*t*)—in other words, create an intermediate representation with *w*=0. From this intermediate representation, *w*(**r**,**c**,*t*+*Δt*) and *f*^{loc}(**r**,**c**,*t*+*Δt*) can be determined using a number of approaches, the most direct of which would be to sample the particle population (measure its moments) to determine the parameters of *f*^{loc}(**r**,**c**,*t*+*Δt*).

In [40], both components of *f*, namely *w*(**r**,**c**,*t*)*f*^{loc}(**r**,**c**,*t*) and *g*(**r**,**c**,*t*), are simulated by particles. The algorithm for advecting *w*(**r**,**c**,*t*)*f*^{loc}(**r**,**c**,*t*) is found to be ‘somewhat clumsy’ [40]; it is proposed that a Navier–Stokes-based approach for the advection step of *w*(**r**,**c**,*t*)*f*^{loc}(**r**,**c**,*t*) is preferable. Various approaches using a numerical (e.g. finite volume) solver to perform the advection step of *w*(**r**,**c**,*t*)*f*^{loc}(**r**,**c**,*t*) are presented in [41,42], where also a number of reconstruction schemes are discussed and evaluated. Results demonstrate accurate simulation of shockwave propagation in the range 10^{−5}≤*Kn*≤10^{−1}.

In these studies, algebraic decomposition is coupled to a time-relaxed [43] time-integration scheme which removes the stiffness associated with time integration of (2.11) in the limit. The time-relaxed scheme uses the formal representation of the solution to (2.11) in the form of an infinite Wild expansion [44] to emulate an implicit integration method, which is stable for all timestep sizes, and approaches the correct limiting distribution (*f*^{loc}(**r**,**c**,*t*)) for ; however, complexity considerations limited implementations to a first-order scheme, obtained by *approximating* the high-order terms in the expansion by a Maxwellian distribution [38], which amounts to thermalizing particles that have had more than a single collision. More recently, higher-order schemes for the VHS model have been presented [45], but not implemented within an algebraic decomposition framework.

## 5. Algebraic decomposition and variance reduction

The property
5.1
where *f*^{MB} is *any* Maxwellian distribution, means that this distribution plays a central role in kinetic problems, even away from the *Kn*≪1 limit. It is particularly useful in cases where the deviation from a well-defined equilibrium state is small. In such cases, it can be used [46,47] for obtaining solutions to the Boltzmann equation; the solution proceeds by linearizing the governing equation and boundary conditions about the reference state *f*^{0}(**c**), resulting in a linear problem in terms of the deviation from equilibrium, *f*^{d}(**r**,**c**,*t*), defined by
5.2
Small deviation from equilibrium typically implies [46] that the variation of local conditions throughout the domain remains sufficiently small that *f*^{0} may be based on any convenient (but physically relevant) reference properties (e.g. uniform initial condition if applicable, system average properties, etc).

In the interest of brevity, we will denote small deviation from equilibrium by *ϵ*≪1, where *ϵ* is a problem-dependent measure of the departure from equilibrium which usually can be taken to be the Mach number *Ma* () or a dimensionless temperature difference *ΔT*/*T*_{0}, where *U* is the characteristic flow speed, *ΔT* is a characteristic temperature difference, and *γ* is the ratio of specific heats ( for a monoatomic gas).

We emphasize here that a small *f*^{d} does *not* require *Kn*≪1. We can see this by noticing that *ϵ*≪1 does not involve a length scale restriction. For example, in the case of gas flow, the von-Karman relation *Ma*≈*Re* *Kn* implies that the Mach number can be small for *Kn*≫1, provided the Reynolds number *Re* is sufficiently small, which can be realized when the characteristic length scale *L* is sufficiently small. Here, *Re*=*ρUL*/*μ*, where *μ* is the gas viscosity and *ρ* (=*mn*) is the gas density.

### (a) Variance reduction

Decomposition (5.2) can be used [28] for reducing the statistical uncertainty of direct Monte Carlo simulations of the Boltzmann equation by describing part of the solution, namely *f*^{0}(**c**), deterministically. The need for this arises because modern applications associated with nanoscale science and engineering typically involve small deviations from equilibrium [1,47], which translates to small hydrodynamic signals (e.g. flow velocity, heat flux, etc.). In this limit, DSMC, *as well as all molecular methods* [48], become very expensive because resolution of the hydrodynamic signals requires very large numbers of samples. Specifically, in the limit of small deviation from equilibrium, statistical fluctuations are dominated by the equilibrium spectrum (which can be quantified using equilibrium statistical mechanics [48]) that does not depend on *ϵ* and makes resolution of smaller signals increasingly difficult [48].

Decomposition (5.2) lends itself to a control variate approach for reducing the statistical uncertainty of Monte Carlo integration. In control variate Monte Carlo integration, we write
5.3
where the function *ξ*(**c**) fulfils the following requirements:

— captures most of the variation of

*f*(**c**) (i.e.*ξ*(**c**)≈*f*(**c**)) and— can be deterministically (analytically or otherwise) evaluated.

Using a Monte Carlo method to evaluate *only the term* results in significantly reduced statistical uncertainty, because most of the statistical uncertainty is removed through the deterministic evaluation of [49].

As stated above, decomposition (5.2) lends itself naturally to control variate integration, since it clearly identifies a control *ξ*=*f*^{0} that is guaranteed to satisfy the above requirements ((1) *f*^{0}≈*f* and (2) equilibrium moments are easy to calculate deterministically, if not intuitively obvious); moreover, it has a number of desirable attributes which alleviate the limitations arising from using a time-dependent equilibrium distribution (4.1), discussed in §4. Specifically, since *f*^{0} does not vary in time, it does not need to be evolved by any ‘auxiliary’ numerical method, nor does it need to be reconstructed every timestep.

A simulation method using the above construct reduces to a method for simulating the dynamics of the deviation from equilibrium *f*^{d}=*f*−*f*^{0}; such a method has been developed [32,50–52] and is referred to as low-variance deviational simulation Monte Carlo (LVDSMC), which highlights its connection to DSMC to which it is related. LVDSMC proceeds in time using the standard splitting scheme (2.10) and (2.11) on the ‘deviational population’ of *signed* particles (i.e. positive or negative) sampling *f*^{d}. Specifically, it follows directly from (2.10) and (5.2) that
5.4
and thus the motion of deviational particles is identical to that of physical particles. The effect of collisions is implemented by interpreting the collision integral in terms of source and sink terms
5.5
which modify the distribution by generating signed particles (with sign sgn(*f*^{loc}−*f*^{0})) from the distribution |*f*^{loc}−*f*^{0}| and deleting particles from the deviational distribution.

In the interest of simplicity, the above discussion considered the BGK model with a constant relaxation time; more details for this implementation can be found in [51]. An example with a variable relaxation time is discussed in §5*b*(ii). The hard-sphere case was first treated in [32], while recently, collision algorithms with no timestep error for the more general VHS model were developed [53] and implemented [52]. A discussion of stability problems associated with integrating [d*f*^{d}/d*t*]_{coll} by extending DSMC collisional processes, rather than the generation/deletion interpretation discussed above, can be found in [32,39]. Applications to microscale transport problems are discussed in §5*b*.

An interesting feature of these algorithms is that by removing the fluctuations associated with equilibrium, the resulting methods exhibit statistical uncertainty which is proportional to the non-equilibrium signal. In other words, the relative statistical uncertainty (standard deviation in the signal over the signal characteristic magnitude) is independent of the signal magnitude; this means that these methods can resolve, at a constant relative statistical uncertainty, arbitrarily small deviations from equilibrium at a cost that does not increase as the signal decreases. This is demonstrated in figure 1, which shows the relative statistical uncertainty, *σ*_{T}/*ΔT*, in the case of heat transfer between two parallel plates at temperatures *T* and *T*+*ΔT*, respectively, as a function of *ΔT* [51]; in the figure, *σ*_{T} is the standard deviation in the temperature estimate. The figure also shows the relative statistical uncertainty for DSMC, which grows as *ΔT* decreases, since in that case *σ*_{T} is dominated by the equilibrium fluctuations [48,51] (and is thus a constant). We note that the theoretical speedup scales as the square of the ratio of the two relative statistical uncertainties.

### (b) Applications

Below, we review a few examples that illustrate the computational savings of variance-reduced computational approaches based on algebraic decomposition of the distribution function.

#### (i) Simulation of Knudsen compressors

Figure 2 shows an LVDSMC simulation of thermal transpiration flow of a hard-sphere gas [52] through a single stage of a Knudsen compressor at a Knudsen number *Kn*=*λ*/*L*=1; the lower boundary is diffusely reflecting with a temperature (*T*_{w}) given by
while the upper boundary is specularly reflecting, imposing a symmetry boundary condition. The system is periodic in the axial (flow) direction (*T*(*x*,*y*)=*T*(*x*+2*L*,*y*), etc.).

Knudsen compressors are devices with no moving parts, which exploit the kinetic phenomenon of thermal transpiration—the tendency of a rarefied (*Kn*>0) gas in contact with a wall along which a temperature gradient exists to move from cold regions to hot regions [46]—to pump or compress a gas and more recently to separate gases [54]. These devices generally consist of many stages, each with a capillary section (*x*<*L* in the figure) with a positive streamwise temperature gradient, followed by a connector section (*x*>*L* in the figure) with a negative temperature gradient. Simulation of these devices is challenging because of the high cost associated with obtaining noise-free results for representative (moderate) temperature differences [55]. LVDSMC overcomes this limitation by simulating only the deviation from equilibrium.

#### (ii) Phonon transport for microscale heat transfer

Applications of the methods discussed above can also be found in the field of microscale solid-state heat transfer. Heat carriers (phonons) feature a mix of ballistic and diffusive behaviour at scales comparable to the phonon mean free path (O(100 nm) for Si at 300 K), which manifests itself through the breakdown of Fourier's law. In spite of the conceptual and physical differences between molecular gases and so-called phonon gases (Bose–Einstein distribution, existence of a dispersion relation, frequency dependent scattering rate, etc.), direct Monte Carlo simulations are widely used to solve phonon transport problems [56–60]. Practical applications of such problems are drawn from thermoelectrics, manufacturing, design and cooling of microelectronic devices, and more generally from any application involving microscale solid-state thermal management. Recent applications typically involve small temperature differences, making variance reduction very useful.

At an equilibrium state with temperature *T*, phonons follow Bose–Einstein statistics leading to an equilibrium distribution given by
5.6
Non-equilibrium transport of phonons is well approximated by a Boltzmann transport equation (BTE) that is very similar to the BGK model presented in §2; however, in the phonon case, the distribution function represents the occupation number of the energy levels and is a function of space, the phonon wavevector (**k**), time and polarization *p*; the main polarizations for three-dimensional crystals are longitudinal acoustic, transverse acoustic, longitudinal optical and transverse optical [3]. Although phonon wave vectors are quantized, the energy levels can be treated as a quasi-continuum by introducing the density of states which accounts for degeneracies of energy levels and therefore allows sums over **k** to be transformed into integrals. Owing to the presence of a dispersion relation that is typically of the form *ω*=*ω*(*k*,*p*), where *ω* is the phonon angular frequency and *k*=|**k**|, it is usually more convenient to use where ** Ω** is the solid angle associated with the representation of

**k**in spherical coordinates. The density of states

*D*(

*ω*,

*p*) then allows one to write . The resulting Boltzmann equation reads 5.7 where

**v**

_{g}=

**v**

_{g}(

*ω*,

*p*) is the phonon group velocity,

*τ*=

*τ*(

*ω*,

*p*,

*T*

_{loc}) is the relaxation time and

*T*

_{loc}is the (local) temperature defined through the implicit relation 5.8

In contrast to the BGK model (see §2), here the scattering rate depends on frequency and on temperature. As a result, the local equilibrium distribution in the collision term is evaluated [17,60] at the pseudo-local temperature (i.e. ), which ensures that energy is conserved during scattering, namely 5.9

Given that the only conserved quantity during scattering is energy, it is preferable [61,62] to multiply the BTE (5.7) by (energy of a phonon) and to consider particles that simulate phonon bundles with fixed energy. Using this approach, energy conservation is automatically ensured if we conserve the number of computational particles, whereas in previous formulations [58,59], a phonon deletion/creation process was required.

Variance reduction can again be achieved by considering the difference between the actual distribution and an appropriate nearby equilibrium described by . The equation obeyed by the difference can be readily written down as
5.10
and deviational particles can be used to simulate it. In this case, particles representing the distribution are deleted at a frequency-dependent rate *τ*(*ω*,*p*,*T*_{loc})^{−1}, while particles of sign are generated from the distribution .

As discussed in §5*a*, this formulation removes the statistical uncertainty associated with equilibrium fluctuations which can become prohibitive when the deviation from equilibrium is small. An additional advantage of this formulation manifests itself in problems where a large fraction of the computational domain remains in equilibrium, such as the transient problem discussed briefly below and shown in figure 3. In this problem, a thin slab of crystalline material, initially at uniform temperature and resting on a silicon substrate, is subjected to a localized heating by a laser source. As a result of the localized heating, large parts of the computational domain are in thermal equilibrium, especially for early times. Algebraic decomposition enables the use of particles only in regions where , thus significantly reducing the computational cost compared with the standard approach.

Figure 3 shows three snapshots of the temperature field solution of the laser heating problem discussed above obtained using the variance-reduced method. More details on the problem formulation and simulation method can be found in [62]. Despite the very small signal (temperature differences are typically much smaller than 1 K and of the order of 10^{−3} K at late times), the time-dependent temperature field is resolved to very small statistical uncertainty. The computational savings compared with a standard Monte Carlo simulation method have been estimated [62] to be of the order of 10^{9}. In other words, simulation of this problem (at the same statistical uncertainty) is infeasible using traditional methods with our present computational resources.

## 6. Variance reduction using a local equilibrium

As shown in §5, a deviational algorithm based on a global equilibrium distribution provides significant acceleration in the form of variance reduction for *ϵ*≪1, but also computational savings in cases where part of the simulation domain is in equilibrium. Moreover, this algorithm is reasonably simple and does not require methods for evolving or reconstructing *f*^{0}. However, although as the variance reduction improves (see also figure 1), the variance reduction does not improve as (for a constant *ϵ*) because in this limit . In other words, in the limit, the degree of variance reduction can further be improved using the decomposition
6.1
where *f*^{MB} varies in space. We note, however, that *f*^{MB}≠*f*^{MB}(*t*) and thus *no fluid-dynamic solver is required for its evolution*.

The seemingly impossible task of approximating *f*^{loc}(*t*) by *f*^{MB}(**r**,**c**)≠*f*^{MB}(*t*) can be achieved [32] by updating *f*^{MB} once every timestep (e.g. before the start of a new timestep). In other words, after every timestep, the update seeks to find a new *f*^{MB}_{t+Δt}≈*f*^{loc}(*t*+*Δt*) without trying to construct the complete distribution from its samples, in contrast to the approach of §4*a*. This ‘change of basis’ can be achieved [32] by introducing a change such that at time *t*+*Δt* decomposition (6.1) is given by
6.2
6.3
where
6.4
and ** ζ**=(

**c**−

**u**

_{MB})/

*c*

_{MB}. Once the parameters

*Δn*

_{MB},

*Δ*

**u**

_{MB},

*Δc*

_{MB}are determined, the change of basis can proceed (); at the same time, deviational particles that sample −

*Δf*

^{MB}need to be generated and added to the

*f*

^{d}population.

It was shown in [32,50] that it is best to combine this step with the particle generation step of the collision algorithm, which for the BGK model can be summarized as
6.5
This formulation is advantageous because in addition to not requiring reconstruction of *f*^{MB}_{t+Δt} from its samples every timestep, it exploits analytical cancellation (between positive and negative particles) by sampling only the net function (*f*^{loc}−*f*^{MB})*Δt*/*τ*−*Δf*^{MB} (as opposed to generating particles from the distributions (*f*^{loc}−*f*^{MB})*Δt*/*τ* and −*Δf*^{MB} separately). The reduction in complexity is immediately obvious, while the computational savings are considerable, provided *Δf*^{MB} is chosen appropriately (see below for a discussion): in the special case of the linearized BGK model, a choice of *Δf*^{MB} exists for which the cancellation is perfect ((*f*^{loc}−*f*^{MB})*Δt*/*τ*−*Δf*^{MB}=0) and therefore no particle generation is required; this could perhaps be expected from the physical interpretation of the BGK model (the action of the collision operator is to move the distribution towards local equilibrium) [50].

The above algorithm depends on an appropriate choice for *Δf*^{MB} which has the objective of tracking *f*^{loc}(*t*) using *f*^{MB}_{t}. In [32,50], this was reliably achieved by requiring that *Δf*^{MB} be chosen such that it absorbs the mass, momentum and energy change *due to the action of the particle-generation part of the collision integral*; in other words, *Δf*^{MB} is determined from
6.6
where *ϕ*={1,**c**,||**c**||^{2}}. The rationale for this requirement is as follows [32]: in problems where the final distribution is of the Maxwell–Boltzmann type (say ), if *f*^{d} is constrained to have no net mass, momentum and energy, it has to be zero since it is also constrained to be equal to the difference of two Maxwell–Boltzmann distributions (). In other words, it is reasonable to expect that for problems where a local equilibrium solution exists, through this formulation, the algorithm will be able to arrive at this solution (i.e. ). Given the above, one may expect that in general problems, this formulation drives *f*^{MB} towards an appropriate Maxwell–Boltzmann distribution so as to make *f*^{d} small.

As stated above, in the case of the linearized BGK model this formulation reduces [50] to
6.7
with
6.8
6.9
6.10
which is both simple and in line with the physical interpretation of the BGK model, namely that the effect of the collision operator is to drive *f*^{MB} towards *f*^{loc} while eliminating deviational particles.

The resulting simulation process is very efficient and can approach the *Kn*≪1 limit without experiencing the stiffness problems associated with traditional molecular methods. This can be illustrated in a number of ways. Figure 4 shows the time evolution of a start-up Couette flow at ; this essentially noise-free solution required a CPU time of 70 s (on one core of a 3 GHz Core2 Quad). The steady-state flow field obtained via DSMC calculation at *Ma*=0.02 (based on the wall velocity *U*) using the same CPU time is also shown for comparison.

Figure 5 shows the total number of samples required to resolve the flow velocity (to a fixed level of statistical uncertainty) in the cell adjacent to the wall for a steady Couette flow. Here, the number of samples is defined as the number of particles in the simulation multiplied by the number of independent statistical ensembles. The figure verifies that deviational simulations based on the local equilibrium require increasingly fewer samples (an indication of computational effort) than simulations based on a global equilibrium as the limit is approached.

*Note on the advection step:* when *f*^{MB}=*f*^{MB}(**r**,**c**), (2.10) requires the deviational distribution function to obey
6.11
during the advection substep. In the implementation discussed above, *f*^{MB} is local to each cell and thus varies in a piecewise constant fashion from one cell to the next. This leads to discontinuities of this function at the boundaries between cells which leads to fluxes of deviational particles across these boundaries. The solution to (6.11) can be written as a superposition of a free molecular advection for the deviational particles (solution of the homogeneous equation) and a correction term accounting for the contribution of the right-hand side. The latter, which can be written analytically [32], can be implemented as a deviational-particle flux term at cell boundaries; more details can be found in [32,50].

One disadvantage of this implementation is that particle generation at cell interfaces becomes cumbersome in high-dimensional problems. An alternative formulation that may alleviate this limitation would consist of a continuously variable *f*^{MB}, whereby particle generation at cell boundaries will be replaced by volumetric particle generation, which will reduce the complexity associated with the dependence of the number of cell interfaces on dimensionality.

## 7. Discussion

Multiscale kinetic phenomena lend themselves naturally to algebraic decomposition of the distribution function into a part treated by a particle method and a part that can be treated using an appropriate limiting description, either numerical (§4) or analytical (§§5 and 6). These approaches are very promising, and in some cases have already been used to solve problems that had been otherwise intractable *without introducing approximations*. We note that ideas similar in spirit have been used [64,65] in the context of modelling plasma dynamics, namely the simulation of gyrokinetic equations using particle-in-cell approaches.

We also note a recently proposed approach for reducing the statistical uncertainty of DSMC simulations by coupling them with a deterministic solver of an extended Euler system of equations through a moment-matching procedure. The rationale behind this formulation, referred to as moment-guided Monte Carlo [66], is that constraining the lower moments of the DSMC solution to match a significantly less noisy solution field injects information into the computation that should reduce the statistical uncertainty of the overall solution. In the first implementation [66], moment matching between the two simulations is achieved by a rather arbitrary additive and multiplicative readjustment of particle velocities in DSMC until the desired mean and variance is achieved. Applications to problems where the distribution function is not known will in general require rigorous methods for matching the moments of the two simulations without affecting the particle dynamics in the DSMC procedure.

Our discussion has mostly focused on the spatial aspects of the multiscale problem, primarily because the majority of recent work was devoted to these challenges. Despite this, developing methods that can span a wide range of temporal scales is equally important and challenging. In the context of kinetic problems, algebraic decomposition (4.1) can be exploited for this purpose: Cheremisin [67] developed a *deterministic* method for solving the Boltzmann equation which uses such a decomposition to remove the stiffness associated with integrating the Boltzmann equation in the limit . For particle methods, time-relaxed [43] integration schemes, briefly described in §4, provide a promising new direction. These schemes remove the stiffness associated with time integration of (2.11) in the limit by emulating a number of desirable features of implicit integration methods, namely stability for all timestep sizes, and the ability to approach the correct limiting distribution in the limit of large timesteps. Although originally developed for the BGK model and limited to first-order accuracy, higher order schemes for the VHS model have recently been presented [45].

The ultimate challenge, both in the context of spatial and temporal scales, is the extension of algebraic decomposition methods to more complex systems (e.g. dense fluids) where governing equations for the distribution function reside in higher-dimensional spaces. Such systems are more challenging because the governing equation is typically more complex, making it harder to derive explicit expressions for the dynamics of the equilibrium part and the deviation therefrom; moreover, although nearby equilibrium states are easy to identify based on physical considerations, mathematical descriptions of those states will be significantly more complex.

## Acknowledgements

The work presented here was supported, in part, by the Singapore-MIT Alliance. J.-P.M.P. gratefully acknowledges financial support from Ecole National des Ponts et Chaussées and the MIT Department of Materials Science and Engineering through a Graduate Fellowship. The authors would like to thank H. A. Al-Mohssen and C. D. Landon for useful comments and suggestions.

## Footnotes

One contribution of 13 to a Theme Issue ‘Turbulent mixing and beyond: non-equilibrium processes from atomistic to astrophysical scales I’.

↵1 In the literature, it is customary to refer to the Navier–Stokes system of equations as the continuum description, despite the fact that continuum conservation laws, sometimes with other closures, can still be written when the Navier–Stokes closure is not valid. Owing to the long historical precedent, we will also use the term continuum to refer to the Navier–Stokes (more generally, diffusive transport) description valid in the macroscopic limit.

- © 2012 The Author(s) Published by the Royal Society. All rights reserved.