## Abstract

We argue that the *freezing transition scenario*, previously conjectured to occur in the statistical mechanics of 1/*f*-noise random energy models, governs, after reinterpretation, the value distribution of the maximum of the modulus of the characteristic polynomials *p*_{N}(*θ*) of large *N*×*N* random unitary (circular unitary ensemble) matrices *U*_{N}; i.e. the extreme value statistics of *p*_{N}(*θ*) when . In addition, we argue that it leads to multi-fractal-like behaviour in the total length *μ*_{N}(*x*) of the intervals in which |*p*_{N}(*θ*)|>*N*^{x},*x*>0, in the same limit. We speculate that our results extend to the large values taken by the Riemann zeta function *ζ*(*s*) over stretches of the critical line of given constant length and present the results of numerical computations of the large values of ). Our main purpose is to draw attention to the unexpected connections between these different extreme value problems.

## 1. Introduction

Over the past 40 years, considerable evidence has accumulated for connections between certain statistical properties of the Riemann zeta function, *ζ*(*s*), and those of large random matrices. For example, correlations between the non-trivial zeros of the zeta function on the *critical line* , are believed to coincide, in the limit , with those between the eigenvalues of large random unitary or Hermitian matrices, and the value distribution of is believed to be related to that of the characteristic polynomials of large random unitary or Hermitian matrices. Our purpose here is to connect these two areas of research to a third, the statistical mechanics of disordered energy landscapes. The analogy we develop suggests that the freezing transition observed in the statistical mechanical problem also governs the extreme values taken by the characteristic polynomials of random matrices and the zeta function. This sheds new light on the long-standing problem of determining the maximum size of the zeta function.

The Riemann zeta function
1.1is of central importance in mathematics because it encodes the distribution of the primes *p* in the positions of its *non-trivial zeros*. The Riemann hypothesis places these zeros on the critical line. Some of the most important questions in the theory of the zeta function concern the distribution of values it takes on the critical line. It was proved by Selberg [1,2], for example, that when , satisfies a central limit theorem:
1.2which implies that its typical size is of the order of when (e.g. [3,4]). As regards the exceptionally large values taken by the zeta function over long ranges, the Lindelöf hypothesis asserts that for any *ϵ*>0, the Riemann hypothesis implies that
1.3where *c*_{1} is a constant, and, unconditionally, we know that
1.4meaning that takes the value in the argument on the right-hand side infinitely often (e.g. [3]). The exceptionally large values of thus lie in the range between (1.3) and (1.4). The problem of determining where precisely within this range they lie have attracted considerable attention in recent years, but remains unresolved. The extreme values in question are so rare that extensive numerical computations have thus far failed to settle the matter.

It was observed by Montgomery (in relation to a conjecture, to be described below, of Farmer *et al*. [5]) that treating the local maxima of as being statistically independent, assuming that their values satisfy the central limit theorem (1.2), and using the fact that the number of local maxima of in 0≤*t*≤*T* is of the order of (the number of zeros in the range), implies that the typical size of the maximum value of is of the order of , where *c*_{2} is a constant.^{1} Note that this is considerably larger than the typical size of and that it is closer to (1.4) than to (1.3), implying that the extreme values are not much larger than the largest value known to be reached infinitely often. Significantly, this calculation makes clear that the square root in the estimate of the maximum size is related to the fact that the exponential in the integrand on the right-hand side of (1.2) is quadratic in *x*. Had the exponential been linear in *x* then the result would have been closer to upper limit (1.3). It is worth remarking that the Montgomery model would predict a Gumbel distribution (see §3) for the fluctuations of the extreme values of around this typical size.

It is important to note that Montgomery's observation rests on two key assumptions. The first is that central limit theorem (1.2) extends out to the range of the large values predicted. This is far beyond the range for which it has been established. The second is that the values of the local maxima of are uncorrelated. In fact, the values of are known to be correlated in a way that is significant from the point of view of the ideas we shall explore here. Specifically, let us define, for a fixed ,
1.5(The factor of −2 is introduced for reasons to be explained below.) The central limit theorem (1.2) implies that, when , behaves like a Gaussian random function of *x*. To characterize such a random process, it is natural to consider the two-point correlation function , with brackets 〈⋯ 〉 denoting the average over an interval [*t*−*h*/2,*t*+*h*/2] such that . A simple argument, sketched in appendix A, shows that when
1.6This illustrates the fact that values of are indeed correlated (see also [6]). The significance of the precise form of the correlations will become apparent when we draw comparisons with corresponding problems in random-matrix theory and statistical mechanics.

Over the past decade it has become a well-established paradigm that many statistical properties of the Riemann zeta function along the critical line can be understood by comparing them to analogous properties of the characteristic polynomials of random matrices [7–11]. Let *U*_{N} be an *N*×*N* unitary matrix, chosen uniformly at random from the unitary group (i.e. *U*_{N} lies in the circular unitary ensemble (CUE) of random matrices), and denote its eigenvalues by . Let
1.7be the corresponding characteristic polynomial. To this end, it is instructive to compare from (1.5) to . (Again, the factor of −2 is introduced for reasons to be explained below.) satisfies a central limit theorem that is the analogue of (1.2) [7,12]. Specifically, the values of normalized by have a limiting distribution as given by the right-hand side of (1.2). Identifying the mean density of the eigenvalues, *N*/2*π*, with the mean density of the Riemann zeros near to height *t*, , renders the agreement complete.

Importantly for us here, has the following representation [8]:
1.8The coefficients for any fixed finite set of integers *n* tend, in the limit , to independent and identically distributed (i.i.d.) complex Gaussian variables with zero mean and variance [13]. A simple calculation (cf. (3.2) and (3.3)) then shows that tends in the limit to , and so exhibits precisely the same logarithmic behaviour at small distances as we found for the zeta function. For large but finite *N*, the logarithmic divergence can be shown to saturate at |*θ*_{1}−*θ*_{2}|∼*N*^{−1}, so after associating the correspondence between and becomes complete. This is significant from the point of view we seek to develop.

Our goal here is to determine the maximum value of |*p*_{N}(*θ*)| over some specified interval 0≤*θ*≤*L*, or, more precisely, the distribution of these maximum values when *U*_{N} ranges over . Our second goal is then to use the random-matrix results to motivate predictions for the extreme values of the Riemann zeta function. The first steps in this direction were taken by Farmer *et al.* [5], who determined the tail of the distribution of the maximum values of |*p*_{N}(*θ*)| when *L*=2*π*. They modelled in the range 0≤*t*≤*T* by the characteristic polynomials of a set of approximately *T* independently chosen random matrices (recall that the number of zeros in the range is of the order of and the identification ). This corresponds to asking for the typical size of the maximum value that |*p*_{N}(*θ*)| takes when 0≤*θ*≤2*π* and *U*_{N} is sampled independently a large (exponentially in *N*) number of times from within . The resulting conjecture is in accord with the Montgomery heuristic (and the random-matrix approach is sufficiently refined to predict a value for the constant *c*_{2}). This is not altogether surprising, because, as discussed above, the central limit theorem for corresponds to that for , and moreover in the random-matrix case, we know that the Gaussian distribution extends to the large deviation regime [8].

The focus here will differ from that of [5] in the following ways. We shall be concerned with the maximum values of the characteristic polynomials of single matrices, rather than with large numbers of matrices, and will obtain the full value distribution of the maxima in the limit as , rather than concentrating on the tail that is relevant when maximizing over many matrices. This leads to a model for the distribution of maximum values of over *T*≤*t*≤*T*+*L*, *L*≤2*π*, rather than 0≤*t*≤*T* as . Furthermore, it makes the problem of numerical computation of the distribution in question significantly easier, because one is finding the maximum only of rather than numbers (the values of the local maxima).

Our main purpose here is to link the problems of finding the extreme value statistics of the characteristic polynomials of random matrices and of to an interesting and important class of problems in statistical mechanics. The maximum value of *p*_{N}(*θ*) over the interval in question can be characterized in terms of the moments
1.9where . Specifically, if , then
1.10The key point is that (1.9) takes the form of a partition function for a system with energy *V* _{N}(*θ*) and inverse temperature *β*, and may then be associated with the corresponding free energy. Recalling that the values of *V* _{N}(*θ*) are Gaussian distributed and logarithmically correlated, it is natural to draw comparisons with a class of problems that has attracted a good deal of attention recently in the area of disordered systems, namely the statistical mechanics of a single particle equilibrated in a random potential energy described by Gaussian random processes with logarithmic correlations. For example, the statistical mechanics of systems in which the energy is given by a random Fourier series, similar to (1.8), was addressed in [14,15]. In the statistical mechanical problem, there has been a particular focus on the freezing transition which dominates the low temperature limit and determines the extreme value statistics. We shall argue that a similar freezing transition determines the extreme value statistics of the characteristic polynomials and hence, conjecturally, of and that therefore the logarithmic correlations exhibited by and play an important role. Explaining this observation represents our main objective. The implications are wide ranging, in that they allow us to predict explicit formulae for the extreme value statistics. It should be emphasized that we see this as the first step in exploring the connections and that we are aware of the speculative nature of many of the predictions we shall make. We see it as a major challenge to explore these ideas more rigorously. We put forward these speculations in the hope that they will stimulate new directions of research into the long-standing problems we address.

The structure of this paper is as follows. We summarize our predictions in the next section, and also briefly discuss preliminary numerical evidence for and random matrices in support of them. Some of these numerical results were first outlined in our short communication [16], which we refer to below as FHK. As the statistical mechanical problems we shall compare to play a central role in this story, and as they may be unfamiliar in the context of the connections between random-matrix theory and number theory, we give an overview of the key concepts and main results in §3. This review of the literature is necessarily lengthy because the intuitions coming from statistical mechanics are essential to explain our ideas. In §4, we show how to employ the statistical mechanics methodology in the context of the extreme value statistics of the characteristic polynomials of random unitary (CUE) matrices. Finally, in §5 we outline briefly how these calculations may be used to motivate predictions for the extreme value statistics for the Riemann zeta function.

## 2. Summary of predictions

As explained in the introduction, we see our main goal in this paper as being to explain the analogies between the statistical mechanics of certain disordered systems, in particular the freezing transition observed there, and the extreme value statistics of the characteristic polynomials of large random unitary matrices and the Riemann zeta function. These analogies lead to a range of predictions, a representative selection of which we list here. These predictions follow from the calculations outlined in §§4 and 5, and are motivated by similar calculations in statistical mechanics reviewed in §3.

### (a) Statistics of the moments of the modulus of circular unitary ensemble characteristic polynomials

Let us define the moments as in (1.9), and concentrate for simplicity on the case *L*=2*π*. Then for 0<*β*<1 and *N*≫1 the probability density of the random variable , where
2.1is given by
2.2with *Γ*(*x*) and *G*(*x*) denoting, respectively, the Euler gamma-function and the Barnes G-function (which satisfies *G*(*x*+1)=*Γ*(*x*)*G*(*x*),*G*(1)=1). In the region *β*>1, the probability density of the moments is conjectured to change to a much more complicated distribution. Defining the scaled moments for all *β*>1 as , the most salient feature of for *N*≫1 is predicted to be the following tail:
2.3Both the change of the tail exponent from 1+1/*β*^{2} to 1+1/*β* as well as the presence of the logarithmic factor in (2.3) are different manifestations of the freezing transition occurring at *β*=1. They are expected to be universal features for all values of *L* in the range 0<*L*≤2*π*.

### (b) Freezing of the mean free energy

Perhaps the simplest consequence of freezing manifests itself in the temperature dependence of the free energy. For ease of presentation, we again focus on the case when *L*=2*π*. Let us define the normalized free energy by
2.4When *β* is small, the average of with respect to is dominated as by typical values taken by *p*_{N}(*θ*), those governed by the central limit theorem [7,12]. We have seen above that the typical scale for in that case is , and so we expect that
2.5

On the other hand, from (1.10) we know that for the free energy is dominated by the extreme values taken by *p*_{N}(*θ*). The latter will be shown below to scale as . We therefore expect that as
2.6

*Freezing* in this context simply means that the transition between the two types of behaviour occurs at *β*=1 and is sharp: in the limit
2.7Specifically, the term is motivated, after interpreting *β*^{−1} as the temperature *T*, by the fact that the free energy remains *T*-independent, i.e. ‘frozen’, below the critical temperature *T*=1.

### (c) Statistics of the maximum of the modulus of circular unitary ensemble characteristic polynomials

The analogy we develop suggests that the maximal value of the modulus of a CUE characteristic polynomial *p*_{N}(*θ*) in an interval *θ*∈[0,*L*), 0<*L*≤2*π* (which contains on average *N*_{L}=*NL*/2*π* eigenvalues of the associated matrix *U*_{N}) can be written in the limit as
2.8where , with, conjecturally, and and where the random variable *x* is distributed with a probability density *p*(*x*). The value of *c*, , is significant because it is different from the value, , characterizing the extrema of short-range correlated random variables; see the detailed discussion around equation (3.15). The nature of the variable *x* and the form of the density *p*(*x*) both depend on the arclength *L*, and are understood presently only for two specific choices, as detailed below.

(i) The

*full-circle*case*L*=2*π*when*N*_{L}=*N*. Then the random variable*x*is of the order of unity and its probability density is predicted to be given by 2.9where*K*_{ν}(*z*) denotes the modified Bessel function of the second kind. The function*g*_{βc}(*x*) has the following expansion when : 2.10where*γ*_{E}is Euler's constant.(ii) A

*mesoscopic interval*of small arclength*L*≪2*π*such that*N*_{L}≫1, so it typically contains many zeros of the characteristic polynomial. In this case, we have , where*u*is a standard (mean zero, unit variance) Gaussian random variable and*y*is independent of*u*and is of the order of unity. The probability density function for*y*can be written again as*p*(*y*)=−*g*_{βc}′(*y*), but*g*_{βc}′(*y*) is now given in terms of a contour integral: 2.11where the contour is parallel to the imaginary axis*s*=*s*_{0}+i*ω*, with*s*_{0}sufficiently large that all singularities lie to the left, and*M*(*s*) can be expressed in terms of the Barnes G-function*G*(*x*) 2.12The function can be evaluated numerically [15], and the cumulants for the random variable*y*can be evaluated to be , , and for general*n*≥3 2.13It is instructive to compare the behaviour of

*g*(*y*) at to that predicted for the full circle (2.10). We have 2.14with and*C*=−0.253846,*B*=1.25388 and*A*=−5.09728. Significantly, we see the same asymptotic tail*g*(*y*)−1∼*y*e^{y}shared by the two functions, but that the higher order terms in (2.10) and (2.14) differ. The asymptotic behaviour is conjectured to be the universal backward tail shared by extreme value distribution of all logarithmically correlated random functions (see [17]).

The significance of these results is that they differ from the usual Gumbel distribution,^{2} which holds for maxima of a long sequence of i.i.d. random variables with finite moments (see §3). This difference is essentially owing to the logarithmic correlations exhibited by and discussed in the introduction: had the correlations been short range (see §3), the Gumbel distribution would have applied. This difference shows up in the form of the distributions *p*(*x*), and in particular in the asymptotic decay of *p*(*x*) as , but also, importantly, in the value of *c* in (2.8). Our prediction, which is conjectured to be another universal feature of logarithmically correlated processes (see [19]), differs from that, , which would be expected when the correlations are short range (or absent).

### (d) High points of circular unitary ensemble characteristic polynomials

It follows from (2.8) that the typical value of the maximum of |*p*_{N}(*θ*)| in the interval *θ*∈[0,*L*] is of the order of *N*_{L}. The simplest quantity that quantifies the structure associated with the high values of |*p*_{N}(*θ*)| is the relative length *μ*_{N}(*x*;*L*) (as a fraction of the total length *L*) of those intervals in [0,*L*], where , where 0<*x*<1. This can be expressed as
2.15where the characteristic function *χ*{*u*}=1 if *u*>0 and zero otherwise. In the language of the theory of random processes, quantities similar to (2.15) are known as *sojourn times* of the random function above the level . Explicit expressions for the probability density of *μ*≡*μ*_{N}(*x*;*L*) can be provided again in the two limiting cases.

(i) The full-circle case

*L*=2*π*, when*N*_{L}=*N*. We denote the typical value*μ*_{e}(*x*) of the length*μ*_{N}(*x*;*L*) by 2.16The probability density for the variable*ξ*=*μ*_{N}(*x*;*L*)/*μ*_{e}(*x*) is then predicted to have the following form: 2.17Note that the mean value of the length stays finite as

*x*→1. A direct calculation shows that such an expression for the mean is valid for any*x*>0, without restricting to*x*<1. However, when approaching*x*=1 the mean value is significantly larger than the typical value*μ*_{e}(*x*), and is dominated by rare fluctuations. This is directly related to the fact that*x*=1 is, to leading order, the typical value for the highest maximum of |*p*_{N}(*θ*)| (the ‘extreme value region’), so the statistics of the corresponding length is indeed dominated by rare events. As will be explained, it is such a difference between the mean and typical values which helps us to conjecture*c*=3/2 in (2.8).The non-trivial leading-order scaling with

*N*^{−x2}in (2.16) is directly related to the multi-fractal-type structure of the measure of intervals supporting high values. Formula (2.17) is expected to be valid for all*ξ*of the order of unity, more precisely as long as*ξ*≪*ξ*_{c}, with a certain cut-off scale as , the specific form of which is, as yet, unknown to us.(ii) For a mesoscopic interval

*L*such that 1≪*N*_{L}≪*N*, the random variable*μ*_{N}(*x*;*L*) is distributed as the product of two statistically independent factors: , with the random variable*u*being a standard mean-zero unit-variance Gaussian. The variable has a typical scale related to (2.16) by , thus sharing the same multi-fractal scaling of the length of intervals supporting high values. The probability density of the random variable is expected to share the power-law tail for with the full-circle case, but the exact shape of the distribution will be different.Explicitly, let us define for complex

*s*, at fixed 0<*x*<1. Then the density can be written as the contour integral 2.18Note that for the full-circle case*M*_{x}(*s*)=*Γ*(1−*x*^{2}(1−*s*)), so that performing the integral (2.18) by the sum over residues reproduces (2.17).For the mesoscopic case, we have 2.19with 2.20

Here

*Γ*_{2}(*z*|*x*)≡*G*_{x}(*z*) is the Barnes double*Γ*-function [20]: for ℜ(*z*)>0 2.21where*Q*=*x*+1/*x*. This function satisfies 2.22For*x*=1, the function*G*_{x}(*z*) coincides with the standard Barnes function*G*(*z*) discussed after (2.2). Like the standard Barnes function,*G*_{x}(*z*) has no poles and only zeros, and these are located at*z*=−*nx*−*m*/*x*,*n*,*m*=0,1,…. We note in passing that*G*_{x}(*z*) plays a fundamental role in the Liouville model of quantum gravity (e.g. [21]) and in recent calculations of the asymptotics of the spacing distribution at the hard edge for*β*-ensembles [22].

### (e) Extreme values of

The relationship between the values of , as *t* varies along the critical line, and those taken by |*p*_{N}(*θ*)| when the corresponding matrix *U*_{N} is chosen uniformly at random from the unitary group was first considered in [7], where it was argued that, statistically, behaves like the characteristic polynomial of a random unitary matrix of dimension . It has since been the subject of a number of studies (e.g. [9–11,23]), but we are still far from a complete understanding. For example, the role played by arithmetic is still being elucidated, although the hybrid model of [10] suggests that at leading order this contribution decouples from the random-matrix component. For this reason, the way in which the random-matrix predictions listed above model the extreme value statistics of is not entirely clear to us. Nevertheless, we believe that formulae at least similar to those listed below should hold at leading order. We give our reasons for believing this in §5. This belief is also supported by preliminary numerical experiments, the results of which we present below.

In the light of the understanding that behaves like the characteristic polynomial of a random unitary matrix of dimension , it is natural to expect, approximately, a single matrix to model in a statistical sense the zeta function over a range *T*≤*t*≤*T*+2*π*, as such a range contains zeros on average. One can thus consider splitting the critical line into ranges of length 2*π*, and modelling each by a different unitary matrix (cf. [5]). In each range, one can then find the maximum value taken by , and finally one can consider the distribution of these maximum values for all of the ranges. More generally, one can consider ranges *T*≤*t*≤*T*+*L*, where *L*≤2*π*.

Thus, if
2.23where 0<*L*≤2*π*, then we can anticipate that is given by (2.8), with |*p*_{N}(*θ*)| replaced by , the maximum replaced by (2.23), *N*_{L} replaced by , and where *p*(*x*) is given approximately by the formulae above when *L*=2*π* and when *L*≪2*π*, respectively. Specifically, we expect
2.24where the random variable *x* has a value distribution *p*(*x*) that is given, when *L*≪2*π*, in terms of (2.11), and, when *L*=2*π*, that is approximated (because characteristic polynomials are 2*π*-periodic, unlike ) by (2.9). This has two significant implications: that this formula holds with , rather than , as would be the case if the zeta correlations were short range, and that the tail of the distribution decays like |*x*| e^{x} as . It is not at this stage completely clear to us how, if at all, the arithmetic will modify these expressions, but there are reasons, discussed in §5, to believe that it will not influence them at leading order.

Furthermore, we expect, with the same identifications,
2.25to be given by the corresponding expressions listed above. In this case, we do expect the scale (2.16) to be multiplied by the arithmetical factor
2.26that appears in the moment conjectures at leading order [7]. Specifically, we expect the value distribution of *μ*_{T}(*x*;*L*) to be given by the formulae in §2*c*, but with replaced by
2.27

Finally, we expect to see freezing of the quantity corresponding to the free energy; that is, defining
2.28and
2.29then we expect that the mean of with respect to *T* satisfies
2.30in the limit as .

It is worth remarking that our results relate to extreme values over much shorter ranges than those considered by Farmer *et al.* [5]—our focus is on ranges of lengths that are *O*(1), whereas theirs was on ranges of length *T* as . If one extrapolates (2.8) to ranges of length *T* (well beyond where we can justify it), our result for the typical scale of the extreme values agrees with theirs. Where we are able to go further is in predicting the value distribution of the fluctuations. In the context of the question of the extreme values for *t*<*T*, it is worth noting again that the tail of the distribution we predict for much shorter ranges decays like |*x*| e^{x} as ; that is, the exponential is linear rather than quadratic. If this were to persist to much longer ranges than we understand at present, it would suggest that may take much larger values than the Montgomery heuristic (or, more precisely, the Farmer–Gonek–Hughes conjecture) predicts, maybe even close to the upper limit (1.3), but there are several reasons for thinking this unlikely. Specifically, it is not difficult to see that in the absence of correlations the large deviation tail of the extreme value density again becomes quadratic and we believe the same is also true when logarithmic correlations are present.

### (f) Numerical experiments

In order to test the extreme value predictions for the Riemann zeta function, we now summarize the results of preliminary numerical computations performed by Ghaith Hiary and published in our short communication FHK. These involved evaluating over ranges of length 2*π*, at various heights *T* and finding the maximum value *ζ*_{max}(2*π*;*T*) in each range. Values of were computed using the amortized-complexity algorithm of [24], which is suitable for computing at many points. Pointwise values of that were computed are typically accurate to within ±5×10^{−11}, which is sufficient for the purposes of this experiment. The maximum of between consecutive zeros was computed to within ±10^{−9}.

The first test concerns the value of the constant *c* in (2.24). We expect the logarithmic correlations to lead to , rather than , as would be the case if the zeta correlations were short range. The mean of *ζ*_{max}(2*π*;*T*) suggested by the model in (2.24) is , with or , and *γ*_{E}=0.57721…, where we set *N* to be the nearest integer to . At each height, a sample that spans ≈10^{7} zeros is used yielding ≈10^{7}/*N* sample points (because there are roughly *N* zeros in each range of length 2*π*). The numerics presented in FHK and reproduced in table 1 clearly shows that fits the data considerably better than , thus supporting the logarithmic correlations model.

Testing the distribution *p*(*x*) is more difficult, because the data converge extremely slowly at that scale. The results of some initial experiments were presented in FHK and are reproduced in figure 1. Specifically, we considered
2.31based on a set of approximately 2.5×10^{8} zeros near *T*=10^{28}. The data were normalized so that *σ*(*T*) had empirical variance . The overall agreement was supportive of (2.9), especially in the important tail when and in view of the fact that lower order arithmetical terms [7,9] had not been incorporated, but it cannot be said to be conclusive at this stage. The behaviour in the tail is significant because if it were to persist into the large deviation regime it would suggest that the Montgomery heuristic (or, more precisely, the Farmer–Gonek–Hughes conjecture) significantly underestimates the maximum values achieved by ; however, as noted above, there are strong reasons for believing that it does not persist this far.

Finally, we performed a test of the prediction relating to freezing in the mean free energy. This involved calculating , and hence the free energy numerically, and then averaging with respect to *T* over 10^{6} values near to *T*=10^{28}. If freezing is operative, is expected to be equal to *β*+1/*β* for *β*<*β*_{c}=1 and remain frozen to for all *β*>1. In order to account for the finite height at which the computations could be carried out, it was found to be efficacious to normalize so as to incorporate lower terms in the full asymptotic expansion of the moments [9,11] rather than just the leading-order asymptotic term. Specifically, what was computed was the *T*-average of
2.32where *P*_{β}(*x*) denotes the moment polynomial considered in [9,11] (and its extension as an infinite series when *β* takes non-integer values). The results shown in figure 2 would appear to support freezing.

Timothée Wintz also assisted us by performing similar numerical experiments on randomly generated unitary matrices. Specifically, he computed the maximum values of characteristic polynomials of large numbers of unitary matrices drawn uniformly from the CUE. He tested the value of *c* in (2.8) with the results set out in table 2, obtained as for the zeta function. There is again good agreement with our conjectured value . He also tested the distribution of the random variable *x*. In this case, it proved efficacious to rescale *x* by a factor , cf. (2.8), where the constant *B* was obtained from a best fit. The results for a sample of a million matrices with *N*=50 are shown in figure 3.

## 3. Statistical mechanics in disordered landscapes and extreme value statistics

The idea of complicated energy landscapes pervades the theoretical description of glasses, disordered systems, proteins, etc. [25] and has recently re-emerged in string theory and cosmology (e.g. [26]). In this respect, the Parisi solution for mean-field spin-glasses is especially important: it reveals that in that case the energy landscape of a system of many randomly interacting spins has a surprisingly complex, hierarchical structure of valleys within valleys within valleys (for a short recent account see [27] and references therein). Such a structure manifests itself in both dynamics and thermodynamics via a non-trivial phase transition occurring at some finite temperature *T*_{c}>0. Below *T*_{c}, dynamics associated with wandering in this maze of valleys is non-ergodic and shows many distinguished features similar to ageing [28]. Such features are commonly observed in real experiments, although the extent to which the Parisi theory describes energy landscapes typical for finite-dimensional disordered systems is still a matter of debate.

Investigating the energy landscape of a real interacting disordered or complex system is a notoriously difficult problem, and an important role is played by studying effective single-particle counterparts. Here, the main goal is to describe the behaviour of the whole complex system, or one of its subparts, by focusing on the statistical mechanics of a single point particle (or sometimes higher dimensional objects like lines or membranes) moving in a random potential, which encodes the complexity of the original system. The hope then is to be able to classify the possible types of landscapes and to establish generic, universal properties, not unlike those emerging in random matrix theory. The most famous models of this type are the random energy model (REM or GREM [29,30]) and its later ramification describing the model of a polymer on a tree with a disordered potential [31].

Recall that the equilibrium statistical mechanics for a system characterized by a (discrete) set of energies *E*_{1},…,*E*_{M} and a temperature *T*>0 is represented by the set of Boltzmann–Gibbs probability weights, *p*_{1},…,*p*_{M}, where
3.1
*β*=1/*T*, and we set the Boltzmann constant *k*_{B} to unity to measure the temperature and the energy in the same units. It is clear that at low temperatures, *β*≫1, the set of probabilities is dominated by the lowest available energies in the set. It is then not surprising that one of the most fundamental questions arising in the landscape paradigm is the problem of understanding the statistical properties of low, or even ‘extreme’ (i.e. minimal) energy values typical for various classes of disordered landscapes [32,33]. Such an understanding is certainly needed for a detailed description of the freezing phenomena in systems with disorder, with the spin-glass-like arrest being the paradigmatic example. From that angle, the analysis of the statistics of minimal-energy configurations of various random systems is attracting a good deal of attention at present; for a review of some recent developments related to Tracy–Widom-type statistics, see [34,35].

In mathematics, the distribution of the minimum/maximum in a sequence of *M*≫1 random variables *V* _{1},…,*V* _{M} is an important research area with numerous applications. The classical results in this area can be found, for example, in [36], and we attempt to summarize the facts most pertinent to this study below. If *V* _{i} are i.i.d. random variables there are only three possible shapes (up to shifts and rescaling) of the limiting distribution of the minimum . In particular, for i.i.d. variables *V* _{i} with all moments finite the relevant distribution has a characteristic double-exponential form and is known as the Gumbel distribution. More precisely, there exist non-random sequences *a*_{M} and *b*_{M} such that the random variable is characterized when by the limiting distribution . In the particular case of i.i.d. normal, mean zero variables *V* _{i} with variance the sequences *a*_{M} and *b*_{M} behave asymptotically like and .

An important general question, relevant for applications, is to what extent, if at all, the above picture holds for correlated random sequences. The most complete answer is known for Gaussian mean-zero stationary sequences with covariance . It turns out that if *C*(*r*) decays to zero faster than the limiting distribution of the minima is still given by the Gumbel distribution, and the leading-order scaling behaviour for *a*_{M}, *b*_{M} is the same as that for uncorrelated sequences. We will call such variables *short-range correlated*. Not much is known at present beyond the short-range correlated case. There has been particular interest in *scale-invariant* sequences with stationary increments, also known as fractional random walks, which have numerous applications. Such sequences are conveniently characterized by structure functions , with parameter 0<*H*<1 known as the Hurst exponent. In particular, scale-invariance implies that the typical minimum in such a case should scale for large *M* as , which should be contrasted with the short-range scaling . The probability distribution of the minimum is known explicitly only for the case of the standard diffusive random walk [37], when one can exploit path-integral methods based on the underlying Markovian structure. Characterizing extreme value statistics for non-Markovian random walks remains a considerable challenge; even the constant *C*(*H*) is not yet known explicitly beyond .

### (a) Random energy model and the two-dimensional Gaussian free field

The results reviewed above can already be used to generate some insight into the equilibrium statistical mechanics of a single particle in a disordered landscape. To this end let us identify and look at the sequence *V* _{i} as representing a set of energies available for a particle at various ‘sites’ *i*=1,…,*M* of a disordered system. Recall that the free energy can be represented as *F*(*β*)=*U*−*TS*, where is the mean energy of the system, with standing for the thermal average with respect to the Boltzmann–Gibbs measure (3.1). The quantity *S* stands for the *entropy*, which effectively controls how the available mean energy is spread over all the available energy levels (i.e. over the *sites*). In particular, the entropy of a system in which all *M* states are equally likely is given, according to the Boltzmann formula, by . One may then attempt to understand the structure of a Boltzmann–Gibbs measure at a crude qualitative level by invoking the argument of a competition between the entropic term and the minimal available energy [17]. Recalling that for short-ranged random sequences, we immediately conclude that for any fixed *T*>0 the entropic contribution will dominate over the energetic component in sufficiently large systems. This should result in a Boltzmann–Gibbs measure being spread more or less democratically over all the sites of the landscape. The normalization then implies the scaling *p*_{i}∼*M*^{−1}. In such a situation, it is conventional to say that for any *T*>0 the system stays in the *high-temperature* phase with a delocalized Boltzmann–Gibbs measure. For the long-range correlated case, however, the situation is somewhat the opposite: for any temperature and Hurst exponent *H*>0 the magnitude of the minimal energy for large enough *M* grows faster than the entropic contribution. In this situation, the Boltzmann–Gibbs measure for large enough systems will be essentially *localized* on one or a few sites of minimal energy, with the corresponding *p*_{i}=*O*(1), whereas for the majority of the sites *p*_{i} is expected to be negligible. It is conventional to say that effectively such a system is *frozen* in the low-temperature phase, for all temperatures.

It is possible to augment the picture just described in several ways. The simplest is to *rescale* the variance *σ* of the short-ranged random potential with *M* in such a way that . We can thus ensure that both the minimal energy and the entropy grow logarithmically with *M*, and therefore which of the two dominates will depend on the temperature *T*. If in addition we assume the random energies to be i.i.d. random variables, the resulting model is precisely the much-studied REM, introduced by Derrida [29,30], which is sufficiently simple to be amenable to rigorous analysis and has played a paradigmatic role in the statistical mechanics of disordered systems. In particular, the model displays a non-trivial ‘freezing’ phase transition at a finite temperature *T*_{c}, which by an appropriate choice of the variance can be made equal to unity. Namely, the mean free energy of the system in the limit behaves in the high-temperature phase *T*>*T*_{c}=1 as and ‘freezes’ to the minimal value for all temperatures below the transition, for 0≤*T*≤1.

In fact, both above and below *T*_{c} the Boltzmann–Gibbs probability measure is neither truly localized nor delocalized, but rather provides the simplest example of a random *multi-fractal* measure. Namely, in the thermodynamic limit the weights *p*_{i} scale differently on different sites: *p*_{i}∼*M*^{−αi} , with the exponents *α*_{i} filling some interval [*α*_{−},*α*_{+}] in such a way that , with a well-defined smooth concave function *f*(*α*) (see [19,38] for a more extensive discussion and further references). A detailed analysis of the low-temperature phase reveals even more intricate probabilistic structure: the weights *p*_{i} for *T*<*T*_{c} can be described in terms of Ruelle probability cascades [39]. Physicists usually refer to the low-temperature structure arising as one reflecting the simplest non-trivial mechanism of a spin-glass-type phase transition—the so-called one-step spontaneous replica symmetry breaking. A ramified version of the same mechanism in more sophisticated spin-glass models gives rise to the Parisi picture of hierarchical valleys for the effective (free) energy landscape mentioned above.

Despite the success of Derrida's idea to rescale the variance of the i.i.d. variables with , inspired by a similar scaling in infinite-dimensional mean-field spin-glass models, such a procedure for generating a non-trivial freezing transition looks somewhat artificial from the point of view of random landscapes over finite-dimensional lattices. It is therefore worth noting that a similar scaling of the variance arises naturally in certain finite-dimensional models of physical interest. It was revealed in [40] how the corresponding landscape model emerges in the case of a quantum Dirac particle moving in the two-dimensional plane subjected to a transverse random magnetic field. Specifically, it was noted there that the profile of a (normalized) eigenfunction corresponding to zero energy of the Dirac Hamiltonian has, formally, the shape of (a continuous space version of) the Boltzmann–Gibbs measure on , with the role of the random potential *V* (**x**) played by the two-dimensional Gaussian free field (2dGFF). The latter is defined as a mean-zero random Gaussian field in a domain such that its covariance is given by , where *G*(**x**,**x**′) is the Green function of the Laplace operator in *D*, with specified conditions on the boundary ∂*D*. Let us consider, for definiteness, Dirichlet boundary conditions and take *D* to be a two-dimensional disc |*z*|≤*L*, where we have employed the complex coordinate *z*=*x*+i*y*. The corresponding Green function is then . In particular, for any two distinct points *z*_{1} and *z*_{2} well inside the disc |*z*_{1,2}|≪*L* we find , the latter expression interpreted as the Green function on the full two-dimensional plane. To make the construction well-defined from the point of view of an underlying random Gaussian field, one has to ensure its variance is finite by taking appropriate care of the divergence when *z*_{1}→*z*_{2}. The most natural way is to think of an underlying lattice structure, with the lattice spacing *a*≪*L* and the random field defined on the lattice sites only.^{3} We have *M*∼(*L*/*a*)^{2} lattice points inside the disc, and it is consistent to require showing that the 2dGFF indeed naturally gives rise to a REM-like scaling of the variance. In contrast to the REM landscape, however, the values of the 2dGFF at different lattice points are strongly (logarithmically) correlated. Nevertheless it is natural to conjecture that the freezing transition typical for the REM (and shared by the model of directed polymers on disordered trees [31]) will also occur in such a situation and will give rise to a multi-fractal structure in the associated Boltzmann–Gibbs measure, and hence for the zero-energy wave function of the Dirac particle. Numerical simulations and further analytical studies confirm the validity of such a picture (see [17]). This has provided significant insight into the associated statistics of the minima of the regularized 2dGFF landscape. In particular, it suggested a powerful, albeit heuristic, real-space renormalization group approach. This approach substantiated the claim of a REM-like freezing scenario in logarithmically correlated landscapes and led to a conjecture for the minimum value of such a landscape: , where , and the value distribution of the random variable *y* is given asymptoticially, as , by *Φ*(*y*)∼1−|*y*| e^{y}, and so is different from the Gumbel expression, for which *Φ*(*y*)∼1−e^{y}.

In the mathematical literature, the probabilistic properties of the extremes of the lattice version of 2DGFF have attracted considerable attention [42–44]. Actually, these papers addressed rather than , but the two statistics are obviously trivially related. In particular, Bramson & Zeitouni [44] proved that . This agrees with the leading order terms in *a*_{M}. A more detailed characterization of the extreme value distribution was beyond the reach of the methods used until very recently,^{4} but the work of Davioud [43] provided key insights into the very high, *almost extreme* values of the 2dGFF. Specifically, let us define , where *x*∈(0,1), to be the number of lattice points such that the potential *V* _{i} satisfies . It turns out that . This implies that the typical value of scales in every realization roughly as . Such a dependence is natural to interpret again as a kind of multi-fractal scaling, not unrelated to the multi-fractality of the Boltzmann–Gibbs weights. Note that when *x*→1 the typical number of points above such a level becomes of the order of unity, which agrees with the scaling of the extreme values described above. We will return to these issues later on in the paper.

Finally, we mention that a mathematically rigorous framework for dealing with general fields and processes with logarithmic correlations was developed in [46] and is known by the name ‘Gaussian multiplicative chaos’. This has undergone substantial development in recent years (e.g. [47]) and has in particular been exploited in the context of probabilistic aspects of quantum gravity (e.g. [48] and references therein). Very recently, such a framework was shown to be of substantial utility also for studying Boltzmann–Gibbs measures associated with 1/*f* noise landscapes and closely related problems [49–52].

### (b) 1/*f*-noise as a random landscape I: statistical mechanics in the high-temperature phase

To attack the problem from a different angle, the idea was proposed in [14,15] to use the full-plane logarithmic GFF to construct various *one-dimensional* Gaussian random landscapes with logarithmic correlations. The associated Boltzmann–Gibbs measures are expected to be qualitatively analogous with those in 2dGFF landscapes, but are amenable to much more detailed quantitative analysis. Arguably the simplest example of such a one-dimensional landscape can be generated by sampling the values of the full-plane 2dGFF along a circle of unit radius parametrized as *z*=e^{it}, *t*∈[0,2*π*). One is thus led to a 2*π*-periodic mean zero Gaussian process *V* (*t*) whose covariance for *t*_{1}≠*t*_{2} is formally given by
3.2By employing the well-known identity we see that a representation of *V* (*t*) is given by a random Fourier series of the form
3.3where the coefficients , *n*=1,2…, are i.i.d., mean-zero, complex Gaussian variables with variance . In this way, we sample every Fourier series (3.3) according to the Gaussian probability weight . As the power associated with a given Fourier harmonic with index *n* decays like 1/*n*, the function *V* (*t*), viewed as a time-dependent random signal, is a representative of the so-called 1/*f* noise believed to be ubiquitous in Nature (e.g. [53]). In this section, we shall focus on this particular model, although later on in the paper we will also consider another model of 1/*f* noise, which corresponds to sampling the values of the full-plane 2dGFF along a finite interval of a straight line.

Pretending for the moment that (3.3) defines a well-behaving function and further exploiting the identity
3.4we see that in the space of functions *V* (*t*) defined by (3.3) we should have
3.5where we have introduced the formal derivative *V* ′(*t*)=d*V* (*t*)/d*t*. Relation (3.5) then implies that the Gaussian weight associated with every Fourier series (3.3) is proportional to so that the expected value for any functional *Φ*(*V*) of 1/*f* noise could be written symbolically as a formal ‘path integral’
3.6much in the same way as the expected value of a functional *Φ*(*W*) of the standard Brownian motion *W*(*t*) with respect to the Wiener measure can be symbolically written as with (see [54] for historical remarks and further references). The crucial difference however is that in the case of the Wiener measure, the evaluation of expected values in many practically interesting cases can be rigorously performed by the Feynman–Kac formula, reducing calculations to the solution of a differential equation, whereas a mathematically rigorous theory behind (3.5) and (3.6) is not yet available.^{5} Nevertheless, we eventually will be able to employ heuristic methods to conjecture the explicit value of (3.6) in regularized versions of the theory for the particular functional with real parameters *p*>0 and |*β*|<1; see (3.18).

The similarity between (3.3) and (1.8) is very significant in motivating the analogy we wish to draw between the statistical mechanical problem under discussion and the value distribution of the characteristic polynomials of CUE matrices, especially in the light of the discussion following (1.8). Unfortunately, the random Fourier series (3.3) is pointwise divergent with probability one, as reflected in the logarithmic divergence of the covariance (3.2) when *t*_{1}→*t*_{2}. To be able to use such a function as a random energy landscape it is therefore necessary to provide a regularized version of the process with a well-defined variance.^{6} To this end, Fyodorov & Bouchaud [14] constructed a lattice version of the model, called the *circular-logarithmic* model. In this case, one replaces the function *V* (*t*), *t*∈[0,2*π*) with a sequence of *M* random mean-zero Gaussian variables *V* _{i} obtained by sampling the 2dGFF equidistantly along the unit circle at points , and then multiplying by a constant factor ; that is , *i*=1,…,*M*. Such a sequence automatically inherits the *M*×*M* covariance matrix from (3.2) so that the off-diagonal entries are given by
3.7To have a well-defined collection of Gaussian-distributed random variables, we have to ensure the positive definiteness of the covariance matrix by choosing the appropriate diagonal entries *C*_{kk}. A simple calculation shows that one has to choose
3.8In practice, it was further assumed that *W*_{k}=*W*, ∀*k*, and finally the limit *W*→0 was taken.

The crucial observation made in [14] was that in the large-*M* limit the positive integer moments of the partition function corresponding to such a landscape converge in a certain temperature range to the Dyson–Morris version of the Selberg integral (see [57]). More precisely, for *β*<1 and positive integer *n* we have, asymptotically, that
3.9where *Γ*(*x*) is the Euler gamma-function.

From the moments , Fyodorov & Bouchaud were able to reconstruct the probability density of the partition function *Z*(*β*) in the high-temperature phase 0<*β*<1. Actually all the statistical properties of the partition function remain invariant when changing *β*→−*β*, so we can formally consider *β* to be of arbitrary sign and define the high-temperature phase by the condition |*β*|<1. The density in this domain turned out to consist of two pieces, the *body* and the *far tail*. The body of the distribution has a pronounced maximum at *Z*∼*Z*_{e}=*M*^{1+β2}/*Γ*(1−*β*^{2})≪*M*^{2} and is given explicitly by
3.10The most important feature of the above distribution is the power-law decay in the parametrically wide region *Z*_{e}≪*Z*≪*M*^{2}. For *Z*≫*M*^{2} the above expression is replaced by a lognormal tail
3.11where the unknown function *R*(*x*) is of the order of unity for *x*∼*O*(1). It is easy to check that (3.10) and (3.11) indeed match at *Z*∼*M*^{2}. Let us stress once again that this picture is valid only in the ‘high-temperature’ phase |*β*|<*β*_{c}=1.

### (c) 1/*f*-noise as a random landscape II: fluctuating multi-fractal patterns of heights and the threshold of extreme values

In this section, we review the arguments from [19] allowing one to exploit the high-temperature moments (3.9) to analyse the statistics of the number of ‘low’ or ‘high’ values in such a sequence. In this way, we will be able to determine the position of the thresholds *V* _{±} of the extreme values. For the high values, such a threshold *V* _{+} is defined as the level above which typically we should have only a few (i.e. of the order of one) points of the sequence *V* _{1},…,*V* _{M} when *M*≫1. The definition of *V* _{−} for low values is similar, with ‘above’ replaced by ‘below’. We already have mentioned that at the leading order we must have . Correspondingly, we will call the value *V* _{i} of the sequence *x*-*high* provided with 0<*x*<1, and similarly define *x*-*low* values as those for which −1<*x*<0. Introducing further the notation *h*_{i}=*e*^{Vi}>0 the condition becomes equivalent to *h*_{i}>*M*^{x}. In the literature, the general sets of values *h*_{i} associated with points of the lattice in such a way that they scale in the large-*M* limit as *h*_{i}=*M*^{xi}, with *singularity exponents* *x*_{i} filling in a finite interval [*x*_{−},*x*_{+}] are called *multi-fractal* sets. Their most important characteristic is the so-called *singularity spectrum* *f*(*x*), which characterizes the asymptotic growth of the counting function *N*_{>}(*x*)∼*M*^{f(x)} of the number of *x*-high points. Thus, counting *x*-high/low values is intimately related to revealing the multi-fractal structure of 1/*f* noise, and we give a brief account of the procedure below. This will help in quantifying the picture outlined in §2*a*.

To that end, we define the density in terms of which the counting function is given by . In the large-*M* limit the density can be described by the following *multi-fractal ansatz*:
3.12where *n*_{M}(*y*) is a random coefficient of the order of unity which fluctuates strongly from one realization of the sequence *V* _{i} to another. To understand (3.12) we note that *ρ*_{M}(*y*) provides a direct link between *N*_{>}(*x*) and the partition function , as the latter can obviously be expressed in terms of the same function as . Here, it will be convenient to allow *β* to be of any sign. Substituting ansatz (3.12) into the above formula for *Z*(*β*), we can perform the integral over *y* in the limit by the Laplace method, with the stationarity condition resulting in the relation *y*=−2*β*. We arrive at the asymptotic relation *Z*(*β*)≈*n*_{M}(*y*=−2*β*) *Z*_{e}, with *Z*_{e}=*M*^{1+β2}/*Γ*(1−*β*^{2}). Note that the condition |*y*|<2 translates into |*β*|<1. At the same time, we know that *Z*(*β*) for |*β*|<1 must be distributed according to density (3.10). To ensure this property, we therefore conclude that the probability density of the random variable *n*=*n*_{M}(*y*) for a fixed value of *y*∈(−2,0)∪(0,2) must necessarily be of the form
3.13We expect this form of the density to be valid as long as *n*≪*n*_{c}, with *n*_{c} being some cut-off value diverging for . The precise dependence of *n*_{c} on *M*, as well as the shape of for *n*≫*n*_{c}, cannot be extracted from the above arguments, although the internal consistency of the Laplace method suggests that (3.13) cannot hold when *n*∼*M*^{ν} when *ν*>0.

These facts can now be used to determine the counting function statistics. Indeed, substituting (3.12) into the integral for the counting function we find by the same method with the same random factor *n*_{M}(*x*) distributed according to (3.13) and the typical value being given by
3.14The significance of the above is, in particular, the fact that it allows one to determine the precise position *V* _{+} of the typical threshold of extreme values. By definition the threshold is determined by the condition when *M*≫1. A straightforward calculation then allows one to show that
3.15The same large-*M* asymptotic must also hold for the position of the absolute maximum *V* _{m} of the sequence which is always among a few values above that threshold. The statistical properties of *V* _{m} will be discussed in more detail in the next section.

Note finally that had we instead decided to use the condition this would result again in (3.15) but with replaced by . The latter value is indeed known to be characteristic of short-ranged correlated random sequences. The difference is owing to the fact that for such sequences the mean and the typical values of the counting function are always of the same order, whereas for log-correlated sequences in the vicinity of the threshold the typical value becomes parametrically smaller than the mean value . Similar behaviour is believed to be shared by a broad class of disorder-dominated multi-fractal processes and fields (see [19] for a more detailed discussion).

### (d) 1/*f*-noise as a random landscape III: duality, freezing and statistics of extremes

Now we turn our attention to the generating function which will underpin many subsequent calculations. It may be checked easily that the leading-order large-*M* behaviour is dominated by the ‘body’ density (3.10) rather than by the lognormal tail (3.11), and so we find after straightforward manipulations
3.16where we have employed the notation *ϕ*_{β}=*F*(*β*)−*F*_{e}(*β*) for the deviation of the free energy from its typical value in the high-temperature phase
3.17

Note that after identifying *M*^{−1}*Z*(*β*) as a regularization for the 1/*f* noise integral , (3.16) can be interpreted as the evaluation of the ‘path integral’ (3.5) and (3.6) for the functional with *p*>0, and in this way is equivalent to the identity
3.18expected to hold when |*β*|<1 for any regularized version of the 2*π*-periodic Gaussian 1/*f* noise with finite variance . In particular, let us take the limit in such a way that . Note that for the periodic 1/*f* noise (3.3) we obviously must have . Using this fact, we easily find that, in the limit in question, (3.18) yields the identity
3.19where is a constant depending on the chosen regularization. In turn, (3.19) means that the quantity (which can be interpreted as a measure of *roughness* of the 1/*f* signal) is Gumbel-distributed: *R*=*z*′_{e}(0)+*r* with random *r* whose probability density is . This result was derived for the first time in [58] by a completely different method, and remained, until recently, one of only a few explicit results on the statistics of 1/*f* noise.

The integral on the right-hand side of (3.16) belongs to a class of special functions that enjoyed a detailed investigation in the mathematical literature only a few years ago (see [59]). It possesses several non-trivial properties. In particular, it was noted in [15] that *g*_{β}(*y*) satisfies a remarkable and important *duality relation*: *g*_{β}(*y*)=*g*_{1/β}(*y*). Indeed, after some algebraic manipulations one can rewrite the integral in the right-hand side as
3.20
3.21where in the right-hand side formally 1<1/*β*^{2}≠integer, but it is easy to check that in the limit 1/*β*^{2}→integer the series in fact retains a well-defined finite value.

Let us again stress that (3.16) holds only for 0<*β*<1 and the duality of the right-hand side cannot be used to evaluate the left-hand side for *β*>*β*_{c}=1. Instead, there is accumulating evidence that a type of *freezing* phase transition happens at *β*=*β*_{c}. We have already mentioned the simplest instance of that transition at the level of the mean free energy which manifests itself (to leading order in *M*) in the change from the value for *β*<1 to the temperature-independent value for *β*>1. This freezing of the leading-order mean free energy, which follows the pattern of Derrida's uncorrelated REM, was, for the present model, rigorously proved recently in [49]. However, the idea of freezing has been elevated far beyond the level of the first moment and conjectured to extend to a much richer *freezing scenario*: the whole generating function *g*_{β}(*y*) ‘freezes’ to the temperature-independent profile *g*_{β=1}(*y*) everywhere in the ‘glassy’ phase *β*>1 [14,17].

This scenario is supported by the following arguments: (i) heuristic real-space renormalization group calculations [17] revealing an analogy to the travelling-wave analysis of polymers on disordered trees [31] where such a scenario can be rigorously shown to hold, (ii) the duality relation mentioned above can be shown to hold also for other types of logarithmic landscapes [15,60]—in particular, the duality forces the ‘temperature flow’ of the function *g*_{β}(*x*) to stop at the critical point *β*=*β*_{c}=1, (iii) by relations between the freezing scenario and the mechanism of one-step replica symmetry breaking in logarithmic models [60], and (iv) finally, by direct numerical simulations in these papers. All these facts taken together inspire our confidence in the validity of the freezing scenario, although it remains a major challenge to prove this conjecture rigorously.

One of the main consequences of the freezing conjecture is that it implies the possibility of obtaining the distribution of the (properly rescaled) free energy in the low-temperature phase. Namely, following [14] we make an additional assumption that for all *β*>1 when freezing is operative the expression (3.17) for the typical value of the free energy *F*_{e}(*β*) should be replaced with the value of the position of the threshold of minimal values , so that *ϕ*_{β}=*F*(*β*)−*V* _{−}. Introducing again the notation *p*=e^{βy} and using the fact that for our particular choice of the model *g*_{β=1}(*y*)=2 e^{y/2}*K*_{1}(2 e^{y/2}), where *K*_{1}(*z*) denotes the modified Bessel function of the second kind, we see that the freezing scenario allows us to rewrite relation (3.16) for all *β*>1 as
3.22This formula can be used as a generating function for the random variable *ϕ*=*ϕ*_{β}, and can be further employed to extract the probability density for that variable in a closed form:
3.23where *ψ*(*x*)=*Γ*′(*x*)/*Γ*(*x*). In particular, as , one can gain access to the distribution of the minimum of the random potential sequence. Introducing, correspondingly, one finds in the large-*M* limit the probability density for the variable *x* to be
3.24Such a distribution is manifestly different from the Gumbel law, which has a probability density of the form and which holds for short-range-correlated Gaussian random sequences. Moreover, the density *p*(*x*) does indeed exhibit the universal Carpentier–Le Doussal tail . One can easily find all cumulants of the distribution (3.24) (see [15]), e.g. the mean , the variance , etc., in terms of Riemann-zeta values. All this agrees with the available numerics.

## 4. Statistics of extreme and high values of circular unitary ensemblecharacteristic polynomials

The apparent similarity between the circular-logarithmic model discussed in the previous section and the logarithm of the characteristic polynomial of a CUE matrix makes it evident that they exemplify two different ways of regularizing the same 2*π*-periodic 1/*f*-noise (3.3). The latter can be properly defined only as a random generalized function (see [41]). The two regularizations are however of a rather different nature: the log-circular model replaces 1/*f* noise with a finite sequence of *M* random variables, whereas the log-modulus of the characteristic polynomial for any finite *N* is a piecewise-continuous random function with *N* logarithmic singularities in the interval [0,2*π*]. Nevertheless, if the limiting 1/*f* noise is a meaningful object at all, it would be natural to expect the two processes to share the same extremal and high/low-level values statistics in the limits and , respectively.

To substantiate this claim, we follow the same strategy for characteristic polynomials as we did for the log-circular models. Our main object of interest in this section will therefore be the moment (1.9), which is analogous to the partition function discussed in the previous section, and in the associated free energy. We shall be interested in various choices of the interval *L*.

The first step is to consider the positive integer moments
4.1where the expectation is with respect to Haar measure on the unitary group . As is well known, the expectation value in the integrand is given by a Toeplitz determinant:
4.2where
4.3The asymptotic behaviour of such a Toeplitz determinant in the limit was conjectured in [61] and proved in [62]:
4.4where *G*(*x*) is the Barnes function. Further progress is possible in two cases.

### (a) The full circle: *L*=2*π*

In this case, substituting (4.4) back to (4.2) and (4.1), we see that the resulting expression is the standard Dyson–Morris version of the Selberg integral (see [57]) convergent for *k*<1/*β*^{2} and divergent for larger *k*. As we have *k*≥1, the procedure makes sense only for *β*^{2}<1. Introducing the notation , we find
4.5

To understand how to deal with the case *k*>1/*β*^{2}, we recall that Widom's asymptotic formula (4.4) is valid only when all of the differences |*θ*_{i}−*θ*_{j}| remain *finite* when , and should be replaced by a different expression when |*θ*_{i}−*θ*_{j}|∼*N*^{−1}. One can check that the divergence of the integral for *k*>1/*β*^{2} is owing precisely to the fact that these near degeneracies become important. Relying on our experience with the corresponding situation for the circular-logarithmic case suggests that taking into account the correct short-scale cut-off cures the formal divergence, but changes the asymptotics of the moments with *N*: namely, these become of the order of *N*^{1+k2β2} for *k*>*β*^{−2}, whereas they are of the order of *N*^{(1+β2)k} for *k*<*β*^{−2} (cf. the large-*N* asymptotics of the moments of the characteristic polynomials derived in [7]). Such a change of behaviour will lead to a lognormal (far) tail in the distribution.

The above expression (4.5) for the moments is exactly the same as (3.9), except that the characteristic scale has a different value, with the ratio of the two scales given by the factor *G*^{2}(1+*β*)/*G*(1+2*β*). Note that the factor tends to unity when *β* approaches the freezing temperature *β*=1. Relying then on the freezing paradigm we conclude that such a factor will not affect the statistics of the maximum, and we can simply translate, *mutatis mutandis*, all results from the regularized circular-logarithmic model to the values of characteristic polynomial sampled along the full circle *θ*∈(0,2*π*].

### (b) A mesoscopic arc: 1≪*N*_{L}=*NL*/2*π*≪*N*

Another case when the partition function moments can be explicitly evaluated corresponds to an arc of length *L* that is much smaller than that of the full circle, but still much larger than the typical distance between the eigenvalues, 1/*N*. We will call such an intermediate scale *mesoscopic*. Following the same route as before, in the limit *N*≫1 we have
4.6which after expanding the exponents e^{iθp}≈1+i*θ*_{p}, and further rescaling *θ*_{p}=*Ly*_{p} acquires the form
4.7Note that the factor *L*^{−β2k2} can be written as , where the averaging is performed over the standard mean-zero Gaussian variable *u* of unit variance. We conclude that the characteristic polynomial *p*_{N}(*θ*) in the interval [0,*L*] has the same probability law as a product of two independent random variables: , such that for the ‘reduced’ partition function we have
4.8The integral in the above equation is the standard Selberg integral [57], which gives finally
4.9where we have introduced the scale , with *N*_{L}≡*NL*/2*π*. For the case *k*>*β*^{−2}, the integral is divergent and we can apply a consideration similar to that applied for the full-circle case, which is of secondary importance for our goals and will not be repeated here.

We see that expressions (4.9) for the moments of are precisely the moments of the distribution analysed in [15]. That paper dealt with the aperiodic version of the 1/*f*-noise sampled from the two-dimensional full-plane GFF along an interval of unit length. The analysis followed essentially the same steps as for the circular-logarithmic model, but technically was much more involved owing to the complicated structure of the moments. The main difficulty was to find a method of continuing to complex *k* in a meaningful way, which allows one to find the probability density of *z* from the above moments. The goal was successfully achieved in [15] by employing heuristic methods. Independently, a very similar problem arose in the context of financial mathematics, and an elegant mathematically rigorous solution was proposed in [63,64], with the results of the two approaches coinciding.

The extreme/high value statistics for the modulus of the characteristic polynomial in the interval *N*^{−1}≪*L*/2*π*≪1 can again be translated straightforwardly from those results, and was summarized in the beginning of §2.

### (c) Multi-fractal properties of the modulus of characteristic polynomials

The results for mesoscopic intervals presented above can be also interpreted as revealing the multi-fractal structure of the function |*p*_{N}(*θ*)|. To see this more clearly we resort to a standard tool of multi-fractal analysis, *box counting*, and subdivide the full interval [0,2*π*] into *M*=2*π*/*l*_{b} subintervals (‘boxes’) of equal length *l*_{b} chosen in such a way that the number of boxes satisfies 1≪*M*=2*π*/*l*_{b}≪*N*. We further define the set of coarse-grained values *h*_{n}(*l*_{b}),*n*=1,…,*M* of the function |*p*_{N}(*θ*)| by averaging it over the *n*th box and the corresponding ‘partition functions’ *ζ*_{q}(*l*_{b}) via
4.10Now, straightforward calculation gives for integer *q* the ensemble-averaged value
4.11
4.12where we have used the rotational invariance of the ensemble-averaged values, definition (4.1) and formula (4.7), with *S*_{q}(*β*^{2}) standing for the *β*-dependent Selberg integral featuring in (4.7) and convergent for 1<*q*<*β*^{−2}. Though formally derived for the integer *q* one expects the above to be valid for real *q*<4 as well, so we finally arrive at the relation
4.13In view of the nonlinear dependence of the exponent *τ*_{q} on *q*, this scaling of the ratio *i*_{q}(*l*_{b}) associated with coarse graining on the scale *l*_{b} implies multi-fractality of the underlying function |*p*_{N}(*θ*)| (see [19] with discussion and further references therein). In fact, multi-fractality could be immediately anticipated from the presence of the lognormal factor in the ‘mesoscopic’ statistics of |*p*(*θ*)| with an exponent whose variance, , is linear in the log of the number of boxes.

### (d) Measure of high points of characteristic polynomials

The multi-fractal structure of 1/*f* noise also shows up in a somewhat different, but related way in the statistics of high values of |*p*_{N}(*θ*)|. Note that as follows from our discussion the typical value of the maximum of |*p*_{N}(*θ*)| in the interval *θ*∈[0,*L*] is *N*_{L}. The simplest quantity which helps to quantify the structure of high values is the relative length *μ*_{N}(*x*;*L*) (as fraction of the total length *L*) of those intervals in [0,*L*], where |*p*_{N}(*θ*)|>(*N*_{L})^{x} for a fixed 0<*x*<1. Statistics of this quantity are naturally related to the statistics of the partition function in the high-temperature phase *β*<1, as was informally discussed for the case of the log-circular model in §3. To substantiate the claim for the characteristic polynomials, we choose here to follow an alternative procedure. Though some parts of the method remain of somewhat heuristic nature, we believe that it grasps the mathematical structures correctly. It remains a challenge to justify it by rigorous methods.

We start from the definition
4.14where *χ*{*u*}=1 if *u*>0 and zero otherwise. Using the Fourier transform, one can rewrite this as
4.15Our goal is again to calculate the integer moments 〈*μ*_{N}(*x*;*L*)^{p}〉 for *p*=1,2,…. These can be found by the same procedure we used for the ‘partition function’ moments above and are given by
4.16where
4.17

The ensemble average is again a Toeplitz determinant
4.18where
4.19As already noted, the asymptotic behaviour of such a Toeplitz determinant in the limit is given by the Fisher–Hartwig conjecture [61,62]:
4.20where *G*(*x*) is the Barnes function, and the formula is valid for . For purely imaginary *λ*_{j}=i*k*_{j}, which is our case, it gives
4.21In the limit , the integral is dominated by the saddle point *k*_{j}=−i*η*_{j},∀*j*=1,…,*p* which gives
4.22Substituting this back into (4.16) one may expect that all *η*-integrals for and fixed *x*>0 will be dominated by the lower limit *η*_{j}=*x*,∀*j*. To this end, we introduce new variables *u*_{j} by , ∀*j*=1,…,*p* and find for
4.23

The remaining integral can be performed explicitly again in the two limits: (i) the full-circle case *L*=2*π* so that *N*_{L}=*N* and (ii) the mesoscopic interval 1≪*N*_{L}≪*N*. In case (i), it is the familiar Dyson–Morris integral which gives *Γ*(1−*px*^{2})/[*Γ*(1−*x*^{2})]^{p} for 0<*x*^{2}<1/*p* and diverges otherwise (the same remark on the divergence of the moments of the partition function applies here). Collecting all factors, we get
4.24where the ‘typical’ value *μ*_{e}(*x*) for the measure is given by
4.25The non-trivial leading scaling with *N*^{−x2} reflects the multi-fractal-type structure of the measure of intervals supporting high values.

Note that the mean value of the measure is given by . It obviously stays finite for *x*→1, and a direct calculation shows such an expression is valid for any *x*>0, without restricting to *x*<1. However, when approaching *x*=1 the mean value is significantly larger than the typical value, and is dominated by rare fluctuations. Recall that with is conjectured to be the threshold of extreme values. This claim is consistent with the fact that at such a level *x* the measure supporting high values is of the order of *μ*_{e}(*x*)∼*N*^{−1}, that is comparable with the minimal scale of the problem (the typical separation between zeros) which simply means there are typically of order of one maxima above such a level.

In the whole interval 0<*x*<1, the probability density of *μ*≡*μ*_{N}(*x*) can be immediately recovered by noting that the moments of *ξ*=*μ*/*μ*_{e}(*x*) coincide with those of the random variable *Z*/*Z*_{e} distributed according to (3.10), with the obvious identification *x*^{2}→*β*^{2}. We conclude that the total relative length *μ*_{N≫1}(*x*) of the intervals supporting high values of characteristic polynomials is distributed according to the probability density given by (2.17). Such an expression should be valid for all *μ* as long as *μ*≪1 and when *μ*∼1 must have a sharp cut-off, as obviously the fraction of the total length cannot be larger than unity in any realization. Note, however, that although the above picture is in good qualitative agreement with numerical experiments performed for the circular-logarithmic model, it has so far not proved possible to achieve quantitative agreement with (2.17) for any realistic numerical simulation of 1/*f* noise (see [19]).

For the mesoscopic interval with *L*≪1, we can proceed similarly, and, again after expanding in (4.23) the integrand assuming *θ*_{j}≪1, connect to the corresponding moments of partition function (4.8), (4.9) with the obvious change *β*^{2}→*x*^{2}. In this way, we find that the random variable *μ*_{N}(*x*) is distributed as the product of two independent factors: , with the standard normal *u* and characterized by the integer moments
4.26Here the ‘typical’ value is related to the similar scale for the full-circle case (4.25): , so shares the same multi-fractal scaling of the length of the intervals supporting high values. To restore the probability density of the random variable , we define the generic moments for any complex *s* at fixed 0<*x*<1. An explicit expression for *M*_{x}(*s*) was found in [15,63], and is given by (2.19). This allows us to represent as a contour integral (2.18). The dominating features of that distribution, like the power-law tail at *ξ*≫1, are the same as for the full-circle case. Numerical verifications of these features are expected to be even more challenging, because by definition one is restricting attention to a small fraction of data that are themselves hard to obtain in quantities sufficient for statistical analysis.

## 5. Motivation for the predictions relating to the extreme value statistics of

The calculations outlined in the previous section are based on estimating the asymptotics of the moments of the characteristic polynomials, and using the high moments to determine the extreme values. The connections between these moments and those of the zeta function are now relatively well understood, at least conjecturally [7,9,10,23]. For example, to leading order as
5.1with , where *a*(*λ*) is defined by (2.26). This allows us to use the previous calculations to motivate the predictions for the extreme value statistics for the zeta function. The approach follows closely that already detailed in the random-matrix context, and so we limit ourselves to outlining the principal steps.

The analogue of the partition function defined by (1.9) is clearly
5.2and so the analogue of the moments of partition function (4.1) is given by
5.3Interchanging the *t*-integral with the *y*_{j}-integrals (using the fact that the premultiplying factor is slowly varying) produces an integrand of the form
5.4This is the analogue of the left-hand side of (4.2). A general expression for shifted moments of this kind was conjectured in [9,23]. This takes the form of a multiple integral; for example,
5.5where
5.6and *A*_{k} is another Euler product which is analytic in the regions in which we are interested. The leading-order asymptotics of such integrals were shown in [9,23,65,66] to take a form analogous to (4.4), but with an additional factor *a*(*β*)^{k}, where *a*(*β*) is defined by (2.26). The calculation then proceeds exactly as in the previous section; for example, the arithmetic factor multiples . In determining the extreme value statistics, however, the fact that the freezing transition occurs at *β*=1 and that *a*(1)=1 suggests that the arithmetical factor does not contribute at leading order. In the computation of the measure of large values this factor remains.

It should be emphasized that this calculation only concerns the leading-order asymptotics. It is known that the lower order terms contribute significantly to the moments and are necessary to model numerical computations. This is because the moments are long-range statistics and so are more sensitive to lower order corrections than local statistics. It would presumably be important to develop a method to calculate these lower order terms in order to develop a more accurate model for *p*(*x*) at a finite height up the critical line.

## Funding statements

Y.V.F. was supported by EPSRC grant EP/J002763/1 ‘Insights into Disordered Landscapes via Random Matrix Theory and Statistical Mechanics’. J.P.K. was supported by a grant from the Leverhulme Trust and by the Air Force Office of Scientific Research, Air Force Material Command, USAF, under grant no. FA8655-10-1-3088. The US Government is authorized to reproduce and distribute reprints for Governmental purpose notwithstanding any copyright notation thereon.

## Acknowledgements

We are most grateful to Dr Ghaith Hiary and Mr Timothée Wintz for collaboration and extremely helpful discussions at various stages of this research and for producing extensive numerical data supportive of our speculations.

## Appendix A. Small-distance behaviour of the two-point correlation function of

We have from the Euler product that
A1where the *p*-summation includes all prime numbers. According to the prime number theorem, the number *π*(*x*) of primes smaller than *x* grows as when . This can be interpreted as implying that the probability that a large integer *n* is prime is asymptotically (see [67] for an introduction). Moreover, sums over primes of the type that are dominated by large primes can be approximated by .

Substituting (6.1) into the two-point correlation function we obtain
A2Further rewriting the product of cosine factors as
we see that the second term will always give rise to rapid oscillations with *t*, whereas the first term does the same except when . We conclude that the dominating contributions come from those non-oscillating terms in the sum: *p*_{1}=*p*_{2}=*p*;*n*_{1}=*n*_{2}=*n*, resulting in the following ‘diagonal approximation’ for the two-point correlation function:
A3Using 1/*n*^{2}=1/*n*+(1−*n*)/*n*^{2} and denoting *x*=|*x*_{1}−*x*_{2}|, relation (6.3) can be rewritten as
A4Now note that for any *n*≥2 the sums are convergent, as the integrals are convergent. This fact implies that the second sum in (6.4) has a finite value when *x*→0. As to the first sum, invoking (6.1) one observes that it can be related exactly to the Riemann zeta function *ζ*(*s*) along the line *s*=1+i*x*, that is . Recall finally that *ζ*(1+i*x*)|_{x→0}≈1/i*x*, implying in the limit *x*→0 the relation
A5Of course, the logarithmic form of correlations cannot hold for arbitrary small *x*=|*x*_{1}−*x*_{2}|; must be obviously finite. As is well known (e.g. [68]) the diagonal approximation works only for *p*<*t*/2*π*, and breaks down for larger primes. Taking this into account, one can show that (6.5) holds as long as , whereas for .

The fact that has a Gaussian distribution follows from a similar analysis of the higher moments. Essentially, the diagonal terms dominate, and so the prime sums representing the higher moments reduce to powers of the second moment in the same way as those of a Gaussian.

## Footnotes

One contribution of 13 to a Theo Murphy Meeting Issue ‘Complex patterns in wave functions: drums, graphs and disorder’.

↵1 As will be discussed in §3, the typical size of the maximum of

*M*independent and identically distributed normal, mean-zero random variables*V*_{i}with variance behaves asymptotically like . The estimate for the maximum modulus of the*ζ*-function follows from considering samples drawn independently from the Gaussian distribution in (1.2) with variance .↵2 As was recently observed in [18] the probability density (2.9) in fact corresponds to the sum of two independent Gumbel-distributed variables.

↵3 It is possible to give a bona fide mathematical construction of the continuous 2dGFF, see [41], but for our goals it suffices to rely upon this heuristic approach.

↵5 See, however, the investigation of the transformation properties of the quadratic forms associated with (3.5) in [55,56]. Those objects turned out to be intimately related to the representation theory of the group

*SL*(2,*R*) and the Virasoro algebra. It would be natural to expect that such an invariance may play an important role in building a mathematically rigorous theory of path integrals of the type (3.6).↵6 Note that precisely the same process and the associated Boltzmann measure were found recently to play an important role in constructing two-dimensional closed conformally invariant random curves [50,51].

- © 2013 The Author(s) Published by the Royal Society. All rights reserved.