## Abstract

A new method is proposed to determine the time–frequency content of time-dependent signals consisting of multiple oscillatory components, with time-varying amplitudes and instantaneous frequencies. Numerical experiments as well as a theoretical analysis are presented to assess its effectiveness.

## 1. Introduction

Oscillatory signals occur in a wide range of fields, including geophysics, biology, medicine, finance and social dynamics. They often consist of several different oscillatory components, the nature, time-varying behaviour and interaction of which reflect properties of the underlying system. In general, we want to assess the number, strength and rate of oscillation of the different components constituting the signal, to separate noise from signal, and to isolate individual components; efficient and robust extraction of this information from an observed signal will help us better describe and quantify the underlying dynamics that govern the system. For each of the quantities of interest listed, we thus want an estimator that is consistent, that has (ideally) small variance and that produces results robust to different types of noise.

If the observed signal *f* can be written as a finite sum of the so-called harmonic components, i.e. , where *a*_{ℓ}>0 (respectively, *ξ*_{ℓ}>0) represents the strength or amplitude (respectively, frequency) of the ℓth component, then one can recover the *a*_{ℓ} and *ξ*_{ℓ} from time samples of *f*(*t*) via the Fourier transform of *f*, defined by . (If the *ξ*_{ℓ} are all integer multiples of a common 1/*t*_{0}, then the integral can be taken over an interval of length *t*_{0}; when this is not the case, one can resort to integrals over long time intervals and average. Typically, only discrete samples , are known, rather than the continuous-time course , and the integrals are estimated by quadrature.) However, oscillatory signals of interest often have more complex behaviour. We shall be interested in particular in signals that are still the combination of ‘elementary’ oscillations, but in which both the amplitude and the frequency of the components are no longer constant; they can be written as
1.1where , *A*_{k}(*t*)>0 and *φ*′_{k}(*t*)>0 for all *k*, but *A*_{k}(*t*) and *φ*′_{k}(*t*) are not constants. One can compute the Fourier transform of such signals, and recover *f* from (this can be validly done for a much wider class of functions), but it is now less straightforward to determine the time-varying amplitudes *A*_{k}(*t*) and the so-called ‘instantaneous frequencies’ *φ*′_{k}(*t*) from . Although the time-local behaviour of the oscillations, and their deviation from perfect periodicity, cannot be captured by the Fourier transform in an easily ‘readable’ way, an accurate description of this instantaneous behaviour is nevertheless important in many applications, both to understand the system producing the signal and to predict its future behaviour. Examples in the medical field include studies of the circadian [1,2] and cortical rhythms [3], or of heart rate [4,5] and respiratory variability [6,7], all widely studied to understand physiology and predict clinical outcomes.

The last 50 years have seen many approaches, in applied harmonic analysis and signal processing, to develop useful analysis tools for signals of this type; this is the domain of time–frequency (TF) analysis. Several algorithms and associated theories have been developed and widely applied (see e.g. the overview [8]); well-known examples include the short-time Fourier transform (STFT), continuous wavelet transform (CWT) and Wigner–Ville distribution (WVD). The main idea is often to ‘localize’ a portion of the signal in time, and then ‘measure’ the oscillatory behaviour of this portion. For example, given a function *f*∈*L*^{2}, the *windowed transform* or STFT associated with a window function *h*(*t*) can be defined as
1.2where is the time, is the frequency and *h* is the window function chosen by the user—a commonly used choice is the Gaussian function with kernel bandwidth *σ*>0, i.e. *h*(*t*)=(2*πσ*)^{−1/2} e^{−t2/(2σ2)}. (The overall phase factor e^{−i2πηt} is not always present in the STFT, leading to the name *modified short-time Fourier transform* (mSTFT) for this particular form in [9].)

Other, more specialized methods, targeting in particular signals of type (1.1), include the empirical mode decomposition [10], ensemble empirical mode decomposition [11], the sparsity approach [12], iterative convolution filtering [13,14], the approximation approach [15], non-local mean approach [16], time-varying autoregression and moving average approach [17] as well as the synchrosqueezing transforms (SSTs) introduced and studied by some of us [9,18–21].

All TF methods that target reasonably large classes of functions (as opposed to functions with such specific models that complete characterization requires only fitting a small number of parameters) must face the Heisenberg uncertainty principle, limiting how accurately oscillatory information can be captured over short time intervals; for toy signals specially designed to have precise TF properties (e.g. chirps), this typically expresses itself by a ‘blurring’ or ‘smearing out’ of their TF representation, regardless of the analysis tool used. *Reassignment methods* [22–24], introduced in 1978 and recently attracting more attention again, were proposed to analyse and possibly counter this. Their main idea is to analyse the local behaviour in the TF plane of portions of the representation, and determine nearby possible TF concentration candidates that best explain it; each small portion is then reallocated to its ‘right’ place in the TF plane, to obtain a more concentrated TF representation that, one hopes, gives a faithful and precise rendering of the TF properties of the signal. Reassignment methods can be applied to very general TF representations [8,23]; they can be adaptive as well [25]. It has been argued recently [16] that reassignment methods can be viewed as analogues of ‘non-local means’ techniques commonly applied in image processing; this provides an intuitive explanation for their robustness to noise.

The SST can be viewed as a special reassignment method [23–25]. In SST, the STFT or CWT coefficients are reassigned *only* in the frequency ‘direction’ [19,21,26]; this preserves causality, making it possible to reconstruct each component with real-time implementation [27]. The STFT-based SST of *f* is defined as
1.3where *g*_{α} is an ‘approximate *δ*-function’ (i.e. *g* is smooth and has fast decay, with , so that *g*_{α}(*t*):=(1/*α*)*g*(*t*/*α*) tends weakly to the delta measure *δ* as *α*→0), and with defined by
1.4The SST for CWT is defined similarly; see [19,28], or §2. SST was proposed originally for sound signals [18,29]; its theoretical properties have been studied extensively [19,26–28,30–32], including its stability to different types of noise [28,33]. Several variations of the SST have been proposed [31,34–37]; in particular, the SST approach can also be used for other TF representations, such as the wave packet transform [35] and S-transform [38], and it can be extended to two-dimensional signals (such as images) [39,40]. In addition, its practical usefulness has been demonstrated in a wide range of fields, including medicine [5–7,41–44], mechanics [37,45,46], finance [47,48], geography [38,49–51], denoising [31], atomic physics [52–54] and image analysis [55,56].

The SST approach can extract the instantaneous frequency and reconstruct the constitutional oscillatory components of a signal of type (1.1) in the presence of noise [28,33]. However, its performance suffers when the signal-to-noise ratio (SNR) becomes low: as the noise level increases, and even before it completely obscures the main concentration in the TF plane of the signal, spurious concentration areas appear elsewhere in the TF plane, caused by correlations introduced by the overcomplete STFT or CWT analysis tool. The effect of these misleading perturbations, which downgrade the quality of the results, can be countered, to some extent, by *multitapering*.

Multitapering is a technique originally proposed to reduce the variance and hence stabilize power spectrum estimation in the spectral analysis of stationary signals [57–59]. Sampling the signal during only a finite interval leads to artefacts, traditionally reduced by tapering; an unfortunate side effect of tapering is to diminish the impact of samples at the extremes of the time interval. Thomson [57] showed that one can nevertheless exploit optimally the information provided by the samples at the extremities, by using several orthonormal functions as tapers: the average of the corresponding power spectra is a good estimator with reduced variance. This technique has since been applied widely [58,60–63]. Multitapering was later extended to non-stationary TF analysis by combining it with reassignment [64–66]: a more robust ‘combined’ reassigned TF representation is obtained by picking orthonormal ‘windows’ (used to isolate portions of the TF representation when working with a reassignment method), and averaging the reassigned TF representations determined by each of the individual windows. Heuristically, the concentration for a ‘true’ constituting component of the signal will be in similar locations in the TF plane for each of the individual reassigned TF representations, whereas the spurious concentrations, artefacts of correlations between noise and the windowing function, tend not to be co-located and have a diminished impact when averaged. In the SST context, a similar multi-taper idea was used by one of us in a study of anaesthesia depth [5,44], in which *J* different window functions *h*_{j}, *j*=1,…,*J*, are considered, and the multitaper SST (MTSST) is computed as the average of the individual : . Using multiple tapers reduces artefacts, and the MTSST remains ‘readable’ at higher noise levels than a ‘simple’ SST [5,44]. To suppress noise artefacts better, it is tempting to consider increasing *J*. However, the area in the TF plane over which the signal TF information is ‘smeared out’ also increases (linearly) with *J*, and a balance needs to be observed; in the multitaper reassignment method of [64], for instance, six Hermite functions were used (i.e. *J*=6).

In this paper, we introduce a new approach to obtain better concentrated time–frequency representations, which we call *ConceFT*, for ‘concentration of frequency and time’. It is based on STFT- or CWT-based SSTs, but the approach could be applied to yet other TF decomposition tools. The ConceFT algorithm will be defined precisely below, in §2. Like MTSST, ConceFT starts from a multilayered time–frequency representation, but instead of averaging the SST results obtained from STFT or CWT for orthonormal windows, which can be viewed as elements in a vector space of time–frequency functions, it considers many different projections in this same vector space, and averages the corresponding SSTs; for more details, see §2; figures illustrating the result of STFT-based ConceFT can be found in the Introduction section of the electronic supplementary material. Section 3 studies the theoretical properties of ConceFT, and explains how it can provide reliable results under challenging SNR conditions; finally, in §4, we provide several numerical results.

## 2. The ConceFT algorithm

We start by briefly reviewing SST. In the Introduction, we defined STFT-based SST, discussed in more detail in [21,26]; to show that the situation is very similar for CWT-based SST, we discuss that case here; see [19,28] for details. We start with the wavelet *ψ* with respect to which the CWT will be computed, which must necessarily have mean zero; that is, ; let us also pick it to be a Schwartz function. We shall assume that we are dealing with real signals *f*; in this case, the symmetry in *ξ* of makes it possible to consider only the ‘positive-frequency part’ of *f*, by picking *ψ* so that its Fourier transform is supported on . (The approach can be extended easily to handle complex signals as well, but notation becomes a bit heavier.) Then the *continuous wavelet transform* of a tempered distribution *f*, with the variables *a*, *b* standing for scale and time location, is defined as the inner product of *f* with *ψ*^{(a,b)}(*t*)=|*a*|^{−1/2}*ψ*((*t*−*b*)/*a*). Even if the Fourier transform is very concentrated around some frequency *ω*_{0}, the magnitude of the CWT will be spread out over a range of scales *a*, corresponding to a neighbourhood of *ω*_{0}. However, the phase information of will still hold a ‘fingerprint’ of *ω*_{0} on that whole neighbourhood, in that will show oscillatory behaviour in *b*, with frequency *ω*_{0}, for a range of different *a*. This is the motivation for the SST, which shifts the CWT coefficients ‘back’, according to certain reassignment rules determined by the phase information. More concretely, we set a *threshold* *Γ*>0, and then define
and
where ∂_{b} is the partial derivative with respect to *b* (see the electronic supplementary material for a remark concerning robust numerical implementation). The hard threshold *Γ* can be adjusted for best reduction of the numerical error and noise influence. For example, when the noise level is known or estimated with reasonable accuracy, *Γ* can be chosen so as to reduce the influence of the noise most effectively; see [33] for a discussion. The CWT-based SST then moves the CWT coefficient to the ‘right’ frequency slot, using as guideline:
2.1where 0<*α*≪1 is chosen by the user, *g* is a smooth function so that *g*_{α}(⋅):=(1/*α*)*g*(⋅/*α*)→*δ* in the weak sense as *α*→0, and the factor *a*^{−3/2} is introduced to ensure that the integral of over *ξ* yields a close approximation to the original *f*(*b*). For more details, we refer the reader to [19,28].

Although both the CWT and its derived SST depend on the choice of the reference wavelet *ψ*, this is much less pronounced for the SST; CWT-based SST corresponding to different reference wavelets lead to different but very similar TF representations. (Theoretical reasons for this can be found in [19,28].) In particular, the dominant components in the TF representations are very similar. Moreover, even when the signal is contaminated by noise, these dominant components in the TF representations are not significantly disturbed [28]. However, the distribution of artefacts across the TF representation, induced by the noise, as seen in e.g. the middle left panel of figure S.1, vary from one reference wavelet to another; this can be intuitively explained by observing that the CWT is essentially a convolution with (scaled versions of) the reference wavelet, so that the wavelet transforms of independent and identically distributed (i.i.d.) noise based on different orthogonal reference wavelets are independent. These observations lead to the idea of a multitaper SST algorithm [5,44]. In brief, given *J* orthonormal reference wavelets *ψ*_{j}, *j*=1,…,*J*, one determines the reassignment rules , as well as the corresponding , and then defines the MTSST by
2.2This suggests that averaging over a large number of orthonormal reference wavelets would smooth out completely the TF artefacts induced by the noise, as originally discussed for the reassignment method [64]. However, in order for reassignment to make sense, the reference function, whether it is the window *h* for STFT or the wavelet *ψ* for CWT, must itself be fairly well concentrated in time and frequency, so that inner products with modulated window functions or scaled wavelets do not mix up different components and behaviours of the signal. On the other hand, there is a limit to how many orthonormal functions can be ‘mostly’ supported in a concentrated region in the TF plane—by a rule of thumb generalizing the Nyquist sampling density one can find, for a region in the TF plane, only orthonormal functions that are mostly concentrated on [67]. This limits how many different orthonormal *ψ*_{j} can be used in MTSST.

ConceFT uses the different TF ‘views’ provided by the CWT transforms in a different way, exploiting the *nonlinearity* of the SST operation. (See the electronic supplementary material for a sketch of an alternative way in which one could extend multitaper CWT; not pursued in this paper, however.) For each choice of *ψ*, the collection of CWT , where *f* ranges over the class of signals of interest, span a subspace of , the space of all reasonably smooth functions of the two variables *a*, *b*. Different orthonormal *ψ*_{j} generate different subspaces in ; combined, they generate a larger subspace, in which one can define an infinite number of ‘sections’, each corresponding to the collection of CWT generated by one reference wavelet. Each linear combination of the *ψ*_{j} defines such a CWT space, in which one can carry out the corresponding SST operation. For , where , one has ; because synchrosqueezing is a highly nonlinear operation, the corresponding are however not linear combinations of the . In practice, the artificial concentrations in the TF plane, triggered by fortuitous correlations between the noise and the (overcomplete) *ψ*^{(a,b)}, occur at locations sufficiently different, for different choices of the vector ** r**=(

*r*

_{1},…,

*r*

_{J}), that averaging over many choices of

**successfully suppresses noise artefacts.**

*r*More precisely, the CWT-based ConceFT algorithm proceeds as follows:

— Take

*J*orthonormal reference wavelets,*ψ*_{1},…,*ψ*_{J}, in the Schwartz space, with good concentration in the TF plane.— Pick

*N*random vectors*r*_{n},*n*=1,…,*N*, of unit norm, in ; that is, uniformly select*N*samples in*S*^{J−1}.— For each

*n*=1,…,*N*, define , and .— Select the threshold

*Γ*>0 and the approximation parameter*α*>0, and evaluate, for each*n*between 1 and*N*, the corresponding CWT-based SST of*f*by computing the reassignment rule , and hence , as defined above, with the minor adjustment that when the expression in the reassignment rule denominator has a negative real part, we switch to the vector −*r*_{n}.— The final ConceFT representation of

*f*is then the average 2.3In practice,*J*could be as small as 2, while*N*could be chosen as large as the user wishes.

The square of the magnitude of , , can be of interest in its own right, as an estimated *time-varying power spectrum* (tvPS) of *f*.

STFT-based ConceFT representations are defined entirely analogously, based on the STFT reassignment rule given in §1.

## 3. Theoretical results

In this section, we list and explain theoretical results about CWT-based ConceFT. The detailed mathematical computations and proofs can be found in the electronic supplementary material. Entirely similar results hold for STFT-based ConceFT; as they are established by the same arguments, we skip those details. We start by recalling the structure of our signal space, as introduced in [19,28]. We emphasize that this is, to a large extent, a purely phenomenological model, constructed so as to reflect the fairly (but not exactly) periodic nature of many signals of interest, in particular (but not only) those of a physiological origin (see the discussion in [6]).

A single-component or *intrinsic-mode type* (IMT) function has the following form:
3.1where both the amplitude modulation *A*(*t*) and the phase function *φ*(*t*) are reasonably smooth; in addition, both *A*(*t*) and the derivative *φ*′(*t*) (or the ‘instantaneous frequency’) are strictly positive at all times as well as bounded; finally, we assume that *A* and *φ*′ vary in time at rates that are slow compared with the instantaneous frequency of *F* itself. For the precise mathematical formulation of these conditions, we refer to the electronic supplementary material; this precise formulation invokes a few parameters, one of which, *ϵ*, bounds the ratio of the rate of change of *A* and *φ*′. This parameter will play a role in our estimates below. (Although we are assuming the signal to be real-valued here, all this can easily be adapted to the complex case by replacing the cosine with the corresponding complex exponential; the discussion in the remainder of this section can be adapted similarly.) We also consider signals that contain several IMT components, that is, functions of the type
3.2where each *F*_{ℓ} is an IMT function, and we assume in addition that the instantaneous frequencies *φ*_{ℓ}′(*t*) are ordered (higher ℓ corresponding to larger *φ*_{ℓ}′) and well separated,
3.3for all ℓ=1,…,*L*−1, for some *d* with 0<*d*<1. We denote by the set of all such functions *G*; it provides a flexible *adaptive harmonic model* space for a wide class of signals of interest. (Strictly speaking, they are not ‘truly’ harmonic, if harmonicity is interpreted—as it often is—as ‘having components with frequencies that are integer multiples of a fundamental frequency’.)

Next, we turn to the noise model for which we prove our main theoretical result. For the purposes of this theoretical discussion, we use a simple additive Gaussian white noise (even though the approach works for much more challenging noise models as well—see the electronic supplementary material). That is, we consider our noisy signals to be of the form
3.4where is in , *Φ* is a Gaussian white noise with standard deviation 1 and *σ*>0 is the noise level. *Y* is a generalized random process, since by definition *G* is a tempered distribution. We could extend this, introducing also the trend and a more general noise model as in [28], the wave-shape function used in [30], or the generalized IMT functions that model oscillatory signals with fast varying instantaneous frequency of [68]. Since none of these generalizations would significantly affect the mathematical analysis, we restrict ourselves to the model (3.4).

Finally, we describe the wavelets *ψ*_{1},…,*ψ*_{J} with respect to which we compute the CWT of *Y* . For the sake of convenience of the theoretical analysis, we assume that they are smooth functions with fast decay, and that their Fourier transforms are all real functions with compact support, , where 0<Δ_{j}<1. We also assume that the *ψ*_{1},…,*ψ*_{J} form an orthonormal set, that is, , where *δ*_{i,j} is the Kronecker delta. To build appropriate linear combinations of the *ψ*_{j}, we define, for any unit-norm vector ** r**=(

*r*

_{1},…,

*r*

_{J}) in , the corresponding combination as . It is convenient to characterize intervals for the scale

*a*such that the support of overlaps

*φ*

_{ℓ}′(

*b*), where ; we thus introduce the notation . It then follows from the definition of the CWT as the inner product between the signal and scaled, translated versions of the wavelets that [19,28] 3.5where and

*ϵ*

_{j}is of order

*ϵ*for all

*j*=1,…,

*J*. Here

*ϵ*

_{j}depends on the first three absolute moments of

*ψ*

_{j}and

*ψ*′

_{j}and the model parameters. It follows that the CWT of

*Y*, with respect to

*ψ*

_{j}, is given by 3.6where is the indicator function of the set ; note that the

*ϵ*

_{j}(

*a*,

*b*) term, again of order

*ϵ*, need not be the same as before. As shorthand notations, we will use bold symbols to regroup quantities indexed by

*j*=1,…,

*J*into one

*J*-dimensional vector, e.g. (which has norm of order

*ϵ*), (a complex Gaussian random vector [69], with mean , and covariance as well as relation matrix equal to

*I*

_{J×J}—see the electronic supplementary material), , and . Finally, or, more explicitly, 3.7Under the general assumptions for our model, a similar calculation as for (3.7) leads to where is again a

*J*-dimensional vector of order

*ϵ*, and , where is again a complex Gaussian random vector. The scalar products and are independent complex Gaussian random variables, with mean 0 and variance ∥

**∥**

*r*^{2}and , respectively. (See the electronic supplementary material.) Set now . Then it follows that for

*a*∈

*Z*

_{ℓ}(

*b*), we get the following reassignment for the CWT : 3.8which is a ratio random variable of two dependent complex Gaussian random variables with non-zero means. Next, we consider, for each fixed realization of the random noise, the unit-norm vector as a random vector, picked uniformly from . Restricting the choice of

**to the subset of**

*r**S*

^{J−1}for which the inner product of

**and**

*r*

*W*^{ψ}

_{Y}(

*a*,

*b*) has magnitude larger than 2

*κ*reflects the threshold used in the SST algorithm (see §2); restricting

**so that the inner product has positive real part means that we sample**

*r***from a half-sphere rather than the whole sphere. (See the electronic supplementary material for more details.)**

*r*Assuming that the bound on the noise is such that , the expectation of over *S*_{κ} is then given by
3.9where , denotes ‘taking the component along’ a vector ** v**, that is, , and

*E*

_{1}is bounded by 3.10Furthermore, the variance is bounded by 3.11A detailed derivation, and an explicit expression for the constant

*c*, is given in the electronic supplementary material; if

*J*becomes large, we have . The quantity or its absolute value occur in several of these estimates. Estimates in the electronic supplementary material show that, although ∥

**∥**

*V*^{2}itself is expected to be of order , the expectation of is bounded above independently of

*J*. This does not explain our observation that ConceFT estimates seem to be more concentrated that those obtained by SST or MTSST; we will return to this in further work.

The detailed estimates given in section ESM-3 in the electronic supplementary material are derived under the restrictive conditions listed at the start of this section for the signals and the wavelets used. However, as noted above, these conditions can be relaxed significantly (at the price of more intricate estimates). In practice, we observe similar behaviour in our numerical examples even for more complex situations; in particular, the method can handle noise models that are much more challenging, as illustrated in the next section as well as by figure S.1.

## 4. Numerical experiments

In this section, we demonstrate the results of the CWT-based ConceFT algorithm on examples; we also discuss different choices for some of the different parameters involved. The results of the STFT-based ConceFT algorithm are shown in the electronic supplementary material. The ConceFT Matlab code and the codes leading to the figures in this paper can be found at https://sites.google.com/site/hautiengwu/home/download.

The first choice to be made, when applying CWT- or STFT-based ConceFT, concerns the family of orthonormal reference functions (wavelets or window functions) for the underlying wavelet or windowed Fourier transform. In both cases, we pick a family of eigenfunctions for a time–frequency localized operator designed for the CWT or STFT framework; as shown in [70,71] these can provide ‘optimal’ localization within a restricted region of time–frequency space, where the size of the region depends solely on the number of functions used. More precisely, we use orthonormal Hermite functions for the STFT case [64,70] (see also figure S.1), and Morse wavelets for the CWT case [71,72]. In both cases, the shape of the localization domain in the TF plane is not completely fixed, but can be adjusted by varying some parameters; for details, see the electronic supplementary material. Once the family of orthonormal reference functions is fixed, we need to decide how many *ψ*_{j}, *j*=1,…,*J*, we pick; this corresponds to choosing the size of the corresponding domain of concentration in the TF plane. Flexibility in the choices of shape and size of the TF localization domain makes it possible to adapt ConceFT, to some extent, to the family of signals under consideration. Finally, ConceFT also depends on the number *N* of random projections chosen (see §§2 and 3). In principle, the larger *N*, the closer the results are to the expected value of the random process, and the more we expect accidental correlations between reference function and the noise to cancel out in regions of the TF plane where the signal does not reside; in practice, increasing *N* beyond a certain value does not appreciably improve the results. In what follows, we explore these different choices for the CWT case, on a simple family of challenging examples, with noise of different types (white Gaussian, Poisson and autoregressive moving-average (ARMA)), and of different strengths.

For our test data, we restrict ourselves to simulated signals only, so as to be able to quantify the deviation from the ‘ground truth’, usually not available in real-life applications. (ConceFT results on concrete signals will appear elsewhere [73].) On the other hand, we want to avoid parametric models, so as to be sufficiently general. Accordingly, we generate a class of non-stationary data via a random process described in §4a; each realization provides us not only with a (simulated) clean signal, but also with the exact ‘ground truth’ for the time-dependent instantaneous frequency and amplitude of the components of that signal. The same subsection also describes in detail three different noise models (white Gaussian, Poisson and ARMA(1,1)) for which the approach is tested. After applying ConceFT to signals in , we want to compare the ConceFT results with the optimal, ground-truth TF representation; to quantify their (dis)similarity, we use an optimal transport (OT) distance, as described in §4b. In §4c, we discuss how choices of the parameters and of the number of orthogonal Morse wavelets impact the ConceFT results, for this family of examples; §4d illustrates the effect of the number *N* of random projections. Finally, in §4e, we explore the effect on CWT-based ConceFT of different noise levels, for each of the three noise types we consider; §4f briefly discusses the STFT case.

### (a) Data simulation

To generate a typical multicomponent signal, we use smoothened Brownian path realizations to model the non-constant amplitudes and the instantaneous frequencies of the components; more precisely, if *W* is the standard Brownian motion defined on , then we define the *smoothened Brownian motion with bandwidth* *B*>0 as *Φ*_{B}:=*W*★*K*_{B}, where *K*_{B} is the Gaussian function with standard deviation *B*>0 and ★ denotes the convolution operator. Given *T*>0 and parameters *ζ*_{1},…,*ζ*_{6}>0, we then define the following family of random processes on [0,*T*]:
4.1For the amplitude *A*(*t*) of each IMT, we set *ζ*_{2}=*ζ*_{5}=0; every realization then varies smoothly between *ζ*_{1} and *ζ*_{1}+*ζ*_{3}. In the examples shown below and in the electronic supplementary material, the signal consists of two components (i.e. *L*=2) on [0,60]; their two amplitudes are independent realizations of *Ψ*_{[2,0,1,200,0,0}(*t*)]. To simulate a phase function, we set *ζ*_{1}=*ζ*_{3}=0; *Ψ*_{[0,ζ2,0,0,ζ5,ζ6]}(*t*) is then, appropriately, a monotonically increasing process. In the examples we consider, we take for *φ*_{1}(*t*) a realization of *Ψ*_{[0,10,0,0,6,400]}(*t*) for *t*∈[0,60], and for *φ*_{2}(*t*) a realization of *Ψ*_{[0,2π,0,0,2,300]}(*t*). We constrain each component to ‘live’ on only part of the interval, by setting
where *χ*_{[τ1,τ2]} is the indicator function of [*τ*_{1},*τ*_{2}]; that is, *χ*_{[τ1,τ2]}(*t*)=1 if *τ*_{1}≤*t*≤*τ*_{2}, *χ*_{[τ1,τ2]}(*t*)=0 otherwise. We shall denote the resulting class of two-component signals by . In our examples, signals in are sampled uniformly at rate 160 Hz, corresponding to 9600 samples. Figure 1 plots *s*(*t*) for one example , as well as the instantaneous frequencies (IFs) of its two components, all restricted to the subinterval [15,40]⊂[0,60].

Note that the signal *s* should not be viewed as a random process itself—we use the random processes *Ψ*_{[ζ1,…,ζ6]} as a means to generate signals consisting of several components for which the amplitudes and instantaneous frequencies are not easily expressed analytically, but we will not consider or compute expectations with respect to these processes—once is generated, we consider it fixed when we apply ConceFT to it. (In further subsections, we shall encounter other elements of .)

To study the performance of ConceFT in the presence of noise, we add noise to *s*(*t*), setting *Y* (*t*_{k})=*s*(*t*_{k})+*σξ*(*t*_{k}), where *t*_{k} is the *k*th sampling time and *ξ* is a stationary random process. We shall consider three different noise models; in each case, we set the value of *σ* so that the SNR, defined as , equals 0 dB. The three noise models we consider are Gaussian white noise, an ARMA noise and Poisson noise. For the ARMA case, we consider an ARMA(1,1) model determined by autoregression polynomial *a*(*z*)=0.5*z*+1 and moving-averaging polynomial *b*(*z*)=−0.5*z*+1; for the innovation process we use i.i.d. Student *t*_{4} random variables. (Note that this ARMA(1,1) noise is not white, because of the time dependence; in addition, the Student *t*_{4} random variable has a ‘fat-tailed’ distribution, resulting in possibly spiky realizations.) For the Poisson noise, we pick the *ξ*(*t*_{k}) to be independent and identically sampled from the Poisson distribution with parameter *λ*=1. Figure 2 plots a realization of *Y* (*t*)=*s*(*t*)+*σξ*(*t*) for each of these three noise processes, restricted to the subinterval [15,40].

### (b) Performance evaluation

To evaluate the performance of ConceFT, we propose comparing the time-varying power spectrum (tvPS, defined at the end of §2) of the results of the ConceFT analysis of *Y* with the ideal time-varying power spectrum (itvPS) of our simulated signal *s*, which can easily be defined explicitly (because our construction was designed accordingly) as follows [68]:
4.2In order to quantify the (dis)similarity between the ConceFT-estimated tvPS and the itvPS P_{s}, we use the OT distance (also called the earth mover distance) between pairs of probability distributions, *μ* and *ν*, on . Denote (analogously for *f*_{ν}). Then the OT distance between *μ* and *ν* is defined as
4.3For a more general definition and discussion of the OT distance, see the electronic supplementary material. The principle of ConceFT is to ‘reassign’ content in the TF plane, keeping the time variable fixed (see §2); we therefore choose to keep *t* fixed for our OT distance. Because the positive functions and P_{s}(*t*,⋅) might not have integral 1 for all *t*, we normalize them before computing their OT distance. After normalization, we interpret, at each time *t*, and P_{s}(*t*,*ω*) as (probability) distributions in *ω* and compute the OT distance between them, which essentially measures how much one distribution needs to be ‘deformed’ in order to coincide with the other; this is repeated for all *t*, and the average of the *t*-dependent individual OT distances then indicates the quality of the estimator for P_{s}. Figure 3 displays four examples in which two delta measures localized on curves in the TF plane (similar to the itvPS defined above) lie at similar OT distances of each other—although in each example the distance indicates a different type of ‘distortion’. Together, these examples give an intuitive understanding of the way in which OT distances capture the difference between the TF distributions of interest to us here.

### (c) Parameter selection

As described in [72], generalizing the construction in [71], orthonormal families of Morse wavelets can be defined for different values of two parameters, *β* and *γ*; different choices correspond to different shapes of the domain in the TF plane on which they are mostly localized (see the electronic supplementary material). Once the values of *β* and *γ* are chosen, determining the family of *ψ*_{j}, one also needs to select *J*, the total number of orthonormal reference wavelets used in the ConceFT method. For signals in (see §4*a*), we explored systematically a range of (*β*,*γ*) pairs, as well as different values of *J*, to find the choice that, under different types of noise, with SNR of 0 dB, gave rise to the smallest OT-based distance (as described above) between the itvPS and the ConceFT-estimated tvPS. Surprisingly, the optimal choice depended very little on the type of noise; the optimal values we found are *β*=30, *γ*=9 and *J*=2. (Detailed results are given in the electronic supplementary material.)

### (d) Effect of the number of random projections

The ConceFT algorithm averages the SST results computed with *N* randomly picked reference wavelets (or windows, for STFT) from the linear span of the *ψ*_{j}, *j*=1,…,*J*. It is expected that the concentration in the TF plane observed with ConceFT kicks in only when *N* is sufficiently large; on the other hand, the larger *N*, the more expensive the computation. To explore the trade-off, we applied ConceFT to the three noisy versions of the signal (see §4a), with *N* ranging from 1 to 200. In all cases, the ConceFT algorithm uses the optimal parameters as described in §4c, i.e. it uses the first two Morse wavelets with parameters *β*=30, *γ*=9. In this simulation, each ConceFT computation was repeated 300 times and the mean and standard deviation of the OT distances of the ConceFT tvPS to the itvPS were computed. Figure 4 plots the results. For each of the three noise types, the graph of the average OT distance shows an ‘elbow’ shape, i.e. a regime in which the decrease is faster, as *N* increases, followed by one in which the decrease is less marked. The elbow is located around *N*=20; the standard deviation is also quite small for this *N*. We accordingly decided to set *N*=20 in our further experiments.

### (e) ConceFT results for noisy signals

We now show the result of using ConceFT with the calibrated parameter choices. We illustrate the performance of ConceFT on signals of the simulation class (see §4a), for a range of SNR, as well as on deterministic signals.

As a warm-up, we start with the signal *s* seen before. The top row of figure 5 plots the tvPS of the three noisy versions of *s* next to the itvPS P_{s}. To compress the dynamical range of the tvPS plots, we carry out the following procedure. We first normalize the discretized version of (where *m* and *n* stand for the number of discrete frequencies and the number of time samples, respectively) by multiplying it by a constant so that the total weight of all entries equals the same number for all cases—i.e. for some *θ*>0 (to be picked—see below), . We then plot a grey-scale visualization of rather than the (normalized) itself, where , *k*=1,…,*m*, *l*=1,…,*n*, and *q* is a (very high) cut-off to downplay the effect of far-off outliers. We choose *q* to be the same for all three tvPS, so that comparable grey levels on the different tvPS panels indicate comparable values of ** R** (see §4f in the electronic supplementary material, for a more extensive discussion of choosing

*q*and grey-scale plotting of tvPS). For the figures, we choose

*θ*=5 and

*q*=5.718; this value for

*q*is the minimum of the 99.8% quantiles of the different tvPSs. The second row of figure 5 gives the results for

*s**, a signal of the simulation class that was not used (in contrast to

*s*) to calibrate parameters of ConceFT. The results are similarly highly accurate.

Next, we study the effect on the ConceFT performance of the noise level, as quantified by SNR. To this end, we revisit the analysis of the signal *s** (and *s* in the electronic supplementary material). For each signal, each type of noise (Gaussian, ARMA(1,1) or Poisson) and each SNR (SNR=*x* dB, where *x*∈{−7,−6,…,6,7}), we considered 20 independent realizations of the noise process; for each of the resulting noisy signals we carried out the ConceFT analysis and computed the OT distance of the tvPS to the itvPS of the clean signal; we then computed the mean and the standard deviation for each. The results are shown in figure 6. The same figure also compares the ConceFT results with those of simple SST (using either the first Morse wavelet with parameters *β*=30, *γ*=9 as reference wavelet, or one random linear combination of the two first Morse wavelets) and of multitaper SST (denoted as orgMT), using the same *ψ*_{j} as ConceFT. For each of these alternative methods, we likewise computed the mean OT distance of the tvPS to the itvPS for 20 noise realizations. It is striking that the ConceFT method outperforms the other methods in all cases.

Finally, to address possible concerns that the randomness in the generation and plots of *φ*′(*t*) and *A*(*t*) somehow ‘help’ ConceFT in these estimations, we show in figure 7 the results for yet another signal, which (in contrast to *s* and *s**) is completely deterministic; it consists of three components, each given by an explicit, analytic formula (again for *t*∈[0,60]):
Figure 7 shows that the results are of a quality similar to those in figure 5.

### (f) ConceFT with short-time Fourier transform

As described earlier, the ConceFT approach can be carried out for STFT-based SST as well as for CWT-based SST. Figure S.1 already showed the results of STFT ConceFT on one example. Other examples are shown in the electronic supplementary material, together with values of the OT distance of the STFT ConceFT estimated tvPS to the itvPS, and other discussions.

## 5. Conclusion

We consider signals that are the linear combination of a small number of ‘intrinsic-mode functions’, each of which can be reasonably viewed as an oscillatory function with well-defined but time-varying amplitude and ‘instantaneous frequency’. We have introduced a new approach, called ConceFT, to determine the time–frequency representation of such signals, combining multi-taper estimation ideas and averaging over random projections with synchrosqueezing. Numerical results show that this leads to improved estimation of the time-varying characteristics of the signals of interest, even when the signals are corrupted by significant and challenging noise.

We also introduced two tools to evaluate the effectiveness of this method (or other similar methods), which may be of interest in their own right to others working in the TF field. On the one hand, we introduced a class of explicit, easy-to-construct signals with explicit time-varying characteristics, even though the signals themselves are not given by explicit formulae; the explicit time-varying amplitude and instantaneous frequency give a ‘ground truth’ with which estimations can be compared. On the other hand, we introduced a distance between time–frequency representations that can be useful in comparing results obtained by different methods, by computing for each the distance to the ‘ground-truth’ time–frequency representation.

## Authors' contributions

I.D., Y.W. and H.-T.W. contributed equally to the theoretical and numerical analysis of the whole paper. All authors gave final approval for publication.

## Competing interests

The authors declare that they have no competing interests.

## Funding

H.-T.W.’s work is partially supported by Sloan Research Fellow FR-2015-65363.

## Acknowledgements

Part of this work was done during H.-T.W.’s visit to the National Center for Theoretical Sciences, Taiwan, and he would like to thank NCTS for its hospitality. The authors thank the referees for pushing us to make the paper clearer. The authors would like to thank Dr. Tingran Guo for the optimal transport code.

## Footnotes

One contribution of 13 to a theme issue ‘Adaptive data analysis: theory and applications’.

- Accepted December 22, 2015.

- © 2016 The Author(s)

Published by the Royal Society. All rights reserved.