## Abstract

In this paper, we propose a time-frequency analysis method to obtain instantaneous frequencies and the corresponding decomposition by solving an optimization problem. In this optimization problem, the basis that is used to decompose the signal is not known *a priori*. Instead, it is adapted to the signal and is determined as part of the optimization problem. In this sense, this optimization problem can be seen as a dictionary adaptation problem, in which the dictionary is adaptive to one signal rather than a training set in dictionary learning. This dictionary adaptation problem is solved by using the augmented Lagrangian multiplier (ALM) method iteratively. We further accelerate the ALM method in each iteration by using the fast wavelet transform. We apply our method to decompose several signals, including signals with poor scale separation, signals with outliers and polluted by noise and a real signal. The results show that this method can give accurate recovery of both the instantaneous frequencies and the intrinsic mode functions.

## 1. Introduction

Nowadays we must process a massive amount of data in our daily life and scientific research. Data analysis methods have played an important role in processing and analysing these data. Most data analysis methods use a predetermined basis, including the most commonly used Fourier transform and wavelet transform. While these data analysis methods are very efficient in processing data, each component of the decomposition in general does not reveal the intrinsic physical information of these data due to the presence of harmonics in the decomposition. For example, application of these traditional data analysis methods to a modulated oscillatory chirp signal would produce many components. Thus, it is essential to develop a truly adaptive data analysis method that can extract hidden physical information such as trend and time varying cycles from these data and preserve the integrity of the physically meaningful components. To achieve this, we need to use a data-driven basis that is adapted to the signal instead of being determined *a priori*.

The empirical mode decomposition (EMD) method, which was proposed by Prof. Norden E. Huang in 1998 [1,2], provides an efficient adaptive method to analyse nonlinear and non-stationary data. In the EMD method, the signal is decomposed to several intrinsic mode functions (IMFs) by the so-called sifting process. The EMD method is empirical in nature. Several methods with more clear mathematical structures have been proposed as a variant of the EMD method (for example the synchrosqueezed wavelet transform [3], empirical wavelet transform [4] and variational mode decomposition [5]). Inspired by the EMD method and the recently developed compressed (compressive) sensing theory, we have proposed a data-driven time-frequency analysis method [6–8]. In this method, we formulate the problem as a nonlinear optimization problem and decompose the signal by looking for the sparsest representation of a multiscale signal over a dictionary consisting of all IMFs.

In our data-driven time-frequency analysis, we decompose a given signal *f*(*t*) into the following form:
1.1where *a*_{j}(*t*), *θ*_{j}(*t*) are smooth functions, *θ*_{j}′(*t*)>0, *j*=1,…,*M*, and *M* is an integer that is unknown *a priori*. We assume that *a*_{j}(*t*) and *θ*_{j}′(*t*) are less oscillatory than . We call the intrinsic mode functions (IMFs) and *θ*′_{j}(*t*), *j*=1,…,*M* the instantaneous frequencies [1]. The objective of our data-driven time-frequency analysis is to extract the IMFs and their instantaneous frequencies.

In [7], we proposed to decompose the signal by solving the following optimization problem:
1.2where is the dictionary consisting of all IMFs (see [7] for its precise definition). To simplify the notation, when no ambiguity arises, we drop the argument *t* from the functions *a*_{k}(*t*) and *θ*_{k}(*t*) and in the rest of this paper, we will use this notation to make the formula more concise. Further, an efficient algorithm based on matching pursuit and fast Fourier transform has been proposed to solve the above nonlinear optimization problem. In a subsequent paper [9], we proved the convergence of the algorithm in [7] for periodic data that satisfy a certain scale separation property.

In this paper, we will introduce another formulation to get the sparsest time-frequency decomposition based on dictionary adaptation. In this formulation, we assume that we have already known the number of IMFs, *M*. Then, we can obtain the desirable decomposition by solving the following optimization problem:
1.3where
1.4and *Φ*_{θj}, *j*=1,…,*M* is the basis to decompose the signal *f*. The specific form of the basis as a function of *θ*_{j} will be given in (2.3). And we use bold font to denote the vectors (lowercase) or matrices (uppercase).

In (1.3), the basis *Φ*_{θ1,…,θM} is not known *a priori*. It is determined by the phase functions *θ*_{j}, *j*=1,…,*M* and the phase functions are adaptive to the data. We need to solve not only the optimal coefficients **x** but also the optimal basis *Φ*_{θ1,…,θM}. In this sense, the problem described above seems to be a dictionary learning problem. Many efficient dictionary learning methods have been proposed [10–14]. But the problem we study here has some important differences from the dictionary learning problem. In a dictionary learning problem, a common set-up starts with a training set, a collection of training vectors. The atoms of the dictionary can be any function. For the problem we consider here, we only have one signal not a training set. Moreover, the atoms of the dictionary in our problem are given in the form of . If we do not put any constraint on the atoms of the dictionary, we may get only trivial decompositions. In order to get a reasonable result, the atoms of the dictionary are restricted to be nonlinear functionals of the phase function *θ*_{j}. At the same time, the phase functions are confined to a low dimensional space to make sure that the overall degree of freedom is not too large. As we will demonstrate later, we can still get a reasonable decomposition with only one measurement of the signal. To distinguish our problem from the dictionary learning problem, we call it dictionary adaptation.

We need to develop a new method to solve this non-convex optimization problem. The key part is to find the phase functions. Once the phase functions are known, we need to solve a *l*_{1} optimization problem to get the decomposition. Based on this observation, we develop an iterative algorithm to solve (1.3). This algorithm starts from one initial guess of the phase functions. In each step, the phase functions are fixed and one *l*_{1} optimization problem is solved. The phase functions will be updated in the next step. This iteration is repeated until the changing of the phase function between two consecutive iterations is smaller than a given tolerance. In each step, the augmented Lagrangian multiplier (ALM) method is used to solve the *l*_{1} optimization problem. We further accelerate the ALM method in each iteration by using the fast wavelet transform. This method can be also generalized to decompose signals that contain outliers by enlarging the dictionary to include impulses.

We will demonstrate the effectiveness of our method by applying it to decompose several multiscale data, including synthetic data and real data. For the data that we consider in this paper, we will demonstrate that we can recover both the instantaneous frequencies and the IMFs very accurately. Even for those signals that are polluted by noise, we still can approximate the instantaneous frequencies and the IMFs with reasonable accuracy comparable to the noise level.

The remaining of the paper is organized as follows. In §2, we give the formulation of our problem. In §3, an iterative algorithm is introduced to solve the nonlinear optimization problem (1.3). An accelerating procedure is introduced based on the fast wavelet transform in §4. In §5, we generalize this method to deal with signals that have outliers. In §6, several numerical results are presented to demonstrate the effectiveness of our method. Finally, some concluding remarks are made in §7.

## 2. Formulation based on dictionary adaptation

In this section, we will set up the framework of the data-driven time-frequency decomposition. First, we construct a dictionary to represent the signal which satisfies the model (1.1).

In this paper, we construct the dictionary following an idea similar to that in [7]. However, we will use the wavelet basis instead of the overcomplete Fourier basis used in [7]. We choose the wavelet basis because there are fast decomposition and reconstruction algorithms. This feature makes our algorithm very efficient. Another advantage is that the wavelet basis can handle non-periodic functions much better.

Let be a multi-resolution approximation of and *φ* the associated scaling function, *ψ* the corresponding wavelet function. Assume that *φ* is real and has compact support, , where is the Fourier transform of *φ* defined below,

For each 1≤*j*≤*M*, we assume that *θ*_{j}′>0. Since *θ*_{j} is a strictly monotonically increasing function, there is a one-to-one mapping between the physical time *t* and *θ*_{j}(*t*). Thus, we can use *θ*_{j} as a new coordinate and represent the signal as a function of *θ*_{j}. Then we can define the following wavelet basis in the *θ*_{j}-coordinate:
2.1where *l*_{0} is a positive integer associated with the lowest frequency of the envelope. In practice, *l*_{0} is usually determined by the time range of the signal. And
2.2In this paper, we use the Meyer wavelet to construct the basis.

Then the corresponding IMF, , can be represented using the following basis: 2.3

This, in turn, implies that the whole signal *f*(*t*) given in ((1.1)) lies in the space spanned by the following basis:
2.4We assume that the envelope is sufficiently smooth, . Thus, *a*_{j} has a sparse representation over the wavelet basis . Consequently, the signal *f*(*t*) would have a sparse representation over . Based on this observation, we propose to decompose the signal *f*(*t*) in ((1.1)) by looking for the sparsest representation over :
2.5where *c*_{j}, *j*=1,2,… are the coefficients of the signal *f*(*t*) in the dictionary .

In this problem, not only the coefficients are unknown, but the basis over which the signal has a sparse representation is also unknown *a priori*. The basis is determined by the phase functions *θ*_{1},…,*θ*_{M}. Both the coefficients *c*_{j} and the phase functions need to be determined in the optimization process.

The above formulation is presented in the continuous version. Next, we will introduce a discrete version. Suppose the signal *f*(*t*) is sampled over *N* discrete points, *t*_{j}=*jh*, *j*=1,…,*N*, *h*=*T*/*N* and [0,*T*] is the sample interval. Here, we assume that the sample points are dense enough such that the signal can be interpolated to any other grids with high accuracy. And we use the bold font to denote the corresponding discrete samples of the continuous function, for instance, **f**=[*f*(*t*_{1}),…,*f*(*t*_{N})]. Then the discrete version of (2.5) is given as
2.6where *Φ*_{θ1,…,θM}=[*Φ*_{θ1},…,*Φ*_{θM}] and *Φ*_{θj}, *j*=1,…,*M* are matrices corresponding to the dictionaries , *j*=1,…,*M*, i.e. each column of *Φ*_{θj} is discrete samples of basis function in . Here, we denote **x**=[**x**_{1},…,**x**_{M}]^{T} and **x**_{j}, *j*=1,…,*M* are the corresponding coefficients of over *Φ*_{θj}. Moreover, we define
2.7To simplify the notation, we introduce an operator ★, which is defined as follows: for any ,
2.8

## 3. An algorithm based on the Augmented Lagrangian Multiplier method

Inspired by the algorithm in [7], we propose an iterative algorithm, algorithm 1, to solve the nonlinear *l*_{1} optimization problem (2.6). Algorithm 1 is essentially based on the Gauss–Newton iteration which is derived by performing a local linearization around the phase functions in the current step. Suppose *θ**(*t*) is the exact phase function and *θ*^{n}(*t*) is a good approximation. By keeping only the linear terms in the Taylor expansion, we obtain the following linearization around *θ*^{n}(*t*),
where Δ*θ*^{n}(*t*)=*θ**(*t*)−*θ*^{n}(*t*). Both the matrix in (3.1) and **b**_{j} in (3.2) come from this linearization procedure. The coefficients, **b**_{j}, are just auxiliary functions in the iteration and will tend to zero as the iteration converges.

In some sense, algorithm 1 is similar in spirit to the K-SVD algorithm [10]. The step to solve the *l*_{1} optimization problem is similar to the sparse coding stage in the K-SVD method and the step to update the phase functions is similar to the codebook update stage. The main difference is in the codebook update stage. In our problem, the atoms of the dictionary are not arbitrary functions. They are parametrized by the phase functions. For this reason, we cannot use singular value decomposition (SVD) to update the dictionary directly. Instead, we use the Gauss–Newton algorithm to update the phase functions, which in turn updates the dictionary accordingly.

In the step to update the phase functions, we choose to update the instantaneous frequency (derivative of the phase function) instead of updating the phase function directly. This is because could be discontinuous if we do not add or subtract 2*π* at some particular points. Here, *β*_{j} at step 5 is used to make sure that the instantaneous frequency being positive during the iteration to make sure that the phase function remains monotonically increasing. In other words, we update *θ*_{j} along the direction of Δ*θ*_{j} under the constraint that *θ*_{j}′>0.

The optimization problem we want to solve is non-convex. There are many local minima. If the initial guess is not good enough, the iteration may converge to a local minimum instead of the global minimum. However, we cannot guarantee to get a good initial guess unless we have obtained some *a priori* information about the signal. To abate the dependence on the initial guess, we introduce a projection operator and a parameter *η*. The value of *η* is gradually decreased during the iteration. When *η* is large, Δ** θ**′ is confined to a small space. In this small space, the objective functional has fewer local minima. The iteration may find a good approximation for Δ

**′. By gradually decreasing**

*θ**η*, we enlarge the space for Δ

**′ which allows for the small-scale structures of Δ**

*θ***′ to develop gradually during the iterations. This procedure mimics a continuation method and works very well in practice.**

*θ*In the numerical examples presented in §6, we obtain the initial guess using a simple approach based on the Fourier transform. More precisely, we obtain the initial guess by estimating the frequencies by which the Fourier coefficients have the largest energy.

In algorithm 1, the most expensive part is to solve the *l*_{1} optimization problem (3.1) at step 3. In this paper, we use an ALM algorithm [15] to solve (3.1). In order to simplify the notations, we denote
3.3where

The ALM method operates on the augmented Lagrangian
3.4A generic Lagrange multiplier algorithm [15] would solve (3.1) by repeatedly setting **p**^{k+1}=argmin_{p} *L*(**p**,**q**^{k}), and then updating the Lagrange multiplier via .

In this iteration, solving is also very time consuming. Note that the matrix *Θ*_{θ1,…,θM}=[*Θ*_{θ1},…,*Θ*_{θM}] is the combination of *M* matrices with smaller size. It is natural to use the following sweeping algorithm, algorithm 2, to solve iteratively.

Theoretically, we need to run the above sweeping process several times until the solution converges, but in practical computations, in order to save the computational cost, we only run the sweeping process once. Combining this idea with the augmented Lagrange multiplier method, we obtain algorithm 3 to solve the *l*_{1} optimization problem (3.1):

## 4. A fast algorithm based on the discrete wavelet transform

In this section, we propose an approximate solver to accelerate the computation of , which is the most expensive step in algorithm 1.

In this paper, we only consider the signal which is well resolved by the samples and the samples are uniformly distributed in time and the total number of the samples is *N*. Based on these assumptions, we can approximate the continuous integral by the discrete summation and the integration error is negligible. This greatly simplifies our calculations.

Now, we turn to simplify the optimization problem. First, we replace the standard *L*^{2} norm by a weighted *L*^{2} norm which gives the following approximation:
4.1where , *θ*_{j}′ is the derivative of *θ*_{j} which is computed by a central difference scheme, **g**(*t*_{i}) is the *i*th element of vector **g**.

Using the fact that , it is easy to check that the columns of the matrix *Θ*_{θ} are orthonormal under the weighted discrete inner product
4.2Using this property, it is easy to derive the following equality:
4.3We can also define the projection operator *P*_{V (θ)} to the *V* (** θ**) space. Here,

*V*(

**) is the linear space spanned by the columns of the matrix**

*θ**Θ*

_{θ}and

*P*

_{V (θ)}is the projection operator to

*V*(

**). Since the columns of the matrix**

*θ**Θ*

_{θ}are orthonormal under the weighted inner product (4.2), the projection

*P*

_{V (θ)}can be calculated as follows: 4.4Now, we are ready to show that the optimization problem (4.1) can be solved explicitly by the shrinkage operator. To simplify the notation, we denote , 4.5where is the shrinkage operator defined below: 4.6

Note that the matrix vector product in (4.5) has the following structure by the definition of *Θ*_{θj} in (3.1):
This is nothing but the wavelet transform of and in the *θ*_{j}-coordinate, since the columns of *Π*_{θj} are standard wavelet basis in the *θ*_{j}-coordinate. Then this product can be computed efficiently by interpolating and to the uniform grid in the *θ*_{j} coordinate and employing the fast wavelet transform.

Summarizing the above discussion, we obtain algorithm 4 based on the fast wavelet transform to solve the optimization problem (3.1).

### Remark 4.1

The final algorithm described above is based on employing the soft shrinkage operator iteratively. Essentially, it is a proximal splitting algorithm which shares many common ideas with some existing *l*_{1} optimization methods, including the split Bregman method [16], proximity algorithms [17] and proximal splitting methods [18]. Inspired by these methods, we adopt the splitting technique to decompose the original problem to some easier subproblems and solve the original problem by solving these subproblems iteratively. In each step, we need to compute a proximity operator. The main novelty of our algorithm is in the computation of the proximity operator. Here, we use the special structure of the matrix *Θ*_{θj}, i.e. the orthonormality of the columns in the *θ*_{j}-coordinate such that the proximity operator is just a simple soft shrinkage operator. This observation gives rise to a much more efficient algorithm which is designed specifically for our optimization problem. For our particular application, our method is much more efficient than the current *l*_{1} optimization algorithms.

## 5. Generalization for signals with outliers

One advantage of the formulation (2.6) is that it can be generalized to deal with more complicated data with some minor modifications. In this section, we will give one generalization for signals with outliers. For outliers, we mean that at some sample points, the error of the measurement is large. In this paper, we assume that the number of outliers is small and distributed over the whole interval at random.

In order to deal with this kind of signal, we have to enlarge the dictionary since the outliers are not sparse over the time-frequency dictionary. Under the assumption that the number of outliers is small, we know that the outliers are sparse over the basis consisting of the impulses *δ*[*n*−*i*], *i*=1,…,*N*, where *N* is the number of samples and
5.1If we enlarge the dictionary to include all the impulses, then the generalized formulation can be used to decompose signals with outliers.

More specifically, in this case, the optimization problem is formulated in the following way:
5.2where *Φ*_{θ1,…,θM} is given in (2.4).

Using a linearization procedure similar to that of algorithm 1, we obtain algorithm 5 for the above optimization problem (5.2). Moreover, the *l*_{1} optimization problem in the above iterative algorithm can be solved by a sweeping ALM method, algorithm 6. The computation of can be accelerated by algorithm 4 in the previous section.

## 6. Numerical results

In this section, we present several numerical results to demonstrate the effectiveness of our time-frequency analysis methods. In our previous paper [7], we have performed extensive numerical experiments to demonstrate the effectiveness of our data-driven time-frequency analysis method for signals with good scale separations. To save space, we will not consider the examples with good scale separation in this paper and consider more challenging signals that do not have good scale separation property.

### Example 6.1

In the first example, the signal contains two different components whose instantaneous frequencies intersect with each other. More specifically, the signal is generated by the formula below:
6.1where the phase function *θ*_{1},*θ*_{2} are given as follows:
and *X*(*t*) is white noise with zero mean and variance *σ*^{2}=1. The signal is sampled over 1024 grid points which are uniformly distributed over the interval [0,1]. The original signal is shown in figure 1.

For this signal, the classical time-frequency analysis methods, such as the windowed Fourier transform, the wavelet transform give a poor result near the intersection where the two instantaneous frequencies cross each other. The EMD method and the data-driven time-frequency analysis method introduced in our previous paper [7] also have problem near the intersection.

The result given by the data-driven time-frequency analysis method proposed in this paper is shown in figure 1. In the computation, the initial guesses of the instantaneous frequencies are chosen to be 128*πt* and 32*πt*, respectively, which are far from the ground truth. As we can see, even with these rough initial guesses, our iterative algorithm still can recover the instantaneous frequencies and the corresponding IMFs with reasonable accuracy, although the signal is polluted by noise. We note that the end-effect is more pronounced in this case due to the noise pollution.

### Example 6.2

The next signal that we consider is polluted by outliers. We generate the signal by using the following formula:
6.2where *θ*_{1}(*t*) and *θ*_{2}(*t*) are the same as in example 6.1. The signal is sampled over 1024 uniform grid points. Among these samples, there are 32 samples that are outliers. The locations of these outliers are selected randomly and the strengths satisfy the normal distribution. In figure 2, we present the results for the case without noise (*σ*(*t*)=0) by using the algorithm given in §5. As we can see, both the IMFs and the outliers are captured very accurately.

We also test the signal with outliers and noise. The results are shown in figure 3. In this example, the noise and the outliers are added to the original signal together. The signal is sampled over 1024 uniform grid points. Among these samples, there are 32 samples that are outliers. The locations of these outliers are selected randomly and the strengths satisfy the normal distribution whose standard deviation is 1. The noise is Gaussian noise and the standard deviation is 0.1.

In this case, the instantaneous frequencies and the IMFs are still quite accurate. But the outliers are not captured as well as in the case with no noise. We would like to emphasize that it would be hard to distinguish the outliers from the noise when the amplitude of the outliers is small. For the outliers whose amplitude is large, they can be separated from the noise by a proper shrinkage operator. However, the shrinkage operator also kills the outliers whose amplitude is comparable to or smaller than the noise level.

### Example 6.3

In our third example, we consider a real signal, a bat chirp signal. It is the digitized echolocation pulse emitted by the Large Brown Bat, Eptesicus Fuscus.^{1} The signal includes 400 samples and the sampling period is 7 μs, so the total time span is 2.8 ms.

The signal is shown in figure 4*a*. The IMFs and instantaneous frequencies obtained by our method are given in figure 4*b*,*c*. Our method could give precise instantaneous frequencies and also the sparse decomposition of the original signal. From figure 4, we can see that the signal is approximated very well by only three IMFs. Near the boundaries, the original signal and the IMFs are all very small. In these regions, the frequencies actually do not have any physical meaning. They are only auxiliary variables in the algorithm. In the region in which the amplitude of the signal is order one, the recovered instantaneous frequencies reveal some interesting patterns that have not been seen before using traditional time-frequency methods. The physical significance of these patterns need to be further investigated in the future.

### Example 6.4

In the last example, we consider the data from an ODE system. We consider a multiple degree of freedom system.
6.3where *K* is an *N*×*N* symmetric positive definite stiffness matrix and **u** is an *N*×1 vector, which typically represents the displacement of certain engineering structure. This kind of ODE system is widely used to model the movements of structures [19], such as buildings, bridges, etc. In many applications, we want to recover the stiffness matrix *K* (at least part of it) from incomplete measurement of the solution **u**.

Here, we assume that *K*(*t*) is slowly varying. Under this assumption, the solutions, *u*_{j}(*t*),*j*=1,…,*N*, have the following approximate expression:
6.4and (*θ*_{k}′(*t*))^{2},*k*=1,…,*N* are eigenvalues of *K*(*t*). Using this formulation, we can see that the instantaneous frequencies could help us to retrieve the stiffness matrix.

As a test example, we consider a simple case where the degree of freedom is given by *N*=2. The *K* matrix has the following form:
6.5The initial conditions are
This system models the movement of two objects of equal masses connected by springs. And *k*_{1},*k*_{2},*k*_{3} are stiffness values of springs.

The above system is solved up to *t*=9. The initial guesses of the phase functions are 20*t* and 40*t*. Here, we only analyse the first component of the solution *u*_{1}(*t*), which is shown in the figure 5*a*. According to (6.4), the theoretical frequencies are and . In figure 5*c*, we compare the theoretical frequencies and the numerical results given by our method. As we can see, they match very well even near the intersection point. This toy example shows that our method indeed has the capability to retrieve some information of the physical process hidden within the signal. Now we are trying to apply our method to analyse the signals from real bridges.

## 7. Concluding remarks

In this paper, we have introduced a novel formulation to obtain sparse time-frequency decomposition and the corresponding instantaneous frequencies. We formulated the decomposition as a dictionary adaptation problem. The dictionary is parametrized by phase functions and the phase functions are determined by the signal itself. Based on our previous work and the methods of dictionary learning, we developed an iterative algorithm to determine the phase functions and the corresponding decomposition. By designing the dictionary carefully to make them orthogonal in the coordinate of the phase functions, we can accelerate the algorithm by using the fast wavelet transform. This makes our algorithm very efficient.

Another advantage of this method is that it can be easily generalized to deal with more complicated data that are not sparse over the time-frequency dictionary, such as data with outliers. For this kind of signal, we just need to enlarge the dictionary and follow a similar procedure to look for the sparsest decomposition over this enlarged dictionary. We presented several numerical examples to demonstrate the effectiveness of our method, including data that do not have scale separation and data that are polluted by noise and/or outliers. The results that we obtained seem to suggest that our method can offer an effective way to decompose multiscale data even with a poor scale separation property.

We remark that decomposing several IMFs simultaneously increases the complexity of the optimization problem. As a result, the robustness of this new method is not as good as the previous one that we introduced in [7]. Generally speaking, the more IMFs we try to decompose simultaneously, the less robust the method becomes. In the numerical tests, we found that a naive application of the method is not very stable if *M*>3. This difficulty can be alleviated by knowing some information about the signal. To make our method more robust, we are now considering combining our method with other time-frequency analysis methods, such as synchrosqueezed wavelet transform [3], in our future work.

Another interesting problem that we are considering is to decompose data with intra-wave frequency modulation. This type of data is known to be very challenging. Naive application of traditional data analysis methods tends to introduce artificial harmonics. To deal with this kind of data, we have to extend the definition of the dictionary by replacing the cosine function by an unknown periodic shape function that we need to find as part of the optimization problem. This work will be reported in our future paper.

## Data accessibility

Data in examples 6.1, 6.2 and 6.4 are synthetic data. They can be reproduced following the descriptions in the paper. Data used in example 6.3 can be downloaded at http://www.ece.rice.edu/dsp/software/bat.shtml.

## Authors' contributions

Both authors, T.Y.H. and Z.S., worked together to propose the proper formulation for this problem. Z.S. carried out most of the work in the design and the implementation of the numerical algorithms. He also drafted the manuscript. T.Y.H. revised the manuscript and made sure that we have done a thorough study to demonstrate the accuracy and robustness of the proposed method. Both authors gave final approval for publication.

## Competing interests

We declare we have no competing interests.

## Funding

This research was supported in part by NSF grants nos. DMS-318377 and DMS-1159138, DOE grant no. DE-FG02-06ER25727, and AFOSR MURI grant no. FA9550-09-1-0613. The research of Z.S. was supported by a NSFC grant no. 11201257.

## Footnotes

One contribution of 13 to a theme issue ‘Adaptive data analysis: theory and applications’.

↵1 The authors wish to thank Curtis Condon, Ken White and Al Feng of the Beckman Institute of the University of Illinois for the bat data and for permission to use it in this paper.

- Accepted September 29, 2015.

- © 2016 The Author(s)

Published by the Royal Society. All rights reserved.