## Abstract

This paper presents a unified mathematical derivation of the asymptotic behaviour of the three main forms of partial directed coherence (PDC). Numerical examples are used to contrast PDC, *g*PDC (generalized PDC) and *i*PDC (information PDC) as to meaning and applicability and, more importantly, to show their essential statistical equivalence insofar as connectivity inference is concerned.

## 1. Introduction

*Partial directed coherence* (PDC), a multivariate time-series technique obtained from the factorization of partial coherence [1], has become popular for evaluating the connectivity between neural structures. In the frequency domain, it closely reflects the idea of Granger causality [2], i.e. a time series *x*(*k*) is Granger-caused by *y*(*k*) only if knowledge of *y*(*k*)'s past proves helpful in predicting *x*(*k*). It is, as such, an important measure, given that many neuroscience research scenarios, such as sleep staging [3], have long been linked to typical neuroelectric oscillatory behaviour.

PDC has been finding increasing applications [4–6], most of which have been carried out by comparing sample connectivities between groups classified as presenting some known disorder against normal controls. Only recently have objective trial-by-trial criteria appeared that allow the evaluation of PDC's estimator asymptotic properties. This is the case with Takahashi *et al*. [7], who confirmed a previously available connectivity hypothesis test [8] by the addition of hitherto unavailable PDC confidence intervals.

This scenario has recently been compounded by the introduction of other more general expressions for PDC, which can be more suitable under certain special circumstances. Generalized PDC (*g*PDC) [9], for example, is better suited to cases of large prediction errors or power spectral disparity between the multivariate channels. Its asymptotics have been presented without proof in reference [10].

Another, perhaps more relevant, form of PDC, termed information PDC (*i*PDC) has been introduced in references [11,12] in connection with a precise interpretation of PDC in terms of the mutual information between adequately partialized processes, thereby establishing it as a measure of direct connectivity strength.

This paper extends reference [7] to explicitly obtain the asymptotic behaviour of all PDC forms in a unified way. It starts by briefly reviewing PDC's various formulations (§2) and a statement of the asymptotic results (§3), which are numerically illustrated in §4. A final brief discussion appears in §5, leaving mathematical details to the appendix.

## 2. Background

Defining PDC and its variants calls for an adequately fitted multivariate autoregressive time-series (i.e. vector time-series) model of simultaneously acquired time series **x**(*n*) made up by *x*_{k}(*n*), *k*=1,…,*K*:
$$\mathbf{x}(n) = \sum_{l=1}^{p} \mathbf{A}(l)\,\mathbf{x}(n-l) + \mathbf{w}(n), \tag{2.1}$$

where $\mathbf{w}(n)$ is a zero-mean white innovations process with covariance matrix $\Sigma_w = [\sigma_{ij}]$ and $p$ is the model order. The $a_{ij}(l)$ coefficients from each $\mathbf{A}(l)$ matrix describe the lagged effect of the $j$th onto the $i$th series, wherefrom one can also define a frequency-domain representation of (2.1) via the matrix $\bar{\mathbf{A}}(\lambda)$ whose entries are given by

$$\bar{A}_{ij}(\lambda) = \delta_{ij} - \sum_{l=1}^{p} a_{ij}(l)\, e^{-\mathbf{i}2\pi\lambda l}, \tag{2.2}$$

where $\delta_{ij}$ is the Kronecker delta and $\mathbf{i} = \sqrt{-1}$.
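As a concrete numerical illustration of (2.2), the minimal sketch below builds $\bar{\mathbf{A}}(\lambda)$ from the stacked $\mathbf{A}(l)$ coefficient matrices; the function name and array layout are our own choices, not part of the paper or its toolbox.

```python
import numpy as np

def A_bar(A, lam):
    """Frequency-domain matrix of (2.2): I - sum_l A(l) exp(-i 2 pi lam l).

    A has shape (p, K, K), stacking A(1), ..., A(p); lam is the normalized
    frequency in [-0.5, 0.5].  (Illustrative helper, not the paper's code.)
    """
    p, K, _ = A.shape
    Abar = np.eye(K, dtype=complex)
    for l in range(1, p + 1):
        Abar -= A[l - 1] * np.exp(-1j * 2 * np.pi * lam * l)
    return Abar

# AR(1) sanity check: x(n) = 0.5 x(n-1) + w(n) gives A_bar(0) = 1 - 0.5 = 0.5
print(A_bar(np.array([[[0.5]]]), 0.0))
```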

Denote the columns of $\bar{\mathbf{A}}(\lambda)$ by $\bar{\mathbf{a}}_j(\lambda)$. Then, the general expression summarizing all the forms of PDC from $j$ to $i$ is

$$\pi_{ij}(\lambda) = \frac{s\,\bar{A}_{ij}(\lambda)}{\sqrt{\bar{\mathbf{a}}_j^{H}(\lambda)\,\mathbf{S}\,\bar{\mathbf{a}}_j(\lambda)}}, \tag{2.3}$$

where $^{H}$ denotes Hermitian transposition. Each PDC version uses its own $s$ and $\mathbf{S}$ in (2.3), as summed up in table 1.
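A one-frequency sketch of (2.3) follows. Since table 1 is not reproduced in this excerpt, the $(s, \mathbf{S})$ pairs below follow the published PDC, *g*PDC and *i*PDC definitions and are our reading of the table, not a verbatim copy of it.

```python
import numpy as np

def pdc(Abar, Sigma, i, j, form="pdc"):
    """pi_ij of (2.3): s * Abar[i, j] / sqrt(a_j^H S a_j) at one frequency.

    (s, S) pairs follow the published PDC/gPDC/iPDC definitions; they stand
    in for the paper's table 1, which is not reproduced in this excerpt.
    """
    aj = Abar[:, j]
    if form == "pdc":      # original PDC: s = 1, S = I
        s, S = 1.0, np.eye(len(aj))
    elif form == "gpdc":   # generalized PDC: innovation-variance weighting
        s, S = Sigma[i, i] ** -0.5, np.diag(1.0 / np.diag(Sigma))
    elif form == "ipdc":   # information PDC: full Sigma_w^{-1} weighting
        s, S = Sigma[i, i] ** -0.5, np.linalg.inv(Sigma)
    else:
        raise ValueError(form)
    return s * Abar[i, j] / np.sqrt((aj.conj() @ S @ aj).real)

# With Sigma_w = I all three forms coincide, and the PDC normalization makes
# |pi_ij|^2 sum to 1 over i for each source j.
Abar = np.array([[0.5, -0.3], [0.0, 1.0]], dtype=complex)
print(abs(pdc(Abar, np.eye(2), 0, 1, "pdc")) ** 2)
```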

### (a) Problem formulation

As in reference [7] for the original PDC, one must reparametrize (2.3) before applying the delta method [13], i.e. an appropriate Taylor expansion of (2.3) with respect to the $a_{ij}(r)$ coefficients and $\Sigma_w$, which are estimated when fitting (2.1) and whose dispersion depends on $n_s$, the number of available observations.

To do so, use the vec matrix-column-stacking operator on the model coefficients,

$$\boldsymbol{\alpha} = \mathrm{vec}\,[\mathbf{A}(1)\ \cdots\ \mathbf{A}(p)], \tag{2.4}$$

and on the residual noise covariance matrix, $\boldsymbol{\sigma} = \mathrm{vec}\,\Sigma_w$, to produce the parameter vector $\boldsymbol{\theta} = [\boldsymbol{\alpha}^{T}\ \boldsymbol{\sigma}^{T}]^{T}$.

The delta method [13] rests on the continuous mapping of the parameters onto the statistic of interest (PDC): it combines the asymptotic behaviour of the parameter distribution, in terms of the number of data points $n_s$, with a Taylor expansion of the mapping.

### Lemma 2.1

*If the vector time series* $\mathbf{x}(n)$ *is stationary zero-mean Gaussian, then the full model parameter vector* $\boldsymbol{\theta}$ *obeys* [14]

$$\sqrt{n_s}\,\bigl(\hat{\boldsymbol{\theta}} - \boldsymbol{\theta}\bigr) \xrightarrow{d} \mathcal{N}\bigl(\mathbf{0},\,\Omega_{\theta}\bigr), \tag{2.5}$$

where

$$\Omega_{\theta} = \begin{bmatrix} \Omega_{\alpha} & \mathbf{0} \\ \mathbf{0} & \Omega_{\sigma} \end{bmatrix}, \tag{2.6}$$

with $\Omega_{\alpha} = \Gamma_x^{-1} \otimes \Sigma_w$, where

$$\Gamma_x = E\bigl[\bar{\mathbf{x}}(n)\,\bar{\mathbf{x}}^{T}(n)\bigr] \quad\text{for}\quad \bar{\mathbf{x}}(n) = [\mathbf{x}^{T}(n)\ \cdots\ \mathbf{x}^{T}(n-p+1)]^{T} \tag{2.7}$$

and

$$\Omega_{\sigma} = 2\,D_K^{+}\,(\Sigma_w \otimes \Sigma_w)\,(D_K^{+})^{T}, \tag{2.8}$$

where $D_K^{+}$ is the Moore–Penrose pseudo-inverse of the standard duplication matrix $D_K$.

To simplify notation, except where essential for comprehension, *i*, *j* and *λ* are omitted as they are fixed with respect to (2.1) estimation.

Lemma 2.1 holds even if **x**(*n*) Gaussianity is relaxed [14]. In fact, finite auto- and cross-fourth-order moments of **w**_{i}(*n*) suffice with the proviso that model (2.1) adequately describes the problem.
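Lemma 2.1 can be checked numerically in the simplest $K=1$, $p=1$ case, where $\Omega_\alpha = \Gamma_x^{-1} \otimes \Sigma_w$ reduces to the scalar $1 - a^2$. A minimal Monte Carlo sketch (seed, sample sizes and the least-squares fit are our own choices):

```python
import numpy as np

# Monte Carlo check of lemma 2.1 for an AR(1): for x(n) = a x(n-1) + w(n),
# Gamma_x = sigma^2 / (1 - a^2), so Omega_alpha = Gamma_x^{-1} sigma^2 = 1 - a^2,
# i.e. sqrt(n_s) (a_hat - a) is asymptotically N(0, 1 - a^2).
rng = np.random.default_rng(0)
a_true, n_s, n_trials = 0.6, 2000, 200
est = []
for _ in range(n_trials):
    w = rng.standard_normal(n_s)
    x = np.zeros(n_s)
    for n in range(1, n_s):
        x[n] = a_true * x[n - 1] + w[n]
    est.append(x[:-1] @ x[1:] / (x[:-1] @ x[:-1]))  # least-squares AR(1) fit
est = np.array(est)
print(n_s * est.var(), 1 - a_true ** 2)  # both should be close to 0.64
```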

## 3. Result overview

Under lemma 2.1, confidence intervals and null-hypothesis thresholds for PDC (2.3) can be computed by splitting its dependence on the parameter vector $\boldsymbol{\theta}$ into numerator and denominator:

$$|\pi_{ij}(\lambda)|^2 = \frac{\pi_n(\boldsymbol{\theta}, \lambda)}{\pi_d(\boldsymbol{\theta}, \lambda)}. \tag{3.1}$$

The following results hold, thanks to the quadratic-form dependence of $\pi_n(\boldsymbol{\theta})$ and $\pi_d(\boldsymbol{\theta})$ on $\boldsymbol{\theta}$:

1. *Confidence intervals*. For large $n_s$, (3.1) is asymptotically normal, i.e.
$$\sqrt{n_s}\,\bigl(|\hat{\pi}_{ij}(\lambda)|^2 - |\pi_{ij}(\lambda)|^2\bigr) \xrightarrow{d} \mathcal{N}\bigl(0,\,\gamma^2(\lambda)\bigr), \tag{3.2}$$
where $\gamma^2(\lambda)$ is a frequency-dependent variance whose expression requires the notation introduced in the appendix.

2. *Null-hypothesis threshold*. Under the null hypothesis
$$H_0 : |\pi_{ij}(\lambda)|^2 = 0, \tag{3.3}$$
$\gamma^2(\lambda)$ vanishes identically and (3.2) no longer holds, requiring the next Taylor-expansion term [15] in the asymptotic expression of (3.1), which has an $n_s^{-1}$ dependence. The resulting distribution is that of a linear combination of at most two $\chi_1^2$ variables with frequency-dependent weights:
$$n_s\,\hat{\pi}_d(\lambda)\,|\hat{\pi}_{ij}(\lambda)|^2 \xrightarrow{d} \sum_{k=1}^{2} l_k(\lambda)\,\chi_1^2, \tag{3.4}$$
where the $l_k(\lambda)$ (details left to the appendix) depend only on PDC's numerator $\pi_n(\boldsymbol{\theta}, \lambda)$; PDC-denominator dependence is implicit on the left-hand side of (3.4).

The *χ*^{2} dependence comes from the quadratic behaviour of the ** θ** parameters, which are themselves asymptotically normal according to lemma 2.1.
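The weighted-$\chi_1^2$ null distribution in (3.4) has no standard closed form. A simple Monte Carlo sketch of its $(1-\alpha)$ quantile follows; the function, its defaults and the simulation approach are our own (the paper instead follows the method of Takahashi *et al*. [7]).

```python
import numpy as np

def chi2_mix_threshold(weights, alpha=0.05, n_draws=200_000, seed=1):
    """Monte Carlo (1 - alpha) quantile of sum_k l_k chi^2_1, cf. (3.4)."""
    rng = np.random.default_rng(seed)
    draws = rng.chisquare(1, (n_draws, len(weights))) @ np.asarray(weights)
    return np.quantile(draws, 1 - alpha)

# A single unit weight recovers the chi^2_1 95% point (about 3.84).
print(chi2_mix_threshold([1.0]))
```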

## 4. Numerical illustrations

### (a) Example 4.1: Sunspot × melanoma incidence data

Computing |PDC|^{2} for the relationship between sunspot activity and Connecticut melanoma incidence [16] over only 37 years leads to the upper panels of figure 1, whose striking feature is that the sunspot→melanoma PDC relationship counterintuitively turns out to be small as opposed to the melanoma→sunspot one. Despite its small size, closer examination reveals that |PDC|^{2} in the sunspot→melanoma direction is above threshold around the characteristic sunspot periodicity and that the reverse-direction |PDC|^{2}, though numerically high, is below the test threshold and is, hence, not significant.

Apparent interaction size discrepancies such as this were the primary reason for introducing *g*PDC [9], and later *i*PDC [11,12], shown in figure 1*b*, where the relative estimated sizes are more in accord with intuition and retain the correct directed inferential significance relationship. The *g*PDC results, which may be appreciated in reference [10], are similar to those for *i*PDC because *Σ*_{w} is diagonal.

The difference between PDC and *g*PDC/*i*PDC stems from the different units/magnitudes of the observed quantities (sunspot number versus melanoma cases per 100 000 inhabitants). The scale invariance of *g*PDC/*i*PDC solves this and dispenses with the common, often inexplicit, practice of data standardization, i.e. division of each original time series by its standard deviation (see also Baccalá *et al*. [9]).

Despite possible misleading impressions caused by the computed magnitude of the original PDC, it is important to keep in mind that connectivity inference using it is, nonetheless, correct under the appropriate statistical threshold.

Now consider the model

$$\begin{aligned}
x_1(n) &= 0.95\sqrt{2}\,x_1(n-1) - 0.9025\,x_1(n-2) + w_1(n),\\
x_2(n) &= 0.5\,x_1(n-2) + w_2(n),\\
x_3(n) &= -0.4\,x_1(n-3) + w_3(n),\\
x_4(n) &= -0.5\,x_1(n-2) + 0.25\sqrt{2}\,x_4(n-1) + 0.25\sqrt{2}\,x_5(n-1) + w_4(n),\\
x_5(n) &= -0.25\sqrt{2}\,x_4(n-1) + 0.25\sqrt{2}\,x_5(n-1) + w_5(n),
\end{aligned} \tag{4.1}$$

from reference [1] for the next two examples. In all cases, $n_s = 2000$ observations were used after a burn-in period of 1000 samples to ensure stationarity. The model order $p = 3$ was obtained via Akaike's information criterion.
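The simulation procedure just described (generate, discard a burn-in stretch, keep $n_s$ samples) can be sketched generically as below; the toy bivariate coefficients are an illustrative stable example of ours, not the paper's model (4.1), and the code is not AsympPDC code.

```python
import numpy as np

def simulate_var(A, Sigma, n_s, burn_in=1000, seed=2):
    """Draw n_s samples of x(n) = sum_l A(l) x(n-l) + w(n), w ~ N(0, Sigma),
    discarding burn_in initial samples so the output is near-stationary."""
    rng = np.random.default_rng(seed)
    p, K, _ = A.shape
    L = np.linalg.cholesky(Sigma)
    x = np.zeros((burn_in + n_s, K))
    for n in range(p, burn_in + n_s):
        x[n] = sum(A[l] @ x[n - l - 1] for l in range(p)) + L @ rng.standard_normal(K)
    return x[burn_in:]

# stable toy VAR(1) in which x1 drives x2 (illustrative, not model (4.1))
A = np.array([[[0.5, 0.0], [0.4, 0.3]]])
x = simulate_var(A, np.eye(2), n_s=2000)
print(x.shape)
```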

### (b) Example 4.2: Dependent inputs

Use *w*_{i}(*n*)=*e*_{i}(*n*)+*a*_{i}*e*_{6}(*n*) in (4.1) with mutually independent white unit-variance zero-mean Gaussian time series *e*_{i}(*n*), so that *e*_{6}(*n*) simultaneously affects all series and induces correlations representing *instantaneous Granger causality* [14], a situation suited to *i*PDC use. Different *a*_{i} produce final *x*_{i}(*n*) that differ not only in their instantaneous coupling but also in their variances. The |*i*PDC|^{2} results for a given trial (figure 2) display correctly detected connections.
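The shared-input construction can be sketched as follows; the $a_i$ values below are arbitrary placeholders of ours, not those behind the paper's figures.

```python
import numpy as np

rng = np.random.default_rng(3)
n_s = 10_000
a = np.array([0.5, 1.0, 0.2, 1.5, 0.8])   # placeholder a_i values
e = rng.standard_normal((n_s, 5))          # independent e_1(n), ..., e_5(n)
e6 = rng.standard_normal(n_s)              # common input e_6(n)
w = e + np.outer(e6, a)                    # w_i(n) = e_i(n) + a_i e_6(n)

Sigma_w = np.cov(w.T)
# Theoretical innovation covariance is I + a a^T: channel variances 1 + a_i^2
# and instantaneous cross-covariances a_i a_j, as claimed in the text.
print(Sigma_w[0, 1], a[0] * a[1])
```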

Figure 3 confirms the asymptotic statistical behaviour and is consistent with allied simulations in reference [7], for |PDC|^{2}, and in reference [10], for |*g*PDC|^{2}, using the same model.

### (c) Example 4.3: PDC performance under coloured inputs

Now use *w*_{i}(*n*)=*e*_{i}(*n*)+*a*_{i}*e*_{6}(*n*)+*b*_{i}*e*_{7}(*n*−1)+*c*_{i}*e*_{7}(*n*−2) in (4.1) (*e*_{7}(*n*) is also zero-mean white unit-variance, and independent of the other *e*_{i}(*n*)). This allows investigation of the example in reference [17] by taking *a*_{i} as uniform independent random variables in the [0,1] interval and *b*_{i}=2 and *c*_{i}=5, leading to system inputs that are no longer white.

Contrary to PDC failure claims [17], figures 4–6 show that all PDC forms successfully infer connectivity, which is further confirmed by the time domain Wald-type Granger causality [14] *p*-value tests listed in figure 4.

The first striking difference between the various |PDC|^{2} forms is their magnitude, and the respective confidence intervals scale accordingly.

Despite its small size, which decreases as the instantaneous input power is increased, |*i*PDC|^{2} still leads to correct connectivity detection. Note how the *p*-values in figure 4 decrease as |*i*PDC|^{2} approaches the thresholds.

## 5. Discussion

By establishing a common PDC asymptotics framework, this paper epitomizes the effort begun in reference [7] for the original PDC and leads to rigorous *g*PDC derivation details for illustrative simulations first shown in reference [10]. The present results also apply to the *i*PDC case [11,12].

Schelter *et al*.'s PDC-related work [18] falls outside this framework, and its statistical behaviour lies beyond the present scope.

The derivation's unified nature implies that the PDC forms addressed herein have similar statistical behaviour. Despite magnitude issues such as those of the original PDC, they all correctly detect directional connections, provided that proper null-hypothesis thresholds are used for each PDC form and, most importantly, that the fitted models are good, i.e. the observed time-series durations are compatible with the model orders and the fitted residuals are white. Naturally, because of their scale invariance, *g*PDC and *i*PDC should be preferred over PDC, especially now that they, too, have objective connectivity detection criteria.

The examples in this paper were chosen to illustrate some specific points. The sunspot/melanoma example had very few observations (merely *n*_{s}=37 data points), and the results were consistent even though sunspot behaviour hardly conforms to the rigorously required signal Gaussianity or to the desirable large-*n*_{s} limit. This is in line with the validity of lemma 2.1 in a more general setting.

Example 4.2 was the most canonical one in that all theoretical premises were imposed by construction, thus permitting full appreciation of the distribution quantiles. *i*PDC was stressed for its objective ability to address interaction size effects in the presence of instantaneous causality.

Example 4.3 involved imposing coloured inputs whose presence can introduce biases into the estimated models. Yet, even in this case, using model parameters borrowed from other published results led to correct connectivity inference.

Because the interval/threshold computations are intricate, their derivations were left to the appendix even though they constitute an important part of the present paper. Further illustrations of the examples are provided as electronic supplementary material, which also references the routines from the open-source AsympPDC package (available from http://www.lcs.poli.usp.br/~baccala/philosophical) used in producing the examples.

## Acknowledgements

Support from FAPESP (grant no. 2005/56464-9; CInAPCe Programme), CAPES Bioinformatics Graduate Programme (D.Y.T. and C.S.N.B.), FAPESP (grant no. 2008/08171-0; D.Y.T), CNPq (grant nos 306964/2006-6 and 304404/2009-8; L.A.B.) are gratefully acknowledged.

## Appendix A

In the following, function arguments are shown explicitly only when essential for comprehension and must otherwise be deduced from context.

Stating PDC in a general form requires preliminary definitions.

To include $\lambda$ dependence, and to avoid using complex numbers in the PDC definition, consider the change of variables from $\boldsymbol{\alpha}$ to $\mathbf{a}$,

$$\mathbf{a}(\lambda) = \begin{bmatrix} \mathrm{vec}\,\mathrm{Re}\,\bar{\mathbf{A}}(\lambda) \\ \mathrm{vec}\,\mathrm{Im}\,\bar{\mathbf{A}}(\lambda) \end{bmatrix} = \begin{bmatrix} \mathrm{vec}\,\mathbf{I}_K \\ \mathbf{0} \end{bmatrix} - \begin{bmatrix} \mathcal{C}(\lambda) \\ \mathcal{S}(\lambda) \end{bmatrix} \boldsymbol{\alpha}, \tag{A1}$$

where $\mathbf{I}_N$ denotes an $N \times N$ identity matrix and whose blocks $\mathcal{C}(\lambda)$ and $\mathcal{S}(\lambda)$ are $K^2 \times pK^2$-dimensional of the form

$$\mathcal{C}(\lambda) = [\cos(2\pi\lambda)\,\mathbf{I}_{K^2}\ \cdots\ \cos(2\pi\lambda p)\,\mathbf{I}_{K^2}] \quad\text{and}\quad \mathcal{S}(\lambda) = [-\sin(2\pi\lambda)\,\mathbf{I}_{K^2}\ \cdots\ -\sin(2\pi\lambda p)\,\mathbf{I}_{K^2}]$$

for $\lambda \in [-0.5, 0.5]$.

This reparametrizes the problem using $\pi(\mathbf{a}, \boldsymbol{\sigma})$, i.e.

$$|\pi_{ij}(\lambda)|^2 = \pi(\mathbf{a}, \boldsymbol{\sigma}) = \frac{\pi_n(\mathbf{a}, \boldsymbol{\sigma})}{\pi_d(\mathbf{a}, \boldsymbol{\sigma})}, \tag{A2}$$

with implicit $i$, $j$ and $\lambda$ dependence.

### Lemma A.1 (**a**-asymptotics)

The asymptotic properties of $\hat{\mathbf{a}}(\lambda)$ are given by

$$\sqrt{n_s}\,\bigl(\hat{\mathbf{a}}(\lambda) - \mathbf{a}(\lambda)\bigr) \xrightarrow{d} \mathcal{N}\bigl(\mathbf{0},\,\Omega_a(\lambda)\bigr), \tag{A3}$$

where

$$\Omega_a(\lambda) = \begin{bmatrix} \mathcal{C}(\lambda) \\ \mathcal{S}(\lambda) \end{bmatrix} \Omega_{\alpha} \begin{bmatrix} \mathcal{C}^{T}(\lambda) & \mathcal{S}^{T}(\lambda) \end{bmatrix} \tag{A4}$$

with

$$\Omega_a(\lambda) = \begin{bmatrix} \mathcal{C}\,\Omega_\alpha\,\mathcal{C}^{T} & \mathcal{C}\,\Omega_\alpha\,\mathcal{S}^{T} \\ \mathcal{S}\,\Omega_\alpha\,\mathcal{C}^{T} & \mathcal{S}\,\Omega_\alpha\,\mathcal{S}^{T} \end{bmatrix} \tag{A5}$$

for $\lambda \in [-0.5, 0.5]$.

The proof is straightforward by direct computation using (A1).

To usefully write (2.3) as the (A2) ratio, consider the $K^2 \times K^2$ matrices:

— $\mathbf{I}_{ij}$, whose entries are zero except for the index pair $(l,m) = ((j-1)K+i,\ (j-1)K+i)$, which equals 1; and

— $\mathbf{I}_{j}$, which is non-zero only for entries whose indices $(l,m)$ satisfy $(j-1)K+1 \le l = m \le jK$.

These matrices can be used to construct

$$\mathcal{I}_{ij} = \begin{bmatrix} \mathbf{I}_{ij} & \mathbf{0} \\ \mathbf{0} & \mathbf{I}_{ij} \end{bmatrix} \quad\text{and}\quad \mathcal{I}_{j} = \begin{bmatrix} \mathbf{I}_{j} & \mathbf{0} \\ \mathbf{0} & \mathbf{I}_{j} \end{bmatrix},$$

which are parameter-choice operators for the $j$ to $i$ pair of interest.

### Lemma A.2

The absolute squared value of (2.3) can be written as a ratio of quadratic forms:

$$|\pi_{ij}(\lambda)|^2 = \frac{\mathbf{a}^{T}\,\mathcal{I}_{ij}^{T}\,\mathbf{S}_n\,\mathcal{I}_{ij}\,\mathbf{a}}{\mathbf{a}^{T}\,\mathcal{I}_{j}^{T}\,\mathbf{S}_d\,\mathcal{I}_{j}\,\mathbf{a}}, \tag{A6}$$

where $\mathbf{S}_n$ and $\mathbf{S}_d$ do not depend on $\lambda$ and take on different values according to the desired PDC form, as listed in table 2.

The proof follows by direct substitution.

Thus,

$$\pi_n(\mathbf{a}, \boldsymbol{\sigma}) = \mathbf{a}^{T}\,\mathcal{I}_{ij}^{T}\,\mathbf{S}_n\,\mathcal{I}_{ij}\,\mathbf{a} \quad\text{and}\quad \pi_d(\mathbf{a}, \boldsymbol{\sigma}) = \mathbf{a}^{T}\,\mathcal{I}_{j}^{T}\,\mathbf{S}_d\,\mathcal{I}_{j}\,\mathbf{a} \tag{A7}$$

have naturally split factors in their dependence on $\mathbf{a}$ and $\boldsymbol{\sigma}$ (table 1).

### Theorem A.3 (delta method)

*Let* $\hat{\mathbf{v}}_n = (\hat{v}_1, \dots, \hat{v}_k)^{T}$, *estimated from* $n$ *observations, converge in distribution as*

$$\sqrt{n}\,\bigl(\hat{\mathbf{v}}_n - \boldsymbol{\mu}\bigr) \xrightarrow{d} \mathcal{N}\bigl(\mathbf{0},\,\Omega_v\bigr). \tag{A8}$$

*Let* $g(\mathbf{v})$ *be a real-valued function with continuous partial derivatives of order* $m > 1$ *in a neighbourhood of* $\mathbf{v} = \boldsymbol{\mu}$, *with all the partials of order* $j$, $1 \le j \le m-1$, *vanishing at* $\mathbf{v} = \boldsymbol{\mu}$ *and non-vanishing* $m$*th-order partials at* $\mathbf{v} = \boldsymbol{\mu}$. *Then*

$$n^{m/2}\,\bigl(g(\hat{\mathbf{v}}_n) - g(\boldsymbol{\mu})\bigr) \xrightarrow{d} \frac{1}{m!} \sum_{i_1=1}^{k} \cdots \sum_{i_m=1}^{k} \left.\frac{\partial^m g}{\partial v_{i_1} \cdots \partial v_{i_m}}\right|_{\boldsymbol{\mu}} Z_{i_1} \cdots Z_{i_m}, \tag{A9}$$

*with*

$$\mathbf{Z} = (Z_1, \dots, Z_k)^{T} \sim \mathcal{N}\bigl(\mathbf{0},\,\Omega_v\bigr). \tag{A10}$$

From this one can readily deduce the following for large $n$ and non-null first derivatives in (A9).

### Corollary A.4

For a real differentiable function $g(\mathbf{v})$ of $\hat{\mathbf{v}}_n$ asymptotically distributed as in (A8),

$$\sqrt{n}\,\bigl(g(\hat{\mathbf{v}}_n) - g(\boldsymbol{\mu})\bigr) \xrightarrow{d} \mathcal{N}\bigl(0,\ \mathbf{g}^{T}\,\Omega_v\,\mathbf{g}\bigr) \tag{A11}$$

is the first delta-method approximation, where $\mathbf{g} = \nabla_{\mathbf{v}}\,g$ is the gradient of $g(\mathbf{v})$ computed at $\mathbf{v} = \boldsymbol{\mu}$ [15].
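Corollary A.4 can be illustrated numerically with a scalar toy case of our own, $g(v) = v^2$ with $\bar{v}$ the mean of $n$ i.i.d. $\mathcal{N}(\mu, 1)$ draws: (A11) then predicts $n\,\mathrm{var}(g(\bar{v})) \to g'(\mu)^2 = (2\mu)^2$.

```python
import numpy as np

# First-order delta method check: v_bar is the mean of n i.i.d. N(mu, 1)
# draws, g(v) = v^2, so (A11) gives n * var(g(v_bar)) -> (2 mu)^2.
rng = np.random.default_rng(4)
mu, n, trials = 2.0, 500, 4000
vbar = rng.normal(mu, 1.0, (trials, n)).mean(axis=1)
g = vbar ** 2
print(n * g.var(), (2 * mu) ** 2)
```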

**A.1. Confidence interval theorem**

### Proposition A.5

Omitting $\lambda$ and $\boldsymbol{\sigma}$ arguments for the sake of simplicity, one has

$$\sqrt{n_s}\,\bigl(|\hat{\pi}_{ij}(\lambda)|^2 - |\pi_{ij}(\lambda)|^2\bigr) \xrightarrow{d} \mathcal{N}\bigl(0,\,\gamma^2(\lambda)\bigr), \tag{A12}$$

where $n_s$ is the number of observations and

$$\gamma^2(\lambda) = \mathbf{g}_a\,\Omega_a\,\mathbf{g}_a^{T} + \mathbf{g}_\sigma\,\Omega_\sigma\,\mathbf{g}_\sigma^{T} \tag{A13}$$

for

$$\mathbf{g}_a^{T} = \frac{2}{\pi_d}\left(\mathcal{I}_{ij}^{T}\,\mathbf{S}_n\,\mathcal{I}_{ij} - \frac{\pi_n}{\pi_d}\,\mathcal{I}_{j}^{T}\,\mathbf{S}_d\,\mathcal{I}_{j}\right)\mathbf{a} \tag{A14}$$

and

$$\mathbf{g}_\sigma = \frac{1}{\pi_d}\,\boldsymbol{\xi}_n - \frac{\pi_n}{\pi_d^{2}}\,\boldsymbol{\xi}_d, \tag{A15}$$

where the values of $\boldsymbol{\xi}_n$ and $\boldsymbol{\xi}_d$ are listed in table 3 and where $\Theta_K = (\mathbf{T}_{2K,K} \otimes \mathbf{I}_{2K^2})\,(\mathbf{I}_K \otimes \mathrm{vec}(\mathbf{I}_{2K} \otimes \mathbf{I}_K))$ and $\mathbf{T}_{L,M}$ stands for the commutation matrix [14].

When *Σ*_{w} is known *a priori* or does not need to be estimated, **g**_{σ}*Ω*_{σ}**g**^{T}_{σ} in (A15) is zero.

### Proof.

The proof follows from corollary A.4 under lemma A.1, by computing the gradient of $\pi(\mathbf{a}, \boldsymbol{\sigma})$. This is simplified because $\mathbf{a}$ and $\boldsymbol{\sigma}$ are asymptotically independent (hence $\Omega_\theta$ is block diagonal), allowing their separate consideration.

The transpose of the required gradients can be written as

$$\mathbf{g}_\psi^{T} = \frac{1}{\pi_d}\,\frac{\partial \pi_n}{\partial \boldsymbol{\psi}} - \frac{\pi_n}{\pi_d^{2}}\,\frac{\partial \pi_d}{\partial \boldsymbol{\psi}}, \tag{A16}$$

where $\boldsymbol{\psi}$ represents either $\mathbf{a}$ or $\boldsymbol{\sigma}$ and respectively leads to $\mathbf{g}_a$ and $\mathbf{g}_\sigma$.

These gradients operate on the quadratic forms in (A7), whose differentiation with respect to $\mathbf{a}$ yields (see reference [19], p. 175, eqn (2))

$$\frac{\partial \pi_n}{\partial \mathbf{a}} = 2\,\mathbf{a}^{T}\,\mathcal{I}_{ij}^{T}\,\mathbf{S}_n\,\mathcal{I}_{ij} \quad\text{and}\quad \frac{\partial \pi_d}{\partial \mathbf{a}} = 2\,\mathbf{a}^{T}\,\mathcal{I}_{j}^{T}\,\mathbf{S}_d\,\mathcal{I}_{j}, \tag{A17}$$

because all appropriately valued matrices are symmetric. Inserting these terms in (A16) leads to (A14).

The nested dependence of $\mathbf{S}_{\{n,d\}}$ on $\boldsymbol{\sigma}$ calls for the use of the chain rule (see reference [19], p. 183, eqn (3)):

$$\frac{\partial \pi_{\{n,d\}}}{\partial \boldsymbol{\sigma}} = \frac{\partial \pi_{\{n,d\}}}{\partial\,(\mathrm{vec}\,\mathbf{S}_{\{n,d\}})^{T}}\;\frac{\partial\,\mathrm{vec}\,\mathbf{S}_{\{n,d\}}}{\partial \boldsymbol{\sigma}}, \tag{A18}$$

where

$$\frac{\partial \pi_{\{n,d\}}}{\partial\,(\mathrm{vec}\,\mathbf{S}_{\{n,d\}})^{T}} = (\mathcal{I}\mathbf{a})^{T} \otimes (\mathcal{I}\mathbf{a})^{T}, \tag{A19}$$

for $\mathcal{I}$ standing for the appropriate $\mathcal{I}_{ij}$ or $\mathcal{I}_{j}$. In its general form, $\mathbf{S}_{\{n,d\}} = \mathbf{S}(\Sigma_w)$. Thus, by the chain rule (see reference [19], p. 184, eqn (11)(c)),

$$\frac{\partial\,\mathrm{vec}\,\mathbf{S}_{\{n,d\}}}{\partial \boldsymbol{\sigma}} = \frac{\partial\,\mathrm{vec}\,\mathbf{S}_{\{n,d\}}}{\partial\,(\mathrm{vec}\,\Sigma_w)^{T}}\;\frac{\partial\,\mathrm{vec}\,\Sigma_w}{\partial \boldsymbol{\sigma}}, \tag{A20}$$

where

$$\frac{\partial \pi_{\{n,d\}}}{\partial \boldsymbol{\sigma}} = \bigl((\mathcal{I}\mathbf{a})^{T} \otimes (\mathcal{I}\mathbf{a})^{T}\bigr)\,\frac{\partial\,\mathrm{vec}\,\mathbf{S}_{\{n,d\}}}{\partial\,(\mathrm{vec}\,\Sigma_w)^{T}}\;\frac{\partial\,\mathrm{vec}\,\Sigma_w}{\partial \boldsymbol{\sigma}}. \tag{A21}$$

All that is left to compute are the derivatives of $\mathrm{vec}\,\mathbf{S}_{\{n,d\}}$ with respect to $\mathrm{vec}\,\Sigma_w$ in each case (table 1). For $\mathbf{S}_{\{n,d\}}$ that do not depend on $\boldsymbol{\sigma}$ (the original PDC), one has

$$\frac{\partial\,\mathrm{vec}\,\mathbf{S}_{\{n,d\}}}{\partial\,(\mathrm{vec}\,\Sigma_w)^{T}} = \mathbf{0}, \tag{A22}$$

which follows from the lack of $\boldsymbol{\sigma}$ dependence.

When $\mathbf{S}_{\{n,d\}} = \mathrm{diag}(\sigma_{11}^{-1}, \dots, \sigma_{KK}^{-1})$ (see reference [20], p. 183, eqn (19)), then the only non-zero derivatives are

$$\frac{\partial\,[\mathbf{S}_{\{n,d\}}]_{ii}}{\partial \sigma_{ii}} = -\,\sigma_{ii}^{-2}, \tag{A23}$$

and finally let $\mathbf{S}_{\{n,d\}} = \Sigma_w^{-1}$, so that the implied $\boldsymbol{\sigma}$ dependence gives

$$\frac{\partial\,\mathrm{vec}\,\Sigma_w^{-1}}{\partial \boldsymbol{\sigma}} = \frac{\partial\,\mathrm{vec}\,\Sigma_w^{-1}}{\partial\,(\mathrm{vec}\,\Sigma_w)^{T}}\;\frac{\partial\,\mathrm{vec}\,\Sigma_w}{\partial \boldsymbol{\sigma}}, \tag{A24}$$

which requires the chain rule twice, where (see reference [19], p. 185, eqn (16))

$$\frac{\partial\,\mathrm{vec}\,\Sigma_w^{-1}}{\partial\,(\mathrm{vec}\,\Sigma_w)^{T}} = -\,(\Sigma_w^{-T} \otimes \Sigma_w^{-1}) \tag{A25}$$

and, by the symmetry of $\Sigma_w$,

$$\Sigma_w^{-T} \otimes \Sigma_w^{-1} = \Sigma_w^{-1} \otimes \Sigma_w^{-1}, \tag{A26}$$

and trivially

$$\frac{\partial\,\mathrm{vec}\,\Sigma_w}{\partial \boldsymbol{\sigma}} = \mathbf{I}_{K^2}, \tag{A27}$$

which together in (A24) reduce to

$$\frac{\partial\,\mathrm{vec}\,\Sigma_w^{-1}}{\partial \boldsymbol{\sigma}} = -\,(\Sigma_w^{-1} \otimes \Sigma_w^{-1}) \tag{A28}$$

after some additional algebra. The latter results are summarized as the $\boldsymbol{\xi}_n$ and $\boldsymbol{\xi}_d$ quantities in table 3 and comprise (A15).

Use of Slutsky's lemma concludes the proof by allowing the use of estimated quantities. □

**A.2. Null hypothesis test**

Under the null hypothesis

$$H_0 : |\pi_{ij}(\lambda)|^2 = 0, \tag{A29}$$

both (A14) and (A15) equal zero, and (3.2) no longer applies, so that the next Taylor term becomes necessary [15], weighted by one-half of PDC's Hessian at the point of interest, with an $n_s^{-1}$ dependence. Via a device similar to that used in reference [7], one can show the following.

### Proposition A.6

Under (A29),

$$n_s\,\hat{\pi}_d(\lambda)\,|\hat{\pi}_{ij}(\lambda)|^2 \xrightarrow{d} \sum_{k=1}^{q(\lambda)} l_k(\lambda)\,\chi_1^2 \tag{A30}$$

(cf. (3.4)), where the $l_k$ are the eigenvalues of $\mathbf{D} = \mathbf{L}^{T}\,\mathcal{I}_{ij}^{T}\,\mathbf{S}_n\,\mathcal{I}_{ij}\,\mathbf{L}$, where $\mathbf{L}$ is the Cholesky factor of $\Omega_a(\lambda)$, and where $q(\lambda) \le 2$ reduces to 1 whenever $\lambda \in \{0, \pm 0.5\}$ or, for $p = 1$, for all $\lambda \in [-0.5, 0.5]$.

Equation (A30) is a linear combination of $\chi_1^2$ variables whose weights depend on estimated parameter and covariance values appearing in $\mathbf{D}(\lambda)$, which depends on $\lambda$ through (A5).

### Proof.

Using theorem A.3 with $m = 2$, as both (A14) and (A15) vanish, requires differentiating (A14) and (A15) with respect to $\boldsymbol{\sigma}$ a second time. This does not alter their dependence structure and so also produces null results. The same holds when differentiating (A15) with respect to $\mathbf{a}$, because it is quadratic in $\mathbf{a}$. The only non-zero surviving term comes from differentiating (A14) with respect to $\mathbf{a}$ and reduces to

$$\frac{\partial^2 \pi_n}{\partial \mathbf{a}\,\partial \mathbf{a}^{T}} = 2\,\mathcal{I}_{ij}^{T}\,\mathbf{S}_n\,\mathcal{I}_{ij}. \tag{A31}$$

Therefore, the Hessian in (A9) only has an upper non-zero block corresponding to (A31), implying that only the distribution of $\mathbf{a}$ needs to be considered in writing

$$n_s\,\hat{\pi}_d\,|\hat{\pi}_{ij}(\lambda)|^2 \xrightarrow{d} \mathbf{x}^{T}\,\mathcal{I}_{ij}^{T}\,\mathbf{S}_n\,\mathcal{I}_{ij}\,\mathbf{x} \tag{A32}$$

under theorem A.3 for $\mathbf{x} \sim \mathcal{N}(\mathbf{0}, \Omega_a)$. Slutsky's lemma concludes the first part of the proof by allowing the use of estimated quantities.

Diagonalizing is done via the Cholesky factor $\mathbf{L}$ from $\Omega_a = \mathbf{L}\,\mathbf{L}^{T}$. Making $\mathbf{x} = \mathbf{L}\,\mathbf{y}$, i.e. $\mathbf{y} = (\mathbf{L}^{T}\mathbf{L})^{-1}\mathbf{L}^{T}\,\mathbf{x}$, makes the vector elements of $\mathbf{y}$ mutually independent zero-mean and unit-variance, and yields $\mathbf{x}^{T}\mathcal{I}_{ij}^{T}\mathbf{S}_n\mathcal{I}_{ij}\mathbf{x} = \mathbf{y}^{T}\mathbf{D}\,\mathbf{y}$ for $\mathbf{D} = \mathbf{L}^{T}\mathcal{I}_{ij}^{T}\mathbf{S}_n\mathcal{I}_{ij}\mathbf{L}$, so that diagonalizing $\mathbf{D} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{U}^{T}$ with $\mathbf{U}\mathbf{U}^{T} = \mathbf{I}_{q \times q}$ produces

$$\mathbf{y}^{T}\mathbf{D}\,\mathbf{y} = \sum_{k=1}^{q} l_k\,(\mathbf{u}_k^{T}\,\mathbf{y})^2, \tag{A33}$$

where $\mathbf{u}_k$ is the $k$th column of $\mathbf{U}$. It is easy to show that the variables $\zeta_k = \mathbf{u}_k^{T}\,\mathbf{y}$ are mutually independent, unit-variance zero-mean normal, implying that the $\zeta_k^2$ are $\chi_1^2$ random variables.

As $\mathrm{rank}(\mathbf{X}) = \mathrm{rank}(\mathbf{X}^{T}\mathbf{X})$ for any matrix $\mathbf{X}$, after recalling the explicit $\lambda$ dependence, this implies that

$$q(\lambda) = \mathrm{rank}\bigl(\mathbf{D}(\lambda)\bigr) \tag{A34}$$

is upper bounded by 2 and is given by $q(\lambda) = 1$ when $\lambda \in \{0, \pm 0.5\}$ or, for all $\lambda$, when $p = 1$, thereby concluding the proof. □

Details on obtaining the threshold values from the linear combination followed those adopted in Takahashi *et al*. [7] and references therein.
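The eigenvalue weights $l_k$ of proposition A.6 can be sketched numerically: for the Cholesky factor $\mathbf{L}$ of a covariance $\Omega$ and a quadratic-form matrix $\mathbf{D}_0$, the weights are the eigenvalues of $\mathbf{L}^{T}\mathbf{D}_0\mathbf{L}$, and the mean of the resulting mixture equals their sum. The 2×2 matrices below are illustrative placeholders of ours, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(5)
Omega = np.array([[2.0, 0.5], [0.5, 1.0]])   # illustrative covariance
D = np.array([[1.0, 0.0], [0.0, 3.0]])       # illustrative quadratic form
L = np.linalg.cholesky(Omega)
l_k = np.linalg.eigvalsh(L.T @ D @ L)        # chi^2_1 mixture weights

# Monte Carlo check: for x ~ N(0, Omega), x^T D x has mean sum(l_k) = tr(D Omega).
x = rng.multivariate_normal(np.zeros(2), Omega, 200_000)
q = np.einsum("ni,ij,nj->n", x, D, x)
print(q.mean(), l_k.sum())
```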

## Footnotes

One contribution of 13 to a Theme Issue ‘Assessing causality in brain dynamics and cardiovascular control’.

- © 2013 The Author(s) Published by the Royal Society. All rights reserved.