## Abstract

Granger causality is increasingly being applied to multi-electrode neurophysiological and functional imaging data to characterize directional interactions between neurons and brain regions. For a multivariate dataset, one might be interested in different subsets of the recorded neurons or brain regions. According to the current estimation framework, for each subset, one conducts a separate autoregressive model fitting process, introducing the potential for unwanted variability and uncertainty. In this paper, we propose a multivariate framework for estimating Granger causality. It is based on spectral density matrix factorization and offers the advantage that the estimation of such a matrix needs to be done only once for the entire multivariate dataset. For any subset of recorded data, Granger causality can be calculated through factorizing the appropriate submatrix of the overall spectral density matrix.

## 1. Introduction

Cognitive functions are achieved through cooperative neural processing. Multi-electrode recording and functional imaging are key technologies to enable us to study network mechanisms of cognition and their breakdown in disease. Analytically, cross correlation functions and ordinary coherence spectra remain the main statistics for assessing interactions among the monitored nodes of a neuronal network. As the hypotheses concerning the role of neural interactions in cognitive paradigms become more elaborate, it is felt that being able to evaluate the direction of neural interactions in neural networks holds the key to teasing out the functional significance of different nodes in these networks. Recently, there has been considerable interest in a class of techniques called Granger causality to provide a statistically principled way to measure directional influences [1–20].

Typically, Granger causality is inferred parametrically through autoregressive models of time-series data. For a multivariate dataset recorded from a set of brain areas or neuronal ensembles, depending on the problem, one might be interested in different subsets of the recorded areas. According to the current estimation framework, for each subset, one conducts a separate model fitting process. Such an approach may introduce unwanted variability and uncertainty as the model parameters (e.g. model order) are often different for different subsets of data channels. In addition, our past work [16] has shown that Geweke's method for conditional Granger causality estimation, while theoretically sound, can give erroneous results in practice, because quantities assumed to be the same in different autoregressive models are not always the same when finite datasets and inevitable estimation errors are taken into account.

The goal of this paper is to provide a multivariate framework for estimating Granger causality that overcomes these shortcomings. We begin by reviewing the essential ideas of Granger causality with special emphasis on its spectral representation. The current estimation framework is then extended to a multivariate formalism based on the technique of spectral density matrix factorization. Numerical examples and experimental data are included to illustrate the theoretical ideas.

## 2. Material and methods

### (a) Pairwise analysis

Consider two simultaneously recorded stationary time series. According to Wiener [21], if the prediction of one time series is improved by incorporating the knowledge of a second one, then the second series is said to have a causal influence on the first. Wiener's proposal lacks the machinery for practical implementation. Granger later formalized the prediction idea in the context of linear regression models [22]. Specifically, if the variance of the autoregressive prediction error of the first time series at the present time is reduced by inclusion of past measurements from the second time series, then the second time series is said to have a causal influence on the first one. Reversing the role of the two time series, one can consider the causal influence in the opposite direction. The interaction discovered this way could be either reciprocal or unidirectional. Below we first give a brief introduction to the mathematical formulation of Granger causality based on the more detailed exposition in [2], and then proceed to define the multivariate estimation framework.

Let the two stationary time series be denoted by *X*_{t} and *Y* _{t}. Individually, *X*_{t} and *Y* _{t} can each be represented by the following autoregressive models:
2.1and
2.2Jointly, they are represented as the following bivariate autoregressive model:
2.3where *ε*_{2t} and *η*_{2t} are uncorrelated over time, and their contemporaneous covariance matrix is
2.4The total interdependence between *X*_{t} and *Y* _{t} can be computed as
2.5where |⋅| denotes the determinant of the enclosed matrix. When *X*_{t} and *Y* _{t} are independent, *F*_{X,Y}=0. When this total interdependence is non-zero, Geweke [23] showed that *F*_{X,Y} can be decomposed into three components,
2.6where
2.7and
2.8are the linear causality from *Y* _{t} to *X*_{t} and from *X*_{t} to *Y* _{t} due to their interactions, and
2.9is the instantaneous causality due to possibly common input exogenous to the bivariate time-series system *X*_{t} and *Y* _{t}.

After Fourier transforming the bivariate autoregressive representation (2.3) and applying proper ensemble average, we get the spectral density matrix
2.10where the asterisk (*) denotes the complex conjugate and matrix transpose, and **H**(*ω*) is the transfer function matrix.

The spectral density matrix **S**(*ω*) contains cross-spectra (off-diagonal elements) and auto-spectra (diagonal elements). If *X*_{t} and *Y* _{t} are independent, then the cross-spectra are zero and the determinant |**S**(*ω*)| equals the product of two auto-spectra. This observation motivates the spectral domain representation of total interdependence between *X*_{t} and *Y* _{t} as
2.11It is easy to see that this representation of interdependence is related to coherence by the following relation:
2.12where the coherence function is defined as *C*(*ω*)=|*S*_{xy}(*ω*)|^{2}/*S*_{xx}(*ω*)*S*_{yy}(*ω*).

Geweke [23] proved that the two directional measures can be computed explicitly according to 2.13and 2.14where and .

By defining the instantaneous causality spectra as 2.15we achieve a decomposition of the total interdependence in the spectral domain that is analogous to equation (2.6) in the time domain, namely, 2.16It is worth noting that the spectral instantaneous causality could become negative for some frequencies in certain situations and may not have a readily interpretable physical meaning on a frequency-by-frequency basis.

Under general conditions, the spectral measures integrated over the frequency yield their time-domain counterparts [23], 2.17It is expected that the conditions necessary for these equalities to hold are met in practical applications.

### (b) Conditional Granger causality

When there are three or more time series, a pairwise analysis can be performed using the above bivariate approach. Such a pairwise approach may lead to ambiguous results in terms of differentiating direct from mediated causal influences [2,16,24]. A conditional Granger causality analysis can help resolve the ambiguity. Consider three time series *X*_{t}, *Y* _{t} and *Z*_{t}. Suppose that a pairwise analysis reveals a causal influence from *Y* _{t} to *X*_{t}. The following procedure determines whether this influence has a direct component or is mediated entirely by *Z*_{t}.

Let the joint autoregressive representation of series *X*_{t} and *Z*_{t} be
2.18where the covariance matrix of the noise vector is . Next, consider the joint autoregressive representation of all three time series *X*_{t}, *Y* _{t} and *Z*_{t},
2.19where the covariance matrix of the noise vector is . From these two sets of equations (2.18) and (2.19), the Granger causality from *Y* _{t} to *X*_{t} conditional on *Z*_{t} is defined as
2.20If *F*_{Y →X|Z}>0 in some suitable statistical sense, then the inclusion of *Y* _{t} results in improved prediction of *X*_{t}, indicating that *Y* _{t}→*X*_{t} has a direct component. In contrast, if *F*_{Y →X|Z}=0, the influence *Y* _{t}→*X*_{t} is said to be entirely mediated by *Z*_{t}. Conditional measures such as *F*_{Y →Z|X} and *F*_{X→Z|Y} can be similarly defined.

The time-domain conditional Granger causality can be decomposed into its frequency component
2.21where
The quantities involved in the above expression come from **G**(*ω*) and **H**(*ω*), which are the transfer function matrices for the normalized bivariate autoregressive model involving *X*_{t} and *Z*_{t}, and the normalized trivariate model involving *X*_{t}, *Y* _{t} and *Z*_{t}, respectively. The details are omitted here. See below or refer to [24] for the derivation of the normalization processes that make all the noise terms uncorrelated.

### (c) Multivariate estimation of Granger causality: basic idea

The central quantity of interest in traditional multivariate spectral analysis is the spectral density matrix **S**(*ω*), from which one can derive statistics, such as auto-power, coherence, multiple coherence and partial coherence [25]. Here, we show that, with the help of spectral density matrix factorization, one can develop a multivariate Granger causality approach centred on the spectral density matrix. Specifically, for a multi-channel set of time-series recordings, a single autoregressive model is fit to all the available data to yield the overall spectral density matrix. For any subsystem, Granger causality analysis, pairwise or conditional, can be performed by using elements from the overall spectral density matrix without having to resort to additional model fitting. This approach offers benefits both theoretically and in practical terms.

### (d) Spectral density matrix factorization

For a given spectral density matrix **S**(*ω*) satisfying , the spectral density matrix factorization theorem ensures that it can be decomposed into a set of unique minimum-phase functions [25,26]
2.22where ** Ψ** is the minimum-phase, spectral density matrix (left) factor that has a Fourier series expansion in non-negative powers of e, and

*** is its complex conjugate transpose. From the minimum-phase spectral factor**

*Ψ***, the noise covariance matrix**

*Ψ***and minimum-phase transfer function**

*Σ***H**(

*ω*) can be obtained as 2.23and 2.24such that

***=**

*ΨΨ***H**

*Σ***H***. Here, T stands for matrix transposition.

### (e) Multivariate estimation of Granger causality: formulation

Consider a *p*-dimensional multivariate stochastic process, **X**(*t*)=[*X*_{1}(*t*),*X*_{2}(*t*),⋯ ,*X*_{p}(*t*)]^{T}, where *p* is the number of recording channels. Parametrically, the spectral density matrix of the process can be computed by estimating its multivariate autoregressive (MVAR) representation
2.25where *A*_{ij}(*k*) is the coefficient at *k*th lag, and *ε*_{i}(*t*) is a white noise residual with zero mean. After Fourier transforming, the above equations and suitable ensemble averaging, one gets the overall spectral density matrix as
2.26where and ** Σ** is the covariance matrix of the noise vector. Note that for real data, the above infinite series needs to be truncated to a finite order. For discussion on the determination of model order and additional references, see [2].

A well-recorded multivariate dataset from a well-designed experiment can be used to address a myriad of problems. If a specific problem calls for the analysis of a subset of recording channels, suitable elements of the overall spectral matrix can be selected to form the spectral matrix for that problem. Factorizing this spectral density matrix according to equations (2.22)–(2.24), and combining the outcome with Geweke's Granger formulation outlined earlier, one can examine the causal relationship among this subset of channels. The key here is that this process can be repeated for different subsets of channels without having to fit autoregressive models for each subset. This is one benefit of the multivariate approach for the estimation of the Granger causality.

We illustrate the approach by considering the problem where the direct causal influence from the *j*th channel to the *i*th channel needs to be evaluated within a subset of channels that includes the *i*th and the *j*th channels and another *w* channels (0≤*w*≤*p*−2). First, select the suitable elements from the overall spectral density matrix to form the specific spectral density matrix for the problem,
2.27Factorization of this spectral density matrix gives the unique decomposition
2.28where is the transfer function matrix and the noise covariance matrix for the subsystem of interest. Second, consider another spectral density matrix
2.29Applying the factorization technique again, we get the decomposition
2.30where the transfer function is and the noise covariance matrix ** Ω**. These factorized quantities can be viewed as coming from equivalent autoregressive representations of the respective stochastic processes and thus form the foundation for Granger causality estimation in the spectral domain.

In order to compute conditional Granger causality for the above problem, a key step is Geweke's normalization method by which the noise terms are made independent [24]. For the model represented in equation (2.28), the following transformation matrix can be used for the purpose:
2.31where **C**=−(_{wj}−_{wi}_{ij}/_{ii})/(_{ij}−_{ji}_{ij}/_{ii}). For equation (2.30), the transformation matrix is
2.32After the transformation, the new transfer function matrices and covariance matrices will be
2.33and
2.34The direct causal influence from the *j*th channel to the *i*th channel can be calculated with the conditional Granger causality
2.35where
It is worth noting that the pairwise Granger causality from the *j*th channel to the *i*th channel can be obtained by letting *w*=0 in the above expressions. In addition, when *w*=*p*−2, the above conditional Granger causality takes into account all the information in the recorded dataset.

### (f) Non-parametric Granger causality analysis

Thus far, our theoretical development assumes that autoregressive models have been obtained from the time-series data. It is straightforward to see that the spectral density matrix factorization approach also enables the inclusion of Granger causality as part of the non-parametric spectral analysis framework where the spectral density matrix can be estimated via Fourier or wavelet transforms [27,28]. Below we demonstrate this approach.

Let **X**(*t*)=[*X*_{1}(*t*),*X*_{2}(*t*),⋯ ,*X*_{p}(*t*)]^{T} denote a *p*-dimensional multivariate stochastic process where *p* is the number of recording channels. Now suppose that we observe *M* realizations or trials of the above process with each realization being of length *n*: {*x*_{r}(*t*)}(*r*=1,…,*p*;*t*=1,…,*n*). For a single trial, the multi-taper cross-spectrum estimator between channels *l* and *m* at frequency *ω* is
2.36where *w*(*k*) (*k*=1,2,…,*K*) are *K* orthogonal tapers of length *n* [29,30] and Δ is the sampling interval. For *l*=*m*, we obtain the auto-spectrum. Averaging the cross-spectrum estimators for all pairs of channels over individual realizations or trials leads to the spectral density matrix
The diagonal terms represent auto-spectra, whereas the off-diagonal terms cross-spectra. Once the spectral density matrix is available, the subsequent multivariate analysis of Granger causality follows the same procedure given in equations (2.27)–(2.35).

### (g) Experimental data

Detailed descriptions of the experimental procedure and related data analysis can be found in [10]. Briefly, functional magnetic resonance imaging (fMRI) data were recorded from 12 healthy subjects performing a visual spatial attention experiment, using a 3T Siemens Magnetom Trio MRI system. There were 12 attention blocks and 12 passive view blocks in each experiment. Data preprocessing, including slice timing, realignment, coregistration, normalization and spatial smoothing, and global scaling, were performed using SPM2 (http://www.fil.ion.ucl.ac.uk/spm/). Regions of interest (ROIs) were obtained by contrasting attention blocks against passive view blocks. Three ROIs were selected for this work: dorsal anterior cingulated cortex (dACC), dorsal lateral prefrontal cortex (dLPFC) and right temporo-parietal junction (rTPJ). These regions are known to be important in attention and in cognitive control [31,32]. From the three ROIs, fMRI time series were extracted and subjected to conditional Granger causality analysis. Specifically, *F*_{dACC→rTPJ|dLPFC} and *F*_{dLPFC→rTPJ|dACC} were estimated and correlated with performance accuracy to assess their functional significance.

## 3. Results

### (a) Simulations

We first use numerical examples to illustrate the application of Granger causality. Simulation data were generated by coupled autoregressive models of varying network complexity. The focus is on the multivariate approach and on the ability of conditional Granger causality to determine unequivocally the built-in network connectivity from time-series data.

#### (i) Example 1

Consider a five-node oscillatory network. The network configuration is shown in figure 1*a*. The mathematical equations are
3.1where *ε*_{1}(*t*), *ε*_{2}(*t*), *ε*_{3}(*t*), *ε*_{4}(*t*) and *ε*_{5}(*t*) are independent Gaussian white noise processes with zero means and variances respectively. The intrinsic dynamics of each node is chosen in such a way that they all exhibit a prominent spectral peak at the same frequency. From construction, the signal from the first node (the source) is propagated to the other four nodes (the targets) with differential time delays. A pairwise analysis will reveal non-zero Granger causality from the nodes that receive an early input from the source node to the nodes that receive a late input (e.g. node 3→node 4). Clearly, this does not depict the true connectivity of this dynamical network. Below it will be shown that conditional Granger causality helps to resolve this problem.

A dataset of 500 realizations each with 50 time points was generated. The sampling rate is taken to be 200 Hz. Assuming no knowledge of the model equations (3.1), we fitted a fifth-order MVAR model to the simulated dataset and calculated power, coherence and conditional Granger causality spectra from it. In figure 1*b*, power spectra for all five nodes are shown in panels with the same row and column index, where a spectra peak at around 40 Hz is seen. For the (*I*,*J*) panel, where *I* is the row index that is not equal to the column index *J*, the conditional Granger causality is from the column index to the row index, namely, *J*→*I*. For example, the bottom-left panel is the conditional Granger causality from node 1 to node 5. Only the first column has non-zero conditional Granger causality values, reflecting the driving influence emanating from node 1. The conditional Granger causality among other pairs of nodes is uniformly zero. This corresponds precisely to the structural connectivity pattern in figure 1*a*. One noteworthy feature about figure 1*b* is the consistency of spectral features (i.e. peak frequency) across both power and Granger causality spectra. This is important since it allows us to link local dynamics with that of the global network.

#### (ii) Example 2

A more complex network involving the same five nodes is considered here,
3.2where the noise covariance matrix is
3.3As can be seen, one of the four nodes (node 4) receiving inputs from the source node also transmits information to its nearest neighbours (figure 2*a*). In addition, the noise terms at different nodes are correlated, giving rise to non-zero instantaneous causality [24].

The simulated dataset consists of 200 trials of 500 points each. First, the parametric approach was used. A fifth-order MVAR was fitted to the dataset. The power spectra and conditional Granger causality spectra are shown in figure 2*b*. Clearly, the structural pattern of this network is correctly recovered by the multivariate Granger causality analysis.

Next, we applied the non-parametric approach described earlier to this example. The power spectra and conditional Granger causality spectra computed with this non-parametric approach are shown in figure 2*c*. It is seen that this non-parametric approach also correctly recovers the same network connectivity pattern.

### (b) Experiment

The 12 attention blocks were rank ordered according to performance accuracy and segmented into 10 groups [10]. Both *F*_{dACC→rTPJ|dLPFC} and *F*_{dLPFC→rTPJ|dACC} were calculated for each group, averaged across subjects and plotted as a function of mean group accuracy. The results are shown in figure 3*a*,*b*. It can be seen that while *F*_{dACC→rTPJ|dLPFC} is positively correlated with performance accuracy (Spearman correlation, *R*=0.82,*p*<0.01), suggesting that *F*_{dACC→rTPJ|dLPFC} may represent top-down cognitive control, *F*_{dLPFC→rIPS|dACC} is not correlated with performance accuracy (*R*=0.14,*p*=0.71).

## 4. Discussions

Multi-electrode neural recordings and functional imaging are becoming commonplace. Such multivariate data promise unparalleled insights into how different areas of the brain work together to achieve thought and behaviour and how such coordinated brain activity breaks down in disease. While the accumulation of data continues at an astonishing rate, how to effectively analyse these data to extract information about the workings of the brain remains a key challenge. Recent years have witnessed the rapid growth in the applications of various directional measures to multi-channel neural data. The fundamental concept underlying these measures is that of Granger causality. In §2, we presented a brief introduction to this concept with emphasis on Geweke's spectral causality decomposition. For many applications, the spectral representation is necessary as the experimental recordings are replete with oscillatory phenomena. The mathematical development leading to pairwise as well as conditional causality estimations was given in the same section.

Traditionally, Granger causality is estimated via autoregressive models. For multivariate datasets, one has to fit autoregressive models multiple times if one is interested in analysing different combinations of network nodes, increasing the likelihood for estimation errors and model parameter inconsistency. In the proposed formalism, this problem is overcome: a single autoregressive model is fit to all the data and from the overall spectral density matrix, one can select elements corresponding to the subset of channels of interest to form a new spectral density matrix. Since any spectral density matrix can be uniquely factorized into the transfer function and the covariance matrix of the corresponding autoregressive model, Granger causality analysis can be carried out on the subsystem without having to go through a new model fitting process. This spectral density matrix factorization based approach also opens the possibility to extend non-parametric spectral analysis to include Granger causality where the spectral density matrix can be obtained from Fourier or wavelet transforms of data [27,28]. The mathematical expressions for estimating multivariate Granger causality between two channels conditioned on another set of channels were developed in detail.

In practical applications, there is often a need to assess the statistical significance of pairwise and conditional Granger causality values. Past work has employed the random permutation approach for this purpose [2,3,14]. Other methods based on the idea of surrogate data [33,34] or based on the *F*-test [35] have also been proposed. The functional significance of Granger causality values can be assessed by correlating them with experimental conditions [10].

The effectiveness of Granger causality and the proposed estimation framework was first illustrated on numerical simulations using coupled autoregressive models of varying network complexity. The network configuration identified using our approach is found to agree with the built-in connectivity.

For experimental demonstration, we chose fMRI data recorded from subjects performing a visual spatial attention task [10]. Three ROIs were considered: dACC, right dLPFC and rTPJ. rTPJ is a posterior component of the ventral attention network [31]. Both dACC and dLPFC are thought to be important in maintaining task set and in exerting top-down control over downstream sensorimotor processing [31,32,36]. The exact manner with which the control is achieved is, however, not clear. Our analysis showed that causal influences from dACC to rTPJ conditioned on dLPFC are behaviour enhancing, while causal influences from dLPFC to rTPJ conditioned on dACC have no effects on behaviour. These results appear to suggest that for the current visual spatial attention task, dACC is more important in regulating rTPJ to facilitate task performance.

## Acknowledgements

We thank Yonghong Chen for help with data analysis. This work was supported by NIH grant nos MH079388, MH070498 and MH097320. G.R. was supported by grants from the J. C. Bose National Fellowship, DST Center for Mathematical Biology and DST IRHPA Centre for Neuroscience.

## Footnotes

One contribution of 13 to a Theme Issue ‘Assessing causality in brain dynamics and cardiovascular control’.

- © 2013 The Author(s) Published by the Royal Society. All rights reserved.