## Abstract

I review the use of the concept of Granger causality for causal inference from time-series data. First, I give a theoretical justification by relating the concept to other theoretical causality measures. Second, I outline possible problems with spurious causality and approaches to tackle these problems. Finally, I sketch an identification algorithm that learns causal time-series structures in the presence of latent variables. The description of the algorithm is non-technical and thus accessible to applied scientists who are interested in adopting the method.

## 1. Introduction

The learning of cause–effect relationships is an important task for our understanding of the functionality of the brain and the cardiovascular system. In recent years, an increasing number of studies have used time-series methods based on the notion of Granger causality [1,2,3,4,5,6,7]. This probabilistic concept of causality exploits the fact that causes must precede their effects in time. However, temporal precedence alone is not a sufficient condition for establishing cause–effect relationships, and it is well known that the omission of relevant variables can lead to so-called spurious causalities, that is, falsely detected causal relationships. Consequently, one often encounters the reservation that Granger causality is not suited as a tool for causal inference [8]. For a controversy about the application of Granger causality in brain imaging, I refer to Roebroeck *et al.* [6] and the related papers in the same issue of *NeuroImage*.

While it is true that association—even between lagged variables—does not necessarily constitute a causal link, the concept of Granger causality still remains a useful tool for causal learning. Given the widespread use of Granger causality today, it is therefore important to raise awareness of the limitations of Granger causality and to point out strategies for valid causal inference from time-series data. The objective of the present paper is to review the concepts and methods for causal inference and to outline the underlying assumptions on which the causal conclusion rests.

The paper is organized as follows. In §2, I review four theoretical concepts of causality for multivariate time series and point out the relations among them. The discussion shows under which circumstances Granger causality as well as Sims causality—another probabilistic concept of causality from econometrics that will be of interest for our discussion of causal inference—indeed correspond to a true causal link between the variables. In §3, I discuss the empirical versions of Granger and Sims causality further. In particular, I show that Sims causality corresponds to the total information flow from one variable to another; in §4, I illustrate its use for detecting certain types of spurious causalities. In §5, I briefly sketch the basic steps for identifying causal relationships from observational time-series data. Section 6 concludes the paper with a discussion of open problems and challenges for causal inference in the time-series case.

## 2. Concepts of causality for multiple time series

Among the many possible properties that have been hypothesized for cause–effect relationships, there are two properties that have been particularly fruitful in the context of time series.

- *Temporal precedence*: a cause precedes its effects in time.
- *Physical influence*: manipulations of the cause change the effects.

The second property expresses the common sense notion that causal relationships link variables fundamentally and continue to work if other parts of the system are changed. Thus, an enforced change of value in the causing variable will lead to corresponding changes in the caused variable. This aspect is central to the modern theory of causality [9,10,11,12,13].

In time-series analysis, most definitions of causality focus on the first property of temporal precedence. These concepts have the advantage that they readily provide empirical versions that can be used for inference. The most prominent concept is that of Granger causality [14,15,16], which has recently attracted much attention in the field of neuroscience [1,3,17,6]. However, similar to correlation in ordinary multivariate analysis, the concept of Granger causality is basically probabilistic and can lead to wrongly detected causal relations in the presence of latent variables. Nevertheless, it is an indispensable tool for causal inference if it is properly applied.

In the following, I review four approaches for defining causality for multiple time series and discuss the relations among them. Two of these concepts, Granger causality and Sims causality, have empirical versions, which will be the basis for the discussion of causal inference in the remaining sections of this paper.

### (a) Intervention causality

The most successful modern approach for describing causality is based on the concept of interventions [9,10,11,12,13]. The underlying principle of this approach is that a cause–effect relationship should persist if the cause is manipulated without directly affecting any other variables, whereas any non-causal associations should vanish. In the context of time series, this idea of defining a causal effect as the effect of an intervention in such a system was first proposed by Eichler & Didelez [18] and discussed in more detail in Eichler & Didelez [19] and Eichler [20].

More precisely, to describe the causal effect that one process *X* exerts on another process *Y*, I consider primitive interventions that set the value of *X* at some time *t* to a fixed value *x**, say. This yields a new probability measure, which I denote—in terms of Pearl's do-terminology [10]—by P( · | do(*X*_{t}=*x**)). The causal effect of *X* on *Y* can now be quantified by any functional of the post-intervention distribution of *Y*_{t′} with *t*′>*t*. The most commonly used measure is the average causal effect (ACE) defined as the average increase or decrease in value caused by the intervention.

### Definition 2.1 (Average causal effect)

The ACE or causal effect in mean of an intervention setting *X*_{t} to some value *x** on the response variable *Y*_{t′} with *t*′>*t* is given by

ACE(*t*′,*t*;*x**) = E(*Y*_{t′} | do(*X*_{t}=*x**)) − E(*Y*_{t′}).

Here, the mean is taken with respect to the natural probability measure that describes the system without interventions. Depending on the specific application, other measures to describe the causal effect could be more appropriate. For instance, the causal effect in variance given by the difference var(*Y* _{t′} | do(*X*_{t}=*x**))−var(*Y* _{t′}) could be of interest in the analysis of financial time series, where an intervention might aim at reducing the volatility of the stock market.

Next, if the intervention in *X*_{t} shows no effect whatsoever on later instances of the process *Y* , we conclude that the process *X* does not cause the process *Y* .

### Definition 2.2 (Non-causality with respect to interventions)

The process *X* does not cause the process *Y* if an intervention in *X* setting *X*_{t} to a value *x** does not affect the distribution of future instances of *Y*, that is, if

P(*Y*_{t′} ≤ *y* | do(*X*_{t}=*x**)) = P(*Y*_{t′} ≤ *y*)

for all *y* and all *t*′>*t*.

I note that the process *Y* may be multi-dimensional to cover cases where an intervention in *X* does affect the relationship between two components of *Y* . Eichler [21] formulates a threshold autoregressive model as an example of such a process.

One common problem in time-series analysis is that controlled experiments cannot be carried out. Therefore, the effect of an intervention must be determined from data collected under the natural distribution P. This is only possible if the intervention affects only the distribution of the target variable *X*_{t}, whereas all other conditional distributions remain the same as under the observational regime, i.e. without intervention in *X*_{t}. This invariance of distributions under different regimes is formalized by the following conditions: there exists a process *V* =(*X*,*Y*,*Z*) such that

(I1) the distributions of *X*^{t−1}, *Y*^{t}, *Z*^{t} under the observational and the intervention regimes are the same, and

(I2) conditional on *V*^{t}, the joint distributions of any future instances *V*_{t+1},…,*V*_{t′} for *t*′>*t* under the observational and the intervention regimes are the same.

Here, *X*^{t}={*X*_{s} | *s*≤*t*} denotes the history of the process *X* up to time *t*. Under these conditions, the distribution of the time series *V* =(*X*,*Y*,*Z*) under the intervention regime is fully specified by its natural distribution described by the measure P and the intervention requiring that *X*_{t}=*x** with probability 1. Moreover, the conditions imply that any dependences between *X*_{t} and *Y*_{t′} for *t*′>*t* conditional on *X*^{t−1},*Y*^{t},*Z*^{t} found under the observational regime are causal. Thus, the conditions impose some causal Markov property on the system.

### (b) Structural causality

In a recent article, White & Lu [22] proposed a new concept of the so-called direct structural causality for the discussion of causality in dynamic structural systems. The approach is based on the assumption that the data-generating process has a recursive dynamic structure in which predecessors structurally determine successors. For ease of notation, the following definitions are slightly modified and simplified.

Suppose that we are interested in the causal relationship between two processes, the ‘cause of interest’ *X* and the ‘response of interest’ *Y*. We assume that *X* and *Y* are structurally generated as

*X*_{t} = *q*_{x,t}(*X*^{t−1}, *Y*^{t−1}, *Z*^{t−1}, *U*_{x,t})

and

*Y*_{t} = *q*_{y,t}(*X*^{t−1}, *Y*^{t−1}, *Z*^{t−1}, *U*_{y,t})

for all *t*. Here, the process *Z* includes all relevant observed variables, while the realizations of *U*_{t}=(*U*_{x,t},*U*_{y,t}) are assumed to be unobserved. The functions *q*_{x,t} and *q*_{y,t} are also assumed to be unknown.

### Definition 2.3 (Direct structural causation)

The process *X* does not directly structurally cause the process *Y* if the function *q*_{y,t}(*x*^{t−1},*y*^{t−1},*z*^{t−1},*u*_{y,t}) is constant in *x*^{t−1} for all admissible values for *y*^{t−1}, *z*^{t−1} and *u*_{y,t}. Otherwise, *X* is said to directly structurally cause *Y* .

The idea of defining causality by assuming a set of structural equations is not new but has been used by a number of authors before [23,24,9,10]. However, in contrast to this strand of the literature, White & Lu [22] make no reference to interventions or to graphs.

It is clear from the definition that a primitive intervention on *X*_{t} results in replacing the generating equation by *X*_{t}=*x**. As the generating equation for the response variable *Y*_{t′}, *t*′>*t*, is unaffected by this, we immediately obtain that direct structural non-causality implies non-causality with respect to interventions, that is,

P(*Y*_{t′} ≤ *y* | do(*X*_{t}=*x**)) = P(*Y*_{t′} ≤ *y*)

for all *t*′>*t* and all *y* whenever *X* does not directly structurally cause *Y*. White & Lu [22] also propose a definition of total structural causality. With this, the above result can be generalized to responses at arbitrary times *t*′>*t*. I omit the details and refer the reader to White & Lu [22].

Furthermore, I note that the converse implication is generally not true. As an example, suppose that *Y* _{t} is generated by *Y* _{t}=sign(*X*_{t−1}) *U*_{y,t}, where *U*_{y,t} has a symmetric distribution independent of *X*^{t−1} and *Y* ^{t−1}. Then, *X* directly structurally causes *Y* but the distribution of *Y* _{t} does not depend on *X*^{t−1}. Such causal relationships are only relevant for counterfactual queries such as: ‘Suppose *x*_{t−1} has a positive value. What would have been the value of *Y* _{t} if *x*_{t−1} had been negative?’ The corresponding counterfactual realization *y*_{x*,t} is given by *y*_{x*,t}=sign(*x**) *u*_{y,t}=−*y*_{t}, that is, *Y* _{t} would have had the opposite sign. However, since *Y* _{x*,t} is not observable, this causal dependence cannot be detected empirically.

### (c) Granger causality

In time-series analysis, inference about cause–effect relationships is commonly based on the concept of Granger causality [14,15]. Unlike the previous approach, this probabilistic concept of causality does not rely on the specification of a scientific model and thus is particularly suited for empirical investigations of cause–effect relationships. For this general definition of causality, Granger [14,15] evokes the following two fundamental principles:

- the effect does not precede its cause in time;
- the causal series contains unique information about the series being caused that is not available otherwise.

The first principle of temporal precedence of causes is commonly accepted and has also been the basis for other probabilistic theories of causation [25,26,27]. By contrast, the second principle is more subtle, as it requires the separation of the special information provided by the series *X* from any other possible information. To this end, Granger considers two information sets: first, the set of all information in the universe up to time *t*; and, second, the same information set reduced by the information of *X* up to time *t*. If the series *X* causes the series *Y*, we expect by the above principles that the conditional distributions of *Y*_{t+1} given these two information sets differ from each other. Equivalently, letting *X*⊥⊥*Y* | *Z* denote that *X* and *Y* are conditionally independent given *Z*, we have the following:

### Definition 2.4 (Granger's general definition of causality [14,15])

The process *X* does not cause the process *Y* if, for all *t*, *Y*_{t+1} is conditionally independent of *X*^{t} given all remaining information in the universe up to time *t*; otherwise, the series *X* is said to cause the series *Y*.

The definition poses some theoretical problems such as the exact definition of the set of all information in the universe. Leaving such problems aside, it is still clear that Granger's definition of causality cannot be used with actual data. In practice, only the background knowledge available at time *t* can be incorporated into an analysis. Therefore, the definition must be modified to become operational. Suppose that the process *V* =(*X*,*Y*,*Z*) has been observed. Substituting the information sets consisting of *X*^{t},*Y*^{t},*Z*^{t} and of *Y*^{t},*Z*^{t} for the two general information sets, respectively, we obtain the following modified version of the above definition [15,16].

### Definition 2.5 (Granger causality)

The series *X* is Granger non-causal for the series *Y* with respect to *V* =(*X*,*Y*,*Z*) if *Y*_{t+1}⊥⊥*X*^{t} | *Y*^{t},*Z*^{t} for all *t*; otherwise we say that *X* Granger-causes *Y* with respect to *V*.

Granger [15,16] used the term ‘*X* is a prima facie cause of *Y* ’ to emphasize the fact that a cause in the sense of Granger causality must be considered only as a potential cause. It therefore can be (and indeed has been) debated whether ‘Granger causality’ is an appropriate term for this kind of association. A possible alternative is the equivalent notion of ‘local independence’ originally introduced in the context of time-continuous processes [28]. However, the notion of Granger causality is nowadays well established, and I will therefore continue to work with this term. Moreover, this restriction in causal interpretation does not mean that the concept is completely useless for causal inference. Indeed, as I discuss in §5, it can be an essential tool for recovering (at least partially) the causal structure of a time series *X* if used in the right way.

It is clear from the general definition given earlier that Granger intended the information to be chosen as large as possible such that, if available, all relevant variables are included. For example, if for some process *V* =(*X*,*Y*,*Z*) the history *V*^{t} comprises all information in the universe up to time *t*, Granger causality with respect to *V* coincides with the original general definition. As in that case *V* comprises all (distributionally relevant) direct structural causes of *Y*, we also have that *X* is Granger non-causal for *Y* given *V* if *X* does not directly structurally cause *Y*. Again, since direct structural causes need not be distributionally relevant, the converse implication does not hold in general.

Similar to direct structural causality, the concept of Granger causality aims to describe direct cause–effect relationships. Consequently, it can describe the effects of interventions in only a very limited way. More precisely, suppose that the process *V* =(*X*,*Y*,*Z*) is such that conditions (I1) and (I2) are satisfied for an intervention in *X*_{t}. Then, if *X* is Granger non-causal for *Y* with respect to *V*, there is no causal effect of intervening in *X*_{t} on *Y*_{t+1}, that is,

P(*Y*_{t+1} ≤ *y* | do(*X*_{t}=*x**)) = P(*Y*_{t+1} ≤ *y*) for all *y*.

I note that this result cannot be strengthened by considering interventions in all instances of *X*^{t}. For more information about such sequential interventions and relating their effects to quantities of the natural distribution, I refer to Eichler & Didelez [19].

### (d) Sims causality

The econometric literature features other less well-known probabilistic notions of causality that are related to Granger causality. Among these, the concept introduced by Sims [29] seems of most interest. In contrast to Granger causality, it takes into account not only direct but also indirect causal effects.

### Definition 2.6 (Sims non-causality)

The process *X* is Sims non-causal for the process *Y* with respect to *V* =(*X*,*Y*,*Z*) if {*Y*_{t′} | *t*′>*t*}⊥⊥*X*_{t} | *X*^{t−1},*Y*^{t},*Z*^{t} for all *t*; otherwise, we say that *X* is Sims causal for *Y* with respect to *V*.

If the variables in *Z* include all confounders for the dependence of *Y* on *X*, Sims non-causality describes the absence of a causal effect of *X* on *Y* at any lag. More precisely, consider a process *V* =(*X*,*Y*,*Z*) such that conditions (I1) and (I2) are satisfied for an intervention in *X*_{t}. Then *X* is Sims non-causal for *Y* with respect to *V* if and only if P(*Y*_{t′} ≤ *y* | do(*X*_{t}=*x**)) = P(*Y*_{t′} ≤ *y*) for all *t*′>*t* and all *y*, that is, *X* is non-causal for *Y* (with respect to an intervention in *X*_{t}).

## 3. Empirical concepts: Granger and Sims causality

Of the four concepts of causality presented in §2, the first two concepts—intervention and structural causality—are more of a theoretical nature in the sense that they provide a formal setting to describe causality and allow conclusions about how the system behaves, for instance, under certain interventions. Working with these concepts typically requires some knowledge about the causal structure; a typical application is the derivation of conditions under which the effect of an intervention can be predicted from data taken under the observational regime [19]. They are less suitable for the identification of causal relationships.

In contrast, the concepts of Granger and Sims causality are formulated in terms of statistical independence and therefore provide readily operational versions for statistical inference. Although both Granger and Sims causality apply generally to multiple time series of any type [21], empirical applications usually are confined to the framework of vector autoregressive (VAR) models. In the following, I will briefly review the empirical versions of Granger and Sims causality in this setting. For this purpose, I consider throughout this paper stationary Gaussian processes *X*=(*X*_{t}) that have an infinite autoregressive representation of the form

*X*_{t} = ∑_{u=1}^{∞} *A*_{u}*X*_{t−u} + *ε*_{t}, (3.1)

where the innovation process *ε*=(*ε*_{t}) is white noise with mean zero and non-singular covariance matrix *Σ*. Furthermore, the *d*×*d* matrices *A*_{u} (*d* being the dimension of the process *X*) are assumed to be square summable, which ensures the existence of a moving average representation for *X*. In the following, recall that *X*^{t}={*X*_{s} | *s*≤*t*} denotes the past of *X* up to time *t*.
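As an illustration of the model class (3.1), the following sketch simulates a stationary bivariate VAR(2) in which the first series drives the second but not conversely; the parameter values, and the use of Python/NumPy, are purely illustrative.

```python
import numpy as np

# Illustrative sketch: simulate a stationary bivariate VAR(2),
#   X_t = A_1 X_{t-1} + A_2 X_{t-2} + eps_t,
# in which series 1 drives series 2 but not vice versa
# (the (1,2) entry of every A_u is zero).  All values are made up.
rng = np.random.default_rng(0)

A = np.array([[[0.5, 0.0],     # A_1: X_1 depends only on its own past
               [0.4, 0.3]],    #      X_2 depends on X_1 and itself
              [[-0.2, 0.0],    # A_2
               [0.3, 0.2]]])
Sigma = np.eye(2)              # innovation covariance (white noise)

def simulate_var(A, Sigma, n, burn=200, rng=rng):
    """Draw n observations from the VAR defined by (A_1,...,A_p, Sigma)."""
    p, d = A.shape[0], A.shape[1]
    L = np.linalg.cholesky(Sigma)
    x = np.zeros((n + burn, d))
    for t in range(p, n + burn):
        eps = L @ rng.standard_normal(d)
        x[t] = sum(A[u] @ x[t - u - 1] for u in range(p)) + eps
    return x[burn:]            # discard the burn-in period

X = simulate_var(A, Sigma, n=2000)
```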

### (a) Granger causality

As noted in §2, the series *X*_{i} does not Granger-cause another series *X*_{j} with respect to *X* if *X*_{j,t} does not depend on the past values *X*_{i}^{t−1} conditional on the past of the remaining variables. In the above autoregressive representation, *X*_{j,t} is given by

*X*_{j,t} = ∑_{u=1}^{∞} ∑_{k=1}^{d} *A*_{jk,u}*X*_{k,t−u} + *ε*_{j,t}.

Hence, *X*_{j,t} does not depend on *X*_{i}^{t−1} if and only if *A*_{ji,u}=0 for all lags *u*>0. An alternative characterization of Granger causality is obtained by comparing the mean square prediction errors of predicting *X*_{j,t} from the full past *X*^{t−1} and the past of all components except *X*_{i}: *X*_{i} does not Granger-cause *X*_{j} with respect to the full process *X* if and only if var(*X*_{j,t} | *X*^{t−1}) = var(*X*_{j,t} | *X*^{t−1}∖*X*_{i}^{t−1}), where the conditional variance is taken to be the variance about the linear projection on the set of conditioning variables [30].

In order to learn about the causal structure of a system, we need to determine which variables Granger-cause which other variables. This can be most easily accomplished by testing whether the corresponding autoregressive coefficients are zero. More precisely, let *Â*_{ji,u} be the least-squares estimators obtained by fitting a VAR(*p*) model to the data. It is well known [31] that the coefficients *Â*_{ji,u}, *u*=1,…,*p*, are asymptotically jointly normally distributed with means *A*_{ji,u} and covariance matrix *V* =(*V*_{uv}) with entries *V*_{uv}=*σ*_{jj} *H*_{ii,uv} for *u*,*v*=1,…,*p*. Here, *H*_{ij,uv} is the entry at row *d*(*u*−1)+*i* and column *d*(*v*−1)+*j* of *R*_{p}^{−1}, the inverse of the covariance matrix of *X*_{t−1},…,*X*_{t−p}. Replacing *Σ* and *R*_{p} by estimates, the existence of a Granger causal effect of *X*_{i} on *X*_{j} can be tested by evaluation of the test statistic

*S*_{ji} = *n* ∑_{u,v=1}^{p} *Â*_{ji,u} (*V̂*^{−1})_{uv} *Â*_{ji,v},

where *n* denotes the sample size. Under the null hypothesis that *X*_{i} is Granger non-causal for *X*_{j} with respect to *X*, the test statistic *S*_{ji} is asymptotically *χ*^{2}-distributed with *p* degrees of freedom. In the case of non-Gaussian innovations, data-driven procedures for hypothesis testing based on surrogate data have been suggested as an alternative [32].
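The estimation and testing procedure just described can be sketched as follows. The least-squares setup and the simulated VAR(1) parameters are illustrative; the statistic is the Wald-type statistic described above, computed with (Z′Z)^{−1} playing the role of *R̂*_{p}^{−1}/*n* so that the factor *n* is absorbed.

```python
import numpy as np

# Sketch of the Wald test for Granger non-causality via least-squares
# estimation of a VAR(p).  Data are simulated from an illustrative
# bivariate VAR(1) in which X_1 drives X_2 but X_2 does not drive X_1.
rng = np.random.default_rng(1)
n, p, d = 3000, 1, 2
A1 = np.array([[0.5, 0.0],
               [0.4, 0.3]])
x = np.zeros((n + 100, d))
for t in range(1, n + 100):
    x[t] = A1 @ x[t - 1] + rng.standard_normal(d)
x = x[100:]

# Stack lagged regressors: row t contains (X_{t-1}, ..., X_{t-p}).
Y = x[p:]
Z = np.hstack([x[p - u: len(x) - u] for u in range(1, p + 1)])

# Least-squares estimates of the stacked (dp x d) coefficient matrix.
Ahat, *_ = np.linalg.lstsq(Z, Y, rcond=None)
resid = Y - Z @ Ahat
Sigma_hat = resid.T @ resid / (len(Y) - d * p)
H = np.linalg.inv(Z.T @ Z)          # plays the role of R_p^{-1} / n

def wald_stat(j, i):
    """Test H0: X_i is Granger non-causal for X_j (A_{ji,u} = 0 for all u)."""
    idx = [d * u + i for u in range(p)]   # positions of A_{ji,1..p}
    a = Ahat[idx, j]
    V = Sigma_hat[j, j] * H[np.ix_(idx, idx)]
    return float(a @ np.linalg.solve(V, a))  # ~ chi^2 with p d.o.f. under H0

S_21 = wald_stat(1, 0)   # X_1 -> X_2: large, since the link is real
S_12 = wald_stat(0, 1)   # X_2 -> X_1: small, no such link in the model
```

In practice, `S_21` would be compared with the *χ*² quantile with *p* degrees of freedom.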

In practice, one is not only interested in testing whether or not a variable *X*_{i} Granger-causes another variable *X*_{j} but also wants to quantify any deviations from the null hypothesis. As the autoregressive coefficients *A*_{ji,u} depend—as any regression coefficients—on the units of measurement of *X*_{i} and *X*_{j}, they are less suited for measuring the strength of a causal relationship. An alternative is given by the partial directed correlations *π*_{ji,u} proposed by Eichler [3], which are defined as the correlation between *X*_{j,t} and *X*_{i,t−u} after removing the linear effects of *X*^{t−1}∖{*X*_{i,t−u}}, and thus possess the usual scaling properties of correlations. They can be estimated by rescaling of the autoregressive estimates *Â*_{ji,u}; for details, I refer to Eichler [3].

In many applications in neuroscience, the time series of interest are characterized in terms of their frequency properties. The frequency domain description of a process *X* and the relationships among its components are based on the spectral representation of *X*, which is given by

*X*_{t} = ∫_{−π}^{π} e^{iλt} d*Z*_{X}(λ),

where *Z*_{X}(*λ*) is a ℂ^{d}-valued random process on [−*π*,*π*] with mean zero and orthogonal increments [33]. In this representation, the complex-valued random increments d*Z*_{Xi}(*λ*) indicate the frequency components of the time series *X*_{i} at frequency *λ*. Similarly, we obtain d*Z*_{ε}(*λ*) for the innovation process *ε*. The autoregressive representation of *X* implies that the frequency components of the processes *X* and *ε* are related by the linear equation system

*A*(λ) d*Z*_{X}(λ) = d*Z*_{ε}(λ), (3.2)

where *A*(λ) = *I* − ∑_{u=1}^{∞} *A*_{u} e^{−iλu} is the Fourier transform of the autoregressive coefficients *A*_{u}. The coefficient *A*_{ji}(*λ*) vanishes uniformly for all frequencies *λ*∈[−*π*,*π*] if and only if *X*_{i} is Granger non-causal for *X*_{j} with respect to the full process *X*. This means that the linear equation system (3.2) reflects the causal pathways by which the frequency components influence each other. More precisely, the complex-valued coefficient *A*_{ji}(*λ*) indicates how a change in the frequency component of the series *X*_{i} affects the frequency component of *X*_{j} if all other components are held fixed, that is, *A*_{ji}(*λ*) measures the direct causal effect of *X*_{i} on *X*_{j} at frequency *λ*.
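The Fourier transform of the autoregressive coefficients is straightforward to compute numerically. A minimal sketch, assuming the form *A*(λ) = *I* − ∑_{u} *A*_{u} e^{−iuλ} and illustrative coefficients with *A*_{12,u}=0 at all lags:

```python
import numpy as np

# Sketch: frequency domain coefficients A(lambda) of a VAR(p), taken
# here as A(lambda) = I - sum_u A_u exp(-i*u*lambda).  For the
# illustrative bivariate VAR(2) below, with A_{12,u} = 0 at all lags,
# the entry A_{12}(lambda) vanishes uniformly, reflecting Granger
# non-causality of X_2 for X_1 with respect to X.
A = np.array([[[0.5, 0.0],
               [0.4, 0.3]],
              [[-0.2, 0.0],
               [0.3, 0.2]]])

def A_freq(A, lam):
    """A(lambda) = I - sum_{u=1}^p A_u e^{-i u lambda}."""
    p, d = A.shape[0], A.shape[1]
    return np.eye(d) - sum(A[u - 1] * np.exp(-1j * u * lam)
                           for u in range(1, p + 1))

lams = np.linspace(-np.pi, np.pi, 201)
A12 = np.array([A_freq(A, lam)[0, 1] for lam in lams])  # zero everywhere
A21 = np.array([A_freq(A, lam)[1, 0] for lam in lams])  # non-zero
```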

There exist a number of frequency domain-based measures for Granger causality [34,35,17]. While they necessarily—in order to be correctly classified as Granger causality measures—vanish whenever the corresponding coefficients in the autoregressive representation are zero, they differ in the way deviations from zero are quantified. I also note that these measures usually are computed from the autoregressive estimates and therefore do not offer any advantage for testing for Granger causality.

Finally, I note that the autoregressive structure of *X* and hence the Granger causal relations among the components of *X* can be conveniently visualized by a path diagram [3,36]. In this graph—sometimes referred to as a *Granger causality graph* [20]—each node *i* represents one component *X*_{i}=(*X*_{i,t}) while directed edges *i*→*j* indicate that *X*_{i} Granger-causes *X*_{j} with respect to *X*. Additionally, dashed undirected edges are used to encode dependences among instances of the variables at the same point in time. More precisely, the edge *i*- - -*j* entails that *X*_{i} and *X*_{j} are contemporaneously correlated, that is, correlated after removing the linear effects of the past *X*^{t−1}. For a VAR process, this is equivalent to *σ*_{ij}≠0.
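Reading the edges of such a path diagram off known (or estimated) VAR parameters amounts to simple zero checks. A sketch with illustrative parameters; in practice, the exact-zero checks would be replaced by significance tests:

```python
import numpy as np

# Sketch: edges of a path diagram (Granger causality graph) read off
# from VAR parameters.  A directed edge i -> j is drawn when
# A_{ji,u} != 0 for some lag u; a dashed edge i --- j when the
# innovation covariance sigma_ij != 0.  Parameter values are made up.
A = np.array([[[0.5, 0.0, 0.0],
               [0.4, 0.3, 0.0],
               [0.0, 0.2, 0.4]]])          # one lag, three series
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
d = A.shape[1]

directed = {(i, j) for i in range(d) for j in range(d)
            if i != j and np.any(A[:, j, i] != 0)}
dashed = {(i, j) for i in range(d) for j in range(i + 1, d)
          if Sigma[i, j] != 0}
# Here: directed edges 1 -> 2 and 2 -> 3, one dashed edge 1 --- 2
# (0-based indices in the sets).
```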

### (b) Sims causality

Unlike the concept of Granger causality, which has seen an increased interest in applications from neuroscience and other fields, the approach by Sims seems to be widely unknown to researchers outside economics. In the following, I will relate this concept to some measures known in time-series analysis, while, in the next section, I will show its usefulness for causal inference.

Recall that *X*_{i} does not Sims-cause *X*_{j} if *X*_{j,t+h} does not depend on *X*_{i,t} conditional on all variables in *X*^{t} except *X*_{i,t} for all *h*>0. Thus, Sims causality can be expressed in terms of *h*-step predictions of *X*_{j},

E(*X*_{j,t+h} | *X*^{t}) = ∑_{u=1}^{∞} ∑_{k=1}^{d} *A*^{(h)}_{jk,u}*X*_{k,t+1−u}. (3.3)

Thus, *X*_{i} does not Sims-cause *X*_{j} if and only if *A*^{(h)}_{ji,1}=0 for all *h*>0. The coefficients of the *h*-step predictor can be computed recursively from the autoregressive coefficients *A*_{u} by [37]

*A*^{(h)}_{u} = *A*_{u+h−1} + ∑_{v=1}^{h−1} *A*_{v}*A*^{(h−v)}_{u}, *A*^{(1)}_{u} = *A*_{u}.

Expressing the coefficients only in terms of *A*_{u}, we obtain

*A*^{(h)}_{ji,1} = ∑_{r=1}^{h} ∑_{u_1+⋯+u_r=h} (*A*_{u_1} ⋯ *A*_{u_r})_{ji}.

This shows that *A*^{(h)}_{ji,1} accumulates the information flow from the direct link and all indirect links from *X*_{i} to *X*_{j}. This underlines our earlier remark that Sims causality can be viewed as a concept of total causality.

A different view on Sims causality can be obtained by noting that the *h*-step predictor can be derived by recursively substituting (3.1) for *X*_{t+h−1},…,*X*_{t+1} in the autoregressive representation of *X*_{t+h}, which yields

*X*_{t+h} = ∑_{v=0}^{h−1} *B*_{v}*ε*_{t+h−v} + ∑_{u=1}^{∞} *A*^{(h)}_{u}*X*_{t+1−u},

where *B*_{u} are the coefficients in the moving average representation of *X*. Substituting once more *X*_{t} by (3.1), we find that *A*^{(h)}_{1} = *B*_{h}, that is, the coefficients determining Sims causality are those of the moving average representation of *X*. Thus, we find that—less formally expressed—*X*_{i} does not Sims-cause *X*_{j} if an independent shock in *X*_{i} does not affect *X*_{j} at later points in time and hence that the *impulse-response function* *IRF*_{ji}(*u*)=*B*_{ji,u} is zero at all lags *u*. More generally, the ACE of setting *X*_{i,t} to *x** on *X*_{j,t+h} can be expressed in terms of the IRF by

ACE(*t*+*h*,*t*;*x**) = *IRF*_{ji}(*h*) *x** = *B*_{ji,h} *x**

for the zero-mean processes considered here.
I note that this causal interpretation of Sims causality and the IRF is based on assumptions (I1) and (I2) in §2. These hold if, first, the process *X* contains all relevant variables such that all lagged associations are truly causal and, second, there are no instantaneous causal relationships among the variables at the same point in time. If this latter assumption is violated, the additional instantaneous links also need to be taken into account. In econometrics, this is achieved by considering structural VAR processes and the corresponding orthogonalized IRF. In the discussion, I will briefly touch upon the problems that such instantaneous causalities pose for causal inference.
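The moving average coefficients, and hence the IRF, can be computed from the autoregressive coefficients by the standard recursion *B*_{0}=*I*, *B*_{h}=∑_{u=1}^{min(h,p)} *A*_{u}*B*_{h−u}. The chain structure in the following sketch is illustrative:

```python
import numpy as np

# Sketch: impulse-response function from the moving average
# coefficients B_h of a VAR(p), via the standard recursion
#   B_0 = I,  B_h = sum_{u=1}^{min(h,p)} A_u B_{h-u}.
# In the illustrative chain X_1 -> X_2 -> X_3 below, X_1 does not
# Granger-cause X_3 directly (A_{31,u} = 0), yet IRF_{31}(h) != 0 for
# h >= 2: Sims causality picks up the indirect pathway as well.
A = np.array([[[0.5, 0.0, 0.0],
               [0.4, 0.3, 0.0],
               [0.0, 0.3, 0.2]]])

def ma_coefficients(A, hmax):
    """Moving average matrices B_0, ..., B_hmax of the VAR given by A."""
    p, d = A.shape[0], A.shape[1]
    B = [np.eye(d)]
    for h in range(1, hmax + 1):
        B.append(sum(A[u - 1] @ B[h - u] for u in range(1, min(h, p) + 1)))
    return B

B = ma_coefficients(A, hmax=5)
irf_31 = [B[h][2, 0] for h in range(6)]  # shock in X_1, response of X_3
irf_13 = [B[h][0, 2] for h in range(6)]  # no pathway from X_3 to X_1
```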

In the frequency domain, the concept of Sims causality is related to the well-known directed transfer function (DTF) by Kamiński & Blinowska [38]. For this, let *H*(λ) = *A*(λ)^{−1} = ∑_{u=0}^{∞} *B*_{u} e^{−iλu} be the frequency domain IRF. Then, the DTF measuring the influence of variable *X*_{i} on variable *X*_{j} at frequency *λ* is defined as

DTF_{ji}(λ) = |*H*_{ji}(λ)| / (∑_{k=1}^{d} |*H*_{jk}(λ)|²)^{1/2}.

Thus, the DTF vanishes at all frequencies *λ* if and only if *X*_{i} does not Sims-cause *X*_{j} with respect to the process *X*. This clarifies the causal interpretation of the DTF as a frequency domain measure of Sims causality and not—as has been claimed previously [39,40]—of Granger causality.
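A numerical sketch of the DTF, assuming the normalized form |*H*_{ji}(λ)| / (∑_{k} |*H*_{jk}(λ)|²)^{1/2} with *H*(λ)=*A*(λ)^{−1} and *A*(λ) = *I* − ∑_{u} *A*_{u} e^{−iuλ}; the parameter values are illustrative:

```python
import numpy as np

# Sketch of the directed transfer function (DTF) for a VAR(p),
# assuming DTF_ji = |H_ji| / sqrt(sum_k |H_jk|^2) with
# H(lambda) = A(lambda)^{-1}.  Illustrative VAR(1): X_1 drives X_2.
A = np.array([[[0.5, 0.0],
               [0.4, 0.3]]])

def dtf(A, lam):
    """Normalized DTF matrix at frequency lam."""
    p, d = A.shape[0], A.shape[1]
    Alam = np.eye(d) - sum(A[u - 1] * np.exp(-1j * u * lam)
                           for u in range(1, p + 1))
    H = np.linalg.inv(Alam)                       # frequency domain IRF
    return np.abs(H) / np.sqrt((np.abs(H) ** 2).sum(axis=1, keepdims=True))

D = dtf(A, lam=0.5)
# D[1, 0] > 0: influence of X_1 on X_2; D[0, 1] = 0: none in reverse.
```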

Finally, I note that estimates for the IRF and related quantities such as the DTF are usually computed from the autoregressive estimates *Â*_{u}. An alternative approach has been proposed by Jordà [41] based on (3.3). In contrast to the commonly used approach, which uses the asymptotic *δ*-method for statistical inference, the new method delivers estimates of the impulse response by direct regression methods.

## 4. Spurious causality

One problem with the empirical notion of Granger causality is its dependence on the set of variables included in the VAR model. It is well known that the omission of variables that affect two or more of the observed variables can induce conditional dependences among the observed variables that are wrongly interpreted as causal relationships by a Granger causality analysis. For instance, Porta *et al.* [7] have shown that respiration must be taken into account when analysing the causal relationships between heart period and systolic arterial pressure. In this section, I discuss criteria that help us to distinguish between such so-called *spurious causalities* and true cause–effect relationships.

A first step in this direction is the paper by Hsiao [42], who discussed causal patterns for vector time series of three variables *X*, *Y* and *Z*, say. The general idea is that direct causes—described by Granger's general definition—persist regardless of the background information used for the analysis, whereas indirect as well as spurious causes can be identified by either adding new variables to the analysis or removing already included variables. Comparing the Granger causal relations in the full trivariate model and a bivariate submodel, Hsiao distinguishes between the following cases:

- *X* is a *direct cause* of *Y* if *X* Granger-causes *Y* in both the bivariate and the trivariate model;
- *X* is an *indirect cause* of *Y* if (i) *X* Granger-causes *Y* in the bivariate model but not in the full model and (ii) *X* Granger-causes *Z* and *Z* Granger-causes *Y* in the trivariate model;
- *X* is a *spurious cause of type II* for *Y* if (i) *X* Granger-causes *Y* in the bivariate model but not in the full model and (ii) *Z* Granger-causes *X* and *Y* in the trivariate model; and
- *X* is a *spurious cause of type I* for *Y* if (i) *X* Granger-causes *Y* in the full model but not in the bivariate model and (ii) *X* Granger-causes *Z* and *Z* Granger-causes *Y* in the trivariate model.
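Hsiao's case distinctions can be written as a small decision rule. The function below is a hypothetical sketch taking boolean Granger-causality indicators as input, with overlapping cases resolved in the order listed above:

```python
# Sketch of Hsiao's classification.  Inputs are boolean indicators of
# Granger causality from the bivariate model (X, Y) and the trivariate
# model (X, Y, Z); names and the tie-breaking order are our choices.
def hsiao_type(xy_biv, xy_tri, xz_tri, zy_tri, zx_tri):
    """xy_biv: X G-causes Y in the bivariate model; all other arguments
    refer to the trivariate model (xz: X->Z, zy: Z->Y, zx: Z->X)."""
    if xy_biv and xy_tri:
        return 'direct cause'
    if xy_biv and not xy_tri and xz_tri and zy_tri:
        return 'indirect cause'
    if xy_biv and not xy_tri and zx_tri and zy_tri:
        return 'spurious cause of type II'
    if xy_tri and not xy_biv and xz_tri and zy_tri:
        return 'spurious cause of type I'
    return 'unclassified'   # configurations outside the four listed cases
```

For example, `hsiao_type(True, False, True, True, False)` labels *X* an indirect cause of *Y*.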

A generalization of this classification approach to cases with more than three variables seems very complicated owing to the increasing number of configurations that need to be considered. Even for three variables, the above-mentioned four situations do not cover all possible situations; an example of such a system not covered by the above-mentioned cases will be studied later in this section.

Furthermore, it is clear that Hsiao's definition of different types of causation suffers from the same defects as the empirical notion of Granger causality, since only the information of the three variables *X*, *Y* and *Z* is taken into account. For instance, a ‘direct cause’ in a trivariate model might turn out to be indirect or even spurious (of type II) when adding further variables. Therefore, the term ‘cause’ in the above-mentioned definitions must be interpreted with some caution, and it would be more correct to speak of direct, indirect and spurious Hsiao causality.

Despite these shortcomings, Hsiao's definitions of spurious causality are of much interest for the final task of causal identification, as these phenomena provide important information about the underlying true causal structure. Therefore, I first discuss the two types of spurious causality in more detail.

The intuitive understanding of spurious causalities is that the omission of relevant variables induces associations among lagged variables that then—because of temporal precedence—are wrongly interpreted as causal relationships. If the relevant variables are added, the spuriously causal relationships vanish. This is expressed in the spurious causalities of type II, which can be detected by enlarging the set of variables. To avoid spurious causalities of this type, a multivariate VAR model that includes all available relevant variables should be used. For spurious causalities induced by further latent variables, however, this characterization provides no help, and such causalities remain undetected.

In contrast, spurious causalities of type I are present in a large model and vanish only if the number of variables is reduced. According to the above-mentioned definition, a spuriously causal relationship of this type requires two causal pathways, namely one direct link and one indirect link via the third variable *Z* (figure 1*a*). However, as the causal relationship vanishes in the bivariate model, the effects of these two causal pathways on *Y* must cancel out and the total causal effect of *X* on *Y* is zero. This means that we can detect spurious causalities of type I by testing for Sims non-causality in the full model. This is illustrated in figure 2, which compares measures for bivariate and trivariate Granger causality as well as for trivariate Sims causality for three of the four cases considered by Hsiao. For the network with a spurious causality of type I, both the IRF and the bivariate autoregressive coefficients vanish.
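The cancellation underlying a spurious causality of type I can also be reproduced in simulation. The following sketch is illustrative only (the helper `granger_pvalue` is a plain OLS F-test with two lags, and the coefficients are chosen so that the direct effect −0.4 exactly offsets the indirect effect 0.8 × 0.5 via *Z*); *X* then Granger-causes *Y* in the trivariate model but not in the bivariate one:

```python
import numpy as np
from scipy import stats

def lagmat(s, p):
    """Columns s[t-1], ..., s[t-p] for t = p, ..., n-1."""
    n = len(s)
    return np.column_stack([s[p - k - 1 : n - k - 1] for k in range(p)])

def granger_pvalue(y, x, conditioning=(), p=2):
    """F-test p-value for H0: the past of x does not improve the
    prediction of y, given the past of y and of the conditioning series."""
    target = y[p:]
    Zr = np.column_stack([np.ones(len(target))]
                         + [lagmat(s, p) for s in (y, *conditioning)])
    Zf = np.column_stack([Zr, lagmat(x, p)])
    rss = lambda Z: ((target - Z @ np.linalg.lstsq(Z, target, rcond=None)[0]) ** 2).sum()
    df2 = len(target) - Zf.shape[1]
    F = (rss(Zr) - rss(Zf)) / p / (rss(Zf) / df2)
    return 1 - stats.f.cdf(F, p, df2)

# Direct path X -> Y (coefficient -0.4 at lag 2) is exactly cancelled by the
# indirect path X -> Z -> Y (0.8 * 0.5 = 0.4 at lag 2): zero total effect.
rng = np.random.default_rng(2)
n = 3000
x = rng.standard_normal(n)
z = np.zeros(n)
y = np.zeros(n)
for t in range(2, n):
    z[t] = 0.8 * x[t - 1] + 0.3 * rng.standard_normal()
    y[t] = -0.4 * x[t - 2] + 0.5 * z[t - 1] + 0.3 * rng.standard_normal()

p_tri = granger_pvalue(y, x, (z,))  # X Granger-causes Y in the full model
p_bi = granger_pvalue(y, x)         # but not bivariately: the paths cancel
```

Substituting the *Z*-equation into the *Y*-equation shows that *Y* is in fact independent of the entire past of *X*, which is what the bivariate test (and, equivalently, Sims non-causality in the full model) picks up.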

Another far more plausible explanation for the phenomenon of spurious causality of type I is the network in figure 1*b* with an added fourth—unobserved—variable *U* that affects both *X* and *Y*. It has been shown elsewhere [3,20,43] that processes with autoregressive coefficients constrained by this graph show a spurious causality of type I without additional constraints on the parameters. In contrast, the cancellation of direct and indirect effects restricts the parameters to a lower-dimensional subset of the parameter space.

The main problem of causal inference is that in a larger network both types of spurious causalities might occur. Consider the network in figure 3*a*, with one latent variable *U*. Here *X* Granger-causes *Y* if *Z* is included in the analysis, owing to the spurious causality of type I. Because of the additional confounding by variable *Z*, variable *X* still Granger-causes *Y* if *Z* is omitted. According to Hsiao, *X* is thus a direct cause of *Y* even though the total causal effect is zero and manipulating *X* would be without consequences for *Y*. In contrast to Granger causality, which fails to detect this combination of spurious causalities of type I and II, the IRF indicates correctly that *X* is not Sims causal for *Y* with respect to (*X*,*Y*,*Z*).

Finally, I note that for other networks correct causal identification requires the analysis of submodels. Figure 3*b* provides an example where there is no causal link from *X* to *Y* but *X* Granger-causes as well as Sims-causes *Y* with respect to the trivariate process. Only a bivariate analysis shows that *X* does not Granger-cause *Y* (and the same holds true for Sims causality).

## 5. Causal identification

In the previous section, I described the problems in the empirical application of Granger and Sims causality. Despite the criticism that—in view of these problems—Granger causality is unsuited for causal inference, our examples have also shown that both Granger and Sims causality might contain some information about the network structure. In this section, I will review an approach for the (partial) identification of causal structures that uses these insights. For better readability, I avoid theoretical details such as graph-theoretic terminology and concentrate on the basic ideas; for a more detailed description of the approach, I refer the reader to the original articles [43,20].

The basic idea of the approach is to determine the set of all causal structures that are consistent with the empirically found Granger causal or Sims causal relationships. In contrast to the standard Granger causality analysis, we evaluate not only the full multivariate model, including all available variables, but also all possible submodels to detect spurious causalities of type I. Only if every causal structure that is consistent with the empirical findings from all these models features a causal link between two variables is this link identified as truly causal.

To motivate the approach, consider two variables *X* and *Y* such that *X* is Granger causal for *Y*. Without any further information, it is impossible to determine whether the empirically found association between *X* and *Y* is due to a true causal link or induced by some latent confounding variable. In contrast, if we have an additional variable *Z* such that *Z* is an indirect cause of *Y* in the sense of Hsiao, the link from *X* to *Y* must be truly causal. Similarly, if *Z* satisfies Hsiao's conditions for a spurious cause of type I for *Y*, the link from *X* to *Y* cannot be truly causal but must be induced by a confounding latent variable. I note that in both cases—without further information—nothing can be said about the nature of the link from *Z* to *X*.

Like similar approaches for ordinary multivariate data [10,13], the approach is based on two fundamental assumptions: the *causal Markov condition* and the *stability* or *faithfulness condition*. The causal Markov condition states that all dependences among the observed variables are due to the causal structure. This rules out time series that become dependent because of a common deterministic trend. The faithfulness condition states that, conversely, all independences among the variables are implied by the causal structure. In particular, this rules out that two or more causal links cancel each other out because of a particular choice of the parameters. This means that, of the two possible explanations for spurious causalities of type I, we accept only the second one, involving an additional latent confounding variable. Consequently, the detection of a spurious causality of type I allows us to identify a spurious causality of type II without knowing the confounding variable.

The identification algorithm itself consists of two steps. In the first step, all potential direct causal links among the components of a time series *X* are determined. Here, a variable *X*_{i} is said to be a potential cause of another variable *X*_{j} if *X*_{i} Granger-causes and Sims-causes *X*_{j} with respect to all processes *X*_{S} with *i*,*j* ∈ *S*. More precisely, one also needs to consider certain combinations of Sims and Granger causality; below I give the precise condition but refer for details to the original work. The network of identified potential causes can be visualized by connecting two nodes *i* and *j* representing time series *X*_{i} and *X*_{j} by a dotted arrow *i* ⇢ *j* whenever *i* is a potential cause of *j*.
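The phrase 'with respect to all processes *X*_{S} with *i*,*j* ∈ *S*' translates into one test per subset. A small combinatorial sketch (my own; variable labels hypothetical) shows how quickly the number of required tests grows:

```python
from itertools import combinations

def conditioning_sets(V, a, b):
    """All subsets S of V with a, b in S, listed by the variables
    conditioned on besides a and b (the empty tuple is the bivariate model)."""
    rest = [v for v in V if v not in (a, b)]
    return [extra for r in range(len(rest) + 1)
            for extra in combinations(rest, r)]

V = [1, 2, 3, 4]
sets = conditioning_sets(V, 1, 3)
# 2^(len(V)-2) = 4 submodels per ordered pair; with 4 variables there are
# 4 * 3 = 12 ordered pairs, hence 48 Granger tests in total.
n_tests = len(V) * (len(V) - 1) * len(sets)
```

The exponential growth in the number of submodels is the source of the feasibility constraints discussed later: each ordered pair requires 2^(|V|−2) Granger tests, before any Sims tests are counted.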

The second step consists of trying to identify the exact nature of a potential causal link—whether it is a true or a spurious causal link. To achieve this, I make use of the fundamental difference between indirect causes and spurious causes of type I. As we have seen in §4, if *X*_{i} indirectly causes *X*_{j} with mediating variable *X*_{k}, then the causal link via *X*_{k} is broken by including *X*_{k} in the analysis. In the case of a spurious causality of type I, we find the opposite behaviour: the causal link from *X*_{i} to *X*_{j} induced by the variable *X*_{k} vanishes if *X*_{k} is omitted. Thus, depending on whether *X*_{k} needs to be included or omitted to make *X*_{i} Granger non-causal for *X*_{j}, we can classify the potential cause *X*_{k} of *X*_{j} as either a true cause or a spurious cause. Correspondingly, the edge *k* ⇢ *j* in the graph is replaced by *k* → *j* in the case of a true cause or by *k* ↔ *j* in the case of a spurious cause.

I have left out a few details such as contemporaneous associations, which are represented by dashed undirected edges (*i* - - - *j*). Furthermore, one also needs to consider certain combinations of Sims and Granger causality to identify potential causes. The exact condition is given below; for details, I again refer to the original papers.

### Definition 5.1 (Identification of adjacencies)

- Insert *a* - - - *b* whenever *X*_{a} and *X*_{b} are not contemporaneously independent with respect to *X*_{V}.
- Insert *a* ⇢ *b* whenever the following two conditions hold: (i) *X*_{a} Granger-causes *X*_{b} with respect to *X*_{S} for all *S* ⊆ *V* with *a*,*b* ∈ *S*; (ii) *X*_{a,t−k} and *X*_{b,t+1} are not conditionally independent given the pasts of *X*_{S_1} up to time *t* and of *X*_{S_2} up to time *t*−*k*, for some *k* ≥ 0, all *t*, and all disjoint *S*_{1},*S*_{2} ⊆ *V* with *b* ∈ *S*_{1} and *a* ∉ *S*_{1} ∪ *S*_{2}.

### Definition 5.2 (Identification of tails)

- *Spurious causality of type I*: Suppose that *G* does not contain *a* ⇢ *b*, *a* → *b*, or *a* ↔ *b*. If *a* ⇢ *c* ⇢ *b* and *X*_{a} is Granger non-causal for *X*_{b} with respect to *X*_{S} for some set *S* with *c* ∉ *S*, replace *c* ⇢ *b* by *c* ↔ *b*.
- *Indirect causality*: Suppose that *G* does not contain *a* ⇢ *b*, *a* → *b*, or *a* ↔ *b*. If *a* ⇢ *c* ⇢ *b* and *X*_{a} is Granger non-causal for *X*_{b} with respect to *X*_{S} for some set *S* with *c* ∈ *S*, replace *c* ⇢ *b* by *c* → *b*.
- *Ancestors*: If *a* ∈ an(*b*), replace *a* ⇢ *b* by *a* → *b*.
- *Discriminating paths*: A fourth rule is based on the concept of discriminating paths. For details, I refer to Ali *et al.* [44].
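The first two orientation rules can be rendered schematically as follows. This is a deliberately simplified sketch: the function name and the representation of the non-causality findings are my own, and all graph bookkeeping of the actual algorithm is omitted.

```python
def orient_edge(a, b, c, noncausal_sets):
    """Classify the potential edge c -> b on a path a ~> c ~> b, given the
    sets S (each containing a and b) with respect to which X_a was found
    Granger non-causal for X_b.  First matching rule wins in this sketch."""
    if any(c not in S for S in noncausal_sets):
        return "spurious"   # type I: the link vanishes when c is omitted
    if any(c in S for S in noncausal_sets):
        return "indirect"   # the link is blocked by including c
    return "undecided"
```

For example, on the path 1 ⇢ 2 ⇢ 3, finding series 1 Granger non-causal for series 3 in the full trivariate model classifies the edge 2 ⇢ 3 as indirect (hence 2 → 3), whereas non-causality in the bivariate submodel marks it as spurious.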

Similar identification algorithms exist for the case of ordinary multivariate variables [10,13,45]. I further point out that the resulting graph is—even if all edges were classified as true causal or spurious causal—only identified up to Markov equivalence. This means that other causal structures, which imply the same set of Granger causal relations, are equally valid answers to the identification problem. As a consequence, a directed edge in the graph cannot necessarily be interpreted as a direct causal link; it can only be shown that there must be at least one true causal pathway in the underlying causal structure.

For an illustration of the identification algorithm, I apply it to neuronal spike train data recorded from the lumbar spinal dorsal horn of a pentobarbital-anaesthetized rat during noxious stimulation. The firing times of 10 neurons were recorded simultaneously by a single electrode with an observation time of 100 s. The connectivity among the recorded neurons has been analysed previously by partial correlation analysis [46] and partial directed correlations [47].

For the analysis, I converted the spike trains of four neurons to binary time series. For the full as well as each trivariate and bivariate submodel, a VAR model of order *p*=50 was fitted and the associated Granger causality graph determined by a series of significance tests adjusted to have a false discovery rate of 5 per cent [48]. The resulting graphs are shown in figure 4. From these graphs, we first identify the causal adjacencies depicted in figure 5*a*; the adjacencies coincide with those of the Granger causality graph of the full model, which means that there are no spurious causalities of type I. Next, we try to determine the type of the edges by the rules of the identification algorithm. First, the graphs in figure 4*a* and 4*b* (or equivalently those in figure 4*b* and 4*g*) show that the first series causes the third series indirectly, which identifies the edge 2 ⇢ 3 as 2 → 3. For the edge 2 ⇢ 4, a similar reasoning is not possible because the edge 1 ⇢ 4 prevents separation of nodes 1 and 4. The type of the remaining two edges cannot be identified as there are no edges with arrowheads at node 1, and hence none of the rules of the algorithm can be applied. Thus, we obtain the graph in figure 5*b* as the final result of the identification algorithm, with only one true causal link identified.
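The adjustment to a false discovery rate of 5 per cent can be carried out, for instance, with the Benjamini–Hochberg step-up procedure; this is shown only as an illustration, and the exact variant used in the application may differ. A minimal sketch:

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure: returns a boolean array
    indicating which hypotheses are rejected at false discovery rate q."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    # compare sorted p-values against the step-up thresholds q * i / m
    thresh = q * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    reject = np.zeros(m, dtype=bool)
    if below.any():
        kmax = np.max(np.nonzero(below)[0])   # largest i with p_(i) <= q*i/m
        reject[order[: kmax + 1]] = True      # reject all smaller p-values too
    return reject

# Illustrative p-values from a batch of Granger causality tests.
reject = benjamini_hochberg([0.001, 0.9, 0.02, 0.01], q=0.05)
```

Applied to the p-values of all Granger causality tests within one (sub)model, the rejected hypotheses give the edges of the corresponding Granger causality graph.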

## 6. Discussion of challenges and problems

Studying the applied literature on Granger causality, we are confronted with conflicting views: some favour pairwise analysis while others insist on a full multivariate approach, and some claim that Granger causality is not suited for causal inference at all while others give Granger causality a causal interpretation without much further thought.

This review has shown that the truth lies somewhere in between and—as is usually the case—is more complicated than the simple alternatives suggest. Causal inference based on Granger causality is indeed legitimate, but in many cases it provides only sparse identification of true causal relationships; that is, for most links among the variables it cannot be determined whether the link is truly causal or not. Moreover, instead of examining Granger causalities between pairs of variables or just in the full model, correct learning accumulates knowledge obtained from the large variety of possible submodels. This imposes feasibility constraints on the size of the networks that can be practically analysed. To summarize, our discussion shows that any analysis claiming full identification of the causal structure either must be based on very strong assumptions or prior information or—on closer inspection—turns out to be unwarranted.

Besides algorithmic and computational problems, there are further challenges to causal inference from time-series data:

- *Uncertainty in causal inference*. In most applications, causal inference is based on multiple tests to establish either multivariate or bivariate Granger causal relationships. A complete analysis of the causal structure requires the evaluation of a Granger causal relationship not only for the full multivariate process and pairs of series but also for all possible subprocesses. Consequently, the number of tests required to obtain the list of Granger non-causality relationships increases many times over. The wrong inclusion or exclusion of a directed edge at one level has not only a local but also a global effect, owing to the classification of edges based on the global Markov property. I am not aware of any approaches that explicitly take this uncertainty into account, which limits the identification of causal structures (and hence effective connectivity) further.

- *Instantaneous causality*. In our discussion of causal inference, I have disregarded the possibility of the so-called instantaneous causality, that is, causal links between variables at the same point in time. The main problem with the identification of instantaneous causality is the lack of directional information. In the case of Granger and Sims causality, the direction of the prima facie causal link is given by the temporal ordering. In the absence of such temporal precedence, the direction of the cause–effect relationship could be determined by additional subject-matter information or by graph-based identification methods. So far, approaches exist only for the identification of the instantaneous causal structure on its own [49,50,51,52,53].

- *Aggregation*. One possible explanation for instantaneous causality is the aggregation of variables over time (see the detailed discussion of instantaneous causality in [16]). For a correct evaluation of causal effects, it thus becomes necessary to include instantaneous causal structures in the modelling. A major problem arises in the case of feedback relationships between variables, as the standard causal models for multivariate observations are based on directed acyclic graphs and thus cannot accommodate cycles. It is not clear how these situations can be graphically represented and modelled.

- *Non-stationarity*. The standard framework for Granger causality analysis is stationary VAR models. Recordings taken to measure brain activity in behavioural situations are likely to violate the assumption of stationarity. Although there exist approaches for fitting VAR models with time-varying coefficients, tests for Granger causality require some kind of aggregation of the time-varying coefficients over time. Another problem arises if the causal structure itself is not stable over time. I am not aware of any approaches that deal with these problems.

## Footnotes

One contribution of 13 to a Theme Issue ‘Assessing causality in brain dynamics and cardiovascular control’.

- © 2013 The Author(s) Published by the Royal Society. All rights reserved.