## Abstract

An outstanding challenge in neuroscience is to develop theoretically grounded and practically applicable quantitative measures that are sensitive to conscious level. Such measures should be high for vivid alert conscious wakefulness, and low for unconscious states such as dreamless sleep, coma and general anaesthesia. Here, we describe recent progress in the development of measures of dynamical complexity, in particular *causal density* and *integrated information*. These and similar measures capture in different ways the extent to which a system's dynamics are simultaneously differentiated and integrated. Because conscious scenes are distinguished by the same dynamical features, these measures are good candidates for reflecting conscious level. After reviewing the theoretical background, we present new simulation results demonstrating similarities and differences between the measures, and we discuss remaining challenges in the practical application of the measures to empirically obtained data.

## 1. Introduction

A key objective for consciousness science is to develop and test what can be called ‘explanatory correlates’: neural processes that not only *correlate with* but also *account for* fundamental properties of conscious experience [1]. One such property is that conscious scenes are simultaneously *integrated* (i.e. they are experienced ‘all of a piece’) and *differentiated* (i.e. they are composed of many different parts such that each conscious scene is one among a vast repertoire of possible scenes). Having a measure of conjoined integration and differentiation (more generally, *dynamical complexity*) in neural dynamics could account for this fundamental property of consciousness, in much the same way that measures of synchrony and coherence may account for and not merely correlate with the binding of different modalities in visual perception [2,3]. Such measures would most readily apply to conscious *level* (a position on a scale from total unconsciousness as in brain death or coma to full alert awake consciousness) rather than conscious *contents* (the components or qualia comprising a given conscious scene), though extension to the latter case represents an important objective [4].

Over recent years, several measures of dynamical complexity have been proposed and related to consciousness. These include *neural complexity*, *causal density* and *integrated information* (*Φ*) [5–7]. However, convincing application of these measures to experimental data remains challenging. Our aim in this paper is to review the theory underpinning causal density (CD) and integrated information, emphasizing recent work encouraging practical application (we discuss neural complexity only briefly). We describe simulations investigating the relations between the different measures, and we discuss some remaining obstacles to their efficient and interpretable use in common neuroimaging contexts such as magneto/electroencephalography (M/EEG) and functional magnetic resonance imaging (fMRI).

### (a) A note on notation

We use a standard mathematical vector/matrix notation in which bold type generally denotes vector quantities and upper-case type denotes matrices or random variables, according to context (for random variables, lower case is used to indicate a particular realization). All vectors are considered to be *column* vectors. ‘⊕’ denotes *vertical concatenation*, so that for $\mathbf{x}=(x_1,\dots,x_n)^{T}$ and $\mathbf{y}=(y_1,\dots,y_m)^{T}$, $\mathbf{x}\oplus\mathbf{y}$ is the vector $(x_1,\dots,x_n,y_1,\dots,y_m)^{T}$, where ‘T’ denotes the transpose operator. Given random vectors $\mathbf{X}$ and $\mathbf{Y}$, we denote by $\Sigma(\mathbf{X})$ the $n\times n$ matrix of covariances $\mathrm{cov}(X_i,X_j)$, and by $\Sigma(\mathbf{X},\mathbf{Y})$ the $n\times m$ matrix of cross-covariances $\mathrm{cov}(X_i,Y_\alpha)$. We shall make use of the quantity

$$\Sigma(\mathbf{X}\mid\mathbf{Y}) \equiv \Sigma(\mathbf{X}) - \Sigma(\mathbf{X},\mathbf{Y})\,\Sigma(\mathbf{Y})^{-1}\,\Sigma(\mathbf{X},\mathbf{Y})^{T}\tag{1.1}$$

which we call the partial covariance of $\mathbf{X}$ given $\mathbf{Y}$ [8]. (If $\mathbf{X}$ and $\mathbf{Y}$ are both multi-variate Gaussian variables, then the partial covariance $\Sigma(\mathbf{X}\mid\mathbf{Y})$ is precisely the covariance matrix of the conditional variable $\mathbf{X}\mid\mathbf{Y}=\mathbf{y}$, for any $\mathbf{y}$.) If $\mathbf{X}_t$ is a random vector in discrete time, we use

$$\mathbf{X}^{(p)}_t \equiv \mathbf{X}_t\oplus\mathbf{X}_{t-1}\oplus\cdots\oplus\mathbf{X}_{t-p+1}$$

to denote $\mathbf{X}_t$ itself, along with $p-1$ lags. Given the lag $p$, we often use the shorthand $\mathbf{X}^{-}_t \equiv \mathbf{X}^{(p)}_{t-1}$ for the lagged variable, and drop the subscript ‘$t$’ if there is no confusion. Other notations will be introduced as they appear.
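To make definition (1.1) concrete, here is a minimal NumPy sketch, assuming the joint covariance matrix of all variables is already available. The function name `partial_covariance` and its index-list interface are illustrative choices, not part of the original text.

```python
import numpy as np

def partial_covariance(sigma, ix, iy):
    """Partial covariance Sigma(X|Y) of eq. (1.1).

    sigma  : joint covariance matrix of all variables
    ix, iy : indices of the components of X and of Y within sigma
    """
    s_xx = sigma[np.ix_(ix, ix)]          # Sigma(X)
    s_xy = sigma[np.ix_(ix, iy)]          # Sigma(X, Y)
    s_yy = sigma[np.ix_(iy, iy)]          # Sigma(Y)
    # Sigma(X) - Sigma(X,Y) Sigma(Y)^{-1} Sigma(X,Y)^T, via a linear solve
    return s_xx - s_xy @ np.linalg.solve(s_yy, s_xy.T)

# scalar example: var(X) = 2, var(Y) = 1, cov(X, Y) = 1
sigma = np.array([[2.0, 1.0],
                  [1.0, 1.0]])
print(partial_covariance(sigma, [0], [1]))   # [[1.]]
```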

## 2. Causal density

### (a) Granger causality and univariate causal density

Causal density is a measure of the overall causal interactivity sustained by a system. It leverages the econometric concept of Granger causality (G-causality), which is a measure of causal influence based on time-series inference. According to G-causality, given variables *X* and *Y*, *Y* G-causes *X* if, in an appropriate statistical sense, *Y* assists in predicting the future of *X* beyond the degree to which *X* already predicts its own future. In the more general, conditional case [9], *Y* is said to G-cause *X*, conditional on *Z*, if *Y* assists in predicting the future of *X* beyond the degree to which *X* and *Z* together already predict the future of *X*.

Given a set of time series, G-causality is typically implemented using the framework of linear autoregression. To measure the G-causality from *Y* (‘predictor’ variable) to *X* (‘predictee’ variable) given *Z* (conditional variable), we compare the following multi-variate autoregressive (MVAR) models [10]:
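$$\begin{aligned} X_t &= A\cdot\left(X^{(p)}_{t-1}\oplus Z^{(r)}_{t-1}\right) + \varepsilon_t,\\ X_t &= A'\cdot\left(X^{(p)}_{t-1}\oplus Y^{(q)}_{t-1}\oplus Z^{(r)}_{t-1}\right) + \varepsilon'_t.\end{aligned}\tag{2.1}$$

Thus, the ‘predictee’ variable *X* is regressed firstly on the previous *p* lags of itself plus *r* lags of the conditioning variable *Z* and secondly, in addition, on *q* lags of the predictor variable *Y* (*p*, *q* and *r* can be selected according to the Akaike or Bayesian information criterion [11]). The magnitude of the G-causality interaction is then given by the logarithm of the ratio of the residual variances,

$$\mathcal{F}_{Y\to X\mid Z} \equiv \ln\frac{\mathrm{var}(\varepsilon_t)}{\mathrm{var}(\varepsilon'_t)} = \ln\frac{\Sigma(X\mid X^-\oplus Z^-)}{\Sigma(X\mid X^-\oplus Y^-\oplus Z^-)}\tag{2.2}$$

where the final term expresses G-causality in terms of partial covariances. The statistical significance of a G-causality value can be assessed either by a *χ*^{2} test on $\mathcal{F}_{Y\to X\mid Z}$ (suitably scaled by the sample size) or by examining the *F*-statistic for the regressions (2.1), in either case using appropriate corrections for multiple comparisons (alternatively, permutation or bootstrap resampling can be used) [11].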
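The two regressions (2.1) and the log-ratio (2.2) can be sketched with ordinary least squares as below. This is an illustrative sketch only, assuming NumPy, a common model order `p` for *X*, *Y* and *Z* (rather than separately selected *p*, *q*, *r*) and an intercept in each regression; the helper names `lagged`, `resid_var` and `granger` are hypothetical, and no significance testing is attempted.

```python
import numpy as np

def lagged(x, p):
    """Rows t = p..T-1 of [x_{t-1}, ..., x_{t-p}] for a (T, d) array x."""
    T = x.shape[0]
    return np.hstack([x[p - k:T - k] for k in range(1, p + 1)])

def resid_var(target, design):
    """Residual variance of an OLS regression (with intercept)."""
    d = np.column_stack([np.ones(len(design)), design])
    beta, *_ = np.linalg.lstsq(d, target, rcond=None)
    return np.var(target - d @ beta)

def granger(x, y, z=None, p=2):
    """F_{Y->X|Z} of eq. (2.2), using the same lag p for X, Y and Z."""
    x2 = x.reshape(len(x), -1)
    own = x2 if z is None else np.column_stack([x2, z.reshape(len(x), -1)])
    target = x2[p:, 0]
    restricted = lagged(own, p)                                   # past of X (and Z)
    full = np.column_stack([restricted, lagged(y.reshape(len(y), -1), p)])  # ... plus past of Y
    return np.log(resid_var(target, restricted) / resid_var(target, full))

# toy system in which Y drives X but not vice versa
rng = np.random.default_rng(0)
T = 5000
x, y = np.zeros(T), np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * y[t - 1] + rng.standard_normal()
    x[t] = 0.5 * x[t - 1] + 0.4 * y[t - 1] + rng.standard_normal()

f_yx = granger(x, y)
f_xy = granger(y, x)
print(f_yx, f_xy)
```

In this toy system *Y* drives *X* but not the reverse, so `f_yx` should come out clearly positive and `f_xy` close to zero.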

Given a set of G-causality values among elements of a system **X**, a simple version of causal density (CD) can be defined as the average of all pairwise G-causalities between elements (conditioning on all remaining elements),

$$\mathrm{CD}(\mathbf{X}) \equiv \frac{1}{n(n-1)}\sum_{i\neq j}\mathcal{F}_{X_j\to X_i\mid\mathbf{X}_{[ij]}}\tag{2.3}$$

where $\mathbf{X}_{[ij]}$ denotes the subsystem of **X** with variables *X*_{i} and *X*_{j} omitted, and *n* is the total number of variables. Causal density provides a principled measure of dynamical complexity inasmuch as elements that are completely independent will score zero, as will elements that are completely integrated in their dynamics. High values will only be achieved when elements behave somewhat differently from each other, in order to contribute novel potential predictive information, and at the same time are globally integrated, so that the potential predictive information is in fact useful [7,12].
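A sketch of the pairwise-conditional causal density (2.3), assuming NumPy, OLS on demeaned data without an intercept, and a single model order `p` for all variables; `causal_density` and its helpers are hypothetical names. For each ordered pair, the restricted model is obtained by dropping every lag of the predictor *X*_{j} from the full regression on the whole system's past.

```python
import numpy as np
from itertools import permutations

def lagged(x, p):
    """Rows t = p..T-1 of [x_{t-1}, ..., x_{t-p}] for a (T, d) array x."""
    T = x.shape[0]
    return np.hstack([x[p - k:T - k] for k in range(1, p + 1)])

def resid_var(target, design):
    """Mean squared residual of an OLS regression (no intercept)."""
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return np.mean((target - design @ beta) ** 2)

def causal_density(data, p=2):
    """CD(X) of eq. (2.3): mean of F_{X_j -> X_i | X_[ij]} over all ordered pairs i != j.

    data : (T, n) array of n time series; p : model order used for every variable.
    """
    data = data - data.mean(axis=0)      # demeaned, so no intercept column is used
    n = data.shape[1]
    past = lagged(data, p)               # column (k-1)*n + v holds lag k of variable v
    total = 0.0
    for i, j in permutations(range(n), 2):
        target = data[p:, i]
        keep = [c for c in range(past.shape[1]) if c % n != j]   # drop all lags of X_j
        total += np.log(resid_var(target, past[:, keep]) / resid_var(target, past))
    return total / (n * (n - 1))

rng = np.random.default_rng(1)
T, n = 4000, 4
ring = 0.4 * np.eye(n) + 0.3 * np.roll(np.eye(n), 1, axis=1)   # unidirectional ring
coupled = np.zeros((T, n))
for t in range(1, T):
    coupled[t] = ring @ coupled[t - 1] + rng.standard_normal(n)
independent = rng.standard_normal((T, n))
cd_ind, cd_cpl = causal_density(independent), causal_density(coupled)
print(cd_ind, cd_cpl)
```

Independent series give a CD close to zero (only estimation bias remains), whereas the coupled ring gives a clearly larger value.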

### (b) Multi-variate G-causality and extended causal density

As with most time-series measures, G-causality is standardly assessed between single (univariate) variables, perhaps conditioned on a set of other variables. However, relevant causal interactions within a system may take place between *groups* of variables. For example, in neural systems, one may wish to examine causal interactions among ‘Hebbian’ ensembles of neurons [13] or, at a macroscopic level, among networks of regions-of-interest (ROIs) distributed throughout the brain. More generally, measured variables (observables) are constrained by methods of data acquisition and need not map cleanly onto explanatorily relevant decompositions of the studied system.

Fortunately, it is straightforward to extend G-causality to the multi-variate case in which G-causality interactions are assessed among sets of variables (**X**, **Y**, **Z**) rather than only among univariate variables (*X*, *Y*, *Z*). Following Geweke [9], we can define *multi-variate G-causality* (MVGC) as

$$\mathcal{F}_{\mathbf{Y}\to\mathbf{X}\mid\mathbf{Z}} \equiv \ln\frac{|\Sigma(\boldsymbol{\varepsilon})|}{|\Sigma(\boldsymbol{\varepsilon}')|} = \ln\frac{|\Sigma(\mathbf{X}\mid\mathbf{X}^-\oplus\mathbf{Z}^-)|}{|\Sigma(\mathbf{X}\mid\mathbf{X}^-\oplus\mathbf{Y}^-\oplus\mathbf{Z}^-)|}\tag{2.4}$$

where **ε** and **ε**′ are the residuals of the multi-variate analogues of the regressions (2.1), |⋅| represents the matrix determinant and |Σ(**ε**)| is the *generalized variance* of the residual covariance matrix Σ(**ε**), which quantifies the volume in which the residuals lie. As we have discussed in detail previously [14], the determinant (generalized variance) formulation of MVGC has important advantages over an alternative formulation [15] based on the *trace* (total variance) of the residual covariance matrix. In brief, the determinant formulation is fully equivalent to transfer entropy (see §2*c*) under Gaussian assumptions, is invariant under a wider range of variable transformations, is expandable as a sum of standard univariate G-causalities, and admits a satisfactory spectral decomposition. Numerically, evidence indicates that it is just as stable as the trace formulation [14].
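The determinant form (2.4) can be sketched as below, assuming NumPy, OLS without intercept on demeaned data, and one model order `p` for every group; `mvgc` and its helpers are hypothetical names. The usage example illustrates one of the invariance properties mentioned above: replacing the target group **X** by an invertible linear mixture of its components leaves the determinant-based MVGC unchanged, because both residual covariance matrices are transformed by the same congruence *M*Σ*M*^{T}.

```python
import numpy as np

def lagged(x, p):
    """Rows t = p..T-1 of [x_{t-1}, ..., x_{t-p}] for a (T, d) array x."""
    T = x.shape[0]
    return np.hstack([x[p - k:T - k] for k in range(1, p + 1)])

def resid_cov(target, design):
    """Residual covariance matrix of an OLS regression (no intercept)."""
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    e = target - design @ beta
    return e.T @ e / len(e)

def mvgc(X, Y, Z=None, p=2):
    """F_{Y->X|Z} of eq. (2.4): log ratio of generalized variances. X, Y, Z are (T, d) arrays."""
    groups = [X] if Z is None else [X, Z]
    groups = [g - g.mean(axis=0) for g in groups]
    Yc = Y - Y.mean(axis=0)
    target = groups[0][p:]
    restricted = np.hstack([lagged(g, p) for g in groups])   # past of X (and Z)
    full = np.hstack([restricted, lagged(Yc, p)])            # ... plus past of Y
    return (np.linalg.slogdet(resid_cov(target, restricted))[1]
            - np.linalg.slogdet(resid_cov(target, full))[1])

# group Y = (columns 2, 3) drives group X = (columns 0, 1)
rng = np.random.default_rng(2)
T = 3000
A = np.array([[0.5, 0.0, 0.3, 0.0],
              [0.0, 0.5, 0.0, 0.3],
              [0.0, 0.0, 0.5, 0.0],
              [0.0, 0.0, 0.0, 0.5]])
data = np.zeros((T, 4))
for t in range(1, T):
    data[t] = A @ data[t - 1] + rng.standard_normal(4)
X, Y = data[:, :2], data[:, 2:]
M = np.array([[1.0, 2.0], [0.5, -1.0]])   # any invertible mixing of the target group
f = mvgc(X, Y)
f_transformed = mvgc(X @ M.T, Y)
print(f, f_transformed)                    # equal up to rounding error
```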

MVGC suggests an extension to CD in which G-causality interactions are assessed across bipartitions of a system. For a system **X**, we define CD_{k→r}(**X**) as the average MVGC from a subset of size *k* to a subset of size *r*, conditioned on the rest of the system,

$$\mathrm{CD}_{k\to r}(\mathbf{X}) \equiv \frac{1}{N_{k,r}}\sum_{i=1}^{N_{k,r}}\mathcal{F}_{\mathbf{X}^{k}_{i}\to\mathbf{X}^{r}_{i}\mid\mathbf{X}^{n-k-r}_{i}}\tag{2.5}$$

where $(\mathbf{X}^{k}_{i},\mathbf{X}^{r}_{i},\mathbf{X}^{n-k-r}_{i})$ denotes the *i*th of the $N_{k,r}=n!/[k!\,r!\,(n-k-r)!]$ distinct tripartitions of **X** into disjoint subsystems of respective sizes *k*, *r* and (*n*−*k*−*r*). The bipartition causal density (BCD) is then the average of CD_{k→(n−k)}(**X**) over predictor size *k*,

$$\mathrm{BCD}(\mathbf{X}) \equiv \frac{1}{n-1}\sum_{k=1}^{n-1}\mathrm{CD}_{k\to(n-k)}(\mathbf{X})\tag{2.6}$$

This quantity may provide a more principled measure of dynamical complexity than CD in virtue of analysing a target system at multiple scales.^{1} As we explain below (see §2*d*), it is closely related to the well-known ‘neural complexity’ measure [5], which averages *mutual information* across bipartitions.^{2}
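Because the simulations in §4 compute measures analytically from a generative model, the sketch below evaluates CD_{k→r} (2.5) and BCD (2.6) directly from the stationary covariances of an MVAR(1) process with independent unit-variance noise. It assumes a single lag (*p*=1) in every regression, which is exact for the whole system but only an approximation for subsystems; all function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def var1_autocov(A):
    """cov(X_t) and cov(X_t, X_{t-1}) for X_t = A X_{t-1} + E_t with cov(E_t) = I."""
    n = A.shape[0]
    g0 = np.linalg.solve(np.eye(n * n) - np.kron(A, A), np.eye(n).ravel()).reshape(n, n)
    return g0, A @ g0

def logdet_partial(g0, g1, target, past):
    """ln|Sigma(X_target,t | X_past,t-1)| using a single lag (p = 1)."""
    c = g1[np.ix_(target, past)]
    s = g0[np.ix_(target, target)] - c @ np.linalg.solve(g0[np.ix_(past, past)], c.T)
    return np.linalg.slogdet(s)[1]

def cd_k_to_r(A, k, r):
    """CD_{k->r}, eq. (2.5): mean MVGC over all tripartitions (source, target, conditioning)."""
    g0, g1 = var1_autocov(A)
    n = A.shape[0]
    vals = []
    for src in combinations(range(n), k):
        rest = [i for i in range(n) if i not in src]
        for tgt in combinations(rest, r):
            tgt = list(tgt)
            cond = [i for i in rest if i not in tgt]
            restricted = tgt + cond              # past of target and conditioning set
            vals.append(logdet_partial(g0, g1, tgt, restricted)
                        - logdet_partial(g0, g1, tgt, restricted + list(src)))
    return np.mean(vals)

def bcd(A):
    """BCD, eq. (2.6): average of CD_{k->(n-k)} over predictor size k = 1, ..., n-1."""
    n = A.shape[0]
    return np.mean([cd_k_to_r(A, k, n - k) for k in range(1, n)])

ring = 0.4 * np.eye(4) + 0.3 * np.roll(np.eye(4), 1, axis=1)
print(bcd(0.5 * np.eye(4)), bcd(ring))   # (numerically) zero without coupling, positive for the ring
```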

We offer two final remarks about causal density. First, because MVGC has a spectral decomposition, both CD and BCD can be evaluated within specific frequency bands, which could be useful in cases where such bands have distinct neurophysiological interpretations. Second, in any complex system, it is usually possible to sample only a subset of relevant variables, which can lead to spurious causal inferences arising from hidden common causes. One approach to this problem is to ‘partial out’ hidden influences (by analogy with partial correlation) by introducing an additional term into the Granger equations that is sensitive to correlations among residuals [14,16].

### (c) Transfer entropy

A common criticism of G-causality is that its standard implementation in terms of linear MVAR models apparently excludes sensitivity to nonlinear interactions. Nonlinear extensions of G-causality do exist (e.g. [17]); however, they are often complex and unwieldy to apply in practice. An alternative framework is provided by *transfer entropy*, which is a measure of directed information transfer based on conditional mutual information [18]. Transfer entropy is defined by the difference in entropies

$$\mathcal{T}_{\mathbf{Y}\to\mathbf{X}\mid\mathbf{Z}} \equiv H(\mathbf{X}\mid\mathbf{X}^-\oplus\mathbf{Z}^-) - H(\mathbf{X}\mid\mathbf{X}^-\oplus\mathbf{Y}^-\oplus\mathbf{Z}^-)\tag{2.7}$$
and quantifies, in a naturally nonlinear way, the degree to which knowledge of the past of **Y** reduces uncertainty in the future of **X**, conditional on **Z**.

Although it has long been recognized that G-causality and transfer entropy must be related, only recently has an equivalence been formally established [19]. For Gaussian variables, it turns out that a simple factor of 2 relates the two quantities,

$$\mathcal{F}_{\mathbf{Y}\to\mathbf{X}\mid\mathbf{Z}} = 2\,\mathcal{T}_{\mathbf{Y}\to\mathbf{X}\mid\mathbf{Z}}\tag{2.8}$$
The equivalence (2.8) rests on relations between conditional entropy, partial covariance and linear regression prediction error. The essential relations are as follows. Firstly, for Gaussian variables, conditional entropy is a function of the determinant of the corresponding partial covariance matrix [19],

$$H(\mathbf{X}\mid\mathbf{Y}) = \frac{1}{2}\ln|\Sigma(\mathbf{X}\mid\mathbf{Y})| + \frac{n}{2}\ln(2\pi e)\tag{2.9}$$
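where *n* is the dimension of **X**. This follows from the conditional distribution of **X** given any outcome for **Y** being Gaussian with covariance matrix Σ(**X**|**Y**), and the Gaussian entropy formula

$$H(\mathbf{X}) = \frac{1}{2}\ln|\Sigma(\mathbf{X})| + \frac{n}{2}\ln(2\pi e)\tag{2.10}$$

Second, the partial covariance of **X** given **Y** is precisely the covariance matrix of the residuals of a linear regression of **X** on **Y**,

$$\mathbf{X} = A\cdot\mathbf{Y} + \boldsymbol{\varepsilon}\quad\Longrightarrow\quad\Sigma(\boldsymbol{\varepsilon}) = \Sigma(\mathbf{X}\mid\mathbf{Y})\tag{2.11}$$

where the regression coefficients *A* are fitted by least squares. Note that the equivalence (2.11) holds for *any* (stationary) **X** and **Y**, Gaussian or otherwise. Together, these expressions allow $\mathcal{T}_{\mathbf{Y}\to\mathbf{X}\mid\mathbf{Z}}$ to be written in terms of linear regression residuals, and therefore to be related directly to $\mathcal{F}_{\mathbf{Y}\to\mathbf{X}\mid\mathbf{Z}}$: substituting (2.9) into (2.7), the ln(2π*e*) terms cancel, leaving half the log-ratio of the two partial covariance determinants, which by (2.11) is half the MVGC (2.4).

The equivalence between $\mathcal{F}$ and $\mathcal{T}$ is important because it implies that, for Gaussian variables, linear regression accounts for *all* the dependence among variables, further justifying CD as a measure of dynamical complexity (see [20] for a comprehensive review of nonlinear causality measures). Importantly, in the multi-variate case, the equivalence (2.8) holds for the preferred determinant version (MVGC) but not for the alternative trace version [14].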
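The relations (2.7)–(2.11) can be checked numerically. The sketch below, assuming NumPy and a bivariate Gaussian AR toy system chosen only for illustration, computes the transfer entropy from Gaussian conditional entropies of the sample covariance matrix and the G-causality from OLS residual variances. Because both use the same sample moments, (2.11) holds exactly for the estimates, so the two numbers agree up to rounding error, as (2.8) predicts. The helper names are hypothetical.

```python
import numpy as np

def gaussian_cond_entropy(cov, ix, iy):
    """H(X|Y) = 0.5 ln|Sigma(X|Y)| + (n/2) ln(2 pi e), eqs (1.1) and (2.9)."""
    c = cov[np.ix_(ix, iy)]
    s = cov[np.ix_(ix, ix)] - c @ np.linalg.solve(cov[np.ix_(iy, iy)], c.T)
    return 0.5 * np.linalg.slogdet(s)[1] + 0.5 * len(ix) * np.log(2 * np.pi * np.e)

def resid_var(target, design):
    """Mean squared residual of an OLS regression (no intercept)."""
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    return np.mean((target - design @ beta) ** 2)

rng = np.random.default_rng(3)
T = 2000
x, y = np.zeros(T), np.zeros(T)
for t in range(1, T):
    y[t] = 0.6 * y[t - 1] + rng.standard_normal()
    x[t] = 0.3 * x[t - 1] + 0.5 * y[t - 1] + rng.standard_normal()

# columns: x_t, x_{t-1}, y_{t-1}  (one lag, no conditioning variable Z)
D = np.column_stack([x[1:], x[:-1], y[:-1]])
D -= D.mean(axis=0)
cov = D.T @ D / len(D)     # same normalisation as resid_var

te = gaussian_cond_entropy(cov, [0], [1]) - gaussian_cond_entropy(cov, [0], [1, 2])   # eq. (2.7)
gc = np.log(resid_var(D[:, 0], D[:, [1]]) / resid_var(D[:, 0], D[:, [1, 2]]))        # eq. (2.2)
print(gc, 2 * te)
```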

### (d) Neural complexity and its relation to causal density

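The neural complexity of a system **X** with *n* elements is given by

$$C_N(\mathbf{X}) \equiv \sum_{k=1}^{n}\left\langle H(\mathbf{X}^{k}_{j}) - \frac{k}{n}H(\mathbf{X})\right\rangle_j\tag{2.12}$$

where $\mathbf{X}^{k}_{j}$ is the state of the *j*th sub-system with *k* elements and $\langle\cdot\rangle_j$ denotes the average over all $\binom{n}{k}$ such sub-systems. This measure quantifies the extent to which the entropy of sub-systems is greater than the normalized entropy of the whole, where normalization is by the ratio of the size of the sub-system to the size of the whole. The expected differences for each sub-system size are summed.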

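A key difference between $C_N$ and causal density is that $C_N$ is concerned only with the stationary distributions of the states of the system and its parts, whereas causal density is concerned with predicting the present of a system based on its past. However, for Gaussian variables, it can be shown that BCD is equivalent to a modified version of $C_N$, denoted $\tilde{C}_N$, in which entropies are replaced by the conditional entropies of present states given their own past states, i.e.

$$\tilde{C}_N(\mathbf{X}) \equiv \sum_{k=1}^{n}\left\langle H\big(\mathbf{X}^{k}_{j}\mid(\mathbf{X}^{k}_{j})^-\big) - \frac{k}{n}H(\mathbf{X}\mid\mathbf{X}^-)\right\rangle_j\tag{2.13}$$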
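To see the equivalence, we rearrange $\tilde{C}_N$ in terms of bipartitions of **X**. We label the bipartitions of **X** such that $\{\mathbf{X}^{k}_{j},\bar{\mathbf{X}}^{k}_{j}\}$, with $\bar{\mathbf{X}}^{k}_{j}\equiv\mathbf{X}\setminus\mathbf{X}^{k}_{j}$, is the *j*th bipartition in which $\mathbf{X}^{k}_{j}$ consists of *k* elements. Since the complements of the *k*-element sub-systems are exactly the (*n*−*k*)-element sub-systems, we have

$$\tilde{C}_N(\mathbf{X}) = \frac{1}{2}\sum_{k=1}^{n-1}\left\langle H\big(\mathbf{X}^{k}_{j}\mid(\mathbf{X}^{k}_{j})^-\big) + H\big(\bar{\mathbf{X}}^{k}_{j}\mid(\bar{\mathbf{X}}^{k}_{j})^-\big) - H(\mathbf{X}\mid\mathbf{X}^-)\right\rangle_j\tag{2.14}$$

We assume that there are no hidden or exogenous elements affecting **X**, so that, given the past $\mathbf{X}^-$, the present states of the elements of **X** are independent, i.e.

$$H(\mathbf{X}\mid\mathbf{X}^-) = \sum_{i=1}^{n}H(X_i\mid\mathbf{X}^-),\quad\text{so that}\quad H(\mathbf{X}\mid\mathbf{X}^-) = H(\mathbf{X}^{k}_{j}\mid\mathbf{X}^-) + H(\bar{\mathbf{X}}^{k}_{j}\mid\mathbf{X}^-)\tag{2.15}$$

We can now express $\tilde{C}_N$ in terms of $\mathcal{T}$, and hence $\mathcal{F}$,

$$\tilde{C}_N(\mathbf{X}) = \frac{1}{2}\sum_{k=1}^{n-1}\left\langle\mathcal{T}_{\bar{\mathbf{X}}^{k}_{j}\to\mathbf{X}^{k}_{j}} + \mathcal{T}_{\mathbf{X}^{k}_{j}\to\bar{\mathbf{X}}^{k}_{j}}\right\rangle_j = \sum_{k=1}^{n-1}\left\langle\mathcal{T}_{\mathbf{X}^{k}_{j}\to\bar{\mathbf{X}}^{k}_{j}}\right\rangle_j = \frac{1}{2}\sum_{k=1}^{n-1}\left\langle\mathcal{F}_{\mathbf{X}^{k}_{j}\to\bar{\mathbf{X}}^{k}_{j}}\right\rangle_j\tag{2.16}$$

where the second equality holds because, as *k* runs from 1 to *n*−1, the ordered pairs $(\bar{\mathbf{X}}^{k}_{j},\mathbf{X}^{k}_{j})$ run over the same ordered bipartitions as $(\mathbf{X}^{k}_{j},\bar{\mathbf{X}}^{k}_{j})$, and the third follows from (2.8). Because $\langle\mathcal{F}_{\mathbf{X}^{k}_{j}\to\bar{\mathbf{X}}^{k}_{j}}\rangle_j = \mathrm{CD}_{k\to(n-k)}(\mathbf{X})$ (the conditioning set in (2.5) being empty), the direct equivalence between causal density and neural complexity is then given by

$$\mathrm{BCD}(\mathbf{X}) = \frac{2}{n-1}\,\tilde{C}_N(\mathbf{X})\tag{2.17}$$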
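The identity (2.17) can be checked numerically. The sketch below assumes an MVAR(1) process with independent unit-variance noise, so that assumption (2.15) holds, and uses a single lag for every conditional entropy; under these assumptions the identity is algebraic and should hold to rounding error whatever the connectivity. MVGC is computed directly from log-determinants as in (2.4); all function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def var1_autocov(A):
    """cov(X_t) and cov(X_t, X_{t-1}) for X_t = A X_{t-1} + E_t with cov(E_t) = I."""
    n = A.shape[0]
    g0 = np.linalg.solve(np.eye(n * n) - np.kron(A, A), np.eye(n).ravel()).reshape(n, n)
    return g0, A @ g0

def logdet_partial(g0, g1, target, past):
    """ln|Sigma(X_target,t | X_past,t-1)|, one lag (eq. 1.1)."""
    c = g1[np.ix_(target, past)]
    s = g0[np.ix_(target, target)] - c @ np.linalg.solve(g0[np.ix_(past, past)], c.T)
    return np.linalg.slogdet(s)[1]

def cond_entropy(g0, g1, target, past):
    """Gaussian conditional entropy, eq. (2.9)."""
    return 0.5 * logdet_partial(g0, g1, target, past) + 0.5 * len(target) * np.log(2 * np.pi * np.e)

def tilde_cn(A):
    """Modified neural complexity, eq. (2.13): each subset conditioned on its own past."""
    g0, g1 = var1_autocov(A)
    n = A.shape[0]
    whole = list(range(n))
    h_whole = cond_entropy(g0, g1, whole, whole)
    total = 0.0
    for k in range(1, n + 1):
        subsets = [list(s) for s in combinations(whole, k)]
        total += np.mean([cond_entropy(g0, g1, s, s) for s in subsets]) - k / n * h_whole
    return total

def bcd(A):
    """BCD, eq. (2.6): mean over k of the mean MVGC from k-subsets to their complements."""
    g0, g1 = var1_autocov(A)
    n = A.shape[0]
    whole = list(range(n))
    per_k = []
    for k in range(1, n):
        vals = []
        for src in combinations(whole, k):
            tgt = [i for i in whole if i not in src]
            # restricted model: target's own past; full model: whole past
            vals.append(logdet_partial(g0, g1, tgt, tgt) - logdet_partial(g0, g1, tgt, whole))
        per_k.append(np.mean(vals))
    return np.mean(per_k)

rng = np.random.default_rng(4)
n = 4
A = rng.normal(0.0, 0.3, size=(n, n))
A *= 0.8 / np.max(np.abs(np.linalg.eigvals(A)))   # rescale to spectral radius 0.8
print(bcd(A), 2 / (n - 1) * tilde_cn(A))
```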

## 3. Integrated information

An alternative information-theoretic approach to measuring dynamical complexity has been developed by Giulio Tononi, under the banner of the ‘information integration theory of consciousness’ (IITC) [6]. The IITC identifies conscious level with the quantity of *integrated information* (*Φ*) generated by a system. Several versions of *Φ* have now been described. The first was conceived as a measure of the *capacity* of a system to integrate information [21]; however, it did not take into account time or changing dynamics. A more recent version, *Φ*_{DM}, was designed to measure the information generated when a system transitions to one particular state out of a repertoire of possible states, to the extent that this information is generated by the whole system, over and above that generated independently by the parts [22]. However, *Φ*_{DM} is defined only for idealized discrete, Markovian (memoryless) systems (hence the subscript ‘DM’), an in-principle restriction that severely limits in-practice applicability because neural systems are often measured using continuous variables, and also have memory (i.e. dynamics that depend on more than just the previous state). Here, we describe two recent alternative measures, *Φ*_{E} (‘empirical *Φ*’) and *Φ*_{AR} (‘autoregressive (AR) *Φ*’), which overcome the limitations of *Φ*_{DM}, and which are generally applicable to time-series data [8].

### (a) Integrated information for stationary, continuous systems

The primary difference between *Φ*_{DM} and *Φ*_{E} (and *Φ*_{AR}) has to do with the probability distributions assumed to characterize the target system. *Φ*_{DM} measures information with respect to a hypothetical *maximum entropy* distribution, corresponding to the *potential* behaviour of a system (i.e. its capacity). By contrast, *Φ*_{E} measures information with respect to the *stationary distribution* describing the system's dynamics, reflecting the *actual* behaviour of a system. Explicitly, *Φ*_{E} is concerned with the (average) information generated by the current state *X*_{t} of the system about some past state *X*_{t−τ},^{3}
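$$I(\mathbf{X}_{t-\tau};\mathbf{X}_t) = H(\mathbf{X}_{t-\tau}) - H(\mathbf{X}_{t-\tau}\mid\mathbf{X}_t)\tag{3.1}$$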
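To measure the extent to which this information is integrated, we use the concept of *effective information* (*φ*), which refers to the information generated by the whole system, minus the information generated independently by the parts (sub-systems) [21]. Considering only bipartitions $\mathcal{B}=\{\mathbf{M}^1,\mathbf{M}^2\}$ of **X**,^{4} the effective information at a time scale *τ* is given by

$$\varphi[\mathbf{X};\tau,\mathcal{B}] \equiv I(\mathbf{X}_{t-\tau};\mathbf{X}_t) - \sum_{k=1}^{2}I(\mathbf{M}^k_{t-\tau};\mathbf{M}^k_t)\tag{3.2}$$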

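*Φ*_{E} is then defined as the effective information with respect to the *minimum information bipartition* (MIB). The MIB, $\mathcal{B}^{\mathrm{MIB}}(\tau)$, is the bipartition that minimizes *φ* after normalization to penalize asymmetric bipartitions.^{5} Intuitively, the MIB can be thought of as the ‘informational weakest link’. Thus,

$$\Phi_E[\mathbf{X};\tau] \equiv \varphi[\mathbf{X};\tau,\mathcal{B}^{\mathrm{MIB}}(\tau)],\qquad \mathcal{B}^{\mathrm{MIB}}(\tau) \equiv \arg\min_{\mathcal{B}}\left\{\frac{\varphi[\mathbf{X};\tau,\mathcal{B}]}{K(\mathcal{B})}\right\}\tag{3.3}$$

where $K(\mathcal{B})$ denotes the normalization factor. Note that the value of $\Phi_E[\mathbf{X};\tau]$ is given by the *non-normalized* effective information.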

Because we assume stationary statistics, *φ* and therefore *Φ*_{E} can be measured empirically from time-series data without needing a generative model. However, accurately estimating entropies directly from time series can be challenging. Fortunately, for Gaussian systems, *Φ*_{E} can be calculated straightforwardly from empirical covariance matrices. Equations (2.9) and (2.10) allow effective information to be written as
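$$\varphi[\mathbf{X};\tau,\mathcal{B}] = \frac{1}{2}\ln\frac{|\Sigma(\mathbf{X}_{t-\tau})|}{|\Sigma(\mathbf{X}_{t-\tau}\mid\mathbf{X}_t)|} - \frac{1}{2}\sum_{k=1}^{2}\ln\frac{|\Sigma(\mathbf{M}^k_{t-\tau})|}{|\Sigma(\mathbf{M}^k_{t-\tau}\mid\mathbf{M}^k_t)|}\tag{3.4}$$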
The partial covariances in equation (3.4) can be obtained by using equation (1.1).
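For a Gaussian MVAR(1) system with independent unit-variance noise, the quantities in (3.4) follow from the stationary covariance and the lagged covariance cov(**X**_{t}, **X**_{t−τ}) = *A*^{τ} cov(**X**_{t}). The sketch below, assuming NumPy, computes *φ* for a given bipartition and searches all bipartitions for the MIB. The normalization factor is not specified in this section, so the sketch uses a *hypothetical* normalization (division by the size of the smaller part) purely for illustration; function names are also hypothetical.

```python
import numpy as np
from itertools import combinations

def var1_autocov(A, tau=1):
    """cov(X_t) and cov(X_t, X_{t-tau}) = A^tau cov(X_t) for X_t = A X_{t-1} + E_t, cov(E_t) = I."""
    n = A.shape[0]
    g0 = np.linalg.solve(np.eye(n * n) - np.kron(A, A), np.eye(n).ravel()).reshape(n, n)
    return g0, np.linalg.matrix_power(A, tau) @ g0

def mutual_info(g0, gt, idx):
    """I(M_{t-tau}; M_t) for the variables idx: 0.5 ln |Sigma(M)| / |Sigma(M_{t-tau} | M_t)|."""
    s = g0[np.ix_(idx, idx)]                    # Sigma(M_t) = Sigma(M_{t-tau}) by stationarity
    c = gt[np.ix_(idx, idx)]                    # cov(M_t, M_{t-tau})
    cond = s - c.T @ np.linalg.solve(s, c)      # Sigma(M_{t-tau} | M_t), eq. (1.1)
    return 0.5 * (np.linalg.slogdet(s)[1] - np.linalg.slogdet(cond)[1])

def phi(g0, gt, part1):
    """Effective information, eq. (3.4), across the bipartition {part1, rest}."""
    n = g0.shape[0]
    part2 = [i for i in range(n) if i not in part1]
    return (mutual_info(g0, gt, list(range(n)))
            - mutual_info(g0, gt, list(part1)) - mutual_info(g0, gt, part2))

def phi_e(A, tau=1):
    """Phi_E, eq. (3.3): un-normalised phi at the bipartition minimising normalised phi.

    HYPOTHETICAL normalisation for illustration only: phi / (size of the smaller part).
    """
    g0, gt = var1_autocov(A, tau)
    n = A.shape[0]
    best = None
    for k in range(1, n // 2 + 1):              # k is the size of the smaller part
        for part1 in combinations(range(n), k):
            value = phi(g0, gt, list(part1))
            if best is None or value / k < best[0]:
                best = (value / k, value, part1)
    return best[1], best[2]

A = np.zeros((4, 4))
A[:2, :2] = [[0.5, 0.3], [0.3, 0.5]]          # two disconnected components
A[2:, 2:] = [[0.5, -0.2], [0.2, 0.5]]
g0, gt = var1_autocov(A)
print(phi(g0, gt, [0, 1]))
print(phi_e(A))
```

For two disconnected components, the bipartition that separates them yields *φ*≈0, since the whole then generates exactly the information generated by its parts.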

Together, the above formulae permit the straightforward computation of integrated information from time-series data, an important step not possible for previous measures [6,21]. We emphasize that the construction of *Φ*_{E} is very similar to that of *Φ*_{DM}. The important differences are that (i) *Φ*_{E} uses the stationary, rather than maximum entropy, distribution, (ii) *Φ*_{E} uses the average information generated, and so is state-independent, and (iii) *Φ*_{E} enables a choice of time scale (*τ*) over which integrated information is measured. These differences carry substantial implications beyond practical applicability. Most notably, *Φ*_{E} is a measure of *process*, whereas *Φ*_{DM} remains in part a measure of *capacity* or *potential*, in virtue of the maximum entropy distribution. As we discuss later (see §5), *Φ*_{E} (and also *Φ*_{AR}, discussed in §3*b*) corresponds to a Jamesian view of consciousness-as-process, and thus entails a departure from the IITC, which interprets conscious level in terms of capacity.

### (b) Autoregressive *Φ*

As noted, for Gaussian variables, *Φ*_{E} can be computed efficiently from empirical covariance matrices. Explicit calculation of *Φ*_{E} for (stationary) non-Gaussian variables requires estimation of entropies directly from time series, which is computationally expensive. However, in such cases, we can still use the same formulae to calculate a quantity that remains readily interpretable in terms of integrated information. We have called this quantity *Φ*_{AR} (for AR *Φ*). To see why, recall the equivalence between linear regression prediction error and partial covariance (2.11), which allows us to rewrite equation (3.4) as
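$$\varphi_{AR}[\mathbf{X};\tau,\mathcal{B}] = \frac{1}{2}\ln\frac{|\Sigma(\mathbf{X})|}{|\Sigma(\boldsymbol{\varepsilon}^{\mathbf{X}})|} - \frac{1}{2}\sum_{k=1}^{2}\ln\frac{|\Sigma(\mathbf{M}^k)|}{|\Sigma(\boldsymbol{\varepsilon}^{\mathbf{M}^k})|}\tag{3.5}$$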
where *ε*^{Mk}, *k*=1,2, and *ε*^{X} are the residuals in the following regressions:
$$\mathbf{X}_{t-\tau} = A\cdot\mathbf{X}_t + \boldsymbol{\varepsilon}^{\mathbf{X}}_t\tag{3.6}$$
and
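$$\mathbf{M}^k_{t-\tau} = A_k\cdot\mathbf{M}^k_t + \boldsymbol{\varepsilon}^{\mathbf{M}^k}_t,\qquad k=1,2\tag{3.7}$$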
Note that, in the above regressions, the *past* of a variable (*τ* time steps ago) is regressed on its *present* value (contrast with equation (2.1)).^{6} Although the relation between covariance and entropy (2.10) holds only in the Gaussian case, the relation between linear regression prediction error and partial covariance (2.11) holds for any stationary distribution. Thus, we can take equation (3.5) to define a new version of effective information, *φ*_{AR}, applicable to both Gaussian and non-Gaussian systems. *Φ*_{AR} is then the non-normalized *φ*_{AR} across the bipartition that minimizes (normalized) *φ*_{AR}. For Gaussian systems, *Φ*_{AR} and *Φ*_{E} are equivalent, but for non-Gaussian systems, they may differ.

Note that *Φ*_{AR} can be computed directly from empirical covariance matrices (owing to equation (2.11)); explicit calculation of the regressions is not needed. However, it is the formulation in terms of linear regression that gives *Φ*_{AR} its meaning in terms of integrated information. Specifically, *Φ*_{AR} can be understood as a measure of how well the *present* of a system predicts its *past*, to the extent that these predictions improve over predictions based on the parts acting independently. When Gaussian conditions are satisfied, this interpretation becomes exactly equivalent to the interpretation of *Φ*_{E} in terms of information theory.
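As noted, *φ*_{AR} needs only regressions of the past on the present. Below is a minimal sketch from sample data, assuming NumPy, demeaned data and OLS without an intercept; `phi_ar` and `resid_logdet` are hypothetical names. Because OLS residual covariances equal sample partial covariances (2.11), the result coincides with the covariance-based expression (3.4) evaluated on the same sample moments, whether or not the data are Gaussian.

```python
import numpy as np

def resid_logdet(target, design):
    """ln|Sigma(eps)| for the OLS regression target = design @ B + eps (no intercept)."""
    B, *_ = np.linalg.lstsq(design, target, rcond=None)
    e = target - design @ B
    return np.linalg.slogdet(e.T @ e / len(e))[1]

def phi_ar(data, part1, tau=1):
    """phi_AR across {part1, rest} (eqs 3.5-3.7): regress the PAST on the PRESENT."""
    data = data - data.mean(axis=0)
    n = data.shape[1]
    past, present = data[:-tau], data[tau:]

    def term(idx):
        p, q = past[:, idx], present[:, idx]
        # 0.5 ln |Sigma(M)| / |Sigma(eps^M)|
        return 0.5 * (np.linalg.slogdet(p.T @ p / len(p))[1] - resid_logdet(p, q))

    part2 = [i for i in range(n) if i not in part1]
    return term(list(range(n))) - term(list(part1)) - term(part2)

rng = np.random.default_rng(5)
A = np.array([[0.5, 0.3, 0.0],
              [0.2, 0.4, 0.2],
              [0.0, 0.3, 0.5]])
T = 5000
x = np.zeros((T, 3))
for t in range(1, T):
    x[t] = A @ x[t - 1] + rng.standard_normal(3)
print(phi_ar(x, [0], tau=1))
```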

### (c) The variants $\tilde{\Phi}_{E}$ and $\tilde{\Phi}_{AR}$

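Further measures of integrated information, $\tilde{\Phi}_{E}$ and $\tilde{\Phi}_{AR}$, can be defined, respectively, using the following alternative definitions for effective information [8]:

$$\tilde{\varphi}[\mathbf{X};\tau,\mathcal{B}] \equiv \sum_{k=1}^{2}H(\mathbf{M}^k_{t-\tau}\mid\mathbf{M}^k_t) - H(\mathbf{X}_{t-\tau}\mid\mathbf{X}_t)\tag{3.8}$$

and

$$\tilde{\varphi}_{AR}[\mathbf{X};\tau,\mathcal{B}] \equiv \frac{1}{2}\sum_{k=1}^{2}\ln|\Sigma(\boldsymbol{\varepsilon}^{\mathbf{M}^k})| - \frac{1}{2}\ln|\Sigma(\boldsymbol{\varepsilon}^{\mathbf{X}})|\tag{3.9}$$

where *ε*^{Mk}, *k*=1,2, and *ε*^{X} are as in equations (3.6) and (3.7). Rather than being formulated in terms of mutual information, the effective information for $\tilde{\Phi}_{E}$ (3.8) is the average Kullback–Leibler (KL) divergence between the total distribution for the past state and the product of the distributions of the past states of the parts, all conditioned on respective present states. Note that, for a maximum entropy stationary distribution, $\tilde{\Phi}_{E}$ and *Φ*_{E} are equivalent; hence they are equally relatable to *Φ*_{DM}. $\tilde{\Phi}_{AR}$ is analogous to *Φ*_{AR}, and is equivalent to $\tilde{\Phi}_{E}$ for Gaussian systems. In the examples presented below, all versions of *Φ* behave similarly.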
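Subtracting (3.2) from (3.8) and using (3.1) gives $\tilde{\varphi}-\varphi = I(\mathbf{M}^1_{t-\tau};\mathbf{M}^2_{t-\tau})$, the mutual information between the parts' past states. This vanishes for a maximum entropy (independent) distribution, consistent with the equivalence just noted, and is non-negative otherwise. The sketch below checks this for a Gaussian MVAR(1) process with independent unit-variance noise, assuming NumPy; all function names are hypothetical.

```python
import numpy as np

def var1_autocov(A, tau=1):
    """cov(X_t) and cov(X_t, X_{t-tau}) for X_t = A X_{t-1} + E_t, cov(E_t) = I."""
    n = A.shape[0]
    g0 = np.linalg.solve(np.eye(n * n) - np.kron(A, A), np.eye(n).ravel()).reshape(n, n)
    return g0, np.linalg.matrix_power(A, tau) @ g0

def logdet_cond(g0, gt, idx):
    """ln|Sigma(M_{t-tau} | M_t)| for the variables idx (eq. 1.1)."""
    s, c = g0[np.ix_(idx, idx)], gt[np.ix_(idx, idx)]
    return np.linalg.slogdet(s - c.T @ np.linalg.solve(s, c))[1]

def phi_tilde(A, part1, tau=1):
    """Gaussian form of eq. (3.8): sum_k H(M^k_{t-tau} | M^k_t) - H(X_{t-tau} | X_t)."""
    g0, gt = var1_autocov(A, tau)
    n = A.shape[0]
    part2 = [i for i in range(n) if i not in part1]
    # the ln(2 pi e) terms cancel because the two parts together make up the whole
    return 0.5 * (logdet_cond(g0, gt, list(part1)) + logdet_cond(g0, gt, part2)
                  - logdet_cond(g0, gt, list(range(n))))

def phi(A, part1, tau=1):
    """Gaussian form of eq. (3.4), for comparison."""
    g0, gt = var1_autocov(A, tau)
    n = A.shape[0]
    part2 = [i for i in range(n) if i not in part1]

    def mi(idx):
        return 0.5 * (np.linalg.slogdet(g0[np.ix_(idx, idx)])[1] - logdet_cond(g0, gt, idx))

    return mi(list(range(n))) - mi(list(part1)) - mi(part2)

rng = np.random.default_rng(6)
A = rng.normal(0.0, 0.3, size=(4, 4))
A *= 0.8 / np.max(np.abs(np.linalg.eigvals(A)))   # rescale to spectral radius 0.8
print(phi_tilde(A, [0]), phi(A, [0]))
```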

## 4. Simulations

We present results from computing CD, BCD, *Φ*_{E} and $\tilde{\Phi}_{E}$ (time scale *τ*=1), for some example Markovian Gaussian systems. The example systems are characterized by the MVAR(1) dynamics

$$\mathbf{X}_t = A\,\mathbf{X}_{t-1} + \mathbf{E}_t\tag{4.1}$$

where *X*_{t} contains *n* variables, *A* is the connectivity matrix, and each component of *E*_{t} is an independent Gaussian random variable of mean 0 and variance 1. We considered seven systems with *n*=8 and connectivity as shown in figure 1*a*–*g*; we refer to these systems as ‘1(*a*)’, ‘1(*b*)’ and so on. The corresponding values of CD, BCD, *Φ*_{E} and $\tilde{\Phi}_{E}$ are given in figure 1*h*–*k*. These values were computed analytically, using the generative model (4.1) to obtain the necessary cross- and auto-covariance matrices.^{7}
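The analytic covariances mentioned here can be sketched as follows, assuming NumPy and cov(**E**_{t})=*I* as stated, and solving the discrete Lyapunov equation Γ_{0} = *A*Γ_{0}*A*^{T} + *I* in vectorized (Kronecker) form. The ‘correlation index’ of the later simulations is taken here, as an assumption, to be the mean off-diagonal entry of the stationary correlation matrix; function names are hypothetical.

```python
import numpy as np

def mvar1_covariances(A, max_lag=1):
    """Analytic covariances of X_t = A X_{t-1} + E_t with cov(E_t) = I (eq. 4.1).

    Returns [Gamma_0, ..., Gamma_max_lag] with Gamma_l = cov(X_t, X_{t-l}) = A^l Gamma_0.
    """
    n = A.shape[0]
    if np.max(np.abs(np.linalg.eigvals(A))) >= 1:
        raise ValueError("A must have spectral radius < 1 for a stationary process")
    # Gamma_0 = A Gamma_0 A^T + I, solved in vectorised (Kronecker) form
    g0 = np.linalg.solve(np.eye(n * n) - np.kron(A, A), np.eye(n).ravel()).reshape(n, n)
    return [np.linalg.matrix_power(A, lag) @ g0 for lag in range(max_lag + 1)]

def correlation_index(A):
    """Mean off-diagonal stationary instantaneous correlation (ASSUMED definition)."""
    g0 = mvar1_covariances(A, 0)[0]
    d = np.sqrt(np.diag(g0))
    r = g0 / np.outer(d, d)
    n = len(d)
    return (r.sum() - n) / (n * (n - 1))

ring = 0.25 * (np.eye(8) + np.roll(np.eye(8), 1, axis=1))   # unidirectional ring, n = 8
g0, g1 = mvar1_covariances(ring)
print(correlation_index(ring))
```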

The values of all measures mostly correspond with expectations for these examples. A ring of reciprocal connections (figure 1*c*) generates higher values than a ring of unidirectional connections (figure 1*b*), which itself generates higher values than an open chain of unidirectional connections (figure 1*a*). Also as expected, the homogeneous system of figure 1*d* has a low value for all measures (*Φ*_{E} and $\tilde{\Phi}_{E}$ are low compared with other reciprocally connected networks). Perhaps contrary to expectations, adding sparse long-range ‘short-cut’ connections to a reciprocal ring (figure 1*e*–*g*), in the style of a so-called ‘small-world’ network [12,23,24], does not increase the value of any of the measures (compare with the network of figure 1*c*). (We also tested networks with *n*=16 nodes, finding essentially the same results. Small-world properties are of course more prominent in larger networks; given adequate computational resources, analysis of still larger networks may reveal differences.)

To examine general trends in the behaviour of the measures as a function of connection density, we performed the following procedure 100 times (trials). Starting with a fully disconnected network of *n*=8 elements, in which each element has only a self-connection (strength 0.5), we filled in the (directed) connectivity matrix in a random order. Each time a connection was added, connection strengths were normalized such that (i) there was a constant total afferent (including self-connection) of 0.5 to each element and (ii) all connections to a given element were equal. We then computed CD, BCD, *Φ*_{E} and $\tilde{\Phi}_{E}$, as well as the average stationary instantaneous correlation between elements (the ‘correlation index’), assuming MVAR(1) dynamics (as above). Results can be analysed in multiple ways. First, for each given number of (non-self-) connections *k*, we plot the inter-trial mean of each measure (solid line, figure 2*a*–*e*). These plots show that both causal density measures exhibit peaks at intermediate values of *k*, whereas *Φ*_{E}, $\tilde{\Phi}_{E}$ and the correlation index do not. Second, for each value of *k*, we plot the inter-trial maximum of each measure (dashed line, figure 2*a*–*e*). This leads to all measures exhibiting a peak at an intermediate value of *k*. (Note that the correlation index shows a peak because, for intermediate *k*, there is greater inter-trial variety in network architecture.) Finally, we noted, for each trial, the number of connections present when each measure peaked, and compared this with the point at which the network became connected (i.e. the point at which every element can be reached, on some path, from every other element). Figure 2*f* shows the mean of this quantity for each measure: on a given trial, the expectation is that CD and BCD will peak after the network becomes connected but before correlation peaks; by contrast, *Φ*_{E} and $\tilde{\Phi}_{E}$ will on average peak at about the same *k* as the correlation index. It is interesting that, for each measure, there is a difference between the mean peak (figure 2*f*) and the peak of the mean (figure 2*a*–*e*).
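The network-growth procedure can be sketched as below, assuming NumPy. This is a sketch under stated assumptions: `A[i, j]` is the strength of the connection from element *j* to element *i* (matching (4.1)), ‘connected’ means that every element can be reached from every other along directed edges (strong connectivity, as defined above), and the measures themselves are not recomputed here. `normalise_afferents`, `strongly_connected` and `growth_trial` are hypothetical names.

```python
import numpy as np

def normalise_afferents(adj, total=0.5):
    """Connectivity for eq. (4.1) from a 0/1 adjacency matrix (adj[i, j] = 1 means j -> i).

    Every element keeps its self-connection, and its afferents (self included)
    share `total` equally, so every row of the result sums to `total`.
    """
    w = (adj != 0).astype(float)
    np.fill_diagonal(w, 1.0)
    return total * w / w.sum(axis=1, keepdims=True)

def strongly_connected(adj):
    """True if every element can be reached from every other along directed edges."""
    n = len(adj)
    reach = (adj != 0) | np.eye(n, dtype=bool)
    for _ in range(n):                       # repeated squaring of the reachability relation
        reach = (reach.astype(int) @ reach.astype(int)) > 0
    return bool(reach.all())

def growth_trial(n=8, seed=None):
    """Add directed connections in random order; return the step at which the network
    first becomes strongly connected, plus the normalised matrix after each step."""
    rng = np.random.default_rng(seed)
    edges = [(i, j) for i in range(n) for j in range(n) if i != j]
    adj = np.zeros((n, n), dtype=int)
    matrices, first_connected = [], None
    for k, e in enumerate(rng.permutation(len(edges)), start=1):
        adj[edges[e]] = 1
        matrices.append(normalise_afferents(adj))   # CD, BCD, Phi_E, ... would be computed here
        if first_connected is None and strongly_connected(adj):
            first_connected = k
    return first_connected, matrices

k_conn, mats = growth_trial(seed=0)
print(k_conn)
```

Because every row sums to 0.5 with non-negative entries, each normalized matrix has spectral radius 0.5, so the MVAR(1) process stays stationary throughout the growth.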

To assess the generality of these results, we performed an additional set of simulations on networks with *n*=12 and *k* randomly chosen connections (no self-connections). Connection strengths were first drawn from independent Gaussians with mean and standard deviation both equal to 0.2, then set as positive (i.e. excitatory) with 80 per cent probability (else negative, i.e. inhibitory), and finally normalized to yield a connectivity matrix with spectral radius *ρ*=0.8.^{8} We note that for small *k* (roughly *k*<*n*/2), the obtained connection strengths frequently yielded zero spectral radius; such networks could not be normalized and were rejected. We calculated CD, *Φ*_{E} and $\tilde{\Phi}_{E}$ using three lags. The number of connected components of the underlying graph and the correlation index for the MVAR(1) were also calculated for each network. For CD, 20 000 trial networks were generated for each *k*; for *Φ*_{E} and $\tilde{\Phi}_{E}$, there were 1000 trials for each *k*. Results are shown in figure 3. Figure 3*a* shows mean CD and correlation index plotted against connection density; figure 3*b* shows mean *Φ*_{E} and $\tilde{\Phi}_{E}$. The mean number of connected components is plotted on the right-hand axes (graphs become connected as the number of connected components approaches 1). We see that CD peaks (sharply) just prior to the emergence of connectivity, whereas *Φ*_{E} and $\tilde{\Phi}_{E}$ peak at higher connectivity levels. We note that, in general, for networks with disconnected components, *Φ*_{E} may be negative, while $\tilde{\Phi}_{E}$ remains at zero.
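A sketch of the random-network generator for this second set of simulations, assuming NumPy. Two details are assumptions not fixed by the text: the drawn Gaussian values are used as magnitudes (absolute values) before the sign is assigned, and zero spectral radius is detected combinatorially, since a network has non-zero spectral radius for generic weights exactly when its graph contains a directed cycle; this avoids relying on floating-point eigenvalues of nilpotent matrices. `random_network` is a hypothetical name.

```python
import numpy as np

def random_network(n=12, k=30, rho=0.8, p_excitatory=0.8, rng=None):
    """Random connectivity with k directed connections and no self-connections,
    rescaled to spectral radius rho. Networks without a directed cycle (whose
    spectral radius is zero) cannot be rescaled and are redrawn."""
    if k < 2:
        raise ValueError("at least two connections are needed to form a cycle")
    rng = np.random.default_rng(rng)
    off_diag = np.flatnonzero(~np.eye(n, dtype=bool))      # flat indices i*n + j with i != j
    while True:
        A = np.zeros(n * n)
        chosen = rng.choice(off_diag, size=k, replace=False)
        magnitude = np.abs(rng.normal(0.2, 0.2, size=k))   # ASSUMPTION: use |N(0.2, 0.2)|
        sign = np.where(rng.random(k) < p_excitatory, 1.0, -1.0)
        A[chosen] = magnitude * sign
        A = A.reshape(n, n)
        # a walk of length n exists iff the graph contains a directed cycle
        if np.linalg.matrix_power((A != 0).astype(float), n).any():
            return rho * A / np.max(np.abs(np.linalg.eigvals(A)))

A = random_network(rng=7)
print(np.max(np.abs(np.linalg.eigvals(A))))   # 0.8 up to rounding
```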

Taken together, these simulations indicate that both CD and integrated information track non-trivial properties of network topology and dynamics, as suggested by their theoretical formulation. Consistent with intuition, CD is sensitive to a dynamical regime in-between independence of elements and strong correlation among elements (see also [12]). Integrated information, as measured by *Φ*_{E} and $\tilde{\Phi}_{E}$, also shows a dissociation from overall correlation, at least when networks are normalized by spectral radius. The two classes of measure also dissociate from each other, with CD (but not integrated information) showing clear peaks in both simulations, and with these peaks appearing at lower connection densities, corresponding to lower levels of overall correlation.

## 5. Discussion

We have described measures of dynamical complexity based on causal density and integrated information, which are easily applicable to time-series data and which are theoretically grounded in reflecting conjoined functional differentiation and functional integration. This grounding encourages their use as potential measures of conscious level, given neural data contrasting normal waking consciousness with unconscious, diminished or abnormal conscious conditions, including dreamless sleep, general anaesthesia, coma, the vegetative state, absence epilepsy, somnambulism and perhaps even certain psychiatric disorders.

Measures of causal density are based on predictive ability among time series, operationalized using G-causality, and they attempt to capture the overall amount of causal interactivity exhibited by a system. We have described causal density measures, CD and BCD, which characterize, respectively, (i) overall causal interactivity between pairs of elements, conditioned on the rest of the system and (ii) overall causal interactivity between bipartitions of a system. Measures of integrated information reflect the extent to which the information generated by the whole system exceeds that generated by its parts. Our measures *Φ*_{E} and *Φ*_{AR} operationalize this concept for Gaussian and non-Gaussian systems, respectively, and unlike the previous measure *Φ*_{DM}, they are easy to apply to empirical time-series data. Our key innovations over *Φ*_{DM} are (i) to measure information with respect to the stationary (as opposed to maximum entropy) distribution (*Φ*_{E}) and (ii) to interpret integrated information in terms of the predictive ability of the present with respect to the past (*Φ*_{AR}). The variants $\tilde{\Phi}_{E}$ and $\tilde{\Phi}_{AR}$ are equally valid as alternatives to *Φ*_{DM}; they differ from *Φ*_{E} and *Φ*_{AR} only by characterizing effective information in terms of average KL divergence rather than by mutual information. We emphasize that the simulation results we have presented pertain to *Φ*_{E} and $\tilde{\Phi}_{E}$, not to *Φ*_{DM}, though we have previously shown that the measures do behave qualitatively identically in simple examples where calculation of *Φ*_{DM} is tractable [8].

### (a) Comparison among measures

Causal density (CD, BCD), integrated information (*Φ*_{DM}, *Φ*_{E}, *Φ*_{AR}) and neural complexity embody theoretical differences that have empirical consequences, as revealed in our simulations. At the theoretical level, measures of integrated information differ from measures of causal density most prominently in that the former are based on a value (of effective information, *φ*) across a single partition (the MIB), whereas the latter are based on values (of G-causality, $\mathcal{F}$) averaged across the whole system. The former approach guarantees that a disconnected system will not score positive, but raises problems of stability in systems that have several partitions with similar normalized *φ* but different non-normalized *φ*. In previous work, we have discovered examples of such networks for which *Φ*_{E} changes abruptly as a single connection strength is varied continuously [8]. On the other hand, a partition-based measure ensures that the measured value does not reflect only generic properties of a system, such as overall connection strength. Indeed, we have shown that simple eight-element networks optimized for *Φ*_{E} exhibit highly heterogeneous topologies [8]. This difference aside, the measures have several theoretical connections. Leveraging an equivalence between G-causality and transfer entropy, we have shown here that BCD is equivalent (for Gaussian systems) to a version of $C_N$ based on (time-directed) conditional entropies. Hence, causal density can be thought of as a ‘transfer entropy density’. *Φ*_{AR} bears a further relation to causal density, since both measures are based on AR models, the former employing a model order of 1 with varying time lag, and the latter employing, in addition, a variable model order.

Our simulation results support the intuitions underlying causal density and integrated information, while emphasizing that the measures can behave differently. We examined two sets of simulations, both of which involved ensembles of networks parameterized by overall connection density, with connections arranged at random. The simulations differed primarily by using different normalization factors; the first set maintained a constant afference to each element, while the second set normalized by spectral radius (roughly, the overall level of feedback in the corresponding AR process). The first set found that, as connection density increased, CD and BCD peaked after the network became connected, but before the overall correlation among network elements reached its maximum. By contrast, *Φ*_{E} and either peaked later, more or less with the onset of maximum correlation (when measured by the maximum across each ensemble), or did not peak at all (when measured by the mean). Generalizing these observations, the second set showed that causal density again peaked prior to maximum correlation, though in this case more sharply and also prior to the emergence of a single connected component. Information integration, by contrast, peaked at higher connection densities and exhibited a noisier profile, likely due to the instabilities incurred by a partition-based method. As well as exemplifying the intuitive behaviour of our measures, these simulations highlight the importance of selecting an appropriate normalization method to ensure that measures do not reflect only trivial network properties. Since there is no *a priori* ‘best’ normalization for a network considered as an abstraction of a neural system (e.g. [26]), we decided to compare two plausible alternatives. Further work is needed to establish the general relations among normalization, network structure and dynamical measures.
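The two normalization schemes can be sketched as follows. This is a minimal illustration under stated assumptions, not the code behind our simulations: we read ‘constant afference’ as every element receiving the same total input weight, the target values of 0.9 are arbitrary, and the helper names (`random_network`, `normalize_spectral_radius`, `normalize_afference`, `simulate_ar1`) are hypothetical.

```python
import numpy as np


def random_network(n, density, rng):
    """Binary directed connectivity without self-connections; each off-diagonal
    entry is present independently with probability `density`."""
    W = (rng.random((n, n)) < density).astype(float)
    np.fill_diagonal(W, 0.0)
    return W


def spectral_radius(W):
    """Maximum absolute eigenvalue of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(W))))


def normalize_spectral_radius(W, target=0.9):
    """Rescale W so its spectral radius equals `target` (< 1 keeps the AR(1) process stable)."""
    rho = spectral_radius(W)
    # An acyclic network has spectral radius 0 and cannot be rescaled; the tolerance is heuristic.
    return W.copy() if rho < 1e-12 else W * (target / rho)


def normalize_afference(W, total=0.9):
    """Rescale each row so every element receives the same total input weight `total`."""
    row_sums = W.sum(axis=1, keepdims=True)
    return np.divide(W * total, row_sums, out=np.zeros_like(W), where=row_sums > 0)


def simulate_ar1(W, T, rng):
    """Simulate X_t = W X_{t-1} + E_t with independent unit-variance Gaussian noise."""
    x = np.zeros((T, W.shape[0]))
    for t in range(1, T):
        x[t] = W @ x[t - 1] + rng.standard_normal(W.shape[0])
    return x


rng = np.random.default_rng(1)
W = random_network(8, 0.3, rng)
for label, Wn in [("spectral radius", normalize_spectral_radius(W)),
                  ("constant afference", normalize_afference(W))]:
    data = simulate_ar1(Wn, 5000, rng)
    corr = np.corrcoef(data, rowvar=False)
    mean_abs_corr = np.abs(corr[~np.eye(8, dtype=bool)]).mean()
    print(f"{label}: spectral radius {spectral_radius(Wn):.3f}, "
          f"mean |correlation| {mean_abs_corr:.3f}")
```

For a non-negative matrix whose rows each sum to 0.9, the spectral radius is at most 0.9, so both schemes yield stable AR(1) processes; they differ in how the overall feedback level changes as connection density varies, which is why the choice can shift where dynamical measures peak.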

Other groups have described related measures without explicitly relating them to consciousness. For example, Ay and colleagues compare the whole system to the sum of its individual elements [27] in terms of information geometry and conditional temporal information, furnishing a link to . They also show a link between these ideas and timing-dependent plasticity, raising the intriguing possibility that neural plasticity may be an essential prerequisite for consciousness by virtue of shaping dynamical complexity [28].

### (b) Practical application

Despite the theoretical advances described in this paper, application of causal density and integrated information to neural data remains challenging for several reasons. For numerical ease (i.e. for calculation directly from empirical covariance matrices), the data must be covariance (wide-sense) stationary; i.e. the mean and the autocovariance at each lag must not change over time. However, neural data are often noisy and contaminated by artefacts and drift, introducing non-stationarities. Non-stationary data can sometimes be made stationary by *differencing* (i.e. *X*′_{t}=*X*_{t}−*X*_{t−1}), repeatedly if necessary, although differencing can change the interpretation of subsequent calculations. Alternatively, one can analyse shorter time windows, each of which may be locally stationary [29]. For some types of data (notably M/EEG), bandpass and/or notch filtering are commonly used to remove artefacts and drift. Although AR modelling is in principle invariant to filtering [30], the temporal correlations induced by convolution with a filter make accurate and stable estimation of AR models more difficult [8].
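Differencing and windowing are simple to implement. The sketch below is illustrative only: the helper names are hypothetical, and comparing window means is a crude check on the first moment, not a formal stationarity test.

```python
import numpy as np


def difference(x, order=1):
    """Apply X'_t = X_t - X_{t-1} `order` times along the time axis (axis 0)."""
    return np.diff(x, n=order, axis=0)


def sliding_windows(x, width, step):
    """Yield successive windows of `width` samples, advancing by `step` samples."""
    for start in range(0, len(x) - width + 1, step):
        yield x[start:start + width]


def window_means(x, width, step):
    """Mean of each window: a crude indicator of drift in the first moment."""
    return np.array([w.mean(axis=0) for w in sliding_windows(x, width, step)])


# Usage: a random walk with linear drift is non-stationary;
# its first difference is white noise plus a constant offset.
rng = np.random.default_rng(2)
walk = np.cumsum(0.05 + rng.standard_normal(4000))
print(np.ptp(window_means(walk, 500, 500)))              # large spread across windows
print(np.ptp(window_means(difference(walk), 500, 500)))  # small spread
```

Note that any G-causality computed on the differenced series describes interactions among increments rather than among the original signals, which is the change of interpretation mentioned above.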

Neural signals are frequently nonlinear as well as non-stationary. While information-theoretic measures are naturally nonlinear, AR models are typically linear. However, as we have shown (§2*c*; see [14] for an extended discussion), a Gaussian equilibrium state distribution implies a linear AR model; moreover, information-theoretic quantities typically require Gaussian distributions in order to be calculable in practice. Together, these points underline the importance of a Gaussian assumption. Fortunately, Gaussian approaches appear to be very powerful in neuroscience. For example, Paluš and colleagues have shown (in theory and when applied to preictal EEG) that a Gaussian approach can distinguish different states of nonlinear chaotic attractors in ways similar to Lyapunov exponents or nonlinear entropy rates [31,32]; they also show that Gaussian descriptions are sufficient to describe resting-state activity as measured by fMRI [33]. These results build on earlier work indicating the utility of linear approximations, especially for modelling large-scale interactions in neuroscience [34].
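Under the Gaussian assumption, every mutual-information term reduces to log-determinants of covariance matrices estimated directly from data. The sketch below assumes a stationary linear process, natural logarithms, and the bipartition effective information φ[X; τ, B] = I(X_{t−τ}; X_t) − Σ_k I(M^k_{t−τ}; M^k_t), where the M^k are the parts. It evaluates φ for one fixed partition only, omitting both the search for the MIB and the normalization (footnote 5) that *Φ*_{E} requires. The function names are hypothetical.

```python
import numpy as np


def gaussian_mi(cov, a, b):
    """Mutual information I(A; B) in nats for jointly Gaussian variables.

    `cov` is the joint covariance; `a` and `b` are index lists. Uses
    I = 0.5 * ln(det S_aa / det S_{a|b}), with S_{a|b} = S_aa - S_ab S_bb^{-1} S_ba.
    """
    s_aa = cov[np.ix_(a, a)]
    s_ab = cov[np.ix_(a, b)]
    s_bb = cov[np.ix_(b, b)]
    s_cond = s_aa - s_ab @ np.linalg.solve(s_bb, s_ab.T)
    return 0.5 * (np.linalg.slogdet(s_aa)[1] - np.linalg.slogdet(s_cond)[1])


def effective_information(x, parts, tau=1):
    """phi = I(X_{t-tau}; X_t) - sum_k I(M^k_{t-tau}; M^k_t) for one partition.

    `x` is (T, n); `parts` is a list of index lists covering 0..n-1.
    No MIB search or normalization is performed.
    """
    n = x.shape[1]
    z = np.hstack([x[tau:], x[:-tau]])  # columns 0..n-1: present; n..2n-1: past
    cov = np.cov(z, rowvar=False)
    whole = gaussian_mi(cov, list(range(n)), list(range(n, 2 * n)))
    parts_mi = sum(gaussian_mi(cov, list(m), [n + i for i in m]) for m in parts)
    return whole - parts_mi


def simulate_var1(A, T, rng):
    """Simulate X_t = A X_{t-1} + E_t with unit-variance Gaussian noise."""
    x = np.zeros((T, A.shape[0]))
    for t in range(1, T):
        x[t] = A @ x[t - 1] + rng.standard_normal(A.shape[0])
    return x


rng = np.random.default_rng(3)
coupled = np.array([[0.0, 0.8], [0.8, 0.0]])      # each unit driven only by the other
independent = np.array([[0.8, 0.0], [0.0, 0.8]])  # each unit driven only by itself
parts = [[0], [1]]
phi_coupled = effective_information(simulate_var1(coupled, 20000, rng), parts)
phi_independent = effective_information(simulate_var1(independent, 20000, rng), parts)
print(phi_coupled, phi_independent)
```

For these two toy systems the population values are ln(1/0.36) ≈ 1.02 nats (the whole predicts its future but neither part predicts its own) and exactly 0 (the whole is no more than the sum of its parts), so the estimates should lie close to these values.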

Neural data relevant to conscious level are typically obtained from neuroimaging methods such as M/EEG and fMRI. Each method poses its own distinctive challenges. M/EEG signals offer high temporal resolution suitable for time-series analysis; however, they require non-unique inverse modelling to move from sensor space (at the scalp) to an underlying source space, conferring ambiguity on subsequent interpretations. The spatial resolution of EEG is also low compared with fMRI, and additional challenges related to filtering have already been mentioned. fMRI contrasts with M/EEG by offering high spatial resolution at the expense of temporal resolution. The blood oxygen level dependent (BOLD) signal measured by fMRI reflects slow metabolic processes related to neural activity (these relations remain incompletely understood [35]), and is typically sampled at rates of the order of 0.3–1 Hz. The correspondingly sparse time series pose challenges for accurate estimation of AR models and even of covariance matrices. Moreover, variability in haemodynamic responses across brain regions and subjects [36] may confound causal inference [37], undermining subsequently derived complexity measures. However, causal inference based on fMRI time series is an area of extremely active research [38], and promising new approaches are emerging, for example, embedding AR models into state–space models that include haemodynamic parameters [39] and integrating perturbational approaches using transcranial magnetic stimulation [40].

Our simulations have focused on small networks (*n*=8,12). One reason for this is computational feasibility. For *Φ*, there is an exponential growth in the number of partitions to consider as *n* increases (causal density is less demanding, scaling roughly as *n*^{2}). However, raw computer power is continuing to increase, raising the possibility of calculation of these measures for much larger systems. A second reason is statistical. When computing causal density or *Φ* from data, the number of parameters to estimate scales roughly as *n*^{2}. This is a problem, not for computational reasons, but because estimating a large number of parameters requires a correspondingly large amount of data (i.e. longer time series) in order to avoid overfitting, and to obtain reasonable constraints on the estimated parameters. However, a system will only remain statistically stationary for a limited time period, restricting the number of nodes that can be considered in practice. One approach to this problem is to impose priors on the estimated parameters, for example, via regularization or sparsification [41].
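The scaling argument can be made concrete by counting. The sketch below assumes that *Φ* is evaluated either over bipartitions (2^{n−1} − 1 of them) or, following footnote 4, over general partitions (the Bell numbers), and that the AR model has a full coefficient matrix at each of *p* lags (*p* = 5 here, echoing footnote 7). The helper names are hypothetical.

```python
def n_bipartitions(n):
    """Ways to split n labelled elements into two non-empty, unordered parts."""
    return 2 ** (n - 1) - 1


def bell(n):
    """Number of partitions of n labelled elements into any number of parts (Bell triangle)."""
    row = [1]
    for _ in range(n - 1):
        nxt = [row[-1]]
        for v in row:
            nxt.append(nxt[-1] + v)
        row = nxt
    return row[-1]


def n_ordered_pairs(n):
    """Pairwise conditional G-causalities needed for CD."""
    return n * (n - 1)


def var_coefficients(n, p):
    """Regression coefficients in a full n-variable AR model of order p."""
    return p * n * n


print("n  pairs  AR coeffs  bipartitions  all partitions")
for n in (8, 12, 32):
    print(n, n_ordered_pairs(n), var_coefficients(n, 5), n_bipartitions(n), bell(n))
```

Pairs and AR coefficients grow polynomially, whereas partitions grow exponentially or faster; the coefficient count, not the compute time, is what ties the feasible *n* to the length of a stationary recording.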

Even for small *n*, causal density and *Φ* may be informative about consciousness. Abundant evidence indicates that coarse-grained low-dimensional information can differentiate states and contents of consciousness. For example, when analysing EEG signals from subjects experiencing binocular rivalry, synchrony among a few well-separated electrodes indicates perceptual dominance [2,42]. Sleep, anaesthesia and other states involving global loss of consciousness also have neural signatures detectable at coarse-grained levels (see [43] for a review). We emphasize that the measures described in this paper are motivated by their potential to explain, rather than merely correlate with, consciousness [1,44], a relationship that may hold at multiple levels of coarse graining. Although (to our knowledge) neither causal density nor *Φ* has yet been convincingly applied to experimental data, there have been several attempts to apply neural complexity to EEG data, with mixed results [45–47]. These studies have also used relatively small *n* (e.g. *n*=18 in [45]); their divergent findings underline the need for careful attention to details of practical application and may reflect such differences and/or a limitation of neural complexity in depending exclusively on zero-lag covariance.

### (c) Consciousness and complexity

William James famously described consciousness as a ‘stream’ or ‘process’ [48]. The measures of dynamical complexity described in this paper (causal density, *Φ*_{E} and *Φ*_{AR}) are consistent with this view. Because these measures depend on the stationary statistics of the underlying neural dynamics, when considered as potential measures of conscious level they imply that (i) conscious level is constant during each stationary epoch of brain activity and (ii) conscious level changes when functional connectivity changes, modifying the stationary statistics. Despite sharing common ground in emphasizing dynamical complexity, this perspective differs from the one advocated by the IITC. According to the IITC, consciousness is best characterized as a ‘potential’ or ‘capacity’, as reflected by *Φ*_{DM} by virtue of its reliance on the maximum entropy distribution [6]. A challenging implication of this property is that conscious experience is modulated by states that the brain never in fact encounters, if these states are part of the maximum entropy distribution but not the stationary distribution. Another feature of the IITC is that integrated information is *identified* with consciousness, implying a relation of sufficiency. In our view, dynamical complexity (information integration) may be necessary, but is unlikely to be sufficient, for generating consciousness. A challenge to the IITC in this context is that all measures of integrated information so far described exhibit instabilities due to normalization (see above, and [8]), undermining the ascription of physical meaning to the quantity. Furthermore, *Φ* and causal density are not invariant under changes of coordinates or state–space representations. On the one hand, this further emphasizes that the measures should not be identified with consciousness; on the other, it raises the interesting possibility that there may be a particular state–space description that maximizes causal density or *Φ* [8], and which may have neurobiological relevance in doing so.

The differences among the various measures described in this paper underscore the importance of operationalizing intuitive properties of a target phenomenon (in this case, consciousness). As we have seen, a single intuition (conjoined functional integration and functional differentiation) can be operationalized in significantly different ways, carrying major implications not just for practical applicability but also for underlying theoretical principles. Future challenges lie in continuing this programme of operationalization for other fundamental properties of consciousness, such as the existence of a first-person perspective, the shaping of conscious contents by mood, volition and agency, and the sense of subjective reality (presence) [44]. At the same time, additional theory and modelling are needed to fully disclose the relations among integrated information, causal density and neural complexity, and to understand how these quantities are affected by embedding a neural system within a sensorimotor loop involved in generating behaviour. But the most pressing concern is to close the gap between theory and practice by examining whether theoretically grounded measures, such as those described here, offer additional empirical purchase by virtue of, and not in spite of, their theoretical properties.

## Acknowledgements

A.K.S. and A.B.B. are supported by EPSRC Leadership Fellowship EP/G007543/1 to A.K.S. A.K.S. and L.B. are supported by the Dr Mortimer and Theresa Sackler Foundation. We are grateful to four anonymous reviewers for their useful comments, which helped improve the manuscript.

## Footnotes

One contribution of 11 to a Theme Issue ‘The complexity of sleep’.

↵1 A further extension to a full ‘tripartition’ causal density is described in [14].

↵2 This measure has also been referred to as Tononi–Sporns–Edelman complexity.

↵3 Since mutual information is symmetric in its two arguments, this can equally well be read as the information generated by the past state about the current state.

↵4 The formalism applies straightforwardly to general partitions, see [8].

↵5 The normalization factor is determined by the smallest stationary entropy of a subsystem [8].

↵6 Note that, by symmetry of mutual information, *φ* could be equivalently expressed in terms of regressions of the present state on the past state.

↵7 For the restricted regressions for computing CD and BCD, we used five lags, which was sufficient to approximate the infinite lag limit.

↵8 The spectral radius of a (square) matrix is the maximum of the absolute values of its eigenvalues [25]. Intuitively, it reflects the overall level of feedback in the corresponding dynamical process.

- This journal is © 2011 The Royal Society