## Abstract

Understanding the operations of neural networks in the brain requires an understanding of whether interactions among neurons can be described by a pairwise interaction model, or whether a higher order interaction model is needed. In this article we consider the rate of synchronous discharge of a local population of neurons, a macroscopic index of the activation of the neural network that can be measured experimentally. We analyse a model based on physics’ maximum entropy principle that evaluates whether the probability of synchronous discharge can be described by interactions up to any given order. When compared with real neural population activity obtained from the rat somatosensory cortex, the model shows that interactions of at least order three or four are necessary to explain the data. We use Shannon information to compute the impact of high-order correlations on the amount of somatosensory information transmitted by the rate of synchronous discharge, and we find that correlations of higher order progressively decrease the information available through the neural population. These results are compatible with the hypothesis that high-order interactions play a role in shaping the dynamics of neural networks, and that they should be taken into account when computing the representational capacity of neural populations.

## 1. Introduction

Simultaneous recordings of the activity of individual neurons placed within local networks in the central nervous system show that a large fraction of pairs of neurons are correlated. The probability of observing near-simultaneous spikes from two different neurons is often significantly higher than the product of the probability of observing the individual spikes from each neuron (Li 1959; Perkel & Bullock 1969; Mastronarde 1983).

The ubiquitous presence of correlations among the activity of different neurons has raised the question of what is the impact of correlation upon neural population coding of sensory stimuli (see for recent reviews Salinas & Sejnowski 2001; Averbeck et al. 2006). Although the potential role of correlations in neural population codes is still unclear and robustly debated (Shadlen & Movshon 1999; von der Malsburg 1999), theoretical studies have suggested that correlations can profoundly affect the information transmitted by neural populations. On the one hand, correlations may play a crucial and beneficial role in the neural code, by increasing the information content of neural populations (Oram et al. 1998; Abbott & Dayan 1999; Pola et al. 2003), by serving as a scheme for implementing associations and binding of features (von der Malsburg 1999) or by implementing strategies for error correction (Schneidman et al. 2006). On the other hand, correlations may reflect correlated noise arising from the structure of cortical circuits, and may act as a detrimental, limiting factor to the representational capacity of neural populations (Zohary et al. 1994; Mazurek & Shadlen 2002). Whether correlations give a positive or negative contribution depends on the precise details of the correlational structure of neural activity (Oram et al. 1998; Abbott & Dayan 1999; Pola et al. 2003). Determining the precise structure of correlated activity is thus crucial for the progress of systems neuroscience.

A particularly important question about the structure of correlated activity of large neural populations is whether it can be described by considering only pairwise interactions, or if genuine high-order interactions between neurons are present. The understanding of the role of high-order interactions is important for several reasons. First, most studies of population codes are based on the recording of neural pairs and of pairwise correlations (Panzeri et al. 1999; Nirenberg et al. 2001; Panzeri & Schultz 2001; Petersen et al. 2001; Montani et al. 2007). Pairwise studies can only inform about the behaviour of large populations if higher order interactions are absent. Second, the presence of high-order interactions has implications on the understanding of the functional organization of neural networks in the brain (Martignon et al. 2000), because high-order interactions are compatible with information transmission by activation of tightly connected cell assemblies (Harris 2005). Third, understanding which is the minimal order of interaction sufficient to describe correlations among neurons is crucial to develop simple but effective models of decoding of neural population activity (Nirenberg & Victor 2007). This question is only now beginning to be addressed both at the theoretical level (Bohte et al. 2000; Nakahara & Amari 2002; Amari et al. 2003) and the experimental level (Schneidman et al. 2006; Shlens et al. 2006; Tang et al. 2008).

In this study we evaluate the presence of high-order correlations in the somatosensory cortical network, by investigating whether the observed probability of synchronous firing to a given sensory stimulus can only be explained by considering high-order interactions, and whether such high-order interactions play a role in the transmission of information about the stimuli.

## 2. Information geometry and the probability of synchronous discharge in a homogeneous neural population

We consider a population of *N* neurons whose activity is simultaneously observed during a specified short time window of size Δ*t* following the presentation of a sensory stimulus *s* taken from a set of *S* different stimuli. We represent neuronal population activity by a binary vector ** x**=(

*x*

_{1},…,

*x*

_{N}) in the space

*X*of all binary vectors of length

*N*, where

*x*

_{i}=0 if neuron

*i*is silent in some time window and

*x*

_{i}=1 if it is firing one or more spikes. The probability distribution

*P*(

**|**

*x**s*) of observing population response conditional to the presentation of stimulus

*s*can be expressed using different coordinate systems. The most obvious way of characterizing such a distribution is by specifying the 2

^{N}−1 individual probability values; these are called the

*p*-coordinates. Alternatively, the probability can be determined by the 2

^{N}−1 marginal probability values; these are called the η-coordinates (Amari 2001). Provided

*P*(

**|**

*x**s*)≠0 for any

**, any such distribution can be expanded in the so-called log-linear model, or θ-coordinates system (Martignon et al. 2000; Amari 2001; Nakahara & Amari 2002): 2.1 where the 2**

*x*^{N}−1 different coefficients uniquely determine the distribution and are, at least in principle, stimulus-dependent (although in the following their stimulus-dependence will be dropped for notational simplicity). The use of this coordinate system to study probabilities and interactions was pioneered in the early 1980s by Amari and co-workers (Amari 1980, 1982), and then was later refined by the same authors (Amari & Nagaoka 2000; Amari 2001); in part thanks to the influential work of Curado & Tsallis (1991) in developing a generalized theory of statistical mechanics. In this article, we will use the above θ-coordinate system because (as demonstrated in Amari (2001) and discussed in the next section) it is the most natural coordinate system to study interactions between variables.

In order to simplify the analysis, and following previous theoretical work (Bohte et al. 2000; Amari et al. 2003), we will make a strong assumption about the neural population. We assume that the neural population is a fully homogeneous pool; that is all the parameters characterizing single neuron properties and interactions between any group of neurons do not depend on the precise identity of the considered neurons, but only on the number of neurons considered. With this assumption the probability distribution is now characterized by only *N* parameters. Because of the symmetry of the population, all the θ-coordinates of a given order, *k*, are equal and can be represented by θ_{k}. For example, all interaction coefficients at order 3, , are equal to a single parameter that we indicate by θ_{3}. Under the homogeneous pool assumption, equation (2.1) becomes
2.2

Since the neurons are identical, the probabilities of all responses with *m* neurons firing are equal to each other. Thus, from the conditional distribution *P*(** x**|

*s*) of the population response conditional to stimulus

*s*, it is convenient to extract a simpler but still relevant probability distribution: the distribution

*P*(

*m*|

*s*) of the number

*m*from the space of neurons simultaneously firing in response to stimulus

*s*during the considered post-stimulus time interval. In the rest of the paper, we will denote

*m*as the rate of coincident firing (because it is proportional to the fraction of active neurons at any given time). Because of the homogeneity assumption,

*P*(

**|**

*x**s*) and

*P*(

*m*|

*s*) are simply related by combinatorial factors and

*P*(

*m*|

*s*) is given by the following: 2.3 where

*X*

_{m}is the set of all vectors

**containing exactly**

*x**m*cells firing. Hence, equation (2.1) becomes 2.4 where θ

_{i}represents the effect on the log-probability of interactions of order

*i*in the neuronal pool. The marginals η

_{m}, which are the probabilities of any

*m*particular neurons firing at the same time, are (Bohte et al. 2000) 2.5

The probability of the number of neurons coincidentally firing is of physiological interest, because the number of near-coincident inputs to a cell postsynaptic to the considered neural population is likely to be a key factor in determining the probability of firing the postsynaptic cell (Softky 1995; König et al. 1996). Thus, the probability of coincident firing is likely to play a role in the actual information transmission as well as in the information representation.

The assumption of a homogeneous neural pool is of course an oversimplification of the properties of real neural networks and strongly limits the domain of applicability of this formalism. It is however crucial to allow us to robustly study the response probabilities of relatively large neural populations (a few tens of neurons) at fixed stimuli. The analysis of tens of neurons would be more problematic when considering the full non-homogeneous model (equation (2.1)), because of both computational problems (Martignon et al. 2000) and data-sampling issues (Panzeri et al. 2007) related to the larger number of parameters to be estimated in the non-homogeneous model. Fortunately, the neural populations to which we will apply our analysis are relatively well described by the homogeneous-pool assumptions, as we will discuss below.

## 3. Investigating the order of interaction through the maximum entropy principle

A rigorous way to investigate the effects of different orders of interaction is provided by the technique of maximum entropy (ME), which was originally introduced in statistical physics (Jaynes 1957), and is now beginning to be used in neuroscience (Martignon et al. 2000; Schneidman et al. 2006; Shlens et al. 2006; Tang et al. 2006; Montemurro et al. 2007; Nirenberg & Victor 2007).

The idea of the ME principle is to first fix some constraints that are of interest. We then seek the simplest, or most random, distribution subject to those constraints. This removes all types of correlation or structure in the data that does not result from the constrained features. Since entropy is a measure of randomness, looking for the most random distribution corresponds to looking for the distribution with ME.

Here, we use the ME principle to address the problem of what is the order of interactions among neurons which is sufficient to describe the probabilities of neural response. We will consider the distribution *P*^{(k)}(** x**|

*s*) with ME within the class of all distributions with the same marginals up to order

*k*as the real measured distribution

*P*(

**|**

*x**s*). The ME condition ensures that, though interactions of up to order

*k*are preserved, there are no higher order interactions present. We can then compare these ME models for different orders to the real measured distribution. In practice, the comparison will be done on the lower-dimensional probability distribution of coincident firing

*P*

^{(k)}(

*m*|

*s*) and

*P*(

*m*|

*s*), which under the homogeneity assumption are univocally related to

*P*

^{(k)}(

**|**

*x**s*) and

*P*(

**|**

*x**s*) through simple combinatorial factors, as given in equation (2.3).

The log-linear form using θ-coordinates (equations (2.2) and (2.4)) provides a convenient framework with which to obtain the ME distributions. The general form of the ME solution subject to constraints (Cover & Thomas 1991) has the same form as the log-linear model, and it has been shown (Amari et al. 2001) that, subject to constraints on the marginals of the distribution of up to order *k*, the ME solution is given by equation (2.2) with the model truncated to include only θ’s of up to order *k*. Thus, it is possible to leverage the coordinate systems described above and the transformations between them to efficiently compute the ME solutions.

For a given order *k*, we compute the ME solutions as follows (more details are given in Ince et al. 2009). We start by matching interactions up to order *k* to those of the measured distribution by setting the low-order η-coordinates of the ME solution to equal those of the measured distribution. Then, following Amari (2001) and Amari et al. (2003), the ME solution among the distributions with the appropriate low-order marginals is found by setting the high-order components of the θ-coordinates to zero.

As shown in Amari (2001) and Amari et al. (2003), by enforcing both of these constraints simultaneously, we obtain a set of simultaneous equations. The coordinate transformation from *p*- to η-coordinates is given by equation (2.5) and denoted as . The coordinate transformation from θ- to *p*-coordinates is given by equation (2.4) and denoted as . The ME requirement is enforced by setting where are the θ-coordinates of the ME model. The ME distribution is then completely determined by the . These can be obtained through numerical solution of
3.1
where η_{k−}={η_{i}}_{i≤k} of the measured distribution. This is a system of *k* equations in *k* unknowns. The Jacobian of this function can also be obtained analytically allowing efficient solution using a range of numerical optimization methods. Here, we solve it by employing a least-squares approach using a Levenberg–Marquardt algorithm (Jones et al. 2001). Using this we are able to solve for orders up to approximately 10 for populations of approximately 30 cells in a few minutes on a laptop computer. Solutions of all orders are possible, but computation time grows exponentially with the order considered.

It is important to note that we maximize the entropy of the distribution of the population response ** x**, over the probabilities defined on the space

*X*and given the constraints on marginals of up to order

*k*. We do not maximize the entropy of the distribution of the rate of coincident firing

*m*. This is correct because we want to impose no interactions among neurons apart from those fixed by the marginals of up to level

*k*, and the interactions among neurons are defined in the population response space

*X*. The entropy of the stimulus-conditional population response

*H*[

**|**

*X**s*] is not equal to the entropy of the rate of coincident firing

*H*[

**|**

*M**s*]. In addition, the relationship between the two is not monotonic, so maximizing

*H*[

**|**

*X**s*] does not result in

*H*[

**|**

*M**s*] being maximal. However, a relationship between the two entropies can be derived using equation (2.3): 3.2 Since the relationship is not monotonic, it is possible that a distribution which has ME on the population responses

**is not a ME distribution of the rate of coincident firing**

*x**m*. For example, as noted in Amari et al. (2003), the model where all neurons are independent from each other is a model of ME given the single-neuron marginal probability but it leads to a fully concentrated distribution of the number of coincidentally active neurons in the large

*N*limit.

## 4. Predictions from theoretical analyses of the distribution of synchronous discharge in homogeneous populations

Recent seminal theoretical articles using the ME principle (Bohte et al. 2000; Amari et al. 2003) have begun to elucidate the effect of high-order interactions on the probability of the number of synchronously firing neurons, *P*(*m*|*s*). Particular important predictions come from the work of Amari et al. (2003), in which the authors consider the behaviour of *P*(*m*|*s*) in the thermodynamic (large *N*) limit. They found that, in the absence of correlations, *P*(*m*|*s*) is concentrated around its mean owing to the central limit theorem. They analysed what conditions are required to obtain a widespread distribution, in which (even in the thermodynamic limit) different numbers of neurons simultaneously firing are possible. They found that even when pairwise or third-order interactions are considered, the concentration is not resolved. Weak interactions at all orders are needed to obtain widespread distributions.

These predictions assign a strong role to high-order interactions for all neural systems exhibiting widespread distribution of synchronous firing. Since these predictions are obtained in the thermodynamic limit, it is difficult to test these predictions on real data because the number of simultaneously recorded neurons in a typical experiment is small (up to a few tens of neurons at best). However, it is interesting to consider whether real data tend to produce concentrated or widespread distributions, and whether observed widespread distributions require high-order interactions in order to be explained. We address this question directly in the next section.

## 5. The role of high-order correlations in shaping synchronous discharge in somatosensory cortex

We apply the techniques described above related to a pooled population of neurons recorded from the whisker representation in the somatosensory cortex of urethane-anaesthetized rats. The dataset (previously published in Arabzadeh et al. 2003, 2004) consists of 24 simultaneously recorded neural clusters, each sampled with a different electrode with a minimal inter-electrode distance of 400 μm. Spike times from each electrode were determined by a voltage threshold set to a value 2.5 times the root mean square voltage. Since it was not possible to sort well-isolated units from each channel, spikes from the same recording channel were considered together as a single neural cluster. It has been estimated that, under these recoding conditions, each cluster captured the spikes of approximately two to five neurons located near the tip of the electrode (see Petersen & Diamond 2000). Neural activity was recorded in response to stimulation (with a piezoelectric wafer controlled by a voltage generator) consisting of sinusoidal whisker vibrations, each defined by a different value of vibration velocity and delivered for 500 ms (see Arabzadeh et al. (2004), for full details). Thirteen different values of vibration velocity were tested, ranging between *A**f*=0.15 mm s^{−1} and *A**f*=47.7 mm s^{−1}. Each value of vibration velocity was treated as a different stimulus *s* (there were 13 stimulus classes in total). The number of recorded repetitions for each stimulus (called ‘trials’ in neurophysiology), from which the probability of response at fixed stimulus is determined, varied between a minimum of 200 and a maximum of 1400 across the stimulus classes. We note that this dataset is a convenient one for studying a neural population under the homogeneous-pool assumption. In fact, it was found that (i) all the neurons analysed here respond with the same type of profile to velocity (Arabzadeh et al. 2003) and (ii) when considering pairs of neurons, neglecting the label of which neuron fired which spikes (which is equivalent to transforming the response ** x** into the response

*m*) did not lead to any significant information loss (Arabzadeh et al. 2004), which suggests that non-homogeneities are negligible as far as information transmission is concerned.

We measure the neural responses as follows. We first select a post-stimulus window in which to measure the neural response. It has been shown (Arabzadeh et al. 2004) that the majority of the information is transmitted very early post-stimulus onset (typically between 5 and 30 ms). We therefore concentrate on data taken from these early, highly informative windows. In each trial, the population response ** x** is computed as follows. For each recording channel, we set the response to 1 if at least one spike occurs in the time window, and 0 otherwise. The number

*m*of neural clusters coincidentally firing is simply computed as the number of clusters firing at least one spike in the considered window.

We use these data to study the shape of the distribution of the number of clusters simultaneously firing at fixed stimulus, and the order of neural interactions needed to describe this distribution. We note that some previous studies (Schneidman et al. 2006) have focused on the overall probability of response to many different stimuli. However, this has the potential problem that the resulting correlation may arise both from correlations in the stimulus and in correlations arising from actual neural interactions, and it is difficult to separate them (Nirenberg & Victor 2007). Here, we have decided to consider distributions at fixed stimuli to ensure we only investigate interactions of neural origin.

We first consider whether the distributions of the number of clusters simultaneously firing at fixed stimulus are widespread or concentrated. Distributions conditional to one particular stimulus (velocity=2.66 mm s^{−1}; 1400 trials available) are shown in figure 1, for different values of the size of the post-stimulus window used to measure the response. For all windows considered, the distribution is clearly widespread, with no concentration around a single value. Choosing a larger window, we observe a higher expectation value of the number of neurons firing, but the distribution remains widespread. These results and trends apply to all 13 stimuli considered (data not shown).

We next consider whether the observed widespread distributions need high-order interactions among neural activity to be explained. We investigate this issue by applying the ME algorithms described in §3 to the stimulus-conditional neural response probabilities and considering ME solutions of various orders.

Figure 2 reports a comparison between the real distribution of synchronously firing clusters in response to stimulus velocity=2.66 mm s^{−1} and the corresponding ME models at orders *k* between 1 and 5. The first- and second-order models provide bell-shaped distributions which are a very poor approximation to the measured distribution. It is clear that third order is a better approximation than the first- and second order, and at fourth and fifth order the ME model becomes difficult to distinguish from the true distribution. To quantify the goodness-of-fit of the ME models, we used standard χ^{2} statistics. The ME model at order *k*=1,…,3 had to be rejected at *p*=0.05. Models of order 4 and higher were not rejected at *p*=0.05. The results shown in figure 2 are a good description of the typical behaviour of the dataset. Considering all probabilities of coincident firing to all the 13 stimuli, 11 stimulus conditional distributions needed at least order 3 interactions to fit the real data (*p*=0.05), 8 needed at least order 4 (*p*=0.05), and 6 needed at least order 5 (*p*=0.05). The only two distributions that could be fit by a model of order two (*p*=0.05) were those with fewer number of trials (those with less statistical power).

## 6. Effect of interactions on somatosensory information encoding

Determining the presence of high-order interactions suggests that they cannot be neglected in models of information transmission, but it does not tell how much these correlations are important. To quantify this, we next compute the information between the stimulus and the population activity, and we compare it with that derived from the ME models.

The mutual information between the stimuli and the neural population activity is defined as follows:
6.1
where *H*(*X*) and *H*(*X*|*S*) are the response entropy and noise entropy, respectively:
6.2
6.3
where in the above . Note that, because of the homogeneity assumption, and because of the data processing inequality, the information about the stimuli *I*(*S*;*X*) carried by the population response is equal to the one carried by the rate of coincident firing *I*(*X*;*M*), although as previously discussed the entropies are different.

We investigate the impact of interactions at a given order *k* by calculating the mutual information that would result from a system exhibiting the probability distributions obtained from the ME solution, as follows:
6.4
where *H*^{(k)}(*X*) and *H*^{(k)}(*X*|*S*) are the response and noise entropies, respectively, of the *k*th order ME model. These entropies are obtained by replacing *P*(*X*|*s*) and *P*(*X*|*s*) with *P*^{(k)}(*X*|*s*) and *P*^{(k)}(*X*|*s*) in equations (6.2) and (6.3), where *P*^{(k)}(*X*|*s*) is the ME solution preserving up to *k*th order marginals equal to *P*(*X*|*s*) and . Then
6.5

*I*^{(k)}(*S*;*X*) was computed as follows. First, we obtain the homogeneous ME solution, *P*^{(k)}(*X*|*s*), for each order of interest and for each stimulus-conditional response. Then, from each of these stimulus-conditional ME solutions, we simulate data with the same number of trials as available in the experimental data (this is different for each stimulus). These trials are generated using inverse transform sampling. This is done to ensure a fair comparison between the measured data and the generated data; any bias effects should affect both equally. Bias is corrected for using the quadratic extrapolation method (Strong et al. 1998) from the Pyentropy library (Ince et al. 2009). The values obtained are averaged over 1000 repetitions to remove any trial-to-trial variation from the inverse transform sampling step.

Figure 3 shows the effect of including higher order interactions on information. Correlations have a limiting rather than an enhancing effect in this neural system. The first- and second-order ME models convey significantly higher information than the true system. The third-order information is significantly lower than the second-order one, suggesting that correlations of order higher than 2 still have a sizeable effect on limiting information. Correlations of order higher than 4 (though present, see previous section) do not influence information to a significant amount.

The fact that correlations of increasing order limit the information may appear surprising at first glance, but can be explained by considering the variance of the distributions of the rate of coincident firing in figure 2. The rate of coincidences from low-order ME solutions have the same mean as the true distribution, but are much more concentrated. As a consequence, the noise entropies of the distributions of coincident firing *H*(*M*|*S*) (obtained from equation (6.3) by replacing *P*(*m*|*s*) in lieu of *P*(** x**|

*s*)) also increase with the interaction order (figure 4). Since, as explained above, the only informative variable in homogeneous population activity is the rate of coincident firing

*m*, the information value has to decrease as the interaction order

*k*increases.

Figure 4 reports the noise and response entropies of the population activity as a function of the interaction order considered. Because of the constrained maximization, the noise entropies have to decrease with the interaction order. However, the noise entropy *H*(*X*), which is made up mixing all stimulus-conditional responses, is not constrained to necessarily decrease as fast as *H*(*X*|*S*) with the interaction order (Schultz & Panzeri 2001). In fact, *H*(*X*) decreases more quickly with *k* than *H*(*X*|*S*) does (figure 4), thus leading to an overall information decrease with increase of order *k*.

It is interesting that the mutual information of the system is already well approximated by models containing interactions of up to order 3. Interactions of order higher than 3, though statistically significant, do not appear to play a qualitatively important role in information transmission. This is a significant simplification since it greatly reduces the parameters required to describe the system. While it is still challenging to sample up to third-order marginals, it is a much more tractable problem than the case where all orders of interaction must be accurately determined.

## 7. Discussion

Here, we have considered, to our knowledge for the first time, how a homogeneous-pool model containing interactions of arbitrary order (Bohte et al. 2000; Amari et al. 2003) fits real distributions of the rate of coincident firing in real in vivo neural networks. We found that, when considering stimulus-conditional distributions of the rate of simultaneous discharge in populations including tens of recorded neural clusters, interactions of order higher than two are typically needed to describe the distributions. Thus, interactions of order two may not always be sufficient to describe the correlational structure of neural activity, as recently reported (Schneidman et al. 2006; Shlens et al. 2006; Tang et al. 2008). In addition to studying the effects of interactions on the mutual information, it would also be interesting to investigate the θ-coordinate scaling properties as proposed in Amari et al. (2003). However, the currently available experimental data are insufficient, since they do not contain enough simultaneously recorded channels to approach the large *N* regime of the theory.

In this paper, we have also reported what constitutes, to our knowledge, the first calculation of the impact of higher order firing on the mutual information about sensory stimuli carried by a neural population. Previous studies mostly concentrated on the effect of interaction on response entropy. Since typically mutual information is smaller when compared with both the response and the noise entropy, an impact that may be proportionally small for entropy may be proportionally much larger when considering information. In the system considered here, we found that correlations decreased the information, and the decrease in information became larger as the interaction order grew, and saturated at order three. This suggests that, when evaluating the computational capacity of cortical population in sensory areas, it may be necessary sometimes to take into account correlations of order higher than two. Thus, it is particularly important to be able to extend analytical models of the effect of correlation on information to order higher than two, and to be able to compute their scaling in their thermodynamic limit. We believe that this will become an important topic for future theoretical research on the encoding capacity of neural networks.

## Acknowledgements

This work was supported by BMI project at IIT (Italy) and by the EPSRC ‘CARMEN’ grant EP/E002331/1 (UK).

## Footnotes

↵† These two authors contributed equally to this work.

One contribution of 14 to a Theme Issue ‘Topics on non-equilibrium statistical mechanics and nonlinear physics’.

- © 2009 The Royal Society