The aim of this study was to provide a computational system model of effective connectivity in the human brain underlying overt speech production. Meta-analysis of neuroimaging studies and functional magnetic resonance imaging data acquired during a verbal fluency task revealed a core network consisting of Brodmann's area (BA) 44 in Broca's region, anterior insula, basal ganglia, cerebellum, premotor cortex (PMC, BA 6) and primary motor cortex (M1, areas 4a/4p). Dynamic causal modelling (DCM) indicated the highest evidence for a system architecture featuring the insula in a serial position between BA 44 and two parallel nodes (cerebellum/basal ganglia), from which information converges onto the PMC and finally M1. Parameter inference revealed that effective connectivity from the insular relay into the cerebellum/basal ganglia is primarily task driven (preparation) while the output into the cortical motor system strongly depends on the actual word production rate (execution). DCM hence allowed not only a quantitative characterization of the human speech production network, but also the distinction of a preparatory and an executive subsystem within it. The proposed model of physiological integration during speech production may now serve as a reference for investigations into the neurobiology of pathological states such as dysarthria and apraxia of speech.
1. Introduction
Communication by means of verbal expressions, i.e. the reception and production of overt speech, is a fundamental constituent of human behaviour. The integrity of speech and language functions is hence among the most important determinants for the quality of life in neurological patients (Jordan & Hillis 2006). From a theoretical point of view, an essential question about the organization of the cortical networks sustaining language (defined as the cognitive aspects of verbal communication) and speech (the motor act of vocalizing language) pertains to the contribution of functional segregation and integration towards these functions. Functional segregation refers to the distinct localization of different functions in separate cortical areas or modules (Friston 2002a). It is hereby assumed that a particular process, e.g. accessing the mental lexicon or producing the speech sound of a particular word, can be specifically attributed to a particular location (area) in the cortex. Damage to this area would result in a deficient process. This concept is the theoretical basis for classical (Broca 1861) and modern (Rorden & Karnath 2004) approaches to map human brain functions based on the correlation between the location of a lesion and the resulting symptoms. A radical interpretation of this idea would moreover postulate that not only is a particular function supported by a specific cortical site but that the respective region is also selective for the particular process (Friston 2002a; Stephan 2004; Eickhoff et al. 2005). In other words, this concept, which has long been fundamental to functional neuroimaging, emphasizes a one-to-one mapping between (functional) brain processes and cortical areas, which leads to an ever-refined subdivision of the cortex, using increasingly subtle variations of an experimental paradigm.
Over the last decade, this approach has led to an enormous amount of knowledge about the localization of individual cognitive, sensory and motor functions. In addition, the combination of functional localization experiments with probabilistic cytoarchitectonic maps of the human cerebral cortex (Zilles et al. 2002; Amunts et al. 2007; Eickhoff et al. 2007) has provided an ever-growing insight into the correspondence between the histological organization of the brain and its functional segregation (Indefrey et al. 2001; Horwitz et al. 2003; Amunts et al. 2004; Kleber et al. 2007). However, these studies also posed the question whether the organization of the human brain may be sufficiently explained by functional segregation alone. In particular, it could be demonstrated that the same cortical area (Brodmann's area (BA) 44) may be activated not only by expressive speech and lexical, phonological or semantic judgements, but also during somatosensory and motor tasks or during visual search (Zatorre et al. 1996; Horwitz et al. 2003; Manjaly et al. 2003; Amunts et al. 2004; Naito et al. 2005; Kleber et al. 2007; Vogt et al. 2007). Such observations obviously contradict a simple one-to-one mapping. They may, however, be well explained by the concept of functional integration, which emphasizes the role of inter-regional interactions (Friston 2002b; Friston et al. 2003; Stephan 2004). By taking a more systemic approach, this point of view attributes functional processes, such as understanding language or performing motor acts, to the interactions between several areas forming an interconnected network. It has to be stressed, though, that functional segregation and integration supplement rather than contradict each other. 
While concepts of functional integration regard the brain as a dynamical system from which functional properties emerge, they also acknowledge that individual elements within this system are not equivalent but sustain distinct roles in information processing (for a more detailed discussion on this topic and the ensuing network properties, see Sporns et al. 2004; Bassett & Bullmore 2006).
Historically, both concepts also had a considerable impact on neurobiological theories of language and speech. Based on brain lesions resulting in non-fluent and fluent aphasia, Paul Broca and Carl Wernicke hypothesized that language is supported by specialized, functionally segregated areas. By contrast, Wernicke and Lichtheim later proposed a connectionist model of language functions that is based on the interaction between an ‘auditory centre’, a ‘motor centre’ and a ‘concept centre’ (Lichtheim 1885). More recently, the development of functional neuroimaging brought forward a hitherto unprecedented precision in localizing differences in neuronal activation between experimental conditions. The ensuing possibility to probe for brain regions responding to particular stimuli, tasks or contexts then led to a detailed and ever-increasing knowledge about the functional neuroanatomy underlying cognitive aspects of language such as reading, picture naming or performing lexical, affective and phonological judgements (Friederici & Alter 2004; Hagoort 2005; Vigneau et al. 2006; Hickok & Poeppel 2007).
Investigating the functional integration of language areas requires the analysis of effective connectivity (defined as the influence that one brain region exerts over another). These kinds of studies showed, for example, that the cerebellum may be involved in refinement of cortical representations during a rhyming judgement task (Booth et al. 2007). Other investigations into the effective connectivity of language functions demonstrated that the differential activation of BA 45 in lexical and phonological tasks can be attributed to modulations of input from the inferior temporal gyrus (Heim et al. 2007) or that premotor activation for pseudowords was associated with connectivity from the posterior fusiform gyrus while activation in Broca's region for words was caused by interactions with the anterior fusiform gyrus (Mechelli et al. 2005).
In contrast to the frequently studied and already well-described cognitive components of language, the motor act of overt speech has so far received considerably less attention. In particular, the final step in most models of speech production is a pure propagation of word form information from Broca's area to the motor system for articulation (Indefrey & Levelt 2004). That is, the motor aspect of vocalization has frequently been conceptualized as a simple output route. Clinical observations, however, have motivated the distinction between at least two divergent types of speech production impairment, apraxia of speech and dysarthria (Ogar et al. 2005; Jordan & Hillis 2006). Both can be distinguished from aphasia by featuring preserved language cognition (such as reading, understanding, word retrieval and grammar). However, whereas apraxia of speech is characterized by difficulties in putting together sounds and syllables in the correct order during the planning of articulation, dysarthria refers to an impaired execution of articulatory programmes, resulting in an inadequate pronunciation with slow, imprecise or uncoordinated speech. The presence of at least two distinct clinical symptoms suggests a more complex organization of the speech production network in the human brain, possibly involving distinguishable subsystems.
Neuroimaging studies of overt speech production (Wildgruber et al. 2001; Ackermann & Riecker 2004; Riecker et al. 2005) as well as observations in patients with brain lesions (Dronkers 1996) have identified a number of regions that seem to contribute to speech motor output (e.g. the Broca region, the cerebellum and the motor cortex). The organization of this system and the effective connectivity within it, however, are as yet largely unknown. Therefore, the aim of the present study was to provide a computational model describing the vocalization network from a systemic perspective. A meta-analysis of 19 published neuroimaging studies reporting activation during various overt speech production tasks was used to confirm that the activations of the present study do indeed reflect the core network for speech production. The effective connectivity between the identified regions was then assessed using dynamic causal modelling (DCM; Friston et al. 2003). The architecture of the network was identified using Bayesian model comparison across alternative models reflecting competing hypotheses on serial and/or parallel processing. Models (figure 1) were estimated based on functional magnetic resonance imaging (fMRI) data from 20 subjects performing an overt verbal fluency task. Subsequently, inference was sought on the coupling parameters of the model that received the highest evidence based on our data, in order to compare the coupling strengths of the inter-areal connections and their modulation by the commencement of the task per se as well as by the actual word production rate.
2. Material and methods
(a) Meta-analysis of speech motor areas
In order to identify those areas that are consistently implicated in speech production, we performed a meta-analysis of 19 functional imaging studies published between 1996 and 2007 (table 1) using activation likelihood estimation (ALE; Turkeltaub et al. 2002; Laird et al. 2005; Eickhoff et al. in press). ALE is a method for coordinate-based meta-analysis aimed at revealing a convergence between foci of activation reported in different studies. The algorithm is based on the idea of treating those foci not as individual points but rather as probability density functions centred on the reported coordinates. By calculating the union of activation probabilities (represented by the modelled Gaussian distributions) across experiments, an ‘activation likelihood score’ can be computed for each voxel. However, given sufficient data, virtually every voxel in the brain will have non-zero activation likelihood. ALE values representing non-random clustering of foci (i.e. a true convergence of reported coordinates) hence have to be differentiated from those representing random clustering (i.e. noise). In order to statistically delineate those regions that were consistently activated, the obtained ALE values are therefore compared with an empirical null-distribution reflecting a random spatial association across the included experiments (Eickhoff et al. in press). Inference is then performed based on the derived non-parametric p-values. In the present analysis, the level of significance was p<0.01 corrected for multiple comparisons using the false discovery rate (FDR) procedure (Laird et al. 2005).
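The union of activation probabilities described above can be illustrated with a minimal sketch. It assumes that each focus is modelled as an isotropic three-dimensional Gaussian and that foci are independent; the smoothing width (`fwhm_mm`), the voxel volume and the function names are illustrative assumptions rather than the settings of the present analysis, and the permutation-based null-distribution is not shown.

```python
import math

def gaussian_prob(distance_mm, fwhm_mm, voxel_volume_mm3):
    """Probability that a focus located `distance_mm` away falls into this voxel.

    The Gaussian density is multiplied by the voxel volume (assumed value)
    to turn a density into an approximate probability.
    """
    sigma = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    density = math.exp(-distance_mm ** 2 / (2.0 * sigma ** 2))
    density /= (2.0 * math.pi) ** 1.5 * sigma ** 3
    return min(1.0, density * voxel_volume_mm3)

def ale_score(voxel, foci, fwhm_mm=10.0, voxel_volume_mm3=8.0):
    """Union of activation probabilities at one voxel: 1 - prod(1 - p_i)."""
    not_active = 1.0
    for focus in foci:
        p = gaussian_prob(math.dist(voxel, focus), fwhm_mm, voxel_volume_mm3)
        not_active *= 1.0 - p
    return 1.0 - not_active

# Hypothetical foci (MNI mm), for illustration only
example_foci = [(-50.0, 10.0, 5.0), (-48.0, 12.0, 6.0), (-32.0, 16.0, 2.0)]
score_near_cluster = ale_score((-50.0, 10.0, 5.0), example_foci)
score_far_away = ale_score((40.0, -60.0, 30.0), example_foci)
```

A voxel close to several reported foci receives a higher score than a distant one, which is the convergence that the subsequent significance test evaluates.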
(b) fMRI set-up, data acquisition and analysis
We examined 20 volunteers (mean age 28.3 years; 14 women), all of whom were native German speakers with normal or corrected-to-normal vision and right-handed preference as assessed by the Edinburgh inventory (Oldfield 1971). Participants had no history of neurological or psychiatric disorders and gave informed consent. The study was approved by the ethics committee of the University of Aachen. The experimental paradigm is described in detail in Heim et al. (2008) as well as in the electronic supplementary material, which also provides a more extensive summary of the experimental findings not directly related to the current analysis. The experiment employed a block design, whereby each activation block lasted 20 s. Each block was preceded for 6 s by a written instruction and followed by 20 s of rest before the next instruction was displayed. In total, there were 24 activation blocks corresponding to four conditions repeated six times. The conditions differed from each other by the criterion for word generation (‘semantic’, ‘syntactic’, ‘phonological’, ‘free’). Each block consisted of 10 trials lasting for 2 s. fMRI data were acquired in the first 1.04 s of each trial using a bunched-early sequence (de Zubicaray et al. 2001; Heim et al. 2006). Acquisition was followed by 0.96 s of silence, which signalled the subject to utter the next word. This approach reduced motion-induced artefacts, since subjects only spoke (i.e. moved) when no fMRI data were recorded.
The participants' speech production was recorded using the microphone of the goggle system employed for visual presentation of the instructions, which was connected to the line-in port of a notebook positioned outside the scanner. These recordings were used for qualitative and quantitative analyses of the participants' overt responses in each condition, yielding information about what they said and how many items were generated (Heim et al. 2008). For the present analysis, we used a one-factorial repeated-measures ANOVA on the number of generated words in order to test for differences in the amount of overt speech production between conditions.
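The timing of the paradigm can be laid out as a short sketch. The durations are those given above; the assumptions are that the session starts with the first instruction at t=0 and that instruction, activation and rest periods follow each other without gaps. The order of the four conditions is not specified here and is therefore not modelled.

```python
INSTRUCTION_S = 6.0      # written instruction before each block
ACTIVATION_S = 20.0      # activation block
REST_S = 20.0            # rest after each block
N_BLOCKS = 24            # four conditions, six repetitions each
TRIALS_PER_BLOCK = 10
TRIAL_S = 2.0
ACQUISITION_S = 1.04     # scanning window at the start of each trial

CYCLE_S = INSTRUCTION_S + ACTIVATION_S + REST_S

def block_schedule():
    """Return onsets (in seconds) for every block and trial.

    Assumes back-to-back cycles starting at t=0 (not stated in the text).
    """
    schedule = []
    for block in range(N_BLOCKS):
        instruction_onset = block * CYCLE_S
        block_onset = instruction_onset + INSTRUCTION_S
        trials = []
        for trial in range(TRIALS_PER_BLOCK):
            scan_onset = block_onset + trial * TRIAL_S
            trials.append({
                "scan_onset": scan_onset,
                # silent gap in which the subject speaks
                "speech_onset": scan_onset + ACQUISITION_S,
            })
        schedule.append({
            "instruction_onset": instruction_onset,
            "block_onset": block_onset,
            "rest_onset": block_onset + ACTIVATION_S,
            "trials": trials,
        })
    return schedule

def session_length():
    """Total duration of all 24 cycles under the assumptions above."""
    return N_BLOCKS * CYCLE_S
```

Under these assumptions each speech window lasts `TRIAL_S - ACQUISITION_S`, i.e. the 0.96 s of silence mentioned above.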
Three-dimensional images were acquired on a 3T Siemens Trio scanner with a standard birdcage head coil from 17 sagittal slices in the left hemisphere (gradient-echo echo-planar imaging, TE=30 ms, 3.1×3.1×4 mm resolution). The sagittal slice orientation was chosen for alignment with the predominant plane of head motion, minimizing between-plane motion. Foam paddings were used to further reduce head motion. Data analysis was performed using SPM5 (www.fil.ion.ucl.ac.uk/spm). It included standard procedures of realignment, normalization to the Montreal Neurological Institute (MNI) single-subject template and spatial smoothing (full width at half maximum=8 mm). For single-subject analysis, the input functions reflecting the experimental timing were convolved with a canonical haemodynamic response function and used as regressors in a general linear model. For the group analysis, the individual baseline contrasts were entered into a repeated-measures ANOVA (including non-sphericity correction) as a second-level random effects analysis to identify areas significantly activated by overt speech (conjunction analysis, p<0.05, family-wise error (FWE) corrected). Activations were anatomically localized by using the SPM anatomy toolbox (http://www.fz-juelich.de/ime/spm_anatomy_toolbox) and a histological maximum probability map (MPM; Eickhoff et al. 2005). This map denotes the most likely cytoarchitectonic area at each voxel in MNI space based on probabilistic cytoarchitectonic maps derived from the microstructural analysis in a sample of 10 post-mortem brains (Zilles et al. 2002; Amunts et al. 2007).
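The construction of a first-level regressor, i.e. a boxcar input function convolved with a canonical haemodynamic response, can be sketched as follows. The double-gamma shape used here (a gamma function peaking around 5 s minus one-sixth of a later gamma function) is the commonly used canonical form; the sampling interval of 2 s (one volume per trial) and all function names are assumptions made for illustration, and the sketch does not reproduce the SPM5 implementation.

```python
import math

def canonical_hrf(t):
    """Double-gamma response at time t (seconds): shape 6 minus 1/6 of shape 16."""
    if t <= 0.0:
        return 0.0
    peak = t ** 5 * math.exp(-t) / math.gamma(6)
    undershoot = t ** 15 * math.exp(-t) / math.gamma(16)
    return peak - undershoot / 6.0

def convolve_boxcar(boxcar, dt, hrf_length_s=32.0):
    """Convolve a sampled input function with the HRF sampled every `dt` seconds."""
    n_kernel = int(hrf_length_s / dt) + 1
    kernel = [canonical_hrf(i * dt) for i in range(n_kernel)]
    total = sum(kernel)
    kernel = [k / total for k in kernel]  # normalise kernel to unit sum
    regressor = []
    for n in range(len(boxcar)):
        acc = 0.0
        for k in range(min(n + 1, n_kernel)):
            acc += boxcar[n - k] * kernel[k]
        regressor.append(acc)
    return regressor

# One cycle sampled every 2 s (assumed): 6 s instruction, 20 s block, 20 s rest
DT = 2.0
one_cycle = [0.0] * 3 + [1.0] * 10 + [0.0] * 10
regressor = convolve_boxcar(one_cycle, DT)
```

The resulting regressor rises a few seconds after block onset and returns to baseline during rest, which is the delayed shape fitted by the general linear model.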
(c) Dynamic causal modelling set-up
Based on the meta-analysis of previous studies, the results of the fMRI random-effects inference and the location of the reliably detected single-subject maxima (cf. figure 2), the following left hemispheric regions were entered as nodes in the effective connectivity models:
BA 44 of Broca's area (MNI coordinates: −50/10/5; this location was assigned to cytoarchitectonically defined BA 44 in the MPM based on a probability of 40%, which was the highest of any area observed at that position);
the anterior insula (MNI coordinates: −32/16/2; the location of this node is rostral to the central sulcus of the insula, which forms the macroanatomic border of the anterior insula);
the cerebellum (MNI coordinates: −40/−60/−18, assigned to lobe VI according to macroanatomical parcellation provided by the automated anatomical labeling atlas; Tzourio-Mazoyer et al. 2002);
the basal ganglia (MNI coordinates: −14/−1/17, corresponding to the head of the caudate nucleus in Tzourio-Mazoyer et al. 2002);
the ventral premotor cortex (PMC; MNI coordinates: −58/1/23; in the MPM this location was assigned to cytoarchitectonically defined BA 6 based on a probability of 50%); and
the primary motor cortex (M1; MNI coordinates: −45/−11/39; this location corresponded to the border between cytoarchitectonic areas 4a and 4p, showing probabilities of 50 and 40%).
For each subject and area, the individual local maximum (p<0.05 uncorrected; cf. Mechelli et al. 2005; Eickhoff et al. 2008; Grefkes et al. 2008a) in the contrast ‘overt speech versus resting baseline’ was identified that was closest to the group maximum. Time series for these regions were extracted as the first principal component of all voxel time series within a spherical (radius 4 mm) volume of interest centred on the individual peak coordinates for each maximum.
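The extraction steps above (nearest individual maximum, spherical volume of interest, first principal component) can be sketched as follows. This is an illustrative re-implementation in plain Python using power iteration, not the routine used in the analysis; all function names are hypothetical, and the power iteration assumes that the starting weights are not orthogonal to the dominant component.

```python
import math

def closest_peak(group_max, individual_peaks):
    """Individual local maximum with the smallest Euclidean distance to the group maximum."""
    return min(individual_peaks, key=lambda p: math.dist(p, group_max))

def voxels_in_sphere(centre, voxel_coords, radius_mm=4.0):
    """Indices of voxels whose centre lies within the spherical volume of interest."""
    return [i for i, xyz in enumerate(voxel_coords)
            if math.dist(xyz, centre) <= radius_mm]

def first_principal_component(series, n_iter=200):
    """First principal component time course of a list of voxel time series.

    Uses power iteration on the voxel-by-voxel covariance structure.
    """
    centred = []
    for ts in series:
        mean = sum(ts) / len(ts)
        centred.append([x - mean for x in ts])
    n_vox, n_t = len(centred), len(centred[0])
    w = [1.0] * n_vox  # initial voxel weights (assumed non-orthogonal start)
    for _ in range(n_iter):
        scores = [sum(w[v] * centred[v][t] for v in range(n_vox))
                  for t in range(n_t)]
        w = [sum(centred[v][t] * scores[t] for t in range(n_t))
             for v in range(n_vox)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n_t
        w = [x / norm for x in w]
    # project the centred data onto the dominant voxel weighting
    return [sum(w[v] * centred[v][t] for v in range(n_vox)) for t in range(n_t)]
```

The sign of a principal component is arbitrary; with the all-positive starting weights used here, positively correlated voxels yield a component that follows their common time course.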
Each of the four evaluated models (figure 1) featured task-related driving input into BA 44 based on the four experimental conditions, which reflected the activation of this area by the cognitive aspects of speech production (Indefrey & Levelt 2004; Hagoort 2005; Heim et al. 2008). In all models, each of the defined connections was allowed to be modulated by any of the four conditions. The structure of the hypothesized network, however, was varied between the four analysed models, in particular, with respect to the presence or absence of parallel processing at different stages of the output cascade (figure 1). These models are evidently but a sample of all possible configurations. They were selected for further analysis based on theoretical considerations and the current knowledge about the human speech motor system as derived from functional neuroimaging and lesion studies. In particular, the four models analysed here using DCM reflect neurobiologically plausible (given previous findings) alternative hypotheses about the architecture of the vocalization network. It should be noted that none of the four assessed models is based on direct cortico-cortical connections between the inferior frontal gyrus and PMC, although such connections have been demonstrated in neuroanatomical studies in non-human primates. Firstly, however, the homology of the inferior frontal gyrus and, in particular, Broca's region is still a matter of conjecture and corresponding connections have not yet been demonstrated in the human brain. Secondly and more importantly, there is ample evidence implicating the basal ganglia and in particular the insula as key nodes and prerequisites for speech production, which argues against a short-circuiting of information from BA 44 to the PMC (Wise et al. 1999; Bookheimer et al. 2000; Heim et al. 2002; Kemeny et al. 2006).
In our analysis we therefore followed this view, which also matches well with current models for the initiation and execution of non-speech movement (Middleton & Strick 2000; Jueptner & Krukenberg 2001), and modelled the insula, the basal ganglia and the cerebellum as intermediates before information reaches the PMC.
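The roles of driving input and modulation can be illustrated with the bilinear neuronal state equation of DCM (Friston et al. 2003), dz/dt = (A + u·B)z + C·u, integrated here with a simple Euler scheme. The sketch uses a single input u for brevity, a serial-parallel chain of the six nodes as one plausible layout, and arbitrary parameter values (0.5, 0.2, −1.0); none of these are estimates from the present data, and the haemodynamic forward model of DCM is omitted.

```python
NODES = ["BA44", "insula", "caudate", "cerebellum", "PMC", "M1"]

# Illustrative feed-forward layout (source, target); an assumption for this sketch
EDGES = [("BA44", "insula"), ("insula", "caudate"), ("insula", "cerebellum"),
         ("caudate", "PMC"), ("cerebellum", "PMC"), ("PMC", "M1")]

def build_matrix(edges, weight, self_weight=0.0):
    """Matrix m[target][source] with `weight` on each edge and `self_weight` on the diagonal."""
    idx = {name: i for i, name in enumerate(NODES)}
    m = [[0.0] * len(NODES) for _ in NODES]
    for i in range(len(NODES)):
        m[i][i] = self_weight
    for src, dst in edges:
        m[idx[dst]][idx[src]] = weight
    return m

def simulate(a, b, c, u, dt=0.1):
    """Euler integration of dz/dt = (A + u*B) z + C*u; returns the state at every step."""
    n = len(a)
    z = [0.0] * n
    trace = []
    for u_t in u:
        dz = []
        for i in range(n):
            coupling = sum((a[i][j] + u_t * b[i][j]) * z[j] for j in range(n))
            dz.append(coupling + c[i] * u_t)
        z = [z[i] + dt * dz[i] for i in range(n)]
        trace.append(list(z))
    return trace

A = build_matrix(EDGES, weight=0.5, self_weight=-1.0)  # intrinsic coupling (hypothetical)
B = build_matrix(EDGES, weight=0.2)                    # task modulation (hypothetical)
C = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]                     # driving input enters BA 44 only
u = [1.0] * 200                                        # task switched on for 200 steps

with_modulation = simulate(A, B, C, u)[-1]
without_modulation = simulate(A, build_matrix(EDGES, 0.0), C, u)[-1]
```

In this toy system the activity reaching M1 is larger when the task also modulates the connections than when it only drives BA 44, which is the distinction between driving and modulatory effects drawn above.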
Model 1 is based on the hypothesis that information from the insula reaches the PMC exclusively via two subcortical pathways (Taniwaki et al. 2006) involving the caudate nucleus and the cerebellum, respectively. The output of these structures then converges onto the PMC. The latter in turn is the only source of input into the primary motor cortex. In summary, this model hence proposes a serial interaction of cortical areas and an interposed parallel processing in the two subcortical network nodes.
Model 2 is the only model that does not feature the insula as a serial relay between BA 44 and subsequent processing stages. Rather, it reflects a distributed motor preparation (cf. Jueptner & Krukenberg 2001), taking place in parallel in the insula, the caudate nucleus and the cerebellum. All of these structures receive input from BA 44 and project onto the PMC. The key difference between the hypothesis reflected by this model and those implemented by the remaining ones hence pertains to the parallel rather than the serial role for the insula in speech production.
Model 3 describes, in contrast to the previous models, a situation in which the insula, after receiving information from BA 44, projects not only to the two subcortical nodes (caudate nucleus and cerebellum) but also to the PMC (Mesulam & Mufson 1982). All three of these nodes then project directly to the primary motor cortex. In contrast to models 1 and 2, this view hence does not hypothesize the PMC as a final relay, which exclusively forwards information into M1. Models 2 and 3 hence represent two modifications of the view proposed by model 1, postulating that either the insula or the PMC works in parallel with the subcortical nodes.
Model 4, finally, is similar to model 3 but entails one crucial difference. The caudate nucleus and the cerebellum are now assumed to project to the premotor rather than the primary motor cortex. This view hence attributes an integrative role to the PMC (Jueptner & Weiller 1998; Rizzolatti et al. 2002), which is hypothesized to integrate input from the insula, the caudate nucleus and the cerebellum and provides the sole input into M1. From a different point of view, model 4 also reflects a hypothesis similar to that featured in model 1 supplemented by direct input from the insula into the PMC.
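The four architectures can be written down as sets of directed connections, which makes their differences explicit. The encoding below is read from the verbal descriptions above (figure 1 remains the authoritative definition); the node labels and the helper `inputs_to` are illustrative names.

```python
# Directed connections (source, target) for each model, as described in the text
MODELS = {
    1: {("BA44", "insula"),
        ("insula", "caudate"), ("insula", "cerebellum"),
        ("caudate", "PMC"), ("cerebellum", "PMC"),
        ("PMC", "M1")},
    2: {("BA44", "insula"), ("BA44", "caudate"), ("BA44", "cerebellum"),
        ("insula", "PMC"), ("caudate", "PMC"), ("cerebellum", "PMC"),
        ("PMC", "M1")},
    3: {("BA44", "insula"),
        ("insula", "caudate"), ("insula", "cerebellum"), ("insula", "PMC"),
        ("caudate", "M1"), ("cerebellum", "M1"), ("PMC", "M1")},
    4: {("BA44", "insula"),
        ("insula", "caudate"), ("insula", "cerebellum"), ("insula", "PMC"),
        ("caudate", "PMC"), ("cerebellum", "PMC"),
        ("PMC", "M1")},
}

def inputs_to(model, node):
    """Set of nodes that project to `node` in the given model."""
    return {src for src, dst in MODELS[model] if dst == node}

# Model 4 equals model 1 plus a direct insula -> PMC connection
extra_in_model_4 = MODELS[4] - MODELS[1]
```

Comparing the sets shows, for instance, that the PMC is the sole source of input to M1 in models 1, 2 and 4 but not in model 3.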
(d) DCM model selection and parameter inference
In order to compare these competing hypotheses on the organization of the speech motor network represented by the four models, we used Bayesian model selection (BMS; Penny et al. 2004). BMS compares models by their evidence p(y|m), i.e. the probability of observing the data y given a particular model m, integrated over the model parameters. Given two models i and j, the Bayes factor (BF) Bij is defined as the ratio p(y|m=i)/p(y|m=j). It hence measures how much evidence the experimental data provide in favour of model i (relative to model j). The BFs for all pairwise comparisons between the four models were calculated for each subject. Since subjects represent independent observations, the average BF (ABF) for each comparison was calculated as the geometric mean of the individual BFs (Stephan 2004) for group inference. ABFs may, however, become skewed by outliers showing very strong evidence in favour of one model. Consequently, we corroborated the obtained results by the positive evidence ratio (PER), which is defined as the number of subjects for which evidence in favour of a particular model was obtained as opposed to the number of subjects showing evidence in favour of the opposing model.
After determining the most appropriate system model using BMS, the posterior estimates of its model parameters (driving inputs, intrinsic connections and modulations) were subjected to further analysis for statistical random-effects inference over the whole sample (20 subjects). In particular, it was examined (i) whether each individual parameter differs significantly from zero (one-sample t-test), (ii) whether there was a significant difference between the parameters (repeated-measures ANOVA, followed by pairwise comparisons using Tukey's correction), and (iii) whether there was a significant correlation between the modulation of effective connectivity during verbal fluency and speech production rate in each of the four fluency conditions. Importantly, both the behavioural performance (speech production) and the coupling parameters within the modelled speech production network were calculated in a condition-specific manner, not separately for the individual blocks. That is, speech production was assessed by the total number of words uttered by a subject in the 60 trials (6 blocks, each containing 10 trials) for each of the four conditions (semantic, syntactic, phonological and free). Likewise, the coupling parameters of the effective connectivity model were also fitted across all blocks pertaining to a particular fluency condition. The effective connectivity between two regions (four values per subject, calculated across all realizations of a particular condition) and the behavioural measurements (four values per subject, calculated across all realizations of a particular condition) could hence be directly compared with each other.
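The group-level quantities defined above can be computed from per-subject log-evidences as in the following sketch. The Bayes factor and its geometric mean follow the definitions in the text; the threshold that counts as ‘evidence in favour’ of a model for the PER is not stated here, so the default of 3 is an assumption exposed as a parameter, and the function names are illustrative.

```python
import math

def bayes_factor(log_evidence_i, log_evidence_j):
    """B_ij = p(y|m=i) / p(y|m=j), computed from log-evidences for numerical safety."""
    return math.exp(log_evidence_i - log_evidence_j)

def average_bayes_factor(bayes_factors):
    """Geometric mean of the individual Bayes factors (ABF)."""
    mean_log = sum(math.log(b) for b in bayes_factors) / len(bayes_factors)
    return math.exp(mean_log)

def positive_evidence_ratio(bayes_factors, threshold=3.0):
    """Number of subjects favouring model i versus number favouring model j.

    A subject counts for model i if B_ij > threshold and for model j if
    B_ij < 1/threshold; the threshold value is an assumption of this sketch.
    """
    favour_i = sum(1 for b in bayes_factors if b > threshold)
    favour_j = sum(1 for b in bayes_factors if b < 1.0 / threshold)
    return favour_i, favour_j

# Hypothetical Bayes factors for four subjects, for illustration only
example_bfs = [4.0, 9.0, 0.25, 1.0]
abf = average_bayes_factor(example_bfs)
per = positive_evidence_ratio(example_bfs)
```

Because the geometric mean is the exponential of the mean log Bayes factor, a single extreme subject can shift the ABF substantially, whereas the PER only counts subjects and is therefore less sensitive to such outliers.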
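Two of the statistics named above, the one-sample t-test on a coupling parameter and the correlation between modulation strength and word count, reduce to short formulas. The sketch below computes only the test statistics (t value and Pearson's r) without p-values or multiple-comparison correction; the function names are illustrative, and any input values would be the per-subject (or per-subject, per-condition) estimates described in the text.

```python
import math

def one_sample_t(values):
    """t statistic for the null hypothesis that the mean of `values` is zero."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean / math.sqrt(variance / n)

def pearson_r(x, y):
    """Pearson correlation between paired observations x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)
```

For the correlation in (iii), `x` would hold the modulation parameters of one connection and `y` the corresponding numbers of produced words, paired by subject and condition.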
Evidently, the number of produced words, i.e. the behavioural performance, is determined primarily by the subjects' capability for retrieving rather than uttering the requested words. It should be noted, however, that in the applied connectivity model these cognitive effects are accounted for in the driving input evoking activation in BA 44. The modulations of effective connectivity downstream of BA 44, i.e. in the investigated speech production network, on the other hand, are not related to the strength of this driving input (reflecting successful word retrieval), as those effects would be forwarded linearly. Rather, the respective modulations of intrinsic connectivity depend entirely on the effective connectivity between the involved areas. The analysed correlation between word production rate and modulations of effective connectivity therefore reflects context-dependent differences in downstream coupling independent of cognitive retrieval processes.
3. Results
(a) The speech motor network
The core areas of the network involved in the production of overt speech were reliably identified in the meta-analysis of published neuroimaging studies, the random-effects group analysis of the fMRI recorded for our verbal fluency paradigm and, importantly, also based on local maxima in the individual fMRI data. When comparing the results of the ALE meta-analysis with the location of significant activation in the current group analysis, a highly similar pattern of regions emerges (figure 2). In particular, both analyses revealed activation in Broca's region in the inferior frontal gyrus (cytoarchitectonically assigned to BA 44, cf. above), the face region of M1 (areas 4a and 4p), the anterior insula and the lateral PMC (BA 6). Furthermore, we found a significant activation in the left cerebellar hemisphere (lobe VI) and the basal ganglia (head of the caudate nucleus). The robust involvement of the aforementioned regions was further corroborated by the locations of the individual activation foci, which clustered tightly around the maxima revealed by group and meta-analysis. BA 44, insula, caudate, cerebellum, PMC and M1 hence constituted the nodes for all four competing system models for the effective connectivity underlying overt speech production.
(b) Inference on model architecture
In order to identify the most likely architecture of the speech production system, we employed BMS (table 2) on the four alternative models representing different hypotheses about the functional architecture of the vocalization network (figure 1). Consistent support for model 1 was found based on both the ABF and the PER. Model 1 hence represents the most likely network architecture given the measured data. This model, which received superior evidence in the pairwise comparison to each competing model, reflects a serial feed-forward from BA 44 to the insula, from which parallel pathways emerge towards the caudate and the cerebellum. Both of these regions then project to the PMC, which connects to the primary motor cortex to produce the final motor output.
(c) Analysis of driving inputs and intrinsic coupling
Activation entered the modelled system as driving input into BA 44. The estimated parameters for these direct effects entering BA 44 reflect the result of all cognitive aspects underlying word generation and were significantly greater than zero for each of the four fluency conditions (one-sample t-test; p<0.05, Bonferroni-corrected, cf. table 2). There was no significant difference between driving input exerted on the system by the four verbal fluency conditions as assessed by a one-way within-subject ANOVA (F3,57=0.71; p=0.553). The input into the speech production network was therefore unrelated to the criterion for word generation, underlining the independence between motor and cognitive aspects of speech production. While the analysis of the behavioural recordings showed a significant difference in the number of words produced during each condition (table 2, within-subject ANOVA: F3,76=16.91; p<0.05), there was no significant correlation between the strength of the driving inputs and the respective production rates (p=0.44, Pearson's r=−0.09).
The intrinsic connection parameters between the network nodes reflect the context-independent degree of inter-areal coupling (table 2). All of them were significantly greater than zero (p<0.05, Bonferroni-corrected random-effects analysis). A comparison of the intrinsic coupling parameters obtained for the different connections within our model by a one-way within-subject ANOVA, however, yielded only a non-significant trend towards difference (F5,95=1.98; p=0.08).
(d) Analysis of context-dependent modulations in coupling
All inter-areal connections showed a significantly positive task-dependent modulation by each of the four verbal fluency conditions (p<0.05, Bonferroni-corrected random-effects analysis). That is, whenever the subjects had to produce words overtly, each of the (positive) intrinsic connections showed a significantly enhanced coupling, indicative of increased excitatory influences. Differences between the modulation parameters were analysed by a within-subject ANOVA using the factors ‘connection’, ‘task’ and (due to the between-condition differences in word production rate) ‘number of produced words’. The analysis revealed a significant effect of the factor ‘connection’ (F5,431=2.97, p=0.01), which was driven by a significantly (p<0.05, corrected) weaker modulation of the caudate→PMC and cerebellum→PMC connections as compared with each of the remaining ones. In other words, although all connections were significantly enhanced when subjects engaged in overt speech production, this enhancement was least pronounced for the projections of the caudate and the cerebellum to the PMC.
Testing for the effects of the continuous factor ‘number of produced words’ on the coupling parameters showed a significant main effect of production rate (F1,431=10.9, p=0.001) and a significant interaction between production rate and connection (F5,431=10.9, p=0.006). The dependency of the effective connectivity on the speech production rate was hence further investigated by a correlation analysis (figure 3). This procedure demonstrated a significant positive correlation between the modulation of effective connectivity within the speech motor network and the number of produced words when all connections were considered jointly (main effect, figure 3a). Resolving the ‘rate×connection’ interaction by separate correlation analyses for each connection revealed that the increase in connectivity from the caudate nucleus and the cerebellum to the PMC was strongly and positively correlated with the number of produced words (figure 3b). The correlation between PMC→M1 connectivity and production rate barely reached significance. The coupling between BA 44 and the insula and the efferent projections of the latter to the caudate and the cerebellum, however, did not depend on the number of spoken words.
Finally, there was neither a significant effect of the factor ‘task’ (F3,431=2.14, p=0.095) nor a significant ‘task×connection’ interaction (F15,431=1.06, p=0.392), indicating that different verbal fluency conditions per se did not influence inter-areal coupling.
The human speech production system was identified by a meta-analysis of published functional neuroimaging studies and fMRI data obtained from verbal fluency tasks. Robust activation was demonstrated in a core network consisting of BA 44, anterior insula, caudate nucleus, cerebellum, PMC (BA 6) and M1 (areas 4a and 4p). The functional architecture of this network and the effective connectivity within this system were assessed using DCM. Bayesian model comparison indicated highest evidence from the measured data for a model that features the insula in a serial position between BA 44 on one side and the cerebellum as well as the basal ganglia on the other. Information from the two latter structures then converges onto the PMC from where it is forwarded to the M1. Parameter inference revealed that the projections from the insular relay into the two parallel loops are primarily task-set driven, while their output into the cortical motor system is strongly dependent on the actual word production rate as recorded during the performance of the fMRI experiment (figure 4).
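The architecture of the winning model can be written down as a small directed graph. The sketch below encodes the six connections named in the text and lists the routes from BA 44 to M1. The identifiers and the target-by-source matrix convention are choices made here for illustration, not part of the original analysis.

```python
# Regions of the core network, in the order used for the matrix below.
REGIONS = ["BA44", "insula", "caudate", "cerebellum", "PMC", "M1"]

# Directed connections (source, target) of the winning model as described in the text.
CONNECTIONS = [
    ("BA44", "insula"),
    ("insula", "caudate"),
    ("insula", "cerebellum"),
    ("caudate", "PMC"),
    ("cerebellum", "PMC"),
    ("PMC", "M1"),
]


def adjacency(regions, connections):
    """Return matrix a with a[target][source] = 1 if source projects to target."""
    index = {name: i for i, name in enumerate(regions)}
    a = [[0] * len(regions) for _ in regions]
    for source, target in connections:
        a[index[target]][index[source]] = 1
    return a


def paths(connections, start, goal):
    """Enumerate all directed paths from start to goal by depth-first search."""
    successors = {}
    for source, target in connections:
        successors.setdefault(source, []).append(target)
    found = []

    def walk(node, trail):
        if node == goal:
            found.append(trail)
            return
        for nxt in successors.get(node, []):
            if nxt not in trail:  # guard against cycles
                walk(nxt, trail + [nxt])

    walk(start, [start])
    return found


A = adjacency(REGIONS, CONNECTIONS)
ROUTES = paths(CONNECTIONS, "BA44", "M1")
for route in ROUTES:
    print(" -> ".join(route))
```

The two routes found this way differ only in the subcortical node (caudate or cerebellum), which is the parallel stage between the insular relay and the PMC.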
(a) Methodological considerations: meta-analysis
Functional neuroimaging has several limitations, which render the informative value of a single fMRI or positron emission tomography study rather ambiguous. For one thing, the sample sizes of neuroimaging experiments are rather small (usually less than 30–40 subjects) as compared with those in other medical and social sciences. Related to this issue is the fact that the reliability of neuroimaging data is usually not considered high enough to conclude definitively the presence or absence of an effect from a single experiment (Feredoes & Postle 2007; Raemaekers et al. 2007). Most importantly, however, the results of all neuroimaging experiments are dependent on the experimental set-up and its inherent subtraction logic, which is only sensitive to differences between conditions (Stark & Squire 2001; Price et al. 2005). Consequently, integrating data from several studies in order to identify locations that show a consistent response across experiments (collectively involving hundreds of subjects and numerous variations in experimental design) has become an important tool for assessing functional localization in the brain. The advantage of meta-analyses in this context is demonstrated by the current summary of tasks involving overt vocalization. In spite of the considerable variability in the applied tasks (ranging from standard verbal fluency tasks to overt reading of unpronounceable letter sequences and repeating piano tones), the performed meta-analysis robustly revealed a core network of brain regions forming the human vocalization network. Pooling the results of a wide variety of functional neuroimaging studies using overt vocalization hence allowed delineation of those areas that are generically involved in producing overt speech independently of the experimental set-up and the precise task that has been applied.
(b) Methodological considerations: connectivity analysis
DCM is a hypothesis-driven approach to effective connectivity analysis, which treats the brain as a nonlinear deterministic system in which external inputs cause changes in neural activity that in turn lead to changes in the fMRI signal (Friston et al. 2003). This approach explicitly models neuronal activity, which is then linked via a biophysically validated haemodynamic model (Stephan et al. 2004) to the measured blood-oxygenation-level-dependent response. It hence represents a modelling approach that is much more closely related to changes in neural dynamics than, for example, correlation- or coherence-based approaches. In particular, these latter techniques suffer from insensitivity to directional and timing information of neural connectivity and are therefore closer to the notion of functional (coherence of neural events) as opposed to effective (influence that one neural system exerts over another) connectivity. Moreover, these methods and other techniques of effective connectivity working on the level of observed responses, such as structural equation modelling, assume that interactions are instantaneous and/or that the system is driven by unknown stochastic effects. By contrast, DCM explicitly involves the modelling of neuronal dynamics driven by known experimental manipulation. While the data modality used in the current study (fMRI) has the advantage of a precise anatomical localization of effects and homogeneously reliable assessment of both cortical and subcortical structures, it holds drawbacks against electrophysiological approaches such as electroencephalography and magnetoencephalography. In particular, owing to the slower sampling of the data and the physiological low-pass filter represented by the haemodynamic response function, it cannot reveal the precise timing of interactions (Levelt et al. 1998) or reflect changes in induced as opposed to evoked responses (Saarinen et al. 2006).
A future combination of fMRI measurements and electrophysiological data, which would combine the strengths of both approaches, should hence foster an advanced understanding of the human speech production network.
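For orientation, the neuronal level of DCM for fMRI is commonly written in the bilinear form introduced by Friston et al. (2003). The equation below is that general formulation; it is assumed here, rather than stated in the text, that the present models follow it without further extensions.

```latex
\dot{z}(t) = \Bigl( A + \sum_{j=1}^{m} u_j(t)\, B^{(j)} \Bigr)\, z(t) + C\, u(t)
```

Here $z$ is the vector of neuronal states (one per region), $u_j$ are the $m$ experimental inputs, $A$ holds the intrinsic (context-independent) connections, $B^{(j)}$ the modulation of those connections by input $j$, and $C$ the direct driving effects of the inputs on the regions. In these terms, the task-dependent modulations discussed in this paper correspond to entries of the $B^{(j)}$ matrices, and the driving input into BA 44 to an entry of $C$.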
(c) Broca's region/BA 44
BA 44 on the left inferior frontal gyrus serves as the entry point of activation into the modelled system from which activation is then further propagated to the subsequent network nodes (Friston et al. 2003). The driving input into BA 44 is modelled as a task-related input function, i.e. BA 44 was assumed to be activated by the four experimental conditions (semantic, phonological, syntactic and free fluency). The presence of an experimental condition was hereby treated as an external cause perturbing the system's intrinsic dynamics. Evidently, however, these inputs are not direct task-related effects similar to, for example, those acting on the visual cortex during visual stimulation (Grefkes et al. 2008a). Rather, this driving input reflects net effects that the (here unmodelled) activity in a widespread network of ‘cognitive’ brain regions has on BA 44 (Indefrey & Levelt 2004; Hagoort 2005; Vigneau et al. 2006). The present model is hence based on the view that BA 44 is the final stage of word retrieval, which initiates the actual articulation. Such a view is in turn well in line with the classical concept of Broca's region being the ‘expressive speech centre’ and supported by both fMRI data (e.g. Heim et al. 2006; Kleber et al. 2007) as well as analyses of brain lesions in aphasic patients (Hillis et al. 2004).
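How a driving input into BA 44 propagates through such a system can be sketched with a toy simulation. The code below integrates a bilinear neuronal model with a simple Euler scheme for a single boxcar input. All numerical values (decay, coupling, modulation, input strength, timing) are arbitrary illustration values rather than estimates from the study, the haemodynamic stage is omitted, and every identifier is hypothetical.

```python
REGIONS = ["BA44", "insula", "caudate", "cerebellum", "PMC", "M1"]
CONNECTIONS = [
    ("BA44", "insula"),
    ("insula", "caudate"),
    ("insula", "cerebellum"),
    ("caudate", "PMC"),
    ("cerebellum", "PMC"),
    ("PMC", "M1"),
]

# Arbitrary illustration values, NOT parameter estimates from the study.
SELF_DECAY = -1.0   # negative self-connection keeps each region stable
COUPLING = 0.4      # intrinsic coupling of every connection
MODULATION = 0.2    # additional coupling while the task input is on
DRIVE = 1.0         # strength of the driving input into BA44


def build_matrices():
    """Build intrinsic (a), modulatory (b) and driving (c) parameters."""
    index = {name: i for i, name in enumerate(REGIONS)}
    n = len(REGIONS)
    a = [[SELF_DECAY if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [[0.0] * n for _ in range(n)]
    for source, target in CONNECTIONS:
        a[index[target]][index[source]] = COUPLING
        b[index[target]][index[source]] = MODULATION
    c = [DRIVE if name == "BA44" else 0.0 for name in REGIONS]
    return a, b, c


def simulate(a, b, c, inputs, dt):
    """Euler integration of dz/dt = (a + u*b) z + c*u for one input series u."""
    n = len(c)
    z = [0.0] * n
    trace = [z[:]]
    for u in inputs:
        dz = [
            sum((a[i][j] + u * b[i][j]) * z[j] for j in range(n)) + c[i] * u
            for i in range(n)
        ]
        z = [z[i] + dt * dz[i] for i in range(n)]
        trace.append(z[:])
    return trace


DT = 0.01  # step size in seconds
# Boxcar input: task on from step 100 to step 1099 (10 s), off otherwise.
INPUTS = [1.0 if 100 <= k < 1100 else 0.0 for k in range(2000)]

A, B, C = build_matrices()
TRACE = simulate(A, B, C, INPUTS, DT)
M1 = REGIONS.index("M1")
print(f"M1 state at end of task block: {TRACE[1100][M1]:.3f}")
print(f"M1 state at end of simulation: {TRACE[2000][M1]:.3f}")
```

Because only BA 44 receives the input, every other region in this sketch becomes active solely through the forward connections, which mirrors the role of BA 44 as the entry point of the modelled system.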
(d) Anterior insula
The contribution of the insular cortex to overt speech production may be conceptualized as a relay between more cognitive aspects of language and the preparation of the actual vocalization movements in the motor circuits of the basal ganglia and the cerebellum. As suggested in a review by Ackermann & Riecker (2004), such a relay might serve as a filter necessary for the coordination of the high number of muscles engaged in articulation and phonation. That is, the insula receives the phonetic concept of an intended speech act from Broca's region and translates it into vocal tract motor patterns by filtering and sound-to-movement mappings (Ackermann & Riecker 2004; Guenther 2006). This view is also supported by lesion mapping in aphasics, which implied the insula as a key region for coordinating speech articulation (Dronkers 1996). In addition to such an orchestrating function, it may also be speculated as to whether the prosodic (e.g. affective) modulation of intended words or sentences is taking place at this stage (Phan et al. 2002; Wildgruber et al. 2005; Koelsch et al. 2006; Kleber et al. 2007).
(e) Basal ganglia/cerebellum
The insula forwards the computed vocalization plans in a parallel fashion to the cerebellum and the basal ganglia, i.e. structures that are well-established constituents of cortical-subcortical loops for movement preparation (Jueptner & Krukenberg 2001; Monchi et al. 2006; Taniwaki et al. 2006). Based on neuroanatomical, electrophysiological and functional neuroimaging data, the cerebellum and the basal ganglia can be expected to implement the further processing and selection of motor schemes, the incorporation of sensory feedback and the generation of a precise temporal calibration of movement representations (Jueptner & Krukenberg 2001; Booth et al. 2007). The strong modulation of input into the subcortical motor loops by the task, but not by the amount of motor output, as well as the reverse pattern of modulations observed for their efferents to the PMC, supports a proposed preparatory role of both routes in speech production. In line with previous results (Bohland 2006; Booth et al. 2007), we hence conclude that selection and initiation of vocalization motor programmes (basal ganglia) as well as the temporal refinement of the discretely prepared sequence into a fluent action (cerebellum) are engaged in parallel in a task-set dependent manner and converge onto the PMC in an output-rate dependent fashion.
(f) Primary motor and premotor cortices
The basal ganglia and the cerebellum both forward their information to the PMC, which precedes M1 in a serial fashion. The parallel engagement of the subcortical motor loops is thus followed by a sequentially organized common final pathway: the PMC first combines the processed information about selected movement programmes and their temporal sequencing provided by the basal ganglia and the cerebellum, respectively, into a final movement representation. It seems plausible that the integration of information at the level of the PMC involves the precise mapping between the conceptual movement information provided by the preparatory loops (Middleton & Strick 2000; Jueptner & Krukenberg 2001; Monchi et al. 2006; Taniwaki et al. 2006) and more fine-grained muscle representations in M1. That is, the PMC translates intended actions into patterns of specific muscle contractions (Jueptner & Weiller 1998; Rizzolatti et al. 2002). These are then forwarded to M1 for the generation of the final output to lower motor neurons and hence execution. This view is supported by the effective connectivity pattern at the level of PMC/M1. In contrast to the primarily task-set driven input into the subcortical pathways (indicative of a preparatory role), their outputs into the PMC as well as its subsequent coupling with M1 show a significant modulation by word production rate. The information processing at this final hierarchical level is therefore closely related to the amount of movement performance that needs to be controlled by the brain, i.e. motor execution demands.
(g) Implications and outlook
The distinction of a preparatory and an executive subsystem within the vocalization network ensuing from their effective connectivity pattern demonstrates the potential of system-based modelling of functional integration for the characterization of human brain function. The localization of areas contributing to a particular task is well served by investigating functional segregation using the contrast approach prevailing in functional neuroimaging. Investigations of effective connectivity, however, may not only provide additional insight into the nature of the dynamic interactions between these regions (Mechelli et al. 2005; Booth et al. 2007; Heim et al. 2007), but can also provide mechanistic insight into the pathological impairments thereof (Eickhoff et al. 2008; Grefkes et al. 2008b). In particular, it may be hypothesized that distinct clinical disturbances of speech production might result from differential impairment of individual components within the network characterized here (Spencer & Slocomb 2007). For example, among neurological speech disorders, apraxia of speech is distinguished from dysarthria, both at the diagnostic and at the model level (e.g. Georgopoulos & Malandraki 2005). Dysarthria is clinically characterized by an uncoordinated execution of vocalization. The pathology relates to lesions in the basal ganglia, cerebellum, and premotor and motor cortex (Huber 1997; Jordan & Hillis 2006). Based on our present findings, one can suspect that its pathophysiology relates to an impaired connectivity between the cerebellum and the PMC. Evidence for this notion comes from a mathematical model (Guenther & Ghosh 2003), which has further been elaborated since (Guenther et al. 2006). The core features of apraxia of speech, on the other hand, are difficulties in putting together sounds during articulation planning. Earlier research demonstrated that apraxia of speech was related to lesions of the left insula and the basal ganglia (for a review see Ogar et al. 2005).
Given our present data, one could therefore hypothesize that in particular, the preparatory components of the speech network, i.e. the projections into the subcortical loops, are impaired in apraxia of speech. Thus, the network model identified in the present study has clinical implications. Most importantly, these implications go beyond previous neuroanatomical findings, which only correlated lesions in brain structures with clinical syndromes. Here, we provide evidence for the notion that the pathology may result from impaired functional interplay of several components such as insula and basal ganglia, or cerebellum and premotor/motor cortex, in distinct disorders of the speech system. The described network may provide fundamental insight into the neurobiology of these conditions and hence allow a characterization of patients based on their pathophysiology rather than the clinical presentation. Furthermore, it may be useful to constrain computer simulations of impaired speech output (e.g. Guenther et al. 2006).
In addition to investigating the relationships between disturbed interactions and clinical syndromes, further developments of computational models for overt speech may also comprise the extension of the analysed system to cover aspects not included in the initial model. Interesting topics for further studies include in particular the characterization of differential roles of the two parallel subcortical loops by using variations of experimental design. Moreover, additional components such as emotional modulation, auditory feedback processing and executive control mechanisms provided by regions such as the prefrontal cortex and the supplementary motor area may also be integrated into the current effective connectivity model of vocalization. Such extensions and modifications would further add to the potential of computational models to enable a mechanistic approach to neurobiology and ultimately a system-based account of both brain function and dysfunction.
This Human Brain Project/Neuroinformatics research was funded by the National Institute of Biomedical Imaging and Bioengineering, the National Institute of Neurological Disorders and Stroke and the National Institute of Mental Health. Further funding was provided by the DFG (KFO-112; to K.Z.), the BMBF (01GW0771; to K.A.), the Human Brain Project (R01-MH074457-01A1; to S.B.E.) and the Helmholtz Initiative on Systems-Biology (to K.Z. and S.B.E.).
One contribution of 15 to a Theme Issue ‘The virtual physiological human: tools and applications II’.
- © 2009 The Royal Society