## Abstract

We analyse cardiovascular time series with the aim of performing early prediction of preeclampsia (PE), a pregnancy-specific disorder causing maternal and foetal morbidity and mortality. The analysis is made using a novel approach, namely the *ε*-recurrence networks applied to a phase space constructed by means of the time series of the variabilities of the heart rate and the blood pressure (systolic and diastolic). All the possible coupling structures among these variables are considered for the analysis. Network measures such as average path length, mean coreness, global clustering coefficient and scale-local transitivity dimension are computed and constitute the parameters for the subsequent quadratic discriminant analysis. This allows us to predict PE with a sensitivity of 91.7 per cent and a specificity of 68.1 per cent, thus validating the use of this method for classifying healthy and preeclamptic patients.

## 1. Introduction

Detection of cardiovascular (CV) disorders has been considerably improved due to both technological advances and new methods of time-series analysis. Nevertheless, there are still difficulties that cannot be explained by standard data analysis. Nonlinear data analysis and modelling methods of CV physics allow improved clinical diagnostics and give the opportunity for a better understanding of CV regulation. One of the most important aspects of these methods is that they focus on non-invasive recordings of biosignals. A review of these methods is outlined in earlier studies [1–5]. Among the biosignals that CV physics deals with are the heart rate variability (HRV), the variabilities of systolic blood pressure (SBPV) and diastolic blood pressure (DBPV) and the baroreflex sensitivity. A detailed description of these biosignals and their use can be found in earlier studies [6–10] and underlying models using these measures have been developed [11–13].

Preeclampsia (PE) is a major hypertensive disorder in pregnant women also characterized by proteinuria for which the pathophysiology remains unclear and constitutes a serious risk for both the mother and the foetus. PE affects healthy nulliparous women in a range between 2 and 7 per cent worldwide [14]. Several strategies are used in order to predict PE, among which we can mention some biochemical markers, such as fms-like tyrosine kinase 1 (sFlt-1), placental growth factor (PlGF), soluble endoglin [15,16], maternal autoantibody, the angiotensin II type I receptor agonistic autoantibody (AT1-AA) [17], the urinary biomarkers [18], ultrasonographic markers [19], non-invasive CV markers [20] or the combination of some of those [21–24].

Time-series analysis plays an essential role in the understanding of various real systems regardless of their nature. The techniques used in time-series analysis evolve systematically supported by new insights in statistics, mathematics and physics. Numerous works are devoted to nonlinear time-series methods [25,26] and further developments arise continuously [27–32] allowing the improvement of the analysis and interpretation, and also providing a deeper understanding of the phenomena dealt with. Recurrence methods have recently become a useful tool in order to study time series and acquired importance because they do not need long time series to identify transitions in dynamical systems [28] and their use may be applied to a wide diversity of systems and phenomena as for instance climate ones [33]. In particular, recurrence quantification analysis (RQA) has been successfully used in the study of CV signals [34,35]. Recently, the recurrence concept has been extended to networks and novel time-series analysis methods have arisen [36]. In this work, we apply the approach of *ε*-recurrence networks to analyse non-invasive CV indicators with the aim of developing a classification method to identify healthy subjects (control) from patients who develop PE. The method used in this article differs from the methods dealing with time-varying dynamics [37] in the fact that in a recurrence network, information about the temporal ordering of observations is not explicitly regarded, and we focus on the topological features of the resulting graphs that might reflect dynamically invariant properties associated with the specific dynamical system. The quantification of these topological features enables us to classify patients who develop the pathology from the control individuals.

The paper is organized as follows: in §2, we give an overview of the used complex network-based model; we also explain the clinical aspects (patients and their CV measures), the data processing and statistics. In §3, we present the most important results for the classification. Finally, in §4, we discuss the results giving the conclusion and perspectives of this work.

## 2. Methods

### (a) Recurrence networks

Since the foundation of graph theory in 1736 by Leonhard Euler who solved the Königsberg bridge problem, the use of graphs composed of vertices and edges has witnessed an astonishing evolution towards the complex networks theory in which the principal elements are nodes and links, both exhibiting dynamic aspects in the sense that each node might have an intrinsic behaviour governed by nonlinear dynamical equations and even the links can evolve due to the motion of the nodes [38]. In the last two decades, complex networks theory became an important research topic and its application to numerous complex systems of very different nature including physical [39], chemical [40], biological [41], economic [42], social [43] and climatic [33], among others, gives evidence of its practical importance. Details of complex network theory can be found in the studies of Albert & Barabasi [44] and Boccaletti *et al*. [45].

The basic idea of time-series analysis based on complex network techniques lies on the fact that a time series might be transformed into a complex network from which we can extract the adjacency matrix allowing us to obtain local and global network properties. Roughly speaking, there are three classes of transforming methods: visibility graphs, proximity networks and transition networks [46]. The latter two are based on the concept of recurrence which was stated first by Poincaré [47], and reemerges with the introduction of recurrence plots [48] and its further developments and refinements such as RQA [28]. The concept of recurrence applied to a single trajectory of a dynamical system allows us to obtain the recurrence matrix whose elements are given by *R*_{i,j}=*Θ*(*ε*−∥*x*_{i}−*x*_{j}∥), where *Θ*(⋅) represents the Heaviside function, ∥⋅∥ is a suitable norm, and *ε* is a threshold distance that should be chosen adequately according to the characteristics of the embedded attractor into the phase space. We interpret the recurrence matrix ** R** as the adjacency matrix of an unweighted and undirected complex network, commonly called the

*ε*-recurrence network which is associated with a given time series. Possible self-loops must be avoided in this network; thus a Kronecker delta must be subtracted from the recurrence matrix to obtain a zero-diagonal and, as a consequence, the elements of the adjacency matrix for an

*ε*-recurrence network are 2.1where the

*ε*-dependence (i.e. the level of coarse-graining of phase space involved in this procedure) is considered explicitly [49]. There is not a universal criterion for choosing

*ε*but the choice must be made avoiding too small values that lead to a situation in which there are not enough recurrence points, or too large values implying that every vertex is connected with many other vertices irrespective of their actual mutual proximity in phase space [31]. Having reconstructed the adjacency matrix

**from a time series, we can apply appropriate network characteristics to analyse and obtain information of the underlying system. In this work, we focus our interest on four global network measures.**

*A*#### (i) Average path length ()

The average path length is the mean value of the shortest geodetic path lengths *l*_{i,j} considering all pairs of vertices (*i*,*j*). Thus,
2.2*N* being the number of nodes in the network (i.e. the length of the time series [31]).

#### (ii) Mean coreness ()

The notion of core was introduced in social networks related to the concept of cohesiveness [50]. The concept of node coreness indicates the significance of a node and its popularity in the network. The *k*-core of a graph is the maximal subgraph in which each vertex has at least degree (in the subgraph) *k*. The coreness of a vertex is the highest order of a *k*-core containing the vertex and is calculated using Batagelj's algorithm [51]. The mean coreness is obtained by averaging the corenesses of all the vertices.

#### (iii) Global clustering coefficient

As for the coreness, the clustering coefficient might be defined locally for each vertex as the ratio of triangles, including vertex *i* and the number of triples centred on vertex *i* where triple refers to a pair (*j*,*k*) of vertices that are both linked with *i*, but not necessarily mutually linked. In terms of the adjacency matrix elements,
2.3where *k*_{i} is the connectivity (degree) of the vertex *i*. The global clustering coefficient is obtained from the averaging of . Thus, [45].

#### (iv) Scale local transitivity dimension

Transitivity is defined as the ratio of the number of triangles in the network times three and the number of linked triples of vertices [45]. The latter can be written as 2.4The scale local transitivity dimension is defined as [49].

Note that all these measures depend on *ε* and have a global character. Some of these measures serve as well to characterize the network topology such as in the case of small-world networks that have high clustering coefficient and short characteristic path length [52]. On the contrary, random networks follow the relationships: and [53], *N* being the number of nodes.

### (b) Clinical aspects

We consider for this study 96 patients with abnormal uterine perfusion, followed by means of Doppler sonography in the second trimester, between the 18th and the 26th week of gestation (WOG) of pregnancy, at the Department of Obstetrics and Gynecology of the University of Leipzig. Immediately after the Doppler examination, non-invasive continuous blood pressure monitoring was conducted via finger cuff during 30 min. The continuous blood pressure curves were used to extract the time series of beat-to-beat intervals, systolic and diastolic blood pressures allowing us to obtain the CV indicators (HRV, SBPV and DBPV). The length of the dataset per variable is roughly 1700. At the time of examination, the women were healthy, normotensive, without clinical signs of cervical incompetence and on no medication. After the 30th WOG, 24 patients developed PE. Further details on the methodology can be found in the study of Malberg *et al*. [20].

### (c) Data processing and statistics

In order to avoid artefacts such as double recognition of beats, the original recurrence rate (RR) time series were filtered using a preprocessing algorithm which first removes obvious recognition errors; then applies an adaptive percent filter, and finally, an adapting controlling filter [1]. An example of these filtered signals that constitute the CV indicators is shown in figure 1. A preliminary inspection of figure 1 allows us to realize that there are no significant differences in the median values of the CV indicators between both groups.

With the aim of using a recurrence network approach, we consider the three CV indicators and some possible coupling structures. An estimation of the coupling structure of CV indicators has been performed using nonlinear additive autoregressive models with external input following the idea of Granger causality [54]. This coupling analysis shows that HRV, DBPV and SBPV respond to respiration; SBPV responds to DBPV and the latter to HRV. In our case, we do not consider respiration; thus, the coupling structure might be represented as in figure 2*a* where according to the coupling scheme, there is a delay between the HRV, the DBPV and the SBPV. For simplicity, we write down the coupling structure as (HRV(*t*), DBPV(*t*+1), SBPV(*t*+2)) or simply *H*(*t*)*D*(*t*+1) *S*(*t*+2)≡012. Here, an arrow indicates a causal relation: when there are two arrows, it indicates that the ‘first source’ of arrows (*H*(*t*)) plays the role of the ‘main driver’ so that the other variables do not influence it (consequently, we adopt the 0 for that, i.e. no arrows converge on it); the ‘second source’ of arrows (*D*(*t*)) plays the role of the ‘secondary driver’, thus, it is driven by *H* but it drives to *S* (consequently, we adopt the 1 (label *t*+1) for that); finally, *S* is driven by *D* and consequently by *H*, i.e. there are two time steps separating *S* from the ‘first source’ *H*, and for this reason, we adopt the 2 (label *t*+2) for that.

We sought to predict whether or not a patient develops PE using the CV indicators embedded in a phase space determined by the structure of coupling. We consider a minimalist assumption in which the structure of coupling between HRV, DBPV and SBPV is equal in each subject of a group and that this structure does not change during the measurement. In this study, we set out to test all the possible structures of coupling shown in figure 2 and a wide range of the threshold *ε* going from 0.01*σ* to 0.99*σ*, where *σ* is the standard deviation of the underlying process in the embedded phase space. From a simple CV time series corresponding to each patient, we construct a complex network for each possible structure of coupling and each value of *ε*. Then, we compute the four network measures , and with these new measures we perform an analysis to classify the groups of patients. For that purpose, we firstly verify whether or not these new parameters are significant by means of a Mann–Whitney *U*-test and considering a significance level of 5 per cent; the null hypothesis being that data in the vectors corresponding to control and preeclamptic patients are independent samples from identical continuous distributions with equal medians, against the alternative that they do not have equal medians.

The methodology to obtain the results might be summarized in the following steps. (i) Preprocessing the raw data and obtaining the values of HRV, DBPV and SBPV. (ii) Identifying the interaction among HRV, DBPV and SBPV (coupling structure). (iii) Obtaining the recurrence matrix for HDS(*t*). (iv) Obtaining the recurrence network. (v) Using the structure of the coupling and the threshold *ε* to construct the parameter space. (vi) Performing the classification analysis from the parameter space and the statistical issues.

## 3. Results

As the approach is based on recurrence complex networks, firstly we obtain the matrices ** R** and

**, and we expect that visualization of structures related to these matrices could also give some information about the differences between the PE and the control group. Visualizations of the recurrence plots and the associated networks, obtained using the medians of the time series, are shown in figures 3 and 4, respectively. We note slight differences between the recurrence plots corresponding to PE and control group (figure 3). The network representations are made using a layout algorithm that uses basic principles of physics to iteratively determine an optimal layout, i.e. a given mass and an electric charge are associated with each node, and each edge is represented as a spring. For this second construction, we have used only 500 nodes in order to better visualize the differences between the PE and control networks (figure 4**

*A**a*,

*b*, respectively). Nonetheless, this visualization inspection is just a first checkup that cannot replace the quantification of the network measures.

The results for each network measure are represented in the phase plane embedding (structure of coupling) versus *ε* as shown in figure 5. The colour code indicates the *p*-values of the statistical test when the null hypothesis *H*_{0} of equal medians at 5 per cent significance level is rejected. The white pixels denote that there is no difference between both groups (*p*≥0.05), and pink ones the impossibility to compute *p*. On the contrary, the black pixels represent the minimum *p*-value among all the possibilities on the phase plane.

According to figure 5, the significant values for each network measure occur only for some coupling structures and thresholds *ε*. Figure 6*a* shows the same plane as in figure 5 but considering the cases in which all the four network measures are simultaneously significant, i.e. *p*<0.05 (black pixels). The inspection of figure 6*a* shows that there are 22 situations in which the four network measures satisfy simultaneously the statistical significance test, and we further restrict the analysis to these selected cases which do not necessarily correspond to the lower *p*-values. Now, considering these four measures as the parameters for the classification of control and PE groups, we perform a quadratic discriminant analysis for all the possible structures of the coupling and *ε* (figure 6*b*). At first sight, we identify the minimum *p*-value which corresponds to the coupling structure of 120 and *ε*=0.60*σ*. Nevertheless, this situation does not correspond to the selected ones in figure 6*a* and consequently it is discarded. Table 1 shows the statistical measures of the performance of a binary classification test for the 22 selected cases. Such measures are misclassification error (percentage of observations that are misclassified), sensitivity (proportion of true positives that are correctly identified by the test), specificity (proportion of true negatives correctly identified by the test) [56], positive predictive value (PPV), i.e. the proportion of patients with positive test results who are correctly diagnosed, and negative predictive value (NPV), i.e. the proportion of patients with negative test results who are correctly diagnosed [57]. From table 1, we select the situation corresponding to a coupling structure 120 and *ε*=0.61*σ* (italic fonts) whose misclassification error is 20.1 per cent giving consequently the best values for the classification results, i.e. a sensitivity of 91.7 per cent, a specificity of 68.1 per cent, a PPV of 48.9 per cent and an NPV of 96.1 per cent. We stress again the fact that the *p*-values for the four considered network measures are statistically significant (table 2). It is worthwhile to mention that performing 10-fold cross-validation, the results obtained using this technique give for sensitivity and specificity 60.0 per cent and 69.7 per cent, respectively.

Finally, we must point out several aspects in order to justify our methodology compared with other possible approaches. The utilization of *ε*-recurrence networks to classify using the embedding of the individual time series, i.e. *x*(*t*)=(*H*(*t*),*H*(*t*−1),*H*(*t*−2)), *y*(*t*)=(*S*(*t*),*S*(*t*−1),*S*(*t*−2)) or *z*(*t*)=(*D*(*t*),*D*(*t*−1),*D*(*t*−2)), restricts considerably the possibilities for the analysis; there are few cases in which a set of four network measures satisfies the condition *p*<0.05 for all the four variables simultaneously. We obtained interesting results using the embedding of the DBPV for small values of *ε*. On the contrary, for HRV and SBPV, there are not four network measures satisfying simultaneously the condition *p*<0.05. Despite the ‘good results’ (sensitivity and specificity of 41.7% and 93.1%, respectively) obtained with the embedding of DBPV, this approach obviates the coupling among the CV indicators. Thus, the conceptual richness is less than the approach used in this work. We also performed RQA considering the same threshold and coupling structure (*ε*=0.61*σ* and 120) and found that sensitivity and specificity are 91.7 per cent and 45.8 per cent, respectively, for a set of four variables (RR, determinism, longest vertical line and trapping time) satisfying simultaneously the condition *p*<0.05. The latter confirms that our method is more adequate for this problem of classification.

## 4. Discussion

Although large improvements concerning the early detection of PE have been made [58], this topic is still one of the most important challenges in obstetrics. Several researches are oriented to cheap and non-invasive methods and the application of new methods and ideas contributes to improve the statistical predictive values. The essential aspect of the approach used in this work lies in its novelty when applying to CV signals, i.e. complex time series that in their raw form are not useful for classification are transformed into recurrence networks from which we extract several measures that allow a classification with suitable results. In fact, after the choice of an adequate structure of the coupling and the threshold *ε*, only one complex network is constructed from the three CV indicators for each person and then we quantify the network features that constitute the parameters for the classification analysis. In summary, our exploratory results show that the used approach constitutes a useful tool to study such a classification problem.

Note that the analysis presented here is in some sense only a first approximation of the recurrence networks approach. Our method takes into account the coupling among the CV indicators (HRV, SBPV and DBPV). The coupling is very important in the dynamics of these quantities and consequently in the causes that could lead to deregulation of the blood pressure. We see for future research several ways of improvements, as explained in the following. In spite of the minimalist assumptions concerning the structure of the coupling, and just one value of *ε* in order to avoid the ambiguities stated in [59], our results give useful information for the classification and are similar to those obtained in [20], thus validating our method. The consideration of dynamic structures of the coupling (i.e. temporal variations in the coupling structure) could improve our results and also give us a deeper insight in the underlying physiological processes. For that, it is necessary to design an adaptive algorithm taking into account possible transitions in several time windows. This method could be also useful as an alternative to find the adequate structure of coupling and thus minimizing poor performances of the specific coupling structures to describe some casual relations. We are also aware of the major role of respiration [60] and related to the understanding of PE pathogenesis [54]. Therefore, adequate data processing, including respiration, should improve our results. Additionally, in order to improve the accuracy and applicability of the method, further complementary measures of causality could be performed either using Granger causality or convergent cross mapping, a new technique applied successfully to real ecological systems [61]. Finally, the method used in this study offers some additional advantages such as the ability to cope with short time series and showing that it is not mandatory to have observations equally spaced in time.

The used classification method might also be improved considering some other consistent network measures for a better characterization of the phase-space properties or combining with RQA that could give additional insights into the recurrence structure of the underlying dynamical system. The quantification of the motifs [62] consisting of three or four nodes in the underlying networks constitutes another aspect that could be considered in the classification. This quantification seems to be a feasible tool to understand the dynamics of certain systems and deserves a deeper study. The combination of these methods and adequate interpretation could also help to a better comprehension of the related physiological aspects.

In our tests, we have observed that the discriminant analysis could even be improved by considering shorter time series and other types of coupling structure such as 013, 014, etc. (results not shown here). Nevertheless, we do not report these results because we prefer to keep on the complete time series and the assumptions concerning the coupling structure and, consequently, the consistency of the physiological interpretations.

The consideration of other qualitative aspects related to the history of the patients (age, ethnicity and body mass index) [63] and in general predisposing factors such as genetic [64], behavioural [65] or environmental [66] could give additional information to improve the classification analysis combined with the technique used in this paper.

Our study follows the same line as previous works [20] in which the biosignal analysis (in our case, the associated recurrence complex network analysis) constitutes a non-invasive, cheap and universal diagnostic approach whose utilization offers new possibilities both in the understanding of PE pathogenesis and for envisaging new therapeutic strategies. Moreover, the method used in this work has similar performance at the level of classification to other ones. The novel aspects considered here are the inclusion of the coupling structure and the fact that few parameters (the network measures) can give us suitable classification results.

## Acknowledgements

This work has been financially supported by the German Academic Exchange Service (DAAD), by the Deutsche Forschungsgemeinschaft (KU 837/20-1, KU-837/29-2, KU-837/34-1), by the Federal Ministry of Economics and Technology (FKZ KF2248001FR9) as well as the European projects EU NEST-pathfinder and BRACCIA.

## Footnotes

One contribution of 13 to a Theme Issue ‘Assessing causality in brain dynamics and cardiovascular control’.

- © 2013 The Author(s) Published by the Royal Society. All rights reserved.