## Abstract

Ordinal symbolic analysis opens an interesting and powerful perspective on time-series analysis. Here, we review this relatively new approach and highlight its relation to symbolic dynamics and representations. Our exposition reaches from the general ideas up to recent developments, with special emphasis on its applications to biomedical recordings. The latter will be illustrated with epilepsy data.

## 1. Introduction

Symbolic representation is an efficient technique in time-series analysis. For a symbolic representation to be useful, it must be simpler to analyse than the original time series, while still retaining the information one is interested in. This procedure is familiar from linear time-series analysis, where data binning is the first step in any statistical analysis. In this paper, we will focus on nonlinear time-series analysis [1,2] because of the kind of applications under investigation. Thus, our series will be sequences of real numbers obtained by sampling a complex dynamics and, more specifically, a biological process evolving in time. This is the realm of dynamical systems, the theory that models and studies time-varying phenomena to characterize different behaviours.

Consider a discrete-time dynamical system, i.e. a difference evolution law defined on a certain space whose points represent the states of our system. Given an initial state, the evolution law determines unambiguously the state of the system, step by step, at any future time. The result is the trajectory of the initial state, with each state in the trajectory being reached from the previous one in one time step. At first glance, the time-discrete approach seems to be rather restrictive; however, the approach includes equidistant discretizations of time-continuous dynamical systems as given by autonomous differential equation systems, and Poincaré sections of time-continuous dynamical systems.

A working hypothesis of nonlinear time-series analysis is that the trajectory is unfolding in an attractor, possibly a chaotic one (at least, after an initial transient), so that stationarity, existence of an invariant measure and ergodicity may be taken for granted. Typical scopes of a nonlinear time-series analysis include the reconstruction of the attractor (in particular, the determination of fractal dimensions), and the computation of Lyapunov exponents.

Concurrently or alternatively, one may opt for a symbolic representation of the observations, depending on the scope and actual possibilities of the data analysis. In this approach, the state space is partitioned into a finite number of pieces, and the original trajectories are traded off for trajectories with respect to that partition. These so-called coarse-grained trajectories turn out to be realizations of a stationary random process with a finite alphabet. As a result, the analyst can now gain information from the underlying system in a more accessible way. In particular, recourse to symbol distributions and information-theoretical tools such as entropy and entropy-like quantities have a long tradition in biomedicine since MacKay and McCulloch [3] studied the information conveyed by neuronal signals. More recent contributions of symbolic representations are concerned with heart rate complexity and variability [4–8].

In this paper, we are going to concentrate on *ordinal symbolic representations*, i.e. representations whose symbols are ordinal patterns or, for that matter, permutations. Ordinal patterns were introduced by C. Bandt and B. Pompe in their foundational paper on permutation entropy [9]. An ordinal pattern of length *L*≥2 (or ordinal *L*-pattern) is a permutation displaying the rank order of *L* consecutive entries in a random or deterministic time series, while permutation entropy is the Shannon entropy of the resulting probability distribution. Therefore, ordinal patterns are not symbols *ad hoc* but they actually encapsulate qualitative information about the temporal structure of the underlying data. Ordinal symbolic analysis, i.e. the analysis of ordinal representations, has some practical advantages. First of all, it is in general acknowledged as being conceptually simple and computationally fast. Furthermore, ordinal patterns, being defined by inequalities, are relatively robust against observational noise. Let us mention in this regard that ordinal patterns of deterministic signals are more robust than those of random signals due to a mechanism called dynamic robustness [10, §9.1]. Another practical advantage is that the calculation of ordinal patterns does not require a knowledge of the data range, which is very useful in real-world data analysis. For practical and computational aspects of ordinal patterns, see [11,12].

Since its conception in 2002, ordinal symbolic representation has found a number of interesting applications in science and engineering. (For a large collection of related papers with further references, see [13].) Of this diversity, we are going to consider in the present paper only applications to biomedical recordings. Bearing in mind the scientific and social impact of biomedical research, and the constant need for ever better diagnostic tools, it is not surprising that the analysis of electroencephalograms (EEGs) of patients with epilepsy was among the very first applications of ordinal patterns and permutation entropy [14,15]. Since then ordinal symbolic analysis has remained a popular method in biology and medicine, especially when it comes to distinguishing abnormal from normal health conditions in real time. Specific examples include epilepsy [16–24], anaesthesia [25–35], cardiology [36–41], event-related potential (ERP) analysis [42,43], Alzheimer's disease [44], sleep research [45] and brain–computer interfaces [46]. For a review of biomedical (and econophysics) applications of permutation entropy, see also [47].

In the following review of ordinal symbolic dynamics, the interested reader will find an account of the concepts, methods and potentialities of the ordinal symbolic analysis of biomedical recordings. This is accompanied by illustrations using data from epilepsy research.

In order to present the material in a systematic and comprehensive way, the rest of this paper is organized as follows. In §2, we give an overview of dynamical systems, symbolic dynamics and symbolic representations. We intend hereby not only to set the mathematical framework to symbolic representations, but also to integrate ordinal representations in the bigger picture. Section 3 presents some methods of visualizing the ordinal pattern distributions of a time series. Visualization is a useful way of displaying information so that interesting events can be easily recognized, as well as a complement to other, more quantitative tools. The latter, in the form of ordinal pattern-based complexity quantifiers, are considered in §4. They include permutation entropy along with other entropy-like measures such as symbolic transfer entropy, sorting information transfer, complexity coupling coefficients, transcript-based entropies and information directionality indices, regularity parameters, etc. The applicability and efficiency of most of them is still the subject of intense research, as evidenced by the many recent references given in the text. The paper ends with a summary in §5.

## 2. Symbolic dynamics

The beginning of symbolic dynamics is usually traced back to Hadamard's [48] coding of closed geodesics by symbolic words. From the time when Morse & Hedlund [49] first used the name ‘symbolic dynamics’ to present days, it has developed to a stand-alone field with a strongly formalized theory and widespread applications, in particular to time-series analysis. For the history of symbolic dynamics, see also [50], and for an overview of applications, see [51].

In the following subsections, we provide a somewhat formal account of symbolic representations via symbolic dynamics. The aim is to highlight the relationship between symbolic representations and their original time series in the setting of nonlinear dynamics, as well as to put ordinal symbolic representations in the right mathematical perspective.

### (a) Symbolic representation of time series

To motivate symbolic dynamics, consider the dynamics generated by a self-map *f* of a subset *Ω* of a (low-dimensional) Euclidean space. The dynamics is introduced in the *state space* *Ω* via the repeated action of *f* on *Ω*. Given *ω*_{0}∈*Ω*, the (forward) *orbit* or *trajectory* of *ω*_{0} under *f* is defined as
where , *f*^{0}(*ω*)=*ω* and *f*^{n}(*ω*)=*f*( *f*^{n−1}(*ω*)). In the particular case that *f* is invertible, one can also consider the full orbit (…,*f*^{−2}(*ω*_{0}),*f*^{−1}(*ω*_{0}),*ω*_{0},*f*(*ω*_{0}),*f*^{2}(*ω*_{0}),…). Since in time-series analysis data acquisition starts necessarily at a finite time, we consider henceforth only the general case of forward orbits. The name ‘orbit’ clearly hints to the interpretation of the iteration index *n* as discrete time: each application of *f* on the point *ω*_{n}=*f*(*ω*_{n−1})∈*Ω* updates the state of the system.

Furthermore, to obtain interesting pay-offs one often supposes that *Ω* is endowed with a probability measure *μ*, modelling the distribution of the states of a system. In this extended setting, the time evolution of the system should preserve the probability of the ‘events’ (measurable subsets of *Ω*), i.e. *μ* is supposed to be * f-invariant* in the sense that

*μ*(

*f*

^{−1}

*B*)=

*μ*(

*B*) for any event

*B*.

If a dynamics is complicated, we might content ourselves with a ‘blurred’ picture of the orbit behaviour. This can be done as follows. Divide *Ω* into a finite number of disjoint pieces *A*_{i}, *i*=0,1,…,*k*−1, and keep track of the trajectory of *ω*∈*Ω* with the precision set by the *partition* *α*={*A*_{0},…,*A*_{k−1}}. Specifically, we assign to *ω* a sequence
the *n*th entry *i*_{n}∈{0,1,…,*N*−1} telling us in which element of *α* the iterate *f*^{n}(*ω*) is to be found. We call **S**(*ω*) the *itinerary* (or address) of *ω* with respect to the partition *α*. Formally,
for all . Note that **S**(*ω*) depends, in general, on the partition *α* chosen, so we write whenever convenient to stress that dependency. In principle, the relation between points *ω* and their symbolic representations will be many-to-one, i.e. several orbits may have the same symbolic representation, unless *α* is a generating partition (or a generator): given an *f*-invariant probability measure *μ*, the partition *α* is said to be a generator of *f* if on a set of probability 1 the relation between states *ω*∈*Ω* (hence, orbits ) and their itineraries **S**^{α}(*ω*) is 1-to-1.

In the case that the partition *α* is not generating, one can still use the symbolic representations for practical purposes. For example, one can use the statistics of the different symbols along single symbolic time series or samples thereof to characterize different dynamical behaviours or regimes of the data source. Another typical approach to the same objective is to calculate the entropy of the symbolic representation. Since this is a finite-alphabet message, one may resort to Shannon entropy or, of course, to other fancier proposals such as permutation entropy (see below).

### (b) Entropy

It turns out then that the maps , sending *ω* to the *n*th component of its itinerary **S**^{α}(*ω*), define a stationary stochastic process , called the *symbolic dynamics* of *f* with respect to the partition *α*, for which elements *A*_{i}, 1≤*i*≤*N*−1, are supposed to be measurable. The joined probability functions are given by (or derived from)
2.1
We denote this random process by **S**^{α}. In this way, the symbolic representations **S**^{α}(*ω*)=(*i*_{0},*i*_{1},…) become realizations of the stochastic process **S**^{α}.

Thus far, we set out from a (measure-preserving) dynamical system, possibly with a complex dynamics, and have coarse-grained the dynamics (via a finite partition of the state space) to obtain a stationary random process—a seemingly simpler environment.

In information theory, perhaps the most important quantity one can attach to a stationary random process (also called information source in this setting) is its entropy, which measures the average information per unit time conveyed by the outcomes of the process. For an account of entropy see, for example, [52].

Owing to (2.1), the *Kolmogorov–Sinai entropy* of *f* *with respect to the partition* *α*,
2.2
and the *Shannon entropy* of the random process **S**^{α},
coincide:
2.3
As the logarithm base we take Euler's number *e*. The other usual choice is base 2. Clearly, the information we obtain from the original dynamical system via the random process **S**^{α} depends on the partition *α*. The famous Kolmogorov–Sinai theorem, however, says that if the partition *γ* is a generator of *f* then
where the maximum on the right-hand side is taken over all finite partitions of *Ω*. Consideration of the case with no generating partitions requires this maximum to be substituted by a supremum to get a partition-independent quantity, called the *Kolmogorov–Sinai* (KS) *entropy* of *f*,
which turns out to be an invariant for the concept of equivalence in dynamical systems. As a result of (2.3), if *γ* is a generating partition, then
and, therefore, all the information derived from it will be intrinsic to the system (and only then!). The KS entropy *h*_{μ}( *f*) measures the (pseudo-)randomness of the dynamics generated by *f*.

As a simple example, consider the tent map defined as
See figure 1*a* for its graph. It is known that the Lebesgue measure *λ* (i.e. the usual concept of length on the real axis) is invariant under *Λ*. If we choose the partition *α*={*A*_{0}, *A*_{1}}, where
(to which subset the middle point belongs is irrelevant), then the symbolic representation of any orbit under *Λ* will be a binary sequence. Moreover, it can be proved that *α* is a generator of *Λ*, that is, the relationship between the initial conditions *ω*_{0}∈[0,1] and the binary sequences **S**^{α}(*ω*_{0}) is 1-to-1 (except for a countable subset of points). For instance, if **S**^{α}(*ω*_{0})=(0,1,…), then . More generally, the knowledge of *k* bits of **S**^{α}(*ω*_{0}) locates *ω*_{0} within a dyadic interval [*m*/2^{k},*m*+1/2^{k}] of length 2^{−k}, 0≤*m*≤2^{k}−1, and every further digit in **S**^{α}(*ω*_{0}) halves the previous interval, pinpointing *ω*_{0} up to a precision 2^{−(k+1)}. The corresponding KS entropy of the tent map can be calculated by means of the generator *α*,
This is the maximum randomness a variable (or an independent and identically distributed process) with two outcomes can have, the prototype being the tossing of a fair coin.

Generators happen to exist under rather general conditions, though they are explicitly known only for a few maps. In some cases, they can be constructed numerically. In any case, as already pointed out in the previous section, the scope of symbolic dynamics is usually more modest. In general, no invariant characterization of the underlying dynamics is sought, but a method to classify or filter out different dynamical behaviours. In such cases, the computation of partition-dependent complexity measures, such as *h*(**S**^{α}), might be sufficient.

### (c) The ordinal view

In the general setting of symbolic dynamics, we have considered a finite partition of the state space leading to a symbolization. Now we are going the other way around. The symbols will be ordinal patterns of length *L*≥2, describing the order structure of vectors taken from a time series.

The main idea is to assign to a real vector **v**=(*x*_{0},*x*_{1},…,*x*_{L−1}) a permutation. We say that **v** has the *ordinal*(*L*-)*pattern* *o*(**v**)=**r**=(*r*_{0},…,*r*_{L−1}) if
where, in case *x*_{i}=*x*_{j}, we agree to set *r*_{i}<*r*_{j} if *i*<*j*. Since {*r*_{0},…,*r*_{L−1}} is a reshuffling of the integers {0,…,*L*−1}, the ordinal pattern **r** of a vector **v** can be identified with the permutation sending *i* to *r*_{i}, 0≤*i*≤*L*−1.

Note that the *L*! permutations of {0,1,…,*L*−1} elements, endowed with the composition
build an algebraic group called the *symmetric group of degree* *L*, which we denote by . This fact is used in the quantification of the coupling between time series, as we will discuss later on.

The ordinal representation of a time series is given via ordinal patterns of delay vectors. Given a real time series and some natural number called *delay time*, to each one assigns the ordinal patterns *o*(**v**^{T,L}(*n*)) of the *time delay vectors*

In order to relate this symbolization to measurements taken from a possibly more than one-dimensional state space, let *X* be an ‘observable’ defined on the state space as a measurable real-valued function. If is an orbit in *Ω*, one is interested in the ‘measured’ time series
With
one gets
for all . Thus, a so-called *ordinal partition* of *Ω*,
where
is obtained by classifying points according to ordinal patterns. The symbols obtained from this partition can be considered as the ordinal patterns themselves.

In the situation of ‘directly measuring’ the states of *Ω*, the points *ω*_{i} and *x*_{i} are coinciding, which means that *X* is the identity map. Figure 1*b* shows how in such a situation the corresponding ordinal partition *π*^{1,L} can be determined graphically for *L*=3 (and likewise for other *T* and moderate values of *L*) for the logistic map *f*(*ω*)=4*ω*(1−*ω*), 0≤*ω*≤1. The function graphs of the identity map, of *f* and its second iterates are shown. Since ordinal patterns can only change at points where at least two of the three functions provide the same value, one has only to check such points in order to get the boundary of the partition pieces. In this concrete situation, *π*^{1,3}={*P*_{(0,1,2)},*P*_{(0,2,1)},*P*_{(2,0,1)},*P*_{(1,0,2)},*P*_{(1,2,0)}} with
where the boundary points are neglected.

Note for further reference that no point has the pattern (2,1,0). Such ordinal *L*-patterns, which cannot be materialized by a deterministic dynamics, are called *forbidden patterns* and, as shown in [53,54], they are the rule rather than the exception. Forbidden patterns have been used to discriminate noisy deterministic signals from white noise [55,56].

### Remark 2.1

When associating an ordinal pattern with a data vector **v**^{T,L}(*n*), the case of equal components was dealt with in a special way, having in mind continuously distributed data. In the case of data digitized with a low resolution, it was proposed by Bian *et al.* [57] to assign single symbols to vectors with equality of special components.

### (d) Permutation entropy

As before, let *μ* be an invariant probability measure under *f*. For *L*=2,3,… and and some given ‘observable’ *X*, one is interested in the * L-permutation entropies* with respect to

*X*defined by 2.4 as complexity measures.

In the special situation that *T*=1 all order relations of the elements of an orbit are coded by the ordinal *L*-patterns, *L*=2,3,…. Here we call
the *permutation entropy* of *f* with respect to *X* [58,59].

It can be proved that, under mild mathematical conditions (namely, that *f* is a piecewise, strictly monotonic self-map of a one-dimensional interval), coincides with *h*_{μ}( *f*), the KS entropy of *f* [58], where id is the identity map, i.e. the states of *Ω* are directly measured. Moreover, when *X* separates orbits in a certain sense [60], the sequence of ordinal partitions determines the KS entropy, i.e.
2.5
and
2.6
Let us remark that, in general, none of the ordinal partitions is generating for a given map *f* and that, by Taken's embedding theorem and variants of it, the set of *X* with the separation property mentioned above is large in a certain sense [59,60]. For a comprehensive discussion of the equality between the permutation and the KS entropies, see [59,60]. A different approach to permutation entropy, which turns out to be equivalent to the KS entropy, was studied in [61].

### (e) Practical aspects

In the previous subsections, we have dealt with some formal underpinnings of symbolic representations in general, and ordinal symbolic representations in particular, and their connections to random processes and dynamical systems. In practice though, when analysing empirical data output by a, say, physical or biological system, much of the above framework is missing or hidden to the analyst. Let us mention some typical issues in this regard.

— Although the working hypothesis is that the system is deterministic, the evolution law of the observed dynamics is virtually always unknown.

— Real datasets are finite and very often too short for good statistical estimations of probability distributions. In biological systems, time series need to be short to guarantee stationarity.

— Other usual features of experimental data that one should take into account are noise, instability, and temporal multi-scalarity.

Note that these shortcomings are common to any sort of time-series analysis, whether linear or nonlinear. Additionally, symbolic representations introduce parameters in the analysis through the coarse-graining partition. This calls for a stability analysis of the results with respect to the parameters in some applications.

*Ordinal time-series analysis* is based on the ordinal pattern distribution obtained from the data. Moreover, on the model side, the data are supposed to be output by a nonlinear dynamical system. In doing so, we approximate probabilities by relative frequencies under the assumption that the considered orbits are ‘typical’. These relative frequencies are called *empirical probabilities* in linear time-series analysis, and *natural measures* in nonlinear analysis. They are by construction ergodic (which means precisely that one may equate probabilities and time averages). Usually nonlinearity is checked by means of surrogate tests (e.g. [8]).

However, a purely descriptive data analysis is common as well. The above approach emphasizes (one-dimensional) time series which are the outputs of a possibly higher dimensional system, leading in particular to the problem of system reconstruction. The delay time *T* is chosen according to some criterion [2,62]. In the standard nonlinear time-series analysis, there are a number of criteria for choosing the parameter *L* (the so-called *embedding dimension*), perhaps the most popular being the false nearest neighbours method [63]. In ordinal time-series analysis, the usual criteria are computational cost and statistical significance in view of the amount of data available.

## 3. Ordinal pattern distributions of empirical data

### (a) Ordinal patterns distributions

As noted above, the basic ingredients in ordinal time-series analysis are the ordinal patterns of a chosen length *L* for some delay time *T*. Given a finite time series
of length *N*, a first important indicator is the distribution of relative frequencies
3.1
of ordinal *L*-patterns **r** observed in a time series or in a (time-dependent) sample of it.

These frequencies have been employed in a variety of classification studies. For example, single ordinal pattern frequencies have been considered in electrocardiograms [37,38]. Schindler *et al.* [24] and Rummel *et al.* [64] have observed higher numbers of forbidden ordinal patterns during epileptic seizures, and Ouyang *et al.* [19] have shown a significant growth of forbidden ordinal patterns before absence seizures in rats.

Further insights into biomedical data have been obtained by considering ordinal pattern distributions or special properties of them. Particularly, frequencies of single ordinal patterns are of some interest. For example, Graff *et al.* [40] have discussed the recently found physiological phenomenon of heart rate asymmetry on the basis of ordinal pattern frequencies, providing new views on this topic. In particular, they present a study which shows significant frequency differences for some ordinal 4-patterns and their time reversals in the heart rate data of healthy subjects.

Taking into consideration the symbolic nature of the data obtained and the kind of problem considered, many ideas of nominal statistics and information theory can be applied. For example, Keller & Wittfeld [16] have employed a generalized correspondence analysis in order to visualize changes in the similarity between the ordinal pattern of EEG components, and Cammerota & Rogora [36] have introduced a symbolic independence test for investigating atrial fibrillation on the basis of heart rate data.

Below we will discuss commonly used quantifiers based on ordinal pattern distributions of whole time series or parts of them.

### (b) Useful representations of ordinal patterns

Besides extracting information from given data via certain quantifiers, the visualization of ordinal pattern distributions can provide interesting insights into the dynamics of time series and systems behind them and is recommended as the first step of analysis. We want to use two visualization methods here. For this, we first describe a natural enumeration of ordinal patterns with some convenient properties.

Given an ordinal *L*-pattern **r**=(*r*_{0},*r*_{1},…,*r*_{L−1}), for *l*=1,…,*L*−1 let
It is easy to see that **r** can be reconstructed from (*u*_{1}(**r**),…,*u*_{L−1}(**r**)) and that each finite sequence in {0,1}×{0,1,2}×⋯×{0,…,*L*−1} corresponds to an ordinal pattern. From this, similarly to Keller *et al.* [65], one obtains that the assignment
3.2
is a bijection from the set of ordinal *L*-patterns onto {0,1,…,*L*!−1}. We call *y*(**r**) the *number representation* of **r**. Table 1 illustrates the bijection described for *L*=3.

In order to see that the number representation of ordinal patterns obtained in this way is rather natural and well adapted to the analysis of a time series, consider a time series of real numbers. For and , let a sequence
be defined by
for . Fixing some *L*, then by construction
for all *l*=1,2,…,*L*−1; hence is the number representation of the ordinal pattern of **v**^{T,L}(*n*) (compare (3.2)). Division by *L*! provides the number
3.3
converging to
for . Both *y*_{T}(*n*) and the sequence **u**_{T}(*n*) contain all information about the ordinal structure of the delay vectors **v**^{T,L}(*n*), , and can therefore be considered as the ‘infinite ordinal pattern’ of the time series at time *n* for delay time *T*. Note that for white noise processes the ‘infinite ordinal patterns’ considered as numbers in [0,1] are theoretically equidistributed on [0,1].

Figure 2 shows the well-known *Feigenbaum diagram* (*a*) together with some related graphics (*b*) based on ordinal symbolic analysis. Recall that in the Feigenbaum diagram typical orbits of the quadratic map *x*↦*rx*(1−*x*) on the interval [0, 1] are drawn in the vertical direction in dependence on *r*. We took equidistant values of the parameter *r* between 3.5 and 4 and kept in each case the iterates *x*_{1001},*x*_{1002},…,*x*_{2000} of some randomly chosen initial point *x*_{0}. In the second graphic instead of the iterates themselves their ordinal patterns *y*_{1,7}(1001),*y*_{1,7}(1002),…,*y*_{1,7}(2000) as numbers in [0, 1] are shown. Note that for many values of *r* one sees a very thin ‘attractor’, better unveiling determinism than in the Feigenbaum diagram.

Another interesting visualization of dynamical structure is provided by *recurrence plots*, which have ordinal analogues showing interesting details of the recurrence structure of a time series. Such order-recurrence plots have been considered by Groth [43] and Schinkel *et al.* [42] and applied to ERP data, particularly with the success of reducing the number of trials necessary.

### (c) Illustration by two electroencephalographic examples

To illustrate ordinal pattern distributions in real-world data, we consider two datasets from the European Epilepsy Database [66] (figures 3 and 4). The upper plot (*a*) in both figures represents raw EEG data recorded at a sampling rate of 256 Hz; in each the duration of an epileptic seizure is marked in grey. Figure 3 illustrates scalp data of a 52-year-old female patient with epilepsy caused by inflammation. The data were recorded from electrode F4, and the seizure is a partial one registered during sleep stage I. Figure 4 illustrates scalp data of a 54-year-old female patient with epilepsy caused by hippocampal sclerosis. Here the data were recorded from electrode C4 and the seizure is a complex partial one registered during sleep stage II. The following plots are aimed at reflecting changes in EEG data before and during the seizure by visualization of ordinal pattern distributions.

Plot (*b*) represents the numbers *y*_{1,7}(*n*)∈[0,1] (see (3.3)) coding ordinal patterns in dependence on the time *n*. Since the representation is very truncated in the time direction, time-localized ordinal pattern distributions and changes in the distribution are well illustrated.

In plot (*c*), for *T*=1 the relative frequencies of all ordinal 3-patterns in a sliding window of 2 s are represented. The plot has to be read as follows: enumerate the curves including the baselines at 1 and 0 from above to below by 0,1,…,6. Then, in the window starting at *n*, the relative frequency of the ordinal pattern with number *k*=0,1,…,5 according to table 1 is the vertical distance of the *k*th and *k*+1th curve at time *n*.

Interesting aspects shown by figures 3 and 4 are the changes in ordinal pattern distribution at the beginning of epileptic seizures, and at 2240 s (figure 3*b*) and 1870 s (figure 4*b*) accompanied by changes (figure 3) in relative frequencies of up patterns (0) and down patterns (5), respectively. Although in figure 3 the EEG returns relatively quickly to the ‘normal’ level, in figure 4 the ‘relaxation’ seems to take much longer.

## 4. Ordinal complexity quantifiers

We proceed to present a few quantifiers or classifiers in ordinal symbolic representations that have been employed in the literature to characterize and/or discriminate different dynamical behaviours. Such quantities are also called indicators or markers.

### (a) Permutation entropy

#### (i) The original concept

Historically, the first ordinal dynamic indicator was the permutation entropy of order *L* constructed with the frequencies (3.1) and to be considered as an estimator of (2.4),
4.1
Indeed, (4.1) with *L*=2,3,4 has already been used in the foundational paper of permutation entropy [9] to analyse speech signals.

It did not take long for further applications to appear. Indeed, just 2 years later Keller & Lauffer [14] discussed changes in brain dynamics related to vagus stimulation on the basis of permutation entropy of EEG data. About the same time, Cao *et al.* [15] used *h*^{T,L} to find that the EEGs of patients with epilepsy are usually less complex during epileptic seizures than in the normal state. Since then permutation entropy has remained a popular biomarker of epileptic seizures. Thus, Veisi *et al.* [23] discussed the classification of normal and epileptic EEGs, Nicolaou & Georgiou [21] considered permutation entropy for automatic detection of epileptic seizures and Li *et al.* [17] even used it for predicting them. Also note that Ouyang *et al.* [20] applied permutation entropy and a special dissimilarity measure for detecting changes in the dynamics of data from rats with absence epilepsy, and Bruzzo *et al.* [18] included the problem of vigilance changes in the discussion of distinguishing epileptic and normal EEG epoches.

In anaesthesia, permutation entropy of EEG data is being used or tested increasingly as an anaesthetic depth measure for evaluating anaesthetic drug effects ([27,29,31]; for the concept of multi-scale permutation entropy, see below and [28,34,35]) or for distinguishing between consciousness and unconsciousness [26,33]. In particular, the performance of permutation entropy has been compared with that of other measures such as approximate entropy, sample entropy and index of consciousness with mainly good or best results [26,27,29–32,34,35]. Note that in [32] local field potentials were considered instead of EEGs, and that good results with permutation entropy were obtained after some burst suppression correction. Burst suppression patterns occur in EEGs in inactivated brain states.

Nicolaou & Georgiou [45] report that different sleep stages correspond to significantly different permutation entropies in the pertinent EEG segments, showing potential in automatic sleep stage classification. Moreover, permutation entropy has been used in a brain–computer interface system for mental task classification [46] and in exploring heart rate variability [39,57]. In Cysarz *et al.* [39], three specially constructed symbolic representations and ordinal representations were considered in order to reflect changes in the cardiac autonomic nervous system during head-up tilt with nearly the same results.

It is not surprising that permutation entropy and similar measures are applied to physical data because such data are characterized by special underlying patterns often related to certain states of a system (e.g. spike-and-wave patterns related to epilepsy, special sleep patterns such as sleep spindles, and burst suppression patterns related to inactivated brain states, or Wolff–Parkinson–White patterns in abnormal electrocardiograms). It seems that ordinal patterns are appropriate for capturing structures containing such patterns and also abrupt changes in their distributions. Here because of the assumed nonlinear nature of the systems behind the data, the idea of pattern complexity is indicated.

Again in figures 3 and 4, note that plots (*d*) reflect changes in the permutation entropy *h*^{1,3} computed in sliding windows of 2 s. Relations to the plots (*b*,*c*) are obvious.

#### (ii) Variants of permutation entropy

There are some quantifiers similar to the permutation entropy. Liu & Wang [67] have defined a *fine-grained permutation entropy*. In contrast to a modification of permutation entropy by Bian *et al.* [57] based on considering extra symbols in the case of equal components of the corresponding delay vector as mentioned above, they divide ordinal patterns into more symbols regarding distances between vector components.

The use of different delay times in permutation analysis gives some scale-dependent information on a time series. Another approach for gaining such information is the *multi-scale permutation entropy* invented by Li *et al.* [31] in anaesthesia. Here a given time series is averaged within non-overlapping windows of different lengths, and the permutation entropy is determined as a function of the window length. Then a special measure combining this information is compared with usual permutation entropy. The authors report that transitions between light and deep anaesthesia are better reflected by the new measure.

For the analysis of Alzheimer's disease, Morabito *et al.* [44] combine the new approach with the idea of pooling ordinal patterns from different EEG channels, first considered in [14]. Here the distribution of all ordinal patterns obtained, independent of their source, is investigated.

### (b) Similar quantifiers

In the case that *h*^{1,L} stabilizes for increasing *L*'s one can even take *h*^{1,L} for sufficiently large *L* to estimate the permutation entropy of the data source. Note in this connection that algebraic correction terms to finite size effects can be found in [68,69]. Numerical simulations show that, even in the simple case of logistic maps, where permutation entropy and KS entropy coincide, the stabilization mentioned is very slow. For this reason, Unakafov & Keller [70] (for *T*=1) have considered the conditional entropy of successive ordinal *L*-patterns, providing better approximation convergency to KS entropy. The latter is illustrated for the logistic family by figure 2*c*, where the Lyapunov exponent, *h*^{1,10} and the conditional entropy of successive ordinal 10-patterns are plotted in black, grey and dark grey, respectively. Having in mind that, by Pesin's theorem, the KS entropy is 0 if the Lyapunov exponent is negative and equal to the Lyapunov exponent (cf. [71]), these graphics show that the KS entropies, hence the permutation entropies, and considered conditional entropies are nearly the same. *h*^{1,10} shows the same tendency as the other quantities but is significantly larger.

Finally, we refer to measures based on counting monotone changes in time series and applied to EEG data in Keller *et al.* [72]. Such measures can quantify ‘roughness’ of data and are, for example, used for a robust estimation of the Hurst exponent of a fractional Brownian motion.

### (c) Ordinal coupling quantifiers

There are, of course, ordinal versions of other entropies and information-theoretical quantities (such as conditional entropies and mutual informations), mainly describing coupling. We do not want to go into detail here since this would fill a complete other paper.

One of the most notorious coupling measures is *symbolic transfer entropy* (actually a conditional mutual information), introduced by Staniek & Lehnertz [73], which is used to measure the information directionality in coupled time series. Symbolic transfer entropy, which can be considered as an ordinal variant of transfer entropy [74] has been used in Li & Ouyang [75] to study the interaction between epilepsy foci. It has been developed in different directions by Kugiumtzis [76,77] (transfer entropy on rank vectors, partial transfer entropy on rank vectors), Pompe & Runge [78] (momentary sorting information transfer) and Nakajima & Haruna [79] (symbolic local information transfer).

Along these lines, *coupling complexity coefficients* and information directionality indicators have also been proposed by means of the so-called ‘transcripts’ between two coupled ordinal symbolic representations [80,81]. Transcripts exploit the algebraic structure of the permutations [82,83] and are useful also for statistical reasons: whereas all combinations of two *L*-patterns provide (*L*!)^{2} possible combinations, the corresponding transcript representation aggregates them to only *L*! combinations. Biomedical applications of transcripts and coupling complexity coefficients can be found in Bunk *et al.* [84] and Amigó *et al.* [80] and Monetti *et al.* [83], respectively. Biomedical applications of the transcript-based directionality indicators is a topic of current research.

Finally, there are other ordinal indicators tailored to specific purposes. Such is the case for the *regularity parameters* introduced in Amigó *et al.* [85] to differentiate among different phases in coupled map lattices and also to apply to the analysis of multi-channel magnetoencephalograms recorded on human scalps.

## 5. Summary

Ordinal symbolic representation provides an interesting and promising approach to the analysis of complex time series, with a fast increasing number of biomedical applications and beyond. The potential of the method in biomedicine ranges from the detection of dynamical state changes of a system through the distinction of its states to state classification, mainly in epileptology, anaesthesia and cardiology. The applicability of ordinal time-series analysis has been demonstrated by many examples.

Let us summarize next some features of ordinal symbolic representations that distinguish them from other symbolic representations and make them special.

To find appropriate partitions in symbolic dynamics, one has to look for natural levels subdividing the range of measurements. In many practical situations, there is no clear indication of such levels. By contrast, ordinal patterns provide a kind of universal partitioning with standard symbols, moreover without knowing the range of measurements. Yet, partitions are not given

*a priori*on the state space but based on the dynamics. The latter is not a practical problem, though, since one is only interested in ordinal patterns as the symbols obtained from the data in a simple way. There are situations, however, in which ordinal methods might be inapplicable without adjustments. For example, close delay vectors can provide very different ordinal patterns, this being a reason for the ordinal pattern modification proposed by Liu & Wang [67].The symbols in ordinal symbolic dynamics are easily interpretable. For example, one has ‘upward’ and ‘downward’ ordinal patterns and ordinal patterns describing monotone changes. There are moreover symmetries in the ordinal pattern distributions related to time and data range reversal. Both can be exploited when considering special order-related aspects of data analysis; for example, for describing roughness by change statistics [86] and for quantifying the phenomenon of heart rate asymmetry [40] described above.

Finding convenient generating partitions in dynamical systems is an unsolvable problem in many cases. However, as noted above, the KS entropy can be approximated on the base of ordinal partitions for sufficiently large pattern length

*L*. This is important at least from a conceptual view point, but in practice*L*can be too large to estimate the KS entropy accurately. Nevertheless, ordinal analysis with a small*L*is useful in descriptive data analysis. The fact that conditional variants of permutation entropy seem better estimators of the KS entropy than the original concepts is certainly worth further research [70].As permutations, ordinal

*L*-patterns are elements of the symmetric group of degree*L*. This additional algebraic structure of the symbols allows applications to the analysis of coupled dynamics [80,81].

Ordinal symbolic analysis is a relatively new field and many methods are not consolidated yet. Future research should particularly focus on the development of statistical models and methods aimed at showing the significance and reliability of the results obtained by ordinal analysis. There is moreover some need for systematic studies comparing different complexity and coupling measures, on both the ordinal and metric level.

## Funding statement

J.M.A. acknowledges the financial support by the Spanish *Ministerio de Economía y Competitividad*, grant no. MTM2012-31698. V.A.U. was supported by the Graduate School for Computing in Medicine and Life Sciences funded by Germany's Excellence Initiative (DFG GSC 235/1).

## Footnotes

One contribution of 12 to a theme issue ‘Enhancing dynamical signatures of complex systems through symbolic computation’.

- © 2014 The Author(s) Published by the Royal Society. All rights reserved.