## Abstract

Ongoing developments in cardiac modelling have resulted, in particular, in the development of advanced and increasingly complex computational frameworks for simulating cardiac tissue electrophysiology. The goal of these simulations is often to represent the detailed physiology and pathologies of the heart using codes that exploit the computational potential of high-performance computing architectures. These developments have rapidly progressed the simulation capacity of cardiac virtual physiological human style models; however, they have also made it increasingly challenging to verify that a given code provides a faithful representation of the purported governing equations and corresponding solution techniques. This study provides the first cardiac tissue electrophysiology simulation benchmark to allow these codes to be verified. The benchmark was successfully evaluated on 11 simulation platforms to generate a consensus gold-standard converged solution. The benchmark definition in combination with the gold-standard solution can now be used to verify new simulation codes and numerical methods in the future.

## 1. Introduction

Underpinned by improved numerical methods and increased performance per unit cost of computing, simulations of electrical activation in cardiac tissue have advanced from analysing emergent phenomena in simplified systems and generic excitable media to detailed representations of physiology and pathologies within the heart [1].

The large and active community focused on this work, including groups from within the Virtual Physiological Human (VPH) [2] and International Union of Physiological Sciences (IUPS) Physiome [3] projects, has meant that these simulation frameworks have progressed to the point where they now provide some of the most advanced current exemplars of multi-scale organ models. Specific examples include organ-scale cardiac electrophysiology models developed to represent both specific species and diseases [4]. Most recently, this work has been extended to focus also on the clinically relevant application of these models, including the *in-silico* evaluation of drugs [5], patient selection and treatment using implantable devices [6,7].

The impact of cardiac electrophysiology simulations across an expanded range of applications has seen an increase in model complexity through increasing numbers of equations to describe cellular and sub-cellular functions [8], region-specific parameter sets for cell models [9,10] and high-resolution meshes to capture anatomical detail [11]. Distinct from the development of new models is the corresponding development of simulation codes that solve numerically the equations of a given model. These codes have also increased in complexity to represent new models, as well as to incorporate advanced numerical techniques [11,12], and exploit opportunities provided by increasingly complex computer architectures, including graphics processing units [9] and peta-scale systems [10]. These developments in turn have produced a growth in the number of software codes available to perform these simulations.

This process of code platform development in parallel by multiple groups has led to a wide range of numerical techniques and solution strategies [13]. Furthermore, each software code is designed for a specific purpose, with customized input and output file formats, specific mesh structures and numerical schemes for the governing equations. This variety and complexity in software implementations poses significant challenges for the verification of cardiac tissue electrophysiology simulation codes, which invariably limits the subsequent validation of model results. Model validation and code verification are two distinct, but equally important, steps in evaluating the quality of model simulation results. As models are intrinsically approximations, validating a model by comparing the results of simulations against experimental measurements assesses the ability of a model to capture the salient features of the system under study [14,15]. In contrast, verification determines if a particular simulation code provides a reliable representation and numerical solution of the underlying equations. Verification is therefore independent of and distinct from the validity of the model equations or parameter sets. This is important because the incorrect implementation of a particular model in software will give useless and potentially misleading output regardless of how well validated the model equations are against experimental data. Hence, an essential step prior to the validation of a model must be verification that the equations are correctly represented and solved by the code. Otherwise, spurious effects can appear, such as waves pinning to the lattice or wave breakup arising only from inadequate spatial resolution [16].

While model validation has been a central focus in cardiac electrophysiology studies for the evaluation of model quality and applicability to a specific question, limited attention has been paid to the verification of the simulation codes that underpin these results. The complexity and diversity of simulation codes used within the cardiac tissue electrophysiology community to represent detailed models, implement novel numerical techniques and exploit advanced machine architectures now motivate the need to verify each code independently to ensure that it provides a faithful representation of the governing equations that it aims to solve. The need to provide community standard quality controls will become increasingly important as cardiac tissue electrophysiology simulator codes move towards direct clinical applications and need to meet the corresponding stringent regulatory requirements. As evidenced by the routine evaluation of novel simulation approaches against benchmarks and analytical solutions within other simulation communities, this context is not unique to either computational cardiac electrophysiology or the VPH project. However, in contrast to other fields, to date there have been few attempts to systematically compare results derived from different numerical approaches and implementations, which have been developed in parallel with the aim of achieving a common objective.

The goal of this study was therefore to begin the verification of 11 cardiac tissue electrophysiology simulation codes. Code verification is achieved through the evaluation of the convergence properties in space and time for a standardized benchmark problem. The use of a standard problem definition means that all simulation codes should converge to the same solution as the space and time steps are reduced, regardless of the numerical method, computer architecture or coding language, if the numerical methods are valid, and the model equations and the numerical methods are correctly implemented. In §2, we review code verification and discuss a relatively established set of approaches within the novel context of the VPH. This discussion is followed by a description of the necessary characteristics and definition of the benchmark problem. The last section compares the solutions from the 11 codes.

## 2. Code verification

The challenges of code verification are not unique to the field of cardiac electrophysiology simulations. The fields of astronomy, fluid dynamics, earth sciences and weather modelling all rely heavily on numerical simulations for the analysis or extrapolation of data. Within these fields, two related strategies have been applied to address the need to verify complex codes developed independently to simulate common systems. The first approach is the development of benchmark problems, which has been applied successfully by the computational fluid dynamics community. Specifically, simulations of lid-driven flow [17], backward-facing step [18,19] and vortex dynamics in the wake of a sphere [20] or cylinder [21] in two and three dimensions are compared against analytical solutions, other simulation results and experimental data. This allows code developers to evaluate their code independently against a combination of actual measurements and high-fidelity simulation results. All of these models are characterized by a limited number of parameters, simple geometries and, in many cases, the ability to validate simulation results against experimental measurements.

An alternative verification strategy is the evaluation of multiple codes simultaneously for a single problem, termed an *N*-version code evaluation [22]. This allows for the simulation of problems with more complex input datasets or geometries. This approach has been applied to evaluate earth science codes for seismic data processing [22], astronomy codes for gas-dynamical simulations of star clusters [23] and meteorology codes for simulating large eddy currents [24]. Unlike the computational fluid dynamics benchmark problems, where experimental data are often available in some form, the *N*-version code comparison studies are typically applied in contexts where simulations are intrinsically unable to be validated against direct observations.

Some studies have aimed to combine the two strategies by proposing a benchmark problem and simulating it on multiple codes at the same time. The Earth mantle simulation community has applied this approach to develop a series of benchmark problems and at the same time compared 12 simulation codes [25]. This strategy has the added benefits that it both compares many current codes and demonstrates that the defined benchmark is reproducible and unambiguously defined.

The unstable dynamics associated with predicting the development and evolution (or suppression) of electrical behaviour intrinsic to arrhythmias and other types of cardiac rhythm disturbances highlights the importance of code verification for cardiac electrophysiology studies. Specifically, the potential for chaotic phenomena in cardiac electrophysiology necessitates an accurate solution of the underlying equations independent of numerical method or software implementation, if the results are to be reproducible. Either the verification strategies defined above verify the simulation code by assuming that, if multiple parallel implementations of a common set of governing equations produce the same answer, then they can be considered true representations of the governing equations, or they combine model validation and simulation code verification by comparing simulation results against a gold-standard experimental solution.

Comparing cardiac electrophysiology simulations against experimental observations is challenging owing to difficulties of extracting comprehensive and unambiguous measurements from current experimental protocols. Furthermore, unlike conventional engineering simulations, where material properties are often well characterized, relatively homogeneous and constant, cardiac electrophysiology is characterized by intrinsic biological variability, heterogeneity and temporal variation with experiment duration. Hence, it can be very difficult to develop an *a priori* cardiac electrophysiology model to accurately predict the outcome of an arbitrary experiment, as is the case in conventional engineering simulations, such as computational fluid dynamics. This motivates the use of an *N*-version simulation code comparison verification strategy for cardiac electrophysiology simulation code. The converged consensus result from multiple simulation codes of a reproducible benchmark problem is used as a gold-standard solution for the verification of the cardiac tissue electrophysiology codes evaluated here. Detailed results are available both in an online database and in the electronic supplementary material to facilitate the verification of cardiac tissue electrophysiology simulation codes in the future.

## 3. Virtual physiological human benchmark requirements

Cardiac electrophysiology benchmark problems need to be agreed upon by the community, be easily communicated and unambiguously defined with a minimal definition, be solvable by multiple software codes, take a limited amount of time and computational resources to perform, be physiologically relevant, and have succinct and unambiguous metrics of model behaviour. In this section, we define a simulation for the benchmark problem with the goal of meeting these criteria by providing the description and motivation for the governing equations, geometry, material properties, boundary and initial conditions, numerical methods and metrics of simulation results.

### (a) Models of cardiac electrophysiology

Cardiac electrical activation can be simulated using various methods, including cellular automata, eikonal mapping, and monodomain or bidomain equations [4]. Although all four methods have valid applications, the most common form of cardiac electrophysiology equations implemented in cardiac electrophysiology simulation are those in the monodomain model.

The monodomain equations are given by
3.1
where *χ* is the surface-to-volume ratio of cells (mm^{−1}), *I*_{ion} is transmembrane current density (μA mm^{−2}), *V* is membrane voltage (mV), *σ* is the conductivity (mS mm^{−1}) and *C*_{m} is the specific membrane capacitance (μF mm^{−2}). *I*_{ion} is defined by a function (*f*) of a set of state variables (*y*) defined by a system of nonlinear ordinary differential equations (*g*). These equations provide a biophysical continuum representation of cardiac electrophysiology in both space and time, linking tissue-scale electrical propagation with cellular dynamics. Representative values for *σ*, *C*_{m} and *χ* are defined from previous studies and are given in §4 [4,26–28]. Multiple models exist to represent the cellular-scale dynamics in the monodomain equations with varying degrees of physiological detail, mathematical complexity and computational cost to evaluate [8]. The benchmark problem uses the ten Tusscher & Panfilov [29] model of human epicardial myocytes, which provides a balance of these requirements. The model includes all the major ion channels as well as intracellular calcium dynamics, it can be represented solely by a limited number (19) of ordinary differential equations (ODEs), and it can be solved using multiple general ODE integrators. The resulting equation set has been widely implemented across the cardiac electrophysiology modelling community and is relatively computationally inexpensive to solve. At the same time, these equations can be used to perform physiologically [29] and clinically relevant simulations [6].

### (b) Geometry of the tissue model

The tissue model geometry for the benchmark was defined to be a balance between physiological relevance, computational tractability and ease of communication. It also contained no bias for a specific numerical scheme. The tissue geometry was defined as a cuboid, with dimensions of 3×7×20 mm. The problem could therefore be solved with equal ease using finite difference, finite volume or finite element methods with tetrahedral and hexahedral meshes. The size of the cuboid was taken to result in a computationally tractable simulation at a refinement of 0.1 mm with approximately 500 000 degrees of freedom that was physiologically relevant and not dominated by boundary conditions. This geometry could be communicated unambiguously and succinctly, and for many codes was a standard automatically generated geometry, which reduced the burden of setting-up and running the simulations.

Ventricular cardiac muscle is thought to consist of myocyte fibres arranged in laminar sheets [30]. This tissue microstructure plays an important role in determining the electrical properties of the myocardium [31]. To represent the microstructure, an anisotropic conductivity tensor can be introduced with its principal axes aligned with fibre orientation described throughout the geometric domain. The majority of codes support the characterization of transversely isotropic conductivities, whereas only a subset support the representation of fully anisotropic conductivities. Thus, our proposed benchmark defines conduction parameters in two directions for a transversely isotropic material consistent with experimental measurement [28,32]. Fibre directions are defined to be orientated in the long (20 mm) axis of the rectangular cuboid. This fibre orientation is easily described and avoids any ambiguity that a more complicated heterogeneous fibre orientation could introduce through the use of both unit vectors and fibre angles to describe the fibre orientation in different codes.

### (c) Boundary conditions and initial values

Simulations of the propagation of an electrical wave of activation across the myocardium require the definition of the boundary conditions, a stimulus current and the initial values of the cell models. For the benchmark, all boundaries were assumed implicitly to have a zero-flux boundary condition. The stimulus current was delivered to a volume of 1.5×1.5×1.5 mm located at one corner of the slab. Defining initial activation in the corner ensured that the principal direction of wave propagation was not aligned with the fibre orientation, and the symmetry of the problem meant that choice of corner did not affect the results. The initial values of the cell models are stated explicitly in the problem definition in §4.

### (d) Numerical methods

To ensure that the benchmark problem was inclusive, the choice of cell model integration, pre-conditioner, matrix solver, element type, trial functions or solution method (finite element, difference or volume) were left open to be defined by each participant. To characterize the convergence in space and time of each of the simulation codes, the benchmark problem was solved using combinations of discretizations of the order of those used in the literature. The problem was solved at three spatial (Δ*x*=0.5, 0.2 and 0.1 mm [33–36]) resolutions, and each spatial resolution was then solved with three different partial differential equation (PDE) time steps (Δ*t*=0.05, 0.01 and 0.005 ms [33,34,36,37]), resulting in nine simulation results in total per simulation code applied to the benchmark problem.

### (e) Problem description

Owing to the use of a state-of-the-art physiological cell model, each code included a large number of equations and parameters. To define the benchmark clearly, we used existing VPH- and Physiome-related tools. Specifically, for cardiac electrophysiology cell models, the standardization of model descriptions through CellML [38] and the publishing of the models in online repositories (http://models.cellml.org/) enabled all participating codes to use an agreed-upon version of the cell model. The remaining problem definition was spelt out in a set of tables summarized below (§4) and made publicly available on the Anatomical Model Database (AMDB; https://euheartdb.physiomeproject.org/AMDBWebInt/) website (formerly euHeartDB [30]). These data cover every aspect of the problem definition and provide a comprehensive template for describing reproducible cardiac electrophysiology simulations.

### (f) Metrics of model behaviour

The quantitative comparison of cardiac electrophysiology models also requires an unambiguous definition of simulation results. The time that the membrane potential passes through 0 mV upon first activation was chosen, as activation times are commonly evaluated in experiments, providing a spatial and temporal metric of code function, the results of which could be stored in a small file size.

## 4. Benchmark definition

A summary of the model definition is provided in tables 1–3 and is drawn schematically in figure 1*a*.

## 5. Benchmark simulations

Eleven codes were included in this evaluation. Each code is given and indexed in table 4. A brief description of the numerical methods for each code is provided in the table.

Participants were invited to submit a solution to the benchmark problem defined above. The simulation results have all been uploaded into AMDB [46] and are publicly available for comparison. Participants submitted their results in their raw format, which were converted to the CMISS graphical user interface (CMGUI) [47] format to allow for a consistent visualization platform. Once a result was analysed, the participant was contacted if their results differed significantly from the other results. This process identified user error and misunderstandings in the problem definition as well as numerical errors in some of the submitted results. This process led to an iterative refinement of the problem description, resulting in a complete definition that provided participants with sufficient information to reproduce the simulations.

### (a) Results

Ninety-nine simulations were performed for the benchmark. To compare these results, we evaluated activation time at points P_{1}–P_{9} (labelled in figure 1*b*), along a line (blue line in figure 1*b*), across a surface (shown in figure 1*b*) and over the whole solution domain. The point-wise solutions provide unambiguous evaluations of the solution and allow for simple comparisons between results. By evaluating and plotting the activation time at point P_{8} in figure 1*b* for all simulations (figure 2), we could evaluate the impact of spatial and temporal refinements on the solution. Notably, this showed that, while other codes achieved a solution for every given combination, codes G, I, J and K failed for the combination Δ*x*=0.1 mm, Δ*t*=0.05 ms, while code D produced an answer that was unphysiological, indicating instability. Figure 2 also shows that codes A and H have significantly less variation in their solutions as the spatial resolution is altered, compared with the other simulation codes.

The activation times at point P_{8} represent an accumulation of any errors as the wave propagated across the cuboid. To determine how this error was distributed in space, we evaluated the activation times on a line across the solution domain (blue line, figure 1*b*) from P_{1} to P_{8}. As the time discretization did not have a significant impact on the point-wise activation times (figure 2), we compared the activation times for Δ*t*=0.005 ms for the three levels of spatial refinement. The majority of the activation time curves have a common morphology, with the activation wave increasing in velocity as the meshes were refined. At low spatial resolutions (blue line, figure 3), most of the codes (B–F and I–K) showed some boundary effects, while these were not seen in code A and are minimal in G and H. The results in figure 3 again highlight the differences in results of codes A and H, with both codes showing less dependence on the spatial resolution. Also for code A the blue line (Δ*x*=0.5 mm) is below the red line (Δ*x*=0.1 mm). Thus, as the mesh was refined, the conduction velocity of the activation wave decreased slightly for this code but increased for all other codes.

The simulations were performed with a transversely isotropic conduction tensor. To show the impact of changing spatial resolution on activation wave curvature, and propagation along and across the preferential fibre direction, we evaluated activation times in the plane shown in figure 1*b*. Figure 4 shows activation times for the highest and lowest spatial resolution with Δ*t*=0.005 ms. These data show that codes A and H were better able to capture off-fibre conduction velocities at courser spatial resolutions, and that codes C, E and F had far greater boundary effects at coarser resolutions.

All of the benchmark simulations were performed using either the finite element method (FEM) or the finite difference method (FDM). Codes that use the same underlying method would be expected to be more similar than codes that use an alternative method. To evaluate quantitatively the difference between simulation results, we determined the *L*^{2} norm of the difference in activation times between simulations with the highest spatial and temporal resolution (table 5). Interestingly, in a number of cases, the simulation results were closest when the simulations were performed using different numerical methods. Specifically, code A, solved using the FEM, was closest to code G, which used the FDM. Similarly, code B, which used tetrahedral finite elements, was closest to code I, which used a regular grid FDM.

### (b) Anatomical Model Database (AMDB) website

To foster dissemination and interaction across the research community, the results from each simulation were stored on and are available from the AMDB website (formally called euHeartDB [46]) in the form of meshes representing the cuboid in the CMGUI [47] format. Each result is a mesh representing the cuboid, with each node recording the time of activation at that point. This allows automated comparisons to be performed online through the website's interface.

An analysis section of the site allows users to select these result meshes and compare them by generating graphs like those in figure 2. The time of activation along the line from the initial corner to the far corner is represented as one line on the graph per result. The graphs for figure 3 were generated using this functionality. In addition to this simple comparison, a second analysis operation collects the activation times of the corners of all selected results, as well as calculating the activation time at the centre of the cuboid.

Lastly, a third analysis performs an *L*^{2} norm comparison between pairs of results that have the same spatial resolution. This provides a quantitative method for comparing results of the same resolution over differing time steps and codes. Combined, these analysis functions allow participants and observers to communicate their results with one another and compare these results in meaningful ways automatically through a common interface.

The online database is extendable and is available for the publication of additional benchmarks in the future.

## 6. Discussion

This study provides the first evaluation of multiple cardiac tissue simulation codes, a quantification of the reproducibility of cardiac tissue electrophysiology simulations and a minimum definition for a cardiac electrophysiology simulation. The benchmark problem and corresponding simulation results from the 11 codes evaluated provide a set of solutions that now enable future cardiac electrophysiology simulation codes to be quantitatively compared and verified. The results from this study are made available online to provide an ongoing resource for evaluating and comparing cardiac simulation codes and the results they produce.

The benchmark aims to characterize both the modelling and the numerical aspects of the simulation codes by simulating a physiological model at multiple spatial and temporal discretizations. Although the benchmark solutions provide a useful resource for the simulation community, the benchmark is clearly relatively simple compared with the complexity of physiological and clinical models. Additionally, there are numerous facets of cardiac electrophysiology simulation that the benchmark fails to evaluate and this limits its capacity to provide comprehensive code verification. For this reason, we are not proposing that the current benchmark is the final answer to cardiac electrophysiology simulation code verification. It is instead hoped that this benchmark is the first of a suite of benchmark problems to be proposed by the community that will collectively enable the complete verification of codes.

More complex and physiologically relevant benchmarks would undoubtedly have significant potential to reveal a wider array of important similarities and differences between simulation results. However, in addition to providing a rich set of metrics for validation, it is important to note that the value of an *N*-version evaluation is equally determined by the number of contributed solutions. For this reason, the current benchmark was necessarily simple, could be computed with limited resources, required unambiguous and easily implemented analysis, and used common equations and cell models to minimize any barriers of participation. The goal of the current study was not to provide the most comprehensive benchmark problem, but to achieve a balance between reaching a critical mass in the number of participants and providing a useful code verification tool. Evidence of this tension between complexity and participation is demonstrated by groups being unable to perform the simulations owing to the time required for set-up, time required to perform the computation, or the computational costs of the simulations. However, demonstration of the value of this benchmarking exercise was immediate, even during the preparation of this paper, when contributors were able to verify their codes and in some cases find as yet unidentified implementation errors. Some of these errors were subtle, such as the use of the wrong units for an input parameter. In other cases, these were more significant, and the inability of the code to replicate the benchmark activation times resulted in new studies into the underlying numerical methods.

Specifically, code A is based on the FEM. With no operator splitting method, the ionic current in equation (3.1) must be evaluated at the Gauss points within the mesh elements to calculate the finite element integrals. This can be achieved by solving the cell models at the mesh nodes and interpolating the ionic current to the Gauss points, or by interpolating the cell model state variables from the nodes and then evaluating the ionic current at the Gauss points. As the ionic current is a nonlinear function of the cell state variables, this will result in two different ionic currents at the Gauss points and hence two different solutions. This subtle difference in implementations can cause discrepancies of up to 8 per cent with a spatial discretization of 0.1 mm. The differences between these methods have now been explored fully in a focused study [48].^{1}

The benchmark also identified codes A and H as notably different from all other simulators. At coarse spatial resolutions, the activation times for these solvers were substantially closer to those obtained with the finest spatial resolution, compared with the other simulation codes. The code A developers suggested that a possible reason for the difference between the results of codes A and H, compared with the other finite element codes, is that neither code lumps the mass matrix during computations, as do most other simulators. To check for this effect, both codes performed simulations with and without mass lumping and demonstrated that activation times increased substantially for coarser resolutions when mass lumping was used.

### (a) Converging simulations

The simulation codes evaluated in this study have been designed for multiple machine architectures and applications. Despite these disparities, the results here show that there is an approximate convergence in the solutions (figures 2–4) at high spatial and temporal discretizations independent of numerical method, code or machine architecture, although two codes were distinctly different (codes A and H, as discussed above). However, on closer examination, we see that the time of last activation at the highest spatial and temporal resolutions still differs across the codes between 37.8 and 48.7 ms. Although the accuracy of a solution required for any given simulation is highly dependent on the problem and the question being asked, the absolute (11 ms) and relative (28.8%) difference between the extremes of the results is large in most contexts. The reason for these discrepancies is not clear, but it may compromise reproducibility of cardiac electrophysiology simulations between different codes in more complex contexts. Specifically, this apparent simulation-code-dependent variability is especially concerning for fibrillation and arrhythmia studies of intrinsically sensitive systems, where small errors can propagate to produce significantly different results.

The comparison of the results of different simulation codes at the highest spatial and temporal refinement implicitly assumes that these solutions are close to the converged answer. To test if the results were converged at a spatial scale of 0.1 mm, three groups performed highly refined simulations to evaluate a true converged result. With a 0.05 mm spatial resolution and a 0.001 ms temporal resolution, the last point of activation was 43.18, 42.76 and 42.53 ms using codes B, I and D, respectively. These codes use significantly different numerical methods (both finite element tetrahedral elements and regular grid finite difference), suggesting that the true solution lies between approximately 42.5 and 43 ms. This means that a number of simulations had errors of greater than 10 per cent at a spatial scale of 0.1 mm, which is half the often quoted value of 0.25 mm required for convergence [49].

The results all clearly show that current temporal discretizations used in previous cardiac electrophysiology studies are sufficient to provide converged solutions, if the model solves (failed simulations for Δ*x*=0.1 mm and Δ*t*=0.05 ms are discussed below). However, compared with the dependence on temporal discretization, the results are highly dependent on the spatial discretization. For all finite difference implementations and the majority of finite element implementations, increasing Δ*x* to 0.5 mm results in highly erroneous solutions. These results have an error of approximately 180–230% (assuming a converged solution of 43 ms). The benchmark demonstrates that, for the majority of cardiac electrophysiology simulation codes, the level of error introduced by Δ*x*≥0.5 mm is so large that any simulation results with this level of spatial discretization are extremely difficult to interpret as either solutions to the governing equations or physiologically meaningful results. However, it is important to note that the results from simulation codes A and H demonstrate that the error introduced by this spatial discretization does not have to be of this magnitude. From these results, we would suggest that if a study uses this level of spatial discretization, further evidence that the numerical techniques being applied are producing a converged solution are required. Comparing results from a code against the highest spatial resolution (Δ*x*=0.1 mm) benchmark results proposed here, and made available online, provides one such method for producing this evidence.

### (b) Failed simulations

Some of the finite difference models (codes D, G, I, J and K) failed to solve with Δ*x*=0.1 mm and Δ*t*=0.05 ms. This is expected based on the Courant–Friedrichs–Lewy condition [26,50], which gives a maximum time step of 0.046 ms with Δ*x*=0.1 mm or a minimum spatial resolution of 0.104 mm with Δ*t*=0.05 ms, with the benchmark model parameters. Further analysis by finite difference codes D, G and K are consistent with this analysis, with the model solving for Δ*t*=0.04 ms but not 0.05 ms in all three codes. Similarly, code G solved for Δ*x*=0.11 mm with Δ*t*=0.05 ms, but not 0.100 mm. Also, there was no increase in error in the vicinity of this critical combination at Δ*x*=0.1 mm and Δ*t*=0.04 ms or Δ*x*=0.11 mm and Δ*t*=0.05 ms, which means that if a solution is obtained it is likely to conform to the global convergence properties of the model and not be affected by additional error in the vicinity of the critical discretization combination.

### (c) Benchmark evolution

The current benchmark could be readily complemented by additional benchmarks in four ways: increasing the richness of the comparison metrics, increasing the complexity of the boundary conditions, increasing the complexity of the geometry and reporting on the computational efficiency of the codes. Currently, the spatial distribution of activation time was the only result requested from each simulation. While activation times can verify the implementation of the reaction–diffusion equation and the implementation of the sodium channel current, this single metric will invariably not characterize the implementation of the cellular calcium dynamics, potassium channels or the solution at the limit cycle. Beyond the evaluation of the cellular state variables metrics of emergent tissue-scale behaviour, we could include the frequency or filament location within induced scroll wave or fibrillation patterns, the restitution properties of the tissue and metrics of action potential morphology.

The zero-flux boundary conditions and stimulus protocol provide unambiguous boundary conditions for the model. However, a more comprehensive validation stimulation from multiple sites and/or at multiple pacing frequencies would allow the verification of a code's capacity to capture complex wave fronts, multiple wave interactions, frequency response and functional block. Additionally, solving the bidomain equations would also allow field stimulations and shocks to be verified.

As already outlined above, the choice of a simple regular geometry and fibre orientation was motivated by its ability to be unambiguously communicated and by its compatibility with both hexahedral and tetrahedral meshes. However, for both experimental and clinically relevant simulations, curved and/or highly detailed geometries are typically required. At this time, with no standard mesh language, analogous to CellML [38] or SBML [51] for cellular models, it is challenging to define efficiently and unambiguously even simple analogues of cardiac geometries, such as a truncated ellipse, in a form that can be shared and used by multiple cardiac electrophysiology simulation codes. In the future, the advent and implementation of mesh description standards such as FieldML [52] will hopefully enable the definition of benchmarks on more complicated and anatomically relevant geometries.

The current benchmark simulation does not require participants to provide any performance metrics such as solver iterations, instructions, wall time, etc. This was partially because of the desire to focus on the results and convergence properties using the proposed benchmark. However, tracking the progress and improvements in numerical methods and impact of computational power on electrophysiology simulations would be a useful resource for evaluating both current and new numerical methods and their compatibility on the increasingly heterogeneous array of available high-performance computing resources.

Our expectation is that electrophysiology and indeed other VPH simulator developers will increasingly need to provide comprehensive code verification to demonstrate that their simulations are reliable and repeatable, and conform to regulatory requirements for developing and deploying clinical applications. The current benchmark proposed here provides a starting point for this code verification process. By enabling community members independently to upload new benchmark definitions and simulation results to the AMDB database, we hope this work is a starting point for providing an increasingly rich verification process in the future.

### (d) Benchmark utility

The benchmark has four central applications. The first is that the problem definition contained sufficient information for 11 different groups to replicate the same model. The development of the first widely replicated cardiac tissue electrophysiology model provides a template for the development of mark-up languages for describing tissue-scale models. The second is the development of a consensus benchmark problem, widely endorsed by the community, to provide an agreed-upon standard for evaluating and verifying cardiac tissue electrophysiology simulators. The third use of the benchmark model is that it provides a standard problem for comparing the efficiency of new and existing numerical methods by providing a standard model with an agreed solution against which to compare results. Finally, the benchmark provides a means for researchers to debug codes. The complexity of cardiac tissue electrophysiology codes should not be underestimated and the use of known solutions provides an efficient method to identify bugs during code development. The benchmark problem will improve the efficiency and quality in the development of new codes for simulating cardiac tissue electrophysiology, improve the ability to share models between different groups and provide confidence in simulation results.

The benchmark problem here is specific to the field of modelling cardiac tissue electrophysiology, which arguably represents the most advanced multi-scale organ modelling framework. Yet, multiple equally complex multi-scale organ modelling codes exist across and beyond the VPH [2] and Physiome [3] projects that span the full range of organ systems. To our knowledge, no attempt has been made to verify any of these codes by comparing the results of standard test problems on multiple codes. This study represents the first attempt within this wider community to confirm the reproducibility of multi-scale simulations and demonstrates the community-wide benefits of verifying simulation code.

## 7. Conclusions

The development of the first benchmark for code simulating cardiac tissue electrophysiology is an important step towards improving the verification of code and the reproducibility of simulation results. This will be essential to provide confidence in the application of increasingly complex cardiac models to both physiological and clinical questions.

## Acknowledgements

This study was conceived during a workshop on the Cardiac Physiome Project held at the Newton Institute for Mathematical Sciences at the University of Cambridge in July 2009 and funded by the Newton Institute, the UK Biotechnology and Biological Sciences Research Council, and the Wellcome Trust. This work is supported in part by UK Engineering and Physical Sciences Research Council grants EP/F059361/1 (S.A.N., I.R. and N.P.S.), EP/F043929/1 (S.A.N.), EP/G007527/1 (N.P.S.) and EP/D048400/1 (P.P.), the Medical Research Council (S.L. and N.P.S.), the Richard A. and Nora Eccles Fund for Cardiovascular Research and the Nora Eccles Treadwell Foundation (F.B.S.), British Heart Foundation grant PG/08/019/24600 (A.G.), European Commission euHeart project no. 22449 (E.K., A.G. and N.P.S.) and preDiCT project no. 224381 (A.G. and M.O.B.), UK Medical Research Council (A.P.B. and O.B.), Plan Nacional de Investigación Científica, Desarrollo e Innovación Tecnológica del Ministerio de Ciencia e Innovación, Spain, TEC2008-0290 (E.H. and J.F.R.), National Science Foundation through grants no. 0824399 (E.M.C.), 0926190 (F.H.F.) and 1028261 (F.H.F. and E.M.C.), Research Council of Norway (M.M. and O.S.), Austria Science Fund by grant F3210-N18 (G.P.) and National Institute of Health grant 1RO1 HL 10119601 (G.P.).

## Footnotes

One contribution of 11 to a Theme Issue ‘Towards the virtual physiological human: mathematical and computational case studies’.

↵1 Users of code A should be aware that, in order to replicate these results, state-variable interpolation (‘SVI’) must be switched on (release 2.2 onwards).

- This journal is © 2011 The Royal Society