MoSeS (Modelling and Simulation for e-Social Science) is a research node of the National Centre for e-Social Science. MoSeS uses e-Science techniques to execute an events-driven model that simulates discrete demographic processes; this allows us to project the UK population 25 years into the future. This paper describes the architecture, simulation methodology and latest results obtained by MoSeS.
1. Introduction
MoSeS (Modelling and Simulation for e-Social Science) is a research node of the National Centre for e-Social Science. MoSeS uses e-Science techniques to execute an events-driven model that simulates discrete demographic processes; this allows us to project the UK population 25 years into the future. While this approach is grounded in the methods of microsimulation, concepts from spatial interaction modelling and agent-based systems are incorporated in an innovative way. The specific aims of the MoSeS project are as follows:
to create a flagship modelling and simulation node, in which the capabilities of grid computing are mobilized to develop tools whose power and flexibility surpass those of existing and previous research outputs,
to demonstrate the applicability of grid-enabled modelling and simulation tools within a variety of substantive research and policy environments,
to provide a generic framework through which grid-enabled modelling and simulation might be exploited within any problem domain, and
to encourage the creation of a community of social scientists and policy users with a shared interest in MoSeS problems.
There is an abundance of simulation games relating to people, cities and societies (past, present and future). The underlying aim of the research reported in this paper is to translate such games into real-world policy environments. If planners were equipped with the means (through simulation) to understand social and demographic changes in response to shifts in policy, such a device would have valuable practical applications both as a ‘decision-support system’ and as a pedagogic tool for understanding how cities work. As an academic and intellectual challenge, the ability to reproduce and predict the behaviour of real city systems constitutes the ultimate demonstration of a deep understanding of such systems. We wish to produce a simulation model of the UK population, as it now is and as it can be expected to develop over a 25-year time horizon. Such simulations can form the basis of a wide range of applications in both e-Research and public policy analysis, with potentially substantial benefits such as:
a substantial policy impact through the generation of effective predictions,
a potential ‘wind tunnel’ or ‘flight simulator’ analogy, in which planners can gauge the effects of development scenarios in a laboratory environment, and
the use of simulations as a pedagogic tool, which allows planners to refine their understanding of systemic behaviour and alternative futures, thus aiding clarity of thinking and improved decision-making.
Specifically, MoSeS aims to develop scenarios in the domains of health, transport and business. For example, one health scenario would be to provide perspectives on medical and social care within local communities for a changing and ageing population. A scenario in the transport domain might concern the sustainability of transport networks in response to demographic change and economic restructuring: for example, what kind of transport network is capable of sustaining long-term economic growth in West Yorkshire, Greater Manchester and the intervening areas (the ‘Northern Way’)? A scenario in the business domain might include the impact of diurnal population movements on retail location and profitability, or the impacts of a changing retirement age on personal wealth and living standards.
The MoSeS project stands to benefit from e-Science technologies in a number of ways; in particular, the simulation model draws on diverse data sources, deploys models that are richly specific and therefore computationally intensive, and provides outputs to a spatially distributed community of researchers and policy makers. MoSeS is building relationships with policy users in Social Services, Health Care Trusts, urban planning, consultancy and other domains in order to demonstrate the viability and potential impact of simulation modelling, enabled by e-Science. An e-Science approach such as this will enable a clear advance over existing techniques, as it will greatly ease the integration of new datasets, and quickly take advantage of new and large-scale computing resources in a dynamic fashion.
In order to exploit the full potential of e-Social Science technologies, flexible and advanced interfaces to MoSeS have been created; in addition to describing MoSeS simulations and latest results, this paper explores the capabilities offered by exposing MoSeS functionality through service orientation and Web 2.0 technologies. Service orientation is emerging as a highly useful means of developing flexible, agile and dependable software systems. A service can be defined as ‘a mechanism to enable access to a set of one or more capabilities, where the access is provided using a prescribed interface and is exercised consistent with constraints and policies as specified by the service description’, while a service-oriented architecture can be defined as ‘an application architecture within which all functions are defined as independent services with well-defined invokable interfaces, which can be called in defined sequences to form business processes’.
Section 2 discusses the service-oriented MoSeS architecture, while §3 gives an overview of how MoSeS simulations are performed. Section 4 presents the latest results obtained by MoSeS simulation runs on the city of Leeds, UK, and §5 describes the Web 2.0 technology that has been integrated into MoSeS to provide richer visualizations of simulations and their results. This paper is concluded in §6.
2. MoSeS architecture
To achieve the aims detailed in §1, the software architecture employed by the MoSeS project needs to be capable of securely storing large quantities of simulation data, dynamically retrieving diverse data from spatially distributed resources, utilizing the capabilities of high-performance computing resources and visualizing the results of simulation runs. In addition to this, a high level of flexibility is very much desired, to enable MoSeS software to be quickly adapted to use new datasets, cope with changes in the structure of existing datasets, perform simulations based on new attributes, etc. In order to address these requirements, it was decided to service-enable MoSeS functionality. Service orientation aims to facilitate the development of complex, dynamic and inter-organizational systems. It is also designed to greatly simplify the process of integrating existing legacy systems, and has a profound impact on the software development process. By implementing MoSeS functionality as a set of invokable services, a number of advantages are introduced. These can be summarized as follows.
Multiple entry points to the system. By exposing MoSeS functionality as a set of invokable services, it becomes possible to quickly create new user interfaces to MoSeS. The functionality of each service can be used by programs written in any language, and graphical user interface (GUI) clients can be created as front ends to both workflows (which can be defined as the description of the sequence of services that must be invoked to form a given scientific process) and individual services. Another important aspect of having multiple entry points into the system is that other e-Social Science applications can now integrate with MoSeS much more easily, simply by invoking the required service.
Richer user interfaces. Multiple user interfaces with more features can be created in order to improve the user experience. For example, a Java application can be written as a front end to MoSeS, using the same service functionality as a JSR-168 compliant portlet interface but adding interactive maps, since a Java GUI is not constrained by the JSR-168 standard.
Increased user flexibility. In addition to pre-written analyses, users can now construct new MoSeS workflows through the use of high-level workflow engines such as Taverna and ActiveBPEL. This enables users to develop, share and reuse routines for producing specific maps, charts and reports. Developing such workflows is a more intuitive form of programming, and is likely to be accessible to most users whether or not they have programmed before.
Improved scalability. It is possible to install a MoSeS service (such as the analysis service) on multiple machines, and then perform load balancing by dynamically binding to whichever service instance is available.
Fault tolerance. Related to the benefits of improved scalability mentioned above, a level of fault tolerance can be introduced into processing by, for example, invoking multiple MoSeS services in parallel and either cross-checking their results or else using the results of the first service to return without raising an exception. This can also lead to performance gains, as multiple services can be invoked and the first result to be returned can be fed directly into an ongoing workflow.
Ease of maintenance. In spite of the increased number of entry points into the system, maintenance of core MoSeS functionality should not be affected, as changes made to each service will be reflected in each interface due to a clear separation between the presentation and application logic layers of the system.
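The fault-tolerance pattern listed above, in which several replicas of a service are invoked in parallel and the first result returned without an exception is used, might be sketched as follows. This is a minimal illustration rather than MoSeS code: `first_successful` is a name introduced here, and the two replica functions are hypothetical local stand-ins for remote service calls.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def first_successful(replicas, payload):
    """Call every replica in parallel and return the first result that
    comes back without raising an exception."""
    errors = []
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    try:
        futures = [pool.submit(replica, payload) for replica in replicas]
        for future in as_completed(futures):
            try:
                return future.result()
            except Exception as exc:
                errors.append(exc)  # record the failure, wait for the next replica
        raise RuntimeError(f"all {len(replicas)} replicas failed: {errors!r}")
    finally:
        # Do not block on slower replicas once the outcome is known.
        pool.shutdown(wait=False)


# Hypothetical stand-ins for two deployments of the same analysis service.
def unavailable_replica(ward):
    raise ConnectionError("replica unavailable")


def healthy_replica(ward):
    return {"ward": ward, "status": "ok"}


result = first_successful([unavailable_replica, healthy_replica], "Cookridge")
print(result)
```

Cross-checking, the alternative mentioned in the text, would instead wait for every replica and compare the returned values before accepting one.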
A drawback to service-enabling MoSeS is the potentially slower performance of individual tasks (note that this is unrelated to the scalability issue), mainly when passing large volumes of data using Simple Object Access Protocol messaging (SOAP v. 1.2; World Wide Web Consortium 2007). Typically, data passed between different MoSeS services would need to be serialized into extensible markup language (XML) form, sent over HTTP, and then deserialized at the other end of the connection, resulting in noticeable delays given the volume of data that often has to be passed between services. A partial solution to this problem has been developed, whereby each MoSeS service stores data for a particular user session in storage resource broker (SRB) space, and invokes other MoSeS services by sending a reference to these data, which can then be transmitted in binary form.
The service-oriented architecture for MoSeS is shown in figure 1. As can be seen, MoSeS web services can be invoked by any number of end-user interfaces, including Java GUIs, JSR-168 portlets, workflow enactment engines, etc. Services can be distributed remotely, as all data interactions are performed either through direct access to distributed resources over HTTP, or else through accessing the MoSeS SRB cluster, which in turn stores the results of the MoSeS demographic and forecasting modules.
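To make the cost difference concrete, the sketch below compares a message that carries every value inside its XML body with one that carries only a reference to binary data held elsewhere. It is an illustration under stated assumptions: `shared_store` is a plain dictionary standing in for SRB space (it is not the SRB API), and the `srb://moses/session/` reference format and the element names are invented for the example.

```python
import uuid
import xml.etree.ElementTree as ET

# Stand-in for SRB space: maps a reference string to raw bytes (hypothetical).
shared_store = {}


def put(data: bytes) -> str:
    """Store binary data and return a short reference to it."""
    ref = f"srb://moses/session/{uuid.uuid4().hex}"
    shared_store[ref] = data
    return ref


def message_by_value(values) -> bytes:
    """Serialize every value into the XML message body."""
    root = ET.Element("records")
    for value in values:
        ET.SubElement(root, "value").text = str(value)
    return ET.tostring(root)


def message_by_reference(ref: str) -> bytes:
    """Send only a pointer; the receiver fetches the binary data itself."""
    return ET.tostring(ET.Element("records", {"href": ref}))


values = list(range(10_000))
binary = b"".join(v.to_bytes(4, "big") for v in values)

by_value = message_by_value(values)
ref = put(binary)
by_ref = message_by_reference(ref)

print(len(by_value), "bytes by value;", len(by_ref), "bytes by reference")
```

The message shrinks because its size no longer depends on the number of records; the data still have to be moved once, but in compact binary form rather than as XML text.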
Owing to the new ability to use multiple end-user interfaces within MoSeS, this architecture has also been designed to exploit a number of Web 2.0 technologies, particularly those related to mapping, in order to provide a more dynamic and rich user experience. Web 2.0 is defined by O'Reilly (2006) as ‘the business revolution in the computer industry caused by the move to the internet as platform, and an attempt to understand the rules for success on that new platform’. The use of these technologies within MoSeS is discussed in §5.
3. MoSeS simulations
In order to illustrate the event-driven simulation model used by MoSeS, the urban area of Leeds, a city of 730 000 people in the north of England, has been used. Leeds serves for illustrative purposes only; the approach is completely generalizable to other local areas across the country.
The base year simulation is established by reweighting the Household Sample of Anonymised Records (HSAR) to individual Census wards in Leeds Metropolitan District. HSAR comprises a 1 per cent sample of households from the UK Census of 2001 in which the Census questionnaires are completely enumerated for households and their constituent individuals. The essential mechanism for protection of the anonymity of individuals is through reduction in the geographical resolution of data: each household can be identified only to the level of a standard region (i.e. Southwest, Yorkshire and Humberside, etc.), which limits the value of the source data for the purpose of spatial microsimulation.
Our approach is then to synthesize records from the HSAR in accordance with the structure of individual wards. Each ward population is therefore a unique extract or ‘resampling’ from the HSAR in accordance with the local geography. Households are sampled ‘without replacement’ from the parent distribution and are therefore unique within an area, although it is not only possible but necessary that some households will be duplicated in the Leeds area, as Leeds accounts for more than 1 per cent of the UK population, and so there are more households in the city than in the household sample.
The reweighting procedure is based on successive proportional sampling from the HSAR. Initially, the desirability of selecting a candidate from the HSAR is random. From this random selection, the composition of the sample is compared with key population distributions from the 2001 Census Small Area Statistics (SAS). The weights are adjusted to increase the likelihood of selection for population subgroups that are under-represented, and vice versa. This adjustment process continues until the observed and predicted population distributions are similar within a specified tolerance. The simulation model is anchored in a base year population for the year 2001. This is necessary in order to exploit the richness of the UK Census of population and households for the purpose of the simulation; there are no appropriate data sources to facilitate the preparation of a more up-to-date base population.
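The reweighting loop described above can be sketched as follows. This is a simplified illustration, not the MoSeS implementation: it constrains on a single hypothetical attribute (`group`) rather than several Census tables, the tolerance and round limit are arbitrary, and the weighted draw without replacement uses Efraimidis-Spirakis random keys, a technique chosen here for brevity rather than one named in the paper.

```python
import random
from collections import Counter


def draw_without_replacement(pool, weights, n, rng):
    """Weighted sampling without replacement using Efraimidis-Spirakis keys."""
    keyed = [
        (rng.random() ** (1.0 / weights[rec["group"]]), i)
        for i, rec in enumerate(pool)
    ]
    keyed.sort(reverse=True)
    return [pool[i] for _, i in keyed[:n]]


def reweight(pool, target, tolerance=2, max_rounds=200, seed=0):
    """Resample until group counts match the target within the tolerance.

    Returns the best sample found and its largest absolute count error.
    """
    rng = random.Random(seed)
    n = sum(target.values())
    weights = {g: 1.0 for g in target}  # equal weights: the first draw is purely random
    best, best_err = None, float("inf")
    for _ in range(max_rounds):
        sample = draw_without_replacement(pool, weights, n, rng)
        observed = Counter(rec["group"] for rec in sample)
        err = max(abs(observed[g] - target[g]) for g in target)
        if err < best_err:
            best, best_err = sample, err
        if err <= tolerance:
            break
        for g in target:
            # Raise weights of under-represented groups, lower over-represented ones.
            ratio = target[g] / max(observed[g], 1)
            weights[g] = max(weights[g] * ratio, 1e-9)
    return best, best_err


# Hypothetical sample pool and ward-level target counts.
groups = ("young", "working", "elderly")
pool = [{"id": i, "group": groups[i % 3]} for i in range(1000)]
target = {"young": 20, "working": 50, "elderly": 30}

sample, err = reweight(pool, target)
print(len(sample), "records selected; largest count error:", err)
```

Because each draw is random, the loop keeps the closest sample seen so far and stops as soon as one falls within the tolerance.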
The objective of dynamic modelling is simply to project the population forwards in time. For such purposes, the most commonly used approaches have been cohort-based ‘macrosimulation’, in which the population is divided into categories, and multipliers—such as mortality rates or birth rates—are applied to those individual categories. However, these methods are problematic whenever a relatively rich set of population attributes is involved, as the number of categories begins to grow exponentially. Van Imhoff & Post (1998) presented an example in which the population of France is represented more efficiently through an individual microsimulation model even though age, marital status and place of residence are the only variables.
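The growth in categories is easy to quantify: a cohort model needs one cell for every combination of attribute values, so the cell count is the product of the category counts. The attribute sizes below are hypothetical, chosen only to illustrate the arithmetic; the population figure is the one quoted for Leeds.

```python
from itertools import accumulate
from operator import mul

# Hypothetical numbers of categories per attribute (illustrative only).
categories = {
    "age": 100,
    "gender": 2,
    "marital_status": 4,
    "health": 3,
    "ward": 33,
    "ethnic_group": 10,
}

# Number of cells after adding each attribute in turn.
running_cells = list(accumulate(categories.values(), mul))
cells = running_cells[-1]
population = 730_000  # Leeds, as quoted in the text

print(running_cells)       # [100, 200, 800, 2400, 79200, 792000]
print(cells > population)  # True
```

With only six attributes there are already more cells than people, so most cells would be empty, whereas a list of individuals grows by just one value per person for each additional attribute.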
A number of projects have therefore attempted to build dynamic microsimulation models, not only for economic applications (e.g. CORSIM, SVERIGE) but also for social and anthropological applications to problems of kinship and community (Van Imhoff & Post 1998; Murphy 2004). However, the only examples of demographic projection with spatial microsimulation use the technique of ‘static ageing’, in which a base population (the HSAR in our previous example) is resampled in the context of independent estimates of future population change (e.g. Ballas et al. 2004). This method therefore provides no means to monitor the dynamics of change within a population, and ignores the benefits of dynamic microsimulation as a means for the projection itself. Some authors have seen fit to make a distinction between policy and pedagogic applications of microsimulation (Van Imhoff & Post 1998). In this context, it is argued that the accuracy of models with a policy orientation needs to be validated in a real-world context, whereas pedagogic models need only to reproduce local interactions within the population. We remain sceptical of such a distinction, and view the process of validation as essential if robust conclusions are to be drawn that are independent of the model as artifice. The MoSeS dynamic model consists of the following seven components.
Mortality. The mortality component of the dynamic model predicts the expected number of deaths within each UK Census ward, using Census data and Office for National Statistics (ONS) Vital Statistics. At each time period, a survival probability is applied to each individual on the basis of age, gender and location. The model is run in annual time increments, and therefore the ageing rule for all survivors is that they become a single year older in each time step.
Fertility. Ward-specific fertility rates are derived in much the same way as the mortality rates, with national rates again being localized in accordance with ONS Vital Statistics.
Health status. Individual health states are recorded in the HSAR and SAS within five categories ranging from very poor to very good, but for both convenience and robust estimation these are reduced to three categories of ‘poor’, ‘medium’ and ‘good’ within the simulation. For each individual, the probability of a change in health status is assumed to be dependent on current health status, age and gender. These rates of change are derived from the British Household Panel Survey (BHPS).
Household formation. Changes in household composition are determined by four processes in the model. These are the formation of new unions (including marriage), the dissolution of existing unions, movements in which one or more persons leave a household and the death of a household member.
Migration. In order to accommodate migration, the model has two important features: a stock of houses that is independent of the households that occupy them, and a location search process that is mediated through an aggregate spatial interaction model of migration. This model recognizes that different households, according to their size, composition and age, have different housing preferences and search horizons.
Module sequence. In this application, we have adopted a fixed order of events in the sequence fertility, health change, mortality, migration and household formation. Both fertility and health changes appear before mortality, since there are still risks of infant mortality, and although death rates in our model are independent of health status, there is a logical connection between mortality and deteriorating health. Since many new households are formed in association with the migration process, it makes sense to consolidate household structures once a move has taken place, while recognizing that a more desirable option would be to consider these two processes simultaneously.
Student migration. In wards where student migration has a large impact, the microsimulation alone failed to reproduce the renewal of the student population: simulated students grew old in place, as other residents do. In reality, students tend to stay in such areas only for the period of their study and then leave, while new students move in. Because the student population is replenished each year, the population in such wards stays younger than that in other wards. A hybrid approach that incorporates agent-based modelling techniques is therefore adopted to capture these local migration patterns and the behaviour of student migrants.
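The annual cycle formed by these components can be sketched as a fixed sequence of functions applied to a list of individuals. This is a minimal sketch under several assumptions, not the MoSeS code: the record fields, the structure of `rates` and every rate value are hypothetical; health transitions here depend on current status only (the paper also conditions on age and gender); newborns are assumed to start in good health; and migration and household formation are left as pass-through placeholders.

```python
import random

HEALTH_STATES = ("poor", "medium", "good")


def fertility(people, rates, rng):
    """Add a newborn for each woman who gives birth this year."""
    births = [
        {"age": 0, "sex": rng.choice("FM"), "health": "good", "ward": p["ward"]}
        for p in people
        if p["sex"] == "F" and rng.random() < rates["birth"].get(p["age"], 0.0)
    ]
    return people + births


def health_change(people, rates, rng):
    """Move each person between the three health states."""
    return [
        {**p, "health": rng.choices(HEALTH_STATES, weights=rates["health"][p["health"]])[0]}
        for p in people
    ]


def mortality(people, rates, rng):
    """Apply a survival probability by age, sex and ward; survivors age one year."""
    return [
        {**p, "age": p["age"] + 1}
        for p in people
        if rng.random() < rates["survival"](p["age"], p["sex"], p["ward"])
    ]


def migration(people, rates, rng):
    """Placeholder: the spatial interaction model is not sketched here."""
    return people


def household_formation(people, rates, rng):
    """Placeholder: union formation and dissolution are not sketched here."""
    return people


# Fixed order of events, as given under 'Module sequence'.
ANNUAL_SEQUENCE = (fertility, health_change, mortality, migration, household_formation)


def simulate(people, rates, years, seed=0):
    """Run the annual sequence for the given number of years."""
    rng = random.Random(seed)
    for _ in range(years):
        for module in ANNUAL_SEQUENCE:
            people = module(people, rates, rng)
    return people


# Illustrative rates only; none of these values comes from the paper.
demo_rates = {
    "birth": {age: 0.08 for age in range(20, 40)},
    "health": {"poor": (0.8, 0.15, 0.05), "medium": (0.1, 0.8, 0.1), "good": (0.02, 0.08, 0.9)},
    "survival": lambda age, sex, ward: 0.999 if age < 65 else 0.95,
}
start = [{"age": a, "sex": "F" if a % 2 else "M", "health": "good", "ward": "Cookridge"} for a in range(100)]
print(len(simulate(start, demo_rates, years=25)), "people after 25 simulated years")
```

Running fertility before mortality means that infants born in a given year are exposed to that year's mortality risk, which matches the ordering argument in the text.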
4. Latest simulation results
Some key results from the current implementation of the MoSeS dynamic spatial microsimulation model are described in this section. Results are presented over a 25-year projection horizon from 2004 to 2029. In relation to our previous discussion about assumptions, survival rates are expected to improve by 1 per cent in each age–gender–location combination in each year of the simulation. Fertility rates are expected to increase by five percentage points for each demographic subgroup in each year to 2011 before stabilizing.
Firstly, we note some basic demographic trends. Regarding the age composition of the population, we find a surge of 20 per cent or more in each of the elderly cohorts 65–74, 75–84 and above 85. The school-age cohorts all end up reasonably close to where they started, albeit by rather different trajectories. Current fertility levels in Leeds, as elsewhere, are close to their historical low, but beginning to rebound. An assumption of increased fertility levels is offset by lower numbers moving into the major child-bearing age groups. These results are very much in line with the ONS (2007) expectations for the Leeds area.
Ethnic group projections show substantial growth in all minority groups. These are the product of two effects—ongoing net migration of minorities into Leeds and a demographic bulge in the younger, more fecund age groups. Note that fertility rates are uniform between ethnic groups with the same age and marital status profiles. These projections are again in line with estimates produced on behalf of the local development agency (Rees et al. 2007; figure 2).
Spatial disaggregation in the growth of elderly populations is illustrated in figure 3. Here the notable trend is an accumulation in areas where the older age groups are most heavily concentrated already, notably the Wharfe Valley across the northern edge of the city, Cookridge and Halton wards. However, there is significant growth in all areas through ageing in situ: the migration process does not simply relocate all the pensionable age groups into a small number of retirement areas. More detailed effects can now be explored through secondary modelling. In the next illustration, we have used a data linkage model (C. Zuo 2007, unpublished thesis) in order to match records from the BHPS with individual HSAR records, and from this we have extracted disability rates. The number of individuals with a disability is then summed to Census wards and plotted in figure 4. Here we can see some relationship between disability and old age in the wards of North and Cookridge, but the dominant pattern is one of centralization. The poorer socio-economic groups, often associated with low economic activity rates and high unemployment, in the central areas clearly experience much higher like-for-like disability rates than their more affluent suburban counterparts (table 1).
Rates of disability are projected into the future in two ways. First, the projected individuals from the dynamic microsimulation are again matched with individuals from the BHPS 2004 to provide an estimate of disability with current health patterns. In this scenario, the proportion of disabled people in Leeds rises from 9.1 to 14.1 per cent in the 25-year period. In the second estimate, we assume that, for populations over 40, individual health improves steadily so that, in 25 years' time, everyone is ‘five health years’ younger than at present: for example, a typical person aged 65 in 2031 has the average health characteristics of a 60-year-old in 2006.
Given current concerns over conditions such as obesity and diabetes, such an assumption may well be optimistic, but even in this scenario the disabled population of Leeds grows from 51 600 to 70 400, a 36 per cent increase. On this metric it is clear that Social Services in Leeds will be under acute pressure over the next 25 years, and that the spatial consequences of increasing need will be uneven.
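The quoted growth figure can be checked directly from the two population counts given above; the variable names are introduced here for the calculation only.

```python
baseline = 51_600   # disabled population at the start of the projection
projected = 70_400  # disabled population under the optimistic health scenario

increase = projected - baseline
growth = increase / baseline

print(increase)          # 18800
print(f"{growth:.1%}")   # 36.4%
```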
5. Web 2.0 visualizations
MoSeS has been given the capability to render simulation results using Web 2.0 technology, allowing both two- and three-dimensional dynamic maps of results. This is performed primarily through the use of GeoServer technology (http://geoserver.org/display/GEOS/Welcome). GeoServer is an open source, freely available implementation of the Open Geospatial Consortium Web Feature Service (http://www.opengeospatial.org/standards/wfs) and Web Map Service (http://www.opengeospatial.org/standards/wms) specifications, and allows map data to be uploaded and served out in a variety of formats.
The MoSeS Google Maps and OpenLayers interfaces are shown in figure 5, with the Google Maps interface in figure 5a and the OpenLayers interface in figure 5b. In addition to the OpenLayers interface, another highly useful format served out by GeoServer is keyhole markup language (KML), an XML-based language schema for expressing geographical annotation and visualization on two-dimensional maps and three-dimensional Earth browsers. KML is the native file format of the Google Earth application, and when the KML of a MoSeS map is combined with a generated styled layer descriptor, impressive three-dimensional visualizations of MoSeS data are possible, as shown in figure 5. Once Google Earth is loaded, additional information, such as place names, roads, etc., can be layered on top of this visualization by the end-user, using the Google Earth application interface (figure 6).
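A client asks a Web Map Service for a rendered map with a GetMap request, which is an ordinary URL carrying a standard set of query parameters. The sketch below builds such a URL for WMS version 1.1.1. The parameter names follow the WMS specification, but the server address, the layer name `moses:leeds_wards` and the bounding box (a rough extent for Leeds) are hypothetical values used only for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def getmap_url(base_url, layer, bbox, fmt="image/png", width=512, height=512, style=""):
    """Build a WMS 1.1.1 GetMap URL; bbox is (min_lon, min_lat, max_lon, max_lat)."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": style,  # empty string requests the layer's default style
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, layer and extent.
url = getmap_url(
    "http://localhost:8080/geoserver/wms",
    "moses:leeds_wards",
    (-1.80, 53.70, -1.29, 53.95),
)
query = parse_qs(urlsplit(url).query)
print(url)
```

Changing only the `FORMAT` value selects a different output format for the same layer. `urlencode` percent-encodes reserved characters, which matters for MIME types that contain a `+` sign.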
6. Conclusions
This paper has described the architecture, simulation methodology and latest results obtained by the MoSeS project at the University of Leeds, which uses e-Science technologies to improve existing social science tools by greatly easing the integration of new datasets and by quickly taking advantage of new and large-scale computing resources in a dynamic fashion. The functionality underpinning MoSeS is service enabled, giving a further advantage over existing tools by allowing for multiple entry points to the system, richer user interfaces, increased user flexibility, improved scalability, fault tolerance and easier system maintenance. A synthetic representation of the entire Leeds population has been generated from publicly available datasets. Using an events-driven model that simulates discrete demographic processes, the population has been projected 25 years into the future. While the approach is grounded in the methods of microsimulation, concepts from spatial interaction modelling and agent-based systems are incorporated in an innovative way. Although appropriate simplifying assumptions have been introduced, the model still incorporates a great many parameters and assumptions. Web 2.0 technologies have been employed to allow for a more dynamic and rich user experience when visualizing MoSeS results; this has been achieved through the use of third-party server software such as GeoServer, in addition to a bespoke MoSeS service for the generation of mapping styles, and allows MoSeS data to be visualized in Google Maps, Google Earth and with OpenLayers.
MoSeS is funded by ESRC grant RES-149-25-0034.
One contribution of 16 to a Theme Issue ‘Crossing boundaries: computational science, e-Science and global e-Infrastructure II. Selected papers from the UK e-Science All Hands Meeting 2008’.
© 2009 The Royal Society