The application of e-Science technologies to disciplines in the arts and humanities raises major questions as to how those technologies can be most usefully exploited, what tools and infrastructures are needed for that exploitation, and what new research approaches can be generated. This paper reviews a number of activities in the UK and Europe in the last 5 years which have sought to address these questions through processes of experimentation and targeted infrastructure development. In the UK, the AHeSSC (Arts and Humanities e-Science Support Centre) has played a coordinating role for seven projects funded by the Arts and Humanities e-Science Initiative. In Europe, DARIAH (Digital Research Infrastructure for the Arts and Humanities) has sought to develop a deeper understanding of research information and communication in the arts and humanities, and to inform the development of e-infrastructures accordingly. Both sets of activity have indicated a common requirement: to construct a framework which consistently describes the methods and functions of scholarly activity which underlie digital arts and humanities research, and the relationships between them. Such a ‘methodological commons’ has been formulated in the field of the digital humanities. This paper describes the application of this approach to arts and humanities e-Science, with reference to the early work of DARIAH and AHeSSC.
This paper outlines early results of the work of two projects at the Centre for e-Research. Firstly, the Arts and Humanities e-Science Support Centre (AHeSSC) is investigating the implications of the new information and communication technologies associated with e-Science for research practices and knowledge generation in the arts and humanities. Specifically AHeSSC has observed, and collaborated with, the seven e-Science research projects funded by the AHRC-JISC-EPSRC Arts and Humanities e-Science Initiative. Secondly, the European FP7-funded Research Infrastructures project DARIAH (Digital Research Infrastructure for the Arts and Humanities) is seeking to understand scholarly information and knowledge practices to inform the development of a research infrastructure that is rooted in, and supports, arts and humanities research practices across Europe. The paper describes how the authors are employing two key concepts from the digital humanities—methodological commons and scholarly primitives—to abstract out and model the practices and activities of arts and humanities scholars engaging with e-Science, and to classify these practices and activities such that they can guide, inform and improve the development of new tools and infrastructures, and improve those already in existence.
The UK e-Science Programme and the European Research Infrastructures Programme were both funded to support and enhance scholarly work. Launched in 2001, the UK e-Science Programme was designed to leverage the growing capacity of computing, storage, communication and software systems to enhance, automate and support change in the way science was conducted, and, as the programme developed, to assist scientists in curating and analysing the ‘data deluge’ (Hey & Trefethen 2003). The overall objective of the FP7 Research Infrastructures Capacities Programme is to optimize the use and development of the best research infrastructures existing in Europe, and to help to create new research infrastructures of pan-European interest in all fields of science and technology. The Commission funds research infrastructures to help to create a new research environment in which all researchers—whether working in the context of their home institutions or in national or multinational scientific initiatives—have shared access to unique or distributed scientific facilities including data, instruments, computing and communications, regardless of their type and location in the world (see http://cordis.europa.eu/fp7/capacities/research-infrastructures_en.html; last accessed 25 April 2010).
The disciplines of the arts and humanities did not participate in the initial £250 million UK e-Science Programme, only becoming involved in 2004/2005 through a Scoping Survey (Anderson 2007), funded through the Arts and Humanities Research Council (AHRC) ICT in Arts and Humanities Research Programme. The ICT Programme was funded with a broader remit with the intention of building national capacity in the use of ICT across the arts and humanities, and advising the AHRC on matters of ICT strategy. In 2007 the Engineering and Physical Sciences Research Council (EPSRC) and the Joint Information Systems Committee (JISC) contributed additional funding to the programme to fund seven arts and humanities e-Science projects across a diverse range of disciplines, including musicology, archaeology, museum studies and the practice-led arts. In addition to the seven projects, AHeSSC was established at the Centre for e-Research (CeRch), at King’s College London, to coordinate, understand and promote e-Science in the arts and humanities.
In September 2008 the European Commission FP7 Research Infrastructures Programme funded a preparatory project called DARIAH (Digital Research Infrastructure for the Arts and Humanities). The technical and strategic work packages are led by the CeRch. The aim of DARIAH is to enhance and support digitally enabled research across the humanities and arts, to enable researchers to ask new research questions and for old questions to be explored in new ways, and to access, link and use the rapidly increasing volume of digitized source materials provided by libraries, archives, museums and research institutes. The DARIAH infrastructure includes systems, tools and technologies, the sharing of knowledge and expertise and education in methods and the use of digital data, tools and infrastructure.
These initiatives provide an unparalleled opportunity to investigate what e-Science might mean for the arts and humanities in terms of scope, impact and potential; to identify and understand the challenges and problems that emerge as researchers engage (or attempt to engage) with new tools, methods and practices; and to reflect on and analyse how these understandings can inform the development and production of e-research infrastructures.
3. Context and approach
e-Science is most frequently characterized as data- and compute power-intensive, interdisciplinary and highly collaborative in nature. The National e-Science Centre defines e-Science as ‘…the large scale science that will increasingly be carried out through distributed global collaborations enabled by the Internet. Typically, a feature of such collaborative scientific enterprises is that they will require access to very large data collections, very large scale computing resources and high performance visualization back to the individual user scientists’ (http://www.nesc.ac.uk/nesc/define.html; last accessed 25 April 2010).
By contrast, research work in the humanities is commonly characterized by the four Rs: reading, writing, reflection and rustication (Unsworth 2000, 2006), far removed from the e-Science paradigm outlined above. e-Science tends to assume certain research practices and requirements, many of them framed around grids and high-performance computing (HPC), and the processing of large volumes of data, not least the ‘data deluge’ described by Hey & Trefethen (2003). Rather than a data deluge, the humanities and arts face a ‘complexity deluge’, dealing with a multiplicity of types of information, much of it highly dispersed, difficult to find and complex to use; and instead of a focus on grids and HPC they need tools and infrastructures that can take account of the essentially hermeneutic and practice-led nature of their research practice.
The starting point for our analysis of arts and humanities e-research and the associated requirements for research infrastructures is that e-research work takes place within a framework of mutual shaping (Williams & Edge 1996). Mutual shaping suggests that technological innovation arises from the interaction between the technological, social, organizational, economic and epistemological processes with each as important as the other. Mutual shaping argues that, taken alone, both technological determinism and socially constructivist views are inadequate. Technological determinism implies that technology development is top down, developed outside the knowledge-making context in which it will be used, and the role of the information science or technology researcher should be in assessing its impact. The contrasting view that technologies are entirely socially constructed can be seen in much of the work to capture ‘user requirements’ where technology development is entirely framed around current practices and perceived needs without reference to how the technologies might either constrain or enable new forms of working, and how they themselves might be adapted. Recognizing the process of mutual shaping of research practices and the technologies that support them is fundamental to the success and uptake of ICT and e-Science (see http://engage.ac.uk/e-uptake; last accessed 26 April 2010).
By investigating the interplay between humanities and arts research practices, ICT systems and tools, and disciplinary methods and approaches, we are able to identify those points of tension and transformation in both arts and humanities disciplines and in ICT, allowing both to move forward in mutually compatible ways. In this model, ICT and e-research are not something that happens to the arts and humanities, but rather something that happens as a mediated and mutual shaping process in collaboration with the arts and humanities. Our work seeks to observe and map this interplay so that we can start to understand the mutual shaping and interventions that are occurring in the practice of knowledge creation as researchers apply e-research methods and technologies to address their research questions (Beaulieu & Wouters 2009).
4. Methodological commons
Researchers within the digital humanities have over recent years sought to conceptualize and theorize their work so as to understand the complex relations between disciplinary practices, the digital materials which provide the sources for exploration, interpretation and analysis, and the methods and technologies that might be applied to answer research questions, and in the process to identify new research questions. McCarty & Short (2002) have developed an intellectual map to visualize these complex interactions around the concept of ‘methodological commons’ (figure 1). Identifying and understanding the ‘commons’ has proved a useful tool to explore methodological, epistemological and normative divides between disciplines with a view to bridging those divides to better enable interdisciplinary work and to develop research infrastructures that support that work.
McCarty & Short’s (2002) map identifies three areas of mutuality. At the top are the disciplinary clusters denoting the range of research in the arts and humanities. Double-headed arrows to and from the methodological commons in the centre represent the connections to the types of content, tools and methods most relevant to each.
At the centre of the map is the methodological commons representing the different types of content most prevalent in the humanities (text, numeric and alpha-numeric, images, sound, etc.), the analytical tools and structures used to interpret and analyse that content, and the formal methods applied to interpret and analyse content. This representation is fluid, shaped by interdisciplinary engagement with the ‘clouds’. Within the commons is also an unarticulated reference to the new forms of collaboration that this kind of interdisciplinarity requires.
Below are ‘clouds’ representing epistemological approaches and broad areas of disciplinary knowledge from both within and outside the arts and humanities with which scholars must engage in order to understand the arts and humanities e-research theoretically. McCarty & Short (2002) have used clouds to denote their nature as bodies of thought and the provisional understanding of their role in e-research. In this model mutual shaping takes place between the arts and humanities disciplines, the digital and non-digital source materials upon which they base their interpretations and analyses, the methods and tools applied to those materials, and the epistemological practices that are brought to bear to understand the interplay between the different elements of the commons. This is the space where interventions occur in the practices of knowledge creation that have the potential to provide new insights, and to lead to new e-research practices.
The methodological commons as it stands is an abstract model that offers a conceptual overview of the mutual shaping that takes place in arts and humanities e-research work. We are using this model to start to identify the mutual shaping and interventions that are taking place and to visualize the relationships and networks involved in arts and humanities e-research work. Agent-based modelling (ABM) is an example of a method which illustrates not only commonality between different areas of the humanities, but also significant overlap with the e-Social Sciences. ABM is a method of simulating the behaviour of populations, where individual members of any group of any size, whether real or hypothetical, can be represented as a single ‘agent’ with particular attributes (Bonabeau 2002). Multiple simulations can then be run which illustrate how the agents could act under a given set of parameters. The Medieval Warfare on the Grid: the Case of Manzikert project, part of the second phase of the Arts and Humanities e-Science Initiative, used ABM to explore a particular research question concerned with the battle of Manzikert in AD 1071 (see http://www.ahessc.ac.uk/manzikert; last accessed 24 April 2010). The Battle of Manzikert was a key turning point in the decline of the Byzantine empire, when the Byzantine army was defeated by the Seljuk Turkish forces. Despite the importance of this event, historians of the period are not clear as to how the greatly outnumbered Turks were able to effect such a decisive victory. The agent-based model allows the behaviours of individual agents within the whole population (i.e. the Byzantine army) to be modelled, taking account of the ability of each agent to influence the course of events (e.g. the agent representing the Emperor can influence events at a different level from that representing a private soldier or a mercenary).
Specifically, this project is using agent modelling to reconstruct the route the Byzantine army took to reach the site of the battle, allowing historians to better understand the environmental, social, economic and military factors that influenced the battle’s outcome.
Like all ABM applications, therefore, this exercise is about generating hypothetical scenarios of decision-making across complex hierarchies, and over varying environmental conditions. As noted, however, this is a method that was developed for social science applications. e-Social Science applications of ABM are typically concerned with contemporary questions such as transport, crime and population movement. These are all fields which are extremely data rich; there is, therefore, a very high resolution in terms of the number of agents it is possible to generate, and predictive agent-based modelling is possible across small time periods. Also, when modelling contemporary social scenarios, it is possible to validate the model as real-time data are generated. In an historical application such as the Manzikert example, however, the data have a lower resolution, and the model cannot be validated against real-time data.
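The general shape of such a simulation can be illustrated with a minimal sketch. It is not the Manzikert project's model: the agent attributes, the influence weights, the marching paces and the route length below are all hypothetical values chosen only to show how agents with unequal influence can be run repeatedly under a fixed set of parameters.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    name: str
    influence: float          # hypothetical weight of this agent's preference
    preferred_pace_km: float  # hypothetical daily marching distance it favours


def simulate_march(agents, route_km, seed=0, max_days=1000):
    """Return the number of days one simulated column needs to cover route_km."""
    rng = random.Random(seed)
    total_influence = sum(a.influence for a in agents)
    covered, days = 0.0, 0
    while covered < route_km and days < max_days:
        # Daily pace: influence-weighted mean of each agent's noisy preference.
        pace = sum(
            a.influence * a.preferred_pace_km * rng.uniform(0.8, 1.2)
            for a in agents
        ) / total_influence
        covered += pace
        days += 1
    return days


def run_scenarios(agents, route_km, runs=100):
    """Run many simulations, one per random seed, and collect the outcomes."""
    return [simulate_march(agents, route_km, seed=s) for s in range(runs)]


# Illustrative population: one high-influence agent and many low-influence ones.
army = [Agent("emperor", 50.0, 20.0)] + [
    Agent(f"soldier-{i}", 1.0, 15.0) for i in range(100)
]
outcomes = run_scenarios(army, route_km=500.0)
```

The spread of values in `outcomes` is the kind of hypothetical scenario range discussed above; with historical data it can be explored but not checked against observation.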
5. Scholarly primitives
In 2000, Unsworth introduced the concept of scholarly primitives to describe a list of (recursive) functions that could provide the foundation for tool building in the arts and humanities. Unsworth used the term primitives ‘in a self-consciously analogical way, to refer to some basic functions common to scholarly activity across disciplines, over time, and independent of theoretical orientation. These “self-understood” functions form the basis for higher-level scholarly primitives, arguments, statements, interpretations—in terms of our original, mathematical/philosophical analogy, axioms’ (Unsworth 2000). Unsworth’s provisional list, compiled from his observations and practice of working in the digital humanities, included: discovering, annotating, comparing, referring, sampling, illustrating, representing.
In a report for the Online Computer Library Center, Palmer has adapted Unsworth’s work on primitives to derive scholarly information activities that can offer points of comparison across disciplines, identifying a set of core information activities: searching, collecting, reading, writing and collaborating, and associating two or more primitives for each of these activities (Palmer et al. 2009). Palmer recognized that her work is related to that of Unsworth but suggests that it ‘emphasizes the explicit role of information in the conduct of research and production of scholarship’. Palmer introduces a level of complexity to the concept of primitives, suggesting that they should be taken as the fundamental building blocks which form part of a larger process; for example, the scholarly primitives assembling, co-authoring, disseminating form part of a larger scholarly activity, writing (Palmer & Cragin 2008).
Palmer’s approach is attractive in that, rather than Unsworth’s focus on building tools to support discrete practices embodied by the primitives, it allows us to see scholarly primitives as part of a wider set of activities that could be translated into a set of functions for building a coherent research infrastructure that supports a chain of related activities. For example, we can start to visualize how the scholarly activity of searching, which includes at a lower level of granularity chaining and browsing, and the scholarly activity of collecting, which includes gathering and organizing, could combine to form a linked data infrastructure that allowed researchers to create their own dynamic representations of knowledge from the data deluge that is the Web. Adding Unsworth’s annotating increases the potential still further.
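The relationship between activities and primitives described above can be written down as a simple data structure. The sketch below is illustrative only: it records just the activity-to-primitive associations named in this section (reading and collaborating are omitted because their primitives are not listed here), and the function names are hypothetical rather than part of any existing infrastructure.

```python
# Activities and the primitives this section associates with them.
ACTIVITIES = {
    "searching": ["chaining", "browsing"],
    "collecting": ["gathering", "organizing"],
    "writing": ["assembling", "co-authoring", "disseminating"],
}


def primitives_for(activities, extra=()):
    """Flatten a chain of activities into the ordered primitives it requires.

    `extra` lets stand-alone primitives, such as Unsworth's annotating,
    be appended to the chain.
    """
    chain = []
    for activity in activities:
        for primitive in ACTIVITIES[activity]:
            if primitive not in chain:
                chain.append(primitive)
    for primitive in extra:
        if primitive not in chain:
            chain.append(primitive)
    return chain


def activities_using(primitive):
    """Return the activities that a given primitive contributes to."""
    return [name for name, prims in ACTIVITIES.items() if primitive in prims]


# The searching-plus-collecting chain discussed above, with annotating added.
linked_chain = primitives_for(["searching", "collecting"], extra=["annotating"])
```

Treating each primitive in `linked_chain` as a function that an infrastructure must support is one way to turn observed activities into a functional specification.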
It is our contention that by employing two key concepts arising from the theoretical work that underpins e-research in the arts and humanities—methodological commons and scholarly primitives—we can gain richer insights and understand the mutual shaping and interventions that are occurring between research practices and technologies. By close observation of e-research projects we are able to identify a range of scholarly activities and the primitives associated with those activities and to use these to help frame functional specifications for tools and infrastructure building. The following two sections present examples which exemplify the early results of our research using the concepts of the methodological commons and scholarly primitives to map research work in arts and humanities e-Science, and to assist in scoping the research infrastructure for the European project DARIAH.
6. Example: motion capture and archaeology
It is significant that much previous analysis of ‘research methods’ has focused on qualitative and (or versus) quantitative approaches in humanistic disciplines and social sciences such as anthropology (Bernard 2006), health research (Pope & Mays 1995) or sociology (Bulmer 1999). It is useful to maintain this conception of a method as a discrete and definable means of engaging with and/or producing research content; in other words, as articulating the ‘bigger picture’ formed by the ‘primitives’ described above. It is also clear that considering research methods in a critical and recursive manner is of value in understanding and mapping research processes within domains, and in identifying areas of methodological cross-over.
One example of such a cross-over concerns motion capture. Although this can mean various things, for the purposes of the present discussion it exemplifies what is meant by a ‘method’. Currently, its primary research application is in the performance and practice-led domains, as a means of capturing visually (and quantitatively) the movements of performers by means of a sensor or sensors attached to their clothing or person. Motion capture was employed in the early phase of the Arts and Humanities e-Science Initiative by the Associated Motion User Categories (AMUC) project (see http://www.ncl.ac.uk/culturelab/people/publication/50135; last accessed 21 April 2010). The AMUC project sought to provide researchers with a more effective way of storing, managing, retrieving and viewing the three-dimensional traces rendered from motion capture suits on dancers. The main feature of the interface was an onscreen drawing pad where the user could define the shape of the type of ‘pathway’—or line, rendered in three dimensions, documenting where and how the sensor attached to the performer has moved—they are interested in, and generate a list of ranked results from a database of video clips of performers. In many ways, this kind of problem is familiar to the library and information science communities: the traces themselves are complex three-dimensional artefacts, and organizing them into a coherent digital library thus presents its own challenges. On the other hand, the e-Dance project, which was funded in the second phase of the initiative, sought to explore the potential of motion capture as a tool for planning and recording choreographic pieces (Bailey et al. 2009). The project sought to integrate motion capture, motion tracking and stereoscopic video technologies in order to create choreographic morphologies, ‘virtual sculptures’ that capture three-dimensional pathways which represent the movements of the part(s) of a dancer’s body as he or she executes a performance. 
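A query of the kind described for the AMUC interface, in which a drawn pathway is matched against stored traces and the results are ranked, can be sketched as follows. This is not the AMUC project's algorithm, which is not described here; the sketch assumes that the query and the stored traces are available as lists of three-dimensional points, and it uses a simple resample-and-compare distance chosen for brevity.

```python
import math


def resample(path, n=16):
    """Resample a 3D polyline to n points evenly spaced along its length."""
    cumulative = [0.0]
    for a, b in zip(path, path[1:]):
        cumulative.append(cumulative[-1] + math.dist(a, b))
    total = cumulative[-1]
    if total == 0:
        return [tuple(path[0])] * n
    points, seg = [], 0
    for i in range(n):
        target = total * i / (n - 1)
        # Advance to the segment that contains the target arc length.
        while seg < len(path) - 2 and cumulative[seg + 1] < target:
            seg += 1
        span = cumulative[seg + 1] - cumulative[seg]
        t = 0.0 if span == 0 else (target - cumulative[seg]) / span
        a, b = path[seg], path[seg + 1]
        points.append(tuple(a[k] + t * (b[k] - a[k]) for k in range(3)))
    return points


def centre(points):
    """Translate points so that their centroid sits at the origin."""
    c = [sum(p[k] for p in points) / len(points) for k in range(3)]
    return [tuple(p[k] - c[k] for k in range(3)) for p in points]


def pathway_distance(query, trace, n=16):
    """Mean point-to-point distance between two centred, resampled pathways."""
    q, t = centre(resample(query, n)), centre(resample(trace, n))
    return sum(math.dist(a, b) for a, b in zip(q, t)) / n


def rank_traces(query, library):
    """Return trace names ordered from most to least similar to the query."""
    return sorted(library, key=lambda name: pathway_distance(query, library[name]))
```

Because the pathways are centred before comparison, the ranking reflects the shape of a movement rather than where in the studio it took place; other choices, such as also normalizing for scale or rotation, would suit other research questions.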
The traces described represent both the spatial and temporal ‘histories’ of the movement. This in itself is a significant advance for the field of choreography, and the practice-led study of dance theory. This visualization as performance documentation is far more versatile than the standard forms of recording used in the field, which include notation techniques and still photo and video recording. The use of the motion capture sculptures allows the observer, or the researcher, to ‘remediate’ the ephemeral, motional pathways that the performer actually produces, which provides new ways of engaging with the data, and of reusing it. However, careful observation of scholarly practice, coupled with a creative interdisciplinary partnership, has shown that concurrent documentation of human movement through time and space can also be used to explore human interaction with other kinds of spaces, not just performative contexts.
One example is the process of archaeology, which is concerned with reconstructing past human use both of natural environments and of artificial spaces such as buildings and streets (see http://www.viarch.org.uk/content/research-summaries-detail.asp?ProjectID=12; last accessed 20 April 2010). The so-called New Archaeology tradition of the 1960s, and its successors in the processual archaeology movement (Browman & Givens 1996), contends that cultural processes can be adduced objectively from the material remains left behind by past societies. At the most mechanical level, this is reduced to questions such as what use did the inhabitants of this house make of this room, why was this artefact deposited in this location, why was this settlement laid out in the particular way it is, and so on. Whether or not these questions should be approached objectively or subjectively is not a matter for this paper, although most archaeologists would now agree that it is the latter. The main point is that, fundamentally, and however one documents or interprets the data, such questions are functions of human movement in relation to the material remains in question. Like performance, this movement is disembodied, and cannot be objectively reconstructed. The idea of qualitatively reconstructing movement in and around ancient remains is by no means new: the method of experimental archaeology, where ancient features are (re)created using methods from the period in question, is well established, as is the (perhaps less scholarly) method of historical re-enactment of specific events or practices. However, using the motion capture methods developed by e-Dance to capture the movement of archaeologists or performers through archaeological spaces will allow the same kind of objectivity now available to performance researchers in their assessment of how those spaces were used.
A direct result of this amalgam of choreographic and archaeological practice is the Motion in Place Platform (MiPP) project, funded by the Arts and Humanities Research Council in 2010–2011, which seeks to realize the potential of that combination. The project will develop and test a fully integrated motion capture system at the excavation of the Roman town of Silchester in Hampshire during its summer 2010 field season. Silchester, which has been excavated annually by the University of Reading since 1997, presents complex features, many varied kinds of small artefacts, and overlapping occupation systems of the Iron Age and Roman periods, which present different spatial layouts of the roads and buildings. Over a hundred excavators, mainly students, interact with the site at any one time, in a variety of capacities and performing a variety of tasks. Some of these tasks involve digging, some dealing with finds, some recording, and others gathering and processing environmental information. The MiPP project will equip individual volunteers with motion-tracking devices, which will record their ‘spatial footprints’ through, over and around the site’s physical features. This will provide a great range of information about both the features (profiling the spaces between architectures, an approach not previously explored in archaeological fieldwork) and the way people at the site interact with them.
Capturing in quantitative terms the way in which people move through and around a site such as Silchester can contribute new perspectives to the way that the site is recorded. In common with other large, complex archaeological excavations, Silchester has a sophisticated recording system. This itself has been augmented by a virtual research environment (http://www.vera.rdg.ac.uk; last accessed 24 April 2010), which facilitates on-site digital data capture in conjunction with the site’s integrated archaeological database. However, it continues to rely on established recording strategies, which in turn rely on the identification of contexts. A context is a group of artefacts or features which are immediately associated in some spatio-temporal manner. Identification of a context is a specialized and highly skilled, subjective process. Once identified, a context is given a serial number, which is then related chronologically to the other contexts. However, although building contexts into a matrix is standard practice in archaeology as a means of understanding sequences of material culture, there is little scope to explore their spatial relationships with the present world: this is left to separate maps and plans, with a ‘disjointing’ in the information being thus inevitable. MiPP, however, will seek to use the fact that each context is spatially referenced, by the find-spots of artefacts and features within them, to relate them to the ‘spatial footprints’ of those creating the dataset. This will provide new bases for classifying and managing the context sequences, both spatially and temporally.
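The two operations described above, ordering contexts chronologically and relating them spatially to excavators' footprints, can be sketched with a small example. The serial numbers, coordinates, stratigraphic relations and search radius are invented for illustration, and the sketch makes no claim about the MiPP data model or the Silchester database.

```python
import math
from graphlib import TopologicalSorter

# Hypothetical contexts: serial number -> find-spot coordinates (site grid, metres).
CONTEXTS = {
    1001: [(2.0, 3.0), (2.5, 3.5)],
    1002: [(2.2, 3.1)],
    1003: [(10.0, 8.0)],
}

# Hypothetical stratigraphic relations: context -> contexts that are earlier than it.
EARLIER_THAN = {1001: {1002}, 1002: set(), 1003: {1002}}


def chronological_order(relations):
    """Order contexts from earliest to latest, respecting every relation."""
    # TopologicalSorter expects node -> predecessors and yields predecessors first.
    return list(TopologicalSorter(relations).static_order())


def contexts_near_footprint(footprint, contexts, radius=1.0):
    """Return serial numbers of contexts with a find-spot within `radius`
    of any recorded point on an excavator's footprint."""
    hits = set()
    for serial, spots in contexts.items():
        if any(math.dist(s, p) <= radius for s in spots for p in footprint):
            hits.add(serial)
    return sorted(hits)


# A hypothetical footprint: positions recorded by one volunteer's tracking device.
footprint = [(2.1, 3.0), (3.0, 3.0)]
sequence = chronological_order(EARLIER_THAN)
visited = contexts_near_footprint(footprint, CONTEXTS)
```

Joining `sequence` and `visited` on the serial number gives a single record in which each context carries both its place in the matrix and the movements that produced it, which is the link that separate maps and plans leave implicit.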
In addition to the new perspectives on archaeological and cultural heritage recording that bringing motion capture into field archaeology will produce, the same process will offer new approaches in the fields of choreography and performative composition. As previously noted, Silchester is a complex site with features which interlock both spatially and temporally: representing these choreographically will be an interesting and challenging exercise. There are, of course, pronounced tensions between the application of motion capture in these two apparently unrelated areas: whereas one is concerned with contemporary movement in a performative context, the other is concerned with reconstructing (potentially) hypothetical movements through societal and cultural spaces which do not now exist in their original forms. However, although the application of the methodology may differ, the methodology itself is common to both of them. The methodological commons described here seeks to make explicit such links.
Motion capture, then, provides an example of a replicable method, with established conventions and applications, which requires training and expertise to use. It is applied as a means of documenting a specific kind of activity (human motion). It relies on various technologies, and can be deployed in various different ways; yet, in order to establish its place in the broader methodological commons, careful observation must be made of the scholarly activities which use it to increase, or augment, knowledge within their respective domains. In the case discussed, the Silchester excavation and the e-Dance projects provide the subjects of such observation, and the close collaboration developed with experts in both projects is crucial. More generally, such activities can be broken down into more granular categories at the subdiscipline level, but such an exercise is future work beyond the scope of this paper. In this specific case, these scholarly activities include cultural heritage documentation, reconstruction and documentation of performance. They map to the scholarly primitive (see below, table 1) of collecting information. The process of collecting relies on specific hardware (motion sensors, ultrasonic receiving stations, GPS units, etc.). However, once collected, those data cannot be reused in any way by other projects unless they are made accessible in some form of digital media. Indeed, a key challenge that many motion capture projects have encountered in the past is storing and delivering the digital objects once they are created: if motion capture files are not delivered to an interface and platform that can handle them three-dimensionally, then the user loses most, if not all, of the rich information that is captured in the first place. A responsive set of e-infrastructure tools and services is therefore needed for delivering the information. In almost all cases this will involve asynchronous delivery to remote locations.
This model of observation and extrapolation is what enables us to frame the A&H e-Science methodological commons in practical terms, rather than the abstract framework developed previously in the digital humanities (see §4 above).
A broader benefit that this example highlights is the ability to approach subjects that are not concerned with text. It has frequently been noted in recent reviews of the digital humanities as a subject area (which, for the purposes of this paper, is not the same as an academic discipline) that, although the field is concerned with a wide range of content types including imagery, video, sound and statistical data, the medium of text enjoys a privileged position in the field (Svensson 2009). A scan of the proceedings of the Digital Humanities conference, or of the journal Literary and Linguistic Computing, respectively the domain’s principal conference and journal, will confirm that the bulk of digital humanities applications research is concerned with digital texts. However, the example given here shows that the application of methods across disciplines in the arts and humanities provides a basis for dealing with visual and physical objects as well. Thus, the application of a methodological commons shows, clearly, that the scholarly primitives that make up the digital humanities can, and should, extend beyond textual content. In briefly citing the specific example of motion capture, we seek to illustrate how a definable method with an established and proven history of application in performance studies can be used to make an intellectual contribution to a very different kind of scholarly activity. As highlighted in the previous section, this is a concept which has been used in the development of the digital humanities’ methodological commons. The Arts and Humanities e-Science Initiative has provided a laboratory environment in which different methods can be applied in different domains. However, any scenario in which this becomes commonplace relies on an ability of researchers to connect adequately with each other, with their data, with other data available elsewhere, and with relevant software and hardware. 
The next section describes current development of the e-infrastructures necessary to enable this.
7. Research infrastructures, methodological commons and scholarly primitives
The DARIAH infrastructure initiative aims to conceptualize and build a virtual bridge between different humanities and arts resources across Europe. Funded under the ESFRI programme (see http://cordis.europa.eu/esfri; last accessed 22 April 2010), DARIAH is in a preparatory phase, which involves the design of the infrastructure and construction of a sound business and governance model. The DARIAH programme will begin its construction phase in 2011. It then aims to combine various national infrastructures, from the UK’s Arts and Humanities e-Science Initiative projects referred to above to the German e-Humanities infrastructure TextGrid (Gietz et al 2006), and also to help other EU countries to establish their own arts and humanities e-infrastructures. Building on the expertise and experience of digital humanities across Europe, DARIAH aims to build an infrastructure that is based on scholarly methods and research activities. The digital humanities have proven to be too decentralized and too wide in scope (McCarty 2005) to build DARIAH as an infrastructure that would encompass one particular discipline, or be dedicated to a particular resource. They incorporate too many disciplines and subdisciplines across the humanities. DARIAH, however, rests on the claim that one can build an infrastructure based on cross-disciplinary scholarly activities, not just within discipline boundaries. Otherwise, (universal) libraries or encyclopaedic museums would never have been possible. A library has never been just a place to store and access books but, particularly in the humanities, a place to interact about research. An infrastructure for the whole of arts and humanities can therefore only be a marketplace of services. As such, it is never a purely disciplinary activity, but a zone for researchers to exchange and discuss their products and services. These products have changed dramatically in recent years.
If, in the past, scholarly products were mainly publications, traded in journals, research now produces a wide variety of outputs besides publications, such as databases and other online publications. We do not imagine DARIAH to be one large infrastructure, but more a means for linking people, services and data for research in arts and humanities. Most likely, DARIAH will not be one technical solution, but many, according to community activities and willingness to collaborate. The scholarly primitives discussed in §5 function as a way to organize these trading zones of services and work towards more coherence. They help avoid redundancy in services and develop collaborations across disciplines.
Looking back at the discussion in §5, scholarly primitives can support the classification of scholarly activities, and thus provide a good foundation for setting up an infrastructure, as they are the ‘basic functions common to scholarly activity across disciplines’ (Palmer et al 2009). The concept has proven to be intuitive and valuable in multi-disciplinary endeavours such as digital humanities. It is particularly helpful in designing infrastructures to ensure that they are not planned beyond the needs and understanding of researchers, while remaining generic enough to cater for many different research needs.
Table 1 demonstrates the results of the infrastructure-relevant parts of our user requirements work in DARIAH. In table 1 we develop scholarly activities, classify them according to primitives, and map them to DARIAH services and to the requirements DARIAH has for outside service providers. For example, the most common scholarly activities in humanities will be related to ‘finding out about’ and discovering information. DARIAH will cater for this activity by providing various kinds of search and browse services but will also be (among other things) dependent on the provision of access application programming interfaces (APIs) and sufficient metadata from outside resource providers.
In table 1, we first present five primitives that are relevant to our infrastructure work: discover, collect, compare, deliver and collaborate. These primitives are taken from Palmer et al. (2009), which we amended with more digital humanities-related primitives by Unsworth (2000), as noted in the discussion above (§5). We did not include primitives from Palmer et al (2009) that are not relevant to more generic digital scholarly practices, such as reading. Each of the primitives includes various scholarly activities, which are in turn supported by technical services presented in the last two columns of table 1. We will now go through the DARIAH technical work and elaborate on how DARIAH demonstrators (larger collaborations between researchers and developers) and experiments (mainly focused on developing infrastructure support services) aim to support scholarly activities. Table 2 summarizes the scholarly activities and the DARIAH interoperability experiments and demonstrators.
Already in the DARIAH preparatory phase, we work not just on technical experiments, which demonstrate that we can realistically build the DARIAH infrastructure, but also on community demonstrators, which make use of DARIAH infrastructure expertise to enhance research. The DARIAH technical work has a wider focus than simply achieving a convincing overall architecture; it must also be convincing for the community. It needs to work with existing standards communities, while also understanding how scholars use these standards. To this end, the technical work is divided into at least two principal activities. The first one looks at how to enable the DARIAH infrastructure as an infrastructure, and validates the underlying concepts in a series of experiments. This set of experiments is based on the mapping of primitives to infrastructure functions, and follows table 1 from left to right. The second principal technical activity comprises two major community demonstrators that follow table 1 from right to left, and show how a set of already existing infrastructure functions enables scholarly activities in two communities. These two communities have historically been at the forefront of developing digital humanities, as discussed in the previous section: archaeology and textual studies. We first discuss these community demonstrators, before continuing with infrastructure experiments. For all experiments and demonstrators we specify the main scholarly activities. These will be either generic primitives or more specialized scholarly activities.
The first community demonstrator migrates a legacy application of the European archaeology community into a more sustainable service-oriented architecture. The EU Culture 2000 project ARENA (see http://ads.ahds.ac.uk/arena; last accessed 22 April 2010) was completed in November 2004. This traditional metadata search portal service is enhanced by using DARIAH web services, and by exposing the attached databases as autonomous services. With the ARENA demonstrator, DARIAH aims to show that search services on remote distributed archaeological projects can enhance the scholarly gathering primitives from table 1. The demonstrator is finished and is currently being evaluated and rolled out at the ARENA member institutions. The second demonstrator delivers a publication platform for textual resources annotated according to the Text Encoding Initiative standard (TEI; http://www.tei-c.org/index.xml; last accessed 22 April 2010). It shows how the standard repository software Fedora can be used to publish complex TEI research objects, which dominate many of the research outputs in digital humanities. DARIAH will therefore help publish deep semantic annotations and showcase that DARIAH can support core digital humanities primitives. These community demonstrators are mainly concentrated on the first types of services above by demonstrating that DARIAH expertise and infrastructure can play a positive role in enhancing existing digital humanities research. Next to the demonstrators, we have various experiments planned that verify how we imagine DARIAH services can enable the infrastructure as a trading zone of research objects. In these experiments, we concentrate on how primitives from table 1 can be realized by infrastructure functions.
The largest of these DARIAH infrastructure experiments will look at the use of the European Grid infrastructure. It concentrates on developing the nucleus of a central DARIAH service to enable archival research in humanities. Archival materials have always been significant resources in the humanities. In order to exploit the increasing number and volume of electronic archives, we need new ways of flexible data integration and on-demand retrieval to enable research across collections. This DARIAH experiment aims to show that it is possible to extract information from the archives and add new research by the flexible combination of existing research resources. It also demonstrates the large computational needs of the humanities, which are linked to indexing unstructured resources in an on-demand and flexible manner. The experiment will run on the European Grid infrastructure using the D4Science e-Infrastructure (http://d4science.eu; last accessed 25 April 2010). Finally, the experiment involves publishing analysis results in virtual collections as well as an investigation into the retrieval of annotations.
In order to enable the gathering of resources, an infrastructure needs to provide access functions. For a virtual infrastructure such as DARIAH, it is essential that we expect different access patterns depending on the remote data repositories that are being incorporated or linked. Also, it is essential to assume that research objects will look different depending on differing research interests. For instance, different metadata profiles are required depending on the perspectives of the diverse DARIAH communities, from archive discovery to deep semantic annotations in TEI. In the second major DARIAH infrastructure experiment, we attempt to prove that a repository federation can be built on the basis of representing research objects as digital surrogates using OAI-ORE (http://www.openarchives.org/ore; last accessed 20 April 2010) and Atom feeds. This experiment is one of the main focus points of the second year of the DARIAH preparation phase. At present, we have achieved the communication of some basic OAI-ORE metadata across heterogeneous repositories. In the future, we would like to expand the exchanged metadata so that the information communicated fulfils research needs by combining provenance information with the Europeana data model (http://www.europeana.eu; last accessed 21 April 2010). In this way, humanities research objects can be published directly into Europeana.
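The structure of table 1 can be illustrated with a minimal sketch in Python. This is not the DARIAH data model: the class `ActivityMapping`, its field names and the two lookup functions are hypothetical, and the only row filled in is the ‘discover’ example given in the text (search and browse services, with access APIs and metadata required from outside providers).

```python
from dataclasses import dataclass, field

# The five primitives named in the text.
PRIMITIVES = ("discover", "collect", "compare", "deliver", "collaborate")


@dataclass
class ActivityMapping:
    """One hypothetical row of table 1: primitive -> activity -> services."""
    primitive: str
    activity: str
    dariah_services: list = field(default_factory=list)
    provider_requirements: list = field(default_factory=list)

    def __post_init__(self):
        # Reject primitives outside the five used for the infrastructure work.
        if self.primitive not in PRIMITIVES:
            raise ValueError(f"unknown primitive: {self.primitive}")


# Only the row described in the text is filled in; other rows are not invented.
ROWS = [
    ActivityMapping(
        primitive="discover",
        activity="finding out about information",
        dariah_services=["search", "browse"],
        provider_requirements=["access API", "sufficient metadata"],
    ),
]


def services_for(primitive, rows):
    """Read the mapping from primitive to infrastructure services."""
    return sorted({s for r in rows if r.primitive == primitive
                   for s in r.dariah_services})


def primitives_for(service, rows):
    """Read the mapping in reverse, from a service to the primitives it supports."""
    return sorted({r.primitive for r in rows if service in r.dariah_services})
```

Keeping the primitive as the key of each row means that a service can be traced back to the scholarly activity it is meant to support, and an activity forward to the services and provider requirements it implies.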
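As an illustration of what such a digital surrogate might look like, the following Python sketch builds a simplified Atom entry that lists the resources aggregated into one research object, and reads them back. It is a sketch under stated assumptions rather than the DARIAH implementation: the function names and all example URIs are hypothetical, the `describes` and `aggregates` link relations are used as we understand them from the OAI-ORE vocabulary, and the entry omits elements (such as `updated` and `author`) that a complete, valid Resource Map would need.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ORE_TERMS = "http://www.openarchives.org/ore/terms/"

# Use a readable prefix when serializing Atom elements.
ET.register_namespace("atom", ATOM_NS)


def build_surrogate(entry_id, aggregation_uri, title, resource_uris):
    """Return a simplified Atom entry describing one aggregation (assumed layout)."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    # Link from the entry to the aggregation it describes.
    ET.SubElement(entry, f"{{{ATOM_NS}}}link",
                  {"rel": ORE_TERMS + "describes", "href": aggregation_uri})
    # Mark the entry as describing an ORE Aggregation.
    ET.SubElement(entry, f"{{{ATOM_NS}}}category",
                  {"scheme": ORE_TERMS,
                   "term": ORE_TERMS + "Aggregation",
                   "label": "Aggregation"})
    # One link per aggregated resource (e.g. a TEI file and a page image).
    for uri in resource_uris:
        ET.SubElement(entry, f"{{{ATOM_NS}}}link",
                      {"rel": ORE_TERMS + "aggregates", "href": uri})
    return ET.tostring(entry, encoding="unicode")


def aggregated_resources(entry_xml):
    """Extract the URIs of the aggregated resources from a serialized entry."""
    entry = ET.fromstring(entry_xml)
    return [link.get("href")
            for link in entry.iter(f"{{{ATOM_NS}}}link")
            if link.get("rel") == ORE_TERMS + "aggregates"]
```

A repository receiving such an entry needs only to understand the link relations, not the internal storage model of the repository that produced it, which is the property a federation across heterogeneous repositories depends on.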
Various scholarly primitives are enabled through storage solutions, in particular collecting and organizing activities from table 1. In the DARIAH technical work, we translate this requirement to the ability to flexibly access storage resources in information environments. With emerging developments such as the Australian National Data Service (Treloar & Wilkinson 2008), and the recent emphasis on data and its importance for research, we expect a rapid growth of dedicated storage resources for research and the increasing importance of data services. Storage resources are provided to the research communities through the various transdisciplinary infrastructure services. The more challenging question is therefore how to connect these services to the real-world online working environments of researchers. In order to demonstrate how e-Infrastructure resources can be embedded in online research environments, we have built an Amazon S3-like interface to standard data grid resources (Aschenbrenner et al. 2009). This interface can be used from within any standard Web application. In related work (Blanke & Hedges 2008), we could also show how data grid federations can enable the seamless collaboration on large files by setting up a virtual workspace for researchers.
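The idea of placing a bucket-and-key interface in front of grid storage can be sketched as follows. This Python example is purely illustrative and is not the interface described by Aschenbrenner et al. (2009): the class and method names are hypothetical, and an in-memory dictionary stands in for the data grid, which in practice would be reached through the grid middleware's own client.

```python
class InMemoryGrid:
    """Stand-in for a data grid: maps logical paths to file contents."""

    def __init__(self):
        self._files = {}

    def write(self, path, data):
        self._files[path] = bytes(data)

    def read(self, path):
        return self._files[path]

    def list(self, prefix):
        return sorted(p for p in self._files if p.startswith(prefix))


class BucketStore:
    """Hypothetical S3-like facade: buckets and keys mapped to logical grid paths."""

    def __init__(self, grid, root="/dariah"):
        self._grid = grid
        self._root = root.rstrip("/")

    def _path(self, bucket, key=""):
        # A bucket becomes a collection under the root; a key becomes a path in it.
        return f"{self._root}/{bucket}/{key}"

    def put_object(self, bucket, key, data):
        self._grid.write(self._path(bucket, key), data)

    def get_object(self, bucket, key):
        return self._grid.read(self._path(bucket, key))

    def list_objects(self, bucket, prefix=""):
        base = self._path(bucket)
        # Return keys relative to the bucket, as an object store would.
        return [p[len(base):] for p in self._grid.list(base + prefix)]
```

A Web application written against `put_object`, `get_object` and `list_objects` does not need to know how the grid organizes or replicates the underlying files, which is what allows storage resources to be embedded in existing online working environments.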
The organization and publication of resources is further fostered by the final technical experiments on persistent identifiers (PIDs) and access management. Our PID experiment explores how object identification (and object description) can be achieved in DARIAH when potentially faced with multiple identifier schemes (uniform resource identifier (URI), digital object identifier (DOI), etc.). In particular, we investigate how to persistently reference objects which are subject to ongoing research and of varying granularities (e.g. to reference a chapter in a book marked up in TEI). The PID experiment works with the collections from the TEI demonstrators.
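The two concerns of the PID experiment, multiple identifier schemes and varying granularity, can be illustrated with a small Python sketch. It is an assumption-laden toy rather than the DARIAH design: the `PidRegistry` class, the use of a `#fragment` suffix to address a part of an object, and all identifiers shown are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """A resolved reference: one internal object plus an optional part of it."""
    object_id: str
    fragment: str = ""  # e.g. the identifier of a chapter inside a TEI document


class PidRegistry:
    """Hypothetical registry mapping identifiers from several schemes to one object."""

    def __init__(self):
        self._by_identifier = {}

    def register(self, scheme, value, object_id):
        # Several (scheme, value) pairs may point at the same internal object.
        self._by_identifier[(scheme.lower(), value)] = object_id

    def resolve(self, scheme, value):
        # Split off an optional fragment so that parts of an object can be cited.
        base, _, fragment = value.partition("#")
        object_id = self._by_identifier[(scheme.lower(), base)]
        return Reference(object_id, fragment)
```

For example, registering both a DOI and a URI for the same edition lets either identifier resolve to the same object, while a reference such as `http://example.org/texts/1#chapter-3` resolves to that object together with the part being cited.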
Next to these infrastructure experiments, we actively discuss how to integrate existing partner technologies. Good examples are various technologies to enable the research collaboration primitives. DARIAH can rely on a long-standing tradition of exploiting and enhancing Web 2.0 technologies in digital humanities. Here, DARIAH mainly acts as a mediator for existing projects that will also roll out digital humanities collaboration platforms in European countries which do not yet have these. The digital humanities is a well-established and consolidated community that uses various means of online collaboration to communicate and disseminate community-relevant research results (Blanke et al 2009). Partners in the DARIAH network have established various online collaboration tools to enable match-making activities between researchers from different communities, and peer-reviewing of projects and tools. In the UK, arts-humanities.net (http://www.arts-humanities.net; last accessed 26 April 2010) is an announcement platform and a knowledge base of existing digital humanities projects and tools, while Ireland started a related activity with DRAPIER (http://dho.ie/drapier; last accessed 26 April 2010). This section has presented how we believe scholarly primitives can be used to build an infrastructure that is understood by researchers and fulfils their needs. For DARIAH, the primitives have proven to be a useful communications and deliberation tool to help researchers understand what an infrastructure will deliver to them.
Researchers in all humanities disciplines have always developed and used analytic methods to approach their primary materials. Sometimes these are bespoke, developed by individual scholars for their own purposes. Sometimes they are shared between individuals, and a (relative) few are put to wider use within domains and communities. One characteristic they can be said to share, however, is that they are specific to particular kinds of research, or research questions. A method of parsing text in linguistics may have few obvious applications outside linguistics. The transformative potential of e-Science for these domains, however, is that it enables applications in different domains to be linked using common computational methods. In this paper, we have shown how these methods can be brought together as ‘methodological commons’ and how scholarly primitives emerge across disciplines, and help build e-Infrastructures. Creative tensions may then arise from such cross-disciplinary applications: there is ample evidence of this from the Arts and Humanities e-Science Initiative. The intellectual approach described in this paper, coupled with the humanities e-Infrastructure capable of responding reflexively to the research it supports, will allow us to capture the fundamentals of arts and humanities e-Science in the future, as it has done for the digital humanities in the past.
The largest of the DARIAH infrastructure experiments will look at the use of the European Grid infrastructure to enable archival research in humanities. For Palmer et al. (2009), archival materials are a significant resource in the humanities, but are less important in the sciences. However, this view is from the 1990s, and thus outdated, as it claims that data are not available in the humanities. This is no longer true to the same degree; however, data still definitely play a secondary role in the humanities compared with archival content. For the increasing number of digital archives, we need new ways of flexible data integration and on-demand retrieval to enable all the important discovering and comparing activities in table 1.
If archival content is the ‘data of the humanities’, this DARIAH experiment will prove that it is possible to extract information from the archives and add new research by the flexible combination of existing research resources. It will also demonstrate the large computational needs of the humanities, which are linked to indexing unstructured resources in an on-demand and flexible manner (Blanke et al. 2009). The experiment will run on the European Grid infrastructure. Finally, the experiment involves publishing analysis results in virtual collections as well as an investigation into the retrieval of annotations. In order to enable the gathering of resources, an infrastructure needs to provide access functions. For a virtual infrastructure such as DARIAH, it is essential that we expect different access patterns depending on the remote data repositories we are connecting to. Also, it is essential to assume that research objects will look different depending on differing research interests; for example, different metadata profiles are required depending on the perspectives of the diverse DARIAH communities, from archive discovery using EAD (http://www.archiveshub.ac.uk/arch/ead.shtml; last accessed 21 April 2010) to deep semantic annotations in TEI. In the second major DARIAH infrastructure experiment, we attempt to prove that a repository federation can be built on the basis of representing research objects as digital surrogates using OAI-ORE (http://www.openarchives.org/ore/; last accessed 20 April 2010). This experiment is one of the main focus points of the second year of the DARIAH preparation phase. At the time of writing, we have achieved the communication of some basic OAI-ORE metadata across repositories (based on Fedora and iRODS).
One contribution of 16 to a Theme Issue ‘e-Science: past, present and future I’.
- © 2010 The Royal Society