This theme issue has the founding ambition of landscaping data ethics as a new branch of ethics that studies and evaluates moral problems related to data (including generation, recording, curation, processing, dissemination, sharing and use), algorithms (including artificial intelligence, artificial agents, machine learning and robots) and corresponding practices (including responsible innovation, programming, hacking and professional codes), in order to formulate and support morally good solutions (e.g. right conducts or right values). Data ethics builds on the foundation provided by computer and information ethics but, at the same time, it refines the approach endorsed so far in this research field, by shifting the level of abstraction of ethical enquiries, from being information-centric to being data-centric. This shift brings into focus the different moral dimensions of all kinds of data, even data that never translate directly into information but can be used to support actions or generate behaviours, for example. It highlights the need for ethical analyses to concentrate on the content and nature of computational operations—the interactions among hardware, software and data—rather than on the variety of digital technologies that enable them. And it emphasizes the complexity of the ethical challenges posed by data science. Because of such complexity, data ethics should be developed from the start as a macroethics, that is, as an overall framework that avoids narrow, ad hoc approaches and addresses the ethical impact and implications of data science and its applications within a consistent, holistic and inclusive framework. Only as a macroethics will data ethics provide solutions that can maximize the value of data science for our societies, for all of us and for our environments.
This article is part of the themed issue ‘The ethical impact of data science’.
Data science provides huge opportunities to improve private and public life, as well as our environment (consider the development of smart cities or the problems caused by carbon emissions). Unfortunately, such opportunities are also coupled to significant ethical challenges. The extensive use of ever more data, often personal if not sensitive (big data), and the growing reliance on algorithms to analyse them in order to shape choices and to make decisions (including machine learning, artificial intelligence and robotics), as well as the gradual reduction of human involvement or even oversight over many automatic processes, pose pressing issues of fairness, responsibility and respect for human rights, among others.
These ethical challenges can be addressed successfully. Fostering the development and applications of data science while ensuring respect for human rights and for the values shaping open, pluralistic and tolerant information societies is a great opportunity of which we can and must take advantage. Striking such a robust balance will not be an easy or simple task. But the alternative, failing to advance both the ethics and the science of data, would have regrettable consequences. On the one hand, overlooking ethical issues may prompt negative impact and social rejection, as happened, for example, with the NHS care.data programme.¹ Social acceptability or, even better, social preferability must be the guiding principle for any data science project with even a remote impact on human life, to ensure that opportunities will not be missed. On the other hand, overemphasizing the protection of individual rights in the wrong contexts may lead to regulations that are too rigid, and this in turn can cripple the chances to harness the social value of data science. The LIBE amendments initially proposed to the European Data Protection Regulation offer a concrete example of the case in point.²
Navigating between the Scylla of social rejection and the Charybdis of legal prohibition in order to reach solutions that maximize the ethical value of data science to benefit our societies, all of us and our environments is the demanding task of data ethics. In achieving this task, data ethics can build on the foundation provided by computer and information ethics, which has focused for the past 30 years on the main challenges posed by digital technologies [1–3]. This rich legacy is most valuable. It also fruitfully grafts data ethics onto the great tradition of ethics more generally. At the same time, data ethics refines the approach endorsed so far in computer and information ethics, as it changes the level of abstraction (LoA) of ethical enquiries from an information-centric (LoAI) to a data-centric one (LoAD).³
Ethical analyses are developed at a variety of LoAs. The shift from LoAI to LoAD is the latest in a series of changes that have characterized the evolution of computer and information ethics. Research in this field first endorsed a human-centric LoA, which addressed the ethical problems posed by the dissemination of computers in terms of professional responsibilities of both their designers and users. The LoA then shifted to a computer-centric one (LoAC) in the mid-1980s, and it changed again at the beginning of the third millennium to LoAI.
These changes responded to rapid, widespread and profound technological transformations. And they had important conceptual implications. For example, LoAC highlighted the nature of computers as universal and malleable tools. It made it easier to understand the impact that computers could have on shaping social dynamics as well as on the design of the environment surrounding us. LoAI then shifted the focus from the technological means to the content (information) that can be created, recorded, processed and shared through such means. In doing so, LoAI emphasized the different moral dimensions of information (i.e. information as the source, the result or the target of moral actions) and led to the design of a macroethical approach able to address the whole cycle of information creation, sharing, storage, protection, usage and possible destruction.
Data science, as the latest phase of the information revolution, is now prompting a further change in the LoA at which our ethical analysis can be developed most fruitfully. In a few decades, we have come to understand that it is not a specific technology (computers, tablets, mobile phones, online platforms, cloud computing and so forth), but what any digital technology manipulates that represents the correct focus of our ethical strategies. The shift from information ethics to data ethics is probably more semantic than conceptual, but it does highlight the need to concentrate on what is being handled as the true invariant of our concerns. This is why labels such as ‘robo-ethics’ or ‘machine ethics’ miss the point, anachronistically stepping back to a time when ‘computer ethics’ seemed to provide the right perspective. It is not the hardware that causes ethical problems; it is what the hardware does with the software and the data that represents the source of our new difficulties. LoAD brings into focus the different moral dimensions of data. In doing so, it highlights the fact that, before concerning information, ethical problems such as privacy, anonymity, transparency, trust and responsibility concern data collection, curation, analysis and use, and hence they are better understood at that level.
In the light of this change of LoA, data ethics can be defined as the branch of ethics that studies and evaluates moral problems related to data (including generation, recording, curation, processing, dissemination, sharing and use), algorithms (including artificial intelligence, artificial agents, machine learning and robots) and corresponding practices (including responsible innovation, programming, hacking and professional codes), in order to formulate and support morally good solutions (e.g. right conducts or right values). This means that the ethical challenges posed by data science can be mapped within the conceptual space delineated by three axes of research: the ethics of data, the ethics of algorithms and the ethics of practices.
The ethics of data focuses on ethical problems posed by the collection and analysis of large datasets and on issues ranging from the use of big data in biomedical research and social sciences, to profiling, advertising and data philanthropy [11,12] as well as open data. In this context, key issues concern possible re-identification of individuals through the mining, linking, merging and re-use of large datasets, as well as risks for so-called ‘group privacy’, when the identification of types of individuals, independently of the de-identification of each of them, may lead to serious ethical problems, from group discrimination (e.g. ageism, ethnicism, sexism) to group-targeted forms of violence [14,15]. Trust [16,17] and transparency are also crucial topics in the ethics of data, in connection with an acknowledged lack of public awareness of the benefits, opportunities, risks and challenges associated with data science. For example, transparency is often advocated as one of the measures that may foster trust. However, it is unclear what information should be made transparent and to whom information should be disclosed.
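The re-identification risk mentioned above can be made concrete with a small sketch. What follows is only an illustration under stated assumptions, not a method taken from the literature cited here: the records, the names, the placeholder postcodes and the choice of quasi-identifiers (postcode, birth year, sex) are all invented. It shows how a dataset with names removed may still be linked to a second, named dataset whenever a combination of shared attributes is unique.

```python
# Synthetic, hypothetical data: no real people, places or datasets are represented.

# A "de-identified" dataset: names removed, but quasi-identifiers retained.
health_records = [
    {"postcode": "ZZ1", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"postcode": "ZZ2", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
    {"postcode": "ZZ2", "birth_year": 1990, "sex": "F", "diagnosis": "migraine"},
]

# A second, named dataset (for instance, a hypothetical public register).
public_register = [
    {"name": "Alice Example", "postcode": "ZZ1", "birth_year": 1975, "sex": "F"},
    {"name": "Bob Example", "postcode": "ZZ2", "birth_year": 1982, "sex": "M"},
    {"name": "Carol Example", "postcode": "ZZ2", "birth_year": 1990, "sex": "F"},
    {"name": "Dana Example", "postcode": "ZZ2", "birth_year": 1990, "sex": "F"},
]

# Assumed quasi-identifiers: attributes present in both datasets.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "sex")


def link(records, register, keys=QUASI_IDENTIFIERS):
    """Return (name, diagnosis) pairs for records matching exactly one person."""
    # Index the named register by its combination of quasi-identifier values.
    index = {}
    for person in register:
        signature = tuple(person[k] for k in keys)
        index.setdefault(signature, []).append(person["name"])

    matches = []
    for record in records:
        names = index.get(tuple(record[k] for k in keys), [])
        # Only a unique match re-identifies an individual with certainty.
        if len(names) == 1:
            matches.append((names[0], record["diagnosis"]))
    return matches


reidentified = link(health_records, public_register)
for name, diagnosis in reidentified:
    print(f"{name}: {diagnosis}")
```

In this toy example the third record is not attributed to a single person, because two people in the register share its attributes. The record still reveals something about everyone in that small group, which is one simple way to picture the ‘group privacy’ concern raised above.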
The ethics of algorithms addresses issues posed by the increasing complexity and autonomy of algorithms broadly understood (e.g. including artificial intelligence and artificial agents such as Internet bots), especially in the case of machine learning applications. In this case, some crucial challenges include moral responsibility and accountability of both designers and data scientists with respect to unforeseen and undesired consequences as well as missed opportunities [20,21]. Unsurprisingly, the ethical design and auditing of algorithms' requirements and the assessment of potential undesirable outcomes (e.g. discrimination or the promotion of antisocial content) are attracting increasing research.
Finally, the ethics of practices (including professional ethics and deontology) addresses the pressing questions concerning the responsibilities and liabilities of people and organizations in charge of data processes, strategies and policies, including data scientists, with the goal of defining an ethical framework to shape professional codes about responsible innovation, development and usage, which may ensure ethical practices fostering both the progress of data science and the protection of the rights of individuals and groups. Three issues are central in this line of analysis: consent, user privacy and secondary use.
While they are distinct lines of research, the ethics of data, algorithms and practices are obviously intertwined, and this is why it may be preferable to speak in terms of three axes defining a conceptual space within which ethical problems are like points identified by three values. Most of them do not lie on a single axis. For example, analyses focusing on data privacy will also address issues concerning consent and professional responsibilities. Likewise, ethical auditing of algorithms often implies analyses of the responsibilities of their designers, developers, users and adopters. Data ethics must address the whole conceptual space and hence all three axes of research together, even if with different priorities and focus. And for this reason, data ethics needs to be developed from the start as a macroethics, that is, as an overall ‘geometry’ of the ethical space that avoids narrow, ad hoc approaches but rather addresses the diverse set of ethical implications of data science within a consistent, holistic and inclusive framework.
This theme issue represents a significant step in such a constructive direction. It collects 14 other contributions, each analysing a specific topic belonging to one of the three axes of research outlined above, while considering its implications for the other two. The articles included in this issue were initially presented at a workshop on ‘The Ethics of Data Science, The Landscape for the Alan Turing Institute’ hosted at the University of Oxford in December 2015. The issue shares with the workshop the founding ambition of landscaping data ethics as a new area of ethical enquiries and identifying the most pressing problems to solve and the most relevant lines of research to develop.
We declare we have no competing interests.
We received no funding for this study.
Before leaving the reader to the articles, we express our gratitude to the authors and the reviewers for their contributions, as well as to the Alan Turing Institute for funding the landscaping workshop as part of its research strategy. The workshop and this theme issue would not have been possible without the strong support and continuous encouragement of many colleagues, but in particular of Prof. Helen Margetts, Director of the Oxford Internet Institute, and of Prof. Andrew Blake, Director of the Alan Turing Institute. We are also grateful to Bailey Fallon, the journal's Commissioning Editor, and to the editorial office of Philosophical Transactions A for their great help during the process leading to the publication of this theme issue.
One contribution of 15 to a theme issue ‘The ethical impact of data science’.
² Amendments 27, 327, 328 and 334–337 proposed in Albrecht's Draft Report, http://www.europarl.europa.eu/meetdocs/2009_2014/documents/libe/pr/922/922387/922387en.pdf.
³ The method of abstraction is a common methodology in computer science and in philosophy and ethics of information. It specifies the different LoAs at which a system can be analysed, by focusing on different aspects, called observables. The choice of the observables depends on the purpose of the analysis and determines the choice of LoA. Any given system can be analysed at different LoAs. For example, an engineer interested in maximizing the aerodynamics of a car may focus upon the shape of its parts, their weight and the materials. A customer interested in the aesthetics of the same car may focus on its colour and on the overall look and may disregard the shape, weights and material of the car's components.
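The car example in this note can be sketched in a few lines of code. This is a minimal illustration, assuming that a LoA can be modelled simply as the set of observables it attends to. The attribute names and values, and the helper `observe`, are hypothetical and chosen only to mirror the engineer and the customer described above.

```python
# Hypothetical description of one system (a car); all values are invented.
car = {
    "shape": "fastback",
    "weight_kg": 1350,
    "materials": "aluminium",
    "colour": "red",
    "overall_look": "sporty",
}

# Assumption: each LoA is modelled as the set of observables it attends to.
LOA_ENGINEER = frozenset({"shape", "weight_kg", "materials"})
LOA_CUSTOMER = frozenset({"colour", "overall_look"})


def observe(system, loa):
    """Return the view of `system` at the given LoA: only its observables."""
    return {name: value for name, value in system.items() if name in loa}


# The same system yields different descriptions at different LoAs.
engineer_view = observe(car, LOA_ENGINEER)
customer_view = observe(car, LOA_CUSTOMER)

print(engineer_view)
print(customer_view)
```

The system itself is the same in both calls. Only the purpose of the analysis, encoded here as the chosen set of observables, changes what is seen.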
- Accepted October 3, 2016.
- © 2016 The Author(s)
Published by the Royal Society. All rights reserved.