Mature professions develop frameworks for how practitioners should behave. These frameworks provide guidelines and standard operating procedures, establish the values practitioners swear to uphold, and form the foundation for an ethics of that particular trade. One of the earliest and best-known examples is the Hippocratic Oath, developed centuries before Christ, in which the physician swears to do no harm. Data scientists, on the one hand beginning to understand the power of their own trade, and on the other feeling the blowback from an inquisitive public that has been increasingly exploited by bad actors, have begun discussing what such a framework might entail. Especially since the Cambridge Analytica/Facebook fallout, the topic of ethical standards in data science has been much discussed.
Learning AI without ethics is like becoming spiderman without having Uncle Ben say "with great power comes great responsibility" to you. Much love to Uncle Ben, lets keep it ethical.
— Siraj Raval (@sirajraval) May 1, 2018
The tweet above has the right sentiment but lacks direction. It almost reads like the following bit from a South Park episode (edited here to be nerdier):
Step 1. Set out to learn AI, ethically
Step 2. ???
Step 3. (Ethical) profit!
The question is not whether data science should have its own ethical code but what an ethics of data science would entail. The profession is trying to understand its values and develop guidelines to police itself.
And currently, there is no consensus on the issue.
DJ Patil, the former Chief Data Scientist of the U.S., presented a call to action for the community in this article. The first paragraph reads:
2.5 quintillion bytes of data are created every day. It’s created by you when you commute to work or school, when you’re shopping, when you get a medical treatment, and even when you’re sleeping. It’s created by you, your neighbors, and everyone around you. So, how do we ensure it’s used ethically?
This paragraph raises the issue of the ‘is-ought’ problem:
```python
# Using the third-party `wikipedia` package (pip install wikipedia)
import wikipedia

print(wikipedia.summary("Is-ought problem"))
```

The is–ought problem, as articulated by Scottish philosopher and historian David Hume (1711–76), states that many writers make claims about what ought to be, based on statements about what is. Hume found that there seems to be a significant difference between positive statements (about what is) and prescriptive or normative statements (about what ought to be), and that it is not obvious how one can coherently move from descriptive statements to prescriptive ones. The is–ought problem is also known as Hume's law, or Hume's guillotine. A similar view is defended by G. E. Moore's open-question argument, intended to refute any identification of moral properties with natural properties. This so-called naturalistic fallacy stands in contrast to the views of ethical naturalists.
The fact that there is so much data being created does not give us any insight into how it should be handled. The presence of a vast ocean of information does not offer us any clues about how we should conduct our lives near the shore or on the vessels that go out to trawl it. Whether there is a significant amount of personalized data or very little, a code of ethics should remain consistent as to its handling and use.
Data science, and science generally, must look beyond fact to construct an ethical framework. It will take community engagement to develop and training to enforce.
A mandate to “behave ethically” will not save us. Nor can we build an ethical neural network if we don’t have an ethical framework to judge it against. This is one of the problems practitioners face in developing an ethics for data science: we believe we can use statistics to resolve the issue, but we can’t, because we can’t derive an ought from an is.
There are other problems too.
Problems in developing an ethics of data science
1. ethics is hard
Let’s put the theory to the side for now. It’s difficult to know what the morally courageous thing to do is at any given point in time, and difficult to know what counts as an ethical versus unethical use of data science. The problem is made harder because data scientists have not yet developed the tools to really understand what’s going on inside the black boxes of the models we build. We know certain information about the models: their architecture, hyperparameters, and the features fed into them. But we don’t really know how these algorithms arrive at any given decision for any given input. The sketch below makes that gap concrete.
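A minimal sketch, assuming scikit-learn and one of its bundled toy datasets (the model and dataset are illustrative choices, not anything from the discussion above): we can enumerate the architecture, hyperparameters, and input features, yet the reasoning behind any single prediction stays out of reach.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

# Train a small black-box model on a bundled toy dataset.
data = load_breast_cancer()
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(data.data, data.target)

# What we CAN inspect: the hyperparameters we set and the features we fed in.
print(model.get_params())
print(list(data.feature_names))

# What we can't easily inspect: why 100 trees voted the way they did on
# this particular row. We get a decision back, but not a 'why'.
print(model.predict(data.data[:1]))
```

Tools like feature importances or SHAP values narrow the gap, but none of them turns the box transparent.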
Meanwhile, in the theory realm, there are many vastly different schools of thought on morality and ethics. Even the famous “maximize the benefit for all” utilitarians have different sects. One that might appeal to you is act utilitarianism, which focuses on individual cases while still prescribing a “maximum benefit for all” mentality:
Act utilitarians reject rigid rule-based moralities that identify whole classes of actions as right or wrong. They argue that it is a mistake to treat whole classes of actions as right or wrong because the effects of actions differ when they are done in different contexts and morality must focus on the likely effects of individual actions. It is these effects that determine whether they are right or wrong in specific cases. Act utilitarians acknowledge that it may be useful to have moral rules that are “rules of thumb”—i.e., rules that describe what is generally right or wrong, but they insist that whenever people can do more good by violating a rule rather than obeying it, they should violate the rule. They see no reason to obey a rule when more well-being can be achieved by violating it.
Of course, the act utilitarian approach is likely not the best fit for data science. The point is that there are many varied schools of thought worth considering in tackling this problem.
2. there’s little money in it
At one of my former positions, a co-worker who would get frustrated with the lack of resources devoted to refactoring code would say: “There’s no time to do it right but plenty of time to do it twice.”
With the demand for data science talent reaching new heights, businesses have little interest in slowing down to produce more ethically competent data scientists. This trend is unlikely to end soon.
3. ethics must be considered at every step in the pipeline
At every step in the pipeline, data scientists may face ethical questions and should think carefully about their decisions. Under tight deadlines and pressure, choices will be made, and those choices can have significant consequences for *real* people. A financial algorithm that considers race in its recommendations about the kinds of loans available to a particular person might cost that person a lot of cash.
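As a hedged illustration of what one such checkpoint might look like (pandas assumed; the column names and values are invented for this example), a pipeline can at minimum refuse to hand protected attributes to the model, while acknowledging that this alone settles nothing:

```python
import pandas as pd

# Hypothetical loan-application data; columns invented for illustration.
applications = pd.DataFrame({
    "income":   [48000, 72000, 31000],
    "debt":     [12000, 5000, 9000],
    "race":     ["A", "B", "A"],              # protected attribute
    "zip_code": ["60623", "60614", "60623"],  # a potential proxy for race
})

PROTECTED = ["race"]

# One checkpoint among many: drop protected attributes before modeling.
features = applications.drop(columns=PROTECTED)
print(features.columns.tolist())  # ['income', 'debt', 'zip_code']

# The limit of the checkpoint: zip_code may encode race almost as well
# as the dropped column, which is why every later step needs review too.
```

Dropping the column is one decision at one step; the same scrutiny has to be applied again at labeling, evaluation, and deployment.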
4. we need it now
Advances in voice and video imitation and manipulation, in how financial transactions are handled, and in how prices are calculated and products are recommended to customers all show the power of the data scientist. This framework should be developed soon; otherwise, more unethical data science products will be deployed.
Making a start
Universities and governments have stepped in to provide some backbone to this discussion. In particular, the UK released a guidance document for civil servants. Acknowledging that this may be only a start on the framework, and that it is aimed at civil servants, let’s walk through some of its six principles.
Principle 1. Start with clear user need and public benefit. Here we immediately run into trouble. There are certain applications of AI that have neither user need nor public benefit, or whose uses are dubious at best. Take advancing computer vision for use in more exacting weapons systems. Is there a clear user need there, or a public benefit? You could argue that the more precise the weapon, the fewer individuals might wind up as “collateral damage.” But in the private sector, this principle is unlikely to hold in some cases. Sometimes it’s just about profit.
Principle 2. Use data and tools which have the minimum intrusion necessary. This principle is a rehash of the “minimum effective dose” idea you hear in the health sector, and one I agree with. However, the minimum effective dose might not be the minimum amount of intrusion. If the principle means achieving the intended degree of outcome with the minimum intrusion necessary, then I agree.
Principle 3. Create robust data science models. It’s really unclear what they mean here. They use the example of putting ethnicity into models and ask whether one should be adding these kinds of features at all.
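One concrete reading of “robust,” sketched below with synthetic data (scikit-learn assumed; this is my illustration, not a method from the UK document): even after ethnicity is excluded as a feature, a simple probe model can test whether the remaining features still leak it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 500

# Synthetic data: 'zip_region' is built to correlate with the protected
# attribute; 'income' is not.
ethnicity = rng.integers(0, 2, n)
zip_region = ethnicity + rng.normal(0, 0.5, n)
income = rng.normal(50, 10, n)

# Probe: can the non-protected features predict ethnicity?
X = np.column_stack([zip_region, income])
scores = cross_val_score(LogisticRegression(), X, ethnicity, cv=5)

# Accuracy far above 0.5 means a proxy leaks the protected attribute
# even though ethnicity itself was never a feature.
print(f"probe accuracy: {scores.mean():.2f} (chance is ~0.50)")
```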
Principle 4. Be alert to public perceptions. This principle concerns the public’s opinion of how their data is used, which shifts over time. But basing an ethical framework on ever-shifting, ever-contorting perception or opinion is odd.
Where to go from here
Data scientists are sometimes arrogant people. We believe our mathematical aptitude, the size of our datasets, and the cleverness of our algorithms can save us, or at least compensate for falling down on the ethical implications of our work. We spend so much time on the difficult problems of how that we never ask the meaningful questions of why.
Other disciplines, like philosophy, have been tackling these issues for centuries. To start, data scientists can review the robust arguments that have been unfolding for hundreds of years and are still ongoing today.
If the data science community does not intervene to review, expand, elaborate, and elucidate a framework of ethical conduct, then there won’t be one. And people will be hurt, or worse.
The community is starting to have these conversations. If you want to participate, you can start by following DJ Patil on Twitter, expand the set of data scientists you follow from there, and join in on the profession-wide discussion of ethics. It’s important.
China's AI based social credit score system has already resulted in millions of citizens with 'low scores' being banned from traveling on planes & trains. This is a prime example of the misuse of AI technology.
— Siraj Raval (@sirajraval) May 9, 2018