14 ethical dilemmas. Rather, we focus on his writing in the New York Times. We acknowledge that there is much more to providing ethical advice than what appears in the column The Ethicist; speculating beyond the advice in the column is simply outside the scope of our study. Having said this, we found it remarkable that our group of experts scored the perceived usefulness of the average piece of advice as 4.94 out of 7. In our view, this suggests that there is some inherent usefulness in this format of advice giving.

2. A system based on AI has no ability to empathize with those facing the dilemma. Amid the recent media attention that GPT-4 has gathered, many pundits have referred to LLMs as “autocomplete on steroids” or “a stochastic parrot”. Being reasonably aware of how this technology works, we have no problem with such descriptions. However, we propose that for the purpose of our study it should not matter HOW the technology functions but rather WHAT it produces as output. We acknowledge that LLMs have no understanding or representation of human feelings and cognition; this lies in the nature of any mathematical model. For example, the epidemiological models used during the Covid-19 pandemic to forecast new infections and support health policy decisions had no understanding of the death and suffering that an infection could bring. Yet they were still valuable tools. As the British statistician George Box famously put it: “All models are wrong, but some are useful.” In our view, GPT-4 has clearly demonstrated its usefulness by generating ethical advice of a quality comparable to that of the New York Times column.

3. Humans might not like to get ethical advice from AI. Indeed, we only measured the perceived usefulness of the advice; we do not know to what extent the advice would have been followed. The extent to which humans take advice from computers is actively studied in the literature.
One emerging result points to the “algorithm aversion” of many people, indicating that they would prefer advice provided (though not necessarily generated) by a human. In some domains, however, one might argue that an anonymous piece of technology makes it easier for a person to discuss certain topics. Either way, our focus is on the usefulness of the ethical advice, not on how it is delivered.

4. Our focus on a single outcome metric averaged across raters penalizes controversial advice. If we want ethical advice to be impactful, to change the behavior of the one seeking advice, and to open up a new perspective on a dilemma, it sometimes has to be eccentric, if not outright controversial. Such advice might perform poorly on average, but in some situations it could be extremely valuable. AI makes it possible to provide someone with multiple viewpoints generated from very different perspectives, just as many boards are designed to represent a mixture of opinions. Rather than providing one piece of advice that confirms the default moral viewpoint of the average
Can AI Provide Ethical Advice?