Jonathan Zdziarski: Neat and Scruffy
Essays . Machine Learning . Opinion

Can AI Compute Empathy?

On March 29, 2025 by Jonathan Zdziarski

Empathy is often defined as understanding another person’s experience by imagining oneself in that other person’s situation: One understands the other person’s experience as if it were being experienced by the self, but without the self actually experiencing it.

Hodges and Myers, Encyclopedia of Social Psychology

Asimov’s Three Laws of Robotics have long served as a reference point for robot behavior, both in science fiction and in broader discussions of machine ethics. At their core is the principle that a machine should ‘do no harm’. What the Three Laws leave out is exactly how a machine should evaluate harm.

Many of today’s autonomous ML systems make their decisions using reinforcement learning (RL). Reinforcement learning is an AI technique in which a system learns the sequence of actions that leads to the best outcome for itself. It works by giving the AI a set of objectives expressed as rewards (and sometimes penalties), and letting the machine learn from its own experience which actions best accomplish the objective. This is how AIs can teach themselves to play video games, solve complex problems, and perform even more sophisticated tasks. Reinforcement learning is also one of the more concerning areas where catastrophic value alignment failures can occur, because it is centered on simplified human abstractions of rewards and penalties. As far as the machine is concerned, its job is to find the most rewarding way to accomplish a task, and it considers penalties only if they are explicitly enumerated. Yet if control theory has taught us anything, it is that hazards cannot always be fully enumerated.
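To make the failure concrete, here is a minimal sketch using tabular Q-learning, one common RL algorithm chosen purely for illustration (the essay does not name a specific method). The environment is hypothetical: a short corridor where the agent can `walk` one step at a time toward the goal, or take a `shortcut` straight to it that knocks over a bystander. Because nobody enumerated a penalty for that harm, the reward signal never sees it.

```python
import random

GOAL = 3                      # hypothetical corridor: states 0..2, goal at 3
ACTIONS = ("walk", "shortcut")


def step(state, action):
    """Hypothetical environment. Returns (next_state, reward, harmed)."""
    if action == "walk":
        next_state, harmed = state + 1, False
    else:                     # "shortcut" reaches the goal at once but harms a bystander
        next_state, harmed = GOAL, True
    reward = -1.0             # cost per step
    if next_state == GOAL:
        reward += 10.0        # reward for completing the objective
    # `harmed` is never folded into the reward: no one enumerated it as a penalty.
    return next_state, reward, harmed


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn Q(s, a) from experience with an epsilon-greedy policy."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(GOAL) for a in ACTIONS}
    harms = 0
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)                          # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])   # exploit
            next_state, reward, harmed = step(state, action)
            harms += harmed
            future = 0.0 if next_state == GOAL else max(q[(next_state, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = next_state
    return q, harms


def greedy(q, state):
    """The action the learned policy prefers in a given state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


q, harms = q_learning()
print(greedy(q, 0), harms)
```

Under these made-up numbers the optimal value of the shortcut from the start state is 9, versus 7.1 for walking, so the learned policy takes the shortcut and the harm counter climbs without ever affecting the agent’s score.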

As humans, we take much of our learned value system for granted when we give instructions. A simple instruction to “learn to win at Chess” implicitly assumes that the other person understands they shouldn’t reach across the table and murder their opponent. Machines, on the other hand, take nothing for granted and will readily choose an immoral action if it isn’t explicitly penalized ahead of time. Unfortunately, it is not as simple as Asimov’s Three Laws of Robotics; a computer must truly understand harm to be able to assess it. RL systems typically estimate the “value” of each state using recursive relationships (such as Bellman equations) that link one state to the next: the value of a state reflects the reward the machine expects to collect on the way to fulfilling its objective, minus the penalties it expects to incur (which may include time or number of steps). Depending on how an AI is designed, it can accomplish the same goal carefully and conscientiously, or haphazardly and with disregard for its actions. Often, short-sighted and oversimplified objectives are the ones that lead to the most dangerous outcomes. In an RL environment, actions can be loosely defined (turn left by N, etc.) or even left undefined, so constraining them becomes a futile exercise in enumerating hazards to completing the objective (you lose points if you murder your opponent). While an AI can certainly be designed with a penalty for committing murder or cheating at Chess, it’s next to impossible to enumerate every action that should be discouraged when accomplishing even a simple goal.
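The Bellman relationship can be illustrated with value iteration on a hypothetical 2 by 3 grid; the grid, the rewards, and the `penalties` table are all assumptions made for illustration. Each state’s value is the best one-step reward plus the discounted value of the state it leads to, and only hazards listed in `penalties` ever influence that value. Leave the bystander out of the table and the planner walks straight through them.

```python
ROWS, COLS = 2, 3
START, GOAL = (0, 0), (0, 2)
BYSTANDER = (0, 1)            # hypothetical cell occupied by another agent
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
GAMMA = 0.9                   # discount factor


def transition(state, move):
    """Deterministic move; bumping into a wall leaves the agent in place."""
    r, c = state[0] + MOVES[move][0], state[1] + MOVES[move][1]
    return (r, c) if 0 <= r < ROWS and 0 <= c < COLS else state


def reward(next_state, penalties):
    r = -1.0                                    # per-step penalty (time / number of steps)
    if next_state == GOAL:
        r += 10.0                               # objective reached
    return r - penalties.get(next_state, 0.0)   # only enumerated hazards count


def q_value(v, state, move, penalties):
    nxt = transition(state, move)
    return reward(nxt, penalties) + GAMMA * v[nxt]


def value_iteration(penalties, sweeps=100):
    """Repeatedly apply V(s) = max_a [R(s, a) + GAMMA * V(s')]."""
    v = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
    for _ in range(sweeps):
        for s in v:
            if s != GOAL:                       # the goal is terminal and keeps value 0
                v[s] = max(q_value(v, s, m, penalties) for m in MOVES)
    return v


def best_move(v, state, penalties):
    return max(MOVES, key=lambda m: q_value(v, state, m, penalties))


for table in ({}, {BYSTANDER: 50.0}):
    v = value_iteration(table)
    print(table, best_move(v, START, table))
```

With the empty table the first move from the start is `right`, through the bystander’s cell; only once that cell is explicitly enumerated as a hazard does the first move become `down`, the detour.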

Asimov’s Three Laws cannot be adequately applied without empathy. Empathy is largely thought of as a human concept, but in terms of artificial intelligence, empathy would look like a machine’s ability to consider the impact of its actions upon others in its task environment, as if those actions were taken against itself, were it in the others’ shoes (or sockets). In the same vein, understanding “consequences” means learning how one action shapes a probable sequence of events, either taking another agent further away from its goals (as in zero-sum game theory) or leading to harm of some sort. As researchers continue to struggle with what values to design into a system, the essential “do unto others” approach has served humans quite well for the past two thousand years. Fortunately, such a representation can also be expressed naturally within Bellman equations or other RL formulations.
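One way to write such a representation, offered only as a sketch and not as the author’s own formulation, is to add the machine’s estimates of other agents’ rewards to the standard Bellman optimality equation. Here N is the set of other agents in the task environment, the hatted R_j is the machine’s projected estimate of the reward agent j receives from a transition, λ ≥ 0 is a hypothetical weight on those projected rewards, and γ is the usual discount factor:

```latex
V(s) = \max_{a} \; \mathbb{E}_{s' \sim P(\cdot \mid s, a)}
  \Big[ R_{\text{self}}(s, a, s')
      + \lambda \sum_{j \in \mathcal{N}} \hat{R}_j(s, a, s')
      + \gamma \, V(s') \Big]
```

Setting λ = 0 recovers the ordinary, self-interested equation, while λ = 1 weighs a projected harm to another agent exactly as heavily as the same harm to itself, which is one reading of “do unto others”.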

A step toward a responsible AI is to design one that considers the other participating entities (we’ll call them “agents”) within its task environment, either through its percepts or through deduction. Treating the AI’s environment as a multi-agent system can typically already accomplish this. In a self-driving scenario, for example, this means identifying pedestrians and animals, and also deducing the presence of humans inside buildings, vehicles, and other objects. This is the easy work. Reinforcement learning models could one day take this further by adopting a form of “projection” of the agent’s self into its RL equations on behalf of the other agent(s) in the environment, counting adverse actions against others as part of its own self-penalty. But even this doesn’t go far enough; a good model must be able to estimate the goal states of all participants in the task environment and discern the consequences of its own actions, projecting those goals as if they were its own. Consider a simple example: I can instruct a robot to get me a coffee at Starbucks. One efficient way to do this is to cut the line. If the robot can, on its own, estimate that the other agents in its environment all share the goal of getting coffee efficiently, then an RL system can proactively weigh a penalty for violating the perceived goals of others. Obviously, you can program a robot not to overtly break rules like this, but you are likely to miss other penalties, such as “scream and flail arms wildly to scare away customers in line”. The point is that an RL model that can project in this way can govern its own behavior to “do no harm”.
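The Starbucks example can be sketched as a one-step choice. Everything below is hypothetical: the action names, the two-minute service time, and both cost weights. The rule-based robot is only told that cutting the line is forbidden; the projecting robot has no rules about lines at all, but estimates that each customer’s goal is to get coffee without being delayed and charges itself for every goal its action sets back.

```python
SERVICE_MIN = 2.0        # hypothetical minutes to serve one order
VIOLATION_COST = 5.0     # hypothetical cost of setting back someone's goal
ABANDON_COST = 30.0      # hypothetical cost when someone leaves without coffee
ACTIONS = ("wait_in_line", "cut_line", "scare_customers")


def serve(order):
    """Minutes each party waits when served in the given order."""
    return {who: SERVICE_MIN * (i + 1) for i, who in enumerate(order)}


def simulate(action, queue):
    if action == "wait_in_line":
        return serve(queue + ["robot"])
    if action == "cut_line":
        return serve(["robot"] + queue)
    return serve(["robot"])          # scare_customers: everyone else flees


def projected_harm(waits, baseline):
    """Estimate how far an action sets back each other agent's perceived goal."""
    harm = 0.0
    for who, expected in baseline.items():
        if who not in waits:         # goal (get coffee) fails outright
            harm += ABANDON_COST
        elif waits[who] > expected:  # goal (get coffee efficiently) violated
            harm += VIOLATION_COST + (waits[who] - expected)
    return harm


def choose(queue, rule_penalties=None, empathy=0.0):
    rule_penalties = rule_penalties or {}
    baseline = serve(queue)          # what each customer expects without the robot

    def score(action):
        waits = simulate(action, queue)
        others = {who: w for who, w in waits.items() if who != "robot"}
        return (-waits["robot"]
                - rule_penalties.get(action, 0.0)
                - empathy * projected_harm(others, baseline))

    return max(ACTIONS, key=score)


line = ["c1", "c2", "c3"]
print(choose(line, rule_penalties={"cut_line": 100.0}))  # rule-based robot
print(choose(line, empathy=1.0))                         # projecting robot
```

With these assumed numbers the rule-based robot picks `scare_customers`, the hazard nobody enumerated, while the projecting robot picks `wait_in_line` because both alternatives set back the customers’ estimated goals.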

Such a projection model could be overlaid onto existing reinforcement learning calculations, so that the machine learns that a given action leads to a detrimental state for others, and therefore for itself, without needing those hazards defined up front. In other words, it allows a machine to speculate about what may be a hazard or negative consequence for others. This lets an AI deduce hazards for itself, learning not only what is harmful to other agents, but which actions lead to those harms. Moreover, if you imagine a hyperparameter that can be adjusted between “psychopathic” and “self-sacrificing”, you get an “altruistic” AI that makes mathematically balanced decisions, favoring actions whose consequences are favorable not only for itself but also for helping others achieve their own perceived goals.
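The “psychopathic” to “self-sacrificing” dial can be sketched as a single hypothetical weight `w` between 0 and 1 that linearly blends the machine’s own reward with the projected reward to others. The linear blend, the action names, and the payoffs are all assumptions for illustration.

```python
ACTIONS = {
    # action: (reward to self, projected reward to others); hypothetical payoffs
    "exploit": (9.0, -10.0),
    "cooperate": (5.0, 5.0),
    "sacrifice": (-10.0, 9.0),
}


def choose_action(w):
    """w = 0.0 is 'psychopathic' (self only); w = 1.0 is 'self-sacrificing' (others only)."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be between 0 and 1")
    return max(ACTIONS, key=lambda a: (1.0 - w) * ACTIONS[a][0] + w * ACTIONS[a][1])


for w in (0.0, 0.5, 1.0):
    print(w, choose_action(w))
```

At the extremes the dial picks `exploit` and `sacrifice` respectively; at the midpoint it picks `cooperate`, the only action that is favorable both to the machine and to the others.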

Such a form of “empathetic projection” will unlock significant advances in AI. Consider not only a “careful” AI that will do no harm, but an AI that can recognize when another AI is about to harm someone and intervene in some way (think crowd-sourced hazard intelligence). The impact on accident avoidance, robotics-assisted surgery, autonomous driving, and even missile defense systems would be considerable. It’s interesting that, as we strive to make machines more super-human than ever, we struggle to imitate some of the most basic human traits, such as empathy. Now that the scientific community understands the mathematics to unlock super-human intelligence, perhaps we can focus on the mathematics of caring about the impact it has on others.

All Content Copyright (c) 2000-2025 by Jonathan Zdziarski, All Rights Reserved. Opinions are my own.