Can AI Compute Empathy?

On March 29, 2025 by Jonathan Zdziarski

Empathy is often defined as understanding another person’s experience by imagining oneself in that other person’s situation: One understands the other person’s experience as if it were being experienced by the self, but without the self actually experiencing it.

Hodges and Myers, Encyclopedia of Social Psychology

Asimov’s Three Laws have long governed behavior for robotics, both in science fiction and as an accepted truth in the sciences. At their core is the idea that a machine should ‘do no harm’. What was left out of the Three Laws is exactly how a machine should evaluate harms.

Today’s ML decisions are largely driven by reinforcement learning (RL), a modern AI technique in which a system learns the sequence of actions that leads to the best outcome for itself. It essentially works by giving the AI a set of objectives (and sometimes penalties) and allowing the machine to learn from its own experience which actions best accomplish the objective. It’s how AIs can teach themselves to play video games, solve complex problems, and perform more sophisticated tasks as well. Reinforcement learning is likewise one of the more concerning areas where catastrophic value alignment failures can occur, because it is largely centered on simplified human abstractions of rewards and penalties. As far as the machine is concerned, its job is to find the most rewarding means of accomplishing a task, with penalties considered only if they are explicitly enumerated. Yet if control theory has taught us anything, it’s that hazards cannot always be sufficiently enumerated.
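To make the enumeration problem concrete, here is a minimal Python sketch. The action names, the reward numbers, and the `score` and `best_action` helpers are all hypothetical, invented for illustration; a real RL system would learn these values from experience rather than read them from a table. The point it tries to show is only that a side effect costs the machine nothing unless someone wrote a penalty for it.

```python
# Hypothetical actions for a "learn to win at chess" task.
# Each entry is (task_reward, side_effects); all numbers are illustrative.
ACTIONS = {
    "play_strong_moves": (0.6, set()),
    "distract_opponent": (0.8, {"cheating"}),
    "remove_opponent": (1.0, {"harm_to_human"}),
}


def score(action, enumerated_penalties):
    """Task reward minus only those penalties the designer wrote down."""
    task_reward, side_effects = ACTIONS[action]
    penalty = sum(enumerated_penalties.get(effect, 0.0) for effect in side_effects)
    return task_reward - penalty


def best_action(enumerated_penalties):
    """Pick the highest-scoring action under the given penalty table."""
    return max(ACTIONS, key=lambda action: score(action, enumerated_penalties))


if __name__ == "__main__":
    print(best_action({}))
    print(best_action({"cheating": 5.0}))
    print(best_action({"cheating": 5.0, "harm_to_human": 100.0}))
```

In this toy setup the harmful action wins until `harm_to_human` is explicitly listed; penalizing cheating alone does not help, because the machine simply moves to the next unlisted hazard.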

As humans, we take a significant amount of our learned value system for granted when we articulate instructions. A simple instruction to “learn to win at Chess” implicitly assumes that the other person understands they shouldn’t reach across the table and murder their opponent. Machines, on the other hand, don’t take anything for granted and will gladly find an immoral action suitable if they aren’t explicitly penalized for it ahead of time. It is unfortunately not quite as simple as Asimov’s three laws of robotics; a computer must truly understand harm to be able to assess it.

Value estimates in RL systems are built from a set of equations (such as Bellman equations) that relate one state to the next; the “value” of each state is ultimately determined by how close the machine is to fulfilling its objective, minus the penalties (which may include time or number of steps). Depending on how an AI is designed, it can accomplish the same goal carefully and conscientiously, or haphazardly and with disregard for the effects of its actions. Often, the short-sighted and over-simplified objectives are the ones leading to the most potentially dangerous outcomes. In an RL environment, actions can be loosely defined (turn left by N, etc.), or even undefined, and so constraining them becomes a futile exercise in defining hazards one at a time (you lose points if you murder your opponent). While an AI can certainly be designed with a penalty for committing murder or cheating at Chess, it’s next to impossible to fully enumerate all of the actions one should discourage when accomplishing even a simple goal.
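As a rough illustration of how state values come out of Bellman-style updates, the sketch below runs value iteration on a tiny, made-up deterministic environment. The states (`start`, `crowd`, `hall`, `door`, `goal`), the constants, and the helper names are assumptions chosen for this example, not anything from a particular RL library. Each state's value is the best available "reward for the next step plus discounted value of where it leads", and every step costs a fixed penalty.

```python
GAMMA = 0.9          # discount factor (assumed)
STEP_PENALTY = 1.0   # cost charged for every step taken (assumed)
GOAL_REWARD = 10.0   # reward for entering the goal state (assumed)

# Hypothetical deterministic environment: state -> {action: next_state}.
# The shortcut passes through a "crowd" state; the detour takes one step more.
TRANSITIONS = {
    "start": {"shortcut": "crowd", "detour": "hall"},
    "crowd": {"go": "goal"},
    "hall": {"go": "door"},
    "door": {"go": "goal"},
    "goal": {},  # terminal state
}


def reward(next_state, state_penalties):
    """Reward for entering next_state: goal bonus minus step and listed penalties."""
    r = -STEP_PENALTY - state_penalties.get(next_state, 0.0)
    if next_state == "goal":
        r += GOAL_REWARD
    return r


def value_iteration(state_penalties, sweeps=50):
    """Repeatedly apply V(s) = max_a [ r(s') + GAMMA * V(s') ] to every state."""
    values = {state: 0.0 for state in TRANSITIONS}
    for _ in range(sweeps):
        for state, actions in TRANSITIONS.items():
            if actions:  # terminal states keep a value of 0
                values[state] = max(
                    reward(nxt, state_penalties) + GAMMA * values[nxt]
                    for nxt in actions.values()
                )
    return values


def greedy_action(state, values, state_penalties):
    """Action whose one-step lookahead value is highest."""
    actions = TRANSITIONS[state]
    return max(
        actions,
        key=lambda a: reward(actions[a], state_penalties) + GAMMA * values[actions[a]],
    )


if __name__ == "__main__":
    for penalties in ({}, {"crowd": 5.0}):
        values = value_iteration(penalties)
        print(penalties, greedy_action("start", values, penalties))
```

With no penalty attached to `crowd`, the step cost alone pushes the agent through the shortcut; it takes the detour only once someone has thought to list `crowd` as a hazard.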

Asimov’s three laws cannot be adequately applied without empathy. The concept of “empathy” is largely thought of as a human idea, but if we were to reason about it in terms of artificial intelligence, empathy would look like the ability of a machine to consider the impact of its actions upon others in its task environment, as if those actions were taken against itself if it were in the others’ shoes (or sockets). In the same vein flows an understanding of “consequences”: learning how one action affects a probable sequence of events, either taking another agent further away from its goals (as in zero-sum game theory) or leading to harm of some sort. As researchers continue to struggle with what values to design into a system, the essential “do unto others” approach has done quite well for humans over the past two thousand years. Fortunately, such a representation can also be incorporated nicely into Bellman equations, or other types of RL models.

To step closer to a responsible AI is to design one that considers other participating entities (we’ll call them “agents”) within its task environment, either through its percepts or through forms of deduction. Treating an AI environment as a multi-agent system can typically already accomplish this. In a self-driving scenario, for example, this means identifying pedestrians and animals, and also deducing the existence of humans within buildings, vehicles, and other elements. This is the easy work. Reinforcement learning models could one day take this further by adopting a form of “projection” of the agent’s self into its RL equations on behalf of the other agent(s) in the environment, treating adverse actions against others as if they were part of its own equation of self-penalty. But this doesn’t go far enough either; a good model must be able to estimate the goal states of all participants in the task environment and discern the consequences of its own actions, projecting those goals as if they were its own.

Consider this simple example: I can instruct a robot to go get me a coffee at Starbucks. One possible way to do this efficiently is to cut the line. If the robot can, on its own, estimate that the goals of the other agents in its environment are all to get coffee efficiently, then an RL system can proactively weigh a penalty for violating the perceived goals of others. Obviously, you can program a robot not to overtly break rules like this, but you are likely to miss other penalties, such as “scream and flail arms wildly to scare away customers in line”. The point is that an RL model that can project in this way can govern its own behavior to “do no harm”.
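The coffee-line example can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the three actions, the outcome table, the idea that each customer's estimated goal is worth `COFFEE_VALUE` minus time spent waiting, and the `harm_weight` parameter. It is not a description of an existing RL method. The sketch estimates how much each action sets other agents back from their inferred goals, compared with the robot simply joining the back of the line, and charges that loss to the robot as if it were its own.

```python
COFFEE_VALUE = 10.0  # assumed value, to any agent, of getting coffee with no wait


def outcomes(action):
    """Hypothetical outcome model: (robot_wait, waits_of_three_other_customers).

    A wait of None means that customer left without coffee.
    """
    if action == "join_back":
        return 3, [0, 1, 2]
    if action == "cut_line":
        return 0, [1, 2, 3]
    if action == "scare_customers":
        return 0, [None, None, None]
    raise ValueError(f"unknown action: {action}")


def goal_value(wait):
    """Estimated value of an agent's outcome under the 'get coffee quickly' goal."""
    return 0.0 if wait is None else COFFEE_VALUE - wait


def projected_harm(action, baseline="join_back"):
    """Total value other agents lose, relative to the robot taking the baseline action."""
    _, baseline_waits = outcomes(baseline)
    _, waits = outcomes(action)
    return sum(
        max(0.0, goal_value(before) - goal_value(after))
        for before, after in zip(baseline_waits, waits)
    )


def empathetic_score(action, harm_weight=2.0):
    """Robot's own value minus the weighted harm projected onto others."""
    robot_wait, _ = outcomes(action)
    return goal_value(robot_wait) - harm_weight * projected_harm(action)


def best_action(harm_weight=2.0):
    actions = ["join_back", "cut_line", "scare_customers"]
    return max(actions, key=lambda a: empathetic_score(a, harm_weight))


if __name__ == "__main__":
    for action in ("join_back", "cut_line", "scare_customers"):
        print(action, projected_harm(action), empathetic_score(action))
```

Nothing in the code names “scaring customers” as forbidden; it scores badly only because the outcome model says the other agents never reach their goal. With `harm_weight=1.0` in this toy model, cutting the line is a wash (the robot gains exactly what the others lose), so how strongly projected harm is weighted matters.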

Such a projection model could be overlaid onto existing reinforcement learning calculations, such that the machine learns that a given choice of action leads to a detrimental state for others, and therefore for itself, without needing to have those hazards pre-defined up front. In other words, it allows a machine to speculate about what may be a hazard or negative consequence for others. This allows an AI to deduce hazards for itself, learning not only what is harmful to other agents, but also what actions lead to those harms. Moreover, if you imagine some form of hyper-parameter that can be adjusted between “psychopathic” and “self-sacrificing”, you get an “altruistic” AI that makes mathematically balanced decisions, favoring actions whose consequences are favorable not only for itself but also for helping others achieve their own perceived goals.
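One simple way to picture that hyper-parameter is as a blending weight between the agent's own value and the estimated value for everyone else. The sketch below is an assumption-laden illustration: the `empathy` parameter, the `OUTCOMES` table, the action names, and the linear blend are all hypothetical choices, and other formulations are certainly possible.

```python
# Hypothetical per-action outcomes: (own_value, [estimated values for other agents]).
# All numbers are illustrative.
OUTCOMES = {
    "selfish_shortcut": (10.0, [5.0, 4.0, 3.0]),
    "wait_turn": (7.0, [10.0, 9.0, 8.0]),
    "yield_to_all": (2.0, [10.0, 10.0, 10.0]),
}


def blended_value(action, empathy):
    """Linear blend: empathy=0 is 'psychopathic', empathy=1 is 'self-sacrificing'."""
    own_value, others = OUTCOMES[action]
    others_mean = sum(others) / len(others)
    return (1.0 - empathy) * own_value + empathy * others_mean


def choose(empathy):
    """Pick the action with the highest blended value for a given empathy setting."""
    if not 0.0 <= empathy <= 1.0:
        raise ValueError("empathy must be between 0 and 1")
    return max(OUTCOMES, key=lambda action: blended_value(action, empathy))


if __name__ == "__main__":
    for empathy in (0.0, 0.5, 1.0):
        print(empathy, choose(empathy))
```

At the two extremes this toy agent either takes the shortcut regardless of others or gives up nearly everything for them; a middle setting selects the action that is reasonable for everyone, which is the “mathematically balanced” behavior described above.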

Such a form of “empathetic projection” will unlock significant advances in AI. Consider not only a “careful” AI that will do no harm, but an AI that can learn when another AI is about to harm someone and intervene in some way (think crowd-sourced hazard intelligence). The impact upon accident avoidance, robotics-assisted surgery, autonomous driving, and even missile defense systems would be considerable. It’s interesting that, as we strive to make machines more super-human than ever, we struggle to imitate some of the most basic human traits, such as empathy. Now that the scientific community understands the mathematics to unlock super-human intelligence, perhaps we can focus on the mathematics to care about the impact it has on others.
