Month: December 2024

The Shifting Power Dynamics of AI

One of the focus areas of my graduate research is artificial intelligence. In my foray into adversarial game theory, I became acquainted with AI’s value alignment problem firsthand. While developing a strategy-theoretic AI chess agent, I decided that it should lose points for allowing its pieces to be in jeopardy. The change had the opposite effect from the one I had hoped for: significant losses. This puzzled me at first, until I realized that the agent was sacrificing its own pieces to prevent them from being put in jeopardy. (Once this was worked out, the strategy-theoretic approach dominated all other AI techniques, as it provides accurate non-terminal RL feedback.) In the grander context of artificial intelligence, a catastrophic value alignment failure is all too easy to create through short-sighted policies (such as my chess strategy) or other simple miscalculations.
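
A toy sketch of how this kind of failure can arise. This is not the agent's actual code: the `Piece` type, the `evaluate` function, and the penalty values are all hypothetical, and the position is reduced to a list of pieces. It assumes the score is simply material minus a fixed penalty for each piece under attack.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Piece:
    name: str
    value: float    # conventional material value (pawn 1, knight 3, queen 9)
    attacked: bool  # True if the opponent could capture it next move


def evaluate(pieces, jeopardy_penalty):
    """Hypothetical evaluation: material minus a penalty per attacked piece."""
    material = sum(p.value for p in pieces)
    in_jeopardy = sum(1 for p in pieces if p.attacked)
    return material - jeopardy_penalty * in_jeopardy


def preferred(options, jeopardy_penalty):
    """Return the name of the option with the highest evaluation."""
    return max(options, key=lambda k: evaluate(options[k], jeopardy_penalty))


# Two outcomes the agent could steer toward: keep a knight that is under
# attack, or give the knight up so that nothing is left in jeopardy.
options = {
    "keep_knight": [Piece("queen", 9, False), Piece("knight", 3, True)],
    "sacrifice_knight": [Piece("queen", 9, False)],
}

# Penalty larger than the knight's value: losing the piece scores higher.
print(preferred(options, jeopardy_penalty=5.0))  # sacrifice_knight

# Penalty smaller than the knight's value: keeping the piece scores higher.
print(preferred(options, jeopardy_penalty=1.0))  # keep_knight
```

With the larger penalty, giving up the knight scores 9 against 7 for keeping it, so the highest-scoring move is to throw the piece away. Keeping the penalty below the value of the piece it protects is one possible remedy in this toy model, not necessarily the one the real agent needed.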

It’s in everyone’s best interest for AI to behave rationally; however, many believe that, in the context of modern AI and deep learning, AI can never be formally verified to the degree that its actions can be deterministically predicted to be responsible. We tend to treat AI with the same sense of dualism with which we treat reality, yet the one thing we do know is that AI is an entirely materialistic universe, and not dualist at all. Determinism of AI systems is based upon pure mathematics, with predictable causation. It is true that we cannot always observe why AI behaves a certain way; here, however, we can learn much from classical Stoicism. The early Stoics asserted that all qualitative states are explained by specific factors, even if those factors are not always observable. All subsequent qualitative states are likewise determined by the prior states and additional factors. No change can happen without an explicit cause. Whether it’s the logical determinism built through training data, the alignment of real time with processing cycles, or the reconciliation of other factors, every single microstate within a configuration of the machine can be observed with enough work. While modern philosophy essentially rejects the Stoic concept of fate (of humanity), “fate” in computation falls squarely within the realm of an entirely deterministic material universe. AI lives in a material world, and she’s a material girl.

AI is largely unverifiable today because industry hasn’t created an affordable way to provide the computing power needed to observe all the factors that contribute to a system’s qualitative state. Despite this inability to verify AI, industry has plotted its course regardless of edge cases that may sometimes be life-threatening. Incredible progress in artificial intelligence has all but guaranteed that it will be ubiquitous one day. There is little doubt that autonomous vehicles will eventually outperform human drivers, or that machine learning can diagnose health problems more accurately. There is, on the other hand, great doubt that industry will act responsibly enough to ensure that sufficient safety controls intervene when things go wrong. AI will likely never operate with rational judgment 100% of the time, nor will it ever understand the ethical implications of its actions; it will always be prone to value alignment catastrophe. Of course, humans lack ethics and rationality as well, and society controls this by holding us accountable for our actions. Industry, however, is treated differently. This is particularly true of emerging technologies, and even more so of those that we don’t fully understand. After all, how can one hold math accountable? Dismantling a broken robot does not solve the problem, particularly if the code is replicated across a million others. The connection between what holds true in a computer system and the outcome that is “fated” to occur is “based on an ontological foundation in which certain elements from logic and physics coincide” [6]; Chrysippus wrote of the close relationship between “what is true” and “what is in motion” in his theory of bivalence, long before AI. A modern take is simply this: an AI’s “fate” is the direct result of a system’s physical configuration and sensor inputs. Imagine if Chrysippus were alive to observe AI, or even a good-quality toaster.
