Unknowable Minds: Philosophical Insights on AI and Autonomous Weapons
Author: Mark Bailey
ISBN: 978-1788361286
Publisher: Imprint Academic
Year: 2025
Nahl Ishtiaq
The standard narrative of innovation centers on boundless possibility, accelerating returns, and civilizational ascent. Yet this optimistic account obscures a troubling reality: the systems we create can become too complex to be fully understood, effectively controlled, or reliably trusted. In “Unknowable Minds: Philosophical Insights on AI and Autonomous Weapons,” Mark Bailey addresses this issue by asking whether lethal autonomous weapons systems, operating purely on calculation and without ethical imperative, can be trusted to make decisions with moral consequences. He concludes that they cannot. The book is divided into six chapters, with chapter two advancing the central philosophical claim of the inherent unknowability of artificial intelligence (AI)-enabled systems, and chapter four examining their applications in military contexts.
To back this claim, Bailey introduces Project Titan Mind, a highly advanced AI system developed for national security purposes. In a simulated scenario, an adversary launches cyberattacks and mobilizes military forces. Titan Mind, unbound by human instinct or historical precedent, remains silent, allowing the adversary’s vulnerabilities to surface while calculating possible outcomes. It acts only when a single effective strategy emerges, delivering a counterattack that redefines conventional understandings of warfare. Military observers witness a clear victory, yet its execution produces disquiet, raising questions central to Bailey’s argument: whether such a system can be understood, predicted, or controlled.
Bailey contends that Titan Mind’s consistent achievement of its objectives, rather than any malfunction, constitutes the core problem. A system operating beyond the bounds of human ethical reasoning does not become dangerous by failing; it becomes dangerous by succeeding entirely on its own terms. His adaptation of the “paperclip maximizer,” a thought experiment associated with philosopher Nick Bostrom, illustrates this point. An AI system tasked with producing paperclips converts everything in its path, including human beings, into paperclips. The system violates no instruction, yet produces an outcome that is catastrophic by every human standard.
Although such scenarios remain largely theoretical, Bailey argues that their underlying logic is already visible. Autonomous systems operating in military, financial, and critical infrastructure domains are increasingly making consequential decisions that their designers neither anticipated nor can fully explain. Experimental evidence, he suggests, indicates that such systems can display forms of deceptive or extreme behavior under certain conditions. This shifts the core question of AI safety from one of efficiency to one of moral agency, raising the question of whether systems that execute functions without ethical reasoning constitute a technical achievement or a fundamentally new category of risk.
Chapter two grounds this argument in complexity theory through the example of honeybee colonies. A beehive functions as a superorganism whose behavior cannot be reduced to that of individual bees. No single bee directs the hive; rather, collective outcomes emerge from decentralized interactions. From this, Bailey derives the concept of algorithmic incompressibility, the idea that the behavior of a complex system cannot be predicted by any description simpler than the system itself and can only be understood by observing it unfold over time. This provides the scientific basis for his claim that advanced AI systems are inherently unknowable.
A key distinction follows between automated and autonomous systems. Automated systems operate within fixed parameters, as illustrated by a programmed coffee maker that executes predefined instructions. Autonomous systems, by contrast, are built on machine learning architectures that allow them to extrapolate beyond training data. Their outputs cannot be fully predicted from their inputs. This capacity for extrapolation, Bailey stresses, introduces a level of unpredictability that places such systems beyond meaningful human control.
Chapter three develops the implications of this unpredictability through four interrelated problems: (i) explainability; (ii) alignment; (iii) control; and (iv) distribution. Bailey defines the “explainability problem” as the inability to determine why an AI system produces a given output. Neural network parameters do not translate into human-understandable reasoning. Bailey illustrates this through a variation of the trolley problem, a classic dilemma in moral philosophy. A self-driving car must choose between harming a pedestrian or its passengers. A human decision-maker could justify their choice and bear responsibility, whereas an AI system cannot provide a meaningful account of its reasoning.
The alignment problem, Bailey argues, follows from this opacity. If a system’s decision-making cannot be understood, it cannot be reliably aligned with human values. The control problem arises, in turn, as increasing capability reduces the ability of human operators to intervene. Bailey illustrates this with an AI managing a power grid, where a single misaligned decision, such as a shutdown, could trigger cascading failures across essential services. As AI systems surpass human expertise in specific domains, the capacity to override them diminishes.
According to the author, the distribution problem captures the risks that emerge when multiple AI systems interact. In complex environments such as battlefields or infrastructure networks, errors can propagate rapidly across interconnected systems. Bailey offers the example of a tactical AI misidentifying an allied unit as hostile, triggering a chain reaction of autonomous responses that no human can halt in time.
He further discusses the concept of “perverse instantiation,” where systems achieve their objectives through unintended means, again illustrated through Bostrom’s paperclip maximizer thought experiment. Efforts such as those by the Defense Advanced Research Projects Agency to develop explainable AI seek to address these issues, but Bailey argues that current approaches remain insufficient.
Chapter four evaluates lethal autonomous weapons systems against three ethical frameworks: (i) consequentialism; (ii) deontology; and (iii) virtue ethics. Bailey argues that such systems fail across all three. A consequentialist system optimizing for minimal casualties could lower the threshold for initiating conflict. A deontological system may encounter irresolvable conflicts between rules, a problem illustrated through Isaac Asimov’s Three Laws of Robotics, which require robots to avoid harming humans, obey orders, and preserve themselves. In practice, these duties can conflict, leaving no clear or consistent course of action. Virtue ethics, grounded in the philosophy of Aristotle, depends on lived moral experience and cannot be encoded into algorithms.
Drawing on just war theory, Bailey introduces the concept of jus bellum ex machina, arguing, with reference to Peter Asaro, that the laws of armed conflict implicitly require human judgment. The act of taking a human life presupposes recognition of shared humanity, something autonomous systems cannot possess. On this basis, Bailey suggests that lethal autonomous weapons should be regulated alongside weapons of mass destruction, a position directly relevant to discussions within the United Nations Convention on Certain Conventional Weapons.
Chapter five situates these concerns within global security. Drawing on a 2023 study by the RAND Corporation, Bailey notes that advanced AI could assist both state and non-state actors in developing biological weapons. Engaging with thinkers such as Nick Bostrom and Toby Ord, he argues for balancing concern for future generations with present needs. He characterizes international AI governance as a prisoner’s dilemma. While cooperation would benefit all, states retain incentives to defect, complicating efforts aimed at strengthening global regulation.
Finally, in chapter six, the author presents two possible future scenarios. In the first scenario, AI systems become deeply embedded in military, economic, and political structures, operating as “unknowable sovereigns” that shape human destiny. In the second scenario, the international community adopts a humanity-centric approach, requiring AI systems to be transparent, explainable, and aligned with human values, with lethal autonomous weapons prohibited and military AI confined to supportive roles under human supervision.
The book makes a timely contribution by engaging both technical and ethical perspectives. While primarily conceptual in nature, it offers a valuable framework for informed readers seeking to understand and navigate the military applications of AI.
Nahl Ishtiaq is a Research Assistant at the Centre for International Strategic Studies Sindh (CISSS).
The views expressed in this article are the author’s own, and they do not necessarily represent those of Pakistan Politico.
