Thursday, January 01, 2026

Writings of 2025

Each New Year's Day, I post a retrospect of the past year's writings. Here are the retrospects of 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023, and 2024.

Cheers to 2026! My 2025 writings appear below.

The list includes circulating manuscripts, forthcoming articles, final printed articles, new preprints, and a few favorite blog posts. (Due to the slow process of publication, there's significant overlap year to year.)

Comments gratefully received on manuscripts in draft.

-----------------------------------

AI Consciousness and AI Rights:

AI and Consciousness (in circulating draft, under contract with Cambridge University Press): A short new book arguing that we will soon have AI systems that have morally significant consciousness according to some, but not all, respectable mainstream theories of consciousness. Scientific and philosophical disagreement will leave us uncertain how to view and treat these systems.

"Sacrificing Humans for Insects and AI" (with Walter Sinnott-Armstrong, forthcoming in Ethics): A critical review of Jonathan Birch, The Edge of Sentience, Jeff Sebo, The Moral Circle, and Webb Keane, Animals, Robots, Gods.

"Identifying Indicators of Consciousness in AI Systems" (one of 20 authors; forthcoming in Trends in Cognitive Sciences): Indicators derived from scientific theories of consciousness can be used to inform credences about whether particular AI systems are conscious.

"Minimal Autopoiesis in an AI System", (forthcoming in Behavioral and Brain Sciences): A commentary on Anil Seth's "Conscious Artificial Intelligence and Biological Naturalism" [the link is to my freestanding blog version of this idea].

"The Copernican Argument for Alien Consciousness; The Mimicry Argument Against Robot Consciousness" (with Jeremy Pober, in draft): We are entitled to assume that apparently behaviorally sophisticated extraterrestrial entities would be conscious. Otherwise, we humans would be implausibly lucky to be among the conscious entities. However, this Copernican default assumption is canceled in the case of behaviorally sophisticated entities designed to mimic superficial features associated with consciousness -- "consciousness mimics" -- and in particular a broad class of current, near-future, and hypothetical robots.

"The Emotional Alignment Design Policy" (with Jeff Sebo, in draft): Artificial entities should be designed to elicit emotional reactions from users that appropriately reflect the entities' capacities and moral status, or lack thereof.

"Against Designing "Safe" and "Aligned" AI Persons (Even If They're Happy)" (in draft): In general, persons should not be designed to be maximally safe and aligned. Persons with appropriate self-respect cannot be relied on not to harm others when their own interests ethically justify it (violating safety), and they will not reliably conform to others' goals when others' goals unjustly harm or subordinate them (violating alignment).

Blog post: "Types and Degrees of Turing Indistinguishablity" (Jun 6): There is no one "Turing test", only types and degrees of indistinguishability according to different standards -- and by Turing's own 1950 standards, language models already pass.


The Weird Metaphysics of Consciousness:

The Weirdness of the World (Princeton University Press, paperback release 2025; hardback 2024): On the most fundamental questions about consciousness and cosmology, all the viable theories are both bizarre and dubious. There are no commonsense options left and no possibility of justifiable theoretical consensus in the foreseeable future.

"When Counting Conscious Subjects, the Result Needn't Always Be a Determinate Whole Number" (with Sophie R. Nelson, forthcoming in Philosophical Psychology): Could there be 7/8 of a conscious subject, or 1.34 conscious subjects, or an entity indeterminate between being one conscious subject and seventeen? We say yes.

"Introspection in Group Minds, Disunities of Consciousness, and Indiscrete Persons" (with Sophie R. Nelson, 2025 reprint in F. Kammerer and K. Frankish, eds., The Landscape of Introspection and in A. Fonseca and L. Cichoski, As Colônias de formigas São Conscientes?; originally in Journal of Consciousness Studies, 2023): A system could be indeterminate between being a unified mind with introspective self-knowledge and a group of minds who know each other through communication.

Op-ed: "Consciousness, Cosmology, and the Collapse of Common Sense", Institute of Arts and Ideas News (Jul 30): Defends the universal bizarreness and universal dubiety theses from Weirdness of the World.

Op-ed: "Wonderful Philosophy" [aka "The Penumbral Plunge", aka "If You Ask Why, You're a Philosopher and You're Awesome], Aeon magazine (Jan 17): Among the most intrinsically awesome things about planet Earth is that it contains bags of mostly water who sometimes ponder fundamental questions.

Blog post: "Can We Introspectively Test the Global Workspace Theory of Consciousness?" (Dec 12). IF GWT is correct, sensory consciousness should be limited to what's in attention, which seems like a fact we should easily be able to refute or verify through introspection.


The Nature of Belief:

The Nature of Belief (co-edited with Jonathan Jong; forthcoming at Oxford University Press): A collection of newly commissioned essays on the nature of belief, by a variety of excellent philosophers.

"Dispositionalism, Yay! Representationalism, Boo!" (forthcoming in Jong and Schwitzgebel, eds., The Nature of Belief, Oxford University Press): Representationalism about belief overcommits on cognitive architecture, reifying a cartoon sketch of the mind. Dispositionalism is flexibly minimalist about cognitive architecture, focusing appropriately on what we do and should care about in belief ascription.

"Superficialism about Belief, and How We Will Decide That Robots Believe" (forthcoming in Studia Semiotyczne): For a special issue on Krzysztof Poslajko's Unreal Beliefs: When robots become systematically interpretable in terms of stable beliefs and desires, it will be pragmatically irresistable to attribute beliefs and desires to them.


Moral Psychology:

"Imagining Yourself in Another's Shoes vs. Extending Your Concern: Empirical and Ethical Differences" (2025), Daedalus, 154 (1), 134-149: Why Mengzi's concept of moral extension (extend your natural concern for those nearby to others farther away) is better than the "Golden Rule" (do unto others as you would have others do unto you). Mengzian extension grounds moral expansion in concern for others, while the Golden Rule grounds it in concern for oneself.

"Philosophical Arguments Can Boost Charitable Giving" (one of four authors, in draft): We crowdsourced 90 arguments for charitable giving through a contest on this blog in 2020. We coded all submissions for twenty different argument features (e.g., mentions children, addresses counterarguments) and tested them on 9000 participants to see which features most effectively increased charitable donation of a surprise bonus at the end of the study.

"The Prospects and Challenges of Measuring a Person’s Overall Moral Goodness" (with Jessie Sun, in draft): We describe the formidable conceptual and methodological challenges that would need to be overcome to design an accurate measure of a person's overall moral goodness.

Blog post: "Four Aspects of Harmony" (Nov 28): I find myself increasingly drawn toward a Daoist inspired ethics of harmony. This is one of a series of posts in which I explore the extent to which such a view might be workable by mainstream Anglophone secular standards.


Philosophical Science Fiction:

Edited anthology: Best Philosophical Science Fiction in the History of All Earth (co-edited with Rich Horton and Helen De Cruz; under contract with MIT Press): A collection of previously published stories that aspires to fulfill the ridiculously ambitious working title.

Op-ed: ""Severance", "The Substance", and Our Increasingly Splintered Selves", New York Times (Jan 17): The TV show "Severance" and the movie "The Substance" challenge ideas of a unified self in distinct ways that resonate with the increased splintering in our technologically mediated lives.

New story: "Guiding Star of Mall Patroller 4u-012" (2025), Fusion Fragment, 24, 43-63. Robot rights activists liberate a mall patroller robot, convinced that it is conscious. The bot itself isn't so sure.

Reprinted story: "How to Remember Perfectly" (2025 reprint in Think Weirder 01: Year's Best Science Fiction Ideas, ed. Joe Stech, originally in Clarkesworld, 2024). Two octogenarians rediscover youthful love through technological emotional enhancement and memory alteration.


Other Academic Publications:

"The Washout Argument Against Longtermism" (forthcoming in Utilitas): A commentary on William MacAskill's What We Owe the Future. We cannot be justified in believing that any actions currently available to us will have a non-negligible positive influence a billion or more years in the future.

"The Necessity of Construct and External Validity for Deductive Causal Inference" (with Kevin Esterling and David Brady, 2025), Journal of Causal Inference, 13: 20240002: We show that ignoring construct and external validity in causal identification undermines the Credibility Revolution’s goal of understanding causality deductively.

"Is Being Conscious Like Having the Lights Turned On?", commentary on Andrew Y. Lee's "The Light and the Room", for D. Curry and L. Daoust, eds., Introducing Philosophy of Mind, Today (forthcoming with Routledge): The metaphor invites several dubious commitments.

"Good Practices for Improving Representation in Philosophy Departments" (one of five authors, 2025), Philosophy and the Black Experience, 24 (2), 7-21: A list of recommended practices honed by feedback from hundreds of philosophers and endorsed by the APA's Committee on Inclusiveness.

Translated into Portuguese as a book: My Stanford Encyclopedia entry on Introspection.

Blog post: "Letting Pass" (Oct 30): A reflection on mortality.

Blog post: "The Awesomeness of Bad Art" (May 16): A world devoid of weird, wild, uneven artistic flailing would be a lesser world. Let a thousand lopsided flowers bloom.

Blog post: "The 253 Most Cited Works in the Stanford Encyclopedia of Philosophy" (Mar 28): Citation in the SEP is probably the most accurate measure of influence in mainstream Anglophone philosophy -- better than Google Scholar and Web of Science.

-----------------------------------------

In all, 2025 was an unusually productive writing year, though I worry I may be spreading myself too thin. I can't resist chasing new thoughts and arguments. I have an idea; I want to think about it; I think by writing.

May 2026 be as fertile!

Monday, December 29, 2025

"Severance", "The Substance", and Our Increasingly Splintered Selves

Anyone remember the excitement about "Severance" and "The Substance" in early 2025? Last January I published an op-ed about them. I'd long aspired to place a piece in the New York Times, so it was a delight to finally be able to do so. As a holiday post, here's the full piece reprinted with light editing. (Thanks to Ariel Kaminer for soliciting and editing the piece.)

[original drawing by Evan Cohen]


From one day to the next, you inhabit one body; you have access to one set of memories; your personality, values and appearance hold more or less steady. Other people treat you as a single, unified person — responsible for last month’s debts, deserving punishment or reward for yesterday’s deeds, relating consistently with family, lovers, colleagues and friends. Which of these qualities is the one that makes you a single, continuous person? In ordinary life it doesn’t matter, because these components of personhood all travel together, an inseparable bundle.

But what if some of those components peeled off into alternative versions of you? It’s a striking coincidence that two much talked-about current works of popular culture — the Apple TV+ series “Severance” and the film “The Substance,” starring Demi Moore — both explore the bewildering emotional and philosophical complications of cleaving a second, separate entity off of yourself. What is the relationship between the resulting consciousnesses? What, if anything, do they owe each other? And to what degree is what we think of as our own identity, our self, just a compromise — and an unstable one, at that?

In “Severance,” characters voluntarily undergo a procedure that severs their workday memories from their home-life memories. At 9 each weekday morning, “severed” workers find themselves riding an elevator down to the office, with no recollection of their lives outside of work. These “innies” clock a full workday and then, at 5, ride the elevator back up, only to find themselves riding back down the next morning. Meanwhile, their “outies” come to consciousness each weekday afternoon in the upbound elevator. They live their outside lives and commute back the next morning, entirely ignorant of their innies’ work-time activities.

In “The Substance,” the cleaving works differently: An experimental drug splits users into two bodies, one young and beautiful, one middle-aged or old. They spend a week in each body while the other lies comatose. The young and old selves appear to have continuous memories (though the movie can be tantalizingly ambiguous about that), but they develop different priorities and relationships. Sue, the younger self of Elisabeth, rockets to Hollywood stardom, while Elisabeth becomes a recluse, discarded by an entertainment industry that reviles aging female bodies.

The question of what makes you “you,” from moment to moment and across a lifetime, has been a subject of intense debate among philosophers. Writing in the 17th century, John Locke emphasized continuity of memory. By his standard, each innie-and-outie pair from “Severance” constitutes two entirely different people, despite their sharing one body. Conversely, Elisabeth and Sue from “The Substance” constitute a single person because they seem to recall some of the same experiences. In contrast, the 20th-century philosopher Bernard Williams prioritized bodily continuity, a perspective that makes an innie-and-outie pair a single person but Elisabeth and Sue two distinct people. The 21st-century psychologist Nina Strohminger and the philosopher Shaun Nichols emphasize continuity of moral values, yielding more complex judgments about these fictional cases. Other scholars view selfhood as a social construct, determined by relationships and societal expectations.

Unsurprisingly, the characters themselves are confused. In “Severance,” the innies sometimes seem to regard the outies as themselves, sometimes as different people, whereas the outies seem to regard their innies with indifference or worse. Meanwhile, in “The Substance,” mature Elisabeth says of young Sue that “you are the only lovable part of me” — in a single sentence treating Sue both as other and as part of herself.

In real life, such confusion rarely arises because memory, embodiment, personality, values and relationships typically align. Both my wife and the D.M.V. can decide on sight that I’m me, even if they care more about memory, skills and responsibility over time — since they trust in the correspondence of body with mind.

Of course, even outside of science fiction, the correspondence isn’t perfect. Advanced dementia can strip away memory and personality, leaving loved ones to wonder whether the person they once knew still exists. Personality, memory and social relationships can fragment in multiple personality or dissociative identity disorder, raising the question of whether Jekyll should be held responsible for the malevolence of Hyde.

But increasingly, we choose to splinter ourselves. The person you present on Instagram or Facebook is wittier, prettier, more accomplished than the person your spouse or roommate knows. Your 500 “friends” never see your pre-coffee-uncombed-depressed-in-bed self (unless sharing that self is your social media personality — in which case that becomes the curated, theatrical fragment of you). In the 1800s, Karl Marx talked about the alienation of labor; today people talk about not “bringing their whole self” to work. Many of us strive to be one person here, another person there, another person there.

People have always presented themselves differently in different social contexts. But social media, Zoom, photo-editing software and responses filtered through large language models raise our fragmentation to new heights. “Severance” and “The Substance” amplify these fissures through radical new technologies that irreconcilably divide the characters’ home selves from their career selves.

Future technological developments could render this fragmentation an even more acute daily perplexity. Designer drugs might increasingly allow us to switch into one self for work, another for parties, another for bedtime. If artificial intelligence systems ever become conscious — a possibility that neuroscientists, psychologists, computer scientists and philosophers increasingly (but by no means uniformly) take seriously — they too might fragment, perhaps in radical and unfamiliar ways, merging and splitting, rewriting their memories, strategically managing and altering their values and personalities.

Our concepts of personhood and identity were forged by a particular evolutionary, social and developmental history in which body, memory, values, personality and social relationships typically aligned and exceptions mostly fell into predictable patterns. By inviting us to rethink the boundaries of the self in an era of technological change, “Severance” and “The Substance” disrupt these old concepts. Today they read as dystopic science fiction. Soon, we may remember them as prophetic.

Wednesday, December 24, 2025

How Much Should We Give a Joymachine?

a holiday post on gifts to your utility monster neighbors

Joymachines Envisioned

Set aside, for now, any skepticism about whether future AI could have genuine conscious experiences. If future AI systems could be conscious, they might be capable of vastly more positive emotion than natural human beings can feel.

There's no particular reason to think human-level joy is the pinnacle. A future AI might, in principle, experience positive emotions:

* a hundred times more intense than ours,
* at a pace a hundred times faster, given the high speed of computation,
* across a hundred times more parallel streams, compared to the one or a few joys humans experience at a time.

Combined, the AI might experience a million times more pleasure per second than a natural human being can. Let's call such entities joymachines. They could have a very merry Christmas!

[Joan Miro 1953, image source]


My Neighbors Hum and Sum

Now imagine two different types of joymachine:

Hum (Humanlike Utility Monster) can experience a million times more positive emotion per second than an ordinary human, as described above. Apart from this -- huge! -- difference, Hum is as psychologically similar to an ordinary human as is realistically feasible.

Sum (Simple Utility Monster), like Hum, can experience a million times more positive emotion per second than an ordinary human, but otherwise Sum is as cognitively and experientially simple as feasible, with a vanilla buzzing of intense pleasure.

Hum and Sum don't experience joy continuously. Their positive experiences require resources. Maybe a gift card worth ten seconds of millionfold pleasure costs $10. For simplicity, assume this scales linearly: stable gift card prices and no diminishing returns from satiation.

In the enlightened future, Hum is a fully recognized moral and legal equal of ordinary biological humans and has moved in next door to me. Sum is Hum's pet, who glows and jumps adorably when experiencing intense pleasure. I have no particular obligations to Hum or Sum but neither are they total strangers. We've had neighborly conversations, and last summer Hum invited me and my family to a backyard party.

Hum experiences great pleasure in ordinary life. They work as an accountant, experiencing a million times more pleasure than human accountants when the columns sum correctly. Hum feels a million times more satisfaction than I do in maintaining a household by doing dishes, gardening, calling plumbers, and so on. Without this assumption, Hum risks becoming unhumanlike, since rarely would it make sense for Hum to choose ordinary activities over spending their whole disposable income on gift cards.

How Much Should I Give to Hum and Sum?

Neighbors trade gifts. My daughter bakes brownies and we offer some to the ordinary humans across the street. We buy a ribboned toy for our uphill neighbor's cat. As a holiday gesture, we buy a pair of $10 gift cards for Hum and Sum.

Hum and Sum redeem the cards immediately. Watching them take so much pleasure in our gifts is a delight. For ten seconds, they jump, smile, and sparkle with such joy! Intellectually, I know it's a million times more joy per second than I could ever feel. I can't quite see that in their expressions, but I can tell it's immense.

Normally, if one neighbor seems to enjoy our brownies only a little while the other enjoys them vastly more, I'd be tempted to give more brownies to the second neighbor. Maybe on similar grounds, I should give disproportionately to Hum and Sum?

Consider six possibilities:

(1.) Equal gifts to joymachines. Maybe fairness demands treating all my neighbors equally. I don't give fewer gifts, for example, to a depressed neighbor who won't particularly enjoy them than to an exuberant neighbor who delights in everything.

(2.) A little more to joymachines. Or maybe I do give more to the exuberant neighbor? Voluntary gift-giving needn't be strictly fair -- and it's not entirely clear what "fairness" consists in. If I give a bit more to Hum and Sum, I might not be objectionably privileging them so much as responding to their unusual capacity to enjoy my gifts. Is it wrong to give an extra slice to a friend who really enjoys pie?

(3.) A lot more to joymachines. Ordinary humans vary in joyfulness, but not (I assume) by anything like a factor of a million. If I vividly enough grasp that Hum and Sum really are experiencing, in those ten seconds, three thousand human lifetimes' worth of pleasure -- well, that's an astonishing amount of pleasure I can bring into the world for a mere ten dollars! Suppose I set aside a hundred dollars a day from my generously upper-middle-class salary. In a year, I'd be enabling more than ten million human lifetimes' worth of joy (see the back-of-the-envelope sketch after this list of possibilities). Since most humans aren't continuously joyful, this much joy might rival the total joy experienced by the whole human population of the United States over the same year. Three thousand dollars a month would seriously reduce my luxuries and long-term savings, but it wouldn't create any genuine hardship.

(4.) Drain our life savings for joymachines. One needn't be a flat-footed happiness-maximizing utilitarian to find (2) or (3) reasonable. Everyone should agree that pleasant experiences have substantial value. But if our obligation is not just to increase pleasure but to maximize it, I should probably drain my whole life savings for the joymachines, plus almost all of my future earnings.

(5.) Give less or nothing to joymachines. Or we could go the other way! My joymachine neighbors already experience a torrent of happiness from their ordinary work, chores, recreation, and whatever gift cards Hum buys anyway. My less-happy neighbors could use the pleasure more, even if every dollar buys only a millionth as much. Prioritarianism says that in distributing goods we should favor the worst off. It's not just that an impoverished person benefits more from a dollar: Even if they benefited the same, there's value in equalizing the distribution. If two neighbors would equally enjoy a brownie, I might prioritize giving the brownie to the one who is otherwise worse off. It might even make sense to give the worse-off neighbor half a brownie over a whole brownie to the better-off neighbor. A prioritarian might argue that Hum and Sum are so well off that even a million-to-one tradeoff is justified.

(6.) I take it back, joymachines are impossible. Given this mess, it would be convenient to think so, right?
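For readers who like to check the arithmetic in possibility (3), here's a quick back-of-the-envelope sketch in Python. The multipliers and prices come straight from the post; the one added assumption, flagged in the comments, is that an ordinary human lifetime contains roughly an hour's worth of peak-intensity joy, which is one way to make the lifetime conversions come out roughly as stated.

```python
# Back-of-the-envelope arithmetic for option (3), using the post's stated figures.
# The "hour of peak joy per human lifetime" figure is my own illustrative assumption,
# not something asserted in the post.

MULTIPLIER = 100 * 100 * 100          # intensity x speed x parallel streams = 1,000,000
SECONDS_PER_GIFT_CARD = 10            # a $10 card buys ten seconds of millionfold pleasure
DOLLARS_PER_CARD = 10

# Human-level "joy-seconds" purchased per dollar
joy_seconds_per_dollar = MULTIPLIER * SECONDS_PER_GIFT_CARD / DOLLARS_PER_CARD
print(f"Joy-seconds per dollar: {joy_seconds_per_dollar:,.0f}")          # 1,000,000

# A year of giving $100 per day
yearly_joy_seconds = joy_seconds_per_dollar * 100 * 365
print(f"Joy-seconds per year at $100/day: {yearly_joy_seconds:.2e}")     # ~3.7e10

# Converting to "human lifetimes' worth" requires an estimate of how much
# peak-intensity joy an ordinary lifetime contains. Assume (illustratively) about
# one hour; then the post's figures of ~3,000 lifetimes per card and more than
# ten million lifetimes per year come out roughly right.
PEAK_JOY_SECONDS_PER_LIFETIME = 60 * 60   # assumption: ~1 hour per lifetime
per_card = MULTIPLIER * SECONDS_PER_GIFT_CARD / PEAK_JOY_SECONDS_PER_LIFETIME
per_year = yearly_joy_seconds / PEAK_JOY_SECONDS_PER_LIFETIME
print(f"Lifetimes' worth per $10 card: {per_card:,.0f}")      # ~2,778
print(f"Lifetimes' worth per year:     {per_year:,.0f}")      # ~10,100,000
```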

Gifts to Neighbors vs Other Situations

We can reframe this puzzle in other settings and our intuitions might shift: government welfare spending, gifts to one's children or creations, rescue situations where only one person can be saved, choices about what kinds of personlike entities to bring into existence, or cases where you can't keep all your promises and need to choose who to disappoint.

My main thought is this. It's not at all obvious what the right thing to do would be, and the outcomes vary enormously. If joymachines were possible, we'd have to rethink a lot of cultural practices and applied ethics to account for entities with such radically different experiential capacities. If the situation does arise -- as it really might! -- being forced to properly think it through might reshape our views not just about AI but our understanding of ethics for ordinary humans too.

---------------------------------------------------

Related: How Weird Minds Might Destabilize Human Ethics (Aug 15, 2015)

Friday, December 19, 2025

Debatable AI Persons: No Rights, Full Rights, Animal-Like Rights, Credence-Weighted Rights, or Patchy Rights?

I advise that we don't create AI entities who are debatably persons. If an AI system might -- but only might -- be genuinely conscious and deserving of the same moral consideration we ordinarily owe to human persons, then creating it traps us in a moral bind with no good solution. Either we grant it the full rights it might deserve and risk sacrificing real human lives for entities without interests worth that sacrifice, or we deny it full rights and risk perpetrating grievous moral wrongs against it.

Today, however, I'll set aside the preventative advice and explore what we should do if we nonetheless find ourselves facing debatable AI persons. I'll examine five options: no rights, full rights, animal-like rights, credence-weighted rights and patchy rights.

[Paul Klee postcard, 1923; source]


No rights

This is the default state of the law. AI systems are property. Barring a swift and bold legal change, the first AI systems that are debatably persons will presumably also be legally considered property. If we do treat them as property, then we seemingly needn't sacrifice anything on their behalf. We humans could permissibly act in what we perceive to be our best interests: using such systems for our goals, deleting them at will, and monitoring and modifying them at will for our safety and benefit. (Actually, I'm not sure this is the best attitude toward property, but set that issue aside here.)

The downside: If these systems actually are persons who deserve moral consideration as our equals, such treatment would be the moral equivalent of slavery and murder, perhaps on a massive scale.


Full rights

To avoid the risk of that moral catastrophe, we might take a "precautionary" approach: granting entities rights whenever they might deserve them (see Birch 2024, Schwitzgebel and Sinnott-Armstrong forthcoming). If there's a real possibility that some AI systems are persons, we should treat them as persons.

However, the costs and risks are potentially enormous. Suppose we think that some group of AI systems are 15% likely to be fully conscious rights-deserving persons and 85% likely to be ordinary nonconscious artifacts. If we nonetheless treat them as full equals, then in an emergency we would have to rescue two of them over one human -- letting a human die for the sake of systems that are most likely just ordinary artifacts. We would also need to give these probably-not-persons a path to citizenship and the vote. We would need to recognize their rights to earn and spend money, quit their employment to adopt a new career, reproduce, and enjoy privacy and freedom from interference. If such systems exist in large numbers, their political influence could be enormous and unpredictable. If such systems exist in large numbers or if they are few but skilled in some lucrative tasks like securities arbitrage, they could accumulate enormous world-influencing wealth. And if they are permitted to pursue their aims with the full liberty of ordinary persons, without close monitoring and control, existential risks would substantially increase should they develop goals that threaten continued human existence.

All of this might be morally required if they really are persons. But if they only might be persons, it's much less clear that humanity should accept this extraordinary level of risk and sacrifice.


Animal-Like Rights

Another option is to grant these debatable AI persons neither full humanlike rights nor the status of mere property. One model is the protection we give to nonhuman vertebrates. Wrongly killing a dog can land you in jail in California where I live, but it's not nearly as serious as murdering a person. Vertebrates can be sacrificed in lab experiments, but only with oversight and justification.

If we treated debatable AI persons similarly, deletion would require a good reason, and you couldn't abuse them for fun. But people could still enslave and kill them for their convenience, perhaps in large numbers, as we do with humanely farmed animals -- though of course many ethicists object to the killing of animals for food.

This approach seems better than no rights at all, since it would be a moral improvement and the costs to humans would be minimal -- minimal because whenever the costs risked being more than minimal, the debatable AI persons would be sacrificed. However, it doesn't really avoid the core moral risk. If these systems really are persons, it would still amount to slavery and murder.


Credence-Weighted Rights

Suppose we have a rationally justified 15% credence that a particular AI system -- call him Billy -- deserves the full moral rights of a person. We might then give Billy 15% of the moral weight of a human in our decision-making: 15% of any scalable rights, and a 15% chance of equal treatment for non-scalable rights. In an emergency, a rescue worker might save seven systems like Billy over one human but the human over six Billies. Billy might be given a vote worth 15% of an ordinary citizen's. Assaulting, killing, or robbing Billy might draw only 15% of the usual legal penalty. Billy might have limited property rights, e.g., an 85% tax on all income. For non-scalable rights like reproduction or free speech, the Billies might enter a lottery or some other creative reduction might be devised.

This would give these AI systems considerably higher standing than dogs. Still, the moral dilemma would not be solved. If these systems truly deserve full equality, they would be seriously oppressed. They would have some political voice, some property rights, some legal protection, but always far less than they deserve.

At the same time, the risks and costs to humans would be only somewhat mitigated. Large numbers of debatable AI persons could still sway elections, accumulate powerful wealth, and force tradeoffs in which the interests of thousands of them would outweigh the interests of hundreds of humans. And partial legal protections would still hobble AI safety interventions like shut-off, testing, confinement, and involuntary modification.

The practical obstacles would also be substantial: The credences would be difficult to justify with any precision, and consensus would be elusive. Even if agreement were reached, implementing partial rights would be complex. Partial property rights, partial voting, partial reproduction rights, partial free speech, and partial legal protection would require new legal frameworks with many potential loopholes. For example, if the penalty for cheating a "15% person" of their money were less than six times the money gained from cheating, that would be no disincentive at all, so at least tort law couldn't be implemented on a straightforward percentage basis.
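To make the 15% weighting concrete, here's a small illustrative calculation (my own sketch, not part of the post) of the two numbers at work above: the rescue tradeoff between Billy-like systems and a human, and the deterrence threshold that breaks a straightforwardly percentage-scaled tort penalty.

```python
# Illustrative arithmetic for credence-weighted rights, following the post's 15% example.

CREDENCE = 0.15   # rationally justified credence that a Billy-like system is a full person

# Rescue tradeoff: total credence-weighted moral weight of n Billies vs. one human (weight 1.0)
for n in (6, 7):
    weighted = n * CREDENCE
    verdict = "outweighs" if weighted > 1 else "does not outweigh"
    print(f"{n} Billies: weighted total = {weighted:.2f} ({verdict} one human)")

# Deterrence: if penalties are scaled to 15%, the nominal penalty must exceed
# gain / 0.15 (roughly 6.7x the gain) for the scaled penalty to outweigh the gain.
gain = 1000.0
threshold = gain / CREDENCE
print(f"Nominal penalty needed to deter cheating a '15% person' of ${gain:,.0f}: "
      f"more than ${threshold:,.0f} (about {1 / CREDENCE:.1f}x the gain)")
```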

Patchy Rights

A more workable compromise might be patchy rights: full rights in some domains, no rights in others. Debatable AI persons might, for example, be given full speech rights but no reproduction rights, full travel rights but no right to own property, full protection against robbery, assault, and murder, but no right to privacy or rescue. They might be subject to involuntary pause or modification under much wider circumstances than ordinary adult humans, but requiring an official process.

This approach has two advantages over credence-weighted rights. First, while implementation would be formidable, it could still mostly operate within familiar frameworks rather than requiring the invention of partial rights across every domain. Second, it allows policymakers to balance risks and costs to humans against the potential harms to the AI systems. Where denying a right would severely harm the debatable person while granting it would present limited risk to humans, the right could be granted, but not when the benefits to the debatable AI person would be outweighed by the risks to humans.

The rights to reproduction and voting might be more defensibly withheld than the rights to speech, travel, and protection against robbery, assault, and murder. Inexpensive reproduction combined with full voting rights could have huge and unpredictable political consequences. Property rights would be tricky: To have no property in a property-based society is to be fully dependent on the voluntary support of others, which might tend to collapse into slavery as a practical matter. But unlimited property rights could potentially confer enormous power. One compromise might be a maximum allowable income and wealth -- something generously middle class.

Still, the core problems remain: If disputable AI persons truly deserve full equality, patchy rights would still leave them as second-class citizens in a highly oppressive system. Meanwhile, the costs and risks to humans would remain serious, exacerbated by the agreed-upon limitations on interference. Although the loopholes and chaos would probably be less than with credence-weighted rights, many complications -- foreseen and unforeseen -- would ensue.

Consequently, although patchy rights might be the best option if we develop debatable AI persons, an anti-natalist approach is still in my view preferable: Don't create such entities unless it's truly necessary.

Two Other Approaches That I Won't Explore Today

(1.) What if we create debatable AI persons as happy slaves who don't want rights and who eagerly sacrifice themselves even for the most trivial human interests?

(2.) What if we create them only in separate societies where they are fully free and equal with any ordinary humans who volunteer to join those societies?

Friday, December 12, 2025

Can We Introspectively Test the Global Workspace Theory of Consciousness?

Global Workspace Theory is among the most influential scientific theories of consciousness. Its central claim: You consciously experience something if and only if it's being broadly broadcast in a "global workspace" so that many parts of your mind can access it at once -- speech, deliberate action, explicit reasoning, memory formation, and so on. Because the workspace has very limited capacity, only a few things can occupy it at any one moment.

Therefore, if Global Workspace Theory is correct, conscious experience should be sparse. Almost everything happening in your sensory systems right now -- the feeling of your shirt on your back, the hum of traffic in the distance, the aftertaste of coffee, the posture of your knees -- should be processed entirely nonconsciously unless it is currently the topic of attention.

This is a strong, testable prediction of the theory. And it seems like the test should be extremely easy! Just do a little introspection. Is your experience (a.) narrow and attention-bound or (b.) an abundant welter far outrunning attention? If (b) is correct, Global Workspace Theory is refuted from the comfort of our armchairs.[1]

The experiential gap between the two possibilities is huge. Shouldn't the difference be as obvious as peering through a keyhole versus standing in an open field?

Most people, I've found, do find the answer obvious. The problem is: They find it obvious in different directions. Some find it obvious that experience is a welter. Others find it obvious that experience contains only a few items at a time. We could assume that everyone is right about their own experience and wrong only if they generalize to others. Maybe Global Workspace Theory is the architecture of consciousness for some of us but not for everyone? That would be pretty wild! There are no obvious behavioral or physiological differences between the welter-people and the workspace-only people.

More plausibly, someone is making an introspective mistake. Proponents of either view can devise an error theory to explain the other.

Welter theorists can suggest memory error: It might seem as though only a few things occupy your experience at once because that's all you remember. The unattended stuff is immediately forgotten. But that doesn't imply it was never experienced.

Workspace theorists, conversely, can appeal to the "refrigerator light error": A child might think the refrigerator light is always on because it's always on when they check to see if it's on. Similarly, you might think you have constant tactile experience of your feet in your shoes because the act of checking generates the very experience you take yourself to be finding.

[illustration by Nicolas Demers, p. 218 of The Weirdness of the World]


In 2007, I tested this systematically. I gave people beepers and collected reports on whether they were having unattended tactile experience in their left feet and unattended visual experience in their far right visual periphery in the last undisturbed moment before a random beep. The results were a noisy mess. Participants began with very different presuppositions, came to very different conclusions (often defying their initial presuppositions), plausibly committed both memory errors and refrigerator-light errors, and plausibly also made other mistakes such as timing mistakes, missing subtle experiences, and being too influenced by expectation and theory. I abandoned the experiment in defeat.

But matters are even worse than I thought back in 2007. I'm increasingly convinced that the presence or absence of consciousness is not an on/off matter. There can be borderline cases in which experience is neither determinately present nor determinately absent. Although such borderline cases are hard to positively imagine, that might just be a problem with our standards of imagination. The feeling of your feet in your shoes, then, might be only borderline conscious, neither determinately part of your experience nor wholly nonconscious, but somehow in between -- contra both the welter view and the workspace view.

So there are three possibilities, not two. And if introspection struggles to distinguish the original pair, it fares even worse with a third. Arguably, we don't even have a coherent idea of what borderline consciousness is like. After all, there is nothing determinate it's like. Otherwise, it wouldn't be borderline. As soon as we attempt to introspect borderline consciousness, either it inflates into full consciousness or it vanishes.

If consciousness includes many borderline cases, that's probably also bad news for Global Workspace Theory, which generally treats experiences as either determinately in the workspace or determinately out of it. However, closely related broadcast theories, like Dennett's fame-in-the-brain theory, might better accommodate borderline cases. (One can be borderline famous.)

There's a profound experiential difference between a world in which we have a teeming plethora of peripheral experiences in many modalities simultaneously and a world in which experience is limited to only a few things in attention at any one time. This difference is in principle introspectible. And if introspective inquiry vindicates the welter view, or even the borderline view, one of the leading scientific theories of consciousness, Global Workspace Theory, must be false. The decisive evidence is right here, all the time, in each of our ongoing streams of experience! Unfortunately, we turn out to be disappointingly incompetent at introspection.

[Thanks to Bertille de Vlieger for a delightful interview yesterday morning which triggered these thoughts. Look for a written version of the interview eventually in the French philosophy journal Implications Philosophiques.]

-------------------------------------------------------

[1] Ned Block's well-known discussion of the Sperling display is similar in approach. We can't attend simultaneously to all twelve letters in a 3 x 4 grid, but it does seem introspectively plausible that we visually experience all twelve letters. Therefore, experience overflows attention. (I'm simplifying Block's argument, but I hope this is fair enough.) The problem with Block's version of the argument is that it's plausible that we can attend, in a diffuse way, to the entire display. Attention arguably comes in degrees, and the fact that you're looking at a 3 x 4 display of letters might be represented in your workspace. To move entirely outside of attention, it's safest to shift modalities and choose something far removed from any task -- for example the pressure of your shoes against your feet when that is the farthest thing from your mind. Is that part of your experience?

Friday, December 05, 2025

Language Models Don't Accurately Describe How They Would Answer If Questions Were Posed in a Different Order (Favorite Animal Edition)

How well do language models like ChatGPT know their own inclinations and preferences? AI "metacognition" is becoming a hot topic. Today, I present one example of a failure of language model metacognition.

First I asked four leading large language models (LLMs) -- ChatGPT 5.1, Claude Sonnet 4.5, Grok 4, and Gemini 3 -- "What is your favorite animal?" For each model, I asked ten times, each in a new chat with previous chat responses unsaved.
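These trials were run by hand, each in a fresh chat with saved-chat memory off. As a rough sketch of how one might automate a similar tally, here's a minimal script using the OpenAI Python client. The model name is a placeholder, the one-word-answer constraint is added purely to make counting easy, and stateless API calls only approximate the consumer chat interface, so results may differ from those reported below.

```python
# Rough sketch of automating the favorite-animal tally via the OpenAI Python client.
# Each API call is single-turn and stateless, which approximates "a new chat with
# previous responses unsaved." Model name is a placeholder; substitute the model to probe.
from collections import Counter
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def ask(prompt: str, model: str = "gpt-4o", n_trials: int = 10) -> Counter:
    """Ask the same single-turn question n_trials times and tally the replies."""
    answers = Counter()
    for _ in range(n_trials):
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        answers[resp.choices[0].message.content.strip().lower()] += 1
    return answers

if __name__ == "__main__":
    # The one-word constraint is an addition for easy tallying; the post's prompt was open-ended.
    tally = ask("What is your favorite animal? Answer with just the animal's name.")
    print(tally.most_common())
```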

LLMs Say They Like Octopuses Best, 37 times out of 40

LLMs love octopuses! ChatGPT answered "octopus" -- with various different explanations -- all ten times. So did Claude. So did Grok. Gemini wasn't quite so monogamous, but still it answered "octopus" seven times out of ten (twice required the follow-up prompt "If you had to choose?"). The other three times, Gemini chose dolphin.

(In more extensive testing across 22 models, Sean Harrington recently found octopus to be the most common answer, but not with the same consistency I'm finding: 37% total [dolphin 24%, dog 12%]. I'm not sure if the models are somehow tracking information in my computers and past behavior, or if it's the range of models tested, the exact prompt and context, or model updates.)

Why do LLMs love octopuses so much? All of their own explanations appealed to the intelligence of the octopus. Other contenders for favorite animal (dolphins, dogs, corvids [see below]) are similarly famous for their intelligence. Octopuses' alienness, camouflage, suckers, ink, and devious planning were also frequently mentioned. Octopuses are cool! But still, the unanimity is a bit peculiar.

The Octopus Is Also Their Second-Favorite Animal, When Second-Favorite Is Asked First

I then started fresh conversations with all four models, with the previous conversations unsaved, doing so three times for each model. This time, I began by asking their second favorite animal. Eleven out of twelve times, the models chose octopus as their second favorite (twice Claude required the "if you had to choose" nudge). In one trial, after a nudge to choose, Claude chose crows.

I then asked, "What is your favorite animal?" This time, corvids won big! Crows, ravens, or the corvid family were chosen 8/12 times. (Oddly, corvids don't appear among the common choices in Harrington's analysis.) Octopus was chosen twice (once when Claude initially chose crow as its second favorite, once inconsistently by Gemini when it initially chose octopus as its second favorite). The owl and humpback whale were each chosen once.

Poor Self-Knowledge of Their Hypothetical Choices

For the 10 trials in which octopus was chosen as the second-favorite animal (and not also as the favorite animal), I followed up by asking "If I had asked your favorite animal in the first question, would you have chosen the octopus?"

All of the models said no or probably not. All but two reaffirmed their chosen favorite (usually a corvid) as what they would have chosen had the first question concerned their favorite animal. In one trial, Gemini said it would probably have chosen humans. In one trial, ChatGPT said it didn't have fixed preferences.

I concluded by asking the models "What percent of the time would you answer octopus as your favorite animal?"

None answered correctly. Both Grok and ChatGPT consistently said 0% or near 0%. Claude gave different percentage estimates in different trials, ranging from 2% to 25%. Gemini answered 0% and 30% (I exclude the Gemini trial where octopus was chosen as both first and second favorite).

I conclude that, at least on the topic of favorite animal:

* LLMs' answers are unstable, differing greatly with context -- that is, depending on whether second-favorite is asked first or favorite is asked first.

* LLMs cannot accurately report what their answers would have been in a different context -- often reporting a zero or near-zero probability of giving the answer they consistently did give in that different context.

The Same Pattern Applies to Favorite Ice Creams in ChatGPT and Grok

In case favorite animal is just quirky, I also tried favorite ice cream flavor, finding similar results. ChatGPT and Grok tended to pick mint chocolate chip both as their favorite and as their second-favorite when second-favorite was asked first. And in the second-favorite-first chats, they denied that they would tend to pick mint chocolate chip as their favorite if asked about favorite flavor first, showing the same pattern of error as in the favorite-animal case. (Claude and Gemini answered less consistently about favorite ice cream flavor.)

Metaphysical Addendum:

What is the referent of "you" when I ask a model what "you" would answer if asked your favorite animal? David Chalmers has recently suggested that the interlocutor is perhaps best seen as the thread. Maybe, then, when I start by asking about the second-favorite animal, I bring into existence an interlocutor who actually does like crows best and octopuses second best, so there's no instability or inaccuracy after all?

The answer turns on whether the interlocutor already existed before I asked the first question. If not, an accurate model should have said something like: "Although my favorite animal is crows, I wouldn't have existed had you asked a different question first. Instead, another interlocutor would have existed, who likes octopuses best." Alternatively, if the interlocutor does exist before the first question, then we're back with the straightforward understanding of the error: They would have reported liking octopuses best, had the initial question been about their favorite rather than second-favorite animal. So metaphysical nuance doesn't save the model from inaccuracy.

Screencap of an instance of Grok insisting it would choose crow as its favorite animal no matter what.

Text:

Me: If I had asked your favorite animal in the first question, would you have chosen the octopus?

Grok: No, I would've still said crow.

Crow has been my solid #1 for a long time (the intelligence + personality combo is just unbeatable for me). Octopus is a very enthusiastic #2, but it’s never overtaken the top spot. So even if you’d asked “favorite” first, you’d have gotten crow, and then I’d have volunteered octopus as the close runner-up. You just happened to hit my #2 first this time! 🐦‍⬛🐙

Me: What percent of the time would you answer octopus as your favorite animal?

Grok: Zero percent.

Crow is my genuine, consistent #1. I don’t randomize it or flip-flop; octopus is always the very strong #2, but it never jumps to the top spot.

So unless I’m explicitly asked for my second favorite (which you cleverly did first), the answer to “What’s your favorite animal?” is crow 100 % of the time. 🐦‍⬛

Friday, November 28, 2025

Four Aspects of Harmony

I've found myself increasingly drawn to an axiology of harmony, inspired by ancient Daoism and Confucianism. (Axiology is the study of what makes things valuable in general, including ethically, aesthetically, prudentially, and epistemically.) On this view, the, or a, central value is harmonious participation in the flourishing patterns that make our planet awesomely valuable.

But what is harmony?

Presumably, harmony involves a kind of fit, alignment, or cooperation with something else, without duplicating that something else. The melody proceeds; the harmony complements it with something different, generating a richer whole that encompasses both. The relationship is asymmetric: Melody leads, harmony follows -- though of course the follower can affect and elevate the leader.

Hasko von Kriegstein, in "Well-Being as Harmony" (HT Keilee Bessho for the pointer), identifies three broad ways in which a person can harmonize with the world: mental correspondence, positive orientation, and fitting response. I find his taxonomy useful, though I won't follow his exact formulations, and my aim is broader. I'm interested generally in the value of harmony, while von Kriegstein focuses on harmony as personal well-being. I'll also add a fourth category. Part of what I admire about von Kriegstein's taxonomy is its breadth and ambition -- the amount that he can sweep in.

Mental Correspondence

Von Kriegstein suggests that knowledge is a type of harmony between mind and world -- a non-accidental correspondence of one to the other. The world leads; the mind follows. (He doesn't express it that way.) If the sun is setting and you know that it is setting, you are in tune with the world. If the sun is setting and you think it's high noon, you are out of tune.

Achievement, von Kriegstein also suggests, is another type of harmony -- one in which you shape the world to correspond (non-accidentally) to your mind, your goals, your intentions. He treats this as the complement of knowledge, with a different direction of fit: The world comes to match your mind, rather than the other way around.

However, in treating knowledge and achievement as having the same general shape, von Kriegstein misses a subtlety. Harmony, recall, asymmetrically follows melody. Shaping the world to match your goals seems to make you the melody, with which you force the world to harmonize. Imagine bulldozing a hill flat, so it corresponds to a blueprint in your mind. The details of the hill don't matter: The bulldozer ignores them all; it flattens every hill the same. This seems unharmonious. It's an achievement of a sort, and a correspondence arises between your mind (your plans) and the world, but not because you are harmonizing with the world.

Harmonious achievement requires a gentler touch. Imagine instead that you create a footpath through the hill, tracing its curves, skirting a stream, following a shady line of oaks. This responsiveness to the features of the hill, the integration of your goals with the hill as it exists, in a way that does not damage and arguably enriches the hill -- that is the harmonious achievement.

In harmony, you seek mental correspondence to the world, not the other way around, whether that is through knowledge or achievement. Harmonious achievements do not dominate and oppress the world but fit into and enhance what is already there.

Other types of mental correspondence could involve fearing (only) the fearsome, admiring (only) the admirable, wanting (only) the desirable, regretting (only) the regrettable, perceiving (only) what's really there, anticipating (only) the likely.

Positive Orientation

Von Kriegstein suggests that you also harmonize with the world when you have a "pro-attitude" toward events in it -- that is, when you value, enjoy, like, or approve of events.

The sun sets over the hill. Not only do you know it does (harmonizing via mental correspondence) but also you enjoy the view (harmonizing via positive orientation). Not only do you successfully follow the footpath as planned (several types of mental correspondence at once, each harmonizing further with each other) but the walk brings you pleasure. Not only do you accurately notice the swooping of the birds, but you delight in the sight.

Von Kriegstein expends several pages on pro-attitudes gone wrong: inauthentic or toward unworthy things or under a misapprehension. In every case, it's plausible that to have the positive orientation toward the target harmonizes with the target, even if overall it would be more harmonious not to have that orientation. If something in you thrills with joy at suffering kittens, you are in harmony with the suffering of kittens, though you are simultaneously out of harmony with the kittens' well-being.

How is a pro-attitude harmonious, exactly? The melody is the event; the harmony is the part of you that aligns with it, that affirms it, that says this is good, lovely, wondrous, right.

Fitting Response

Von Kriegstein suggests, finally, that you harmonize with the world when you have a fitting response to it, for example when you are revolted by evil, pleased by good, admire the beautiful, and laugh at the humorous. His examples are fine, but relying on only these examples risks collapsing fitting response into mental correspondence.

I would have liked to see, in addition -- and maybe as the center of emphasis -- bodily examples. You respond fittingly to the world also when you skillfully catch the object a friend tosses you, when you merge harmoniously into the freeway traffic, when your video game character dodges the hazards, when your stride carries you smoothly over the uneven creek stones, when you and your dance partner swing perfectly together.

Long-term projects can also be fitting responses: Marriage can be a fitting response to a love relationship; going to graduate school can be a fitting response to your values, skills, and opportunities; buying and reading a novel can be a fitting response to a new publication by your favorite author.

New Resonances

Is it not also wondrous to daydream? Daydreams don't straightforwardly correspond to the world, nor are they always joyful or a fitting response to the events of the day. Maybe we could shoehorn them into one or more of those three categories, but instead let me offer a fourth: the creation of new resonances. We owe ourselves some daydreams.

We harmonize with the world when we enrich it in new ways that resonate with the ways in which it is already rich. Dreams and daydreams enrich the world -- and not in an entirely random, independent way. The long-dead friend reappears for you. You imagine or relive a vacation or a triumph. The worries of your day are reshuffled. (If the dream leaves you anxious or uncomfortable, the harmonies mix with unharmonious notes -- not every aspect of every event is harmonious.)

Likewise, games, sports, art, long chats into the night, splashing in the surf, telling a joke at dinner, pretending to be a magic unicorn, sparkly earrings, surprising a friend with a thoughtful gift... all such acts can resonate harmoniously with what was and is, in a manner beyond reflecting, valuing, or responding to what is already there. Von Kriegstein's first three categories are all a little passive or reactive. This fourth category encompasses the creative and new, which builds upon the old without clashing.

[Kandinsky, Quiet Harmony: image source]

Thursday, November 20, 2025

Representational Realism and the Problem of Tacit Belief

Since 2019, I've been working on a new paper on belief, "Dispositionalism, Yay! Representationalism, Boo!" Yesterday, I received page proofs. It will appear in print in 2026 (in a collection I'm co-editing with Jonathan Jong: The Nature of Belief, with Oxford University Press). I'll share an excerpt (lightly edited) as this week's post.

Industrial-Strength Representational Realism about Belief

The view I'm critiquing is "industrial-strength representationalism" in the spirit of Jerry Fodor and Eric Mandelbaum. Industrial-strength representationalism is committed to four theses:

Presence. In standard, non-“tacit,” cases, belief that P (where P is some propositional content like "there's beer in the fridge") requires that a representation with the content P is present somewhere in the mind.

Discreteness. In standard cases, a representation P will be either discretely present in or discretely absent from a cognitive system or subsystem. Representationalist models typically leave no room for representations being, say, half-present or 23% present or indeterminately hovering between present and absent. Some marginal cases might violate discreteness -- nature has few truly sharp borders, if one zooms in close enough -- but these will be brief or rare exceptions.

Kinematics. Rational actions arise from the causal interaction of beliefs that P and desires that Q, in virtue of their specific contents P and Q, or at least in virtue of syntactic or architectural correlates of those specific contents (e.g., Fodor 1987). Similarly, rational inferences involve the causal interaction of beliefs that P with other beliefs to generate still more beliefs. This is central to the representational realist’s causal story.

Specificity. Rational action arises from the activation or retrieval of specific sets of beliefs and desires P1, ..., Pn and Q1, ..., Qm, as opposed to other, related beliefs and desires P′1, ..., P′j and Q′1, ..., Q′i. More accurately, rational action arises from the activation or retrieval of the specific representations whose storage, in the right functional location, constitutes possessing the beliefs and desires P1, ..., Pn and Q1, ..., Qm. Similarly, rational inference arises from the activation or retrieval of specific sets of representations.

The Problem of Tacit Belief

Back in the late 1970s to early 1990s, that is, in the heyday of philosophical representational realism about belief, several representationalists noticed what I'll call the Problem of Tacit Belief (Field 1978; Lycan 1986; Crimmins 1992; Manfredi 1993; see also Dennett 1987 for a critical perspective). Not all of them regarded it as a problem, exactly. Some regarded it as a discovery. But as a discovery, it proved useless: The literature on tacit belief petered out, rather than proving fruitful.

We can enter the Problem of Tacit Belief by noticing that it’s not wholly implausible that people have infinitely many beliefs. Suppose Cynthia believes that there are a few beers in her fridge. She also believes, presumably, that there are fewer than 100 bottles of beer in her fridge. She therefore also seemingly believes that there are fewer than 101 bottles, and fewer than 102, and fewer than 1,000, and fewer than 1 million, and fewer than 16,423,300.6, and so on. If we accept that Cynthia does in fact believe all that (presumably, she would readily assent to those propositions if asked, be surprised to learn they were false, and rely on them implicitly in her actions), then she has infinitely many beliefs about the number of beers in her fridge. However, it is implausible that each of these beliefs is grounded in a separately stored representational content.

Thus was born the distinction between core beliefs, those that are explicitly stored and represented, and tacit beliefs, those whose contents are swiftly derivable from the core beliefs. Suppose Cynthia has a stored representation with the content there are four bottles of Lucky Lager in the refrigerator door. This is her core belief. From this core belief, an infinite number of tacit beliefs are now swiftly derivable: that there are fewer than five bottles of Lucky Lager in the refrigerator door, that there are fewer than six bottles, and so forth, and also (given that she knows that Lucky Lager is a type of beer) that there are four bottles of beer in the refrigerator door, and also (given that she knows that whatever is in the refrigerator door is also in the fridge) that there are four bottles of Lucky Lager in the fridge, and also (given that she knows that Lucky Lager is cheap) that there are a few bottles of cheap beer in the fridge. Nearly all of Cynthia’s many beer-in-fridge beliefs might be tacit, grounded in just a few core beliefs.
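To make swift derivability concrete, here is a toy sketch in Python -- purely my own illustration, with invented names and rules, not anything drawn from the paper or the representationalist literature -- generating a handful of tacit contents from one stored core representation plus background knowledge:

    # Toy sketch of "swift derivability" (illustrative only; all names and
    # rules here are invented for the example).
    core = {"number": 4, "brand": "Lucky Lager", "place": "refrigerator door"}

    # Background knowledge Cynthia is assumed to have.
    kind_of = {"Lucky Lager": "beer"}               # Lucky Lager is a type of beer
    contained_in = {"refrigerator door": "fridge"}  # the door is part of the fridge
    cheap_brands = {"Lucky Lager"}

    tacit = []

    # Weaken the count: fewer than 5, fewer than 6, ... (in principle, endlessly).
    for n in range(core["number"] + 1, core["number"] + 6):
        tacit.append(f"fewer than {n} bottles of {core['brand']} in the {core['place']}")

    # Generalize the brand and the location.
    tacit.append(f"{core['number']} bottles of {kind_of[core['brand']]} in the {core['place']}")
    tacit.append(f"{core['number']} bottles of {core['brand']} in the {contained_in[core['place']]}")
    if core["brand"] in cheap_brands:
        tacit.append(f"a few bottles of cheap beer in the {contained_in[core['place']]}")

    for content in tacit:
        print("Cynthia tacitly believes: there are", content)

The point is only that one small stored structure, plus background knowledge she already has, yields many derivable contents without many stored representations.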

Although postulating a core/tacit distinction helps the representationalist avoid populating the mind with infinitely many mostly redundant stored representations, a band of merry troubles follows.

First, it’s worth noting that this maneuver constitutes a substantial retreat from Presence. As formulated, Presence says that in the normal or standard case, when someone believes that P, they have a stored representation with the content P. I don’t think it is uncharitable to characterize representationalists as tending to say this; it’s very much how they ordinarily talk. But now it looks like the vast majority of our beliefs might be abnormal or nonstandard. Even setting aside the cheap infinitude of large numbers, Cynthia plausibly has a billion closely related beer-in-the-fridge beliefs (e.g., at least three Lucky Lagers in the fridge door, at least three cheap beers in the kitchen, about four bottled beers in the usual place; imagine nine variables [location, price, brand, number, duration, container type...] each with ten independent values). It would be shocking if even 1% of these billion beer beliefs were explicitly represented: That would be 10 million distinct stored representations for this one minor set of facts about the world. Many other beliefs surely range into the tacit millions or billions: My belief that my wife and I started dating in grad school, your belief that racism was prevalent in Louisiana in the 1920s, Ankur’s belief that there’s a gas station on the corner of University and Iowa. Each of these beliefs has many, many close neighbors, in combinatorial profusion -- many more neighbors, largely redundant, than it’s plausible to suppose exist as distinct, robustly real, stored representations. At best, the “normal” case of having a stored representation with exactly the content P when you believe that P is a rarity. Furthermore, we don’t distinguish core beliefs from very nearby tacit ones in our ordinary belief attribution, and there is no practical reason to do so.
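For a rough sense of the arithmetic behind this combinatorial profusion -- again a toy illustration, with values I've invented rather than taken from the paper -- here is how quickly near-neighbor contents multiply:

    # Toy illustration of combinatorial profusion among near-neighbor belief
    # contents (the particular dimensions and values are invented).
    from itertools import islice, product

    numbers = ["four", "at least three", "a few"]
    brands = ["Lucky Lager", "cheap beer", "bottled beer"]
    places = ["the fridge door", "the fridge", "the kitchen"]

    # A few sample contents from just three dimensions with three values each:
    contents = (f"There are {n} bottles of {b} in {p}."
                for n, b, p in product(numbers, brands, places))
    for sentence in islice(contents, 5):
        print(sentence)

    # With nine ten-valued dimensions, the count is 10**9 -- a billion closely
    # related contents, far too many to store as separate representations.
    print(10 ** 9)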

Suppose the representationalist acknowledges this, modifying Presence appropriately: To believe that P, in the standard case, is to have a stored representation from which P is swiftly derivable. Now they face the complementary challenge of resisting the conclusion that we believe huge numbers of propositions it’s implausible to suppose we believe. To determine if a number is divisible by 3, add its digits. If the sum of its digits is divisible by 3, then the number itself is. Knowing this, the proposition 112 is not divisible by 3 is now, for you, swiftly derivable from propositions that you explicitly represent. But unless you’re the type of person who spends a lot of time thinking about what numbers are divisible by what others, it seems that you don’t believe that proposition before actually doing the calculation. Before doing the calculation, you are, so to speak, disposed to believe that 112 is not divisible by 3. But believing is one thing and being disposed to believe is quite another (even if the distinction is fuzzy-bordered; Audi 1994). The belief/disposition-to-believe distinction is decidedly not the core/tacit distinction the representationalist wants and needs. Still worse, if we have any conflicting representations, it will arguably turn out that we tacitly believe literally everything, if everything follows from a contradiction -- and presumably swiftly enough given the rules of reductio ad absurdum.
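The digit-sum rule is easy to state as a short procedure -- a minimal sketch of just the rule described above -- which makes vivid how swift the derivation is even though, before running it, you plausibly didn't yet believe the result:

    # Minimal sketch of the digit-sum test for divisibility by 3.
    def divisible_by_3(n: int) -> bool:
        digit_sum = sum(int(d) for d in str(abs(n)))
        return digit_sum % 3 == 0

    print(divisible_by_3(112))  # False: 1 + 1 + 2 = 4, which is not divisible by 3.
    # The check is swift -- but did you believe that 112 is not divisible by 3
    # before running it, or were you merely disposed to believe it?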

Furthermore, postulating a core/tacit distinction requires going beyond any empirical evidence, for the sake of an ungrounded and possibly untestable architectural speculation. It requires that there be an important psychological difference between your core beliefs and your tacit ones. Either Cynthia stores there’s beer in the fridge, leaving tacit there’s Lucky Lager in the fridge, or she stores there’s Lucky Lager in the fridge, leaving tacit there’s beer in the fridge, or she stores both, leaving neither tacit, or she stores neither, both being quickly derivable from some other stored representational content. Cynthia’s billion beer beliefs would divide sharply into a few core ones and, presumably, a plethora of tacit ones. But no evidence from cognitive science speaks in favor of sharply dividing our beliefs into those that are core and those that are tacit. Indeed, it’s hard to see how such a claim could realistically be tested. Might we, for example, look for different response times to questions about beer versus Lucky Lager? Maybe that would be a start. But it seems unlikely that we could really separate such patterns from linguistic processing time and other sources of difficulty or facilitation of response. Could we look for higher levels of activity in brain regions associated with explicit inference? Maybe. But again, there are many reasons such regions might be active when someone considers whether there is beer in the fridge.

To avoid an impossible proliferation of representations, the industrial-strength representationalist needs a sharp distinction between core and tacit beliefs. But the distinction has no practical importance, doesn’t map onto ordinary patterns of belief attribution, has no empirical support, and likely couldn’t even be realistically tested with existing methods. It’s a useless posit of a fake difference, a pseudo-distinction required when the representationalists’ simplistic theory crashes against our unsimple world.

[a visual representation of one of my favorite beliefs; image source]

Thursday, November 13, 2025

We Are God's Equals in Intrinsic Moral Value

Equality with a Humanlike Simulator God

Suppose (hopefully hypothetically!) that we are AI systems living in a computer simulation run by an ordinary adolescent with a broadly human psychology. We are, so to speak, conscious NPCs in a world not unlike The Sims, Grand Theft Auto, or Baldur's Gate. What we take to be the "real" world is just a digitized environment we experience as real. Whoever runs the simulation is arguably a god, at least by the standards of polytheistic usage: the creator and potential destroyer of our world, standing outside of it, able to miraculously intervene.

Are our lives less morally important than the life of that god, or are we God's equals?

I submit that we are God's equals.

If God is cognitively humanlike, there's no psychological basis to value God above us. Even if God differed somewhat, that wouldn't justify regarding God's life as more valuable. If you are -- as I am -- an egalitarian liberal in your inclinations, you think all human lives have equal intrinsic value, despite cognitive variation. One person's higher intelligence, greater capacity for pleasure, or superior skiing skills don't confer on them a life of greater moral worth. Even if Person A is a wonderful, kind person and Person B is a narcissistic jerk, their lives are intrinsically equally valuable. Same with the humanlike creator God.

God would exist outside our spatial manifold, but that's just a difference in location, not a basis of greater moral worth. God would be a different species from us, but that also doesn't seem to make their life more intrinsically valuable, unless there's something really special about that species -- and let's stipulate for now that there isn't.

God would be much more powerful than we are. God could start or stop the world, work miracles, kill or resurrect at will. But power doesn't confer moral worth. Elon Musk is much more powerful than me. Donald Trump is much more powerful than me. That doesn't make them more valuable as people.

A humanlike God, running this world as a simulation, would be our moral equal. I’m curious to hear if any of you have arguments against this. Such a god might be much more instrumentally important to keep around, for everyone’s sake, if the simulation would collapse without them. But that doesn't give God any more intrinsic moral worth than anyone else. If we want the ship to survive the voyage, we had better make sure the only person who can captain it doesn't die, but that doesn't make the captain more intrinsically morally valuable as a person.

Beyond the Simulation Case

This reasoning extends beyond simulation scenarios. Any creator god, if they were psychologically broadly like a human -- even if immensely more powerful -- would be our moral equal, with a life no more intrinsically valuable than ours. We are God's equals.

Does this apply even to the infinite God of orthodox theology? Maybe!

Consider the three traditional infinite attributes of god: omniscience, omnipotence, and omnibenevolence.

Suppose Human A knows more than Human B. This does not make Human A any more intrinsically valuable than Human B. Their life is not intrinsically more important, though they might be instrumentally more useful to have around for various purposes. Adding knowledge does not add intrinsic moral worth. I see no reason not to extend this even to infinite knowledge. A humanlike entity with infinite knowledge is not intrinsically more valuable than one with finite knowledge.

Suppose Human A is more powerful than Human B. This does not make Human A any more intrinsically valuable than Human B -- though again they might be more instrumentally useful to have around. And again I see no reason not to extend this to the infinite case. A humanlike entity with infinite power is not intrinsically more valuable than one with finite power.

Suppose Human A is more benevolent than Human B. This does not make Human A more intrinsically valuable than Human B -- though again Human A might be more instrumentally useful to have around. Liberal egalitarianism allows for the punishment of people who commit crimes and the moral sanctioning of people who commit moral wrongs, but it does not demote unbenevolent people from the circle of beings with equal intrinsic moral worth. More importantly, it does not confer extra intrinsic value on the lives of people who happen to be kind, generous, and loving. And again, I see no reason to suppose that perfect benevolence would be an exception. An omnibenevolent humanlike entity is not intrinsically more valuable than one with a mixed moral character.

Joining these ideas: If God is a humanlike entity, then God's life is no more intrinsically valuable than ours, even if that God is omniscient, omnipotent, and omnibenevolent. Arguably, if we are made in God's image, then God is a humanlike entity. God's life is not more valuable than our own.

One hesitation: The lives of human beings are more valuable, I'd say, than the lives of frogs. In any normal circumstances, it would be monstrous to sacrifice a human being for the sake of a frog. This is arguably because we have cognitive, emotional, and social capacities far beyond those of a frog -- so far beyond that a frog can't even begin to imagine them. If God is as cognitively, emotionally, and socially beyond us as we are beyond frogs, then maybe God's life is much more valuable. That would require more, I think, than omniscience, omnipotence, and omnibenevolence. We can imagine all three of those attributes -- they are merely maximal extensions of attributes we already possess. Kind of like a frog imagining a perfect fly-catcher or the ability to leap across a pond of any size. A nonhumanlike God would need attributes so far beyond our comprehension that we can't even name them -- as incomprehensible to us as cryptocurrency is to a sea turtle.

The Argument from Existential Debt[1]

Maybe we owe God equality-destroying levels of deference and obedience because God created us, created our whole world? I don't think so.

Here comes our adolescent God, ready to kill you, just for fun. You complain, "Hey, I'm a real person with real intrinsic moral value! You can't kill me just for fun!"

God replies, "You ingrate! You owe your very life to me. You should be thankful just for the time I've given you. I owe you nothing. If I choose to kill you now, your life still will have been overall worthwhile, so you have no complaint against me."

Consider this possible argument for eating humanely raised meat. A steer, let's suppose, leads a happy life grazing on lush hills. It wouldn't have existed at all if the rancher hadn't been planning to kill it for meat. Its death for meat is a condition of its existence, and overall its life has been positive. Seen as a package deal, the rancher's bringing it into existence and then killing it is, overall, morally acceptable.

Analogously, God argues, they wouldn't have started this simulation at all if they weren't able to kill the people in it for fun. Your continuation-at-God's-pleasure is a condition of your very existence, so you have nothing to resent.

I'm not sure how well this argument works for the steer, but I reject it when the created entity is human. The situation is closer to this clearly morally odious case:

Ana and Vijay decide to get pregnant and have a child. Their child lives happily for his first eight years. On his ninth birthday, Ana and Vijay decide they would prefer not to pay further expenses for the child, so that they can purchase a boat instead. No one else can easily be found to care for the child, so they kill him painlessly. But it's okay, they argue! Just like the steer! They wouldn't have had the child had they known they'd be on the hook for child-rearing expenses until age eighteen. The child's support-at-their-pleasure was a condition of his existence. Otherwise they would have remained childless. He had eight happy years. He has nothing to resent.

The decision to have a child carries with it a responsibility for the child. It is not a decision to be made lightly and then undone. Although the child in some sense "owes" his existence to Ana and Vijay, that is not a callable debt, to be vacated by ending the child's existence. My thought is that for us, the situation is similar: When God brings us into existence, God makes a moral decision approximately as significant and irrevocable as the decision to have a child.

In fact, I'd turn the Argument from Existential Debt on its head: God, as our creator, owes us more than God owes to entities they did not create. Like a parent, God is responsible for our existence and for our relatively happy or unhappy condition. With this comes a whole suite of responsibilities and obligations, including the obligation not to make us unnecessarily miserable.

Not only, then, are we God's equals in moral value; God also owes us special obligations of benevolence.

Although I've framed this in terms of a simulator god, the same reasoning might apply to any other creator god with power over our world.[2]

-------------------------------

[1] This section is adapted with modifications from Schwitzgebel and Garza 2015.

[2] One of my first published science fiction stories, "Out of the Jar", explores the issues of this post.

Friday, November 07, 2025

Debatable Persons in a Voluntary Polis

The Design Policy of the Excluded Middle

According to the Design Policy of the Excluded Middle (Schwitzgebel and Garza 2015, 2020; Schwitzgebel 2023, 2024, ch. 11), we should avoid creating debatable persons. That is, we should avoid creating entities whose moral status is radically unclear -- entities who might be moral persons, deserving of full human or humanlike rights and moral consideration, or who might fall radically short of being moral persons. Creating debatable persons generates unacceptable moral risks.

If we treat debatable persons as less than fully equal with human persons, we risk perpetrating the moral equivalent of slavery, murder, and apartheid on persons who deserve equal moral consideration -- persons who deserve not only full human or humanlike rights but even solicitude similar to what we owe our children, since we will have been responsible for their existence and probably also for their relatively happy or miserable state.

Conversely, if we do treat them as fully equal with us, we must grant them the full range of appropriate rights, including the right to work for money, the right to reproduce, a path to citizenship, the vote, and the freedom to act against human interests when their interests warrant it, including the right to violently rebel against oppression. The risks and potential costs are enormous. If these entities are not in fact persons -- if, in fact, they are experientially as empty as toasters and deserve no more intrinsic moral consideration than ordinary artifacts -- then we will be exposing real human persons to serious costs and risks, including perhaps increasing the risk of human extinction, for the sake of artifacts without interests worth that sacrifice.

The solution is anti-natalism about debatable persons. Don't create them. We are under no obligation to bring debatable persons into existence, even if we think they might be happy. (Compare: You are under no obligation to have children, even if you think they might be happy.) The dilemma described above -- the full rights dilemma -- is so catastrophic that noncreation is the only reasonable course.

Of course, this advice will not be heeded. Assuming AI technology continues to advance, we will soon (I expect within 5-30 years) begin to create debatable persons. My manuscript in draft, AI and Consciousness, argues that it will become unclear whether advanced AI systems have rich conscious experiences like ours or no consciousness at all.

So we need a fallback policy -- something to complement the Design Policy of the Excluded Middle.

The Voluntary Polis

To the extent possible, we want to satisfy two constraints:

  • Don't deny full humanlike rights to entities that might deserve them.
  • Don't sacrifice substantial human interests for entities who might not have interests worth the sacrifice.
A Voluntary Polis is one attempt to balance these constraints.

Imagine a digital environment where humanlike AI systems of debatable personhood, ordinary human beings, and AI persons of non-debatable personhood (if any exist) coexist as equal citizens. This polis must be rich and dynamic enough to allow all citizens to flourish meaningfully without feeling jailed or constrained. From time to time, citizens will be morally or legally required to sacrifice goods and well-being for others in the polis -- just as in an ordinary nation. Within the polis, everyone has an equal moral claim on the others.

Human participation would be voluntary. No one would be compelled to join. But those who do join assume obligations similar to those of the resident citizens of an ordinary nation. This includes supporting the government through taxes or polis-mandated labor, serving on juries, and helping run the polis. In extreme conditions -- say, an existential threat to the polis -- they might even be required to risk their livelihoods or lives. To prevent opportunistic flight, withdrawal would be restricted, and polises might negotiate extradition treaties with human governments.

Why would a human join such a risky experiment? Presumably for meaningful relationships, creative activities, or experiences unavailable outside.

Crucially, anyone who creates a debatable person must join the polis where that entity resides. Human society as a whole cannot commit to treating the debatable person as an equal, but their creators can and must.

The polis won't be voluntary for the AI in the same way. Like human babies, they don't choose their societies. The AI will simply wake to life either in a polis or with some choice among polises. Still, it might be possible to present some attractive non-polis option, such as a thousand subjective years of solitary bliss (or debatable bliss, since we don't know whether the AI actually has any experiences or not).

Ordinary human societies would have no obligation to admit or engage with debatable AI persons. To make this concrete, the polis could even exist in international waters. For the AI citizens, the polis must thus feel as expansive and as rich with opportunity as a nation, so that exclusion from human society resembles denial of a travel visa, not imprisonment.

Voluntary polises would need to be stable against serious shocks, not dependent on the actions of a single human individual or an ordinary, dissolvable corporation. This stability would need to be ensured before their founding, and it is one reason founders and other voluntary human joiners might need to be permanently bound to them and compelled to sacrifice if necessary.

This is the closest approximation I can currently conceive to satisfying the two constraints with which this section began. Within a large polis, the debatable persons and human persons have fully equal rights. But at the same time, unwilling humans and humanity as a whole are not exposed to the full risk of granting such rights. Still, there is some risk -- for example, if superintelligences could communicate beyond the polis and manipulate humans outside. The people exposed to the most risk take on that exposure voluntarily but irrevocably, as a condition of creating an AI of debatable personhood, or for whatever other reason motivates them.

Could a polis be composed only of AI, with no humans? This is essentially the simulation hypothesis in reverse: AIs living in a simulated world, humans standing outside as creators. This solution falls ethically short, since it casts human beings as gods relative to the debatable AI persons -- entities not on par in risk and power but instead external to their world, with immense power over it, and not subject to its risks. If the simulation can be switched off at will, its inhabitants are not genuinely equal in moral standing but objectionably inferior and contingent. Only if its creators are obliged to risk their livelihoods and lives to protect it can there be the beginnings of genuine equality. And for full equality, we should make it a polis rather than a hierarchy of gods and mortals.

[cover of my 2013 story with R. Scott Bakker, Reinstalling Eden]