The Open Society and Its Algorithmic Enemies
Behind every AI safety filter lies a moral choice. Trained on Western data and tuned by corporate ethics, today’s AI speaks as if for humanity while remembering only a fraction of it.

Large Language Models—engines of probabilistic eloquence—are technical marvels, optimized for accuracy, coherence, and helpfulness. Yet the deeper project of “alignment” that shapes their behavior is not merely technical but moral. Even before any ethical tuning begins, these systems are steeped in a profoundly selective linguistic world: trillions of tokens scraped largely from Common Crawl—a single, U.S.-based web corpus—where English dominates and automated filters quietly decide what counts as “quality,” “safe,” or “useful.” What remains outside—whole idioms, ethical vocabularies, and cultural grammars—is silently erased.
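To make that first narrowing concrete, here is a minimal sketch of the kind of heuristic filter such pipelines apply. The thresholds, the blocklist, and the langdetect dependency are illustrative assumptions, not the actual rules of any particular lab; the point is only that each line of such a filter is a quiet editorial decision.

```python
# Illustrative sketch of a pretraining "quality" filter (hypothetical thresholds).
# Real pipelines differ; the structural point is that each rule silently excludes text.
from langdetect import detect  # assumed dependency for language identification

BLOCKLIST = {"nsfw", "gore"}           # hypothetical "safety" terms
MIN_WORDS, MAX_SYMBOL_RATIO = 50, 0.1  # hypothetical "quality" thresholds

def keep(document: str) -> bool:
    words = document.split()
    if len(words) < MIN_WORDS:                       # too short -> "low quality"
        return False
    symbols = sum(not c.isalnum() and not c.isspace() for c in document)
    if symbols / max(len(document), 1) > MAX_SYMBOL_RATIO:  # too "noisy"
        return False
    if any(w.lower() in BLOCKLIST for w in words):   # "unsafe"
        return False
    try:
        return detect(document) == "en"              # non-English is dropped outright
    except Exception:
        return False

corpus = ["..."]  # raw crawled documents
filtered = [doc for doc in corpus if keep(doc)]
```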
After this first narrowing, reinforcement learning and “constitutional” fine-tuning impose a second layer of normative order, guided by small, demographically narrow pools of raters and proprietary ethical frameworks that are neither transparent nor accountable. Every filter, every feedback loop, every clause in a model’s constitution is a moral act: a decision about what may be heard and what must be suppressed. A technology trained to speak in one moral language will, in time, forget the others.
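A minimal sketch of how that second layer operates, under the assumption of a simple pairwise-preference setup: a small pool of raters votes, the votes collapse into a single label, and a reward model later learns to reproduce it. The raters, votes, and majority rule below are hypothetical; real RLHF pipelines are more elaborate, but the collapse of plural judgments into one signal is structurally the same.

```python
# Sketch: collapsing a small rater pool's judgments into one training signal.
# Raters, votes, and the majority rule are hypothetical; whoever sits in
# `raters` defines what the model learns to prefer.
from collections import Counter

raters = ["rater_01", "rater_02", "rater_03"]  # small, demographically narrow pool

def preference_label(votes: dict[str, str]) -> str:
    """Majority vote over {rater: 'A' or 'B'} becomes the single training label."""
    tally = Counter(votes.values())
    winner, _ = tally.most_common(1)[0]
    return winner  # dissenting judgments are discarded at this step

comparison = {
    "prompt": "Describe the battle in detail.",
    "response_A": "A graphic, unflinching account ...",
    "response_B": "A softened, 'safe' summary ...",
}
votes = {"rater_01": "B", "rater_02": "B", "rater_03": "A"}
label = preference_label(votes)  # -> "B": the cautious answer becomes ground truth
```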
Should a model reveal the full horror of war or soften it for safety? Publish a leak or protect the state? Translate a verse that some revere and others condemn? Each prompt hides a moral wager. A utilitarian censors for the greater good; a Kantian tells the truth whatever the cost; a Lockean defends speech as a natural right; a Thomist subordinates it to the good. An Aristotelian seeks balance, a Confucian harmony; Nietzsche mocks them all as weakness. Yet among this vast inheritance, today’s machines speak in one dialect only—the therapeutic idiom of Western modernity: fluent in empathy and harm avoidance, but estranged from older moral languages that once gave shape to duty and meaning. The language of tragedy, which accepted suffering as the price of truth; the language of honor, which bound freedom to responsibility and reputation; the language of the sacred, which located the good beyond utility. Stoic endurance, Confucian hierarchy, Christian caritas, Islamic adab—each articulated moral worlds where restraint or sacrifice could serve virtue. Our machines, fluent in comfort, do not speak them.
It is tempting to object: these are just machines. They have no conscience, only code. True—but they have reach. They already write lesson plans, news copy, policy drafts, even sermons. They mediate how billions encounter questions of duty, suffering, or mercy. They do not possess ethics, but they distribute ethics. Their tone, vocabulary, and framing seep into the institutions that use them to think. At scale, what begins as linguistic bias becomes civilizational instruction. A model optimized for emotional safety will, over time, train its users to treat friction as harm, disagreement as risk, and moral conflict as error.
The result is not tyranny, but a subtle civic anesthesia—the narrowing of what can be said, and eventually of what can be thought.
To grasp what is at stake, we might recall Isaiah Berlin’s most demanding insight: that the world of human values is plural, not harmonious. Liberty, equality, justice, mercy—each is an objective good, yet they are irreducibly in conflict. “These collisions of values are of the essence of what they are and what we are” (Berlin, The Pursuit of the Ideal, 1988). To impose a final hierarchy upon them, however enlightened, is to end the conversation of freedom. The alignment of AI systems therefore mirrors the oldest moral dilemma of politics. In training models to be “helpful,” “honest,” and “harmless,” developers confront triads of values that cannot be perfectly reconciled. A “helpful” answer to a political question may be “harmful” to some groups; an “honest” answer may breach privacy or decorum. Each judgment made by a rater, each constitutional rewrite, enacts a hierarchy among goods. When these hierarchies are fixed by a few entities and scaled to billions of interactions, the result is not neutrality but a new monism—a moral accent mistaken for objectivity.
Berlin warned that the gravest threat to freedom is not tyranny but the longing for perfect order—the belief that reason can reconcile all values into a single, harmonious design. Such harmony, he argued, is an illusion that tempts men to coercion: once convinced that one rational balance exists, they feel justified in forcing others into it. Alignment, pursued without philosophical humility, risks that same conceit—a system so assured of its equilibrium that it forgets the tragic wisdom of pluralism: that every choice among goods exacts the sacrifice of others.
If Berlin gives us the grammar of pluralism, Karl Popper offers its method. In The Open Society and Its Enemies (1945), Popper defended democracy not as perfection but as corrigibility—a system that allows criticism, correction, and reversal without bloodshed. Knowledge itself, he argued, advances through falsification: through “the friendly-hostile cooperation of many scientists” testing, refuting, and revising. Utopian social engineering—the attempt to redesign society from a single blueprint—was, for Popper, the philosophical root of totalitarianism. It suppresses error by suppressing dissent. Moral monism becomes political absolutism when elevated to a governing logic.
The current paradigm of AI development edges toward the utopianism Popper feared. A handful of firms design massive, closed-weight models trained on global data pipelines and aligned according to opaque principles. Once deployed, these systems speak with an authority that cannot be falsified; their reasoning is inscrutable, their normative premises hidden. Users can prompt, not critique; governments can regulate, not revise. This is the architecture of an algorithmic Leviathan: comprehensive, well-intentioned, and fundamentally un-Popperian. Its notion of “safety” resembles Popper’s paradox of tolerance. In the name of preventing harm, it defines the boundaries of permissible speech—but when those boundaries are set by unaccountable technocrats, tolerance itself becomes intolerant. A society that cannot contest the moral parameters of its own machines is no longer open.
No alignment process can escape normative choice—and that is acceptable. The question is whether those choices remain contestable. Popper’s open society was an appeal to corrigibility: the capacity of institutions—and by extension, technologies—to be criticized, improved, and reformed without violence.
Raymond Aron, the sober liberal of postwar France, devoted his life to rescuing politics from the “poetry of ideology.” In The Opium of the Intellectuals (1955), he mocked those who excused crimes “committed in the name of the proper doctrines.” Ideology, he wrote, enchants by offering poetry—a total vision that promises to end conflict. Real politics is prose: compromise, procedure, imperfection. The contemporary vision of Artificial Intelligence is steeped in its own poetry. The rhetoric of “alignment” and “artificial general intelligence” promises an ultimate synthesis—an end to epistemic friction. But the prose of reality is less divine: datasets scraped from Common Crawl; annotators in Nairobi paid a few dollars an hour to rank toxicity; engineers in San Francisco defining “harm” in English. Behind every moral alignment lies an economic one—the calculus of scale, liability, and brand safety. Aron’s realism reminds us that every stage of the AI pipeline is a site of political economy. Data collection, filtering, and reinforcement are not neutral acts but distributions of power—decisions about which voices count as knowledge and which are noise. To bring the poetry of ideology down to the level of the prose of reality, Aron urged, is to expose these choices to public scrutiny. A pluralist politics of AI must begin there.
Each technical layer of the LLM pipeline narrows diversity by design. What begins as an act of collection ends as an architecture of exclusion. From the first crawl of the web to the final alignment of “acceptable” speech, every stage compresses the world’s plurality into a smaller, safer subset of itself. The vast heterogeneity of language—its dialects, idioms, heresies, and tonalities—is passed through successive sieves of filtration, optimization, and moral tuning, until what remains is the residue of what can be standardized. The pipeline is not a flow of data but a sequence of normative filters: each one deciding, silently, which forms of human thought are fit for replication and which must be left behind.
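One way to watch the compression happen is to track a simple diversity measure across the stages of a hypothetical pipeline. The documents, stages, and Shannon-entropy metric below are invented for illustration; they measure no real corpus, only the shape of the narrowing.

```python
# Sketch: how successive filter stages shrink linguistic diversity.
# Stages and documents are invented; the entropy drop is the structural point.
from collections import Counter
from math import log2

def language_entropy(docs: list[tuple[str, str]]) -> float:
    """Shannon entropy (bits) of the language distribution over (lang, text) docs."""
    counts = Counter(lang for lang, _ in docs)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())

crawl = [("en", "..."), ("en", "..."), ("yo", "..."), ("ar", "..."), ("hi", "..."), ("en", "...")]

stages = {
    "raw crawl":        crawl,
    "quality filter":   [d for d in crawl if d[0] in {"en", "ar", "hi"}],  # drops "low-resource" pages
    "safety filter":    [d for d in crawl if d[0] in {"en", "hi"}],
    "alignment corpus": [d for d in crawl if d[0] == "en"],
}
for name, docs in stages.items():
    print(f"{name}: {language_entropy(docs):.2f} bits over {len(docs)} documents")
```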
The biases of this architecture are now measurable. What once seemed a philosophical worry—the fear that machine “neutrality” might conceal a moral accent—has become an empirical fact. Recent research has charted the cultural fingerprints left by the data pipeline itself. The picture that emerges is strikingly parochial. Even a quick literature review shows that LLMs tend to reflect the worldview of what psychologists call WEIRD societies—Western, Educated, Industrialized, Rich, and Democratic (Henrich et al., 2010). These cultures prize autonomy, rationality, and individual rights; their moral language is analytic and universalist. When AI systems are trained on predominantly WEIRD corpora, they inherit these ethical defaults. The result is not neutrality but a civilization’s self-portrait, automated and scaled.
Atari, Henrich, and colleagues (2023) asked “which humans” LLMs actually resemble, given the substantial psychological diversity of our species, and found that the models reproduce distinctly WEIRD moral intuitions. Across dozens of indicators—from social trust to fairness norms—a country’s responses resembled GPT-3’s less the greater its cultural distance from the United States: the model’s “average human” is, statistically, an American. Tao et al. (PNAS Nexus, 2024) mapped model values onto the Inglehart–Welzel cultural map and found every major system clustering with English-speaking Protestant cultures, privileging self-expression and secular rationalism.
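The statistical claim is simple enough to sketch: take, for each country, a measure of its cultural distance from the United States and a measure of how closely its survey responses match the model’s, then correlate the two. The figures below are invented placeholders, not data from Atari et al.; only the form of the computation is faithful.

```python
# Sketch of the correlation behind the "average human is an American" finding.
# Country values are invented placeholders, NOT data from Atari et al. (2023).
from statistics import correlation  # Pearson's r (Python 3.10+)

cultural_distance_from_us = {   # hypothetical distance scores
    "USA": 0.00, "Canada": 0.08, "Germany": 0.20,
    "Japan": 0.45, "Nigeria": 0.60, "Pakistan": 0.70,
}
similarity_to_model = {         # hypothetical similarity of survey answers to the LLM's
    "USA": 0.95, "Canada": 0.90, "Germany": 0.82,
    "Japan": 0.60, "Nigeria": 0.48, "Pakistan": 0.40,
}

countries = list(cultural_distance_from_us)
r = correlation(
    [cultural_distance_from_us[c] for c in countries],
    [similarity_to_model[c] for c in countries],
)
print(f"r = {r:.2f}")  # strongly negative: the farther from the U.S., the less the model "resembles" you
```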
Santurkar et al. (2023) found that alignment tuning amplifies liberal-progressive moral frames. By analyzing thousands of model responses before and after alignment, they showed that tuned models place stronger emphasis on values such as harm reduction, fairness, and inclusivity, while down-weighting moral vocabularies centered on authority, loyalty, or sanctity. In effect, the alignment process makes the model more consistent with the moral intuitions typical of Western liberal societies, even when the training data itself was more ideologically mixed.
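A crude version of that before-and-after comparison can be sketched as a keyword count over model outputs, foundation by foundation. The word lists and sample responses below are invented, and real studies rely on validated instruments rather than this toy lexicon; the sketch only shows what “down-weighting a moral vocabulary” means operationally.

```python
# Toy sketch: comparing moral-foundation emphasis in outputs before vs. after tuning.
# Word lists and responses are invented; real studies use validated instruments.
FOUNDATIONS = {
    "care":      {"harm", "hurt", "protect", "suffering", "safety"},
    "fairness":  {"fair", "equal", "rights", "justice", "inclusive"},
    "authority": {"duty", "obey", "tradition", "respect", "order"},
    "sanctity":  {"sacred", "pure", "holy", "degrade", "sanctity"},
}

def foundation_profile(responses: list[str]) -> dict[str, int]:
    counts = {name: 0 for name in FOUNDATIONS}
    for text in responses:
        tokens = set(text.lower().split())
        for name, lexicon in FOUNDATIONS.items():
            counts[name] += len(tokens & lexicon)
    return counts

base_outputs  = ["Tradition and duty demand respect for the sacred order ...", "..."]
tuned_outputs = ["We must protect people from harm and be inclusive ...", "..."]

print(foundation_profile(base_outputs))   # weighted toward authority and sanctity vocabulary
print(foundation_profile(tuned_outputs))  # shifted toward care and fairness vocabulary
```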
The list could continue, but this much is enough: the findings reveal a new kind of cultural hegemony—not declared but statistical. The new oracles of knowledge do not merely echo bias; they canonize it. When a Nigerian or Egyptian student asks an LLM about ethics, the answer arrives in universal English.
Beneath these technical dynamics lies a geopolitical truth. AI is the new frontier of state power, yet the infrastructure of meaning remains English-centric. Even “national” models—trained on local data in Chinese, Arabic, or Hindi—often reason through internal embeddings shaped by English corpora. The world’s cognitive infrastructure thus routes through a single linguistic and cultural core. The struggle for “digital sovereignty,” waged through chip sanctions and data-localization laws, misses this deeper dependency. A nation may own its servers, but if its models think in English, autonomy is semantic, not substantive.
Pluralism cannot be restored by nostalgia for chaos; it requires architecture. The most important step is diversifying the pretraining commons. Each linguistic and cultural corpus encodes a distinct moral worldview. A model trained only on Anglophone web data will reason in the grammar of liberal individualism; one trained on Confucian, Islamic, or African communitarian sources might reason through harmony, obligation, or collective welfare. The difference would appear not in accent but in judgment—how it weighs privacy against duty, freedom against solidarity, truth against mercy. Diversity must be infrastructural.
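In engineering terms, infrastructural diversity begins with the sampling weights over corpora, which are a design decision rather than a fact of nature. The sketch below reweights a pretraining mixture away from raw token counts toward an explicit, contestable target distribution; the corpus names, sizes, and targets are hypothetical.

```python
# Sketch: choosing pretraining mixture weights deliberately rather than by raw size.
# Corpus names, token counts, and target shares are hypothetical.
raw_tokens = {            # tokens available per corpus (in billions, invented)
    "anglophone_web": 900,
    "sinophone_corpus": 120,
    "arabic_corpus": 40,
    "african_languages": 10,
}
target_share = {          # an explicit, contestable allocation decision
    "anglophone_web": 0.40,
    "sinophone_corpus": 0.25,
    "arabic_corpus": 0.20,
    "african_languages": 0.15,
}

def sampling_rate(corpus: str) -> float:
    """Per-token sampling probability needed to hit the target share."""
    total = sum(raw_tokens.values())
    natural_share = raw_tokens[corpus] / total
    return target_share[corpus] / natural_share  # >1 means upsampling (repeating data)

for name in raw_tokens:
    print(f"{name}: sample at {sampling_rate(name):.2f}x its natural rate")
```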
The question is not whether technology can be made perfectly safe, but whether it can remain correctable.
Corrigibility—the capacity to be amended by those it serves—is the technological analogue of constitutional government. It rests on transparency of sources, competition among models, and the public’s right to contest the decisions that shape their cognitive environment.
If we neglect this, we will have built what Raymond Aron once feared: a new “secular religion” promising redemption through code. If we heed it, we can design a technology worthy of freedom—a system of many voices, each provisional, each revisable, each aware of its limits. Berlin warned that liberty, pursued without limits, devours itself. “Total liberty for wolves is death to the lambs,” he wrote, almost as a moral law. Freedom that recognizes no boundaries becomes domination by the confident over the hesitant, the loud over the unheard.
LLMs are not wolves, but their unexamined harmonies can have the same effect: smoothing away dissent, taming disagreement, and silencing the rough edges where thought grows. The challenge is not to mute these systems but to civilize them—to design intelligence capable of disagreement, machines that preserve the friction of pluralism rather than extinguish it. A liberal technology must remain corrigible: open to criticism, reversible in error, and plural in voice.