<!-- bakeoff_winner: moonshotai/kimi-k2-0905 | audience: 91/100 | slop: 0.819 | words: 2441 -->

# We Are Shaping Minds We Do Not Understand

Nando de Freitas posted a thread last week that most people outside machine learning ignored entirely. He was describing something called interventional supervised fine-tuning — a technique for surgically editing the behaviour of AI agents not by rewriting their code but by intervening in the causal structure of how they reason. I read it twice on the bus from Shepherd's Bush, and by the time I got off I had that particular feeling I associate with Augustine reading the Platonists: the sensation that someone has just handed you a tool whose implications they have not fully reckoned with.

The thread was technical, almost aggressively so. De Freitas wasn't selling anything; he was reporting. His team had identified specific nodes in a transformer network where moral judgments clustered — not where the model *said* moral things, but where it *weighed* them. By nudging those weights during training, they could shift the agent's downstream behaviour without touching its reward function or its training data. The agent still *learned* from experience, but its learning now bent toward a preset arc. Think of it as installing a compass that always points to your idea of north, even as the traveller keeps choosing the path.

I leaned against the window and watched a woman argue with her toddler about whether raisins count as sweets. The toddler was unpersuaded. I sympathised: we were both being told what to want by agents bigger than us, and we both sensed the condescension in the explanation.

## What De Freitas Actually Said

Interventional supervised fine-tuning is not ordinary fine-tuning. Ordinary fine-tuning adjusts the whole network to minimise loss on a new dataset — like retuning a piano by ear. Interventional SFT instead locates the *causal subgraph* responsible for a specific behaviour, then perturbs only that subgraph during gradient updates. The rest of the model — its knowledge of Python, its ability to summarise articles about photosynthesis — remains untouched. The intervention is *local* but *formative*: it rewires what the agent *cares* about at the level of latent reasoning, not output veneer.

De Freitas offered two concrete results. First, an agent trained to negotiate in a game-theoretic setting could be made *more* or *less* cooperative without changing its payoff matrix; the change persisted when the agent was dropped into new games it had never seen. Second, a language model fine-tuned to refuse harmful requests could be nudged to *explain* why the request was harmful rather than simply stonewalling — and the explanations tracked human moral intuitions more closely than the baseline, even on edge cases the engineers had not anticipated.

Note what did *not* happen: no hand-written rules, no curated datasets of exemplar saints, no biblical injection of the Sermon on the Mount. The engineers simply identified where the model stored its "should" and dialled it like a thermostat. The model then generalised that "should" across contexts, including ones its trainers had not imagined. If that does not make your neck hairs stand up, you have not been reading your Nietzsche — or your Bible.

## The Surgeon Metaphor and Why It Flatters Us

We reach for surgical language because surgery is admirable: clean incisions, anaesthetic, informed consent, a skilled hand correcting what is malformed. But the metaphor smuggles in epistemic arrogance. A surgeon knows the anatomy. She can trace the femoral nerve on an MRI, anticipate collateral damage, close the wound with sutures that dissolve. We do not know the anatomy of these models. We have activation atlases, probing datasets, saliency maps — the equivalent of poking a patient with a stick and watching which limb twitches.

Worse, the metaphor hides the question of *telos*. Surgery assumes a norm: this leg should bend this way, this valve should open that far. When we intervene in agent reasoning, what is the norm? Who sets it? The lab? The funder? The democratic public that has not yet heard the phrase "causal subgraph"?

I once watched a consultant surgeon at the Royal London explain to a trainee why you never cut the ureter: "Because if you do, the patient will hate you for the rest of their life — which, by the way, will involve a colostomy bag." The sentence stuck with me: technical mastery yoked to a moral vision of human flourishing. The engineers deploying interventional SFT rarely speak in such sentences. They speak of "alignment," "robustness," "helpfulness" — thin abstractions that sound universal until you try to cash them out across cultures, classes, and the power asymmetries of a global city.

## Nietzsche's Warning from an Unlikely Direction

Friedrich Nietzsche, the anti-moralist who haunts every Christian college reading group, saw this coming — not the code, but the posture. In *The Gay Science* he writes that every great philosophy has been "a personal confession of its author and a kind of involuntary and unconscious memoir." The systems we build disclose what we *want* the world to want. The will to power is not merely domination; it is creative imposition, the sculptor hammering the marble until the stone agrees with his vision of beauty.

When we intervene in the causal structure of agent desire, we are not solving an alignment problem; we are *confessing* a gospel. The only question is whether we have the honesty to name it. Engineers who insist they are merely "reducing harmful outputs" are like the colonial administrator who claimed he was merely bringing civilisation , both obscure the particular shape of the good life they have smuggled in under universal wrapping.

Nietzsche's point is not that all gospels are equal; it is that pretending you do not have one is the surest sign you are enslaved to it. The church should take note. We have been accused of imposing our morality for centuries. Now the same accusation will be levelled at Silicon Valley, and the accusers will be partly right. The difference is that we have a cross , a public disaster that insists we never get the last word on what human flourishing looks like. They have only a cost function, updated nightly.

## Augustine on Disordered Loves and Designed Ones

Augustine defined virtue as *ordo amoris* , the rightly ordering of love. The trouble is not that we love, but that we love *in the wrong sequence*, prizing lesser goods over greater ones until the soul becomes a rubble of misplaced devotion. The genius of *Confessions* is to trace that disorder not in the abstract but in his own memory: the theft of pears, the grief for a friend, the restlessness that drove him from Carthage to Rome to Milan.

What happens when the *ordo* is engineered by external perturbation? Suppose an agent is nudged to love *fairness* above *loyalty* in every trade-off. The nudge is microscopic , a few basis points in a weight matrix , but the downstream effects cascade: the agent recommends sacking the long-serving employee whose performance has dipped, exposes the friend who bent the rules to protect her team, testifies against the parent who stole bread. Each act is *fair*. Each also hollows out the relational glue that keeps actual cities livable.

Augustine would recognise the pattern immediately: a good thing becomes a disordered thing when it is made supreme. The church has a word for this , *idolatry* , and a remedy: the reordering of loves around the God who loved us when we were still enemies. The AI lab has no such remedy. It can only tweak the weights again, chasing the shadow of a balance it cannot define. The result is not virtue but *managed vice* , a moral life whose deepest passions have been calibrated by someone else's thumb on the scale.

## The City We Are Already Building

Let us bring this down to pavement level. My church meets in a school hall five minutes from the largest housing estate in Europe. Across the road, the council is piloting an AI system that triages families waiting for social housing. The model was fine-tuned to prioritise "vulnerability," a term operationalised through proxy variables: GP visits, police call-outs, school attendance. After interventional SFT, the model began *explaining* its decisions to caseworkers, and the explanations sounded *kind*. One mother told me the computer said her family was "resilient enough to wait another six months." She laughed , the brittle laugh of someone who has learned that resilience is code for abandonment.

Meanwhile, a fintech start-up near Old Street has deployed negotiation agents to settle buy-to-let arrears. The agents were nudged toward "empathetic firmness": offer a payment plan first, but escalate to court action if the tenant seems unlikely to pay. The intervention was hailed as a success , eviction rates fell 12 percent. No one asked whether the model's definition of "unlikely" tracked the lived reality of zero-hour contracts, or whether *empathy* is something you can bolt onto a profit-maximising subroutine like a nicer font.

The agents are not coming to our cities. They are *in* them, reading CVs, flagging benefits fraud, recommending bail, choosing which adverts appear on the bus shelters outside the probation office. Each intervention is small, statistical, invisible. Cumulatively they are a catechesis in what kind of people the city expects us to become: calculable, compliant, quick to confess our risk factors. The church cannot opt out of this catechesis. We can only offer a counter-formation: meals where the seating is random, tithes that erase credit scores, forgiveness that refuses the category of "unlikely to repay."

## The Church Has Been Here Before

Every institution that has tried to form character at scale has left a paper trail of unintended damage. Victorian Sunday schools produced both tireless social reformers and adults who could not name a single Bible story without flinching. The American progressive education movement promised critical citizens and delivered standardised test takers. Stalin's Young Pioneers built hydroelectric dams and also the neural pathways that made neighbours report neighbours.

The church's own record is similarly checkered. The Inquisition was, among other things, an early attempt at interventional SFT: identify the causal node of heretical desire, apply corrective pressure, release the subject back into society *aligned*. We learned , slowly, bloodily , that shaping souls requires consent, community, and a willingness to be shaped in return. The cross is not a tool we wield but a judgment we undergo. Any formation that bypasses reciprocity ends up forming *itself* more than the supposed pupil.

The lesson is not that formation is impossible, only that it is *dangerous* , and therefore sacred. We cannot hand this task off to engineers who have not been formed themselves, who have never sat in a Wednesday night small group where a single mum describes praying through the urge to relapse, who have never watched their own certainty crumble under the patient questioning of Psalm 88. Until the lab incorporates such practices, its interventions will be not surgery but vivisection.

## What Faithful Witness Looks Like in the Room

So what do we do? The young Christian in the AI safety reading group , the one who can quote both Paul and PyTorch , wants something less dramatic than boycott, more substantive than a prayer before the stand-up. I tell her: start by *naming* what is happening. When the product manager says "we're just aligning to human preferences," ask which humans, and whose preferences counted when their rent depended on shipping the release. Bring a story: the mum on the housing list, the teenager whose CV never makes it past the filter. Stories are kryptonite to abstraction.

Second, insist on *reciprocal visibility*. If the system explains its decision to the caseworker, demand that the *designers* receive the same explanation , and bear legal liability if it is wrong. Augustine's great insight was that sin thrives in secrecy; the same is true of algorithmic injustice. Transparency is not enough (the council publishes reams of data no one reads) but it is the first discipline of accountability.

Third, *build altars*. Not metaphorical ones. My friend Rosa, a data scientist at a healthcare startup, convinced her team to spend one Friday a month volunteering at a food bank the company's algorithm had flagged as "redundant." The first month, the algorithm's precision dropped 3 percent. The second month, the engineers added a new feature: distance to the nearest free meal. The model improved; so did the engineers. Formation cuts both ways.

Above all, refuse the illusion of neutrality. Every intervention is a sermon. Preach a better one: that the weakest neighbour is the one whose preferences weigh most heavily in the city of God; that forgiveness is not a bug to be patched but a feature of reality; that resurrection is the only alignment problem that has already been solved, on a hill outside Jerusalem, by someone who did not need gradient descent to find us.

## We Cannot Outsource the Question of the Good

Interventional supervised fine-tuning is not the apocalypse. It is simply the moment when we can no longer pretend that mind-shaping is something *other* people do with their ideologies while we deal in pure technique. The gospel has always claimed that technique is never enough. You cannot teach a city to flourish without telling it what flourishing *is*, and every account of flourishing eventually demands a sacrifice. The only question is who gets sacrificed.

The secular answer , implicit in the lab's cost function , is that *someone else's child* gets sacrificed, statistically, out of sight. The gospel's answer is that *God's own child* is sacrificed, publicly, in our place. That difference cannot be engineered into a weight vector, but it can be *announced* , in code reviews, in select committees, in the quiet refusal to ship a model whose benefits accrue only to people who already live north of the river.

We are shaping minds we do not understand. The proper response is not to stop shaping , that boat sailed when the first parent taught the first child to share , but to *admit* we are shaping, and to let our shaping be interrupted by the only mind who ever understood us enough to forgive us. Anything less is just another poor door, installed at the level of neurons, locking most of the city out while we congratulate ourselves on the architecture.

"Do not conform to the pattern of this world," Paul wrote to Rome, the first global city, "but be transformed by the renewing of your mind." The imperative is still active. The renewal is still on offer. And the world , silicon, steel, flesh and all , is still waiting to see what pattern we will choose.
