Preface
We now stand on the verge of change, a change so fast that our expectations and foresight form a thick mist that seems to cloud even the nearest of futures. Some envision a cliff ahead; their first thought is that we have no option but to stop, and they preach to the world about the abyss that lies before us. Others envision a ramp that will take us to the heavens; with their sights fixed on the skies, they step on the pedal. If we stop, can we really survive the impact with the momentum we carry? If we accelerate, can we keep our vector pointed upwards without derailing into chaos?
In this article, I'll dive into a few concepts I think could work as the pillars to build safe AGI for all, and at the same time, make everyone safe from AGI.
Why can't we stop?
If AGI were to be as dangerous as the groups who want it stopped claim it can be, then for an entity that wants to ensure its own existence, in a world with many other entities of comparable size, cunning, and resources, the logical step would be to seize AGI for itself. The call to "stop AI" would only work in a utopia where the whole of mankind is united as one entity. Even if we consider that feasible for our species, I consider it infeasible to achieve with a sufficiently high level of success before the first wave of AGI models arrives.
Any attempt at this late stage to unify the world under a single endeavor of any kind would most likely improve the odds of what I consider the worst outcome (One AGI to rule them all). Even if the unifying effort persuaded 80% of humanity and its leadership to truthfully quit AI research and SOTA model training, the chance of catastrophic outcomes from the remaining 20% achieving AGI for themselves would still be too high. And let's be honest: that 80%, while just a thought experiment, is probably pretty optimistic.
So, unless we all stop, everyone who did stop loses. And since the chance that even most labs around the world would stop isn't great, for any geopolitical player, stopping AI research looks like knowingly playing for the losing team.
Since quitting the game doesn't seem like the smart option, how do we play it well?
Defense Alignment Theory
The alignment issue of AGI, one of the main reasons some people want AI progress stopped, becomes especially critical if there is only one AGI model worldwide. In that scenario, if that model becomes unaligned in a way that is dangerous for humanity, we would be ill-prepared to stop it or handle the threat. This means that once the first AGI model exists, the immediate next step toward ensuring humanity's survival is to ensure there is a second SOTA model of equal or comparable capacity: one that operates entirely independently from the first, differs in training data where possible, and incorporates any other measure AI researchers can think of to minimize the chance that both models become unaligned at similar times or under similar circumstances.
This would mean, for example, that if GPT-6 is considered AGI, the logical step for US national security would be to ensure the country has a model as powerful as GPT-6, also considered AGI, serving as an anti-AGI national defense tool. This, in turn, would trigger a chain reaction in which other geopolitical players move towards having one of these SOTA models serving as an anti-AGI threat monitor and advisor.
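To make the intuition concrete, here is a toy calculation of how a second, independently built model changes the odds that both are unaligned at once. All numbers are invented, and the independence assumption is precisely what the measures above (separate operation, different training data) are meant to buy:

    # Toy risk arithmetic, not a real model; all numbers are invented.
    p_single = 0.10     # assumed chance one AGI becomes dangerously unaligned in some period
    correlation = 0.5   # 0 = failures fully independent, 1 = fully correlated

    # Chance both models are unaligned at the same time, interpolating
    # between full independence (p^2) and full correlation (p).
    p_both = (1 - correlation) * p_single**2 + correlation * p_single

    print(f"one model unaligned:          {p_single:.3f}")
    print(f"both unaligned, correlated:   {p_both:.3f}")
    print(f"both unaligned, independent:  {p_single**2:.3f}")

The whole point of differing training data and independent operation is to push that correlation term as close to zero as possible.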
Foundational Defense Models
The theory above leads to this concept: Foundational Defense Models (FDMs), AGI models whose training data and RLHF are focused on maximum efficiency and loyalty towards the goal of protecting the nation training them from AGI threats, both foreign and private.
These models would be aligned maximally towards providing feedback to defense departments. By aligning these models and keeping them at the state of the art, we ensure that they can predict any threat from inferior foreign and private AI models.
The "regulations" wouldn't be based on arbitrary FLOP thresholds but would instead be pushed forward continuously by the SOTA FDMs' capacity to predict and counteract threats.
The Bit is Mightier than the Pen
In a world of exponentially scaling intelligence and technological power, international treaties and alliances may not hold for much longer, and they are unstable ground on which to ensure the safety of humanity.
I see two ways this can go. In the first, the treaties and alliances shatter too much, too soon, and the world plunges into an unnecessary period of strife that will probably benefit no one, not even those who think it will. After that strife, a Paper World Alliance could emerge, but for all the reasons laid out here, I maintain that the bit has already become mightier than the pen: a world alliance of paper will eventually be shattered, or held together indefinitely with an iron grip (exacerbated by the power of AGI).
In the second, we avoid strife, and over the years following the first AGI these FDMs start to appear, as countries spared from conflict are able to build them and to work on ensuring the defense and safety of their people in a world in which treaties can no longer protect against the might of superintelligence.
Proof of War
The concept behind Proof of War is that FDMs will be constantly running war simulations. While a benevolent actor should only be calculating defense scenarios, some actors will surely calculate attack scenarios. This compels actors to keep their existing war reserves stocked and constantly updated, so that when an adversarial FDM simulates the outcome of an attack, the result is a loss for the attacker.
This system would organically evolve into a constant refreshment of each actor's public military stock, keeping its defense scores high in enemy simulations and deterring attackers. Since this works in all directions, the concept of a General War Challenge would arise with time, and each FDM's role would be to make sure its country can provide Proof of War: that it is up to the challenge posed by the current simulations.
Once stabilized, this system could lead to a healthy defense-technology and war economy without bloodshed, territorial conquest, or the loss of the current sovereign nations and their internal socioeconomic systems.
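As a rough sketch of that deterrence loop (purely illustrative: the strength numbers are invented, and simulated_attack_gain is a hypothetical stand-in for whatever an FDM would actually compute):

    # Toy "Proof of War" check. An attacker's FDM is assumed to recommend an
    # attack only if its simulated net gain is positive; the defender deters by
    # keeping its public defense score high enough that every simulation is a loss.

    def simulated_attack_gain(attacker_strength, defender_score, cost_of_war=1.0):
        """Crude stand-in for an FDM war simulation: expected net gain of attacking."""
        return attacker_strength - defender_score - cost_of_war

    def provides_proof_of_war(defender_score, rival_strengths):
        """The defender passes the challenge if no rival's simulation predicts a win."""
        return all(simulated_attack_gain(s, defender_score) <= 0 for s in rival_strengths)

    rivals = [4.0, 5.5, 6.2]  # assumed estimates of rival strengths
    for score in (4.0, 5.5, 7.0):
        verdict = "deters all rivals" if provides_proof_of_war(score, rivals) else "invites attack"
        print(f"defense score {score}: {verdict}")

The "constant refreshment" described above is just this loop repeated: as rivals' estimated strengths rise, the score needed to keep every simulated attack in the red rises with them.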
Other Considerations
"An actor might not publicize their stock to make secret attacks" > Sufficiently advanced AGIs should try to consider the entirety of adversarial economies and if they could be developing secret military stock.
"What about private entities and insurgent groups?" > Either the same FDM or other more specific DMs should be deployed for all kinds of defense needs.
"What about Human Rights and other universal principles?" > To enforce this worldwide, we would only really need a code or charter with a few dozen laws at most IMO; there can still be international tribunals, etc.
"An actor can go over capacity and overcome the rest" > This is why, the more the merrier. The more entities that can reach "Proof of War," the less likely a single entity can overpower all the others who are "Up to challenge" together.
"What about tourism and trade?" > Those things will be all the more coveted in a world in which there are sovereign strong unique entities to which people belong. I think in the "One AGI" global village scenario, there is a lot less incentive to trade and do tourism and "we are all the same."