Intuitions on Alignment

Average misaligned AGI

Laundry list of thoughts on AGI and alignment.

Government bureaucracies are networked intelligences that are difficult to steer, have malicious actors / grifters, continue to be inefficient as good actors are filtered out whilst bad actors aren promoted. To what extent is this a good analogy for a misaligned AGI?
AGI, as with any potent technology, is the handmaiden of its creators. It matters what political systems and nations achieve AGI, and in what order.
Today, the greatest threat from AGI is not from AI going rogue, but rogue actors using AI for malicious intent. The first is inevitable, the latter the subject of intense debate

How far do analogies to nuclear weaponry hold? What does MAD look like?
Differences: Nuclear weapons aren’t intelligent, they’re deterministic missiles.
Are biological weapons (eg viruses) a more apt comparison? Biological entities can replicate, just like software.
AGI doom scenarios, if they are to occur, will be a result of centralization of compute and uninhibited access control to large scale resource production systems. They will not result from on ground drones or killer robots (can trivially be remotely dismantled)
AGI doom assumes an AI that has access at the scale of national resources / weaponry, and incentive for harm. The incentive portion of alignment is oft discussed, and hard to pin down (we simply can’t place strong bets under such uncertainty). The access control is less discussed. Spoiler: Psycho Pass offers interesting food for thought on what a society with centralized decision making and planning might look like

Goalpost for AGI keeps on shifting. 5 years ago (2019), would you have predicted we’d have GPT 4 / Claude Opus level capabilities? Whether transformers will ultimately scale to (even more) general problem solving paradigms is less relevant here; note our inability to project
What are your AI predictions for 5 years from now? Let’s see what the prediction markets say. (Note to self: look at Metaculus markets for AI predictions, or meta predictions by experts)
If you have short AGI timelines, how has that affected your investing strategies? If it hasn’t, and you do, this is your cue to go re evaluate your portfolio (this is not financial advice)

Paperclip like doom scenarios assume agents will pursue a purely selfish strategy profile. Yet, AGI may be likely to coordinate with other entities (other AGI, governments, corporations, any large networked intelligence really) simply because coordinated equilibria are often optimal. It’s the reason we have traffic lights.
AGI is more likely to be cooperative by default rather than combative, by virtue of the fact that they are not RNG-ed but rather designed by humans
AGI will likely be distinct entities, which given internet access, can communicate amongst themselves. We already have them - they’re called AI agents

Transformer models are not good for nailing exact formalisms (maths vs english).

Consciousness is unnecessary for simulating humans (Chinese Room), but what about real planning / thinking?**

April 14, 2024