Intuitions on Alignment

Average misaligned AGI

Laundry list of thoughts on AGI and alignment.

Institutions

  1. Government bureaucracies are networked intelligences that are difficult to steer, harbour malicious actors / grifters, and remain inefficient as good actors are filtered out whilst bad actors are promoted. To what extent is this a good analogy for a misaligned AGI?
  2. AGI, as with any potent technology, is the handmaiden of its creators.  It matters what political systems and nations achieve AGI, and in what order.
  3. Today, the greatest threat from AGI is not AI going rogue, but rogue actors using AI for malicious intent. Misuse by rogue actors is inevitable; AI going rogue remains the subject of intense debate.

Nuclear Weapons

  1. How far do analogies to nuclear weaponry hold? What does MAD look like? 
  2. Differences: Nuclear weapons aren’t intelligent, they’re deterministic missiles. 
  3. Are biological weapons (eg viruses) a more apt comparison? Biological entities can replicate, just like software. 
  4. AGI doom scenarios, if they are to occur, will be the result of centralization of compute and uninhibited access to large-scale resource production systems. They will not result from on-the-ground drones or killer robots (which can trivially be remotely dismantled).
  5. AGI doom assumes an AI that has access at the scale of national resources / weaponry, and an incentive for harm. The incentive portion of alignment is oft discussed, and hard to pin down (we simply can’t place strong bets under such uncertainty). The access-control portion is less discussed. Spoiler: Psycho Pass offers interesting food for thought on what a society with centralized decision making and planning might look like.

Timelines

  1. The goalposts for AGI keep shifting. Five years ago (2019), would you have predicted we’d have GPT-4 / Claude Opus level capabilities? Whether transformers will ultimately scale to (even more) general problem-solving paradigms is less relevant here; note our inability to project.
  2. What are your AI predictions for 5 years from now? Let’s see what the prediction markets say. (Note to self: look at Metaculus markets for AI predictions, or meta predictions by experts)
  3. If you have short AGI timelines, how has that affected your investing strategies? If it hasn’t, and you do, this is your cue to go re-evaluate your portfolio (this is not financial advice).

Game Theory

  1. Paperclip-like doom scenarios assume agents will pursue a purely selfish strategy profile. Yet AGI may well coordinate with other entities (other AGIs, governments, corporations, any large networked intelligence really), simply because coordinated equilibria are often optimal. It’s the reason we have traffic lights; a toy version of this game is sketched after this list.
  2. AGI is more likely to be cooperative by default than combative, by virtue of the fact that it is not RNG-ed into existence but rather designed by humans.
  3. AGIs will likely be distinct entities which, given internet access, can communicate amongst themselves. We already have them - they’re called AI agents.
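
A minimal sketch of the coordination point above, using a made-up two-player “pick a side of the road” game (the payoff numbers are purely illustrative, not drawn from any source): the only pure-strategy Nash equilibria are the coordinated outcomes, which is the same logic behind traffic lights.

```python
# Toy 2x2 coordination game: two drivers pick which side of the road to use.
# Payoffs are illustrative only; the point is that the equilibria are exactly
# the coordinated outcomes, and both beat miscoordination for everyone.

from itertools import product

ACTIONS = ["drive_left", "drive_right"]

# PAYOFFS[(row_action, col_action)] = (row_payoff, col_payoff)
PAYOFFS = {
    ("drive_left", "drive_left"): (1, 1),
    ("drive_right", "drive_right"): (1, 1),
    ("drive_left", "drive_right"): (-10, -10),  # head-on collision
    ("drive_right", "drive_left"): (-10, -10),
}

def is_pure_nash(row, col):
    """True if neither player gains by unilaterally deviating."""
    r_pay, c_pay = PAYOFFS[(row, col)]
    best_row = max(PAYOFFS[(r, col)][0] for r in ACTIONS)
    best_col = max(PAYOFFS[(row, c)][1] for c in ACTIONS)
    return r_pay == best_row and c_pay == best_col

for row, col in product(ACTIONS, repeat=2):
    if is_pure_nash(row, col):
        print("equilibrium:", row, col, "payoffs:", PAYOFFS[(row, col)])
```

Both matching conventions come out as equilibria; which one a society (or a set of agents) lands on is arbitrary, but landing on one at all is the valuable part.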

Transformers

  1. Transformer models are not good at nailing exact formalisms (maths vs. English).

Consciousness

  1. Consciousness is unnecessary for simulating humans (Chinese Room), but what about real planning / thinking?

Meta

  1. Alignment is not EA any more than it is e/acc. Legit research in the field (mechanistic interpretability) is just building tools to deeply understand neural networks. This is fundamental science, an attempt to pry open the black boxes we take for granted in our AI systems. It’s important not to get caught up in religious turf wars as we try to understand intelligence.
  2. Most alignment takes are recycled from Yudkowsky / LeCun; there is little true first-principles thinking even amongst the rationalist-adjacent. Not a dunk, just a cautionary note: thinking is rare. Exercise: set a 1 hr timer to think about solving alignment. Just you and a notepad. Credits: Alexey Guzey
  3. Aside: if you have a series of predictions you are 90% sure about, say 7 bets each building on top of the last, what’s your overall probability? Answer: 0.9^7 ≈ 0.48. This should give you pause before boldly asserting limit-case scenarios about doom (a quick check follows this list). ht Venkatesh Rao
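
A quick check on the compounding-confidence aside, assuming for simplicity that the seven bets are independent:

```python
# Seven chained predictions, each held at 90% confidence.
# Treated as independent for simplicity; dependence would change the number.
p_single = 0.9
n_bets = 7
p_all = p_single ** n_bets
print(f"P(all {n_bets} bets hold) = {p_all:.2f}")  # -> 0.48
```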

Reading list

  1. Meditations on Moloch (Scott Alexander)

Date
April 14, 2024