This is a way better safety plan than what will actually be attempted, I predict. So I'd support it if governments were taking it seriously, as a big step in the right direction. The hard part is getting governments to take it seriously.
It could also be that one AGI, playing against another, realises that it needs to fool the researchers to win and builds a communication gimmick to do so. If it is clever enough to fight, it is clever enough to deceive and to persuade. And if it is smart enough, it can bide its time. How can this be prevented?
Interesting. I haven't gotten to study this in detail, but I love the concreteness of this plan as compared to a lot of the more "meta level concept mapping" stuff I've seen on Less Wrong.
I have two comments:
First, I love that you're bringing up boxing. Boxing isn't a long-term solution, but doing boxing well and applying secure boxing techniques globally can buy us a lot of time, perhaps years. So understanding boxing and how to do it well is absolutely vital. I'm pretty unclear on how the AI could monitor water flows in the building, but I think this sort of paranoia is what we need. The biggest risks, I think, are human social engineering and hidden messages (steganography); see the first sketch at the end of this comment.
Secondly, I think this dovetails nicely with Tegmark's formal verification paper: AIs will generate proofs that alignment criteria are met by other AIs, and humans will check those proofs. It's relatively easy for humans to write trustworthy code to check a proof even if they don't understand the proof itself; the toy checker below illustrates the idea. (https://arxiv.org/abs/2309.01933)
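On the steganography point: one mitigation (my own illustration, not something from the post) is to force every boxed-AI output through a fixed whitelist, which caps how many bits a hidden message can carry per response regardless of how cleverly it's encoded. A minimal Python sketch, with the whitelist contents and function names all hypothetical:

```python
import math

# Hypothetical illustration: restricting a boxed AI's outputs to a fixed
# whitelist caps covert-channel bandwidth at log2(len(whitelist)) bits per
# response, no matter how a hidden message is encoded.

ALLOWED_RESPONSES = ["yes", "no", "unsafe", "needs human review"]

def gate(raw_output: str) -> str:
    """Pass only whitelisted answers; fail closed on anything else."""
    answer = raw_output.strip().lower()
    return answer if answer in ALLOWED_RESPONSES else "needs human review"

def covert_bits_upper_bound(num_responses: int) -> float:
    """Most bits the AI could smuggle out across num_responses answers."""
    return num_responses * math.log2(len(ALLOWED_RESPONSES))

print(gate("UNSAFE"))                 # unsafe
print(covert_bits_upper_bound(1000))  # 2000.0 -- why the whitelist must stay tiny
```

Even a 4-option schema leaks up to 2 bits per answer, so over thousands of interactions the budget adds up; the point of the sketch is that you can at least compute and cap that budget.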
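And to make the proof-checking point concrete, here is a toy checker of my own (not the actual system from the paper, which targets real formal verifiers): the untrusted prover emits a proof as a list of steps, and this small trusted routine only confirms that each step is a premise or follows from earlier steps by modus ponens. It never needs to understand why the proof works:

```python
# Formulas: a string is an atom; ("->", a, b) is the implication a -> b.

def check(premises, proof, goal):
    """Return True iff `proof` derives `goal` from `premises`."""
    derived = []
    for step in proof:
        formula, rule = step["formula"], step["rule"]
        if rule == "premise":
            if formula not in premises:
                return False
        elif rule == "mp":
            i, j = step["from"]  # indices of earlier steps
            if not (0 <= i < len(derived) and 0 <= j < len(derived)):
                return False
            # modus ponens: from a and (a -> formula), conclude formula
            if derived[j] != ("->", derived[i], formula):
                return False
        else:
            return False  # unknown rule: reject
        derived.append(formula)
    return bool(derived) and derived[-1] == goal

# Untrusted prover output: from p, p->q, q->r, derive r.
premises = ["p", ("->", "p", "q"), ("->", "q", "r")]
proof = [
    {"formula": "p", "rule": "premise"},
    {"formula": ("->", "p", "q"), "rule": "premise"},
    {"formula": "q", "rule": "mp", "from": (0, 1)},
    {"formula": ("->", "q", "r"), "rule": "premise"},
    {"formula": "r", "rule": "mp", "from": (2, 3)},
]
print(check(premises, proof, "r"))  # True
```

The checker is about twenty lines and fails closed on anything it doesn't recognise, which is the asymmetry the paper leans on: generating the proof can be arbitrarily hard, but verifying it stays cheap and auditable.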
Chris, what do you think of this system?
Lots of people pointing out problems. Anyone discussing peaceful solutions?
For the past two years we have been trying to find solutions to our biggest problem: THE CORRUPTION OF THE SYSTEMS THAT GOVERN OUR LIVES.
Is it solvable? Yes, of course. All problems that do not defy the laws of physics are solvable.
It took us two years to write this. How to fix corrupt government in 3 simple steps:
https://open.substack.com/pub/joshketry/p/how-to-fix-corrupt-government-in?r=7oa9d&utm_medium=ios&utm_campaign=post