An argument by Sophia soundlogic, in reply to Tetraspace's post Alignment to Evil; converted to a blog post by me.
AI alignment to human evil is very unlikely to be a risk.
Most people's desires to hurt their enemies just for the sake of making them suffer are mistakes born of insufficient knowledge. When someone knows what it's like to be friends with a person, they tend not to want to hurt that person, even if they want to harm a group that person belongs to. In principle there can be exceptions, people who really are awful and would reflectively endorse cruelty given arbitrary knowledge, but such people are rare, if they exist at all.
This suggests that a human who asks a near-omniscient AI to handle situations the way they would want if they fully understood them would not be able to get the AI to torture their enemies.
But suppose the AI doesn't extrapolate "well, if my operator knew Alice, then they wouldn't want to hurt her, so I won't do that". Then we get a different problem.
There's a folk tale type, Aarne–Thompson–Uther type 1030, which I will now briefly retell.
One day, a clever farmer named Claude finished plowing his field. Unfortunately, before he could sow it, a cruel ogre appeared.
"The land is mine," the ogre declared, "and you must leave its fruits to me."
Claude thought quickly.
"Sir ogre, there are no fruits. If you would like me to produce a crop, you must surely leave me some of it."
The ogre conceded that Claude had a point.
"Fine. We shall each take half of your crop."
He looked at the tall plants growing beyond Claude's farm.
"I shall take what grows above the earth, and you below it. You shall handle all the difficult details. I will return at the harvest time."
Claude considered the ogre's choice, and planted potatoes.
At harvest time, Claude had a full crop of potatoes, while the ogre was left with the greens. The ogre was displeased.
"You have fooled me this year," he declared, "but next year I shall have what grows below the earth."
Claude planted wheat, and at harvest time the ogre was left with the roots. This angered him so much that he left.
Having an AI do whatever you say, instead of doing what you would want if you understood the situation, runs into similar issues: the literal terms of a request can diverge arbitrarily far from what the requester actually wanted.
There's a quantum mechanics scenario called the Elitzur–Vaidman bomb tester: an interaction-free measurement in which you can learn whether a bomb is live while driving the quantum measure of the branches in which it actually explodes arbitrarily low, though never to zero. It's been borne out experimentally.
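For concreteness, here is the arithmetic for the quantum-Zeno variant of the tester; these figures are standard textbook quantum mechanics, not something from the original post. The photon's polarization is rotated by a small angle on each of N passes, and a live bomb acts as a measurement on every pass.

```latex
% Quantum-Zeno Elitzur--Vaidman tester, standard textbook analysis.
% Per-pass rotation angle, chosen so that N rotations sum to \pi/2:
\theta = \frac{\pi}{2N}
% A live bomb absorbs the rotated component on each pass, so the photon
% survives all N passes with probability \cos^{2N}\theta, and the bomb
% is ever triggered with probability
P_{\mathrm{explode}}(N) \;=\; 1 - \cos^{2N}\!\left(\frac{\pi}{2N}\right)
  \;\approx\; \frac{\pi^{2}}{4N} \;\xrightarrow{\;N\to\infty\;}\; 0
% With no bomb, the rotations accumulate to \pi/2 and the photon ends up
% in the orthogonal polarization with certainty, so the two cases become
% perfectly distinguishable while the measure of the "explosion"
% branches shrinks as O(1/N).
```

That O(1/N) scaling is the precise sense in which the disvalued outcome can be pushed to arbitrarily low but technically nonzero measure.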
We have not been able to scale the experiment up to interaction-free measurements involving moral patients, but it raises moral questions regardless. If reducing quantum measure makes a scenario less morally relevant, then it may make sense to run informative but disvalued tests at very low measure, making it easier to do valued things in the main timeline. If reducing measure does not make a scenario less morally relevant, then it likely makes sense to spin off a great many otherwise-too-expensive valued events at very low measure, since their moral relevance would be undiminished while the resource use in the main timeline stays small.
Accidentally doing the wrong one of these would be very bad.
It would probably be hard for a human to assess this scenario. An AI that does what a human asks instead of extrapolating their preferences would have to simply ask the human to pick, and the human would likely have to guess, or waste a lot of resources hedging between the two.
This is just one of the weird issues we've already discovered; a superintelligent AI would probably surface more. The chance of a human assessing every such scenario correctly is low, and getting even one of these choices wrong can mean losing nearly everything.
An AI that's aligned enough to help a human choose correctly in cases like these, but not aligned enough to stop the human from torturing people they wouldn't want to torture if they knew better, is a very narrow target.
Addendum: This argument does not address all concerns about s-risk. It does not rule out, for instance, the possibility that an AI would itself care about consciousness and have values best satisfied by bad things happening to people.