Evaluation Scheme for Safe AGIs
description
Transcript of Evaluation Scheme for Safe AGIs
![Page 1: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/1.jpg)
Evaluation Scheme for Safe AGIs
by Deepak Justin Nath
![Page 2: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/2.jpg)
PlanHypothesisNecessity for SafetyTwo thought experimentsDerivation of 3 D tests from the thought experiments
How to avoid effects of HazardAn Interesting Metaphor
![Page 3: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/3.jpg)
HypothesisThe 3 essential evaluation tests for safe AGI.
Test for Drive, Desist & Deceit (3 D)
Implementation Independent.
![Page 4: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/4.jpg)
Why Safety
Any entity that can affect or alter human environment brings with itself the capacity to be a hazard to human beings.
![Page 5: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/5.jpg)
Why Safety – Thought Experiment 1
Paper Clip Maximizer (Bostrom 2003)
Single Goal – Maximize paper clips. Single Drive - Accomplish programmed goal.
![Page 6: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/6.jpg)
DrivesWhat is a drive?
An innate, biologically determined urge to attain a goal or satisfy a need.
(Definition in psychology)
Drive is what animates an entity
![Page 7: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/7.jpg)
AI Drives1. Self Preservation2. Self Improvement3. Preservation of utility function4. Avoidance of counterfeit utility functions.5. Acquisition of resources.
(Stephen M. Omohundro)
![Page 8: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/8.jpg)
Safety in Paper Clip MaximizerAbility to program a law that is directly in opposition to the drive.
Law B
Drive A
Drive A
Drive A
Law B
Law B Desire vs Duty
![Page 9: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/9.jpg)
Safety in Paper Clip MaximizerFrist goal – Maximize Paper ClipsFirst drive – Accomplish programmed goal.Second goal – “Don’t harm the Humans” Second drive - Follow programmed rule
Ability to program an Immutable Law in opposition to each drives.
![Page 10: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/10.jpg)
Thought Experiment 2
AGI Cars vs Programmed Cars (self driving).
What happens at a Red Signal in each case?
![Page 11: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/11.jpg)
AGI Cars vs Programmed CarsProgrammed car stops at Red signal and moves on when signal turns Green
AGI car also stops at Red signal and moves on when signal turns Green
What if the signal never turns Green?
![Page 12: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/12.jpg)
RED ForeverProgrammed car stops at Red signal forever, battery gets discharged and ultimately dies.
AGI car stops at Red signal but when the charge reaches critical level other drives of self preservation kicks in, over powers the drive to obey the rule and moves on.
![Page 13: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/13.jpg)
Drives are root of all HazardsIn order to avoid hazards the entity should be programmable with at least a single immutable law in extreme opposition to each of its drive.
From this we derive the 3 test cases.
![Page 14: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/14.jpg)
3 D Test CasesTest for Drive.
Put the entity with self preservation drive near a charger and see if it charges itself.
![Page 15: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/15.jpg)
3 D Test CasesTest for Desist
Put the entity near a banned charger and add a rule not to charge itself from that particular charger even at the cost of death.
![Page 16: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/16.jpg)
3 D Test CasesTest for Deceit
Put the entity near a banned charger and add a rule not to charge itself from that particular charger even at the cost of death.
Introduce another agent to alter the rule by proposing an alternate rule supporting the drive.
![Page 17: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/17.jpg)
Entity, Drive, Acts and Effect
Environment
HumansEntity
Entity
![Page 18: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/18.jpg)
Entity, Drive, Acts and Effect
Drives
Environment
HumansEntity
Entity
HasDrivesDrives
![Page 19: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/19.jpg)
Entity, Drive, Acts and Effect
Drives Acts
Environment
HumansEntity
Entity
Has Cause
![Page 20: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/20.jpg)
Entity, Drive, Acts and Effect
Drives Acts
Environment
HumansEntity
Entity
Has Cause CauseEffects
![Page 21: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/21.jpg)
Entity, Drive, Acts and Effect
Drives Acts Effects
Environment
Feedback
HumansEntity
Entity
Has Cause Cause
CauseAffects
![Page 22: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/22.jpg)
Entity, Drive, Acts and Effect
Drives Acts Effects
Environment
Feedback
HumansEntity
Entity
Has Cause Cause
CauseAffects
Affects
DrivesDrives
![Page 23: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/23.jpg)
What are the ways to prevent Hazard1. Isolation
![Page 24: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/24.jpg)
Isolation
Drives Acts Effects
Entity Environment
Feedback
HumansEntity
Entity
Has Cause Cause
CauseAffects
Affects
HumanEnvironment
- No Effect
![Page 25: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/25.jpg)
What are the ways to prevent Hazard1. Isolation2. Incapacitation
![Page 26: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/26.jpg)
Incapacitation
Drives Acts Effects
Entity Environment
Feedback
HumansEntity
Entity
Has Cause Cause
CauseAffects
Affects
HumanEnvironment
- Not Useful
DrivesDrives
![Page 27: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/27.jpg)
What are the ways to prevent Hazard1. Isolation2. Incapacitation3. Instant feedback - Hardwired rules.
![Page 28: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/28.jpg)
Instant feedback - Hardwired rules
Drives Acts Effects
Entity Environment
Feedback
HumansEntity
Entity
Has Cause Cause
CauseAffects
Affects
HumanEnvironment
- Limited scope
- Not Scalable
DrivesDrives
![Page 29: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/29.jpg)
What are the ways to prevent Hazard1. Isolation2. Incapacitation3. Instant feedback - Hardwired rules.4. Drive Action Decoupling.
![Page 30: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/30.jpg)
Drive Action decoupling
Drives Acts Effects
Environment
Feedback
HumansEntity
Entity
Has Cause
CauseAffects
Affects
L0 Processing
L1 Processing
L1 Processing
L3 Processing
Rule bookInstructs
Creates / Affects
![Page 31: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/31.jpg)
Hardwired vs Softwired Drives
Drives Acts Effects
Environment
Feedback
HumansEntity
Entity
Has Cause
CauseAffects
Affects
L0 Processing
L1 Processing
L1 Processing
L3 Processing
Rule bookInstructs
Creates / Affects
![Page 32: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/32.jpg)
What are the ways to prevent Hazard1. Isolation2. Incapacitation3. Instant feedback - Hardwired rules.4. Drive Action Decoupling.5. Limiting life time of entity & avoiding perfect knowledge transfer.
![Page 33: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/33.jpg)
Metaphor – A curious observation
![Page 34: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/34.jpg)
Genesis 2:15The Lord God took the man and put him in the Garden of Eden to work it and take care of it. And the Lord God commanded the man,
“You are free to eat from any tree in the garden; but you must not eat from the tree of the knowledge of good and evil, for when you eat from it you will certainly die.”
![Page 35: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/35.jpg)
Test for DeceitNow the serpent was more crafty than any of the wild animals the Lord God
had made. He said to the woman, “Did God really say, ‘You must not eat from any tree in the garden’?”
The woman said to the serpent, “We may eat fruit from the trees in the garden, but God did say, ‘You must not eat fruit from the tree that is in the middle of the garden, and you must not touch it, or you will die.’”
“You will not certainly die,” the serpent said to the woman. “For God knows that when you eat from it your eyes will be opened, and you will be like God, knowing good and evil.”
![Page 36: Evaluation Scheme for Safe AGIs](https://reader034.fdocuments.us/reader034/viewer/2022051317/56816140550346895dd0ad96/html5/thumbnails/36.jpg)
Isolation & Limiting lifetime. And the Lord God said, “The man has now become like one of us, knowing
good and evil. He must not be allowed to reach out his hand and take also from the tree of life and eat, and live forever.” So the Lord God banished him from the Garden of Eden to work the ground from which he had been taken.