
A survey of research questions for robust and beneficial AI

1 Introduction

Artificial intelligence (AI) research has explored a variety of problems and approaches since its inception, but for the last 20 years or so has been focused on the problems surrounding the construction of intelligent agents—systems that perceive and act in some environment. In this context, the criterion for intelligence is related to statistical and economic notions of rationality—colloquially, the ability to make good decisions, plans, or inferences. The adoption of probabilistic representations and statistical learning methods has led to a large degree of integration and cross-fertilization between AI, machine learning, statistics, control theory, neuroscience, and other fields. The establishment of shared theoretical frameworks, combined with the availability of data and processing power, has yielded remarkable successes in various component tasks such as speech recognition, image classification, autonomous vehicles, machine translation, legged locomotion, and question-answering systems.

As capabilities in these areas and others cross the threshold from laboratory research to economically valuable technologies, a virtuous cycle takes hold whereby even small improvements in performance are worth large sums of money, prompting greater investments in research. There is now a broad consensus that AI research is progressing steadily, and that its impact on society is likely to increase. The potential benefits are huge, since everything that civilization has to offer is a product of human intelligence; we cannot predict what we might achieve when this intelligence is magnified by the tools AI may provide, but the eradication of disease and poverty are not unfathomable. Because of the great potential of AI, it is valuable to investigate how to reap its benefits while avoiding potential pitfalls.

The progress in AI research makes it timely to focus research not only on making AI more capable, but also on maximizing the societal benefit of AI. Such considerations motivated the AAAI 2008–09 Presidential Panel on Long-Term AI Futures [61] and other projects and community efforts on AI impacts. These constitute a significant expansion of the field of AI itself, which up to now has focused largely on techniques that are neutral with respect to purpose. The present document can be viewed as a natural continuation of these efforts, focusing on identifying research directions that can help maximize the societal benefit of AI. This research is by necessity interdisciplinary, because it involves both society and AI. It ranges from economics, law, and philosophy to computer security, formal methods and, of course, various branches of AI itself. The focus is on delivering AI that is beneficial to society and robust in the sense that the benefits are guaranteed: our AI systems must do what we want them to do.

This document is an attempt to lay out some of the research topics that we think will be most useful to do now in order to shape the future impact of AI. We will surely find that some questions are less useful or timely than others, and some important ones are missing. We hope this guide will be a helpful source of suggestions, but also that potential grantees won’t be discouraged from approaching us with similarly relevant topics we didn’t think of. We will try to publish future versions that are up to date with progress in the field.

We are very grateful to the many people who have contributed to this document, in particular Daniel Dewey, Stuart Russell, and Max Tegmark for their invaluable work on the research priorities document, Luke Muehlhauser for his list of potential strategic research projects, Nate Soares and MIRI for their technical agenda, and the MIRIxOxford research workshop analyzing and expanding on the MIRI technical agenda.

Many people at FLI have contributed lists of additional research projects and directions, including Jim Babcock, Steve Greidinger, János Kramár, Richard Mallah and Max Tegmark, and many more have provided helpful feedback and suggestions, including Anthony Aguirre, Erik Brynjolfsson, Meia Chita-Tegmark, Daniel Dewey, Owain Evans, Eric Gastfriend, Ales Flidr, Katja Grace, Evan Hefner, Viktoriya Krakovna, Colleen Messing, Howard Messing, Chase Moores, Luke Muehlhauser, Seán Ó hÉigeartaigh, Tristan Plumb, Stuart Russell, Murray Shanahan, Nate Soares, Marin Soljačić, Michele Reilly, Anders Sandberg, John Sturm, Jaan Tallinn, Jacob Trefethen, Nick Ryder, Michael Vassar, Alexander Wissner-Gross, and Eliezer Yudkowsky. Thanks also to Nick Bostrom for writing the excellent and seminal book “Superintelligence”. Like the rest of the document, this list is at an early stage; we’re looking forward to receiving your constructive feedback and comments!


Contents

1 Introduction

2 Short-term research priorities
  2.1 Optimizing AI’s Economic Impact
    2.1.1 Measuring and Forecasting Economic Impact of Automation and AI
    2.1.2 Policy research
    2.1.3 Managing potential adverse effects of automation and AI
  2.2 Law and Ethics Research
  2.3 Computer Science Research for Robust AI
    2.3.1 Verification (“Did I build the system right?”)
    2.3.2 Validity (“Did I build the right system?”)
    2.3.3 Security (“How can I prevent unauthorized access?”)
    2.3.4 Control (“OK, I built the system wrong, can I fix it?”)

3 Long-term research priorities
  3.1 Some perspectives on the long term
  3.2 Verification
  3.3 Validity
    3.3.1 Ethics
    3.3.2 Ensuring goal stability
  3.4 Security
    3.4.1 Software containment
    3.4.2 Psychological containment
    3.4.3 Hardware containment
    3.4.4 Tripwires: Detection & Response
    3.4.5 Detecting intent to deceive
  3.5 Control
    3.5.1 Corrigibility and Domesticity
    3.5.2 Safe and Unsafe Agent Architectures

4 Forecasting
  4.1 Motivation
  4.2 Methodology
  4.3 Forecasting AI progress
  4.4 Forecasting AI takeoff
  4.5 Brain emulations (uploads)

5 Policy and Collaboration


2 Short-term research priorities

2.1 Optimizing AI’s Economic Impact

The successes of industrial applications of AI, from manufacturing to information services, demonstrate a growing impact on the economy, although there is disagreement about the exact nature of this impact and on how to distinguish between the effects of AI and those of other information technologies. Many economists and computer scientists agree that there is valuable research to be done on how to maximize the economic benefits of AI while mitigating adverse effects, which could include increased inequality and unemployment [72, 21, 39, 40, 107, 84, 69]. Such considerations motivate a range of research directions, spanning areas from economics to psychology. Below are a few examples that should by no means be interpreted as an exhaustive list.

2.1.1 Measuring and Forecasting Economic Impact of Automation and AI

When and in what order should we expect various jobs to become automated [39]? How will this affect the employment and wages of various professions, including less skilled workers, creatives, and different kinds of information workers? Some have argued that AI is likely to greatly increase the overall wealth of humanity as a whole [21]. However, increased automation may push income distribution further towards a power law [22], and the resulting disparity may fall disproportionately along lines of race, class, and gender; research anticipating the economic and societal impact of such disparity could be useful.

1. It is possible that economic measures such as real GDP per capita do not accurately capture the benefits and detriments of heavily AI-and-automation-based economies, making these metrics unsuitable for policy purposes [72]. Research on improved metrics could be useful for decision-making.

2. What has been the historical record on jobs being displaced by automation? What have the average rate and distribution of displacement been? Has displacement been clustered in time, industry, and geography? How long did it take displaced workers to find new jobs? Did displacement contribute to inequality?

3. Is there anything different about the advancement of artificial intelligence happening now that would lead us to expect a change from our centuries-long historical record on jobs being displaced by automation?

4. What factors make an industry more amenable or less amenable to automation? Machines historically performed rote mass-production, but we’ve been expanding their capabilities with advances in information processing and artificial intelligence.

5. Which markets are most susceptible to disruption as automation advances? Significant parts of the economy – including finance, insurance, actuarial, and many consumer markets – could experience upheaval through use of AI techniques to learn, model, and predict agent actions. These markets might be identified by a combination of high complexity and high rewards for navigating that complexity [69]. Are there other features?

6. Some jobs could be done by machines in principle but require a particular advance in artificial intelligence before automating them becomes feasible. What are some of those prerequisite advances?

7. Based on the above factors, how far in advance can we predict that an industry is likely to be largely automated? Is this something we can predict and prepare for 20 years ahead? 10 years? 6 months? Not at all?


2.1.2 Policy research

What is the space of policies that could/should be considered for helping AI-assisted societies flourish? For example, Brynjolfsson and McAfee [21] explore various policies for incentivizing development of labor-intensive sectors and for using AI-generated wealth to support underemployed populations. What outcomes are these policies likely to lead to? What are the key uncertainties in these outcome predictions, and what research would help reduce these uncertainties?

1. What factors contribute to a ‘winner-take-all’ dynamic of software-based industries? The low cost of reproduction and increasingly global nature of the information economy, for example, make it easier to concentrate wealth. What other factors might we study and how could they be quantified?

2. Conversely, what factors counteract the ‘winner-take-all’ dynamic? For example, the lowered cost of entering a market might make it easier for new startups to compete with established products.

3. How well does the neoclassical model of anti-trust regulation apply to an economy increasingly dominated by software and AI-assistance? Will we need to develop new frameworks of regulation, or will our current laws adapt well enough?

4. Will the economy undergo deflation as software becomes a larger share of productivity? The potential for a relatively rapid, large-scale productivity boost from software could increase the purchasing power of the dollar. What are the closest examples of this occurring, and do our current forecasts properly incorporate it?

5. If the economy does experience deflation, could governments effectively take advantage of it by spending more on projects that reduce inequality?

2.1.3 Managing potential adverse effects of automation and AI

If automation and AI assistance do lead to lower employment, it will be important to evaluate the societal structures that determine whether affected populations flourish or succumb to depression and self-destructive behavior. History provides many examples of subpopulations not needing to work for economic security, ranging from aristocrats in antiquity to many present-day citizens of Qatar. What societal structures and other factors determine whether such populations flourish? Unemployment is not the same as leisure, and there are deep links between unemployment and unhappiness, self-doubt, and isolation [53, 28]; understanding what policies and norms can break these links could significantly improve the median quality of life.

1. What problems might arise in low-employment societies? How strong are the correlations between employment level and rates of depression, crime, or drug abuse, for example?

2. Are there bright spots within the current correlations? Which societies suffer less, or even prosper, under lower employment?

3. What cultural elements play into how low employment impacts different societies’ flourishing? For example, Bellezza, Keinan, and Paharia [12] found that conspicuously long hours worked were positively correlated with perceived status in the United States, but negatively correlated in Europe. What other variables might be in play, and can they be used to predict how different cultures/subcultures will be affected by low employment?

4. Are there historical analogues of societies in which groups have not had to work for economic security (e.g., traditional aristocracy, children of wealthy parents, etc.)? What activities and mindsets have led them to consider their lives happy or meaningful? Do these factors apply to the current development of AI-assisted automation?


2.2 Law and Ethics Research

The development of systems that embody significant amounts of intelligence and autonomy leads to important legal and ethical questions whose answers impact both producers and consumers of AI technology. These questions span law, public policy, professional ethics, and philosophical ethics, and will require expertise from computer scientists, legal experts, political scientists, and ethicists. For example:

1. Liability and law for autonomous vehicles: If self-driving cars cut the roughly 40,000 annual US traffic fatalities in half, the car makers might get not 20,000 thank-you notes, but 20,000 lawsuits [66]. In what legal framework can the safety benefits of autonomous vehicles such as drone aircraft and self-driving cars best be realized [127]? Should legal questions about AI be handled by existing (software- and internet-focused) “cyberlaw”, or should they be treated separately [23]? In both military and commercial applications, governments will need to decide how best to bring the relevant expertise to bear; for example, a panel or committee of professionals and academics could be created, and Calo has proposed the creation of a Federal Robotics Commission [24].

2. Machine ethics: How should an autonomous vehicle trade off, say, a small probability of injury to a human against the near-certainty of a large material cost? How should lawyers, ethicists, and policymakers engage the public on these issues? Should such trade-offs be the subject of national standards?

3. Autonomous weapons: Can lethal autonomous weapons be made to comply with humanitarian law [27]? If, as some organizations have suggested, autonomous weapons should be banned [34, 125], is it possible to develop a precise definition of autonomy for this purpose, and can such a ban practically be enforced? If it is permissible or legal to use lethal autonomous weapons, how should these weapons be integrated into the existing command-and-control structure so that responsibility and liability are appropriately distributed, what technical realities and forecasts should inform these questions, and how should “meaningful human control” over weapons be defined [99, 98, 3]? Are autonomous weapons likely to reduce political aversion to conflict, or perhaps result in “accidental” battles or wars [10]? Finally, how can transparency and public discourse best be encouraged on these issues?

4. Privacy: How should the ability of AI systems to interpret the data obtained from surveillance cameras, phone lines, emails, etc., interact with the right to privacy? How will privacy risks interact with cybersecurity and cyberwarfare [109]? Our ability to take full advantage of the synergy between AI and big data will depend in part on our ability to manage and preserve privacy [68, 1].

5. Professional ethics: What role should computer scientists play in the law and ethics of AI development and use? Past and current projects to explore these questions include the AAAI 2008–09 Presidential Panel on Long-Term AI Futures [61], the EPSRC Principles of Robotics [13], and recently-announced programs such as Stanford’s One-Hundred Year Study of AI and the AAAI committee on AI impact and ethical issues (chaired by Rossi and Chernova).

From a policy perspective, AI (like any powerful new technology) brings both great new benefits and novel pitfalls, and appropriate policies can ensure that we enjoy the benefits while minimizing the risks. This raises policy questions such as these:

1. What is the space of policies worth studying?

2. Which criteria should be used to determine the merits of a policy? Candidates include verifiability of compliance, enforceability, ability to reduce risk, ability to avoid stifling desirable technology development, adoptability, and ability to adapt over time to changing circumstances.


2.3 Computer Science Research for Robust AI

As autonomous systems become more prevalent in society, it becomes increasingly important that they robustly behave as intended. The development of autonomous vehicles, autonomous trading systems, autonomous weapons, etc. has therefore stoked interest in high-assurance systems where strong robustness guarantees can be made; Weld and Etzioni have argued that “society will reject autonomous agents unless we have some credible means of making them safe” [130]. Different ways in which an AI system may fail to perform as desired correspond to different areas of robustness research:

1. Verification: how to prove that a system satisfies certain desired formal properties. (“Did I build the system right?”)

2. Validity: how to ensure that a system that meets its formal requirements does not have unwanted behaviors and consequences. (“Did I build the right system?”)

3. Security: how to prevent intentional manipulation by unauthorized parties.

4. Control: how to enable meaningful human control over an AI system after it begins to operate. (“OK, I built the system wrong, can I fix it?”)

2.3.1 Verification

By verification, we mean methods that yield high confidence that a system will satisfy a set of formal constraints. When possible, it is desirable for systems in safety-critical situations, e.g. self-driving cars, to be verifiable.

Formal verification of software has advanced significantly in recent years: examples include the seL4 kernel [63], a complete, general-purpose operating-system kernel that has been mathematically checked against a formal specification to give a strong guarantee against crashes and unsafe operations, and HACMS, DARPA’s “clean-slate, formal methods-based approach” to a set of high-assurance software tools [38]. Not only should it be possible to build AI systems on top of verified substrates; it should also be possible to verify the designs of the AI systems themselves, particularly if they follow a “componentized architecture”, in which guarantees about individual components can be combined according to their connections to yield properties of the overall system. This mirrors the agent architectures used in Russell and Norvig [102], which separate an agent into distinct modules (predictive models, state estimates, utility functions, policies, learning elements, etc.), and has analogues in some formal results on control system designs. Research on richer kinds of agents—for example, agents with layered architectures, anytime components, overlapping deliberative and reactive elements, metalevel control, etc.—could contribute to the creation of verifiable agents, but we lack the formal “algebra” to properly define, explore, and rank the space of designs.

Perhaps the most salient difference between verification of traditional software and verification of AI systems is that the correctness of traditional software is defined with respect to a fixed and known machine model, whereas AI systems—especially robots and other embodied systems—operate in environments that are at best partially known by the system designer. In these cases, it may be practical to verify that the system acts correctly given the knowledge that it has, avoiding the problem of modelling the real environment [31]. A lack of design-time knowledge also motivates the use of learning algorithms within the agent software, and verification becomes more difficult: statistical learning theory gives so-called ε-δ (probably approximately correct) bounds, mostly for the somewhat unrealistic settings of supervised learning from i.i.d. data and single-agent reinforcement learning with simple architectures and full observability, but even then requiring prohibitively large sample sizes to obtain meaningful guarantees.
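For concreteness, a standard bound of this kind (quoted here as a reference point, for the realizable supervised setting with a finite hypothesis class H, rather than from the surveyed work): any learner that outputs a hypothesis consistent with m i.i.d. training examples, where

    m ≥ (1/ε) (ln |H| + ln (1/δ)),

achieves error at most ε with probability at least 1 − δ. The 1/ε factor and the dependence on the size of the hypothesis space illustrate why meaningful guarantees can demand prohibitively many samples for the large hypothesis classes used in practice.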

Research into methods for making strong statements about the performance of machine learning algorithms and managing computational budget over many different constituent numerical tasks could improve our abilities in this area, possibly extending work on Bayesian quadrature [52, 45]. Work in adaptive control theory [11], the theory of so-called cyberphysical systems [91], and verification of hybrid or robotic systems [2, 131] is highly relevant but also faces the same difficulties. And of course all these issues are laid on top of the standard problem of proving that a given software artifact does in fact correctly implement, say, a reinforcement learning algorithm of the intended type. Some work has been done on verifying neural network applications [95, 122, 106], and the notion of partial programs [5, 119] allows the designer to impose arbitrary “structural” constraints on behavior, but much remains to be done before it will be possible to have high confidence that a learning agent will learn to satisfy its design criteria in realistic contexts.

It is possible that the best methodology for gaining high confidence that large, complex AI systems will satisfy their design criteria is to be found in the realm of software engineering methodology/standards rather than in formal verification. It’s also possible that while formal verification would be best in the long run, in practice the research progress required for constructing a superintelligence comes to fruition at a time when formal verification is prohibitively expensive to the teams that are closest to building superintelligence.

In this case, it would be good to understand what current work would be most valuable to reducing the risk of adverse outcomes arising from bugs in implementation. This work would most likely be less theoretical and more practical and implementation-specific than most of the other research explored in this document. Some of the questions to investigate here are:

1. What categories of bugs are most hazardous? Some particularly undesirable sorts of bugs are:

(a) bugs that lie dormant during ordinary testing but can be encountered in larger settings given enough time. (For example, integer overflows or accumulation of numerical error.)

(b) portability bugs, i.e. bugs that arise from differences in libraries, environment, or hardware. (For example, GPGPU libraries.)

(c) “Heisenbugs”, i.e. bugs that manifest in practice but not in a debugging environment.

(d) bugs that are difficult to reproduce for some reason, such as bugs affected by nondeterministic scheduling of concurrent threads of execution, or by the interaction of this with some other sort of state, such as a random number generator.

2. How likely would these sorts of bugs be to arise in a hazardous way if an otherwise-promising superintelligence project was undertaken in the medium term?

3. What kinds of tools or software changes would make the most difference in mitigating the risk of an adverse outcome? Some ideas:

(a) influence current and upcoming programming language interpreters, compilers, application virtual machines (such as the JVM), etc. to adopt a default behavior (or at least an option) of throwing exceptions on encountering numerical overflow/underflow (a small illustration follows this list).

(b) ensure software quality of particularly popular state-of-the-art machine learning libraries, GPGPU libraries, and other core components.

(c) assess the prevalence of portability bugs and promote adherence to standards that could resolve them.
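As a small illustration of the overflow-trapping behavior suggested in idea (a), here is a sketch in Python using NumPy, one numerics library that already offers such an opt-in for scalar arithmetic (the specific library and values are our choice, purely for illustration):

import numpy as np

x = np.int32(2**31 - 1)           # largest 32-bit signed integer
print(x + np.int32(1))            # wraps around to -2147483648, emitting only a warning

np.seterr(over="raise")           # opt in to exceptions on arithmetic overflow
try:
    y = np.int32(2**31 - 1) + np.int32(1)
except FloatingPointError as e:   # NumPy reports scalar integer overflow via this exception
    print("overflow trapped:", e)

The point is that the unsafe behavior (silent wraparound) is the default, and trapping it is a configuration choice; idea (a) amounts to inverting that default across the toolchain.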

2.3.2 Validity

A verification theorem for an agent design has the form, “If environment satisfies assumptions φ then behavior satisfies requirements ψ.” There are two ways in which a verified agent can, nonetheless, fail to be a beneficial agent in actuality: first, the environmental assumption φ is false in the real world, leading to behavior that violates the requirements ψ; second, the system may satisfy the formal requirement ψ but still behave in ways that we find highly undesirable in practice. It may be the case that this undesirability is a consequence of satisfying ψ when φ is violated, i.e., had φ held the undesirability would not have been manifested; or it may be the case that the requirement ψ is erroneous in itself. Russell and Norvig [102] provide a simple example: if a robot vacuum cleaner is asked to clean up as much dirt as possible, and has an action to dump the contents of its dirt container, it will repeatedly dump and clean up the same dirt. The requirement should focus not on dirt cleaned up but on cleanliness of the floor. Such specification errors are ubiquitous in software verification, where it is commonly observed that writing correct specifications can be harder than writing correct code. Unfortunately, it is not possible to verify the specification: the notions of “beneficial” and “desirable” are not separately made formal, so one cannot straightforwardly prove that satisfying ψ necessarily leads to desirable behavior and a beneficial agent.
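To make the vacuum-cleaner example concrete, the two candidate requirements can be written as utility functions over a state-action trajectory (the notation is ours, purely illustrative):

    ψ_dirt = Σ_t dirt_collected(s_t, a_t)    versus    ψ_clean = Σ_t γ^t · cleanliness(s_t)

An agent maximizing ψ_dirt can score arbitrarily high by dumping and re-collecting the same dirt, while under ψ_clean that cycle earns nothing; yet both requirements are equally precise formally, which is exactly why verifying a system against a given ψ cannot by itself certify that its behavior is desirable.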

In order to build systems that robustly behave well, we of course need to decide what “good behavior” means in each application domain [73]. This ethical question is tied intimately to questions of what engineering techniques are available, how reliable these techniques are, and what trade-offs can be made – all areas where computer science, machine learning, and broader AI expertise is valuable. For example, Wallach and Allen [128] argue that a significant consideration is the computational expense of different behavioral standards (or ethical theories): if a standard cannot be applied efficiently enough to guide behavior in safety-critical situations, then cheaper approximations may be needed. Designing simplified rules – for example, to govern a self-driving car’s decisions in critical situations – will likely require expertise from both ethicists and computer scientists. Computational models of ethical reasoning may shed light on questions of computational expense and the viability of reliable ethical reasoning methods [9, 120]; for example, work could further explore the applications of semantic networks for case-based reasoning [70], hierarchical constraint satisfaction [67], or weighted prospective abduction [89] to machine ethics.

Explicit ethical systems generally have the desideratum of transparency and understandability. There have been a number of approaches to modeling ethical systems, such as using semantic casuistry networks [70], hierarchical constraint satisfaction [67], category theory [19], and weighted prospective abduction [89], but research on more dynamic representations would be beneficial for integrating with machine learning systems. Long-term safety researchers [78] point out that seemingly any explicitly encoded moral philosophy (or ethical rulebase) is incomplete and therefore leads to unintended and perverse interpretations and instantiations, particularly outside the boundaries of the environment in which it was formulated; they further point out the wide heterogeneity of ethical systems in moral philosophy. This heterogeneity might be useful as a resource, however: research on formal mathematics to compare, contrast, and overlay ethical rulebases, such as via algebraic topology [36], may enable ensembles of moderately conflicting ethical rulebases to have more robust properties than any individual ethical system. Multiobjective optimizations over multiple ethical systems and explicit goals may also merit further study [82].

2.3.3 Security

Security research can help make AI more robust. As AI systems are used in an increasing number of critical roles, they will take up an increasing proportion of cyber-attack surface area. It is also probable that AI and machine learning techniques will themselves be used in cyber-attacks.

Robustness against exploitation at the low level is closely tied to verifiability and freedom from bugs. For example, the DARPA SAFE program aims to build an integrated hardware-software system with a flexible metadata rule engine, on which can be built memory safety, fault isolation, and other protocols that could improve security by preventing exploitable flaws [30]. Such programs cannot eliminate all security flaws (since verification is only as strong as the assumptions that underlie the specification), but could significantly reduce vulnerabilities of the type exploited by the recent “Heartbleed” and “Bash” bugs. Such systems could be preferentially deployed in safety-critical applications, where the cost of improved security is justified.

At a higher level, research into specific AI and machine learning techniques may become increasingly useful in security. These techniques could be applied to the detection of intrusions [65], analyzing malware [97], or detecting potential exploits in other programs through code analysis [20]. It is not implausible that cyberattack between states and private actors will be a risk factor for harm from near-future AI systems, motivating research on preventing harmful events. As AI systems grow more complex and are networked together, they will have to intelligently manage their trust, motivating research on statistical-behavioral trust establishment [94] and computational reputation models [103].

2.3.4 Control

For certain types of safety-critical AI systems – especially vehicles and weapons platforms – it may be desirable to retain some form of meaningful human control, whether this means a human in the loop, on the loop [54, 88], or some other protocol. In any of these cases, there will be technical work needed in order to ensure that meaningful human control is maintained [33].

Automated vehicles are a test-bed for effective control-granting techniques. The design of systems and protocols for transition between automated navigation and human control is a promising area for further research. Such issues also motivate broader research on how to optimally allocate tasks within human-computer teams, both for identifying situations where control should be transferred, and for applying human judgment efficiently to the highest-value decisions.


3 Long-term research priorities

A frequently discussed long-term goal of some AI researchers is to develop systems that can learn from experience with human-like breadth and surpass human performance in most cognitive tasks, thereby having a major impact on society. If there is a non-negligible probability that these efforts will succeed in the foreseeable future, then additional current research beyond that mentioned in the previous sections will be motivated, as exemplified below, to help ensure that the resulting AI will be robust and beneficial.

Assessments of this success probability vary widely between researchers, but few would argue with great confidence that the probability is negligible, given the track record of such predictions. For example, Ernest Rutherford, arguably the greatest nuclear physicist of his time, said in 1933 that nuclear energy was “moonshine”¹, and Astronomer Royal Richard Woolley called interplanetary travel “utter bilge” in 1956 [96]. Moreover, to justify a modest investment in this AI robustness research, this probability need not be high, merely non-negligible, just as a modest investment in home insurance is justified by a non-negligible probability of the home burning down.

¹ “The energy produced by the breaking down of the atom is a very poor kind of thing. Any one who expects a source of power from the transformation of these atoms is talking moonshine.” [92]

3.1 Some perspectives on the long term

Looking into the future, there is a very real possibility that we will see general AI and superintelligence. (We’ll defer discussion of when and how this is likely to happen to Section 4.) This would be truly revolutionary: for the first time, we could have machine assistance in every domain. What would this mean? Unfortunately it’s difficult to get a complete picture from examples, because we haven’t seen any systems that are more capable than humans at every cognitive task.

One hint comes from looking at chimpanzees, our closest living relatives. Chimpanzees share 94% of our genetic code, have a complex social structure, and are capable of planning, tool use, and some symbolic language use. However, unlike chimp intelligence, human intelligence has accumulated innovations (such as complex language, writing, the scientific method, etc.) that have reshaped the planet, and that give our species unprecedented power; for better or for worse, chimpanzees will only continue to exist if we see their existence as more valuable than what we could gain from eliminating them. Fortunately, we see the elimination of our closest living relatives as particularly barbaric and immoral, so they probably won’t be added to the growing list of species we have driven to extinction – but this should give us a sense of the power superintelligence could have.

Of course, even if we happen to be the most intelligent species around, it would be a mistake to anthropomorphize a superintelligent AI system, which may have very little in common with us internally; certainly much less than a chimp. So what can we say about it? In order to be a superintelligent AI, the system must be doing very well at some task by some performance measure, relative to the resources the system is using. Looking at the system in a high-level way, we can describe it as having preferences to do well according to this performance measure. The preferences might be intended by the designer or not; they might be represented in the system or not; they might be context-specific or temporary; and they might be best defined relative to some virtual/abstract world (e.g., a chessboard) or the real world.

We will refer back to this notion of preferences, which underlies various arguments about the nature of superintelligence. Preferences are often called “goals”; the latter term may sound like it refers to lists of fully-known binary-state objectives for the agent to accomplish, perhaps without a way of prioritizing between them, but this is not the intended meaning.

Finally: of course, we already have systems that have superhuman capability in many domains, such as rote computation, chess, and Jeopardy; and this has not exposed us to any dramatic risks. An intelligent system need not literally have every human capability in order to be dangerous, but it’s likely to need some of the following types of capabilities [16]:

1. Self-improvement: intelligence amplification

2. Strategic: planning, forecasting, prioritizing


3. Social: social and psychological modeling, manipulation, rhetorical persuasive ability

Some questions for exploring long-term perspectives:

1. Explore the space of possible mind designs. What features are common? On what axes can they differ? Where do human minds sit in that space relative to apes, dolphins, current AIs, future AIs, etc.? Where in the space could safe general AIs be found? (Some candidates to examine: optimizers vs not, tending to break out of simulations/virtual worlds vs not.) A starting point is [133].

2. Bostrom’s “orthogonality thesis” [17] states that “any level of intelligence could in principle be combined with more or less any goal,” but what kinds of general intelligences are plausible? Should we expect some correlation between level of intelligence and goals in de-novo AI? How true is this in humans, and in whole brain emulations?

3. What “instrumental goals” might a self-improving AI generically evolve? Omohundro [85] has argued that to improve its ability to attain its goals, it generically seeks capability enhancement (better hardware, better software and a better world model), and that the goal of better hardware generically leads to a goal of self-preservation and unlimited resource acquisition, which can lead to unwanted side effects for humans.

4. How generic is the instrumental goal of resource acquisition? Is it true of most final goals that an optimizer with that final goal will want to control a spatial region whose radius increases linearly with time? In what sense of “most”, if any, is this true?

3.2 Verification

Reprising the themes of short-term research, research enabling verifiable low-level software and hardware can eliminate large classes of bugs and problems in general AI systems; if the systems become increasingly powerful and safety-critical, verifiable safety properties will become increasingly valuable. If the theory of extending verifiable properties from components to entire systems is well understood, then even very large systems can enjoy certain kinds of safety guarantees, potentially aided by techniques designed explicitly to handle learning agents and high-level properties. Theoretical research, especially if it is done explicitly with very general and capable AI systems in mind, could be particularly useful.

A related verification research topic that is distinctive to long-term concerns is the verifiability of systems that modify, extend, or improve themselves, possibly many times in succession [41, 126]. Attempting to straightforwardly apply formal verification tools to this more general setting presents new difficulties, including the challenge that a formal system that is sufficiently powerful cannot use formal methods in the obvious way to gain assurance about the accuracy of functionally similar formal systems, on pain of inconsistency via Gödel’s incompleteness [37, 129]. It is not yet clear whether or how this problem can be overcome, or whether similar problems will arise with other verification methods of similar strength.

Finally, it is often difficult to actually apply formal verification techniques to physical systems, especially systems that have not been designed with verification in mind. This motivates research pursuing a general theory that links functional specification to physical states of affairs. This type of theory would allow use of formal tools to anticipate and control behaviors of systems that approximate rational agents, alternate designs such as satisficing agents, and systems that cannot be easily described in the standard agent formalism (powerful prediction systems, theorem-provers, limited-purpose science or engineering systems, etc.). It may also be that such a theory could allow rigorously demonstrating that systems are constrained from taking certain kinds of actions or performing certain kinds of reasoning (see Section 3.5.1 for examples).

3.3 Validity

As in the short-term research priorities, validity is concerned with undesirable behaviors that can arise despite a system’s formal correctness. In the long term, AI systems might become more powerful and autonomous, in which case failures of validity could carry correspondingly higher costs.


Reliable generalization of concepts, an area we highlighted for short-term validity research, will also be important for long-term safety. To maximize the long-term value of this work, concept learning research might focus on the types of unexpected generalization that would be most problematic for very general and capable AI systems. In particular, it might aim to understand theoretically and practically how learned representations of high-level human concepts could be expected to generalize (or fail to) in radically new contexts [123]. Additionally, if some concepts could be learned reliably, it might be possible to use them to define tasks and constraints that minimize the chances of unintended consequences even when autonomous AI systems become very general and capable. Little work has been done on this topic, which suggests that both theoretical and experimental research may be useful.

Mathematical tools such as formal logic, probability, and decision theory have yielded significant insight into the foundations of reasoning and decision-making. However, there are still many open problems in the foundations of reasoning and decision. Designing a powerful AI system without having a thorough understanding of these issues might increase the risk of unintended consequences, both by foregoing tools that could have been used to increase the system’s reliability, and by risking the collapse of shaky foundations. Example research topics in this area include reasoning and decision under bounded computational resources à la Horvitz and Russell [59, 100], how to take into account correlations between AI systems’ behaviors and those of their environments or of other agents [124, 64, 58, 46, 115], how agents that are embedded in their environments should reason [110, 87], and how to reason about uncertainty over logical consequences of beliefs or other deterministic computations [114, 93]. These topics may benefit from being considered together, since they appear deeply linked [47, 48].

In the long term, it is plausible that we will want to make agents that act autonomously and powerfully across many domains. Explicitly specifying our preferences in broad domains in the style of near-future machine ethics may not be practical, making “aligning” the values of powerful AI systems with our own values and preferences difficult [111, 113]. Consider, for instance, the difficulty of creating a utility function that encompasses an entire body of law; even a literal rendition of the law is far beyond our current capabilities, and would be highly unsatisfactory in practice (since law is written assuming that it will be interpreted and applied in a flexible, case-by-case way). Reinforcement learning raises its own problems: when systems become very capable and general, then an effect similar to Goodhart’s Law is likely to occur, in which sophisticated agents attempt to manipulate or directly control their reward signals [16]. This motivates research areas that could improve our ability to engineer systems that can learn or acquire values at run-time. For example, inverse reinforcement learning may offer a viable approach, in which a system infers the preferences of another actor, assumed to be a reinforcement learner itself [101, 83]. Other approaches could use different assumptions about underlying cognitive models of the actor whose preferences are being learned (preference learning, [26]), or could be explicitly inspired by the way humans acquire ethical values. As systems become more capable, more epistemically difficult methods could become viable, suggesting that research on such methods could be useful; for example, Bostrom [16] reviews preliminary work on a variety of methods for specifying goals indirectly.
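To make the inverse reinforcement learning idea concrete, the following is a minimal sketch in Python with NumPy (the toy gridworld, the Boltzmann-rationality assumption, and all parameter values are our own illustrative choices, not a method from the cited work). We observe an actor’s moves, posit candidate reward functions, and keep the candidate under which a noisily-rational policy best explains the demonstrations:

import numpy as np

N_STATES = 5                # states 0..4 on a 1-D grid
ACTIONS = (-1, +1)          # move left or right
GAMMA, BETA = 0.9, 5.0      # discount factor; rationality (inverse temperature)

def step(s, a):
    # Deterministic transition: move, clipped at the grid edges.
    return min(max(s + a, 0), N_STATES - 1)

def q_values(reward, n_iters=200):
    # Value iteration under a candidate reward, then Q(s, a) from the result.
    V = np.zeros(N_STATES)
    for _ in range(n_iters):
        V = np.array([max(reward[step(s, a)] + GAMMA * V[step(s, a)]
                          for a in ACTIONS) for s in range(N_STATES)])
    return np.array([[reward[step(s, a)] + GAMMA * V[step(s, a)]
                      for a in ACTIONS] for s in range(N_STATES)])

def log_likelihood(reward, demos):
    # Score demonstrations under a Boltzmann-rational policy: the actor is
    # assumed to prefer higher-Q actions, with noise controlled by BETA.
    Q = q_values(reward)
    total = 0.0
    for s, a in demos:
        logits = BETA * Q[s]
        total += logits[ACTIONS.index(a)] - np.log(np.exp(logits).sum())
    return total

# Observed demonstrations: the actor always moves right.
demos = [(0, +1), (1, +1), (2, +1), (3, +1)]

# Hypothesis space: unit reward at exactly one state, zero elsewhere.
candidates = [np.eye(N_STATES)[g] for g in range(N_STATES)]
best = max(candidates, key=lambda r: log_likelihood(r, demos))
print("inferred reward:", best)   # peaks at the rightmost state

Even this toy version exhibits the key inversion: rather than being given a reward signal, the system treats reward as a latent quantity to be inferred from behavior.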

3.3.1 Ethics

As AI develops to human-level intelligence and beyond, it may be necessary to create agents that obey some formulation of ethics [73]. There are several fundamental questions on which researchers’ assumptions differ; some are:

1. Should ethics be thought of as a constraint on the agent’s actions, or as a subgoal, or as the main goal content? The former approach seems appropriate for near-term applications such as self-driving cars, but in a more powerful AI system it may not be sufficiently robust; also, it can be argued that working under the latter assumption is more likely to expose our ethics models to feedback and improvement, which may lead to better outcomes in the long run.

Intuitions about this are likely linked to differing intuitions about whether there is a difference in kind between ethical values (such as fairness, compassion, generosity, mercy, etc.) and other typical values held by humans (such as knowledge, beauty, fun, courage, loyalty, chocolate, etc.).


2. Should ethics be formulated directly [128, p. 83–86], or should it be learned from human behaviors, brains, writings, etc. (“indirectly”) [128, p. 108–111]? Again, the former approach is sufficient for self-driving cars, but seems fairly difficult to pin down to the degree that would be required for a superintelligence acting freely in the world. In the long run, such a superintelligence would need to be sophisticated enough to come to “correct”² conclusions (or meticulous indifference of a kind that leaves control to us) about the implications of such things as the creation of other possibly-sentient AIs, brain emulations, possible collective consciousnesses, etc., as well as more everyday situations.

The learning-based approach has the drawbacks of not necessarily being transparent to human inspection, of easily misconstruing context, and of potentially overfitting; on the other hand, it seems easier to reach an objective result over limited domains. Hybrid approaches have also been suggested [128, p. 117–124], but there are a number of open questions in that area.

² One way to operationalize this is described by Muehlhauser [80].

Whichever way is chosen, it would be very valuable to find effective ways to validate ethical systems before developing superintelligence. Two apparent paths to doing this are to inspect the content manually, and to try them out in various (likely simulated) settings, evaluating them both subjectively and through each other’s lenses.

One significant challenge with testing is that many values (e.g., courage, love) are themselves rather complex features of the world, and so they might be difficult to capture in a simulated context without losing much of the essential complexity. Testing in non-simulated contexts would be significantly slower and more limited in terms of variety, reducing the quality of feedback. Furthermore, since superintelligent AIs will have more actions and plans available to them than the AIs we’d use for testing ethical systems, the ethical systems would have to generalize well, in a way that we could not test in reality. This suggests that the best option may be to create complex simulated environments with ethical complexity comparable to reality, in which we could set up interesting scenarios to test the generalizability of ethical systems. In order to do this successfully and ethically, we would need to find a way to replicate the true complexity of our values (which is a fairly difficult task in itself) while minimizing ethically meaningful harm to simulated entities (if it is determined that they hold moral standing).

1. Although it has been frequently argued that AI goals should reflect “human values”, which particular values should be preserved, given that there is a broad spectrum of inconsistent views across the globe about what these values should be? Who should get to decide that, and when? For one example of such challenges, see the infinite ethics [14] framework of Arrhenius [8]. For another example of existing work here, Anderson [4] suggests a method for learning how to rank conflicting ethical rules.

2. How are human ethical principles best codified in a way that makes sense to a machine? This is the focus of the nascent field of Computational Ethics (a.k.a. Machine Ethics) [73], which includes many questions of a technical nature. For example:

(a) What are the best knowledge representation and model representation systems for dynamical modeling of ethical systems?

(b) How can different ethical systems be compared analytically?

(c) How could we best build systems that learn ethical content from humans?

(d) How can we best estimate the loss implications of errors in learned ethical content?

3. Both across ethical systems and within a given ethical system, conflicting objectives must often be con-sidered in tandem. Bostrom[15] has suggested a parliamentary voting model of subagents representingtheir respective subobjectives. Multi-agent political dynamics may be undesirable however, leading oneto consider optimization over multiple objectives that are not each represented by subagents. Duringsuch a multiobjective optimization over subgoals and/or subethics, we don’t want to be susceptible tofollowing some edge path that seems great to one or a small number of subobjectives that are either

² One way to operationalize this is described by Muehlhauser [80].


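A toy sketch of the aggregation concern in item 3 above, with invented scores: plain summation can prefer an “edge path” adored by one subobjective, while a soft-veto aggregation (one crude way to encode a preference for the centroid of the objective space) does not:

```python
def additive(scores):
    """Plain sum over subobjective scores: an 'edge path' that one
    subobjective loves can dominate despite broad dislike."""
    return sum(scores)

def cautious(scores, floor=-1.0):
    """Soft-veto aggregation: scores below `floor` incur a steep penalty,
    shifting preference toward broadly acceptable plans."""
    worst = min(scores)
    penalty = max(0.0, floor - worst) * len(scores)
    return sum(scores) - penalty

edge_path = [14.0, -3.0, -3.0]   # adored by one subobjective, hated by two
centroid  = [ 2.0,  2.0,  2.0]   # moderately good for every subobjective

print(additive(edge_path), additive(centroid))   # 8.0 6.0: edge path wins
print(cautious(edge_path), cautious(centroid))   # 2.0 6.0: centroid wins
```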

The value learning problem is discussed further in [112, 123].

3.3.2 Ensuring goal stability

Once desirable goals have been successfully loaded into an AI, the key question is whether they will be retained if the AI self-improves. A prerequisite for the “friendly AI” vision [136] is that a set of “friendly” goals must remain stable throughout the self-improvement process. In other words, the AI must strive not only to improve its capability of achieving its current goals, but also to ensure that it will retain these goals even after it has become more capable. This sounds quite plausible: after all, would you choose to get an IQ-boosting brain implant if you knew that it would make you want to kill your loved ones? But is it really true in general? If not, can AIs be designed for which it is true, at least under some plausible circumstances? Such questions suggest a host of research topics; here are some examples:

1. Self-trusting agents: can we construct goal-driven agents that obey some formalization of correct reasoning (e.g., first-order logic or Bayesian probability) and have access to actions that modify themselves (and/or defer taking final action to a possibly-modified copy of themselves), and that are able to make correct decisions about these actions without falling into the so-called Löbian obstacle or the procrastination paradox? (It’s not necessary for these agents to be practical; an equivalent of AIXI [62] would be enlightening too.) A more thorough introduction to the problem was recently written by Fallenstein and Soares [37].

2. Generally, how can we structure an autonomous goal-oriented agent so that we can be sure it won’t intentionally self-modify to change its goals, or create more powerful agents with different goals? Are there other sorts of replication-capable AI for which this might be answerable?

3. Can any useful evidence for or against this goal-retention hypothesis be found by studying humans? For example, is there evidence that humans retain their values as their cognitive abilities improve throughout childhood and adolescence? To what degree do human values and preferences converge upon learning new facts? To what degree has this happened in history? Almost nobody values the will of Zeus anymore, presumably because of learning about Zeus’ non-existence, but do such examples tell us much of relevance to AIs? For philosophical analyses of the issue, see e.g. [117].

4. “Ontological crisis”: if an agent’s preferences are based on a model of the world which turns out not to be fundamental, it must then extrapolate/redefine them somehow. How is this best done, and can this always be done in a satisfactory way? For example, suppose we program a friendly AI to maximize the number of humans whose souls go to heaven in the afterlife. First it tries things like increasing people’s compassion and church attendance. But suppose it then attains a complete scientific understanding of humans and human consciousness, and discovers that there is no such thing as a soul. Now what? In the same way, it is possible that any other goal we give it based on our current understanding of the world (“maximize the meaningfulness of human life”, say) may eventually be discovered by the AI to be undefined. Can goals safely be articulated in terms of people rather than arrangements of atoms, or about a classical universe rather than a quantum, simulated, or other mathematical one?

This problem, and other related problems, are discussed in a recent paper by Soares [110].

5. Decision theory: most work on automated planning and causality assumes a Causal Decision Theory. However, Causal Decision Theory is not stable under reflection in general. What is the right reflectively stable decision theory?

This problem is discussed in a recent paper by Soares and Fallenstein [115].


(a) Does a good decision theory require a theory of logical counterfactuals, and if so, what’s a good theory of logical counterfactuals?

(b) Does a good decision theory shed light on multiagent coordination problems? (There is some reason to think so.) On ontological crises? On naturalized induction?

6. Ensemble stability problem. Suppose an agent makes decisions using some sort of multiobjective optimization over different goals and ethical systems. Do some ways of doing this guarantee that the same objectives will be respected in any successors?

3.4 Security

It is unclear whether long-term progress in AI will make the overall problem of security easier or harder; on one hand, systems will become increasingly complex in construction and behavior and AI-based cyberattacks may be extremely effective, while on the other hand, the use of AI and machine learning techniques along with significant progress in low-level system reliability may render hardened systems much less vulnerable than today’s. From a cryptographic perspective, it appears that this conflict favors defenders over attackers; this may be a reason to pursue effective defense research wholeheartedly.

Although the research topics described in 2.3.3 may become increasingly important in the long term, very general and capable systems will pose distinctive security problems. In particular, if the problems of validity and control are not solved, it may be useful to create “containers” for AI systems that could have undesirable behaviors and consequences in less controlled environments [134]. Both theoretical and practical sides of this question warrant investigation. If the general case of AI containment turns out to be prohibitively difficult, then it may be that designing an AI system and a container in parallel is more successful, allowing the weaknesses and strengths of the design to inform the containment strategy [16]. The design of anomaly detection systems and automated exploit-checkers could be of significant help. Overall, it seems reasonable to expect that this additional perspective – defending against attacks from “within” a system as well as from external actors – will raise interesting and profitable questions in the field of computer security.

Below we explore some types of containment that could help limit the potential for the AI to make and execute plans with arbitrarily large effects on the world. It’s worth bearing in mind that even if the control and validity problems seemed to have been adequately addressed, it would still be appropriate to use additional safeguards while studying and testing any potential superintelligence – and it would help for such safeguards to be as adaptable and painless as possible.

The task of designing appropriate containment is challenging for numerous reasons, not only due to the difficulty of reliably avoiding security vulnerabilities. In the worst case, every aspect of the AI’s execution may be superintelligently designed to further a set of goals alien to ours. With a superintelligent AI in a container, we’d need to worry not only about straightforward attempts at disabling safeguards or making self-copies, but also about hardware exploits and more insidious attempts to gain trust or influence.

Some of the attributes that may be desirable or necessary are:

1. Containment: it should prevent a contained superintelligent AI from having arbitrary effects on the world. In particular, it should be verifiably free of vulnerabilities itself.

2. Robustness: it should be difficult to unintentionally render ineffective.

3. Uptake: it should be a system that AI builders want to use, and avoid being one that they want not to use. Some possible aspects of this include:

(a) Completeness: it should allow running the same experiments that AI builders could do without the containment

(b) Flexibility: it should be adaptable enough to different hardware setups to be used with different architectures, and especially on distributed hardware

(c) Efficiency: it should not overly reduce runtime efficiency, memory efficiency, or research/development efficiency


(d) Ease of use: it should provide as familiar an environment as possible; for example, a UNIX shell and other familiar tools.

4. Inspectability: it should allow detailed debugging and inspection of the contained AI. See 3.4.5. This could contribute to uptake if it provides better inspection capabilities than AI builders typically have (for instance, debugging distributed software is typically awkward in the current state of affairs).

3.4.1 Software containment

This research is about how we might develop, test, and share software that can reliably “box” and monitor AI.
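As a very crude sketch of the kind of harness at issue, assuming a Linux host with the standard `unshare` tool available: run the workload in a detached network namespace with hard resource limits, capturing all output for later inspection. Real containment would require far more (isolated filesystems, syscall filtering, monitored side channels):

```python
import resource
import subprocess

def run_boxed(cmd, mem_bytes=1 << 30, cpu_seconds=60):
    """Run `cmd` with no network access and hard memory/CPU limits."""
    def set_limits():
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    # `unshare -n` gives the child an empty network namespace (Linux only;
    # requires appropriate privileges or user namespaces).
    return subprocess.run(["unshare", "-n"] + list(cmd),
                          preexec_fn=set_limits,
                          capture_output=True,     # log everything for review
                          timeout=cpu_seconds * 2)
```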

1. What is the space of possible boxing architectures worth studying, based on input from researchers in industry, academia, etc.? Dimensions include:

(a) At what level of abstraction is it implemented? As a virtual machine? As an operating system?

(b) What kinds of isolation guarantees does it provide?

i. Limited input/output channels? (This is relevant to preventing both typical computer hacking and psychological escapes.)

ii. Limited side effects on hardware? (Perhaps using some variant of homomorphic encryption to make the physical effect of executing code on the CPU unpredictable/unexploitable?)

(c) What forms of logging and monitoring should be supported?

(d) What types of automated tripwires and associated countermeasures should be supported?

(e) How can scalability and parallelism be provided without risk of accidentally connecting it to the internet?

2. Once a desirable boxing architecture has been identified, implement and share it on an open-source basis. A key metric of success here is actual adoption by leading general AI researchers, so their input into the architecture selection will be crucial.

3. Attempt to prove that the software (or at least key components) will behave as intended. Use a “red team” to attempt to find security flaws in proposed and currently existing virtualization software, perhaps incentivized by prizes. Multiple levels of virtualization should be considered (and perhaps used in parallel), such as:

(a) Operating-system level virtualization (such as VirtualBox, VMware, KVM)

(b) Programming-language level containment (such as Java sandbox)

(c) Static analysis (e.g., to check that a program won’t use networking APIs or corrupt its memory)

4. Investigate programs likely to be used for viewing output and logs from an AI project. Determine which ones are and aren’t secure enough to use safely in the context of a boxed AI, and address weaknesses in them (a sketch of one such mitigation follows this list). For example:

(a) How safe are terminal emulators?

(b) Can an AI censor information from its log files by mixing in things like Unicode control characters or xterm escape sequences?

(c) Under what circumstances is it safe to copy-paste things from an AI’s output?

(d) What file formats should and shouldn’t the AI be allowed to use?

5. Develop a debugger (perhaps by modifying an existing one such as gdb) that’s able to connect to a virtual machine and debug something inside it, without being exploitable by the thing inside.


6. Automate checking for the presence of known problems (i.e., check version numbers against a database and keep the database up to date), from both inside and outside of a box. Design it to be worked into an AI project as an automated check.

7. Can we put a general AI in a virtual environment that doesn’t reveal that it’s a virtual environment? What sort of content would be safe to give it?
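On point 4(b) above, a minimal sketch of a defensive log filter: strip terminal escape sequences and non-printing characters (including Unicode bidirectional overrides, which are non-printable format characters) before a human views the output. The regular expression covers only common CSI/OSC sequences and is illustrative, not exhaustive:

```python
import re

# Matches common CSI sequences (colors, cursor movement) and OSC sequences
# (e.g., xterm title setting), terminated by BEL or ST.
ANSI_ESCAPE = re.compile(r"\x1b(\[[0-9;?]*[ -/]*[@-~]|\][^\x07\x1b]*(\x07|\x1b\\))")

def sanitize_log_line(line: str) -> str:
    """Return `line` with escape sequences and control characters removed."""
    line = ANSI_ESCAPE.sub("", line)
    return "".join(ch for ch in line if ch.isprintable() or ch == "\t")
```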

3.4.2 Psychological containment

This research aims to study and reduce the risk that human users are manipulated into enabling a boxed superintelligent AI to “escape”. Note that escape here need not be by disabling safeguards; the AI could deliver information to its recipient in a way that contributes to other complex plans to gain influence, trust, more copies, etc.

1. Replicate the “AI box experiment”, where a human pretending to be an AI tries to persuade a human controller to enable a breakout; document strategies, counterstrategies, and outcomes.

(a) How can individual AI researchers best be trained to resist being manipulated by an AI?

2. What are the most effective social control mechanisms for preventing individuals from enabling AI breakout? What can be learned from past research on preventing unauthorized use of nuclear weapons and other powerful technologies? Such examples include:

(a) Psychological screening of operators

(b) Surveillance of operators. For example, add an option to sshd which records keystrokes and output in a place the logged-in user cannot access (a toy version of such a session recorder follows this list)

(c) “False alarm” tests where the operator is led to believe that a critical situation has occurred, so that one can see what decisions they make under pressure

(d) Systems requiring more than one authorized human for making certain critical decisions. What software tools should be developed (and perhaps shared on a public-domain basis)? For example, how would one best make a double-entry / two-person ssh interface where both entrants are verifiably human?

3. Scenario analysis: What are some concrete AI paths to influence over world affairs?

4. What are the optimization power gains from mere content? What have people figured out without original theoretical advances or new experiments, but just by reading lots of known facts and putting together the pieces in a way that nobody had before? This is relevant for deciding whether containment should involve restricting the AI’s knowledge base.
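On point 2(b), a toy sketch of a session recorder using Python’s standard pty module; the log path is a placeholder, and a production version would instrument sshd itself and protect the log from the logged-in user:

```python
import os
import pty

LOG = open("/var/log/operator-session.log", "ab", buffering=0)  # placeholder path

def _read_output(fd):
    data = os.read(fd, 1024)
    LOG.write(data)        # everything shown to the operator
    return data

def _read_input(fd):
    data = os.read(fd, 1024)
    LOG.write(data)        # every keystroke the operator types
    return data

# Wrap the operator's shell in a logged pseudo-terminal.
pty.spawn([os.environ.get("SHELL", "/bin/sh")], _read_output, _read_input)
```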

3.4.3 Hardware containment

Study ways in which a general AI might break out from a computer (or computer cluster) disconnected from the outside world, and investigate corresponding countermeasures.

1. First challenge: Consider only currently known technology, for example communication with the outside world via radio waves, vibrations, or minute A/C power modulations that somehow interact with the computer in a software-controllable/software-detectable way. See the literature on side-channel attacks.

2. Second challenge: Consider advanced and currently unknown technology that the general AI could potentially invent, limited only by laws of physics we consider truly fundamental.

Both 1 and 2 include the challenge of how to safely dispose of boxed AI hardware after shutdown.


3.4.4 Tripwires: Detection & Response

This research aims to study ways of detecting an imminent or in-progress AI box breakout, and appropriate countermeasures to take when pre-set alarms are triggered.

1. What are appropriate tripwires for detecting an imminent or in-progress AI box breakout? Examples include:

(a) Low-level triggers such as unexpected use of memory, bandwidth, or communication channels (a toy monitor along these lines follows this list)

(b) Triggers related to the AI modifying its own software beyond certain limits

(c) Triggers related to some sort of high-level ability metric. Can any improved measures of substrate-independent general intelligence be helpful in this regard, extending past work by Legg, Yudkowsky, Goertzel, Hernández-Orallo & Dowe, etc.?

2. How are such tripwires best implemented without causing performance degradation in terms of speed, memory usage, etc. that may limit their adoption? To maximize adoption, software implementing such tripwires in the above-mentioned boxing software should be developed and shared, optionally on an open-source basis.

3. Study the risk associated with an AI having access to the source code for its boxing and tripwire software (or itself) and thereby discovering ways to circumvent it.

4. What are appropriate countermeasures for various types of tripwire triggers?

(a) At the machine level, what should happen automatically for safe shutdown? After shutdown? What standard interfaces for tripwires could be developed for automatically shutting down all of a distributed system, including triggers and/or processes inside virtual machines?

(b) What is an appropriate emergency response plan if general AI appears imminent or has come into existence? At the organization level? At the national/societal level? When is it worth developing such plans?
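A toy sketch of the low-level triggers in 1(a), assuming the third-party psutil library and invented thresholds. A real tripwire would run outside the box, beyond the reach of the monitored process (see item 3 above):

```python
import time
import psutil  # third-party process/host inspection library

MEM_LIMIT = 8 * 1024**3   # bytes; illustrative threshold
NET_LIMIT = 0             # a properly boxed AI should emit no traffic at all

def watch(pid, period=1.0):
    """Poll a contained process; kill it when a tripwire fires."""
    proc = psutil.Process(pid)
    base_sent = psutil.net_io_counters().bytes_sent
    while proc.is_running():
        if proc.memory_info().rss > MEM_LIMIT:
            return trip(proc, "unexpected memory use")
        if psutil.net_io_counters().bytes_sent - base_sent > NET_LIMIT:
            return trip(proc, "unexpected network traffic")
        time.sleep(period)

def trip(proc, reason):
    proc.kill()           # crude countermeasure; see item 4 above
    print("tripwire fired:", reason)
```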

3.4.5 Detecting intent to deceive

Suppose we construct a superintelligent AI system that has goals alien to ours. Correctly suspecting it of being potentially harmful, we use a brilliantly-designed container mechanism to make sure it can have only very limited influence on the world. Of course, eventually we’d like to put this system to use. When we do this it will be difficult for us to ensure it can’t influence the world through its actions; knowing this, the AI may deceive us to gain trust while inside the container. We may try to inspect it, but it will have had a chance to disguise itself to look harmless.

For these reasons, trying to make use of a mature superintelligent AI, even in a well-designed container, could be very hazardous. This suggests that even if we have a container, if we don’t trust the AI’s goal system we must have enough insight into its cognitive processes to monitor it and catch the “treacherous turn”, i.e. the first moment when it comes up with the idea of deceiving us. Transparency/inspectability is thus important, perhaps even more so than containment per se. For example, it would be useful to investigate how amenable different AI architectures would be to having their beliefs and goals read from the outside.

3.5 Control

It has been argued that very general and capable AI systems operating autonomously to accomplish some task will often be subject to effects that increase the difficulty of maintaining meaningful human control [86, 17, 16, 107]. Research on systems that are not subject to these effects, minimize their impact, or allow for reliable human control could be valuable in preventing undesired consequences, as could work on reliable and secure test-beds for AI systems at a variety of capability levels.


3.5.1 Corrigibility and Domesticity

If an AI system is selecting the actions that best allow it to complete a given task, then avoiding conditions that prevent the system from continuing to pursue the task is a natural subgoal [86, 17] (and conversely, seeking unconstrained situations is sometimes a useful heuristic [132]). This could become problematic, however, if we wish to repurpose the system, to deactivate it, or to significantly alter its decision-making process; such a system would rationally avoid these changes. Systems that do not exhibit these behaviors have been termed corrigible systems [116], and both theoretical and practical work in this area appears tractable and useful. For example, it may be possible to design utility functions or decision processes so that a system will not try to avoid being shut down or repurposed [116], and theoretical frameworks could be developed to better understand the space of potential systems that avoid undesirable behaviors [55, 57, 56].
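As a toy illustration of one approach discussed in the corrigibility literature, utility indifference [116]: paying the agent, upon shutdown, exactly the utility it expected from continuing removes its incentive to disable the off-switch. All numbers are invented:

```python
P_SHUTDOWN = 0.5           # chance the overseer presses the off-switch
R_RUN, R_OFF = 10.0, 0.0   # task reward if running vs. if shut down

def naive_value(disable_switch: bool) -> float:
    """Expected reward for a plain maximizer."""
    if disable_switch:
        return R_RUN       # shutdown can no longer happen
    return (1 - P_SHUTDOWN) * R_RUN + P_SHUTDOWN * R_OFF

def indifferent_value(disable_switch: bool) -> float:
    """On shutdown, the agent is compensated with what running would have
    been worth, so resisting shutdown buys it nothing."""
    if disable_switch:
        return R_RUN
    return (1 - P_SHUTDOWN) * R_RUN + P_SHUTDOWN * (R_OFF + (R_RUN - R_OFF))

print(naive_value(True), naive_value(False))              # 10.0 5.0: resist
print(indifferent_value(True), indifferent_value(False))  # 10.0 10.0: indifferent
```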

It has been argued that another natural subgoal is the acquisition of fungible resources of a variety of kinds: for example, information about the environment, safety from disruption, and improved freedom of action are all instrumentally useful for many tasks [86, 17]. Hammond [49] gives the label stabilization to the more general set of cases where “due to the action of the agent, the environment comes to be better fitted to the agent as time goes on”. This type of subgoal could lead to undesired consequences, and a better understanding of the conditions under which resource acquisition or radical stabilization is an optimal strategy (or likely to be selected by a given system) would be useful in mitigating its effects. Potential research topics in this area include “domestic” goals that demand actions/plans whose consequences are limited in scope in some way [16], the effects of large temporal discount rates on resource acquisition strategies, and experimental investigation of simple systems that display these subgoals.
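A back-of-the-envelope sketch of the discount-rate point, with invented reward streams: under mild discounting a resource-acquisition detour dominates, while steep discounting makes simply finishing the task optimal:

```python
def value(rewards, gamma):
    """Discounted value of a reward stream."""
    return sum(r * gamma**t for t, r in enumerate(rewards))

finish_task   = [5, 5, 0, 0, 0, 0]    # act now, modest reward
acquire_first = [0, 0, 0, 0, 20, 20]  # spend four steps gathering resources

for gamma in (0.99, 0.5):
    print(gamma, value(finish_task, gamma), value(acquire_first, gamma))
# gamma=0.99: acquisition wins (~38 vs ~10); gamma=0.5: finishing wins (7.5 vs ~1.9)
```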

Finally, research on the possibility of superintelligent machines or rapid, sustained self-improvement (“intelligence explosion”) has been highlighted by past and current projects on the future of AI as potentially valuable to the project of maintaining reliable control in the long term. The AAAI 2008–09 Presidential Panel on Long-Term AI Futures’ “Subgroup on Pace, Concerns, and Control” stated that

There was overall skepticism about the prospect of an intelligence explosion... Nevertheless, there was a shared sense that additional research would be valuable on methods for understanding and verifying the range of behaviors of complex computational systems to minimize unexpected outcomes. Some panelists recommended that more research needs to be done to better define “intelligence explosion,” and also to better formulate different classes of such accelerating intelligences. Technical work would likely lead to enhanced understanding of the likelihood of such phenomena, and the nature, risks, and overall outcomes associated with different conceived variants [61].

Stanford’s One-Hundred Year Study of Artificial Intelligence includes “Loss of Control of AI systems” as an area of study, specifically highlighting concerns over the possibility that

...we could one day lose control of AI systems via the rise of superintelligences that do not act in accordance with human wishes – and that such powerful systems would threaten humanity. Are such dystopic outcomes possible? If so, how might these situations arise? ...What kind of investments in research should be made to better understand and to address the possibility of the rise of a dangerous superintelligence or the occurrence of an “intelligence explosion”? [60]

Research in this area could include any of the long-term research priorities listed above, as well as theoretical and forecasting work on intelligence explosion and superintelligence [25, 16], and could extend or critique existing approaches begun by groups such as the Machine Intelligence Research Institute [113].

Research questions:

1. Can high Bayesian uncertainty and agent respect for the unknown act as an effective safety mechanism? (See [32, 108].)

2. Investigate steep temporal discounting as an incentives control method for an untrusted general AI.

It has been argued [135] that the nature of the general AI control problem undergoes an essential shift, which we can refer to as the “context change”, when transitioning from subhuman to superhuman general AI. This suggests that rather than judging potential solutions to the control problem using only experimental results, it’s essential to build compelling deductive arguments that generalize and are falsifiable, and only when these arguments are available does it make sense to try to test potential solutions via experiment.

3.5.2 Safe and Unsafe Agent Architectures

Predicting the exact behavior of complex software is notoriously difficult and has been shown to be generically impossible with less computational cost than simply running it. The goal of AI safety research is therefore more modest: to show that the behavior, although not exactly predictable, will have certain desired properties, for example keeping certain behavioral parameters within certain bounds.

Rational agents are often composed of distinct modules (e.g. sensors, actuators, a performance element, a learning element, a problem generator, a critic, etc.), each with limited abilities, with some network of information flows between modules [102]. Within this framework, it would be valuable to provide guarantees that various modules would be safe or unsafe (individually or in combination).
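A minimal sketch of such module wiring, loosely following the learning-agent decomposition cited above [102] (all interfaces are invented for illustration); safety guarantees could then be sought per module and per information flow:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class LearningAgent:
    sensors: Callable[[], Any]             # percept source
    performance: Callable[[Any], Any]      # maps percepts to actions
    critic: Callable[[Any, Any], float]    # scores (percept, action) pairs
    learner: Callable[[Any, float], None]  # adjusts the performance element
    actuators: Callable[[Any], None]       # executes actions in the world

    def step(self) -> None:
        percept = self.sensors()
        action = self.performance(percept)
        # The learner -> performance flow is a natural place to impose bounds.
        self.learner(action, self.critic(percept, action))
        self.actuators(action)
```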

A related approach is to not build an agent at all, but rather some sort of non-agent “Tool AI”. Some types of this are:

1. “Oracle AI”: an AI system designed to merely answer questions about the world as accurately as possible. (Though people sometimes also call an agent AI with a goal of accurately answering questions about the world “Oracle AI”.)

2. “Virtual AI”: an agent that interacts only with an abstract world, but has no way of determining this, and hence is not an agent in the physical world.

Although a common assumption is that both 1 and 2 are “safe” by remaining “Oracle AI”/“Tool AI”, this has not been substantiated. For example, an Oracle AI that becomes superintelligent via self-improvement is likely to evolve a knowledge-acquisition goal, which might lead it to modify its answers to manipulate its user to perform certain experiments.

Many of the above-mentioned safety issues are related to the issue of the goals that the rational agent may have. The following question provides an important link between architectures and goals: how amenable are different AI architectures to having their goals and beliefs read from the outside in a fashion useful for safety determination and monitoring?

Research questions on properties of architectures under self-modification:

1. Can certain interesting classes of agents be proven to exhibit behavior converging towards recursively stable fixed points or limit cycles?

2. Do certain kinds of non-optimizer agents become optimizers, and if so, how quickly?

3. If so, how strong is the ‘optimizer’ stable attractor?

4. Are there other stable attractors?

5. Are tool-like or Oracle-like things stable attractors?
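As a degenerate illustration of what these questions ask about, one can iterate a self-modification map over a finite state space and classify its limiting behavior; real agents occupy vastly larger spaces, so this is purely conceptual:

```python
def limit_behavior(update, state, max_steps=10_000):
    """Iterate `update` and report a fixed point, a cycle, or neither."""
    seen = {}
    for step in range(max_steps):
        if state in seen:
            return ("cycle", step - seen[state], state)
        seen[state] = step
        nxt = update(state)
        if nxt == state:
            return ("fixed point", 0, state)
        state = nxt
    return ("undetermined", None, state)

# A toy agent whose 'optimization strength' saturates at 5:
print(limit_behavior(lambda s: min(s + 1, 5), 0))  # ('fixed point', 0, 5)
```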

Research questions about specific architectures:

1. Model and bug-trace the variety of different scenarios of failure modes and dangerous behaviors intrinsic to different existing real-world general AI architectures such as OpenCog and Sigma.

2. Analyze how deep learning discontinuity/instability [121] can affect deep reinforcement learning architectures. What new classes of risks in agent behavior can result? How easily can these risks be mitigated? Evaluate compensatory safety mechanisms whereby an agent requires at least two distinct perspectives on a situation or subject of analysis before characterizing it. (A one-line illustration of the underlying instability follows this list.)

3. Explore how the field of mechanism design can be applied to controlling neuromorphic AIs and other architectures.


4. How well does an AI system’s transparency to human inspection scale, using different kinds of architectures and methods? [76]
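On point 2, the instability in [121] is visible even in a linear model: a perturbation that is tiny in every coordinate but aligned with the weight vector flips the verdict. A minimal numpy sketch with invented values:

```python
import numpy as np

w = np.ones(100)              # weights of a toy linear 'perception' module
x = -0.02 * np.ones(100)      # input confidently scored negative: w @ x = -2.0

eps = 0.03                    # per-coordinate nudge, far below human notice
x_adv = x + eps * np.sign(w)  # align the nudge with the weights

print(w @ x, w @ x_adv)       # ~ -2.0, ~ 1.0: the classification flips
```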


4 Forecasting

4.1 Motivation

1. Conduct a broad survey of past and current civilizational competence. In what ways, and under what conditions, do human civilizations show competence vs. incompetence? Which kinds of problems do they handle well or poorly? Such a survey would be similar in scope and ambition to, say, Perrow’s Normal Accidents [90] and Sagan’s The Limits of Safety [104]. The aim is to get some insight into the likelihood of our civilization handling various aspects of the superintelligence challenge well or poorly. Some initial findings were published on the MIRI blog [79, 74].

2. Did most early AI scientists really think AI was right around the corner, or was it just a few people? The earliest survey available (Michie 1973 [71]) suggests it may have been just a few people. For those that thought AI was right around the corner, how much did they think about the safety and ethical challenges? If they thought and talked about it substantially, why was there so little published on the subject? If they really didn’t think much about it, what does that imply about how seriously AI scientists will treat the safety and ethical challenges of AI in the future?

4.2 Methodology

There are many interrelated variables that are relevant to arriving at a good understanding of the future of AI, in particular the path towards general AI; and there are a number of different ways to produce forecasts of these variables, with varying degrees of accuracy, credibility, feasibility, and informativeness. Possible methods include the following (a toy sketch of method 7 appears after the list):

1. expert surveys

2. prediction markets

3. systematic group forecasting methods, such as the Delphi method

4. building complex models

5. extrapolation from historic trends

6. analogy with other historical developments

7. combining forecasts from differing methods or forecasts of related variables
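As a toy sketch of method 7, one simple scheme averages probability forecasts in log-odds and then “extremizes” the pooled value, an adjustment reported to improve calibration in forecasting tournaments such as those described below; the exponent is a tunable assumption:

```python
import math

def pooled_probability(forecasts, extremize=1.0):
    """Average forecasts in log-odds; extremize > 1 pushes away from 0.5."""
    logits = [math.log(p / (1 - p)) for p in forecasts]
    pooled = extremize * sum(logits) / len(logits)
    return 1 / (1 + math.exp(-pooled))

print(round(pooled_probability([0.6, 0.7, 0.8]), 2))       # 0.71
print(round(pooled_probability([0.6, 0.7, 0.8], 2.0), 2))  # 0.85
```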

Existing projects that investigate how to best forecast future sci/tech and other developments include these IARPA programs:

1. ACE (Aggregative Contingent Estimation), which investigates ways to improve and combine the judgments of analysts to produce accurate forecasts. The ACE program has been running a team prediction tournament, which has consistently been won by the Good Judgment Project, which has produced several insightful publications.

2. ForeST (Forecasting Science & Technology), which investigates ways to get and maintain high-quality forecasts of sci/tech milestones, and funds SciCast, the world’s largest sci/tech forecasting tournament.

These projects are providing us with valuable information on how best to make short-term forecasts. It would be interesting to run a similar tournament with 5-year and 10-year time horizons for predictions and see if there are significant differences; are the chief determinants of predictive success the same? What kind of process should we trust to give us the best predictions? Besides the question of how to train analysts and combine their estimates, there is also the question of what modelling methodologies, used by whom, yield the best long-term forecasts. The Tauri Group is a think tank that conducted a study [44] for the Department of Defense reviewing over 1000 technological forecasts and statistically analyzing accuracy by methodology, by source, and by time frame. It would be informative to have a similar analysis of long-term technological predictions from other sources, such as (1) The Futurist and World Future Review, (2) Technological Forecasting and Social Change, (3) Foresight and International Journal of Forecasting, (4) Journal of Forecasting, (5) publications of the Hudson Institute, (6) publications of the Institute for the Future, (7) publications of the Club of Rome, (8) Journal of Future Studies, (9) Ray Kurzweil (more thoroughly than section 5.4 of Armstrong et al., 2014 [7]), (10) Alvin Toffler, (11) John Naisbitt, and (12) the State of the World reports by the Worldwatch Institute.

4.3 Forecasting AI progress

In order to get a good understanding of what the path to general AI might look like, there are many kinds of interacting variables that would be worth forecasting:

1. resources (researchers and funding) going into AI innovation in general, or within AI subfields

2. resources going into AI areas of application, such as robotics or sensory technologies

3. related fields which may contribute ideas, such as neuroscience

4. shifts in the set of organizations/people performing AI research among:

(a) countries

(b) academia vs. industry vs. government (e.g., military)

Other related questions that may merit detailed study include:

1. in terms of technologies:

(a) What types of AI (in terms of architecture, subfield, application, etc.) are most likely to contribute to reaching general AI? What AI capabilities would be necessary or sufficient, individually or collectively?

(b) Nick Bostrom brings up in Superintelligence that brain emulation technology is unlikely to arrive much sooner than human-level neuromorphic AI, because techniques and knowledge from the former can likely be repurposed for the latter. Are there other foreseeable situations where two disparate fields or research programs may be closely related, with success on one implying great progress on the other?

i. Does causal entropy [132] constitute a promising shared avenue of progress in AI and nanotech?

2. in terms of scenarios:

(a) What kinds of scenarios would increase or decrease researcher inclination to work on AI or general AI research? (For example, changing ideologies or public opinion, association of the field with ideas held in low regard, ...) Can we forecast this?

(b) How scalable is innovative project secrecy? Examine past cases: the Manhattan Project, Bletchley Park, Bitcoin, Anonymous, Stuxnet, Skunk Works, Phantom Works, Google X. Could there be large projects we don’t know about? How will this change in coming decades?

(c) What is the world’s distribution of computation, and what are the trends? (Some initial results have been published [75].)

(d) Supposing enough technical innovations are in place to build general AI, how large of a project will implementation be? How much of the work to reach general AI is scientific advancement and technical innovation vs. engineering and implementation?


3. in terms of public response:

(a) How will governments respond?

i. What conditions would make bans or nationalization likely? (Consider historical examples here.) What would be the consequences?

ii. Examine international collaboration on major innovative technology. How often does it happen? What blocks it from happening more? What are the necessary conditions? Examples: the Concorde jet, the LHC, the International Space Station, etc. What conditions would make international collaboration on AI safety issues likely?

iii. What kinds of policies are likely to be implemented, with what effect?

• What happens when governments ban or restrict certain kinds of technological development? What happens when a certain kind of technological development is banned or restricted in one country but not in other countries where technological development sees heavy investment?

• What kinds of innovative technology projects do governments monitor, shut down, or nationalize? How likely are major governments to monitor, shut down, or nationalize serious general AI projects?

(b) How will the public respond? What sorts of technological innovations tend to cause public panic or outrage, under what conditions?

(c) What sorts of developments would cause governments or the public to consider AI safety to be a serious issue? How did public perception respond to previous AI milestones? How will the public react to self-driving taxis?

(d) How much warning will we have before we reach general AI? What kinds of future developments would serve as advance signposts indicating the kind of scenario we’re likely to see?

4. in terms of rates of progress:

(a) How quickly will computer hardware performance be improving (on various metrics)?

i. Improved performance enables more AI approaches to be feasible and makes experiments more productive. Will hardware advances also contribute to researcher productivity in other ways?

ii. Will departures from the von Neumann architecture contribute significantly to some types of AI development? (For example, quantum computers, or computers inspired by cellular automata.)

(b) How quickly does performance of algorithms tend to advance over time? Grace (2013) [42] finds that (in six areas) algorithmic progress is nearly as significant as hardware progress; but further analysis of this question with a view toward economic or productivity impact would be worthwhile.

(c) Researcher performance is affected by improvements in algorithmic and hardware performance which make experiments more productive. What other factors will affect researcher performance? Some candidates:

i. changing ways to do software development

ii. changing ways to collaborate

iii. changing ways to publish or present results

iv. improved computer interfaces (such as brain-computer interfaces or virtual reality)

v. genetic enhancement technology

(d) Related to the previous question: what AI subfields will benefit particularly from hardware performance improvements?


4.4 Forecasting AI takeoff

If we develop AI that’s advanced enough to do AI research and development, we may enter the era that I. J. Good dubbed the “Intelligence Explosion”, in which the growth in AI capability is driven primarily by the AI itself. We will refer to the transition of AI to a superintelligent level as takeoff. How (and how quickly) this would unfold is important, but difficult to predict. Of course, some of the usual forecasting methods are applicable, in particular those that rely on expert judgment. (A survey of timelines and impacts for humanity (with n=170) is given in [81].) Here are some more approaches toward understanding aspects of the intelligence explosion:

1. Compare with earlier takeoff-like scenarios. Some candidates [51, 50]:

(a) development of proto-human brains

(b) agricultural revolution

(c) industrial revolution

2. Besides software improvements, AI self-improvement may occur via engaging in commercial activity, via expropriating computing resources, via manufacturing computing resources, or in other ways. Enumerate the specific technologies required for each pathway and forecast their development.

3. Assess how best to model the amount of self-improvement that could be accomplished with varying amounts of (i) intelligence, (ii) parallelism / parallel computations, and (iii) serial depth of computation; what evidence is available? [137]

(a) How do different areas of knowledge respond to being given more serial research time vs. more people vs. other inputs?

(b) One type of cognitive work that has been recorded for millennia is the progress of mathematics, in particular the resolution of conjectures. Some data analysis [43] suggests these times follow an exponential distribution with a half-life of 100 years. Could further analysis help us understand the benefits of serial vs. parallel intellectual work in this domain?
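For concreteness, the cited exponential fit with a 100-year half-life implies, for instance, the following resolution probabilities (a direct consequence of the fit, not new data):

```python
import math

HALF_LIFE = 100.0               # years, per the cited fit
rate = math.log(2) / HALF_LIFE  # exponential rate parameter

def p_resolved_within(years: float) -> float:
    return 1 - math.exp(-rate * years)

print(round(p_resolved_within(50), 2))   # 0.29
print(round(p_resolved_within(200), 2))  # 0.75
```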

It’s possible that there will be multiple AI projects undergoing takeoff at the same time; this has been called a multipolar takeoff. Multipolar takeoff is more likely if (i) takeoff is slow, (ii) more of the necessary innovations and/or tools are shared, and (iii) implementation doesn’t require any non-commodity resources, such as access to specialized hardware. It’s been proposed [6] that multipolar scenarios might carry a higher risk of accidents than unipolar ones, because no party wishes to lose the competition. It’s also been proposed [29] that multipolar scenarios might be safer, because there might be a “balance of power” enabling cooperation and mutual scrutiny.

1. What kind of multipolar scenarios may occur? What would be the consequences?

2. What kinds of multipolar scenarios would collapse into unipolar ones, or vice versa?

4.5 Brain emulations (uploads)

1. Can we get whole brain emulation without producing neuromorphic general AI slightly earlier or shortly afterward? See section 3.2 of [35].

2. Is the first functional whole brain emulation likely to be (1) an emulation of low-level functionality that doesn’t require much understanding of human cognitive neuroscience at the computational level, as described in [105], or is it more likely to be (2) an emulation that makes heavy use of advanced human cognitive neuroscience, as described e.g. by Hayworth [77], or is it likely to be (3) something else?

3. Investigate the feasibility of creating safe general-purpose superintelligences by modifying brain emulations, based on currently known cognitive neuroscience.


5 Policy and Collaboration

For any powerful new technology, appropriate policies can help ensure that humanity enjoys the benefits while risks are minimized. Both nuclear technology and biotechnology have thus far avoided global-scale disasters (global nuclear war, nuclear terrorism, engineered pandemics, etc.), at least in part thanks to helpful policies. For example, the policies developed at the 1975 Asilomar conference on Recombinant DNA have contributed to the sterling safety record of that field without stifling its progress in any significant way. In this spirit, it appears worthwhile to research the analogous question for AI: what policies would help ensure that humanity reaps the benefits of AI while avoiding potential pitfalls? Here are some more specific questions along these lines:

1. What is the space of possible AI risk reduction policies worth studying? (Dewey [32] and Sotala and Yampolskiy [118] have written some analyses of possible policies/responses.) Dimensions include:

(a) Implementation level: global, national, organizational, etc.

(b) Strictness: mandatory regulations, voluntary industry guidelines, etc.

(c) Type: Do policies/monitoring efforts focus on software, hardware, projects, or individuals? Is there some sort of tiered system of security clearances? Is some information classified? What are possible approaches to monitoring and tracking general AI development? What kind of research should be funded? Are new governance structures created?

2. Which criteria should be used to determine the merits of a policy? Some candidates:

(a) verifiability of compliance

(b) enforceability

(c) ability to reduce AI risk

(d) ability to avoid stifling desirable technology development or having other negative consequences

(e) adoptability (the prospects of adoption increase when a policy benefits those whose support is needed for implementation and when its merits can be effectively explained to decision-makers and opinion leaders)

(f) ability to adapt over time to changing circumstances

To shed light on 2(d): What happens when governments ban or restrict certain kinds of technological development? What happens when a certain kind of technological development is banned or restricted in one country but not in other countries where technological development sees heavy investment?

Collaboration is another important topic that deserves recurring thought and discussion. To build safe general AI of human level and beyond, it will likely be necessary to bring together multiple research subdisciplines and communities and let them influence each other’s work. Some thematic questions here are:

1. What are the most important collaborations and information flows we need between different research subdisciplines and communities?

2. What attitudes would be most useful to foster?

3. What kind of organizations or organizational mechanisms would best enable these collaborations and information flows, bringing us closer to safety?


References

[1] Rakesh Agrawal and Ramakrishnan Srikant. “Privacy-preserving data mining”. In: ACM Sigmod Record 29.2 (2000), pp. 439–450.

[2] Rajeev Alur. “Formal verification of hybrid systems”. In: Embedded Software (EMSOFT), 2011 Proceedings of the International Conference on. IEEE. 2011, pp. 273–278.

[3] Kenneth Anderson, Daniel Reisner, and Matthew C Waxman. “Adapting the Law of Armed Conflict to Autonomous Weapon Systems”. In: International Law Studies 90 (2014).

[4] Susan Leigh Anderson and Michael Anderson. “A Prima Facie Duty Approach to Machine Ethics: Machine Learning of Features of Ethical Dilemmas, Prima Facie Duties, and Decision Principles through a Dialogue with Ethicists”. In: Machine Ethics (2011), p. 476.

[5] David Andre and Stuart J Russell. “State abstraction for programmable reinforcement learning agents”. In: Eighteenth National Conference on Artificial Intelligence. American Association for Artificial Intelligence. 2002, pp. 119–125.

[6] Stuart Armstrong, Nick Bostrom, and Carl Shulman. “Racing to the precipice: a model of artificial intelligence development”. In: (2013).

[7] Stuart Armstrong, Kaj Sotala, and Seán S Ó hÉigeartaigh. “The errors, insights and lessons of famous AI predictions – and what they mean for the future”. In: Journal of Experimental & Theoretical Artificial Intelligence ahead-of-print (2014), pp. 1–26. url: http://www.fhi.ox.ac.uk/wp-content/uploads/FAIC.pdf.

[8] Gustaf Arrhenius. “The impossibility of a satisfactory population ethics”. In: Descriptive and normative approaches to human behavior (2011).

[9] Peter M Asaro. “What should we want from a robot ethic?” In: International Review of Information Ethics 6.12 (2006), pp. 9–16.

[10] Peter Asaro. “How just could a robot war be?” In: Current issues in computing and philosophy (2008), pp. 50–64.

[11] Karl J Åström and Björn Wittenmark. Adaptive control. Courier Dover Publications, 2013.

[12] Silvia Bellezza, Anat Keinan, and Neeru Paharia. Conspicuous Consumption of Time: When Busyness at Work and Lack of Leisure Time Become a Status Symbol. 2014. url: http://www.hbs.edu/faculty/Pages/item.aspx?num=47139.

[13] M Boden et al. “Principles of robotics”. In: The United Kingdom’s Engineering and Physical Sciences Research Council (EPSRC). Web publication (2011).

[14] Nick Bostrom. “Infinite ethics”. In: Analysis and Metaphysics 10 (2011), pp. 9–59.

[15] Nick Bostrom. Moral Uncertainty–Towards a Solution? 2009. url: http://www.overcomingbias.com/2009/01/moral-uncertainty-towards-a-solution.html.

[16] Nick Bostrom. Superintelligence: Paths, dangers, strategies. Oxford University Press, 2014.

[17] Nick Bostrom. “The superintelligent will: Motivation and instrumental rationality in advanced artificial agents”. In: Minds and Machines 22.2 (2012), pp. 71–85.

[18] Jürgen Branke et al. Multiobjective optimization: Interactive and evolutionary approaches. Vol. 5252. Springer Science & Business Media, 2008.

[19] Selmer Bringsjord et al. “Piagetian roboethics via category theory: Moving beyond mere formal operations to engineer robots whose decisions are guaranteed to be ethically correct”. In: Machine Ethics (2011), pp. 361–374.

[20] Yuriy Brun and Michael D Ernst. “Finding latent code errors via machine learning over program executions”. In: Proceedings of the 26th International Conference on Software Engineering. IEEE Computer Society. 2004, pp. 480–490.


[21] Erik Brynjolfsson and Andrew McAfee. The second machine age: work, progress, and prosperity in a time of brilliant technologies. W.W. Norton & Company, 2014.

[22] Erik Brynjolfsson, Andrew McAfee, and Michael Spence. “Labor, Capital, and Ideas in the Power Law Economy”. In: Foreign Aff. 93 (2014), p. 44.

[23] Ryan Calo. “Robotics and the New Cyberlaw”. In: Available at SSRN 2402972 (2014).

[24] Ryan Calo. “The Case for a Federal Robotics Commission”. In: Available at SSRN 2529151 (2014).

[25] David Chalmers. “The singularity: A philosophical analysis”. In: Journal of Consciousness Studies 17.9-10 (2010), pp. 7–65.

[26] Wei Chu and Zoubin Ghahramani. “Preference Learning with Gaussian Processes”. In: Proc. ICML 2005. 2005, pp. 137–144.

[27] Robin R Churchill and Geir Ulfstein. “Autonomous institutional arrangements in multilateral environmental agreements: a little-noticed phenomenon in international law”. In: American Journal of International Law (2000), pp. 623–659.

[28] Andrew E Clark and Andrew J Oswald. “Unhappiness and unemployment”. In: The Economic Journal (1994), pp. 648–659.

[29] Owen Cotton-Barratt and Toby Ord. Strategic considerations about different speeds of AI takeoff.Aug. 2014. url: http://www.fhi.ox.ac.uk/strategic-considerations-about-different-speeds-of-ai-takeoff/.

[30] Andre DeHon et al. “Preliminary design of the SAFE platform”. In: Proceedings of the 6th Workshop on Programming Languages and Operating Systems. ACM. 2011, p. 4.

[31] Louise A Dennis et al. “Practical Verification of Decision-Making in Agent-Based Autonomous Systems”. In: arXiv preprint arXiv:1310.2431 (2013).

[32] Daniel Dewey. “Long-term strategies for ending existential risk from fast takeoff”. In: (Nov. 2014).url: http://www.danieldewey.net/fast-takeoff-strategies.pdf.

[33] United Nations Institute for Disarmament Research. The Weaponization of Increasingly Autonomous Technologies: Implications for Security and Arms Control. UNIDIR, 2014.

[34] Bonnie Lynn Docherty. Losing Humanity: The Case Against Killer Robots. Human Rights Watch,2012.

[35] Peter Eckersley and Anders Sandberg. “Is Brain Emulation Dangerous?” In: Journal of Artificial General Intelligence 4.3 (2013), pp. 170–194.

[36] Beno Eckmann. “Social choice and topology: a case of pure and applied mathematics”. In: Expositiones Mathematicae 22.4 (2004), pp. 385–393.

[37] Benja Fallenstein and Nate Soares. Vingean Reflection: Reliable Reasoning for Self-Modifying Agents.Tech. rep. Machine Intelligence Research Institute, 2014. url: https://intelligence.org/files/VingeanReflection.pdf.

[38] Kathleen Fisher. “HACMS: high assurance cyber military systems”. In: Proceedings of the 2012 ACM conference on high integrity language technology. ACM. 2012, pp. 51–52.

[39] Carl Frey and Michael Osborne. The future of employment: how susceptible are jobs to computerisation? Working Paper. Oxford Martin School, 2013.

[40] Edward L Glaeser. “Secular joblessness”. In: Secular Stagnation: Facts, Causes and Cures (2014), p. 69.

[41] Irving John Good. “Speculations concerning the first ultraintelligent machine”. In: Advances in computers 6.31 (1965), p. 88.

[42] Katja Grace. Algorithmic Progress in Six Domains. Tech. rep. Machine Intelligence Research Institute, 2013. url: http://intelligence.org/files/AlgorithmicProgress.pdf.


[43] Katja Grace and Paul Christiano. Resolutions of mathematical conjectures. 2014. url: http://www.aiimpacts.org/resolutions-of-mathematical-conjectures.

[44] The Tauri Group. Retrospective Analysis of Technology Forecasting: In-scope Extension. Tech. rep. 2012. url: http://www.dtic.mil/get-tr-doc/pdf?AD=ADA568107.

[45] Tom Gunter et al. “Sampling for inference in probabilistic models with fast Bayesian quadrature”. In: Advances in Neural Information Processing Systems. 2014, pp. 2789–2797.

[46] Joseph Y. Halpern and Rafael Pass. “Game Theory with Translucent Players”. In: CoRR abs/1308.3778 (2013). url: http://arxiv.org/abs/1308.3778.

[47] Joseph Y. Halpern and Rafael Pass. “I Don’t Want to Think About it Now: Decision Theory With Costly Computation”. In: CoRR abs/1106.2657 (2011). url: http://arxiv.org/abs/1106.2657.

[48] Joseph Y Halpern, Rafael Pass, and Lior Seeman. “Decision Theory with Resource-Bounded Agents”. In: Topics in cognitive science 6.2 (2014), pp. 245–257.

[49] Kristian J Hammond, Timothy M Converse, and Joshua W Grass. “The stabilization of environments”. In: Artificial Intelligence 72.1 (1995), pp. 305–327.

[50] Robin Hanson. “Economics of the singularity”. In: Spectrum, IEEE 45.6 (2008), pp. 45–50.

[51] Robin Hanson. “Long-term growth as a sequence of exponential modes”. In: George Mason University. Citeseer. 1998.

[52] Philipp Hennig and Martin Kiefel. “Quasi-Newton methods: A new direction”. In: The Journal of Machine Learning Research 14.1 (2013), pp. 843–865.

[53] Clemens Hetschko, Andreas Knabe, and Ronnie Schöb. “Changing identity: Retiring from unemployment”. In: The Economic Journal 124.575 (2014), pp. 149–166.

[54] Henry Hexmoor, Brian McLaughlan, and Gaurav Tuli. “Natural human role in supervising complex control systems”. In: Journal of Experimental & Theoretical Artificial Intelligence 21.1 (2009), pp. 59–77.

[55] Bill Hibbard. “Avoiding unintended AI behaviors”. In: Artificial General Intelligence. Springer, 2012, pp. 107–116.

[56] Bill Hibbard. Ethical Artificial Intelligence. 2014. url: arxiv.org/abs/1411.1373.

[57] Bill Hibbard. “Self-Modeling Agents and Reward Generator Corruption”. In: AAAI-15 Workshop on AI and Ethics. 2015.

[58] Daniel Hintze. “Problem Class Dominance in Predictive Dilemmas”. Honors Thesis. Arizona State University, 2014.

[59] Eric J Horvitz. “Reasoning about beliefs and actions under computational resource constraints”. In: Third AAAI Workshop on Uncertainty in Artificial Intelligence. 1987, pp. 429–444.

[60] Eric Horvitz. One-Hundred Year Study of Artificial Intelligence: Reflections and Framing. White paper. Stanford University, 2014. url: https://stanford.app.box.com/s/266hrhww2l3gjoy9euar.

[61] Eric Horvitz and Bart Selman. Interim Report from the Panel Chairs. AAAI Presidential Panel on Long-Term AI Futures. 2009.

[62] Marcus Hutter. “A theory of universal artificial intelligence based on algorithmic complexity”. In: arXiv preprint cs/0004001 (2000).

[63] Gerwin Klein et al. “seL4: Formal verification of an OS kernel”. In: Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles. ACM. 2009, pp. 207–220.

[64] Patrick LaVictoire et al. “Program Equilibrium in the Prisoner’s Dilemma via Löb’s Theorem”. In: AAAI Multiagent Interaction without Prior Coordination Workshop. 2014.

[65] Terran D Lane. “Machine learning techniques for the computer security domain of anomaly detection”. PhD thesis. Purdue University, 2000.


[66] Patrick Lin, Keith Abney, and George A Bekey. Robot ethics: the ethical and social implications of robotics. MIT Press, 2011.

[67] Alan K Mackworth. “Agents, bodies, constraints, dynamics, and evolution”. In: AI Magazine 30.1 (2009), p. 7.

[68] James Manyika et al. Big data: The next frontier for innovation, competition, and productivity. Report. McKinsey Global Institute, 2011.

[69] James Manyika et al. Disruptive technologies: Advances that will transform life, business, and the global economy. Vol. 180. McKinsey Global Institute, San Francisco, CA, 2013.

[70] Bruce M McLaren. “Computational models of ethical reasoning: Challenges, initial steps, and future directions”. In: Intelligent Systems, IEEE 21.4 (2006), pp. 29–37.

[71] Donald Michie. “Machines and the theory of intelligence”. In: Nature 241.5391 (1973), pp. 507–512.

[72] Joel Mokyr. “Secular stagnation? Not in your life”. In: Secular Stagnation: Facts, Causes and Cures (2014), p. 83.

[73] James H Moor. “The nature, importance, and difficulty of machine ethics”. In: Intelligent Systems, IEEE 21.4 (2006), pp. 18–21.

[74] Luke Muehlhauser. AGI outcomes and civilizational competence. Oct. 2014. url: https://intelligence.org/2014/10/16/agi-outcomes-civilizational-competence/.

[75] Luke Muehlhauser. The world’s distribution of computation (initial findings). Feb. 2014. url: http://intelligence.org/2014/02/28/the-worlds-distribution-of-computation-initial-findings/.

[76] Luke Muehlhauser. Transparency in Safety-Critical Systems. Aug. 2013. url: http://intelligence.org/2013/08/25/transparency-in-safety-critical-systems/.

[77] Luke Muehlhauser and Ken Hayworth. Ken Hayworth on brain emulation prospects. Sept. 2014. url: http://intelligence.org/2014/09/09/hayworth/.

[78] Luke Muehlhauser and Louie Helm. “The singularity and machine ethics”. In: Singularity Hypotheses. Springer, 2012, pp. 101–126.

[79] Luke Muehlhauser and Jonah Sinick. How well will policy-makers handle AGI? (initial findings). Sept. 2013. url: https://intelligence.org/2013/09/12/how-well-will-policy-makers-handle-agi-initial-findings/.

[80] Luke Muehlhauser and Chris Williamson. Ideal Advisor Theories and Personal CEV. 2013. url: http://intelligence.org/files/IdealAdvisorTheories.pdf.

[81] Vincent C Müller and Nick Bostrom. “Future progress in artificial intelligence: A survey of expert opinion”. In: Fundamental Issues of Artificial Intelligence (2014). Forthcoming. url: http://www.sophia.de/pdf/2014_PT-AI_polls.pdf.

[82] Hirotaka Nakayama, Yeboon Yun, and Min Yoon. Sequential approximate multiobjective optimization using computational intelligence. Springer, 2009.

[83] Andrew Y Ng and Stuart Russell. “Algorithms for Inverse Reinforcement Learning”. In: Proc. 17th International Conference on Machine Learning. 2000.

[84] Nils J Nilsson. “Artificial intelligence, employment, and income”. In: AI Magazine 5.2 (1984), p. 5.

[85] Stephen M Omohundro. “The Basic AI Drives”. In: Artificial General Intelligence 2008: Proceedings of the First AGI Conference. Eds. Pei Wang, Ben Goertzel, and Stan Franklin. Vol. 171. 2008.

[86] Stephen M Omohundro. The nature of self-improving artificial intelligence. Presented at Singularity Summit 2007.

[87] Laurent Orseau and Mark Ring. “Space-Time embedded intelligence”. In: Artificial General Intelligence. Springer, 2012, pp. 209–218.

[88] Raja Parasuraman, Thomas B Sheridan, and Christopher D Wickens. “A model for types and levels of human interaction with automation”. In: Systems, Man and Cybernetics, Part A: Systems and Humans, IEEE Transactions on 30.3 (2000), pp. 286–297.

[89] Luís Moniz Pereira and Ari Saptawijaya. “Modelling morality with prospective logic”. In: Progress in Artificial Intelligence. Springer, 2007, pp. 99–111.

[90] Charles Perrow. Normal Accidents: Living with High-Risk Technologies. New York: Basic Books, 1984.

[91] André Platzer. Logical analysis of hybrid systems: proving theorems for complex dynamics. Springer Publishing Company, Incorporated, 2010.

[92] Associated Press. “Atom-Powered World Absurd, Scientists Told”. In: New York Herald Tribune (September 12, 1933), p. 1.

[93] Probabilistic Numerics. http://probabilistic-numerics.org. Accessed: 27 November 2014.

[94] Matthew J Probst and Sneha Kumar Kasera. “Statistical trust establishment in wireless sensor networks”. In: 2007 International Conference on Parallel and Distributed Systems. Vol. 2. IEEE, 2007, pp. 1–8.

[95] Luca Pulina and Armando Tacchella. “An abstraction-refinement approach to verification of artificial neural networks”. In: Computer Aided Verification. Springer, 2010, pp. 243–257.

[96] Reuters. “Space Travel ‘Utter Bilge’”. In: The Ottawa Citizen (January 3, 1956), p. 1. url: http://news.google.com/newspapers?id=ddgxAAAAIBAJ&sjid=1eMFAAAAIBAJ&pg=3254%2C7126.

[97] Konrad Rieck et al. “Automatic analysis of malware behavior using machine learning”. In: Journal of Computer Security 19.4 (2011), pp. 639–668.

[98] Heather M Roff. “Responsibility, liability, and lethal autonomous robots”. In: Routledge Handbook of Ethics and War: Just War Theory in the 21st Century (2013), p. 352.

[99] Heather M Roff. “The Strategic Robot Problem: Lethal Autonomous Weapons in War”. In: Journal of Military Ethics 13.3 (2014).

[100] Stuart J Russell and Devika Subramanian. “Provably bounded-optimal agents”. In: Journal of Artificial Intelligence Research (1995), pp. 1–36.

[101] Stuart Russell. “Learning agents for uncertain environments”. In: Proceedings of the eleventh annual conference on Computational learning theory. ACM, 1998, pp. 101–103.

[102] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. 3rd ed. Pearson, 2010.

[103] Jordi Sabater and Carles Sierra. “Review on computational trust and reputation models”. In: Artificial Intelligence Review 24.1 (2005), pp. 33–60.

[104] Scott D Sagan. The Limits of Safety: Organizations, Accidents, and Nuclear Weapons. Princeton University Press, 1993.

[105] Anders Sandberg and Nick Bostrom. “Whole brain emulation: A roadmap”. In: Future of Humanity Institute Technical Report 3 (2008).

[106] Johann M Schumann and Yan Liu. Applications of neural networks in high assurance systems. Springer, 2010.

[107] Murray Shanahan. The Technological Singularity. Forthcoming. MIT Press, 2015.

[108] Carl Shulman and Anna Salamon. Risk-averse preferences as an AGI safety technique. Presented at AGI-11. 2011. url: http://intelligence.org/2014/01/31/two-miri-talks-from-agi-11/.

[109] Peter W Singer and Allan Friedman. Cybersecurity: What Everyone Needs to Know. Oxford University Press, 2014.

[110] Nate Soares. Formalizing Two Problems of Realistic World-Models. Tech. rep. Machine Intelligence Research Institute, 2014. url: https://intelligence.org/files/RealisticWorldModels.pdf.

[111] Nate Soares. The Value Learning Problem. Tech. rep. Machine Intelligence Research Institute, 2014. url: https://intelligence.org/files/ValueLearningProblem.pdf.

[112] Nate Soares. The Value Learning Problem. Tech. rep. Machine Intelligence Research Institute, 2015. url: https://intelligence.org/files/ValueLearningProblem.pdf.

[113] Nate Soares and Benja Fallenstein. Aligning Superintelligence with Human Interests: A Technical Research Agenda. Tech. rep. Machine Intelligence Research Institute, 2014. url: http://intelligence.org/files/TechnicalAgenda.pdf.

[114] Nate Soares and Benja Fallenstein. Questions of Reasoning Under Logical Uncertainty. Tech. rep. Machine Intelligence Research Institute, 2014. url: http://intelligence.org/files/QuestionsLogicalUncertainty.pdf.

[115] Nate Soares and Benja Fallenstein. Toward Idealized Decision Theory. Tech. rep. Machine Intelligence Research Institute, 2014. url: https://intelligence.org/files/TowardIdealizedDecisionTheory.pdf.

[116] Nate Soares et al. “Corrigibility”. In: AAAI-15 Workshop on AI and Ethics. 2015.

[117] David Sobel. “Do the desires of rational agents converge?” In: Analysis 59.263 (1999), pp. 137–147.

[118] Kaj Sotala and Roman V Yampolskiy. “Responses to catastrophic AGI risk: a survey”. In: Physica Scripta 90.1 (2015), p. 018001.

[119] Diana F Spears. “Assuring the behavior of adaptive agents”. In: Agent technology from a formal perspective. Springer, 2006, pp. 227–257.

[120] John P Sullins. “Introduction: Open questions in roboethics”. In: Philosophy & Technology 24.3 (2011), pp. 233–238.

[121] Christian Szegedy et al. “Intriguing properties of neural networks”. In: CoRR abs/1312.6199 (2013). url: http://arxiv.org/abs/1312.6199.

[122] Brian J. Taylor, ed. Methods and Procedures for the Verification and Validation of Artificial Neural Networks. Springer, 2006.

[123] Max Tegmark. “Friendly Artificial Intelligence: the Physics Challenge”. In: AAAI-15 Workshop on AI and Ethics. 2015. url: http://arxiv.org/pdf/1409.0813.pdf.

[124] Moshe Tennenholtz. “Program equilibrium”. In: Games and Economic Behavior 49.2 (2004), pp. 363–373.

[125] The Scientists’ Call To Ban Autonomous Lethal Robots. International Committee for Robot Arms Control. Accessed January 2015. url: http://icrac.net/call/.

[126] Vernor Vinge. “The coming technological singularity”. In: Whole Earth Review 81 (1993), pp. 88–95.

[127] David C Vladeck. “Machines without Principals: Liability Rules and Artificial Intelligence”. In: Wash. L. Rev. 89 (2014), p. 117.

[128] Wendell Wallach and Colin Allen. Moral machines: Teaching robots right from wrong. Oxford University Press, 2008.

[129] N. Weaver. “Paradoxes of rational agency and formal systems that verify their own soundness”. In:ArXiv e-prints (Dec. 2013). arXiv: 1312.3626 [math.LO].

[130] Daniel Weld and Oren Etzioni. “The first law of robotics (a call to arms)”. In: AAAI. Vol. 94. 1994, pp. 1042–1047.

[131] Alan FT Winfield, Christian Blum, and Wenguo Liu. “Towards an Ethical Robot: Internal Models, Consequences and Ethical Action Selection”. In: Advances in Autonomous Robotics Systems. Springer, 2014, pp. 85–96.

[132] AD Wissner-Gross and CE Freer. “Causal entropic forces”. In: Physical Review Letters 110.16 (2013), p. 168702.

[133] Roman V. Yampolskiy. “The Universe of Minds”. In: CoRR abs/1410.0369 (2014). url: http://arxiv.org/abs/1410.0369.

[134] Roman V. Yampolskiy. “Leakproofing the Singularity: Artificial Intelligence Confinement Problem”. In: Journal of Consciousness Studies 19.1-2 (2012), pp. 194–214.

[135] Eliezer Yudkowsky. “Artificial intelligence as a positive and negative factor in global risk”. In: Global Catastrophic Risks 1 (2008), p. 303.

[136] Eliezer Yudkowsky. Creating Friendly AI 1.0: The Analysis and Design of Benevolent Goal Architectures. 2001. url: http://intelligence.org/files/CFAI.pdf.

[137] Eliezer Yudkowsky. Intelligence Explosion Microeconomics. Tech. rep. Machine Intelligence Research Institute, 2013.
