

Neural-Symbolic Learning and Reasoning: A Survey and Interpretation

Tarek R. Besold [email protected]
Department of Computer Science, City, University of London

Artur d’Avila Garcez [email protected]
Department of Computer Science, City, University of London

Sebastian Bader [email protected]
Department of Computer Science, University of Rostock

Howard Bowman [email protected]
School of Computing, University of Kent

Pedro Domingos [email protected]
Department of Computer Science & Engineering, University of Washington

Pascal Hitzler [email protected]
Department of Computer Science & Engineering, Wright State University

Kai-Uwe Kühnberger [email protected]
Institute of Cognitive Science, University of Osnabrück

Luis C. Lamb [email protected]
Instituto de Informatica, Universidade Federal do Rio Grande do Sul

Daniel Lowd [email protected]
Department of Computer and Information Science, University of Oregon

Priscila Machado Vieira Lima [email protected]
NCE, Universidade Federal do Rio de Janeiro

Leo de Penning [email protected]
Illuminoo B.V.

Gadi Pinkas [email protected]
Center for Academic Studies and Gonda Brain Research Center, Bar-Ilan University, Israel

Hoifung Poon [email protected]
Microsoft Research

Gerson Zaverucha [email protected]

COPPE, Universidade Federal do Rio de Janeiro

Abstract

The study and understanding of human behaviour is relevant to computer science, artificial intelligence, neural computation, cognitive science, philosophy, psychology, and several other areas. Presupposing cognition as the basis of behaviour, among the most prominent tools in the modelling of behaviour are computational-logic systems, connectionist models of cognition, and models of uncertainty. Recent studies in cognitive science, artificial intelligence, and psychology have produced a number of cognitive models of reasoning, learning, and language that are underpinned by computation. In addition, efforts in computer science research have led to the development of cognitive computational systems integrating machine learning and automated reasoning. Such systems have shown promise in a range of applications, including computational biology, fault diagnosis, training and assessment in simulators, and software verification. This joint survey reviews the personal ideas and views of several researchers on neural-symbolic learning and reasoning. The article is organised in three parts: firstly, we frame the scope and goals of neural-symbolic computation and have a look at the theoretical foundations. We then proceed to describe the realisations of neural-symbolic computation, systems, and applications. Finally, we present the challenges facing the area and avenues for further research.

1. Overview

The study of human behaviour is an important part of computer science, artificial intelligence (AI), neural computation, cognitive science, philosophy, psychology, and other areas. Presupposing that behaviour is generally determined and guided by cognition and mental processing, among the most prominent tools in the modelling of behaviour are computational-logic systems mostly addressing high-level reasoning and thought processes (classical logic, nonmonotonic logic, modal and temporal logic), connectionist models of cognition and the brain mostly addressing lower-level dynamics and emergent processes (feedforward and recurrent networks, symmetric and deep networks, self-organising networks), and models of uncertainty addressing the often vague or probabilistic nature of many aspects of cognitive processing (Bayesian networks, Markov decision processes, Markov logic networks, probabilistic inductive logic programs).

Recent studies in cognitive science, artificial intelligence, and psychology have produced a number of cognitive models of reasoning, learning, and language that are underpinned by computation (Pinker, Nowak, & Lee, 2008; Shastri, 2007; Sun, 2009). In addition, recent efforts in computer science have led to the development of cognitive computational systems integrating machine learning and automated reasoning (Garcez, Broda, & Gabbay, 2002; Garcez, Lamb, & Gabbay, 2009; Valiant, 2000). Such systems have shown promise in a range of applications, including fault diagnosis, computational biology, training and assessment in simulators, and software verification (de Penning, d’Avila Garcez, Lamb, & Meyer, 2010, 2011).

Forestalling the presentation of the theoretical foundations in Section 2, the intuition motivating neural-symbolic integration as an active field of research is the following: in neural computing, it is assumed that the mind is an emergent property of the brain, and that computational cognitive modelling can lead to valid theories of cognition and offer an understanding of certain cognitive processes (Sun, 2009). From this it is in turn assumed that connectionism should be able to offer an appropriate representational language for artificial intelligence as well. In particular, a connectionist computational theory of the mind should be able to replicate the parallelism and the kinds of adaptive learning processes seen in neural networks, which are generally accepted as responsible for the necessary robustness and ultimate effectiveness of the system in dealing with commonsense knowledge. As a result, a purely symbolic approach would not be sufficient, as argued by Valiant (2008).

On the other hand, logic is firmly established as a fundamental tool in the modelling of thought and behaviour (Kowalski, 2011; Pereira, 2012) and has widely been viewed as the “calculus of computer science”. In this context, nonclassical logics often play an important role as well: temporal logic, for instance, has had significant impact in both academia and industry (Pnueli, 1977), and different modal logics have become a lingua franca for, among others, the specification and analysis of knowledge and communication in multi-agent and distributed systems (Fagin, Halpern, Moses, & Vardi, 1995). Research on practical reasoning in AI has been dominated by nonmonotonic formalisms. Intuitionistic logic can provide an adequate logical foundation for several core areas of theoretical computer science, including type theory and functional programming (Van Dalen, 2002). Finally, description logics—which are similar to Kripke models—have been instrumental in the study of the semantic web (Baader, Calvanese, McGuinness, Nardi, & Patel-Schneider, 2003).

However, when building models that combine learning and reasoning, one has to conciliate the methodologies of distinct areas—namely predominantly statistics and logic—in order to combine the respective advantages and circumvent the shortcomings and limitations. For instance, the methodology of neural-symbolic systems aims to transfer principles and mechanisms between (often nonclassical) logic-based computation and neural computation. In particular, it considers how principles of symbolic computation can be implemented by connectionist mechanisms and how subsymbolic computation can be described and analysed in logical terms. Here, connectionism provides the hardware upon which different levels of abstraction can be built according to the needs of the application. This methodology—looking at principles, mechanisms, and applications—has proven a fruitful way of progressing the research in the area of neural-symbolic integration for more than two decades now, as evidenced by, for instance, the results summarised in the overview by Bader and Hitzler (2005), collected in the books by Hammer and Hitzler (2007) and Garcez et al. (2009), and reported in the present survey.

For example, in (Garcez et al., 2009) the described approach has led to a prototypical connectionist system for nonclassical reasoning in an attempt to find an adequate balance between complexity and expressiveness. In this framework—known as a neural-symbolic system—artificial neural networks (ANNs) provide the machinery for parallel computation and robust learning, while logic provides the necessary explanation for the network models, facilitating the necessary interaction with the world and other systems. In the integrated model, no conflict arises between a continuous and a discrete component of the system. Instead, a tightly-coupled hybrid system exists that is continuous by nature (the ANN), but that has a clear discrete interpretation (its logic) at various levels of abstraction.

From a more practical perspective, rational agents are often conceptualised as performing concept acquisition (generally unsupervised and statistical) and concept manipulation (generally supervised and symbolic) as part of a permanent cycle of perception and action. The question of how to reconcile the statistical nature of learning with the logical nature of reasoning, aiming to build such robust computational models integrating concept acquisition and manipulation, has been identified as a key research challenge and fundamental problem in computer science (Valiant, 2003). Against this backdrop, we see neural-symbolic integration as a way of addressing the stated challenge through the mechanisms of knowledge translation and knowledge extraction between symbolic logic systems and subsymbolic networks.

There are also important applications of neural-symbolic integration with high relevance for industry. The merging of theory (known as background knowledge in machine learning) and data learning (i.e. learning from examples) in ANNs has been shown to be more effective than purely symbolic or purely connectionist systems, especially in the case of real-world, noisy, unstructured data (de Penning et al., 2010, 2011; Towell & Shavlik, 1994). Here, successfully addressed application scenarios include business process modelling, service-oriented computing (trust management and fraud prevention in e-commerce), synchronisation and coordination in large multi-agent systems, and multimodal processing and integration.

In multimodal processing, for example, there are several forms of reasoning: a scene classification can be achieved by the well-trained network giving an immediate answer following a number of assumptions. A change in the scene, however, may require more specific temporal, nonmonotonic reasoning and learning from data (based, for example, on the amount of change in the scene). Some assumptions may need to be revised, information from an image annotation may provide a different context, abduction and similarity reasoning by intersecting network ensembles may be needed, probability distributions may have to be reasoned about, and so on. The integrated system will need to respond quickly, revise its answers in the presence of new information, and control the inevitable accumulation of errors derived from real-world data (i.e. prove its robustness). This provides an excellent opportunity for the application of neural-symbolic systems.

The remainder of the paper is structured as follows. In Section 2, we describe the principles of neural-symbolic computation, revisit the theoretical underpinnings of the endeavour, and give a prototypical example of how one can combine learning and reasoning in an integrated fashion. In Section 3 we illustrate the application of the methodology using NSCA, a neural-symbolic agent endowed with learning and reasoning capabilities, as a first detailed example. Section 4 relates concepts underpinning the theories of mind in psychology and cognitive science and their counterparts in neural-symbolic computation, before Section 5 outlines work addressing the binding problem (introduced in the preceding section), thus enabling first-order inference computed by neural-symbolic systems. Section 6 then highlights the more technical foundations of first-order (predicate) logic learning in connectionist systems, followed in Section 7 by an introduction to Markov logic and corresponding networks as a combination between logic and graphical models. This leads into the conceptual considerations given in Section 8, which relate neural-symbolic computation to recent developments in AI and the reinvigorated interest in (re-)creating human-level capacities with artificial systems, notably recently, for instance, in the area of language modelling and processing. Finally, Section 9 summarises selected ongoing and widely recognised approaches to solving core questions of neural-symbolic integration arising from neighbouring research efforts and disciplines, before Sections 10 and 11 present suggested directions for further research and conclude the survey.

2. Prolegomena of Neural-Symbolic Computation

The goals of neural-symbolic computation are to provide a coherent, unifying view for logic and connectionism, to contribute to the modelling and understanding of cognition and, thereby, behaviour, and to produce better computational tools for integrated machine learning and reasoning. To this end, logic and network models are studied together as integrated models of computation. Typically, translation algorithms from a symbolic to a connectionist representation and vice versa are employed to provide either (i) a neural implementation of a logic, (ii) a logical characterisation of a neural system, or (iii) a hybrid learning system that brings together features from connectionism and symbolic artificial intelligence.
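To make case (i) concrete, the following is a minimal, hedged sketch in the spirit of CILP/KBANN-style translations (not a reproduction of any specific published algorithm): a propositional logic program is compiled into a threshold network whose single forward pass computes the immediate-consequence operator T_P. The example program, the weight W, and the thresholds are illustrative assumptions.

import numpy as np

rules = [("B", [], []),            # fact: B
         ("A", ["B"], ["C"]),      # A <- B, not C
         ("E", ["A", "B"], [])]    # E <- A, B

atoms = sorted({a for head, pos, neg in rules for a in [head] + pos + neg})
idx = {a: i for i, a in enumerate(atoms)}
W = 1.0  # any positive weight works with step activations

def rule_neuron(v, positives, negatives):
    # step neuron: fires iff every positive antecedent is on and every negated one off
    s = W * sum(v[idx[a]] for a in positives) - W * sum(v[idx[a]] for a in negatives)
    return 1.0 if s >= W * (len(positives) - 0.5) else 0.0

def tp_step(v):
    # one forward pass computes the immediate-consequence operator T_P;
    # each output neuron acts as an OR over the rule neurons sharing its head
    out = np.zeros(len(atoms))
    for head, pos, neg in rules:
        if rule_neuron(v, pos, neg):
            out[idx[head]] = 1.0
    return out

v = np.zeros(len(atoms))
for _ in range(len(atoms) + 1):   # iterate to a fixed point (this example program is acyclic)
    v = tp_step(v)
print({a: int(v[idx[a]]) for a in atoms})   # -> {'A': 1, 'B': 1, 'C': 0, 'E': 1}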

From a theoretical perspective, these efforts appear well-founded. According to our current knowledge and understanding, both symbolic/cognitive and subsymbolic/neural models—especially when focusing on physically-realisable and implementable systems (i.e. physical finite state machines) rather than strictly abstract models of computation, together with the resulting physical and conceptual limitations—seem formally equivalent in a very basic sense: notwithstanding partially differing theoretical arguments such as given by (Tabor, 2009), both paradigms are considered in practice equivalent concerning computability (Siegelmann, 1999). Also from a tractability perspective, for instance in (van Rooij, 2008), equivalence in practice with respect to classical dimensions of analysis (i.e. interchangeability except for a polynomial overhead) has been established, complementing and supporting the prior theoretical suggestion of equivalence in the widely accepted Invariance Thesis of (van Emde Boas, 1990). Finally, (Leitgeb, 2005) provided an in-principle existence result, showing that there is no substantial difference in representational or problem-solving power between dynamical systems with distributed representations and symbolic systems with non-monotonic reasoning capabilities.

But while these findings provide a solid foundation for attempts at closing the gap between connectionism and logic, many questions nonetheless remain unanswered, especially when crossing over from the realm of theoretical research to implementation and application, among others switching from compositional symbols denoting an idealised reality to virtually real-valued vectors obtained from sensors in the real world: although introducing basic connections and mutual dependencies between the symbolic and the subsymbolic paradigm, the levels of analysis are quite coarse and almost all results are only existential in character. For instance, while establishing the in-principle equivalence described above, (Leitgeb, 2005) does not provide constructive methods for how to actually obtain the corresponding symbolic counterpart to a subsymbolic model and vice versa.

Still, over the last decades several attempts have been made at developing a general neural-symbolic framework, usually trying to apply the most popular methods of their respective time—such as, currently, modular deep networks. Growing attention has recently been given to deep networks, where it is hoped that high-level abstract representations will emerge from low-level unprocessed data (Hinton, Osindero, & Teh, 2006). Most modern neural-symbolic systems use feedforward and recurrent networks, but seminal work in the area used symmetric networks (Pinkas, 1991b) of the kind applied in deep learning, and recent work starts to address real applications of symmetric neural-symbolic networks (de Penning et al., 2010). There, in general, each level of a neural-symbolic system represents the knowledge evolution of multiple agents over time. Each agent is represented by a network in this level encoding commonsense (nonmonotonic) knowledge and preferences. The networks/agents at different levels can be combined upwards to represent relational knowledge and downwards to create specialisations, following what is known as a network-fibring methodology (Garcez & Gabbay, 2004).

Fibring—which will continue to serve as a general example throughout the remainder of this section—offers a principled way of combining networks and can be seen as one of the general methodologies of neural-symbolic integration. The main idea of network fibring is simple: fibred networks may be composed of interconnected neurones, as usual, but also of other networks, forming a recursive structure. A fibring function defines how this network architecture behaves; it defines how the networks should relate to each other. Typically, the fibring function will allow the activation of neurones in one network A to influence the change of weights in another network B. Intuitively, this may be seen as training network B at the same time that network A is running. Albeit a combination of simple and standard ANNs, fibred networks can approximate any polynomial function in an unbounded domain, thus being more expressive than standard feedforward networks.
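As an illustration only, the following sketch shows one possible fibring function in a few lines of Python; the specific choice of modulation (scaling B's weights by A's mean activation), the learning-rate parameter eta, and all sizes are invented for the example, and published fibring functions can be considerably more elaborate.

import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

W_A = rng.normal(size=(3, 2))   # network A: 3 inputs -> 2 outputs
W_B = rng.normal(size=(2, 1))   # network B: 2 inputs -> 1 output

def fibring_function(a_act, W_b, eta=0.5):
    # one simple choice: scale B's weights by A's mean activation
    return W_b * (1.0 + eta * a_act.mean())

x_A = np.array([1.0, 0.0, 1.0])
x_B = np.array([0.5, -0.5])

a_act = sigmoid(x_A @ W_A)          # run network A
W_B = fibring_function(a_act, W_B)  # A's activation changes B's weights ("training B while A runs")
b_out = sigmoid(x_B @ W_B)          # then run network B with the modulated weights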

Fibring is just one example of how principles from symbolic computation (in this case, recursion) can be used by connectionism to advance the research in this area. In the remainder of this section, we discuss in more detail—also by way of a manifesto summarising our view(s) on neural-symbolic integration—the principles, mechanisms, and applications that drive the research in neural-symbolic integration.

2.1 Principles of Neural-Symbolic Integration

From the beginning of connectionism (McCulloch & Pitts, 1943)—arguably the first neural-symbolic system for Boolean logic—most neural-symbolic systems have focused on representing, computing, and learning languages other than classical propositional logic (Browne & Sun, 2001; Cloete & Zurada, 2000; Garcez et al., 2002, 2009; Hölldobler & Kalinke, 1994; Shastri, 2007), with much effort being devoted to representing fragments of classical first-order logic. In (Garcez et al., 2009), a new approach to knowledge representation and reasoning has been proposed, establishing connectionist nonclassical logic (including connectionist modal, intuitionistic, temporal, nonmonotonic, epistemic, and relational logic). More recently, it has been shown that argumentation frameworks, abductive reasoning, and normative multi-agent systems can also be represented by the same network framework. This is encouraging to the extent that a variety of forms of reasoning can be realised by the same, simple network structure that specialises in different ways.
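The McCulloch-Pitts reading mentioned above is easy to make concrete: a single threshold unit suffices for each Boolean connective. The following tiny Python sketch, with weights and thresholds chosen by hand, shows one standard choice.

def mp_unit(inputs, weights, theta):
    # a McCulloch-Pitts neurone: fire iff the weighted input sum reaches the threshold
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= theta else 0

AND = lambda a, b: mp_unit([a, b], [1, 1], 2)
OR  = lambda a, b: mp_unit([a, b], [1, 1], 1)
NOT = lambda a:    mp_unit([a],    [-1],   0)

assert AND(1, 1) == 1 and AND(1, 0) == 0
assert OR(0, 1)  == 1 and NOT(1) == 0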

A key characteristic of many neural-symbolic systems is modularity. One way of building neural-symbolic networks is through the careful engineering of network ensembles, where modularity then serves an important role for comprehensibility and maintenance. Each network in the ensemble can be responsible for a specific task or logic, with the overall model being potentially very expressive despite its relatively simple components. Still, although fairly common, modularity is not a strict necessity: alternatively, as described in Section 5, one can start with an unstructured (i.e. not modular) network and let weight changes shape its ability to process symbolic or subsymbolic representations.

Another common organisational property of neural-symbolic networks is—similar to deep networks—their generally hierarchical organisation. The lowest-level network takes raw data as input and produces a model of the dataset. The next-level network takes the first network’s output as its input and produces some higher-level representation of the information in the data. The next-level network then further increases the level of abstraction of the model, and so on, until some high-level representation can be learned. The idea is that such networks might be trained independently, possibly also combining unsupervised and supervised learning at different levels of the hierarchy. The resulting parallel model of computation can be very powerful, as it offers the extra expressiveness required by complex applications at comparatively low computational costs (Garcez et al., 2009).
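A hedged sketch of this hierarchical organisation, with purely illustrative layer sizes and random untrained weights, could look as follows; in a real system each level would be trained (possibly independently) rather than left random.

import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

# illustrative sizes only: raw data -> features -> higher-level representation
levels = [rng.normal(size=(16, 8)), rng.normal(size=(8, 4)), rng.normal(size=(4, 2))]

def forward(raw, levels):
    h = raw
    for W in levels:            # each level consumes the previous level's output;
        h = sigmoid(h @ W)      # e.g. unsupervised lower levels, supervised upper ones
    return h

abstract_repr = forward(rng.normal(size=16), levels)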

2.2 Mechanisms of Neural-Symbolic Integration

We subscribe to the view that representation initially precedes learning. Neural-symbolic networks can represent a range of expressive logics and implement certain important principles of symbolic computation. However, neural-symbolic computation is not just about representation. The mechanisms of propagation of activation and other message passing methods, gradient descent and other learning algorithms, reasoning about uncertainty, massive parallelism, fault tolerance, etc. are a crucial part of neural-symbolic integration. Put simply, neural-symbolic networks are efficient computational models, not representational tools. It is the mechanisms in place, in the form of efficient algorithms, that enable the computational feasibility of neural-symbolic systems.

Returning to the example of fibring: fibred networks, too, are computational models, not just graphical models or mathematical abstractions like graphs or networks generally. Neural-symbolic networks can be mapped directly onto hardware, which promises, for instance, an implementation in a very-large-scale integration (VLSI) chip to be straightforward and cost-effective. The main architectural constraint, which here is brain-inspired, is that neural-symbolic systems should replicate and specialise simple neuronal structures to which a single algorithm can be applied efficiently at different levels of abstraction, with the resulting system being capable of exhibiting emergent behaviour.

It is precisely this emergence, as a common characterising feature of connectionist approaches, which prompts the need for mechanisms of knowledge extraction. The cycle of neural-symbolic integration therefore includes (i) translation of symbolic (background) knowledge into the network, (ii) learning of additional knowledge from examples (and generalisation) by the network, (iii) executing the network (i.e. reasoning), and (iv) symbolic knowledge extraction from the network. Extraction provides explanation, and facilitates maintenance and incremental or transfer learning.

In a general neural-symbolic system, a network ensemble A (representing, for example, a temporal theory) can be combined with another network ensemble B (representing, for example, an agent’s epistemic state). Again using fibring as an exemplary mechanism, meta-level knowledge in one network can be integrated with object-level knowledge in another network. For example, one may reason (in the meta-level) about the actions that need to be taken (in the object-level) to solve inconsistencies in a database. Relational knowledge can also be represented in the same way, with relations between concepts encoded in distinct (object-level) networks potentially being represented and learned through a meta-level network. More concretely, if two networks denote concepts P(X,Y) and Q(Z) containing variables X, Y, and Z, respectively, a meta-level network can be used to map a representation of P and Q onto a new concept, say R(X,Y,Z), such that, for example, the relation P(X,Y) ∧ Q(Z) → R(X,Y,Z) is valid (Garcez et al., 2009).

Figure 1 illustrates a prototypical neural-symbolic system. The model, in its most general form, allows a number of network ensembles to be combined at different levels of abstraction through, for instance, fibring. In the figure, each level is represented by a network ensemble in a horizontal plane, while network fibring takes place vertically among networks at different ensembles. Specialisation occurs downwards when a neurone is fibred onto a network. Relational knowledge is represented upwards when multiple networks are combined onto a meta-level network. Knowledge evolution through time occurs at each level, as do alternative outcomes, and nonmonotonic and epistemic reasoning for multiple, interacting agents. Modular learning takes place inside each network, but is also applicable across multiple networks in the ensemble. The same brain-inspired structure is replicated throughout the model so that a single algorithm is applied at each level and across levels.

[Figure 1: General conceptual overview of a neural-symbolic system (Garcez et al., 2009). The figure arranges network ensembles along the dimensions Agents, Time, and Possible Worlds, stacked across Levels of Abstraction and connected by Fibring.]

2.3 Applications of Neural-Symbolic Integration

As so often, it will ultimately be through successful implementations and real-world applications that the usefulness and importance of neural-symbolic integration will be widely acknowledged. Practical applications are a crucial ingredient and have been a permanent feature of the research on neural-symbolic computation. On the theory side—in parallel to the quest for understanding the interplay and integration between connectionism and symbolism as the basis of cognition and intelligence—we are interested in finding the limits of representation and developing better machine learning methods which may open up new applications for consideration. In practice, already at the current state of the research, real applications are possible in areas with societal relevance and/or potentially high economic impact, like bioinformatics, the semantic web, fault diagnosis, robotics, software systems verification, business process modelling, fraud prevention, multimodal processing, noisy text analytics, and training and assessment in simulators.

All these different application areas have something in common: a computational system is required that is capable of learning from experience and which may reason about what has been learned (Browne & Sun, 2001; Valiant, 2003). For this learning-reasoning process to be successful, the system must be robust (in a way such that the accumulation of errors resulting from the intrinsic uncertainty associated with the problem domain can be controlled (Valiant, 2003)). The method used by the research on neural-symbolic integration to enable some of the above applications has been (i) to apply translation algorithms between logic and networks (making use of the associated equivalence proofs), (ii) to study the systems empirically through case studies (following the practical motivation from statistical machine learning and neural computation), and (iii) to focus on the needs of the application (noting that some potentially interesting applications require just rudimentary logical representations, e.g. (Becker, Mackay, & Dillaway, 2009)). An example of a neural-symbolic system that is already providing a contribution of this type to problems in bioinformatics and fault diagnosis is the Connectionist Inductive Learning and Logic Programming (CILP) system (Garcez & Zaverucha, 1999; Garcez et al., 2002). Similarly, the NSCA system discussed in Section 3 has successfully been applied, for instance, to behaviour modelling in training simulators.

2.4 Neural-Symbolic Integration in a Nutshell

In summary, neural-symbolic computation encompasses the high-level integration of cognitive abilities (including induction, deduction, and abduction) and the study of how brains make mental models (Thagard & Stewart, 2011), among others also covering the modelling of emotions and attention/utility. At the computational level, it addresses the study of the integration of logic, probabilities, and learning, and the development of new models of computation combining robust learning and efficient reasoning. With regard to applications, successes have been achieved in various domains, including simulation, bioinformatics, fault diagnosis, software engineering, model checking, visual information processing, and fraud prevention.

In the following section, in order to further familiarise the reader with the overall mindset, we will have a detailed look at one such system—namely a neural-symbolic cognitive agent architecture called NSCA—aligning the presentation with the three categories of principles, mechanisms, and applications.


3. NSCA as Application Example for Neural-Symbolic Computing

It is commonly agreed that an effective integration of automated learning and cognitive reasoning in real-world applications is a difficult task (Valiant, 2003). Usually, most applications deal with large amounts of data observed in the real world containing errors, missing values, and inconsistencies. Even in controlled environments, like training simulators, integrated learning and reasoning is not very successful (Sandercock, 2004; Heuvelink, 2009). Although the use of simulated environments simplifies the data and knowledge acquisition, it is still very difficult to construct a cognitive model of an (intelligent) agent that is able to deal with the many complex relations in the observed data. For example, when it comes to the assessment and training of high-level complex cognitive abilities (e.g. leadership, tactical manoeuvring, safe driving, etc.), training is still guided or done by human experts (van den Bosch, Riemersma, Schifflett, Elliott, Salas, & Coovert, 2004). The reason is that expert behaviour on high-level cognition is too complex to model, elicit, and represent in an automated system. Among others, there can be many temporal relations between low- and high-order aspects of a training task, human behaviour is often non-deterministic and subjective (i.e. biased by personal experience and other factors like stress or fatigue), and what is known is often described vaguely and limited to explicit (i.e. “explainable”) behaviour.

Several attempts have been made to tackle these problems. For instance, (Fernlund, Gonzalez, Georgiopoulos, & DeMara, 2006) describes a number of systems that use machine learning to learn the complex relations from observations of experts and trainees during task execution. Although these systems are successful at learning and generalisation, they lack the expressive power of symbolic systems and are therefore difficult to interpret and validate (Smith & Kosslyn, 2006). Alternatively, one could add probabilistic reasoning to logic-based systems (Heuvelink, 2009). These systems perform better in expressing their internal knowledge as they use explicit symbolic representations, and they are able to deal with many of the most common types of inconsistencies in the data by reasoning with probabilities. Unfortunately, when it comes to knowledge representation and modelling, these systems still require either statistical analyses of large amounts of data or knowledge representation by hand. Therefore, both approaches are time expensive and not appropriate for use in real-time applications, which demand online learning and reasoning.

3.1 Principles of Neural-Symbolic Integration Exemplified

The construction of effective cognitive agent models is a long-standing research endeavour in artificial intelligence, cognitive science, and multi-agent systems (Valiant, 2003; Wooldridge, 2009). One of the main challenges toward achieving such models is the provision of integrated cognitive abilities, such as learning, reasoning, and knowledge representation. Neural-symbolic systems seek to do so within the neural computation paradigm, integrating inductive learning and deductive reasoning (cf. (Garcez et al., 2009; Lehmann, Bader, & Hitzler, 2010) for examples). In such models, ANNs are used to learn and reason about (an agent’s) knowledge about the world, represented by symbolic logic. In order to do so, algorithms map logical theories (or knowledge about the world) T into a network N which computes the logical consequences of T. This also provides a learning system in the network that can be trained by examples using T as background knowledge. In agents endowed with neural computation, induction is typically seen as the process of changing the weights of a network in ways that reflect the statistical properties of a dataset, allowing for generalisations over unseen examples. In the same setting, deduction is the neural computation of output values as a response to input vectors (encoding stimuli from the environment) given a particular set of weights. Such network computations have been shown equivalent to a range of temporal logic formalisms (Lamb, Borges, & Garcez, 2007). Based on this approach, an agent architecture called the Neural Symbolic Cognitive Agent (NSCA) has been proposed in (de Penning et al., 2011). NSCA uses temporal logic as theory T and a Restricted Boltzmann Machine (RBM) as ANN N. An RBM is a partially connected ANN with two layers, a visible layer V and a hidden layer H, and symmetric connections W between these layers (Smolensky, 1986).

An RBM defines a probability distribution P(V = v, H = h) over pairs of vectors v and h encoded in these layers, where v encodes the input data in binary or real values and h encodes the posterior probability P(H | v). Such a network can be used to infer or reconstruct complete data vectors based on incomplete or inconsistent input data, and can therefore implement an auto-associative memory or an autoencoder. It does so by combining the posterior probability distributions generated by each unit in the hidden layer with a conditional probability distribution for each unit in the visible layer. Each hidden unit constrains a different subset of the dimensions in the high-dimensional data presented at the visible layer and is therefore called an expert on some feature in the input data. Together, the hidden units can form a “Product of Experts” model that constrains all the dimensions in the input data.
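As a rough illustration of these two passes (hidden posterior, then visible reconstruction), consider the following minimal numpy sketch for binary units; the sizes, zero biases, random weights, and single Gibbs step are simplifying assumptions, not NSCA's actual configuration.

import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n_v, n_h = 6, 3
W = rng.normal(scale=0.1, size=(n_v, n_h))   # symmetric visible-hidden weights
b_v, b_h = np.zeros(n_v), np.zeros(n_h)

def posterior_h(v):
    # P(H_j = 1 | v): each hidden "expert" constrains some feature of v
    return sigmoid(v @ W + b_h)

def reconstruct(v):
    # sample the hidden experts, then infer the visible layer from them
    h = (posterior_h(v) > rng.random(n_h)).astype(float)
    return sigmoid(W @ h + b_v)   # P(V_i = 1 | h)

v_partial = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 0.0])  # incomplete observation
v_completed = reconstruct(v_partial)                  # auto-associative completion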

NSCA is capable of (i) learning complex temporal relations from uncertain observations, (ii) reasoning probabilistically about the knowledge that has been learned, and (iii) representing the agent’s knowledge in a logic-based format for validation purposes. This is achieved by taking advantage of neural learning to perform robust learning and adaptation, and of symbolic knowledge to represent qualitative reasoning. NSCA was validated in a training simulator employed in real-world scenarios, illustrating the effective use of the approach. The results show that the agent model is able to learn to perform automated driver assessment from observation of real-time simulation data and assessments by driving instructors, and that this knowledge can be extracted in the form of temporal logic rules (de Penning et al., 2011).

3.2 Mechanisms of Neural-Symbolic Integration Exemplified

Using a Recurrent Temporal Restricted Boltzmann Machine (RTRBM) (Sutskever, Hinton, & Taylor, 2009) as a specialised variant of the general RBM concept, NSCA encodes temporal rules in the form of hypotheses about beliefs and previously-applied rules. This is possible due to recurrent connections between the hidden unit activations at time t and the activations at time t−1 in the RTRBM. Based on the Bayesian inference mechanism of the RTRBM, each hidden unit H_j represents a hypothesis about a specific rule R_j that calculates the posterior probability that the rule implies a certain relation in the beliefs b being observed in the visible layer V, given the previously applied rules r_{t−1} (i.e. P(R | B = b, R_{t−1} = r_{t−1})). From these hypotheses the RTRBM selects the most applicable rules r using random Gaussian sampling of the posterior probability distribution (i.e. r ∝ P(R | B = b, R_{t−1} = r_{t−1})) and calculates the conditional probability, or likelihood, of all beliefs given that the selected rules are applied (i.e. P(B | R = r)). The difference between the observed and inferred beliefs can be used by NSCA to train the RTRBM (i.e. update its weights) in order to improve the hypotheses about the observed data. For training, the RTRBM uses a combination of Contrastive Divergence and backpropagation through time.
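The following is a much-simplified sketch of one such inference step; for readability it uses Bernoulli sampling of the rule posterior rather than the Gaussian sampling described above, and all dimensions and weights are invented. It is meant only to show the information flow P(R | B, R_{t−1}) → r → P(B | R = r) and the resulting training signal.

import numpy as np

rng = np.random.default_rng(3)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

n_beliefs, n_rules = 5, 4
W  = rng.normal(scale=0.1, size=(n_beliefs, n_rules))  # beliefs <-> rule hypotheses
Wr = rng.normal(scale=0.1, size=(n_rules, n_rules))    # recurrent: r_{t-1} -> r_t

def step(b, r_prev):
    p_rules = sigmoid(b @ W + r_prev @ Wr)              # posterior over rule hypotheses
    r = (p_rules > rng.random(n_rules)).astype(float)   # sample the applicable rules
    b_inferred = sigmoid(W @ r)                         # likelihood of the beliefs
    return r, b_inferred

b = np.array([1.0, 0.0, 1.0, 0.5, 0.0])                # observed beliefs at time t
r, b_hat = step(b, np.zeros(n_rules))
training_signal = b - b_hat   # difference between observed and inferred beliefs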

In the spirit of Bratman’s (1999) Belief, Desire, Intention (BDI) agents, the observed data (e.g. simulation data or human assessments) are encoded as beliefs, and the differences between the observed and inferred beliefs are the actual implications or intentions of the agent on its environment (e.g. adapting the assessment scores in the training simulator). The value of a belief represents either the probability of the occurrence of some event or state in the environment (e.g. Raining = true) or a real value (e.g. Speed = 31.5). In other words, NSCA deals with both binary and continuous data, for instance by using a continuous stochastic visible layer (Chen & Murray, 2003). This improves the agent’s ability to model asymmetric data, which in turn is very useful since measured data coming from a simulator is often asymmetric (e.g. training tasks typically take place in a restricted region of the simulated world).

Due to the stochastic nature of the sigmoid activation functions, the beliefs can be regarded as fuzzy sets with a Gaussian membership function. This allows one to represent vague concepts like fast and slow, as well as approximations of learned values, which is useful when reasoning with implicit and subjective knowledge (Sun, 1994).

The cognitive temporal logic described in (Lamb et al., 2007) is used to represent domain knowledge in terms of beliefs and previously-applied rules. This logic contains several modal operators that extend classical modal logic with a notion of past and future. To express beliefs on continuous variables, the logic is extended with equality and inequality formulae (e.g. Speed < 30, Raining = true). As an example, consider a task where a trainee drives on an urban road and approaches an intersection. In this scenario the trainee has to apply a yield-to-the-right rule. Using the extended temporal logic, one can describe rules about the conditions, scenario, and assessment related to this task. In rules (1) to (4) in Table 1, ◊A denotes “A is true sometime in the future” and A S B denotes “A has been true since the occurrence of B”.

Conditions:
(1) (Weather > good)
    meaning: the weather is at least good
Scenario:
(2) ApproachingIntersection ∧ ◊(ApproachingTraffic = right)
    meaning: the car is approaching an intersection and sometime in the future traffic is approaching from the right
(3) ((Speed > 0) ∧ HeadingIntersection) S (DistanceIntersection < x) → ApproachingIntersection
    meaning: if the car is moving and heading towards an intersection since it has been deemed close to the intersection, then the car is approaching the intersection
Assessment:
(4) ApproachingIntersection ∧ (DistanceIntersection = 0) ∧ (ApproachingTraffic = right) ∧ (Speed = 0) → (Evaluation = good)
    meaning: if the car is approaching an intersection, arrives at the intersection when traffic is coming from the right, and stops, then the trainee gets a good evaluation

Table 1: Situations and assessments from a driving simulator scenario.

Rule (4) is an example of an uncertain notion that is highly subjective (the distance x at which a person is regarded as approaching an intersection is dependent on the situation and personal experience). When this rule is encoded in an RTRBM, it becomes possible to learn a more objective value for x based on the observed behaviour of different people in various scenarios. This exemplifies a main objective of combining reasoning and learning.
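To see how such rules can be evaluated mechanically, consider the standard recurrence for the "since" operator, s_t = β_t ∨ (α_t ∧ s_{t−1}), applied to rule (3); the trace values and the concrete threshold x below are invented for illustration.

def since(a, b):
    # "a S b" at time t: b held at some point and a has held ever since,
    # computed via the recurrence s_t = b_t or (a_t and s_{t-1})
    s, out = False, []
    for a_t, b_t in zip(a, b):
        s = b_t or (a_t and s)
        out.append(s)
    return out

speed   = [40.0, 35.0, 20.0, 10.0, 0.0]    # Speed over five time steps
heading = [True, True, True, True, True]   # HeadingIntersection
dist    = [80.0, 60.0, 40.0, 15.0, 0.0]    # DistanceIntersection
x = 50.0                                   # the subjective threshold from rule (3)

a = [v > 0 and h for v, h in zip(speed, heading)]   # (Speed > 0) ∧ HeadingIntersection
b = [d < x for d in dist]                           # DistanceIntersection < x
print(since(a, b))   # ApproachingIntersection per rule (3): [False, False, True, True, True]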

The temporal logic rules are represented in NSCA by setting the weights of the connections in the RTRBM. Therefore, the rules need to be translated into a form that relates only to the immediately previous time step (denoted by the temporal operator •). A transformation algorithm for this is also described in (Lamb et al., 2007). Then we can encode any rule as a stochastic relation between the hidden unit that represents the rule, the visible units that represent the beliefs, and the previous hidden unit activations that represent the rules applied in the previous time step. For example, the rule α S β can be translated into the following rules: β → (α S β) and α ∧ •(α S β) → (α S β), where α and β are modelled by visible units, α S β by a hidden unit, and •(α S β) by a recurrent connection to the same hidden unit. (Pinkas, 1995) shows how to map such logic-based propositions into the energy function of a symmetric network and how to deal with uncertainty by introducing a notion of confidence, called a penalty, as discussed further below.
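A hedged sketch of the flavour of such a penalty encoding (not Pinkas's actual construction): each translated implication contributes a confidence-weighted penalty term over binary units that is zero exactly when the implication is satisfied.

def implies_penalty(a, b):
    # for binary a, b in {0, 1}: positive exactly when "a -> b" is violated
    return a * (1 - b)

def energy(alpha, beta, s, s_prev, confidence=2.0):
    # penalties for the two translated rules defining s = (alpha S beta):
    #   beta -> s        and        (alpha and previous(s)) -> s
    violation = implies_penalty(beta, s) + implies_penalty(alpha * s_prev, s)
    return confidence * violation

# zero-energy states are exactly those satisfying both implications
assert energy(alpha=1, beta=0, s=1, s_prev=1) == 0.0   # s carried over: satisfied
assert energy(alpha=1, beta=0, s=0, s_prev=1) > 0.0    # second rule violated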

3.3 Applications of Neural-Symbolic Integration Exemplified

NSCA has been developed as part of a three-year research project on assessment in driving simulators. It is implemented as part of a multi-agent platform for Virtual Instruction (de Penning, Boot, & Kappe, 2008) and was used for an experiment on a driving simulator. In this experiment, five students participated in a driving test consisting of five test scenarios each. For each attempt, all data from the simulator (i.e. 43 measurements, like the relative positions and orientations of all traffic, and the speed, gear, and rpm of the student’s car) and numerical assessment scores on several driving skills (i.e. vehicle control, economic driving, traffic flow, social, and safe driving) provided by three driving instructors present during the attempts were observed by NSCA in real time. NSCA was able to learn from these observations and infer assessment scores similar to those of the driving instructors (de Penning et al., 2010, 2011).

The NSCA system has also been applied as part of a Visual Intelligence (VI) system, called CORTEX, in DARPA’s Mind’s Eye program (Donlon, 2010). This program pursues the capacity to learn generally-applicable and generative representations of actions between entities in a scene (e.g. persons, cars, objects) directly from visual inputs (i.e. pixels), and to reason about the learned representations. A key distinction between Mind’s Eye and the state of the art in VI is that the latter has made large progress in recognising a wide range of entities and their properties (which might be thought of as the nouns in the description of a scene). Mind’s Eye seeks to add the ability to describe and reason about the actions (which might be thought of as the verbs in the description of the scenes), thus enabling a more complete narrative of the visual experience. This objective is supported by NSCA, as it seeks to learn to describe an action in terms of entities and their properties, providing explanations for the reasoning behind it. Results have shown that the system is able to learn and represent the underlying semantics of the actions from observation and use this for several VI tasks, like recognition, description, anomaly detection, and gap-filling (de Penning, den Hollander, Bouma, G., & Garcez, 2012).


Most recently, NSCA has been included in an Intelligent Transport System to reduce CO2 emissions (de Penning, d’Avila Garcez, Lamb, Stuiver, & Meyer, 2014). Results show that NSCA is able to recognise various driving styles from real-time in-car sensor data and outperforms the state of the art in this area. In addition, NSCA is able to describe these complex driving styles in terms of multi-level temporal logic-based rules for human interpretation and expert validation.

3.4 NSCA in a Nutshell

In summary, NSCA is an example of a cognitive model and agent architecture that offers an effective approach to integrating symbolic reasoning and neural learning in a unified model. This approach allows the agent to learn rules about observed data in complex, real-world environments (e.g. expert behaviour for training and assessment in simulators). Learned behaviour can be extracted to update existing domain knowledge for validation, reporting, and feedback. Furthermore, the approach allows domain knowledge to be encoded in the model and deals with uncertainty in real-world data. Results described in (de Penning et al., 2010, 2011) show that the agent is able to learn new hypotheses from observations and extract them as temporal logic formulae. But although the results are promising, the model requires further evaluation by driving experts. This will allow further validation of the model in an operational setting with many scenarios, a large trainee population, and multiple assessments by driving instructors. Other ongoing work includes research on using Deep Boltzmann Machines (Salakhutdinov & Hinton, 2009; Tran & Garcez, 2012) to find higher-level rules and the application of an RTRBM to facilitate adaptive training. Overall, this work illustrates an application model for knowledge representation, learning, and reasoning which may indeed lead to realistic computational cognitive agent models, thus addressing the challenges put forward in (Valiant, 2003; Wooldridge, 2009).

4. Neural-Symbolic Integration in and for Cognitive Science: Building Mental Models

Thus far, we have elaborated on the general theoretical and conceptual foundations of neural-symbolic integration and discussed NSCA as an example of a successful implementation and application deployment of a neural-symbolic system. In what follows, we return in more detail to some of the most pressing theoretical and applied core questions—including the binding problem, connectionist first-order logic (FOL) learning, the combination of probabilities and logic in Markov Logic Networks, and the relationship between neural-symbolic reasoning and human-level AI—and describe our current state of knowledge, as well as potential future developments, in the following sections. We start with an introduction to the intimate relation between neural-symbolic integration and core topics from cognitive science.

To fully explain human cognition, we need to understand how brains construct and manipulate mental models. Clarification of this issue would inform the development of (artificially) intelligent systems, and particularly neural-symbolic systems. The notion of a mental model has a long history, perhaps most notably being used by Johnson-Laird to designate his theory of human reasoning (Johnson-Laird, 1983). Taking partial inspiration from (Thagard, 2010) and (Nersessian, 2010), we interpret a mental model as a cognitive representation of a real or imagined situation, the relationships amongst the situation’s parts and, perhaps, even how those parts can act upon one another. These models will necessarily preserve the constraints inherent in what is represented, and their evaluation and manipulation has been argued to be key to human reasoning (Johnson-Laird, 1983). For example, sight of an empty driveway and a scattering of broken glass might conjure a mental representation (i.e. a model) of a felon breaking the window of your car, leading you to reason that your car has been stolen. As a more abstract example, when pondering why the Exclusive Or problem is linearly inseparable in the context of network learning, one might visualise a two-dimensional plane containing four regularly-placed crosses (two for true, two for false) and mentally consider the placement of classification boundary lines. By exploring such a mental model, one might convince oneself that a line partitioning true and false outputs cannot be found.

It has particularly been argued that construction of and inference on mental models might play a major role in abductive reasoning and, even, creativity (Thagard & Stewart, 2011), our car theft example being a case of the former. Exactly how mental models and their manipulation are neurally realised, though, remains largely uncertain. Central to the construction of a mental model is the formation of a combined (superordinate) whole representation from sets of (subordinate) part representations. For example, in the car theft example, a composite of your car and a hooded figure breaking a window might be created from memory and imagination of car and figure in isolation.

The most basic requirement for a neural realisation of such representation combination is a solution to the so-called “binding problem”, i.e. the identification of a general neural mechanism to represent which individual assemblies of active neurones (e.g. those representing red Ford and those representing hooded figure) are associated and which are not (also see Section 5.2). Many proposals have been made for the brain’s binding mechanism, e.g. temporal synchrony (Engel & Singer, 2001), conjunctive codes (O’Reilly, Busby, & Soto, 2003; Bowman & Wyble, 2007; Rigotti, Rubin, Wang, & Fusi, 2010), and even convolution-based approaches (Thagard & Stewart, 2011). Of these, conjunctive codes seem to involve the smallest step from traditional rate-coded ANNs, e.g. spiking and oscillatory dynamics are not assumed, and would thus seem most naturally integrable with neural-symbolic systems as currently formulated.

Under the conjunctive codes approach, it is assumed that units are available that respond selectively to the coactivation of multiple item representations. Such conjunctive units might reside in a widely accessible binding resource, such as, say, the Binding Pool in (Bowman & Wyble, 2007). Two challenges to conjunctive code binding approaches (and, by extension, to neural representation of mental models) are (i) scalability and (ii) novel conjunctions:

• A binding resource that exhaustively enumerates all possible combinations of representations as unique (localist) units does not scale. Thus, realistic conjunctive codes typically assume that the binding resource employs distributed representations, which provide more compact, and indeed scalable, codings of binding associations; see, for example, the explorations of this issue in (Rigotti et al., 2010) and (Wyble & Bowman, 2006).


• We often experience representation combinations that we have never experienced before. A standard example is the proverbial blue banana; or, in the context of the car theft example, it is likely that a mental model conjoining a hooded figure, a red Ford, and a particular driveway would never have been previously experienced. (Indeed, the fact that every experience must—necessarily—have a first occurrence makes the point.) As a result, conjunctive coding approaches propose randomly preconfigured binding resources with such discrimination capacity that they can effectively conjoin any combination of item representations that might arise in the future; the model of frontal lobe function in (Rigotti et al., 2010) is such an approach.

Neural conjunctive coding methods of this kind would, then, seem prerequisite to brain-based theories of mental models.
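For concreteness, the following is a toy sketch of a randomly preconfigured, distributed binding pool of the kind discussed above; the connectivity density, firing threshold, and item indices are arbitrary choices, not a model of any specific published proposal.

import numpy as np

rng = np.random.default_rng(4)
n_items, n_pool = 64, 256
# sparse random wiring, fixed in advance of any particular experience
P = (rng.random((n_pool, n_items)) < 0.2).astype(float)

def bind(active_items):
    x = np.zeros(n_items)
    x[list(active_items)] = 1.0
    # a pool unit fires when at least two of its wired items are coactive
    return (P @ x >= 2.0).astype(float)

hooded_figure, red_ford = 3, 41            # arbitrary item indices
trace = bind({hooded_figure, red_ford})    # distributed code for a novel pairing
print(int(trace.sum()), "pool units jointly encode the binding")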

Clearly, a capacity to construct mental models is of limited value unless it helps us to make decisions and, in general, to reason, e.g. to determine that a crime has been committed and that the police should be called. Two ways in which modern cognitive neuroscience suggests such reasoning could arise are through either statistical inference or deliberative exploration. The former may employ statistical inference based upon pattern matching, in the manner of classic ANNs, as reflected in the use of the descriptor “associative learning” in this context. In contrast, the latter would be more explicitly deliberative in nature and, accordingly, would involve the serial exploration of a space of hypothetical possibilities. Such a process might most naturally be framed in terms of classic symbolic computation and, perhaps, search in production system architectures (Newell, 1994). In addition, the statistical inference process might be viewed as more opaque to conscious awareness than deliberation.

Dual process theories of reasoning, e.g. (Evans, 2003), have argued that a subdivision of this kind is fundamental to human reasoning. With regard to our main interest here, though, the second of these forms of reasoning, deliberative exploration, seems most naturally to embrace the notion of a mental model, with its connotation of forming conscious hypothetical representations.

Modern cognitive neuroscience also emphasises the role of emotion/body-state evaluations in reasoning. In particular, emotion may be viewed as providing an assessment of the evaluative quality of a situation or alternative, much in the manner of Damasio’s (2008) somatic markers. But importantly, theories of affect, such as, say, appraisal theory (Scherer, 1999), suggest a multidimensional space of emotionally-charged feelings, richer than that required by a simple reward and punishment or utility function. For instance, both disgust and fear are negatively valenced, marking, if you like, punishing experiences; however, the aversive responses to the two are typically quite different. In particular, only fear is likely to engage highly urgent flight-or-fight responses, associated with immediate physical risk. Indeed, it could be argued that a key function of emotions is to mark out response types and, at least in that sense, the distinctions between emotions on many dimensions are cognitively significant. This is relevant to cognitive and neuroscientific accounts of neural-symbolic integration in that mental models might, then, be viewed as “stamped” with affective evaluations, which would be bound (in the manner previously discussed) to nonaffective portions of a mental model, yielding a rich affect-coloured high-dimensional representation.


4.1 Cognitive Neuroscience and Neural-Symbolic Methods

Cognitive neuroscience (Gazzaniga, Ivry, et al., 1998) is a major current scientific project, with the explanation of cognitive capacities (such as perception, memory, attention, and language) in terms of their implementation in the brain as its central objective. Computational modelling (especially when ANN-based) is a key ingredient of the research programme. Conceptually, such modelling provides the “glue” between cognition and the brain: it enables concrete (disprovable) explanations to be formulated of how cognitive capacities could be generated by the brain. In this sense, models that explain cognitive behaviour as well as neuroimaging data do a particularly good job of bridging between the cognitive and the neural. Examples of such models include (Bowman, 2006; Chennu, Craston, et al., 2009; Cowell & Cottrell, 2009; Craston, Wyble, et al., 2009), and the whole Dynamic Causal Modelling project (Friston, Harrison, et al., 2003) can be seen to have this objective.

This then brings to the fore the suitability of different cognitive modelling paradigms in Cognitive Neuroscience; that is, how should computational models in Cognitive Neuroscience be formulated? If we consider the cognitive and brain sciences in very broad definition, although computational models of many varieties have been employed, there are two dominant traditions. The first of these is symbolic modelling, which would most often be formulated in production system architectures, such as SOAR (Newell, 1994), EPIC (Kieras, Meyer, Mueller, & Seymour, 1999), and ACT-R (Anderson & Lebiere, 2014; Anderson, Fincham, Qin, & Stocco, 2008). In contrast, the second tradition is network modelling, ranging from abstract connectionist, e.g. (Rumelhart, McClelland, & PDP Research Group, 1986), to neurophysiologically detailed approaches, e.g. (Hasselmo & Wyble, 1997; Fransen, Tahvildari, Egorov, Hasselmo, & Alonso, 2006). From the earliest computation-based proposals for the study of mind and brain, which probably date to the 1950s, pre-eminence has oscillated between symbolic and neural network-based approaches. However, modern interest in the brain, and in mapping cognition to it, has led to a sustained period of neural network pre-eminence and, certainly in the cognitive neuroscience realm, symbolic modelling is now rare, the most prominent exceptions being (Anderson et al., 2008; Taatgen, Juvina, Schipper, Borst, & Martens, 2009).

There does, though, remain a minority who dissent from the neural networks emphasis. Some have taken a very strongly anti-neural-network perspective, e.g. (Fodor & Pylyshyn, 1988), arguing that thought is fundamentally symbolic—indeed linguistic, cf. (Fodor, 2001)—and that it should thus necessarily be studied at that level. Computer systems are often taken as a justification for this position. It is noted that software is the interesting level, which governs a computer's functional behaviour at any moment. In contrast, it is argued that the mapping to hardware implementation, e.g. compilation down to assembly code, is a fixed predefined transformation that does not need to be reprogrammed when the algorithm being run is changed. That is, the algorithm being executed is determined by software, not by the mapping to hardware, or, to paraphrase: in computer systems, software is where the functional complexity resides. The key point is that software is symbolic. If one views cognition as marked by its functional richness, one might then conclude that cognition is most appropriately studied symbolically and that the mapping to the brain is a fixed, uninteresting implementation step, which does not reflect the richness and variety of cognitive experience and behaviour.


This position does, of course, hinge upon an analogy between the structure of computer systems and the structure of the mind-brain. Many would reject this analogy. Indeed, the majority of those who express dissent from network modelling take a less extreme line. While acknowledging the relevance of brain implementation, they argue that ANNs are expressively limited as a cognitive modelling method, or—at least—emphasise that ANNs do not lead to natural modelling of certain cognitive functions. The following are cases in point.

(i) Rule-guided problem solving: Duncan has explored the human capacity to encode and apply sequences of rules governing how to perform mental tasks (Duncan, Parr, Woolgar, Thompson, Bright, Cox, Bishop, & Nimmo-Smith, 2008). Such task rules state that fulfilment of a set of properties mandates a response of a particular kind. Furthermore, these properties and the relationships between them can be quite complex, e.g. "if all the letters on one side of the screen are in capitals and an arrow points diagonally up to the right on the other side then respond in the direction indicated by the big triangle". Duncan and co-workers have shown that performance on their task rule experiments is strongly correlated with fluid intelligence and thus with IQ. Performance is also associated with activation of brain areas believed to be involved in effortful task-governed behaviour (Dumontheil, Thompson, & Duncan, 2011). Thus, these task rule experiments seem to be revealing the neurocognitive underpinnings of intelligence and problem solving, a key concern for science. A natural way to think about these task rules is as symbolic operations with logical preconditions and actions that define a particular response. In this context, participants are seeking to correctly evaluate preconditions against presented stimuli and apply the corresponding (correct) action. Clearly, this perspective could be directly reflected in a production system model. Finally, one of the key questions for task rule research is characterising how errors are made. To a large extent, errors seem to arise from preconditions and/or actions migrating between operations (i.e. between rules). Again, such errors could naturally be modelled in a production system formulation in which preconditions and actions could misbind to one another.

(ii) Central Executive Function: The notion of a centralised control system that guides thought (a Central Executive) is common in information processing theories of cognition, cf. Baddeley and Hitch's working memory model (Baddeley, 2000), Shallice's Supervisory Attentional System (Norman & Shallice, 1986), and the Central Engine in Barnard's Interacting Cognitive Subsystems (Barnard, 1999). The Central Executive is, for example, hypothesised to direct sensory, perceptual, and motor systems, particularly overriding prepotent (stereotyped) responses. Accordingly, it would be involved in mapping the organism's current goals to a strategy for realising those goals (a so-called task set). In addition, it would impose that task set upon the brain's processing through centralised control. Indeed, the Central Executive would be involved in encoding and applying task rules of the kind just discussed, and also in the conscious exploration of sequences of alternatives in search of a strategy to obtain a goal, i.e. what in AI terms would be called planning. Neurophysiological localisation of the Central Executive has focused on the frontal lobe, and particularly on problem solving deficits arising from frontal lobe damage (Norman & Shallice, 1986). A number of the most prominent workers in this field have formulated their models in symbolic production systems rather than in ANNs. One such approach employs the COGENT high-level modelling notation (Shallice & Cooper, 2011), in which components, coded in production systems, execute in parallel, subject to intercomponent interaction.


In particular, symbolic AI methods seem well suited to modelling some classic Central Executive (frontal lobe) tasks, such as Towers of Hanoi, Towers of London, and planning in general (Shallice & Cooper, 2011).

(iii) Syntactic structures: A long-running debate in psycholinguistics has contrasted rule-based and association-based explanations of language morphology, with Pinker typically carrying the flag for the former and McClelland and co-advocates for the latter (Pinker & Ullman, 2002). The hottest debate has focused on a particularly distinctive (U-shaped) profile of developmental data arising when children learn the English past tense inflection; i.e. when they determine that addition of an "ed" suffix enforces a past tense interpretation (e.g. "stay" to "stayed"), subject to many exceptions (e.g. "run" to "ran"). Although focused on one specific aspect of language, this debate serves as a sounding board for a broader consideration: are the regular syntactic and morphological aspects of language in general best viewed as symbolic rules, or as emergent from the training of a Parallel Distributed Processing (PDP)-style network? This debate is specific to regular aspects of language; there is more agreement that exceptions could be naturally handled by PDP networks. The critical point for the connectionists is that, in the PDP case, there is no sense in which any particular rule or, indeed, an abstract notion of rule is built into the network a priori; the network, in a sense, "discovers" regularity, indeed, if you like, the concept of regularity. Although the rules vs. associations debate will surely run and run, in terms of our focus here, it demonstrates an—at least perceived—mismatch between classic connectionist approaches and the characteristics of cognitive behaviour. Furthermore, the proposed response to this mismatch is a symbolic modelling metaphor, which could naturally be realised in classical AI methods, such as production systems, logic programming, etc.

(iv) Compositionality: One characterisation of the problem connectionism suffers from is that it does not naturally reflect the representational compositionality inherent to many cognitive capacities (Fodor & Pylyshyn, 1988). For example, when parts are combined into wholes, those parts, at least to a large extent, carry their interpretation with them, e.g. "John" in "John loves Jane" has, ostensibly, the same meaning as it does in "Jane loves John". Although different classes of ANNs adhere to this characteristic to different extents, it certainly seems that classic PDP-style models do not automatically generate representational compositionality. At least when taken at face value, PDP models exhibit position-specific coding schemes; that is, the same item (e.g. "John") would have to be separately learned in position X and position Y, with no automatic interpretational carry-over from one position to the other. Consequently, the item may have a very different interpretation in the two positions and, indeed, what has been learned about "John" from its appearance at position X (where it might have been seen frequently) would not automatically inform the understanding of "John" at position Y (where it might have been seen rarely, if at all). This does not fit with subjective experience: the first time we see "John" in a particular grammatical position, having seen it elsewhere frequently, does not, it would seem, manifest as a complete unfamiliarity with the concept of "John". Although not always explicitly framed in terms of compositionality, the intermittent attacks on PDP approaches by proponents of localist neural coding schemes have a similar flavour (Page, 2000).


Indeed, localists' criticism of the slot coding used in PDP models of word reading (Bowers, 2002) is very much a critique of position-specific coding.

This dissatisfaction with connectionism suggests a symbolic perspective, since symbolic models are foundationally compositional in the fashion ANNs are argued not to be. If one accepts this weight of argument, one is left with the perspective that while the link to the brain is certainly critical (indeed, the brain has only neurones, synapses, etc., to process with), it may not be practical to map directly to neural implementation. Rather, cognitive modelling at an intermediate symbolic level may be more feasible.

To be clear, ANNs have certainly made a profound contribution to understanding cognition. This, though, has been most marked for a specific set of cognitive faculties. For example, compelling models of sensory processes have been developed, e.g. of stages of the "what" visual processing pathway (i.e. the ventral stream), cf. (Li, 2001; Raizada & Grossberg, 2003; Serre, Oliva, & Poggio, 2007), and of face-specific processing within the what pathway, cf. (Burton, Bruce, & Johnston, 1990; Cowell & Cottrell, 2009). Mature neural models of visual attention through both space and time have also been proposed, e.g. (Itti, Koch, & Niebur, 1998; Heinke & Humphreys, 2003; Bundesen, Habekost, & Kyllingsbæk, 2005; Bowman & Wyble, 2007). There are also sophisticated neural models of memory systems and of learning in general, e.g. (Bogacz & Brown, 2003; Norman & O'Reilly, 2003; Davelaar & Usher, 2004). Furthermore, many of these models are compelling in their neurophysiological detail.

However, as justified above, there are areas where neural modelling has had less success. In general terms, architectural-level neural models are scarce, i.e. models that are broad-scope, general-purpose, and claim relevance beyond a specific cognitive phenomenon. There are, though, many symbolic architecture-level models of mind, e.g. SOAR (Newell, 1994); possible reasons for this situation have been discussed, among others, in (Bowman & Su, 2014). It may also be the case that neural explanations become less compelling as cognitive functions become more high-level; that is, they are well suited to characterising "peripheral" systems, such as vision, audition, and motor action, but they do less well as one steps up the processing hierarchy away from input and output. In particular, what Sloman would call deliberative processing (Sloman & Logan, 1999), which might embrace Duncan's task rule experiments, planning and, at least, elements of Central Executive function, is less naturally modelled with ANNs. To emphasise again, we are not saying that such cognitive capacities cannot in some fundamental sense be neurally modelled, but rather we are suggesting that this might not be the most natural method of description.

As already suggested, a possible response to this situation is to consider whether mapping to the neural substrate could be expedited by breaking the modelling problem into two steps: the first a symbolic modelling of cognitive behaviour and the second a mapping from symbolic to neural. From a philosophical perspective, one would be arguing that the symbolic intermediary more directly reflects the functional characteristics of the cognitive capacity in question, leading the modeller to a, somehow, more faithful model at the cognitive level. Furthermore, the suggestion would be that such a functional characterisation would also strongly constrain the neural implementation and vice versa, via a direct mapping to ANNs, i.e. effectively a compilation step (which may nevertheless be necessary computationally for the sake of efficiency).


The first step, from cognitive behaviour to symbolic model, might very naturally yield some form of computational logic model, perhaps formulated in a production system architecture, such as SOAR (Newell, 1994), ACT-R (Anderson & Lebiere, 2014) or EPIC (Kieras et al., 1999), or as a logic program. Either way, the second step would involve some form of computational logic to ANN transformation—exactly the topic of the neural-symbolic project. Thus, in this respect, application of the results of neural-symbolic research could make a significant contribution to Cognitive Neuroscience.

However, such an application raises a number of research topics. Perhaps the most significant of these is to develop mappings from computational logic to neurophysiologically plausible neural networks. This requires consideration of more complex activation equations that directly model ion channels and membrane potential dynamics (e.g. based upon the Hodgkin-Huxley equations (Hodgkin & Huxley, 1952)); it also suggests approaches that treat excitation and inhibition distinctly and employ more biologically realistic learning, such as the contrastive Hebbian learning that arises in, say, O'Reilly and Munakata's Generalised Recirculation Algorithm. A good source for such neural theories is (O'Reilly & Munakata, 2000).

5. Putting the Machinery to Work: Binding and First-Order Inference in a Neural-Symbolic Framework

While it should have become clear that it is plausible to assume that human cognition produces and processes complex combinatorial structures using neural networks, on the computer science and systems engineering side such systematic and dynamic creation of these structures presents challenges to ANNs and for theories of neuro-cognition in general (Fodor & Pylyshyn, 1988; Marcus, 2003; Van der Velde & De Kamps, 2006; Feldman, 2013). We will now zoom in on concrete work relating to the problem of representing such complex combinatorial structures in ANNs—with emphasis on FOL representations—while learning and processing them in a computationally efficient manner in order to perform high-level cognitive tasks. Similar to the presentation of NSCA in Section 3 as an application example for the theoretical and conceptual considerations of Section 2, the following shall—besides presenting a relevant body of work from neural-symbolic integration—provide a practical counterpart and grounding for some of the cognition- and cognitive neuroscience-centred discussion of the previous Section 4.

Specifically, this section describes how predicate logic can be efficiently represented and processed in a type of recurrent network with symmetric weights. The approach is based on a (variable) binding mechanism which compactly encodes first-order logic expressions in activation. Processing is then done by encoding symbolic constraints in the weights of the network, thus forcing a solution to an inference problem to emerge in activation.

5.1 A Computational View on Fodor, Pylyshyn, and the Challenge(s) for Modelling Cognition

One of the most computationally influential and relevant positions relating cognition and neural-symbolic computation has been developed by (Fodor & Pylyshyn, 1988). There, two characteristics are deemed essential to any paradigm that aims at modelling cognition: (i) a combinatorial syntax and semantics for mental representations, and (ii) the structure sensitivity of processes.


The first characteristic allows for representations to be recursively built from atomic ones in such a way that the semantics of a non-atomic structure is a function of the semantics of its syntactic components. The second characteristic refers to the ability to base the application of operations to a structured representation on its syntactic structure.

Several works have partially answered their call, yet none could capture the full expressive power of FOL: even with limited expressiveness, most attempts to address Fodor's and Pylyshyn's challenges are extremely localist, have no robustness to neural damage, have limited learning ability, have large size complexity, and require ad hoc network engineering rather than using a general-purpose mechanism which would be independent of the respective knowledge base (KB) (Anandan, Letovsky, & Mjolsness, 1989; Ballard, 1986; Garcez et al., 2009; Hölldobler, 1990; Hölldobler & Kurfess, 1992; Stolcke, 1989; Van der Velde & De Kamps, 2006). Additionally, only a few systems used distributed representations—and even those that do typically suffer from information loss and limited processing abilities (Plate, 1995; Pollack, 1990; Stewart & Eliasmith, 2008).

5.2 The Binding Problem Computationally Revisited

Despite the large body of work (non-exhaustively) summarised in Section 4 and the corresponding many attempts to respond to Fodor's and Pylyshyn's criticism, computational neural modelling of high-level cognitive tasks, and especially of language processing, is still considered a hard challenge. Many of the problems in modelling are taken to be very likely related to the previously described "binding problem". Following the definition by (Feldman, 2013), in short, the general binding problem concerns how items which are encoded in distinct circuits of a massively parallel computing device can be combined in complex ways for various cognitive tasks. In particular, binding becomes extremely challenging in language processing and in abstract reasoning (Barrett, Feldman, & Dermed, 2008; Jackendoff, 2002; Marcus, 2003; Van der Velde & De Kamps, 2006), where one must deal with a more demanding special case: variable binding. The underlying question is narrowed down to: how are simple constituents glued to variables and then used by several distinct parts of a massively parallel system, in different roles? Following (Jackendoff, 2002), the binding problem can be further decomposed into four constitutive challenges: (i) the massiveness of the binding problem, (ii) the problem of multiple instances of a fact, (iii) the problem of associating values to variables, and (iv) the relation between bindings in short-term working memory (WM) as opposed to bindings in long-term memory (LTM). When looking at all four jointly, the binding mechanism appears to be the fundamental obstacle, as each individual challenge depends on it. But even if a binding mechanism exists and enables the representation of complex structures in neurone-like units, it is not clear how these structures are processed in order to achieve a goal, nor how the procedural knowledge needed for such processing is encapsulated in the synapses. Specifically, in the computational case, when FOL expressions are to be processed using just neural units with binary values, how should the respective formulae be encoded? How are neural ensembles representing simple constituents glued together to form complex structures? And how are these structures manipulated for unification and, ultimately, for reasoning?

As a first step, a binding mechanism is required that allows items in a KB, such as predicates, functions, constants, and variables, to be "glued" together, and that can then be applied in manipulating the resulting structures.


Correspondingly, several attempts have been made to tackle the variable binding problem (Anandan et al., 1989; Barrett et al., 2008; Browne & Sun, 1999; Shastri & Ajjanagadde, 1993; Van der Velde & De Kamps, 2006). But while the respective approaches have greatly contributed to our understanding of possible mechanisms at work, it turned out that they each still have significant shortcomings related to limited expressiveness, high space requirements, or central control demands.

5.3 Inference Specifications as Fixed Points of ANNs

Instead of developing a bottom-up, modular solution to performing reasoning with ANNs, let us examine the opposite approach: specifying the possible results of a reasoning process as fixed points (or stable states) of an ANN. In every ANN that reaches equilibrium, converging onto a stable state (instead of oscillating), the fixed points of the network—given clamped inputs—can be associated with solutions to the problem at hand. In particular, ANNs with a symmetric matrix of weights, such as the Boltzmann machines already discussed in Section 3, perform gradient descent in an energy function whose global minima can be associated with solutions to the problem. Although symmetric (energy minimisation) networks are the straightforward architecture choice, other (non-symmetric) architectures may be used as long as stable states emerge and those stable states correspond to solutions of the problem at hand. When using symmetric networks, the general strategy is to associate the problem solutions with the global minima of the energy equation that corresponds to the network. In particular, as the "problem" here is FOL inference, valid inference chains are the solutions, which are mapped onto the global minima of the energy function.

It has been shown that high-order symmetric ANNs (with multiplicative synapses) are equivalent to "standard" symmetric ANNs (with pairwise synapses) with hidden units (Pinkas, 1991b). In the corresponding simulations, high-order variants of Boltzmann machines are therefore used, as they are faster to simulate and have a smaller search space than the "standard" Boltzmann machines with hidden units and only pairwise connections. The declarative specifications (constraints) characterising valid solutions are compiled (or learned) into the weights of a higher-order Boltzmann machine. The first-order KB is either compiled into weights, or it is clamped onto the activation of some of the visible neurones. A query, short-term facts, and/or optional cues may also be clamped onto the WM. Once the KB is in its place (stored as long-term weights) and queries or cues are clamped onto the WM, the ANN starts performing its gradient descent algorithm, searching for a solution to the inference problem that satisfies the constraints stored as weights. When the network settles onto a global minimum, such a stable state is interpreted as a FOL inference chain proving queries using resolution-based inference steps (Pinkas, 1991a; Lima, 2000; Pinkas, Lima, & Cohen, 2012, 2013).
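
To make these search dynamics concrete, consider the following minimal sketch (our own illustration in Python; the variable names and the annealing schedule are chosen for exposition and are not taken from the cited work). It performs stochastic descent over three binary units for the energy function of the network shown in Figure 2 below; occasional uphill moves, whose probability shrinks as the temperature is lowered, let the state escape poor local minima and settle into a global minimum, i.e. a satisfying assignment:

import math
import random

def energy(s):
    # Third-order energy of the network in Figure 2 (clause penalties 1 and 2):
    # E = -XYZ + 3XY - 2X - 2Y; its global minima are exactly the assignments
    # satisfying (not X or not Y or Z) and (X or Y).
    X, Y, Z = s["X"], s["Y"], s["Z"]
    return -X * Y * Z + 3 * X * Y - 2 * X - 2 * Y

def anneal(energy_fn, variables, steps=5000, t0=2.0):
    # Boltzmann-machine-style search: flip one unit at a time and accept
    # energy-increasing flips with probability exp(-dE / T) while T cools.
    state = {v: random.randint(0, 1) for v in variables}
    for step in range(steps):
        temperature = t0 * (1.0 - step / steps) + 1e-3
        v = random.choice(variables)
        before = energy_fn(state)
        state[v] = 1 - state[v]                     # propose a flip
        delta = energy_fn(state) - before
        if delta > 0 and random.random() >= math.exp(-delta / temperature):
            state[v] = 1 - state[v]                 # reject: undo the flip
    return state

random.seed(0)
print(anneal(energy, ["X", "Y", "Z"]))              # settles into a satisfying state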

5.3.1 ANN Specifications Using Weighted Boolean Constraints

The first step in the process is the specification of the problem (e.g., FOL inference) as a set of weighted propositional logic formulae. These formulae represent Boolean constraints that force the neurones to adopt zero/one assignments corresponding exactly to the valid FOL inference solutions.


Figure 2: A symmetric third-order network, characterised by the third-order energy function −XYZ + 3XY − 2X − 2Y. Minima of this energy function are fixed points of the network. This network searches for satisfying solutions of the weighted conjunctive normal form (CNF) (¬X ∨ ¬Y ∨ Z) ∧ (X ∨ Y). Note that the clauses of the weighted CNF are augmented by penalties reflecting the importance of each constraint (here, 1 for the first clause and 2 for the second).

Satisfying a propositional formula is the process of assigning truth-values (true/false) to the atomic variables of the formula in such a way that the formula comes out true. Maximally satisfying a weighted set of such formulae means finding truth-values that minimise the sum of the weights of the violated formulae. By associating the truth-value true with 1 and false with 0, an energy function (in general of higher order) is created which basically calculates the weighted sum of the violated constraints. Conjunctions are expressed as multiplicative terms, disjunctions as the cardinality of a union, and negations as subtraction from 1. In order to give the network a hint about how close it is to a solution, the conjunction of weighted formulae is treated as a sum of the weights of the Boolean constraints that are not satisfied. It is possible to associate the problem of maximal weighted satisfiability of the original set of weighted formulae with finding a global minimum of the energy equation that corresponds to the problem specification (Pinkas, 1991b). In other words, not satisfying a clause incurs an increase of the system's energy. One interesting and powerful side effect of such a mapping is the possibility of partially satisfying an otherwise unsatisfiable set of formulae. This phenomenon is used for specifying optimal and preferred solutions, as in finding a shortest explanation or proof (parsimony), finding most general unifications, finding the most probable inference or most likely explanation, etc. When different levels of weights are associated with the formulae of the KB, a non-monotonic inference engine based on Penalty Logic is obtained. In fact, in (Pinkas, 1995) an equivalence has been shown between minimising the energy of symmetric ANNs and the satisfiability of penalty logic, i.e., every penalty logic formula can be efficiently translated into a symmetric ANN and vice versa. Figure 2 gives an example of a network instantiation of a satisfiability problem.
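
The construction just described is easy to state programmatically. The following sketch (ours; the clause representation is a hypothetical minimal encoding, not the input language of the compiler discussed in the next paragraph) builds the energy function for the weighted CNF of Figure 2 and verifies by brute force that its minima coincide with the maximally satisfying assignments:

from itertools import product

# Weighted clauses: (penalty, [(variable, is_positive_literal), ...]).
# Following the construction above, a clause contributes the product of
# (1 - x) for each positive literal and x for each negative literal, so the
# term equals the clause's penalty exactly when the clause is violated.
clauses = [
    (1, [("X", False), ("Y", False), ("Z", True)]),   # penalty 1: (!X | !Y | Z)
    (2, [("X", True), ("Y", True)]),                  # penalty 2: (X | Y)
]

def energy(assignment):
    # Weighted sum of the violated constraints.
    total = 0
    for penalty, literals in clauses:
        term = penalty
        for var, positive in literals:
            term *= (1 - assignment[var]) if positive else assignment[var]
        total += term
    return total

states = [dict(zip("XYZ", bits)) for bits in product((0, 1), repeat=3)]
minimum = min(energy(s) for s in states)
print([s for s in states if energy(s) == minimum])
# Prints exactly the five assignments satisfying both clauses.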

Based on these results, a compiler has been built that accepts solution specifications in the form of index-quantified penalty logic formulae and generates ANNs that seek maximally satisfying solutions for the specified formulae. This compiler can then be used to generate ANNs for a variety of FOL retrieval, unification, and inference problems, involving thousands of neural units and tens of thousands of synapses.


As an alternative to compilation, these specification constraints could also be learned, in an unsupervised fashion, using an anti-Hebbian learning rule applied whenever a fixed point is found that does not satisfy some constraints (Pinkas, 1995).

5.3.2 ANN Specifications of FOL Inference Chains

There are two possible approaches to logical reasoning: one uses Model Theory, the other Proof Theory. Informally, Model Theory confirms that a statement can be inferred by checking all possible models that satisfy a KB, while Proof Theory applies inference steps to generate a proof for the statement in question. Most attempts at doing FOL inference in ANNs—or using propositional satisfiability in general—employ model-theoretic techniques based on grounded KBs (Clarke, Biere, Raimi, & Zhu, 2001; Garcez et al., 2002; Domingos, 2008; Hölldobler & Kurfess, 1992). However, grounding results in either an exponential explosion of the number of Boolean variables (neurones) or severe limitations to the expressive power of the FOL language (e.g. no function symbols and no existential quantifiers). Proof Theory techniques may use ungrounded FOL formulae and are more efficient in size. Another consequence of having a proof for a query statement is that this proof can be seen as an explanation for that statement, which is not always possible to obtain when model-theoretic methods are used. A proof-theoretic approach was, for instance, used by (Pinkas, 1991a) and allows for a variety of inference rules (e.g. resolution, modus ponens, etc.) with either a clamped KB or with the KB stored in weights. In (Lima, 2000), a FOL theorem prover has been studied with resolution and a knowledge base that is clamped entirely in activation. In (Pinkas et al., 2013), most general unification with occurrence checking is implemented using a more recent binding mechanism with a reduced size complexity.

After the query and/or cues have been clamped onto the WM, the network is set to run until a solution emerges as a global minimum of the network's energy function. This stable state may be interpreted as a proof or chain of FOL clauses. Each clause is obtained either by retrieving it from the LTM, by copying it from previous steps (optionally with collapse), or by resolving two previous clauses in the chain. In every step, the need to unify resolved literals may cause literals to become more specific through variable binding. It is possible to construct an ANN that searches for proofs by refutation (a complete strategy that derives a contradiction when the negated query is added to the KB) or one that directly infers a statement. Directly inferring the conclusion, though not a complete strategy, bears more resemblance to human reasoning and seems more cognitively plausible.

Variations of the above basic inference mechanism use competition among proofs and can be used for inferring the most likely explanation for a query, the most probable inference, the least expensive plan, or the most preferable action (as in Asimov's robotic laws).

5.4 A Fault-Tolerant Mechanism for Dynamic Binding

A compact binding mechanism is at the foundation of the described FOL encoding and of the general-purpose inference engine. The binding technique is the basic mechanism upon which the WM is constructed, and it is this mechanism that enables the retrieval of the needed knowledge from the LTM and, subsequently, unification and inference.


In contrast to temporal binding (Shastri & Ajjanagadde, 1993), it uses spatial (conjunctive) binding, which captures compositionality in the stable state of the neurones and is not sensitive to timing synchronisation or time-slot allocation. Although several conjunctive (non-temporal) binding mechanisms have been proposed, the one in (Pinkas et al., 2012) compactly uses only a few ensembles of neurones, each of which can bind any of the many knowledge items stored in the LTM. This property has three side effects: (i) a reduction in the number of neurones (units), as binders are only needed to represent a proof (and not to capture the whole KB); (ii) fault tolerance, as each binding ensemble is general purpose and may take the place of any other, failed binding ensemble; and finally (iii) high expressive power, as any labelled directed graph structure may be represented by this mechanism (with FOL as a special case).

This binding technique applies crossbars of neurones to bind FOL objects (clauses, literals, predicates, functions, and constants) using special neural ensembles called General Purpose (GP) binders. These GP-binders act as indirect pointers, i.e., two objects are bound if and only if a GP-binder points to both of them. Since GP-binders can point to other GP-binders, a nested hierarchy of objects may be represented by such binders pointing at the same time to objects/concepts and to other binders. In fact, arbitrary digraphs may be represented and, as special cases, nested FOL terms, literals, clauses, and even complete resolution-based proofs (Pinkas, 1991a; Lima, 2000). Manipulation of such compound representations is done using weighted Boolean constraints (compiled or learned into the synapses), which cause GP-binders to be dynamically allocated to FOL objects retrieved from the LTM. Since the GP-binders can be used to bind any two or more objects, the number of binders that is actually needed is proportional to the size of the proof. As a result, only a few such binders are actually needed for cognitively plausible inference chains, causing the size complexity of the WM to be drastically reduced to O(n · log(k)), where n is the proof length and k is the size of the KB (Pinkas et al., 2013). Interestingly, the fault tolerance of this FOL WM is the result of the generic nature of the binders, which is due to redundancy in synapses rather than redundancy in units. GP-binders happen to address most of the challenges stated by (Jackendoff, 2002) and summarised above.

5.5 Examples of Inferences, Cues, and Deductive Associations

To illustrate the GP-binder approach, assume a KB consisting of the following rules:

• For every u, there exists a mother of u who is also a parent of u: Parent(mother(u), u)

• For all x, y, z: Parent(x, y) ∧ Parent(y, z) → GrandParent(x, z)

• More KB items: facts and/or rules optionally augmented by weights

A general-purpose resolution-based inference engine and a WM, implemented in a variation of Boltzmann machines, are compiled (or learned) with the above KB stored in the synaptic connections. When the query GrandParent(V, Anne) is clamped onto some of the visible units in the WM, then, while settling onto a global energy minimum, the network is capable of binding the variable V to the term representing "the mother of the mother of Anne" and of finding the complete inference chain as a proof for that query:


1. Retrieving and unifying rule 1: Parent(mother(Anne), Anne)

2. Retrieving and unifying rule 2: ¬Parent(mother(mother(Anne)), mother(Anne)) ∨ ¬Parent(mother(Anne), Anne) ∨ GrandParent(mother(mother(Anne)), Anne)

3. Resolving 1 and 2: ¬Parent(mother(mother(Anne)), mother(Anne)) ∨ GrandParent(mother(mother(Anne)), Anne)

4. Retrieving rule 1 again, but with a different unification (multiple instances): Parent(mother(mother(Anne)), mother(Anne))

5. QED by resolving 3 and 4: GrandParent(mother(mother(Anne)), Anne)

FOL clauses, literals, atoms, and terms are represented as directed acyclic graphs, with GP-binders acting as nodes and labelled arcs. The GP-binders are dynamically allocated without central control, and only the needed rules are retrieved from the KB in order to infer the query (the goal). Other rules that are stored in the KB (in the LTM) are not retrieved into the WM because they are not needed for the inference.

The query may be considered a kind of cue in the WM that directs the network's "attention" to searching for the grandparent of Anne; yet other cues may be provided instead of, or in addition to, the query: for example, we can accelerate the system's convergence to the correct solution by specifying in the WM that rule 1 and/or rule 2 should be used or, even better, by clamping these rules onto the WM activation prior to the network operation.

This ANN inference mechanism could also be interpreted as an intelligent associative memory, because cues generate a chain of deduction steps that leads to complex associations. For example, if, instead of a query, just "Anne" and "mother" are clamped as cues in the WM, this amounts to asking the system to infer something about the person "Anne" and the concept "mother". In this case the system will converge to one of several possible inferences (each with a known probability). For instance, the system may find that Anne has a parent who is her mother. Alternatively, it may find that Anne has a grandparent (the mother of her mother), or that Anne's mother has a mother and that mother also has a mother. The likelihood of the possible emerging associations is determined by factors such as the weights of the LTM (i.e. the penalties of the FOL KB items, stored in the synapses) and the depth of the inference chains.

5.6 Can Neurones Be Logical? Fundamental Negative and Positive Results

The work just described provides evidence that unrestricted (but memory-bound) FOL inference can be represented and processed by relatively compact, attractor-based ANNs such as the previously introduced Boltzmann machines. Nevertheless, in (Pinkas & Dechter, 1995) it has also been shown that distributed neural mechanisms—among which Boltzmann machines also count—may exhibit pathological oscillations in ANNs that have cycles, and may never find a global solution. The related positive result is that, when the ANN has no cycles, it is possible to have a neural activation function that guarantees finding a global solution. Once a cycle is added to the network, however, no activation function is able to eliminate all the pathological scenarios above. Unfortunately, the outlined FOL networks have many cycles and are therefore doomed to be incomplete, even with a bound on the proof length.


Luckily, under some plausible assumptions on the daemon scheduling the unit updates, one can guarantee that such pathological scenarios will not last forever. Another major obstacle to achieving good performance is that Boltzmann machines and other symmetric networks use stochastic local search algorithms to find a global minimum of their energy function. Theoretically and practically, local minima may plague the search space, so such ANNs are not guaranteed to find a solution in finite time. Current research is trying to solve this problem by using variations of Hebbian learning methods and variations of Boltzmann machines to change the synapses of an ANN in such a way that local minima are eliminated and a global minimum is found faster. As it turns out, the recent success of deep learning also largely emerges from such variations, albeit from a more practical perspective. Learning styles of this type are called "speed-up practicing", as they aim to use learning as a strategy for self-improvement.

6. Connectionist First-Order Logic Learning Using the Core Method

As has already become clear in the previous sections, one of the central problems at the core of neural-symbolic integration is the representation and learning of first-order rules (i.e. relational knowledge) within a connectionist setting. But there is a second motivation besides the incentives coming out of cognitive science and cognitive neuroscience that aim at the modelling and (re-)creation of high-level cognitive processes in a plausible neural setting: the aim to create a new, neurally-inspired computational platform for symbolic computation which would, among other things, be massively distributed with no central control, robust to unit failures, and self-improving over time.

In this and the following section we take a more technical perspective—to a certain extent contrasting with the strongly cognitively-motivated Sections 4 and 5—and focus on two prototypical examples of work on the computer science side of neural-symbolic integration, namely (i) the approximation of the immediate consequence operator T_P associated with a FOL program P in a neural learning context, and (ii) Markov logic as a probabilistic extension of FOL combining logic and graphical models. Relegating elaborations on the latter topic to Section 7, we start with the question of the consequence operator.

Assume a FOL program containing the fact p(X). Applying the associated T_P-operator once (to an arbitrary initial interpretation) leads to a result containing infinitely many atoms, namely all instances of p(X) for every X within the underlying Herbrand universe U_L. Thus, one has to approximate the operator, because even a single application can lead to infinite results. And while in this simple example one might still be able to define a finite representation, the problem may become arbitrarily complex for other programs. So-called rational models have been developed to tackle this problem (Bornscheuer, 1996).
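
To fix intuitions, the following sketch (ours; atoms are represented as plain strings and the toy program is hypothetical) implements T_P for a finite ground program and iterates it to its least fixed point. The point of the surrounding discussion is precisely that, for non-ground programs over an infinite Herbrand universe, such an explicit set-based representation ceases to be finite:

def t_p(program, interpretation):
    # One application of the immediate consequence operator: derive every
    # head whose body atoms are all true in the given interpretation.
    return {head for head, body in program
            if all(atom in interpretation for atom in body)}

# A ground program; facts are rules with empty bodies.
program = [
    ("p(a)", []),
    ("q(a)", ["p(a)"]),              # p(a) -> q(a)
    ("r(a)", ["p(a)", "q(a)"]),      # p(a) & q(a) -> r(a)
]

interpretation = set()
while True:                          # iterate T_P from the empty interpretation
    successor = t_p(program, interpretation)
    if successor == interpretation:  # least fixed point reached
        break
    interpretation = successor
print(interpretation)                # {'p(a)', 'q(a)', 'r(a)'}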

Unfortunately, there is no way to compute an upper bound on the size of the resulting rational representation. Because we are not aware of any other finite representation, we will in the following concentrate on the standard representation using Herbrand interpretations. In principle, there are two options for approximating a given operator. On the one hand, one can develop an approximating connectionist representation for a given accuracy. This approximation in space leads to an increasing number of hidden-layer units in the resulting networks (Bader, 2009a; Bader, Hitzler, Hölldobler, & Witzel, 2007).


Alternatively, a connectionist system can be constructed that approximates a single application of T_P the better the longer it runs. This approximation in time was used in (Bader & Hitzler, 2004) and (Bader, Hitzler, & Garcez, 2005). Networks constructed along the approximation-in-space approach are more or less standard architectures with many hidden-layer units, while the others are usually very small but use non-standard architectures and units.

6.1 Feasibility of the First-Order Core Method and the Embedding of First-Order Rules into Neural Networks

It is well known that multilayer feed-forward networks are universal approximators for all continuous functions on compact subsets (Funahashi, 1989). If a suitable way to represent first-order interpretations by (finite vectors of) real numbers could be found, then feed-forward networks might be used to approximate the meaning function of such programs. It is necessary that such a representation be compatible with both the logic programming and the ANN paradigms. For this, one defines a homeomorphic embedding from the space of interpretations into a compact subset of the real numbers. (Hölldobler, Kalinke, & Störr, 1999) showed the existence of a corresponding metric homeomorphism, and (Hitzler, Hölldobler, & Seda, 2004) subsequently expanded this result to the characterisation of a topological isomorphism. Following these insights, one can use level mappings to realise this embedding. For some Herbrand interpretation I, a bijective level mapping |·| : B_L → N+, and b > 2, the embedding function η : B_L → R and its extension η : I_L → R are defined as follows:

η : B_L → R : A ↦ b^(−|A|)        η : I_L → R : I ↦ Σ_{A∈I} η(A)

Here, C_b := {η(I) | I ∈ I_L} ⊂ R is used to denote the set of all embedded interpretations. This results in a "binary" representation of a given interpretation I in the number system with base b: after embedding an interpretation and looking at its representation within this number system, one finds a 1 at each position |A| for all A ∈ I. As mentioned above, the embedding is required to be homeomorphic (i.e. to be continuous and bijective and to have a continuous inverse), because being homeomorphic ensures that η is at least a promising candidate to bridge the gap between logic programs and connectionist networks. One can construct a real-valued version f_P of the immediate consequence operator T_P as follows:

f_P : C_b → C_b : X ↦ η(T_P(η^(−1)(X)))

Because η is a homeomorphism, f_P is a correct embedding of T_P in the sense that all structural information is preserved. Furthermore, η is known to be continuous, which allows one to conclude that f_P is continuous for a continuous T_P. Based on these insights, one can construct connectionist networks to approximate T_P. As mentioned above, there are in principle two options for the approximation—in space and in time. The embedding η introduced above allows for both, as discussed in detail in (Bader, Hitzler, & Witzel, 2005; Bader et al., 2007; Bader & Hitzler, 2004; Bader et al., 2005). In (Bader, 2009b), methods have been described in full detail to construct standard sigmoidal networks, as well as radial basis function networks, for any given accuracy of approximation. That is, if an interpretation I is fed into the network, the resulting output can be interpreted as an interpretation J, and the distance between J and T_P(I) is bounded by a given ε.


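As a small illustration of the embedding (ours; the level mapping below is a hypothetical assignment on a three-atom slice of the Herbrand base, and b = 4 is an arbitrary admissible base), note how the atoms of an interpretation become the 1-digits of a base-b number:

def eta(interpretation, level, b=4):
    # Embed a (finite) interpretation as the sum of b**(-|A|) over its
    # atoms A, for an injective level mapping |.| into the positive integers.
    return sum(b ** (-level[atom]) for atom in interpretation)

level = {"p(a)": 1, "p(f(a))": 2, "p(f(f(a)))": 3}

x = eta({"p(a)", "p(f(f(a)))"}, level)
print(x)   # 4**-1 + 4**-3 = 0.265625, i.e. digits 1, 0, 1 in base 4

An embedded consequence operator f_P would then act on such numbers by decoding, applying T_P, and re-encoding, exactly as in the definition above.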

So far we have been concerned with the approximation of a single application of the immediate consequence operator only. More interesting is whether the repeated application of the approximating function converges to a fixed point relating to the model of the underlying program. Convergence can be shown by showing that a function is contractive and by employing the Banach contraction mapping principle. Unfortunately, it turns out that the approximating functions described above are not contractive on R in general, but they are contractive if restricted to C_b. Therefore, one can conclude that not only can the meaning function T_P be approximated using a connectionist network, but the iteration of the approximation also converges to a state corresponding to a model of the underlying program, provided that T_P itself is contractive.

6.2 Learning in a First-Order Setting

Within a propositional setting, learning has been shown to be possible (Garcez et al., 2002) and to be of advantage (Bader, Hölldobler, & Marques, 2008). As in the propositional case, training data is used to adapt the weights and structure of randomly initialised networks. An approach built upon vector-based networks was first proposed in (Bader et al., 2007), along with a specifically tailored learning method. To show the applicability of that approach, a network was randomly initialised, and pairs of interpretations and corresponding consequences (i.e. pairs of I and T_P(I)) were used as training samples. After convergence of the training process, the network was evaluated by feeding random interpretations into it and iterating the computation by feeding the outputs back into the input layer. The resulting sequence of interpretations was found to converge to the model of the underlying program P. Hence it can be concluded that the network indeed managed to learn the immediate consequence operator sufficiently accurately.

From a more practical perspective (acknowledging the above difficulties, yet recognising their scientific, representational value), some have opted for connecting neural computation with the work done in Inductive Logic Programming (ILP) (Muggleton & de Raedt, 1994; Basilio, Zaverucha, & Barbosa, 2001). Initial results that take advantage of ILP methods for efficient FOL learning, called propositionalisation (Krogel, Rawles, Železný, Flach, Lavrač, & Wrobel, 2003), have only started to emerge, but with promising results (cf., among others, the Connectionist ILP system by (França, Zaverucha, & Garcez, 2014), or the work by (Šourek, Aschenbrenner, Železný, & Kuželka, 2015; de Raedt, Kersting, Natarajan, & Poole, 2016)). Also within a more practical approach, an extension of the above fixed-point method to multiple networks, as shown in Figure 1, has been proposed in (Garcez et al., 2009). Such an extension, called connectionist modal logic, is of interest here in relation to FOL learning because it has proven capable of representing and learning propositional modal logics, which in turn are equivalent to the two-variable fragment of FOL. Propositional modal logic has been shown to be robustly decidable (Vardi, 1996) and can, as a result, be considered a candidate for offering an adequate middle ground between the requirements of representation capacity and learning efficiency.


English                         First-Order Logic                    Weight
Smoking causes cancer           ∀x : Sm(x) → Ca(x)                   1.5
Friends share smoking habits    ∀x ∀y : Fr(x, y) ∧ Sm(x) → Sm(y)     1.1

Table 2: An example MLN consisting of two weighted first-order formulae.

Figure 3: A Markov network with two constants Anna (A) and Bob (B).

7. Markov Logic Networks Combining Probabilities and First-Order Logic

As a second prototypical example of the more technical side of neural-symbolic integration, besides the connectionist first-order learning application from the previous section, we now turn our attention to Markov logic networks. Markov logic is a probabilistic extension of FOL (Richardson & Domingos, 2006; Domingos & Lowd, 2009). Classical logic can compactly model high-order dependencies. However, at the same time it is deterministic and therefore brittle in the presence of noise and uncertainty. The key idea of Markov logic is to overcome this brittleness by combining logic with graphical models. To that end, logical formulae are softened with weights and effectively serve as feature templates for a Markov network or Markov random field (MRF) (Taskar, Chatalbashev, & Koller, 2004), i.e. an undirected graphical model that allows arbitrary high-order potentials, similar to (but potentially much more compact than) a high-order Boltzmann distribution.

Concretely, a Markov logic network (MLN) is a set of weighted FOL formulae. Together with a set of constants, it defines a Markov network with one node per ground atom (a predicate with all variables replaced by constants) and one feature per ground formula. The weight of a feature is the weight of the first-order formula that originated it. The probability of a state x in such a network is given by P(x) = (1/Z) exp(Σ_i w_i f_i(x)), where Z is a normalisation constant, w_i is the weight of the i-th formula, f_i = 1 if the i-th formula is true, and f_i = 0 otherwise.

As an example, Table 2 shows an MLN with two weighted formulae, and Figure 3 shows the corresponding Markov network with two constants, Anna (A) and Bob (B).
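
For very small domains this semantics can be checked directly by brute force. The following sketch (ours; the atom strings and helper names are illustrative, and the approach is exponential in the number of ground atoms, hence the inference techniques discussed below) grounds the MLN of Table 2 for the constants Anna and Bob and computes an exact marginal:

import math
from itertools import product

constants = ["A", "B"]                    # Anna and Bob
atoms = ([f"Sm({c})" for c in constants] + [f"Ca({c})" for c in constants]
         + [f"Fr({x},{y})" for x in constants for y in constants])

def weight_sum(world):
    # Sum of the weights of the ground formulae of Table 2 that hold in `world`.
    total = 0.0
    for x in constants:                   # Sm(x) -> Ca(x), weight 1.5
        if (not world[f"Sm({x})"]) or world[f"Ca({x})"]:
            total += 1.5
    for x in constants:                   # Fr(x,y) & Sm(x) -> Sm(y), weight 1.1
        for y in constants:
            if not (world[f"Fr({x},{y})"] and world[f"Sm({x})"]) or world[f"Sm({y})"]:
                total += 1.1
    return total

worlds = [dict(zip(atoms, values))        # all 2**8 = 256 possible worlds
          for values in product([False, True], repeat=len(atoms))]
z = sum(math.exp(weight_sum(w)) for w in worlds)            # normalisation Z
p = sum(math.exp(weight_sum(w)) for w in worlds if w["Ca(A)"]) / z
print(p)                                  # exact marginal probability of Ca(Anna)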

Exact and approximate inference methods for MRFs and other graphical models can also be applied to MLNs. Popular choices include variable elimination or junction trees for exact inference, and message-passing or Markov chain Monte Carlo methods for approximate inference. However, compared to standard MRFs, MLNs introduce new challenges and opportunities for efficient reasoning.


One challenge is that even modestly sized domains can yield exceptionally large and challenging ground Markov networks. For example, in a social network domain such as that of Table 2, the number of friendships grows quadratically with the number of people, since any pair of people can be friends. Worse, simple concepts such as transitivity (friends of your friends are likely to be your friends as well) lead to a cubic number of ground formulae. Thus, a domain of 1000 people has 1 million possible friendships and 1 billion possible transitivity relations. Standard inference methods do not scale well to these sizes, especially when formula weights are large, representing strong interactions among the variables. The opportunity is that these large models have a great deal of symmetry, structure that can be exploited by "lifted" inference algorithms (cf. (Braz, Amir, & Roth, 2005; Singla & Domingos, 2008)). Lifted inference works by grouping variables or variable configurations together, leading to smaller, simpler inference problems. For example, when there is no evidence to differentiate Anna and Bob, the probability that each smokes must be identical. Depending on the MLN structure and evidence, lifted inference can compute exact or approximate probabilities over millions of variables without instantiating or reasoning about the entire network. Another technique for scaling inference is to focus on just the subset of the network required to answer a particular query or set of queries (Riedel, 2008). Related methods include lazy inference (Singla & Domingos, 2006; Poon, Domingos, & Sumner, 2008), which only instantiates the necessary part of the network, and coarse-to-fine inference (Kiddon & Domingos, 2011), which uses more complex models to refine the results of simpler models.

The parameter values in an MLN are critical to its effectiveness. For some applications, simply setting the relative weights of different formulae is sufficient; for example, a domain expert may be able to identify some formulae as hard constraints, others as strong constraints, and others as weak constraints. In many cases, however, the best performance is obtained by carefully tuning the weights to fit previously observed training data. As with other probabilistic models, MLN weight learning is based on maximizing the (penalized) likelihood of the training data. Given fully-observed training data, this is a convex optimization problem that can be solved with gradient descent or second-order optimization methods. However, computing the gradient requires running inference in the model to compute the expected number of satisfied groundings of each formula. Furthermore, the gradient is often very ill-conditioned, since some formulae have many more satisfying groundings than others. The standard solution is to approximate the gradient with a short run of Markov chain Monte Carlo, similar to the persistent contrastive divergence methods used to train restricted Boltzmann machines (Tieleman, 2008), and to precondition the gradient by dividing by its variance in each dimension (Lowd & Domingos, 2007). A popular alternative is optimizing the pseudolikelihood (Besag, 1975), which does not require running inference but may handle long chains of evidence poorly.
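
For reference, the gradient in question takes the standard exponential-family form (cf. Richardson & Domingos, 2006): writing n_i(x) for the number of true groundings of the i-th formula in state x, one has ∂/∂w_i log P_w(x) = n_i(x) − E_{X∼P_w}[n_i(X)]. Each gradient step thus compares the counts observed in the training data with the counts expected under the current model, and it is precisely this expectation that requires inference and that the Markov chain Monte Carlo run approximates.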

Most recently, a new paradigm has emerged that tightly integrates learning with inference, as exemplified by sum-product networks (SPNs) (Poon & Domingos, 2011). The development of SPNs is motivated by seeking the general conditions under which the partition function is tractable. SPNs are directed acyclic graphs with variables as leaves, sums and products as internal nodes, and weighted edges. Figure 4a shows an example SPN implementing a junction tree with clusters (X1, X2) and (X1, X3) and separator X1, and Figure 4b depicts a Naive Bayes model with variables X1 and X2 and three components.



Figure 4: (a) An example SPN implementing a junction tree with clusters (X1, X2) and (X1, X3) and separator X1. (b) A Naive Bayes model with variables X1 and X2 and three components.


To compute the probability of a partial or complete variable configuration, one sets the values of the leaf variables and computes the value of each sum and product node, starting at the bottom of the SPN. For example, to compute P(X1 = true, X3 = false) in the SPN in Figure 4a, set the indicator leaves for X1 = false and for X3 = true to 0, since these variable values contradict the specified configuration, and set all other leaves to 1. Each of the four lower sum nodes is a weighted sum of two leaf values; from left to right, they evaluate to 1, 1, 0.5, and 0.3. The product nodes above them evaluate to 0.5 and 0.3, and the overall root equals 0.95 · 0.5 + 0.05 · 0.3 = 0.49. Thus, inference follows the structure of the SPN, allowing arbitrary probability queries to be answered in linear time.
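
This bottom-up pass is straightforward to implement. The following sketch (ours) evaluates a complete and consistent toy SPN over two Boolean variables; the structure and weights are hypothetical and deliberately simpler than those of Figure 4, but the query mechanism, setting indicators that contradict the evidence to 0 and all others to 1, is the one described above:

class Leaf:
    def __init__(self, name):
        self.name = name
    def value(self, env):
        return env[self.name]

class Sum:
    def __init__(self, children, weights):
        self.children, self.weights = children, weights
    def value(self, env):
        return sum(w * c.value(env) for w, c in zip(self.weights, self.children))

class Product:
    def __init__(self, children):
        self.children = children
    def value(self, env):
        result = 1.0
        for child in self.children:
            result *= child.value(env)
        return result

# Indicator leaves: x1 fires when X1 = true, nx1 when X1 = false, and so on.
x1, nx1, x2, nx2 = Leaf("x1"), Leaf("nx1"), Leaf("x2"), Leaf("nx2")
spn = Sum([Product([x1, Sum([x2, nx2], [0.6, 0.4])]),      # mixture over X1
           Product([nx1, Sum([x2, nx2], [0.2, 0.8])])],
          [0.7, 0.3])

def query(evidence):
    # Indicators contradicting the evidence are 0; all others are 1.
    env = {"x1": 1, "nx1": 1, "x2": 1, "nx2": 1}
    for var, val in evidence.items():
        env[("n" + var) if val else var] = 0
    return spn.value(env)

print(query({}))                          # 1.0 (everything marginalised out)
print(query({"x1": True}))                # P(X1 = true) = 0.7
print(query({"x1": True, "x2": False}))   # 0.7 * 0.4 = 0.28

Each query costs a single sweep over the nodes, matching the linear-time claim above.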

It has been shown that if an SPN satisfies the very general conditions called completeness and consistency, then it represents the partition function and all marginals of some graphical model. Essentially all tractable graphical models can be cast as SPNs, but SPNs are also strictly more general. SPNs achieve this by leveraging the key insight of dynamic programming, whereby intermediate results are captured (by internal sums and products) and reused. In this respect, SPNs are similar to many existing formalisms such as AND/OR graphs and arithmetic circuits. However, while these formalisms are normally used as compilation targets for graphical models, SPNs are a general probabilistic model class in their own right, with efficient learning methods developed based on backpropagation and EM. Experiments show that inference and learning with SPNs can be both faster and more accurate than with standard deep networks. For example, SPNs perform image completion better than many state-of-the-art deep networks for this task. SPNs also have intriguing potential connections to the architecture of the cortex (cf. (Poon & Domingos, 2011)).

Relational sum-product networks (RSPNs) extend SPNs to handle relational domains such as social networks and cellular pathways (Nath & Domingos, 2015). Like MLNs, an RSPN represents a probability distribution over the attributes of, and relations among, a set of objects. Unlike MLNs, inference in an RSPN is always tractable. To achieve this, RSPNs define a set of object classes. A class specifies the possible attributes of each object, the classes of its component parts, and the possible relations among the parts. For example, Figure 5 shows a partial class specification for a simple political domain.


class Region:
    exchangeable part Nation
    relation Adjacent(Nation,Nation)
    relation Conflict(Nation,Nation)

class Nation:
    unique part Government
    exchangeable part Person
    attribute HighGDP
    relation Supports(Person,Government)

Figure 5: Partial class definition for a relational sum-product network in a simple political domain.

A 'Region' may contain any number of nations, and two nations within a region may be adjacent, in conflict, or both. A 'Nation' contains a government object and any number of person objects, each of which may or may not support the government. A class also specifies a tractable distribution over the attributes, parts, and relations. The structure and parameters of RSPNs can be learned from data. RSPNs have been applied to social network and automated debugging applications, where they were much faster and more accurate than MLNs.

In summary, due to their combination of symbolic representations with subsymbolic (in this case, graphical) features and forms of reasoning, MLNs and related formalisms have found a growing number of applications and gained popularity in AI in general. They can therefore be seen as success stories of symbolic-numerical integration closely related to the original domain of neurally-plausible or -inspired implementations of high-level cognitive reasoning, exemplifying valuable theory work with high relevance to computer science and, thus, complementing the aforementioned examples—among others given by NSCA and Penalty Logic—as relevant neural-symbolic applications on the systems level.

8. Relating Neural-Symbolic Systems to Human-Level Artificial Intelligence

Following the quite technical exposition of the last two sections, we now return to more conceptually-motivated considerations focusing on the relation between neural-symbolic integration and research in Human-Level Artificial Intelligence (HLAI), before identifying corresponding challenges and research opportunities.

HLAI, understood as the quest for artificially-created entities with human-like abilities, has been pursued by humanity since the invention of machines. It has also been a driving force in establishing artificial intelligence as a discipline in the 1950s. 20th-century AI, however, has developed in a much narrower direction, focussing more and more on special-purpose and single-method driven solutions for problems which were once (or still are) considered to be challenging, like game playing, problem solving, natural language understanding, computer vision, cognitive robotics, and many others. In our opinion, 20th-century AI can therefore be perceived as expert AI, producing and pursuing solutions for relevant specific tasks by using specialised methodologies designed for the tasks in question. We do not mean to say that this is a bad development—quite to the contrary, we think that this was (and still is) a very worthwhile and very successful endeavour with ample (and in some cases well-proven) scope for considerable impact on society.

However—despite having been one of the core ideas in the early days of AI—the pursuit of HLAI declined over the course of the 20th century, presumably because the original vision of establishing systems with the desired capabilities turned out to be much harder to realise than initially expected. Nevertheless, in recent years a rejuvenation of the original ideas has become apparent, driven on the one hand by the insight that certain complex tasks are outside the scope of specialised systems, and on the other hand by the discovery of new methodologies appropriate for addressing hard problems of general intelligence. Examples of such new methodologies include the numerous approaches to machine learning, non-classical logic frameworks, and probabilistic reasoning, to mention just a few. Furthermore, rapid developments in the neurosciences, based on the invention of substantially refined means of recording and analysing neural activation patterns in the brain, have influenced this development. These are accompanied by interdisciplinary efforts within the cognitive science community, including psychologists and linguists with similar visions.

It is apparent that the realisation of HLAI requires the cross-disciplinary integration of ideas, methods, and theories. Indeed, we believe that disciplines such as (narrow) artificial intelligence, neuroscience, psychology, and computational linguistics will have to converge substantially before we can hope to realise human-like artificially intelligent systems. One of the central questions in this pursuit is thus a meta-question: What are concrete lines of research which can be pursued in the immediate future in order to advance in the right direction and to avoid progress towards—currently prominently discussed, albeit highly speculative—less desirable technological or social scenarios? The general vision does not give any answers to this, and while it is obvious that we require some grand all-encompassing interdisciplinary theories for HLAI, we cannot hope to achieve this in one giant leap. For practical purposes—out of pure necessity, since we cannot, and for principled reasons should not even attempt to, shred our scientific inheritance (Besold, 2013)—we require the identification of next steps, of particular topics which are narrow enough that they can be pursued, but general enough that they can advance us in the right direction. This brings us back to the topic of this survey article, neural-symbolic integration.

The proposal for neural-symbolic integration as such a research direction starts from two observations. These correspond to the two principled perspectives intersecting and combining in neural-symbolic integration:

(i) The physical implementation of our mind is based on the neural system, i.e. on a network of neurones as identified and investigated in the neurosciences. If we hope to achieve HLAI, we cannot ignore this neural or subsymbolic aspect of biological intelligent systems.

(ii) Formal modelling of complex tasks and human—in particular, abstract—thinking is based on symbol manipulation, complex symbolic data structures (like graphs, trees, shapes, and grammars), and symbolic logic. At present, there exists no viable alternative to symbolic approaches in order to encode complex tasks.


These two perspectives, however—the neural and the symbolic—are substantially orthogonal to each other in terms of the state of the art in the corresponding disciplines. As also evidenced by the previous sections, symbolically understanding neural systems is hard and requires significant amounts of refined theory and engineering while still falling short of providing general solutions. Furthermore, it is quite unclear at present how symbolic processing at large emerges from the neural activity of complex neural systems. Therefore, symbolic knowledge representation and the manipulation of complex data structures used for representing knowledge at the level required for HLAI is far outside the scope of current artificial neural approaches.

At the same time, humans—using their neural wetware, i.e. their brains—are able to deal successfully with symbolic tasks, to manipulate symbolic formalisms, to represent knowledge using them, and to solve complex problems based on them. In total, there is a considerable mismatch between human neurophysiology and cognitive capabilities as role models for HLAI on the one hand, and theories and computational models for neural systems and symbolic processing on the other hand.

It is our belief that significant progress in HLAI requires the reconciliation of neural and symbolic approaches in terms of theories and computational models. We believe that this reconciliation is as central for the advancement of HLAI as it is for cognitive science and cognitive neuroscience (as described in Section 4). We also believe that the pursuit of this reconciliation is timely and feasible based on the current state of the art, and will continue by briefly mentioning some recent developments in neural-symbolic integration which we consider to be of particular importance for the HLAI discussion. Further pointers can be found, e.g., in (Bader et al., 2005; Hammer & Hitzler, 2007; Garcez et al., 2009).

The line of investigation we want to mention takes its starting point from computational models in (narrow) AI and machine learning. It attempts to formally define and practically implement systems based on ANNs which are capable of learning and dealing with logic representations of complex domains and inferences on these representations. While this can be traced back to the landmark paper of McCulloch and Pitts (1943) on the relation between propositional logic and binary threshold ANNs, it lay largely dormant until the 1990s, when (as already mentioned in previous sections) the first neural-symbolic learning systems based on these ideas were realised—cf., among others, (Towell & Shavlik, 1994; Garcez & Zaverucha, 1999; Garcez et al., 2002). While these initial systems were still confined to propositional logics, in recent years systems with similar capabilities based on first-order logic have been realised—cf., for instance, (Gust, Kuhnberger, & Geibel, 2007) or Section 6. It is to be noted, however, that these systems—despite the fact that they provide a conceptual breakthrough in symbol processing by ANNs—are still severely limited in their scope and applicability, and improvements in these directions do not appear to be straightforward at all. In order to overcome such problems, we also require new ideas borrowed from other disciplines, in order to establish neural-symbolic systems which are driven by the HLAI vision. Results from cognitive psychology on particularities of human thinking which are usually not covered by standard logical methods need to be included. Recent paradigms for ANNs which are more strongly inspired by neuroscience—as discussed, for example, in (Maass, 2002)—need to be investigated for neural-symbolic integration. On top of this, we require creative new ideas borrowed, among others, from dynamical systems theory or organic computing to further the topic.


The presented selection is exemplary, and there are several other efforts which could be mentioned. However, we selected this line of research for presentation in the preceding paragraph because, to us, it appears typical and representative in that it is driven by computer science, machine learning, and classical AI research augmented with ideas from cognitive science and neuroscience.

9. (Most) Recent Developments and Work in Progress from the Neural-Symbolic Neighbourhood

Before concluding this opinionated survey with two sections dedicated to future challenges and directions (Section 10) and to some short overarching remarks (Section 11), in the following we will highlight recent work surfacing outside the traditional core field but in the topical proximity of neural-symbolic integration, with high relevance and potential impact for the corresponding core questions.

9.1 Other Paradigms in Computation and Representation Narrowing the Neural-Symbolic Gap

We first want to direct the reader's attention to a recently proposed neuro-computational mechanism called "conceptors" (Jaeger, 2014). Broadly speaking, conceptors are a proposed mathematical, computational, and neural model combining two basic ideas, namely that the processing modes of a recurrent neural network (RNN) can be characterised by the geometries of the associated state clouds, and that the associated processing mode is selected and stabilised if the states of the respective RNN are filtered to remain within a certain corresponding state cloud. Taken together, this amounts to a means of controlling a multiplicity of processing modes of one single RNN, introducing a form of top-down control to the bottom-up connectionist network. While the underlying mathematical considerations are fairly elaborate, the overall approach can be conceptualised in two steps. First, the different ellipsoids enveloping the differently shaped state clouds a driven RNN exhibits when exposed to different dynamical input patterns each give rise to a different conceptor. Once the driving patterns have been stored in the network (i.e., the network has learned to replicate the pattern-driven state sequences in the absence of the driver stimuli, performing a type of self-simulation), they can be selected and stably regenerated by inserting the corresponding conceptor filters (i.e., the geometry of the ellipsoid serves as the decision mechanism of the filter) into the network's update loop. This partially geometric nature of the approach also offers another advantage in that conceptors can be combined by operations akin (and partially co-extensional) to Boolean logic, thus introducing an additional dimension of semantic interpretation especially catering to the symbolic side of neural-symbolic integration.
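
As a rough sketch of the core computation (following, to the best of our understanding, the matrix formulation given by Jaeger (2014); the variable names are ours), a conceptor is obtained from the correlation matrix of the states a driven RNN exhibits, and Boolean-style negation has a simple closed form:

import numpy as np

def conceptor(states, aperture):
    # states: (L x N) matrix of N-dimensional RNN states collected over L steps
    # while the network is driven by one input pattern.
    # C = R (R + aperture^-2 I)^-1, with R the state correlation matrix.
    L, N = states.shape
    R = states.T @ states / L
    return R @ np.linalg.inv(R + aperture ** -2 * np.eye(N))

def conceptor_not(C):
    # Boolean-style negation on conceptors: NOT C = I - C.
    return np.eye(C.shape[0]) - C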

While conceptors are an approach to equipping sub-symbolic models in the form of RNNs with certain symbolic capacities and properties, the second recent research result we want to mention aims at a connectionist implementation of Von Neumann's computing architecture (Von Neumann, 1945) as the prototypical symbolic computer model. "Neural Turing Machines" (NTMs), introduced in (Graves, Wayne, & Danihelka, 2014), basically couple RNNs with a large, addressable memory, yielding an end-to-end differentiable and gradient-descent trainable analogue of Turing's memory-tape enriched finite-state machines.


Figure 6: The basic structure of a Neural Turing Machine (taken from (Graves et al., 2014)): The controller network receives external inputs and emits outputs during each update cycle. Simultaneously, it reads from and writes to a memory matrix via parallel read and write heads.

In terms of architecture, an NTM is constituted by a neural network controller and a memory bank (see Figure 6). The controller manages the interaction with the environment via input and output vectors and interacts with a memory matrix using selective read and write operations. The main feature of NTMs is their end-to-end differentiability, which allows for training with gradient-descent approaches. In order to introduce this capacity, read and write operations have been defined in a way which allows them to interact to different degrees with all elements in the memory (modulated by a focus mechanism limiting the actual range of interaction), resulting in a probabilistic mode of reading and writing (different from the highly specific discrete read/write operations a Turing machine performs). Also, interaction with the memory tends to be highly sparse, biasing the model towards interference-free data storage. Which memory location to attend to is determined by specialised network outputs parameterising the read and write operations over the memory matrix, defining a normalised weighting over the rows of the matrix. These network outputs can thereby focus attention on a sharply defined single memory location or spread it out to weakly cover the memory at multiple adjacent locations. Several experiments also reported in (Graves et al., 2014) demonstrate that NTMs are generally capable of learning simple algorithms from example data and of subsequently generalising these algorithms outside of the training regime.
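
A minimal sketch of the content-based addressing and the blurry read/write operations may help make this concrete (simplified relative to the full architecture in (Graves et al., 2014), which additionally interpolates with location-based addressing; key, beta, erase, and add would be emitted by the controller):

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def content_weighting(memory, key, beta):
    # Normalised weighting over memory rows by cosine similarity to a key;
    # the scalar beta sharpens or blurs the focus.
    sims = memory @ key / (np.linalg.norm(memory, axis=1)
                           * np.linalg.norm(key) + 1e-8)
    return softmax(beta * sims)

def read(memory, w):
    # Blurry read: a weighted combination of all memory rows.
    return w @ memory

def write(memory, w, erase, add):
    # Blurry write: erase then add, modulated per row by the weighting.
    memory = memory * (1 - np.outer(w, erase))
    return memory + np.outer(w, add)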

Another approach to combining the learning capacities of connectionist networks with a form of read- and writeable memory, as well as inference capacities, has been presented in the form of so-called "memory networks" (Weston, Chopra, & Bordes, 2015). A memory network consists of an array of indexed objects (such as arrays or vectors) m constituting the memory component, an input feature map I converting incoming input to the respective internal feature representation, a generalisation G updating (and/or possibly compressing and generalising) old memories given the new input, an output feature map O producing new output given the new input and the current memory state, and finally a response R converting the output into the final format. Except for m, the remaining four components I, G, O, and R could potentially be learned.


Against this backdrop, if an input x arrives at the memory network, x is converted to an internal feature representation I(x) before the memories mi within m are updated given x: ∀i : mi = G(mi, I(x), m). Subsequently, output features o = O(I(x), m) are computed given the new input and the memory, and the output features are decoded into a final response r = R(o). The flow of the model remains the same during both training and testing, with the only difference being that the model parameters I, G, O, and R stay unchanged at test time. Due to this very general characterisation, memory networks can be flexible concerning the possible implementation mechanisms for I, G, O, and R, generally allowing the use of standard models from machine learning (such as RNNs, SVMs, or decision trees). In (Weston et al., 2015) an account of an instantiation of the general model is also given, describing an implementation of a memory network for application in a text-based question answering (QA) setting using neural networks as components. The resulting system is then applied and evaluated separately in a large-scale QA and a simulated QA setting (the latter also including some QA with previously unseen words), before both setups are combined in a final hybrid large-scale and simulated QA system serving as proof of concept for the possible future capacities of the approach.
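
The I/G/O/R flow can be summarised in a short skeleton mirroring the update equations above; the component implementations are deliberately left abstract, as in the general model of (Weston et al., 2015):

class MemoryNetwork:
    # Skeleton of the general memory network model. I, G, O, R are arbitrary
    # (possibly learned) functions supplied by the user; memory is a list of slots.

    def __init__(self, I, G, O, R, memory):
        self.I, self.G, self.O, self.R = I, G, O, R
        self.m = memory

    def step(self, x):
        features = self.I(x)                                      # input feature map
        self.m = [self.G(mi, features, self.m) for mi in self.m]  # update memories
        o = self.O(features, self.m)          # output features from input + memory
        return self.R(o)                      # decode output features to a response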

9.2 Other Application Systems Narrowing the Neural-Symbolic Gap

Following the previous discussion of neural-symbolically relevant advances on the architectural and methodological level, in this subsection we will focus on several recent application systems which are partially or fully based on connectionist approaches but are successfully applied to tasks of a predominantly symbolic nature.

The first application area which has recently seen rapid development, with systems operating at a previously unachieved, qualitatively new level of performance, is the domain of vision-based tasks such as the semantic labelling of images according to their pictorial content. In (Vinyals, Toshev, Bengio, & Erhan, 2014) and (Karpathy & Fei-Fei, 2015) two approaches to related problems have been presented. While the generative, deep RNN model in (Vinyals et al., 2014) generates natural language sentences describing the content of input images, (Karpathy & Fei-Fei, 2015) introduces an approach to recognition and labelling tasks for the content of different image regions (corresponding to different objects and their properties in the pictures). The model from (Vinyals et al., 2014) combines an input convolutional neural network, used for encoding the image (the network is pre-trained for an image classification task, and the last hidden layer serves as interface to the following natural language decoder), with an output RNN generating natural language sentences based on the processed data of the input network. The result is a fully stochastic gradient descent-trainable neural network combining, in its sub-networks, a vision and a language model, and—when put into application—significantly outperforming previous approaches and narrowing the gap to human performance on the test sets used. In (Karpathy & Fei-Fei, 2015), a region convolutional neural network is applied to the identification of objects (and the corresponding regions) in images and their subsequent encoding in a multimodal embedding space, together with a bidirectional RNN which computes word representations from input sentences and also embeds them in the same space. Once this mapping of images and sentences into the embedding space has been achieved, an alignment objective in terms of an image-sentence score (a function of the individual region-word scores) is introduced, tying together both parts of the model.


The resulting system again performs better than previous approaches, showing state-of-the-art performance in image-sentence ranking benchmarks and outperforming previous retrieval baselines in fullframe and region-level experiments. Both models and the corresponding implementation systems are remarkable in that, classically, tasks involving semantic descriptions had been associated with databases containing some form of background knowledge, and the relevant picture processing/computer vision approaches had for a long time also relied on rule-based techniques. The described work proves that this type of task—which conceptually still clearly operates on a symbolic level—can in general also be addressed in practice by purely connectionist architectures.

Another closely related type of task is visual analogy-making, as exemplified, for instance, by transforming a query image according to an example pair of related images. In (Reed, Zhang, Zhang, & Lee, 2015), the authors describe a way to tackle this challenge, again applying deep convolutional neural networks. In their approach, a first network learns to operate as a deep encoder function mapping images to an embedding space (which, due to its internal structure, makes it possible to perform some forms of reasoning about analogies via fairly standard mathematical operations), before a second network serves as a deep decoder function mapping from the embedding back to the image space. The resulting models are capable of learning analogy-making based on appearance, rotation, 3D pose, and several object attributes, covering a variety of tasks ranging from analogy completion, to animation transfer between video game characters, to 3D pose transfer and rotation. Here too, analogy models have previously mostly been symbol-based, and generative models of images had to encode significant prior background knowledge and restrict the allowed transformations, making the general task domain mostly symbolic in nature.

A different domain is addressed by the work described in (Foerster, Assael, de Freitas, & Whiteson, 2016). There, a deep distributed recurrent Q-network is successfully applied to learning communication protocols for teams of agents in coordination tasks in a partially observable multi-agent setting. This type of task requires agents to coordinate their behaviour to maximise their common payoff while dealing with uncertainty about both the hidden state of the environment and the information state and future actions of the members of their team. In order to solve this challenge, so-called "deep distributed recurrent Q-networks" (DDRQN) have been introduced. These networks are based on independent deep Q-network agents with Long Short-Term Memory (LSTM) networks (Hochreiter & Schmidhuber, 1997)—a currently very popular type of RNN for tasks involving the processing of sequential data—augmented by (i) supplying each agent with its previous action as input on the following time step (enabling agents to approximate their own action-observation history), (ii) sharing a single network's weights between all agents and conditioning on the agent's ID (enabling fast learning and diverse behaviour), and (iii) disabling experience replay (avoiding problems with non-stationarity arising from the simultaneous learning of multiple agents). The resulting system proves capable of solving (for limited numbers of agents) two classical logic riddles which necessarily require collaborative communication and information exchange between players in their optimal solution strategies. While future work still has to prove the scalability of the approach, the work in (Foerster et al., 2016) is nonetheless a proof of concept for the ability of connectionist networks to learn communication strategies in multi-agent scenarios—which is another instance of a genuinely symbolic capacity that has successfully been implemented in a neural network architecture.


The final example of a successful application system addressing aspects of neural-symbolic integration which we want to mention is the AlphaGo Go-playing system presented in (Silver, Huang, Maddison, Guez, Sifre, van den Driessche, Schrittwieser, Antonoglou, Panneershelvam, Lanctot, Dieleman, Grewe, Nham, Kalchbrenner, Sutskever, Lillicrap, Leach, Kavukcuoglu, Graepel, & Hassabis, 2016). While Go had for a long time proven computationally infeasible to solve due to the quickly growing game tree and corresponding search space, the authors introduce an architecture equipped with deep neural network-based move selection and position evaluation functions (learned using a combination of supervised learning from human expert games and reinforcement learning from massively parallel self-play). Integrating these, in the form of policy and value networks, with a Monte Carlo tree search lookahead approach, the resulting high-performance tree search engine is able to efficiently explore the game tree and play the game at a level on par with the world's best human players. Different from the previous examples from the visual domain, the resulting architecture is closer to being genuinely neural-symbolic in its combination of trained connectionist networks for pruning and narrowing down the search space with the subsequent Monte Carlo algorithm selecting actions by lookahead tree search. With its previously unachieved performance, AlphaGo constitutes a prime example of the possibilities opened up by combining connectionist and symbolic approaches and techniques.
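
To give a flavour of how the policy network guides the symbolic search, the following sketch shows the standard "prior-weighted upper confidence" selection rule used in this family of tree searches; the exact constants, bookkeeping, and virtual-loss machinery of AlphaGo itself differ, and the node layout here is our own:

import math

def select_action(node, c_puct=1.0):
    # Pick the child maximising Q + U, where U favours moves the policy
    # network considers promising (prior P) but that are still rarely visited.
    total_visits = sum(child["N"] for child in node["children"].values())
    def score(child):
        q = child["W"] / child["N"] if child["N"] > 0 else 0.0   # mean value
        u = c_puct * child["P"] * math.sqrt(total_visits + 1) / (1 + child["N"])
        return q + u
    return max(node["children"], key=lambda a: score(node["children"][a]))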

9.3 Other Research Efforts and Programs Narrowing the Neural-Symbolic Gap

In this final subsection we want to focus on three different lines of work which stand out from the usual work in neural-symbolic integration in that they either start out from a different perspective than neural computation or logic-based modelling, or apply techniques and approaches which are not commonly used in the field. Examples of such "non-classical neural-symbolic research programs" can be found, for instance, in certain investigations into methods and mechanisms of multi-agent systems, in empirical attempts at understanding RNNs by using language models as a testbed providing data for qualitative analyses, and in theoretical computer science and complexity theory projects targeting the practical differences between symbolic and connectionist approaches with theoretical means. We describe three recent suggestions as prototypical cases in point.

(i) Anchoring Knowledge in Interaction in Multi-Agent Systems: The typical setting in multi-agent systems encompasses an agent exploring the environment, learning from it, reasoning about its internal state given the input, acting and, in doing so, changing the environment, exploring further, and so on as part of a permanent cycle (Cutsuridis, Hussain, & Taylor, 2011). One of the biggest challenges in this context is the (bidirectional) integration between low-level sensing and interacting on the one hand and the formation of high-level knowledge and subsequent reasoning on the other. In (Besold, Kuhnberger, Garcez, Saffiotti, Fischer, & Bundy, 2015) an interdisciplinary group of researchers spanning from cognitive psychology through robotics to formal ontology repair and reasoning conceptually sketch (what amounts to) a strongly cognitively-motivated neural-symbolic architecture and model of a situated agent's knowledge acquisition through interaction with the environment in a permanent cycle of learning through experience, higher-order deliberation, and theory formation and revision.

In the corresponding research program—further introduced and actually framed in terms of neural-symbolic integration in (Besold & Kuhnberger, 2015)—different topics also mentioned in the present survey article are combined for that purpose. Starting out from computational neuroscience and network-level cognitive modelling (as represented, e.g., by the conceptors framework described above) combined with psychological considerations on embodied interaction as part of knowledge formation, low-level representations of the agent's sensing are created as output of the ground layer of the envisioned architecture/model. This output is then fed into a second layer performing an extended form of anchoring (Coradeschi & Saffiotti, 2000), not only grounding symbols referring to perceived physical objects but also dynamically adapting and repairing acquired mappings between environment and internal representation. The enhanced low-level representations output by the anchoring layer are then in turn fed to the—in terms of neural-symbolic integration most classical—lifting layer, i.e., to an architectural module repeatedly applying methods combining neural learning and temporal knowledge representation in stochastic networks (as, e.g., already discussed in the form of RBMs in the context of the NSCA framework in Section 3) to continuously increase the abstraction level of the processed information towards a high-level symbolic representation. Once a sufficiently general level has been reached, a final layer combining elaborate cognitively-inspired reasoning methods on the symbolic level (such as, for instance, analogy-making, concept blending, and concept reformation) is then used for maintaining a model of the agent and the world on the level of explicit declarative knowledge (Bundy, 2013), which is nonetheless intended to feed back into the lower layers—among others by serving as guidance for the augmented anchoring process.

(ii) Visualising and Understanding Recurrent Networks: LSTMs stand out among current deep network models due to their capacity to store and retrieve information over prolonged time periods using built-in constant error carousels governed by explicit gating mechanisms. Still, while they have been used in several high-profile application studies, thus far only little is known about the precise properties and representation capacities of LSTMs. In (Karpathy, Johnson, & Fei-Fei, 2015), a first step towards clarifying some of the relevant questions is described and results are presented which are also relevant from a neural-symbolic perspective. The authors empirically shed light on aspects of representation, predictions, and error types in LSTM networks, uncover the existence of interpretable cells within the network which keep track of long-range dependencies in the respective input data, and suggest that the remarkable performance of LSTMs might be due to long-range structural dependencies.

In order to gain insight into the inner workings of LSTM networks, character-level language models are used as an interpretable testbed for several computational experiments. In theory, by making use of its memory cells an LSTM should be capable of remembering long-range information and keeping track of different attributes of the input it is currently processing. Also, using text processing as an example, manually setting up a memory cell in such a way that it keeps track of whether it is inside a quoted string or not is not overly challenging. Still, whether LSTMs actually develop this type of internal structure (i.e., form interpretable cells with dedicated, more abstract tasks) was not as clear.


The empirical findings in (Karpathy et al., 2015) indicate that such "task cells" indeed exist, in the case of processing program code as textual input ranging from cells checking for a parenthesis after an if statement to cells acting as line-length counters or tracking the indentation of blocks of code. In a second step, using finite-horizon n-gram models as comparandum, it is shown that LSTM networks are remarkably good at keeping track of long-range interactions and dependencies within the input data.
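
The kind of inspection underlying these findings can be sketched in a few lines: run an LSTM over a character stream and record a single cell's activation at every step; interpretable "task cells" show up as activations that correlate with properties such as being inside quotes. The toy network below is untrained and uses standard LSTM equations with our own dimensions, purely to illustrate the tracing procedure:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    # One step of a standard LSTM cell; W maps [h; x] to the stacked
    # input, forget, output, and candidate pre-activations.
    z = W @ np.concatenate([h, x]) + b
    n = h.size
    i, f, o = sigmoid(z[:n]), sigmoid(z[n:2*n]), sigmoid(z[2*n:3*n])
    g = np.tanh(z[3*n:])
    c = f * c + i * g
    return o * np.tanh(c), c

# Record the activation of one cell (index 0) over a character sequence.
rng = np.random.default_rng(0)
hidden, text = 8, 'print("hello")'
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + 256))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
trace = []
for ch in text:
    x = np.zeros(256); x[ord(ch)] = 1.0       # one-hot character input
    h, c = lstm_step(x, h, c, W, b)
    trace.append(c[0])                        # the tracked cell's state
print(list(zip(text, np.round(trace, 2))))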

From the neural-symbolic point of view, the work reported in (Karpathy et al., 2015) is important in at least two ways. On the one hand, the findings relating to the existence of interpretable long-range interactions and associated cells in the network setup actually take LSTMs one step closer to the level of symbolic interpretability and away from the completely distributed and non-localised classical connectionist setting. On the other hand, the overall approach and the set of methods applied in the corresponding empirical studies and analyses (including qualitative visualisation experiments, statistical evaluations of cell activations, and the comparison to the n-gram models) suggest potential techniques which could be used in more detailed follow-up studies on knowledge extraction from LSTMs and in similar assessments of other types of recurrent neural networks in general.

(iii) Identifying and Exploring Differences in Complexity: Contrasting with the multi-agent approach described in (i) and the empirical analysis of LSTM networks in (ii), the second research program proposed in (Besold & Kuhnberger, 2015) approaches the differences between ANNs and logics from the angle of theoretical computer science and complexity theory. In the suggested line of work, recent advances in the modelling and analysis of connectionist networks are to be combined with new developments and tools for investigating previously unconsidered properties of symbolic formalisms, in an attempt to explain the empirical differences in applicability and performance between neural and logical approaches which show up very quickly in application scenarios (and which seem to harshly contradict the formal equivalence between both described in Section 2).

To this end, it is suggested that the form and nature of the polynomial overhead as the computational-complexity difference between paradigms should be examined in more detail. The underlying hope is that a closer look at this overhead might shed light on previously unconsidered factors, as hitherto most complexity results have been established using exclusively TIME and SPACE as classical resources for this form of analysis, and most analyses have left out more precise investigations of the remaining polynomial overhead after establishing tractability. Against this background, the working hypotheses for the program are that, for a fully informative analysis, other resources might have to be taken into account on the connectionist side (such as the number of spikes a spiking network requires during computation, or the number of samples needed for convergence from an initial network state to the stationary distribution) and that additional formal tools and techniques will have to be applied on the symbolic side (such as parameterised-complexity methods which take into account problem- and application-specific properties of problem classes, or descriptive-complexity theory which also considers the polynomial-time and the logarithmic-time hierarchy).

The results of these efforts promise to contribute to resolving some of the basic theoretical and practical tensions arising when comparing the suitability and feasibility of applying certain formalisms to different types of tasks and domains. This, in turn, is expected to also successively provide information on general challenges (and their potential remedies) which would have to be overcome for advancing towards neural-symbolic integration on the formal and theoretical side.

10. Challenges and Future Directions

In this second-to-last section of our survey we give an overview of what we consider the currently most pressing and/or—if addressed successfully—potentially most rewarding theoretical and practical questions and topics on the way towards bridging the gap between connectionist computation and high-level reasoning.

As described throughout this article, if connectionism is to be an alternative paradigm to logic-based artificial intelligence, ANNs must be able to compute symbolic reasoning efficiently and effectively, combining the fault-tolerance of the connectionist component with remedies for the proverbial "brittleness and rigidity" of its symbolic counterpart. On the more architecturally-oriented side, by integrating connectionist and symbolic systems using some of the previously sketched or similar approaches, hybrid architectures (Sun, 2002) seek to tackle this problem and offer a principled way of computing and learning with knowledge of various types and on different levels of abstraction. Starting out from multi-modular approaches such as, for instance, the CLARION architecture (Sun, 2003) in cognitive modelling or the vision of symbolic/sub-symbolic multi-context systems (Besold & Mandl, 2010) in data integration and reasoning, they have the potential to open up the way and serve as foundation for the convergence between the paradigms and the development of fully-integrated monolithic neural-symbolic systems. Nonetheless, the development of such systems is currently still in its infancy and is often motivated much more by concrete engineering imperatives on a case-by-case basis than by coordinated and targeted research efforts aimed at testing general approaches and understanding the general mechanisms and constraints at work.

On a more positive note, from a more formally-centred perspective it is possible to identify several current points of approximation and potential confluence: deep belief networks are based on Boltzmann machines as an early symmetric ANN model, and are related to Bayesian networks. Certain neural-symbolic recurrent networks are very similar to dynamic Bayesian networks. Connectionist modal logic uses modal logic as a language for reasoning about uncertainty in a connectionist framework. Markov logic networks combine logical symbols and probability. In summary, it can thus be observed that the objectives indeed seem to be converging. Still, here too the current state of affairs is not yet satisfactory, as too often too little consideration is given to an integration on the level of mechanisms, allowing them to remain varied instead of triggering convergence also along this (potentially even more relevant) dimension.

In general, many questions about and limits to integration remain unsolved and sometimes even unaddressed. Currently, the focus still predominantly lies on either learning or reasoning, with efforts aiming to improve one using state-of-the-art models from the other, rather than truly integrating both in a principled way with contributions to both areas. In addition to, for instance, (Garcez, Besold, de Raedt, Foldiak, Hitzler, Icard, Kuhnberger, Lamb, Miikkulainen, & Silver, 2015) we now turn to some of the (many) remaining challenges: we have seen that neural-symbolic systems are composed of (i) translations from logic to network, (ii) machine learning and reasoning, and (iii) translations from network to logic.


Among the main challenges are, in (i), finding the limits of representation; in (ii), finding representations that are amenable to integrated learning and reasoning; and in (iii), producing effective knowledge extraction from very large networks.

What follows is a list of research issues related to challenges (i) to (iii), each of which could serve as the basis for a research program in its own right:

• Reconciling first-order logic learning and first-order logic reasoning.

• Embracing semi-supervised and incremental learning.

• Evaluating and analysing large-scale gains of massive parallelism.

• Implementing learning-reasoning-acting cycles in cognitive agents.

• Developing representations for learning which learn the ensemble structure.

• Rule extraction and interpretability for networks with thousands of neurones.

• Applying fibring in practice and actually learning the fibring function.

• Theoretically understanding the differences in application behaviour between connectionist and symbolic methods.

• Developing a proof theory and type theory for ANNs.

• Developing analogical and abductive neuro-symbolism with the aim of automating the scientific method.

• Investigating neural-symbolic models of attention focus, and modelling and contrasting emotions and utility functions.

While some of the former have recently started to attract active attention from researchers—examples are the work on neural-symbolic multi-agent systems, the proposal for a research program addressing the learning-reasoning-acting cycle, and the effort to better understand the theoretical basis of application differences between neural and logical approaches outlined in Section 9—our knowledge about these issues is only limited, and many questions still have to be asked and answered.

Also in terms of approach, several remarks seem in place: while we see it as highly relevant concerning motivations and as a source of inspiration, we agree that the direct bottom-up approach (i.e., brain modelling) is not productive given the current state of the art. Also, we agree that an engineering approach is valid but specific, and will not answer the big questions asked, for instance, in more cognitively- or HLAI-oriented lines of research (including some of the bullet points above). Therefore, a foundational approach is needed combining all the different directions touched upon in this survey article, i.e., encompassing (at least) different elements of computer science (in particular, knowledge representation and machine learning), cognitive science, and cognitive neuroscience.

Finally, in terms of applications, the following can be counted among the potential "killer applications" for neural-symbolic systems: ANNs have been very effective at image processing and feature extraction as evidenced, for instance, by (Ji, Xu, Yang, & Yu, 2013)


and the work reported in the previous section. Yet, large-scale symbolic systems have been the norm for text processing (as most approaches currently in use rely on large ontologies, are inefficient to train from scratch, and depend heavily on data pre-processing). Even if networks could be trained without providing starting knowledge to perform as well as, for example, WordNet, the networks would be very difficult to maintain and validate. Neural-symbolic systems capable of combining network models for image processing with legacy symbolic systems for text processing therefore seem ideally suited for multimodal processing. Here, concepts such as fibring (our running example from Section 2) seem important so that symbolic systems and network models can be integrated loosely at the functional level. In this process, inconsistencies may arise, making the resolution of clashes a key issue. One approach is to see inconsistency as a trigger for learning, with new information in either part of the combined system serving to adjudicate the conflict (similar to the mechanisms envisioned within the learning-reasoning-acting framework described in the previous section). The immediate impact of this application would be considerable in many areas including the web, intelligent applications and tools, and security.

11. Concluding Remarks

This survey summarises initial links between the fields of computer science (and more specifically AI), cognitive science, cognitive neuroscience, and neural-symbolic computation. The links established are aimed at answering a long-standing dichotomy in the study of human and artificial intelligence, namely the perceived dichotomy of brain and mind, characterised within computer science by the symbolic approach to automated reasoning and the statistical approach of machine learning.

We have discussed the main characteristics and some challenges for neural-symbolicintegration. In a nutshell:

neural-symbolic systems = connectionist machine + logical abstractions

The need for rich, symbol-based knowledge representation formalisms to be incorporated into learning systems has been argued explicitly at least since Valiant's (1984) seminal paper, and support for combining logic and connectionism, or logic and learning, was already one of Turing's own research endeavours (Pereira, 2012).

Both the symbolic and connectionist paradigms embrace the approximate nature of human reasoning, but when taken individually they have different virtues and deficiencies. Research into the integration of the two has important implications that can benefit computing, cognitive science, and even cognitive neuroscience. The limits of effective integration can be pursued through neural-symbolic methods, following the needs of different applications where the results of principled integration must be shown to be advantageous in practice in comparison with purely symbolic or purely connectionist systems.

The challenges for neural-symbolic integration today emerge from the goal of effective integration, expressive reasoning, and robust learning. Computationally, there are challenges associated with the more practical aspects of the application of neural-symbolic systems in areas such as engineering, robotics, the semantic web, etc. These challenges include the effective computation of logical models, the efficient extraction of comprehensible knowledge and, ultimately, striking the right balance between tractability and expressiveness.


In summary, by paying attention to the developments on either side of the division between the symbolic and the sub-symbolic paradigms, we are getting closer to a unifying theory, or at least promoting a faster and more principled development of the cognitive and computing sciences and AI. To us this is the ultimate goal of neural-symbolic integration, together with the associated provision of neural-symbolic systems with expressive reasoning and robust learning capabilities.

References

Anandan, P., Letovsky, S., & Mjolsness, E. (1989). Connectionist variable-binding by optimization. In Proceedings of the 11th Annual Conference of the Cognitive Science Society (CogSci-89).

Anderson, J. R., Fincham, J. M., Qin, Y., & Stocco, A. (2008). A central circuit of the mind. Trends in cognitive sciences, 12 (4), 136–143.

Anderson, J. R., & Lebiere, C. J. (2014). The atomic components of thought. Psychology Press.

Baader, F., Calvanese, D., McGuinness, D. L., Nardi, D., & Patel-Schneider, P. F. (Eds.). (2003). The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press.

Baddeley, A. (2000). The episodic buffer: a new component of working memory? Trends in cognitive sciences, 4 (11), 417–423.

Bader, S., & Hitzler, P. (2005). Dimensions of neural-symbolic integration: A structured survey. In Artemov, S., Barringer, H., d'Avila Garcez, A., Lamb, L., & Woods, J. (Eds.), We Will Show Them: Essays in Honour of Dov Gabbay (Volume 1), pp. 167–194. King's College Publications.

Bader, S. (2009a). Extracting propositional rules from feed-forward neural networks by means of binary decision diagrams. In Garcez, A., & Hitzler, P. (Eds.), Proceedings of the 5th International Workshop on Neural-Symbolic Learning and Reasoning, NeSy'09, held at IJCAI-09, pp. 22–27.

Bader, S. (2009b). Neural-Symbolic Integration (Dissertation). Ph.D. thesis, Technische Universität Dresden, Dresden, Germany.

Bader, S., & Hitzler, P. (2004). Logic programs, iterated function systems, and recurrentradial basis function networks. Journal of Applied Logic, 2 (3), 273–300.

Bader, S., Hitzler, P., & Garcez, A. (2005). Computing first-order logic programs by fibring artificial neural networks. In Russell, I., & Markov, Z. (Eds.), Proceedings of the 18th International Florida Artificial Intelligence Research Symposium Conference, FLAIRS05.

Bader, S., Hitzler, P., Holldobler, S., & Witzel, A. (2007). A fully connectionist model generator for covered first-order logic programs. In Veloso, M. M. (Ed.), Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI-07), pp. 666–671. AAAI Press.

47

Besold et al.

Bader, S., Hitzler, P., & Witzel, A. (2005). Integrating first-order logic programs and connectionist systems - a constructive approach. In Garcez, A., Elman, J., & Hitzler, P. (Eds.), Proceedings of the 1st International Workshop on Neural-Symbolic Learning and Reasoning, NeSy'05, held at IJCAI-05.

Bader, S., Holldobler, S., & Marques, N. C. (2008). Guiding backprop by inserting rules. In Garcez, A., & Hitzler, P. (Eds.), Proceedings of the 4th International Workshop on Neural-Symbolic Learning and Reasoning, NeSy'08, held at ECAI-08, Vol. 366 of CEUR Workshop Proceedings.

Ballard, D. H. (1986). Parallel logical inference and energy minimization. In Proceedings of the Fifth National Conference on Artificial Intelligence (AAAI-1986), pp. 203–208. AAAI Press.

Barnard, P. J. (1999). Interacting cognitive subsystems: Modeling working memory phenomena within a multiprocessor architecture. In Miyake, A., & Shah, P. (Eds.), Models of Working Memory, pp. 298–339. Cambridge University Press.

Barrett, L., Feldman, J. A., & Dermed, M. L. (2008). A (somewhat) new solution to the binding problem. Neural Computation, 20, 2361–2378.

Basilio, R., Zaverucha, G., & Barbosa, V. C. (2001). Learning logic programs with neural networks. In 11th International Conference on Inductive Logic Programming (ILP) 2001, Vol. 2157 of Lecture Notes in Artificial Intelligence, pp. 15–26.

Becker, M., Mackay, J., & Dillaway, B. (2009). Abductive authorization credential gathering. In Policies for Distributed Systems and Networks, 2009. POLICY 2009. IEEE International Symposium on, pp. 1–8.

Besag, J. (1975). Statistical analysis of non-lattice data. The Statistician, 24, 179–195.

Besold, T. R. (2013). Human-level artificial intelligence must be a science. In Kuhnberger, K.-U., Rudolph, S., & Wang, P. (Eds.), Artificial General Intelligence, Vol. 7999 of Lecture Notes in Computer Science, pp. 174–177. Springer.

Besold, T. R., & Kuhnberger, K.-U. (2015). Towards integrated neural-symbolic systems for human-level AI: Two research programs helping to bridge the gaps. Biologically Inspired Cognitive Architectures, 14, 97–110.

Besold, T. R., Kuhnberger, K.-U., Garcez, A. d., Saffiotti, A., Fischer, M. H., & Bundy, A. (2015). Anchoring knowledge in interaction: Towards a harmonic subsymbolic/symbolic framework and architecture of computational cognition. In Bieger, J., Goertzel, B., & Potapov, A. (Eds.), Artificial General Intelligence, Vol. 9205 of Lecture Notes in Computer Science, pp. 35–45. Springer.

Besold, T. R., & Mandl, S. (2010). Integrating logical and sub-symbolic contexts of reasoning. In Filipe, J., Fred, A. L. N., & Sharp, B. (Eds.), Proceedings of the International Conference on Agents and Artificial Intelligence (ICAART) 2010, pp. 494–497. INSTICC Press.

Bogacz, R., & Brown, M. W. (2003). Comparison of computational models of familiarity discrimination in the perirhinal cortex. Hippocampus, 13 (4), 494–524.


Bornscheuer, S.-E. (1996). Rational models of normal logic programs. In Gorz, G., & Holldobler, S. (Eds.), KI-96: Advances in Artificial Intelligence, Vol. 1137 of Lecture Notes in Computer Science, pp. 1–4. Springer Berlin Heidelberg.

Bowers, J. S. (2002). Challenging the widespread assumption that connectionism and distributed representations go hand-in-hand. Cognitive Psychology, 45 (3), 413–445.

Bowman, H., & Su, L. (2014). Cognition, concurrency theory and reverberations in the brain: in search of a calculus of communicating (recurrent) neural systems. In Voronkov, A., & Korovina, M. V. (Eds.), HOWARD-60: A Festschrift on the Occasion of Howard Barringer's 60th Birthday, pp. 66–84. EasyChair.

Bowman, H., & Wyble, B. (2007). The simultaneous type, serial token model of temporal attention and working memory. Psychological review, 114 (1), 38.

Bratman, M. (1999). Intention, plans, and practical reason. Cambridge University Press.

Braz, R., Amir, E., & Roth, D. (2005). Lifted first-order probabilistic inference. In Proceedings of the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI-2005), pp. 1319–1325, Edinburgh, UK. Morgan Kaufmann.

Browne, A., & Sun, R. (2001). Connectionist inference models. Neural Networks, 14, 1331–1355.

Browne, A., & Sun, R. (1999). Connectionist variable binding. Expert Systems, 16 (3),189–207.

Bundesen, C., Habekost, T., & Kyllingsbæk, S. (2005). A neural theory of visual attention: bridging cognition and neurophysiology. Psychological review, 112 (2), 291.

Bundy, A. (2013). The interaction of representation and reasoning. In Proceedings of the Royal Society A, Vol. 469, pp. 1–18.

Burton, A. M., Bruce, V., & Johnston, R. A. (1990). Understanding face recognition with an interactive activation model. British Journal of Psychology, 81 (3), 361–380.

Chen, H., & Murray, A. F. (2003). Continuous restricted Boltzmann machine with an implementable training algorithm. Vision, Image and Signal Processing, IEE Proceedings, 150 (3), 153–158.

Clarke, E., Biere, A., Raimi, R., & Zhu, Y. (2001). Bounded model checking using satisfiability solving. Formal Methods in System Design, 19 (1), 7–34.

Cloete, I., & Zurada, J. (Eds.). (2000). Knowledge-Based Neurocomputing. MIT Press.

Coradeschi, S., & Saffiotti, A. (2000). Anchoring symbols to sensor data: Preliminary report. In Proceedings of the 17th AAAI Conference, pp. 129–135. AAAI Press.

Cowell, R., & Cottrell, G. (2009). Virtual brain reading: A connectionist approach to understanding fMRI patterns. Journal of Vision, 9 (8), 472–472.

Cutsuridis, V., Hussain, A., & Taylor, J. G. (Eds.). (2011). Perception-Action Cycle: Models, Architectures, and Hardware. Springer.

Damasio, A. (2008). Descartes' error: Emotion, reason and the human brain. Random House.


Davelaar, E., & Usher, M. (2004). An extended buffer model for active maintenance and selective updating. In Proceedings of the 8th Neural Computation and Psychology Workshop, Vol. 15, pp. 3–14. World Scientific.

de Penning, L., Boot, E., & Kappe, B. (2008). Integrating training simulations and e-learning systems: the SimSCORM platform. In Proceedings: Interservice/Industry Training, Simulation, and Education Conference (I/ITSEC).

de Penning, L., d'Avila Garcez, A., Lamb, L. C., & Meyer, J. J. (2010). An integrated neural-symbolic cognitive agent architecture for training and assessment in simulators. In Proceedings of the 6th International Workshop on Neural-Symbolic Learning and Reasoning, NeSy'10, held at AAAI-2010.

de Penning, L., d'Avila Garcez, A., Lamb, L. C., & Meyer, J. J. (2011). A neural-symbolic cognitive agent for online learning and reasoning. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI'11).

de Penning, L., d'Avila Garcez, A. S., Lamb, L. C., Stuiver, A., & Meyer, J. J. C. (2014). Applying neural-symbolic cognitive agents in intelligent transport systems to reduce CO2 emissions. In 2014 International Joint Conference on Neural Networks (IJCNN), pp. 55–62.

de Penning, L., den Hollander, R., Bouma, H., Burghouts, G., & Garcez, A. d. (2012). A neural-symbolic cognitive agent with a mind's eye. In Proceedings of the AAAI Workshop on Neural-Symbolic Learning and Reasoning (NeSy) 2012.

de Raedt, L., Kersting, K., Natarajan, S., & Poole, D. (2016). Statistical Relational Artificial Intelligence: Logic, Probability, and Computation. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan and Claypool Publishers.

Domingos, P., & Lowd, D. (2009). Markov Logic: An Interface Layer for Artificial Intelligence. Morgan & Claypool.

Domingos, P. (2008). Markov logic: a unifying language for knowledge and information management. In Proceedings of the 17th ACM Conference on Information and Knowledge Management, pp. 519–519. ACM.

Donlon, J. (2010). DARPA’s Mind’s Eye Program: Broad Agency Announcement. Arlington,USA.

Dumontheil, I., Thompson, R., & Duncan, J. (2011). Assembly and use of new task rules in fronto-parietal cortex. Journal of Cognitive Neuroscience, 23 (1), 168–182.

Duncan, J., Parr, A., Woolgar, A., Thompson, R., Bright, P., Cox, S., Bishop, S., & Nimmo-Smith, I. (2008). Goal neglect and Spearman's g: competing parts of a complex task. Journal of Experimental Psychology: General, 137 (1), 131.

Engel, A. K., & Singer, W. (2001). Temporal binding and the neural correlates of sensory awareness. Trends in cognitive sciences, 5 (1), 16–25.

Evans, J. S. B. (2003). In two minds: dual-process accounts of reasoning. Trends in cognitive sciences, 7 (10), 454–459.

Fagin, R., Halpern, J., Moses, Y., & Vardi, M. (1995). Reasoning About Knowledge. MIT Press.


Feldman, J. (2013). The neural binding problem(s). Cognitive neurodynamics, 7 (1), 1–11.

Fernlund, H. K., Gonzalez, A. J., Georgiopoulos, M., & DeMara, R. F. (2006). Learningtactical human behavior through observation of human performance. Systems, Man,and Cybernetics, Part B: Cybernetics, IEEE Transactions on, 36 (1), 128–140.

Fodor, J. A. (2001). Language, thought and compositionality. Mind & Language, 16 (1),1–15.

Fodor, J. A., & Pylyshyn, Z. W. (1988). Connectionism and cognitive architecture: A critical analysis. Cognition, 28 (1), 3–71.

Foerster, J. N., Assael, Y. M., de Freitas, N., & Whiteson, S. (2016). Learning to communicate to solve riddles with deep distributed recurrent Q-networks. arXiv:1602.02672v1 [cs.AI], 8 Feb 2016.

Franca, M., Zaverucha, G., & Garcez, A. (2014). Fast relational learning using bottom clause propositionalization with artificial neural networks. Machine Learning, 94 (1), 81–104.

Fransen, E., Tahvildari, B., Egorov, A. V., Hasselmo, M. E., & Alonso, A. A. (2006). Mechanism of graded persistent cellular activity of entorhinal cortex layer V neurons. Neuron, 49 (5), 735–746.

Funahashi, K.-I. (1989). On the approximate realization of continuous mappings by neural networks. Neural Networks, 2 (3), 183–192.

Garcez, A., Besold, T. R., de Raedt, L., Foldiak, P., Hitzler, P., Icard, T., Kuhnberger, K.-U., Lamb, L., Miikkulainen, R., & Silver, D. (2015). Neural-Symbolic Learning and Reasoning: Contributions and Challenges. In AAAI Spring 2015 Symposium on Knowledge Representation and Reasoning: Integrating Symbolic and Neural Approaches, Vol. SS-15-03 of AAAI Technical Reports. AAAI Press.

Garcez, A., Broda, K., & Gabbay, D. M. (2002). Neural-Symbolic Learning Systems: Foundations and Applications. Perspectives in Neural Computing. Springer.

Garcez, A., & Gabbay, D. M. (2004). Fibring neural networks. In Proceedings of the 19th National Conference on Artificial Intelligence (AAAI-2004), pp. 342–347. AAAI Press.

Garcez, A., Lamb, L. C., & Gabbay, D. M. (2009). Neural-Symbolic Cognitive Reasoning. Cognitive Technologies. Springer.

Garcez, A., & Zaverucha, G. (1999). The connectionist inductive learning and logic programming system. Applied Intelligence, 11 (1), 59–77.

Graves, A., Wayne, G., & Danihelka, I. (2014). Neural Turing machines. arXiv:1410.5401v2 [cs.NE], 10 Dec 2014.

Gust, H., Kuhnberger, K.-U., & Geibel, P. (2007). Learning models of predicate logical theories with neural networks based on topos theory. In Perspectives of Neural-Symbolic Integration, pp. 233–264. Springer.

Hammer, B., & Hitzler, P. (Eds.). (2007). Perspectives of Neural-Symbolic Integration. Springer.

Hasselmo, M. E., & Wyble, B. P. (1997). Free recall and recognition in a network model of the hippocampus: simulating effects of scopolamine on human memory function. Behavioural Brain Research, 89 (1), 1–34.

Heinke, D., & Humphreys, G. W. (2003). Attention, spatial representation, and visual neglect: Simulating emergent attention and spatial memory in the selective attention for identification model (SAIM). Psychological Review, 110 (1), 29–87.

Heuvelink, A. (2009). Cognitive Models for Training Simulations. Ph.D. thesis, Dutch Graduate School for Information and Knowledge Systems (SIKS), Vrije Universiteit Amsterdam, Amsterdam, The Netherlands.

Hinton, G. E., Osindero, S., & Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18 (7), 1527–1554.

Hitzler, P., Holldobler, S., & Seda, A. K. (2004). Logic programs and connectionist networks. Journal of Applied Logic, 2 (3), 245–272.

Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9, 1735–1780.

Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. The Journal of Physiology, 117 (4), 500–544.

Holldobler, S. (1990). A structured connectionist unification algorithm. In Proceedings of the 8th National Conference of the American Association for Artificial Intelligence (AAAI-1990), Vol. 90, pp. 587–593.

Holldobler, S., & Kalinke, Y. (1994). Toward a new massively parallel computational model for logic programming. In Proceedings of the Workshop on Combining Symbolic and Connectionist Processing, held at ECAI-94, pp. 68–77.

Holldobler, S., Kalinke, Y., & Storr, H.-P. (1999). Approximating the semantics of logic programs by recurrent neural networks. Applied Intelligence, 11 (1), 45–58.

Holldobler, S., & Kurfess, F. (1992). CHCL: A connectionist inference system. Springer.

Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20 (11), 1254–1259.

Jackendoff, R. (2002). Foundations of Language: Brain, Meaning, Grammar. Oxford University Press.

Jaeger, H. (2014). Controlling recurrent neural networks by conceptors. arXiv:1403.3369v1 [cs.NE], 13 Mar 2014.

Ji, S., Xu, W., Yang, M., & Yu, K. (2013). 3D convolutional neural networks for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35 (1), 221–231.

Johnson-Laird, P. N. (1983). Mental models: Towards a cognitive science of language, inference, and consciousness. No. 6 in Cognitive Science Series. Harvard University Press.

Karpathy, A., & Fei-Fei, L. (2015). Deep visual-semantic alignments for generating image descriptions. arXiv:1412.2306v2 [cs.CV], 14 Apr 2015.

Karpathy, A., Johnson, J., & Fei-Fei, L. (2015). Visualizing and understanding recurrent networks. arXiv:1506.02078v2 [cs.LG], 17 Nov 2015.

Kiddon, C., & Domingos, P. (2011). Coarse-to-fine inference and learning for first-order probabilistic models. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence (AAAI-2011), San Francisco, CA. AAAI Press.

Kieras, D., Meyer, D., Mueller, S., & Seymour, T. (1999). Insights into working memory from the perspective of the EPIC architecture for modeling skilled perceptual-motor and cognitive human performance. In Miyake, A., & Shah, P. (Eds.), Models of Working Memory. Cambridge University Press.

Kowalski, R. (2011). Computational Logic and Human Thinking: How to be Artificially Intelligent. Cambridge University Press.

Krogel, M.-A., Rawles, S., Zelezny, F., Flach, P. A., Lavrac, N., & Wrobel, S. (2003). Comparative evaluation of approaches to propositionalization. In Horvath, T., & Yamamoto, A. (Eds.), Inductive Logic Programming, Vol. 2835 of Lecture Notes in Computer Science, pp. 197–214. Springer.

Lamb, L. C., Borges, R. V., & Garcez, A. (2007). A connectionist cognitive model for temporal synchronisation and learning. In Proceedings of the 22nd National Conference on Artificial Intelligence (AAAI-2007), pp. 827–832. AAAI Press.

Lehmann, J., Bader, S., & Hitzler, P. (2010). Extracting reduced logic programs from artificial neural networks. Applied Intelligence, 32 (3), 249–266.

Leitgeb, H. (2005). Interpreted dynamical systems and qualitative laws: From neural networks to evolutionary systems. Synthese, 146, 189–202.

Li, Z. (2001). Computational design and nonlinear dynamics of a recurrent network model of the primary visual cortex. Neural Computation, 13 (8), 1749–1780.

Lima, P. (2000). Resolution-Based Inference on Artificial Neural Networks. Ph.D. thesis, Imperial College London, London, UK.

Lowd, D., & Domingos, P. (2007). Efficient weight learning for Markov logic networks. In Proceedings of the Eleventh European Conference on Principles and Practice of Knowledge Discovery in Databases, pp. 200–211, Warsaw, Poland. Springer.

Maass, W. (2002). Paradigms for computing with spiking neurons. In Models of Neural Networks IV, pp. 373–402. Springer.

Marcus, G. F. (2003). The algebraic mind: Integrating connectionism and cognitive science. MIT Press.

McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115–133.

Muggleton, S., & de Raedt, L. (1994). Inductive logic programming: Theory and methods. The Journal of Logic Programming, 19 (Special issue: Ten years of logic programming), 629–679.

Nath, A., & Domingos, P. (2015). Learning relational sum-product networks. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-2015), Austin, TX. AAAI Press.

Nersessian, N. (2010). Creating scientific concepts. MIT Press.

Newell, A. (1994). Unified theories of cognition. Harvard University Press.

Norman, D. A., & Shallice, T. (1986). Attention to action. In Davidson, R., Schwartz, G., & Shapiro, D. (Eds.), Consciousness and Self-Regulation, pp. 1–18. Springer.

Norman, K. A., & O’Reilly, R. C. (2003). Modeling hippocampal and neocortical contributions to recognition memory: a complementary-learning-systems approach. Psychological Review, 110 (4), 611.

O’Reilly, R., Busby, R., & Soto, R. (2003). Three forms of binding and their neural substrates: Alternatives to temporal synchrony. In Cleeremans, A. (Ed.), The unity of consciousness: Binding, integration, and dissociation, pp. 168–192. Oxford University Press.

O’Reilly, R. C., & Munakata, Y. (2000). Computational explorations in cognitive neuroscience: Understanding the mind by simulating the brain. MIT Press.

Page, M. (2000). Connectionist modelling in psychology: A localist manifesto. Behavioral and Brain Sciences, 23 (4), 443–467.

Pereira, L. M. (2012). Turing is among us. Journal of Logic and Computation, 22 (6), 1257–1277.

Pinkas, G. (1991a). Constructing syntactic proofs in symmetric networks. In Advances in Neural Information Processing Systems (NIPS-91), pp. 217–224.

Pinkas, G. (1991b). Symmetric neural networks and propositional logic satisfiability. Neural Computation, 3, 282–291.

Pinkas, G., Lima, P., & Cohen, S. (2012). Dynamic binding mechanism for retrieving and unifying complex predicate-logic knowledge. In Proceedings of the International Conference on Artificial Neural Networks (ICANN) 2012.

Pinkas, G., Lima, P., & Cohen, S. (2013). Representing, binding, retrieving and unifying relational knowledge using pools of neural binders. Journal of Biologically Inspired Cognitive Architectures (BICA), 6, 87–95.

Pinkas, G. (1995). Reasoning, nonmonotonicity and learning in connectionist networks that capture propositional knowledge. Artificial Intelligence, 77 (2), 203–247.

Pinkas, G., & Dechter, R. (1995). Improving connectionist energy minimization. Journal of Artificial Intelligence Research, 3, 223–248.

Pinker, S., Nowak, M. A., & Lee, J. J. (2008). The logic of indirect speech. Proceedings of the National Academy of Sciences USA, 105, 833–838.

Pinker, S., & Ullman, M. T. (2002). The past and future of the past tense. Trends in Cognitive Sciences, 6 (11), 456–463.

Plate, T. (1995). Holographic reduced representations. IEEE Transactions on Neural Networks, 6 (3), 623–641.

Pnueli, A. (1977). The temporal logic of programs. In Proceedings of the 18th Annual Symposium on Foundations of Computer Science, pp. 46–57.

Pollack, J. B. (1990). Recursive distributed representations. Artificial Intelligence, 46 (1), 77–105.

Poon, H., & Domingos, P. (2011). Sum-product networks: A new deep architecture. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence (UAI) 2011, pp. 337–346.

Poon, H., Domingos, P., & Sumner, M. (2008). A general method for reducing the complexity of relational inference and its application to MCMC. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI-2008).

Raizada, R. D., & Grossberg, S. (2003). Towards a theory of the laminar architecture of cerebral cortex: Computational clues from the visual system. Cerebral Cortex, 13 (1), 100–113.

Reed, S. E., Zhang, Y., Zhang, Y., & Lee, H. (2015). Deep visual analogy-making. In Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., & Garnett, R. (Eds.), Advances in Neural Information Processing Systems 28, pp. 1252–1260. Curran Associates, Inc.

Richardson, M., & Domingos, P. (2006). Markov logic networks. Machine Learning, 62, 107–136.

Riedel, S. (2008). Improving the accuracy and efficiency of MAP inference for Markov logic. In Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI-2008), pp. 468–475, Helsinki, Finland. AUAI Press.

Rigotti, M., Rubin, D. B. D., Wang, X.-J., & Fusi, S. (2010). Internal representation of task rules by recurrent dynamics: the importance of the diversity of neural responses. Frontiers in Computational Neuroscience, 4.

Rumelhart, D. E., McClelland, J. L., & PDP Research Group (Eds.). (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition (Vol. 1 & 2). MIT Press.

Salakhutdinov, R., & Hinton, G. (2009). Deep Boltzmann machines. In Proceedings of the International Conference on Artificial Intelligence and Statistics, pp. 448–455.

Sandercock, J. (2004). Lessons for the construction of military simulators: a comparison of artificial intelligence with human-controlled actors. Tech. rep. TR-1614, DSTO.

Scherer, K. (1999). Appraisal theory. In Dalgleish, T., & Power, M. (Eds.), Handbook of cognition and emotion, pp. 637–663. John Wiley & Sons.

Serre, T., Oliva, A., & Poggio, T. (2007). A feedforward architecture accounts for rapid categorization. Proceedings of the National Academy of Sciences, 104 (15), 6424–6429.

Shallice, T., & Cooper, R. (2011). The organisation of mind. Oxford University Press.

Shastri, L. (2007). Shruti: A neurally motivated architecture for rapid, scalable inference. In Hammer, B., & Hitzler, P. (Eds.), Perspectives of Neural-Symbolic Integration, Vol. 77 of Studies in Computational Intelligence, pp. 183–203. Springer.

Shastri, L., & Ajjanagadde, V. (1993). From simple associations to systematic reasoning: A connectionist representation of rules, variables and dynamic bindings using temporal synchrony. Behavioral and Brain Sciences, 16 (3), 417–451.

Siegelmann, H. (1999). Neural Networks and Analog Computation: Beyond the Turing Limit. Birkhauser.

Silver, D., Huang, A., Maddison, C., Guez, A., Sifre, L., van den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., Dieleman, S., Grewe, D., Nham, J., Kalchbrenner, N., Sutskever, I., Lillicrap, T., Leach, M., Kavukcuoglu, K., Graepel, T., & Hassabis, D. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529, 484–489.

Singla, P., & Domingos, P. (2006). Memory-efficient inference in relational domains. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-2006), pp. 488–493. AAAI Press.

Singla, P., & Domingos, P. (2008). Lifted first-order belief propagation. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI-2008), pp. 1094–1099. AAAI Press.

Sloman, A., & Logan, B. (1999). Building cognitively rich agents using the SIM_AGENT toolkit. Communications of the ACM, 42 (3), 71–ff.

Smith, E., & Kosslyn, S. (2006). Cognitive psychology: Mind and brain. Upper Saddle River, NJ: Prentice-Hall Inc.

Smolensky, P. (1986). Information processing in dynamical systems: Foundations of harmony theory. In Rumelhart, D. E., & McClelland, J. L. (Eds.), Parallel Distributed Processing: Volume 1: Foundations. MIT Press.

Sourek, G., Aschenbrenner, V., Zelezny, F., & Kuzelka, O. (2015). Lifted relational neural networks. In Proceedings of the Workshop on Cognitive Computation: Integrating neural and symbolic approaches 2015, Vol. 1583 of CEUR Workshop Proceedings. CEUR-WS.org.

Stewart, T. C., & Eliasmith, C. (2008). Building production systems with realistic spiking neurons. In Proceedings of the 30th Annual Meeting of the Cognitive Science Society. Austin, TX: Cognitive Science Society.

Stolcke, A. (1989). Unification as constraint satisfaction in structured connectionist networks. Neural Computation, 1 (4), 559–567.

Sun, R. (1994). A neural network model of causality. IEEE Transactions on Neural Networks, 5 (4), 604–611.

Sun, R. (2002). Hybrid systems and connectionist implementationalism. In Encyclopedia of Cognitive Science, pp. 697–703. Nature Publishing Group (MacMillan).

Sun, R. (2003). Duality of the Mind: A Bottom-up Approach towards Cognition. Lawrence Erlbaum Associates.

Sun, R. (2009). Theoretical status of computational cognitive modeling. Cognitive Systems Research, 10 (2), 124–140.

Sutskever, I., Hinton, G. E., & Taylor, G. W. (2009). The recurrent temporal restricted Boltzmann machine. In Advances in Neural Information Processing Systems, pp. 1601–1608.

Taatgen, N. A., Juvina, I., Schipper, M., Borst, J. P., & Martens, S. (2009). Too much control can hurt: A threaded cognition model of the attentional blink. Cognitive Psychology, 59 (1), 1–29.

Tabor, W. (2009). A dynamical systems perspective on the relationship between symbolic and non-symbolic computation. Cognitive Neurodynamics, 3 (4), 415–427.

Taskar, B., Chatalbashev, V., & Koller, D. (2004). Learning associative Markov networks. In Proceedings of the 21st International Conference on Machine Learning (ICML-2004), pp. 102–111. ACM.

Thagard, P. (2010). How brains make mental models. In Model-Based Reasoning in Science and Technology, pp. 447–461. Springer.

Thagard, P., & Stewart, T. C. (2011). The aha! experience: Creativity through emergent binding in neural networks. Cognitive Science, 35 (1), 1–33.

Tieleman, T. (2008). Training restricted Boltzmann machines using approximations to the likelihood gradient. In Proceedings of the Twenty-Fifth International Conference on Machine Learning (ICML-2008), pp. 1064–1071. ACM.

Towell, G. G., & Shavlik, J. W. (1994). Knowledge-based artificial neural networks. Artificial Intelligence, 70 (1), 119–165.

Tran, S., & Garcez, A. (2012). Logic extraction from deep belief networks. In Proceedings of the ICML Workshop on Representation Learning, held at ICML-2012.

Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27 (11), 1134–1142.

Valiant, L. G. (2000). A neuroidal architecture for cognitive computation. Journal of the ACM (JACM), 47 (5), 854–882.

Valiant, L. G. (2003). Three problems in computer science. Journal of the ACM (JACM), 50 (1), 96–99.

Valiant, L. G. (2008). Knowledge infusion: In pursuit of robustness in artificial intelligence. In FSTTCS, pp. 415–422. Citeseer.

Van Dalen, D. (2002). Intuitionistic logic. In Handbook of Philosophical Logic, pp. 1–114. Springer.

van den Bosch, K., & Riemersma, J. B. (2004). Reflections on scenario-based training in tactical command. In Schiflett, S., Elliott, L., Salas, E., & Coovert, M. (Eds.), Scaled Worlds: Development, validation and applications, pp. 1–21. Ashgate.

Van der Velde, F., & De Kamps, M. (2006). Neural blackboard architectures of combinatorial structures in cognition. Behavioral and Brain Sciences, 29 (1), 37–70.

van Emde Boas, P. (1990). Machine models and simulations. In Handbook of Theoretical Computer Science, Vol. A. Elsevier.

van Rooij, I. (2008). The tractable cognition thesis. Cognitive Science, 32, 939–984.

Vardi, M. Y. (1996). Why is modal logic so robustly decidable? In Descriptive Complexity and Finite Models, Proceedings of a DIMACS Workshop 1996, Princeton, New Jersey, USA, January 14-17, 1996, pp. 149–184.

Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2014). Show and tell: A neural image caption generator. arXiv:1411.4555v1 [cs.CV], 17 Nov 2014.

Von Neumann, J. (1945). First draft of a report on the EDVAC. Tech. rep.

Weston, J., Chopra, S., & Bordes, A. (2015). Memory networks. arXiv:1410.3916v9 [cs.AI], 9 Apr 2015.

Wooldridge, M. (2009). An introduction to multiagent systems. John Wiley & Sons.

Wyble, B., & Bowman, H. (2006). A neural network account of binding discrete items into working memory using a distributed pool of flexible resources. Journal of Vision, 6 (6), 33–33.
