Merging Bottom-Up and Top-Down Knowledge Graphs for Intuitive...

15
Merging Bottom-Up and Top-Down Knowledge Graphs for Intuitive Knowledge Browsing Gwendolin Wilke, Sandro Emmenegger, Jonas Lutz, and Michael Kaufmann Abstract The Lokahi Enterprise Knowledge Browser provides an intuitive and flexi- ble way to query a company’s intranet knowledge. In addition to conventional search capabilities, it allows the user to browse through a semi-automatically generated knowledge map that visualizes intranet knowledge as a network/graph structure of semantic relations that are extracted top-down from structured documents, as well as bottom-up from unstructured documents. This paper describes the underlying fuzzy graph data structure, the method for extracting concepts and associations from text documents, and the merging of the resulting data structure with a predefined enterprise ontology. Keywords Enterprise search · Knowledge browsing · Knowledge map · Knowledge graph · Fuzziness · Association rule mining · Connectivism G. Wilke(B ) Institute of Business Information Management IWI, Lucerne University of Applied Sciences and Arts, Zentralstr. 9, 2940, 6002 Lucerne, Switzerland e-mail: [email protected] S. Emmenegger · J. Lutz Institute for Information Systems, University of Applied Sciences and Arts Northwestern Switzerland, Riggenbachstr. 16, 4600 Olten, Switzerland e-mail: {sandro.emmenegger,jonas.lutz}@fhnw.ch M. Kaufmann School of Engineering and Architecture, Lucerne University of Applied Sciences and Arts, Technikumstr. 21, 6048 Horw, Switzerland e-mail: [email protected] © Springer International Publishing Switzerland 2016 T. Andreasen et al. (eds.), Flexible Query Answering Systems 2015, Advances in Intelligent Systems and Computing 400, DOI: 10.1007/978-3-319-26154-6_34 445

Transcript of Merging Bottom-Up and Top-Down Knowledge Graphs for Intuitive...

Page 1: Merging Bottom-Up and Top-Down Knowledge Graphs for Intuitive …static.tongtianta.site/paper_pdf/0025b2a2-5ced-11e9-9dd7... · 2019-04-12 · (GUI design) will be detailed in a separate

Merging Bottom-Up and Top-DownKnowledge Graphs for Intuitive KnowledgeBrowsing

Gwendolin Wilke, Sandro Emmenegger, Jonas Lutz,and Michael Kaufmann

Abstract TheLokahiEnterpriseKnowledgeBrowser provides an intuitive andflexi-ble way to query a company’s intranet knowledge. In addition to conventional searchcapabilities, it allows the user to browse through a semi-automatically generatedknowledge map that visualizes intranet knowledge as a network/graph structure ofsemantic relations that are extracted top-down from structured documents, as wellas bottom-up from unstructured documents. This paper describes the underlyingfuzzy graph data structure, the method for extracting concepts and associations fromtext documents, and the merging of the resulting data structure with a predefinedenterprise ontology.

Keywords Enterprise search ·Knowledge browsing ·Knowledgemap ·Knowledgegraph · Fuzziness · Association rule mining · Connectivism

G. Wilke(B)

Institute of Business Information Management IWI, Lucerne University of Applied Sciencesand Arts, Zentralstr. 9, 2940, 6002 Lucerne, Switzerlande-mail: [email protected]

S. Emmenegger · J. LutzInstitute for Information Systems, University of Applied Sciences and Arts NorthwesternSwitzerland, Riggenbachstr. 16, 4600 Olten, Switzerlande-mail: {sandro.emmenegger,jonas.lutz}@fhnw.ch

M. KaufmannSchool of Engineering and Architecture, Lucerne University of Applied Sciences and Arts,Technikumstr. 21, 6048 Horw, Switzerlande-mail: [email protected]

© Springer International Publishing Switzerland 2016T. Andreasen et al. (eds.), Flexible Query Answering Systems 2015,Advances in Intelligent Systems and Computing 400,DOI: 10.1007/978-3-319-26154-6_34

445

Page 2: Merging Bottom-Up and Top-Down Knowledge Graphs for Intuitive …static.tongtianta.site/paper_pdf/0025b2a2-5ced-11e9-9dd7... · 2019-04-12 · (GUI design) will be detailed in a separate

446 G. Wilke et al.

1 Introduction

Value creation in companies relies increasingly on knowledge work; that is, creatingmanaging and exchanging information. Intranet search engines are often indispens-able tools to make explicitly stored information accessible for knowledge workers.Yet, often, a desired piece of information can only be found if context-specific key-words are known, and, as a result, much of the know-how and the stored informationoften remains inaccessible. To address this issue,Kaufmann et. al. [9] proposed a con-ceptual framework for an enterprise knowledge browser, which, in addition to con-ventional search functionality, provides theuserwith thepossibility tovisuallybrowsethrough a company’s knowledge landscape, which is represented as a semantic graphstructure. A prototype of the enterprise knowledge browser including an intuitive, in-teractive graphical user interface (GUI) - is currently under development as part of theresearch projectLOKAHI Inside in collaborationwith the software companyFIVE In-formatikAG1, andwecall it theLokahi Enterprise Knowledge Browser (LEKB).Thispaper contributes amoredetaileddescriptionof theconcepts introduced in [9], namelythe method for extracting concepts and associations, and the method for merging thisextracted information with a predefined enterprise ontology.

The paper is structured as follows: Section 2 gives a brief overview of the compo-nents and design principles of the LEKB as far as they are relevant for the underlyingLokahi knowledge graph discussed in this paper. Section 3 gives a brief accountof related approaches to networked knowledge representation. Section 4 introducesthe Lokahi Knowledge Graph (LKG) as an extension of the graph representationunderlying the Entity-Relationship-Model [3]. Sections 5 and 6 describe the Lokahibottom-up and top-down knowledge extraction approaches, respectively, togetherwith their prototypical implementations and formal representations asLokahiKnowl-edge Graphs. The merging of the two graphs is described in section 7, and the resultis the graph structure underlying the LEKB. Finally, section 8 concludes the paperand gives an outlook on further work.

2 The Lokahi Enterprise Knowledge Browser

The project Lokahi is currently developing a search interface that is based on aninteractive fuzzy graph of keyword associations to enable flexible query answering.A screenshot of the current implementation is presented in Figure 1.

The user can search for keywords, and the system displays (1) a list of associatedfiles and (2) a visualization of a browsable interactive graph that contains associatedkeywords. If the user enters a search keyword, the system returns not only a standardlist of relevant documents, but also a knowledge map that indicates other keywordsassociated with the search. Thereby, the user can browse and explore the knowledgelandscape. By double-clicking on nodes in the graph, the user can add keywords tothe filter. Single-clicking on a node navigates the display is to show the selected node

1 http://www.fiveinfo.ch

Page 3: Merging Bottom-Up and Top-Down Knowledge Graphs for Intuitive …static.tongtianta.site/paper_pdf/0025b2a2-5ced-11e9-9dd7... · 2019-04-12 · (GUI design) will be detailed in a separate

Merging Bottom-Up and Top-Down Knowledge Graphs 447

Fig. 1 Screenshot of the current Lokahi Enterprise Knowledge Browser GUI prototype.

as the new central concept. The example in figure 1 is based on bottom-up extractionof keywords and associations from a sample of Wikipedia pages, using the methoddescribed in section 5.

The Lokahi Enterprise Knowledge Browser comprises three major components:Thefirstcomponent,knowledgerepresentation,istheLokahiKnowledgeGraph(LKG),whichisafuzzydirected labeledmultigraph. It is theunderlyingdatastructureandcon-ceptual basis for the Lokahi Enterprise Knowledge Browser and provides the formalrepresentation of the visualized semantic network of interconnected keywords.

The second component is the knowledge extraction approach, which generates aconcrete instantiation of an LKG from data. This approach consists of three steps:In the first step, a so-called bottom-up Lokahi Knowledge Graph is generated by au-tomatic information extraction from unstructured documents (text files, wiki pages,emails, websites, etc.). In the second step, structured data, such as data available fromenterprise application systems, are enrichedwith generic knowledge on enterprise ar-chitectures to define an enterprise ontology. The result is a so-called top-down LokahiKnowledge Graph. In the third step, both LKGs are merged, and the resulting LKG isthe semantic network that is visualized by theLokahi EnterpriseKnowledgeBrowser.

The third component is the GUI design, which is of paramount importance foruser acceptance of the LEKB: A semantic network like the LKG is usually verycomplex, comprising numerous keywords and semantic connections. As a result, itusually cannot be represented as a planar graph, and, furthermore, it is usually far toobig to be displayed as a whole. Therefore, we only display clippings (subgraphs) ofthe underlying LKG that each represent a local semantic context of a user-selectedkeyword. Here, it is essential to provide an intuitiveway of visually browsing through(and editing) the complete underlying knowledge graph based on single clippings.

Page 4: Merging Bottom-Up and Top-Down Knowledge Graphs for Intuitive …static.tongtianta.site/paper_pdf/0025b2a2-5ced-11e9-9dd7... · 2019-04-12 · (GUI design) will be detailed in a separate

448 G. Wilke et al.

In [9], we described the conceptual framework, design principles and generalarchitecture of theLokahiEnterpriseKnowledgeBrowser.While the third component(GUI design) will be detailed in a separate publication, the present paper elaboratesin greater detail the first and second components, namely the underlying graph datastructure and the knowledge engineering approach used to enrich it with data, cf.sections 5-7.

2.1 Supporting the Connectivist Learning Paradigm

Connectivism [18] is a relatively new learning theory that was developedwith a focuson understanding knowledge in present-day socio-technological interactions. It ex-plains learning in terms of connecting new information within a network of alreadyknown concepts. While the classic learning theories (behaviorism, cognitivism andconstructivism) adopt the view that learning only occurs inside a person’s brain, con-nectivism additionally addresses learning that occurs in the external world. Knowl-edge, in that understanding, can be seen as a subclass of information that is usable inthe sense of Shadbolt [17]. Using technology, connection making can be representedoutside the human brain, e.g., in computer systems. Therefore, connectivism canexplain organizational learning in terms of making connections in the organizationalmemory. A knowledge management system that supports the process of connectivistlearning can be called a connective knowledge management system (CKMS), andit is this idea that underlies the Lokahi Enterprise Knowledge Browser (LEKB). ACKMS empowers individuals and organizations to cope with large amounts of fastchanging information by providing them with tools to interactively and iterativelybuild up a personal or organizational knowledge network structure that embeds sin-gle pieces of information in a conceptual context. It thus provides a human-centeredapproach to knowledge management. The LEKB implements the idea of a CKMS byallowing the user to visually browse (and edit) the organizational knowledge networkstructure and to access related documents directly in the GUI.

2.2 Top-Down Knowledge Structures ComplementBottom-Up Knowledge Extraction

The knowledge network structure underlying the LEKB is generated using both,bottom-up and top-down knowledge engineering approaches. In the bottom-up ap-proach, automated keyword extraction from unstructured documents (e.g., text files,emails, or wiki-pages) is used together with fuzzy association rule mining to gener-ate a bottom-up fuzzy knowledge graph. In the complimentary top-down approach, anRDFS representation of an enterprise ontology is semi-automatically generated fromstructured documents (e.g., froma customer relationshipmanagement system) and anenterprise architecture framework. The result is a top-down (crisp) knowledge graph.Both graphs are merged. The merge is driven by the usually much smaller top-downgraph, so that the top-down structure (which may be specified by the management)

Page 5: Merging Bottom-Up and Top-Down Knowledge Graphs for Intuitive …static.tongtianta.site/paper_pdf/0025b2a2-5ced-11e9-9dd7... · 2019-04-12 · (GUI design) will be detailed in a separate

Merging Bottom-Up and Top-Down Knowledge Graphs 449

forms the structural skeleton of the LEKB. It also helps linking company internal vo-cabulary to general terms and technical language, thereby mitigating the problem ofintuitive labelling in pure bottom-up approaches.The automatic generation of bottom-up keywords and their association with the top-down skeleton in turn mitigates theproblem of expensive manual maintenance present in pure top-down approaches.

2.3 Fuzzy Associations for Intuitive Browsing

The Lokahi knowledge graph is a fuzzy graph. The semantic associations betweenkeywords are represented as fuzzy directed arcs, and the fuzzy membership degreereflects the strength of their semantic connection. In the (crisp) visualization in theGUI, the fuzziness of semantic connections is used to implement a semantic zoomingfunctionality that resembles zooming in a geographic map: Zooming out emphasizestermswith stronger semantic connections and blanks termswithweaker connections;as one zooms in, more detail becomes visible. As a consequence, users can carryout interaction with the system on a suitable level of information granularity, whichprevents them from getting lost in a flood of detail. Since the top-down graph is crisp,the semantic connections between top-down terms are always visible and providethe visual “skeleton” of the map.

3 Related Approaches

Network structures for knowledge representation have a longstanding history in dif-ferent application areas and research fields, and we give only a brief account of someprominent approaches that are related to our approach. One of the first to proposenetwork structures for capturing knowledge was C. S. Peirce, who in 1882 proposedexistential graphs as a form of visual notation for logical expressions [14]. In 1962,M. Masterman [12] introduced semantic networks as a form of graph-based nota-tion for linguistic expressions that abstract from the “base language.” Based on thisidea, J. Sowa [19] devised conceptual graphs (CG), which map natural languagequestions and assertions to a relational database. CGs are bipartite graphs, in whichconcepts (entities) and associations (relationships) are represented by two differenttypes of nodes.

Another approach is the semantic networks of M. R. Quillian [16], which areintended to represent the semantics of English words in the form of a network. In1976, P. Chen wrote his seminal paper in which he introduced the entity relationshipmodel for knowledge representation [3]. J. D. Novak and D. B. Gowin [13] didresearch in the domain of human learning and knowledge construction. Their studiesled to the development of concept maps, which enable users to visualize knowledgethat is easily comprehensible by others. Ten years later, [2] introduced topic maps,which led to the ISO Standard 13250 of the workgroup ISO JTC1/SC34/WG32. Aselaborated in the following section, the knowledge network representation in the

2 http://www.topicmaps.org/xtm/

Page 6: Merging Bottom-Up and Top-Down Knowledge Graphs for Intuitive …static.tongtianta.site/paper_pdf/0025b2a2-5ced-11e9-9dd7... · 2019-04-12 · (GUI design) will be detailed in a separate

450 G. Wilke et al.

form of a fuzzy labeled dimultigraph as proposed here is derived from Chen’s entityrelationship model.

4 The Lokahi Knowledge Graph and Its Relation to theER-Model

In Chen’s entity relationship (ER) model [3], an entity is “a ‘thing’ that can bedistinctly identified”, and a relationship is “an association among entities.” Basedon that, a simplified version that omits attributes, an ER-graph can be defined as adirected weighted graph G = (V, A, W, p, w) according to [10, pp.1-2], with setsV, A and W , an incidence mapping p : A → V 2 and a weight function w. Here, Grepresents a knowledge network with entity weights and relationships. Every entityis represented by a vertex v ∈ V ; every relationship is represented by an arc a ∈ A,and every relationship role is a symbolic weight (or label) w ∈ W . The incidencemapping p defines two mappings s, t : A → V by (s(a), t (a)) := p(a), wheres(a) is called the origin (start, head) of a, and t(a) is the tail of a. For every arc a,the symbolic weight w(a) represents the role of the entity t (a) in the relationship.Accordingly, arcs represent knowledge of typed associations between entities.

In order to build a knowledge network structure according to the design principleslisted in subsections 2.1-2.3, we extend the above notion of a directed weighted(labeled) graph by a fuzzy membership function for arcs.

Definition 1. The Lokahi Knowledge Graph (LKG) is a fuzzy labeled dimultigraphL = (�, V, A, μA, σ, τ, λ), where

– � is a finite alphabet.– V ⊆ �� is the vertex set of L (where vertices are strings over �).– A is the arc set of L.– μA : A → [0, 1] is a fuzzy membership function that indicates the degree ofmembership of an arc a ∈ A in L;

– σ : A → V and τ : A → V are two maps that indicate the source and targetvertex of an arc;

– λ : V ∪ A → �� is a labeling function for vertices and arcs.

Notice that the arcs of a Lokahi Knowledge Graph are fuzzy, while the vertex setis crisp.

5 The Bottom-Up Lokahi Knowledge Graph

The bottom-up part of the Lokahi knowledge extraction method automatically ex-tracts keywords from a given corpus of text documents in a company. These keywordsand their relations represent implicit knowledge, which is usually distributed over amultitude of documents in the form of unstructured and semi-structured data. Weassume that there is an intersubjective semantic distance measure between keywords

Page 7: Merging Bottom-Up and Top-Down Knowledge Graphs for Intuitive …static.tongtianta.site/paper_pdf/0025b2a2-5ced-11e9-9dd7... · 2019-04-12 · (GUI design) will be detailed in a separate

Merging Bottom-Up and Top-Down Knowledge Graphs 451

that derives from an organization’s collective body of knowledge. For any givenkeyword b, the set of keywords that are semantically close to b (w.r.t. this distancemeasure) can be said to establish the semantic context of b. It is the goal of thebottom-up part of the Lokahi knowledge extraction to quantify the semantic distanceof keywords within a corpus of documents and then visualize their resulting semanticcontexts as a browsable visual knowledge graph.

In order to quantify semantic distance, we use the normalized likelihood ratio(NLR)measure, introduced by [8, p.55]. Portmann et. al. [15] have proposed to applythe NLR to text mining as a semiotic and inductive approach to knowledge retrieval.The NLR is based on the likelihood measure L(b j |bi ): Given a document d that con-tains a keyword of interest bi , L(b j |bi ) expresses the strength of evidence that a docu-mentd that containsbi also containsb j . According to the lawof likelihood [5], it is theratio of likelihoods that is relevant for comparing two hypotheses, not the likelihoodsthemselves. Instead of the likelihood L(b j |bi ) itself, Kaufmann uses the normalizedlikelihood ratio li j to measure semantic distance of keywords in the corpus:

li j := L(b j |bi )

L(b j |bi ) + L(−b j |bi )∈ [0, 1]. (1)

li j is a normalized comparison of the hypotheses b j and −b j under evidence bi .Here, L(b j |bi ) is measured as the number of documents in the corpus that containboth keywords, bi and b j , while L(−b j |bi ) is measured as the number of documentsthat contain bi but not b j . Notice that li j is not necessarily symmetric, and thatthe property expresses the fact that an intuitive notion of semantic closeness is notnecessarily symmetric as well. For example, if John Doe is a Java programmer inthe company, the keyword Java is very closely associated with John Doe, because itis his main occupation. Yet, John Doe may not be the only Java programmer in thecompany and the company may be specialized in Java-based software development.In this case, the semantic binding of the keyword John Doe to Java would intuitivelybe much less close.

The following subsection 5.1 gives a brief outline of a prototypical implementationof the bottom-up knowledge extraction method, and subsection 5.2 provides theformalization of its output as a Lokahi knowledge graph.

5.1 Implementation

Our prototypical implementation of the bottom-up knowledge extraction methodconsists of three parts:

1. Keyword Extraction: In the first step, a set B ⊆ �� of keywords is extractedfrom the corpus C of documents. Here, �� denotes the set of words over theunderlying alphabet � (the Unicode character set). We use the Apache Lucene

Page 8: Merging Bottom-Up and Top-Down Knowledge Graphs for Intuitive …static.tongtianta.site/paper_pdf/0025b2a2-5ced-11e9-9dd7... · 2019-04-12 · (GUI design) will be detailed in a separate

452 G. Wilke et al.

library MoreLikeThis (MLT)3 to extract for each document d the set κ(d) of topn most relevant keywords of the document and then define the set of bottom upconcepts B as the union of all top-n keywords over all documents in the corpus,B = ∪d∈Cκ(d).

2. Frequent Item Set Mining: The second step is a pre-processing step that is usedto decrease the number of keyword-pairs to be processed in the computationallyintensive step three. Using Agrawal’s Apriori algorithm [1], we extract fromthe keyword set B a set F of two-element subsets of keywords. F is definedby F := {{bi , b j }|bi , b j ∈ B, bi �= b j , δ(bi , b j ) ≥ t}. This set is called thefrequent keyword set of C . Here, {bi , b j } is in F , if bi and b j frequently occurin the same document, i.e., if the number of documents δ(bi , b j ) that containboth keywords, bi and b j , is greater than a given threshold t . In our prototypeimplementation, the parameter t was set to t = 0.005, i.e., 0.5% of all documentsmust have contained both keywords.

3. Fuzzy Association Rule Mining: In the third and last step, every two-keywordset {bi , b j } ∈ F is used to generate two fuzzy association rules li j/(bi → b j )

and l j i/(b j → bi ). Here, li j is the normalized likelihood ratio defined in (1),and li j/(bi → b j ) refers to the association rule (bi → b j ) having a fuzzy degreeof membership of li j in the set R := {(bi → b j )|bi , b j ∈ f ∈ F, bi �= b j } ofcrisp association rules.

5.2 Graph Representation

The output of the Lokahi bottom-up knowledge extraction approach described in theforegoing subsection can be formally represented as the fuzzy set R := (R, μR)

of fuzzy association rules between keywords. Here, R is the set of association rulesgenerated in step 3, and μR : R → [0, 1], μR(bi → b j ) = li j , is the correspondingfuzzy membership function. Together with the underlying alphabet �, R can berepresented as a Lokahi Knowledge Graph:

Definition 2. Let � be a finite alphabet, B ⊆ �� and R = (R, μR) a fuzzy set ofassociation rules μR/(bi → b j ) with (bi → b j ) ∈ R, bi , b j ∈ B and μR : R →[0, 1]. The Bottom-Up Lokahi Knowledge Graph (L K G) induced by the pair (�,R)

is the Lokahi Knowledge Graph L = (�, V , A, μA, σ , τ , λ) with

– � := �.– V := head(R) ∪ tail(R) ⊆ �� is the vertex set of L. Here, for every associationrule (bi → b j ) ∈ R, head((bi → b j )) := bi and tail((bi → b j )) := b j .

– A := arc(R) is the arc set ofL. Here, arc is a bijection thatmaps every associationrule (bi → b j ) ∈ R to a unique arc identifier ai j ∈ A.

– μA := μR ◦ arc−1 : A → [0, 1] is the fuzzy membership function that associatesevery arc ai j ∈ A with the fuzzy membership degree μA(ai j ) = μR((bi → b j )).

3 https://lucene.apache.org/core/4_8_0/queries/org/apache/lucene/queries/mlt/MoreLikeThis.html (viewed on2015-04-25)

Page 9: Merging Bottom-Up and Top-Down Knowledge Graphs for Intuitive …static.tongtianta.site/paper_pdf/0025b2a2-5ced-11e9-9dd7... · 2019-04-12 · (GUI design) will be detailed in a separate

Merging Bottom-Up and Top-Down Knowledge Graphs 453

– σ := head ◦ arc−1 : A → V and τ := tail ◦ arc−1 : A → V indicate sourceand target vertex of an arc, respectively.

– λ : V ∪ A → �� labels every vertex with itself and every arc with the empty string

λ(x) :={

x if x ∈ V ,

” ” if x ∈ A.

Notice that the bottom-up Lokahi Knowledge graph L is not a “true” multigraphbecause every pair of vertices (bi , b j ) ∈ V × V can be mapped to exactly one arcai, j ∈ A. Yet, the property is needed later to merge the bottom-up with the top-downknowledge graph, which can have more than one arc between the same vertices.

6 The Top-Down Lokahi Knowledge Graph

The top-down part of the Lokahi knowledge extraction approach consists of twoknowledge engineering steps: In the first step, the existing enterprise meta ontologyARCHIMeo4 is manually extended by an enterprise application ontology that is de-rived from the company’s business context, cf. figure 2(a). Using ARCHIMeo putsthe company specific structure in the context of an enterprise architecture frame-work that serves as a structural “skeleton” in the Lokahi Knowledge Browser5. Inthe second step, the ontology model is semi-automatically enriched with data fromthe company’s application software systems and exported into a simple Java basedrepresentation of a Lokahi knowledge graph.

Fig. 2 (a) The parts of the ARCHIMeo meta ontology. (b) Example details of ARCHIMeoafter step 1.

4 ARCHIMeo has been developed at the University of Applied Sciences NorthwesternSwitzerland (FHNW) as an approach toward relating enterprise ontologies with enterprisearchitectures [6, 7, 20]. It has been successfully applied in risk management, contract man-agement and software integration support [4, 11, 21].

5 We also intend to use it for inferring additional information from themerged graph structuresin future research.

Page 10: Merging Bottom-Up and Top-Down Knowledge Graphs for Intuitive …static.tongtianta.site/paper_pdf/0025b2a2-5ced-11e9-9dd7... · 2019-04-12 · (GUI design) will be detailed in a separate

454 G. Wilke et al.

6.1 Implementation

The existing enterprise ontology ARCHIMeo includes a top-level ontology (includ-ing classes such as PhysicalLocation), as well as a formalization of the enterprise ar-chitecture framework ArchiMate6 as an enterprise upper ontology (modeling classessuch as BusinessActor). In the first step, we manually extend it by an enterpriseapplication ontology, cf. figure 2(b). In the second step, the resulting ontology issemi-automatically enriched with data and transformed into a Java-based graph rep-resentation. Our prototypical implementation of this step follows a common extract,transform and load (ETL) process with some post-processing:

1. Extract from Source Systems:The database schema ismanually extracted fromexisting enterprise systems7 into a file with a character-separated values (CSV)format. Here, one line represents one instance (record) of the source system’sobject type, such as a specific employee.

2. Transform and Load Into Enterprise Ontology:Simple Java-adapters for eachdata source file read the data records and store them as instances of ontologyclasses in the enterprise ontology. Here, each instance and class is uniquely iden-tified by a URI (uniform resource identifier) that is generated from the respectiverecord’s key attribute values as specified in the source system. We also derivedatatype properties (representing directed relations between instances of classesand RDF literals) and object properties (representing directed relations betweeninstances of two classes) from the source systems’ record sets.

3. Infer Human Readable Keywords: In the ontology, URIs are used as represen-tatives of classes and instances. Yet URIs cannot be used as their representativesin the graphical user interface of the Lokahi knowledge browser, because theydo not make sense to a human user.We therefore infer human readable keywordsusing SPIN8 (SPARQL9 Inferencing Notation) rules. We also infer human read-able keywords for properties. For instance, persons are labeled with their firstand family names (e.g., John andDoe), and if JohnDoe is employed by companyXY, the respective object property is labeled employee.

4. Transform Into Java-Based Graph Representation: The relevant parts of theontology are transformed into a simple in-memory graph representation that isbased on the two Java classesNode andAssociation. To theNode classwemap allinstances of the ontology, all classes that contain an instance, and all literals. Allproperties are mapped to the Association class. A node is characterized by the IDattribute. In the case of a class or instance node, the ID is its URI; in the case of aliteral node, the ID is the literal itself. The label attribute contains the keywordsinferred in step 3. Every association is attached to a node and is characterizedby the label attribute, which again captures the inferred keyword, and the value

6 www.opengroup.org/subjectareas/enterprise/archimate7 In our prototype, these are a human resource (HR) system, a customer relationship manage-ment (CRM) system and a project management system.

8 http://spinrdf.org/9 http://www.w3.org/TR/rdf-sparql-query/

Page 11: Merging Bottom-Up and Top-Down Knowledge Graphs for Intuitive …static.tongtianta.site/paper_pdf/0025b2a2-5ced-11e9-9dd7... · 2019-04-12 · (GUI design) will be detailed in a separate

Merging Bottom-Up and Top-Down Knowledge Graphs 455

attribute, which refers to the ID of the instance or literal the association pointsto.10

Notice that we cannot represent the top-down knowledge graph using only labels(keywords) without losing the ability to distinguish two entities with the same label(e.g., two persons with the same first and family names). This is one of the keyfeatures of the top-down knowledge structure that can help enhance a pure bottom-upapproach. We use a node’s label attribute only to provide a human-understandablekeyword in the GUI instead of URIs. Using IDs (URIs) in the knowledge graphallows us to display the same label multiple times in different semantic contexts.For example, one John Doe may be working in HR, while another John Doe may beemployed as a Java programmer in the same company. The user cannot distinguishthem based on their label, but he can distinguish them based of the label’s semanticcloseness to the keywords HR department and Java, respectively. Distinguishingand finding information based on semantic context is the core feature of the LokahiKnowledge Browser.

Notice also that associations do not have IDs, in spite of the fact that two nodesmay be connected by more than one association (i.e., “multiple associations” arepossible). Here, we assume that association labels are sufficient to distinguish multi-ple associations. E.g., John Doe may be programmer and project leader in projectXY,and the two associations may be labeled with programmer and project leader, re-spectively.

6.2 Graph Representation

The output of the Lokahi top-down knowledge extraction approach described inthe previous subsection can be formally represented by the triple (N , labelN ,P),where N is the set of node-IDs (an ID being sufficient for uniquely specifying anode), labelN : N → �� the function that assigns to a node the label generatedfor it via a SPIN rule, and P is the set of triples of type id × value × label, eachuniquely characterizing an association (i.e., a property). It can formally be describedas a Lokahi Knowledge Graph:

Definition 3. Let � be a finite alphabet, N ⊆ ��, labelN : N → ��, and P ⊆N × N × �� a crisp set of labeled associations. The top-down Lokahi knowledgegraph (L K G) induced by the triple (N , labelN ,P) is the Lokahi knowledge graphL = (�, V , A, μ A, σ , τ , λ) with

– � := �.

– V := N ⊆ �� is the vertex set of L.– A := arc(P) is the arc set of L, where arc is a bijection that maps every (i, j, k) ∈P to a unique arc identifier ai jk ∈ A.

10 In our prototypical implementation, we actually use additional attributes to optimize user-friendly GUI interaction. Since these attributes do not affect the underlying graph structure,we do not introduce them here.

Page 12: Merging Bottom-Up and Top-Down Knowledge Graphs for Intuitive …static.tongtianta.site/paper_pdf/0025b2a2-5ced-11e9-9dd7... · 2019-04-12 · (GUI design) will be detailed in a separate

456 G. Wilke et al.

– μ A :≡ 1 : A → [0, 1] is the constant function indicating full membership degreeof all arcs.

– σ := p1 ◦ arc−1 : A → V and τ = p2 ◦ arc−1 : A → V indicate the source andtarget vertex of an arc, respectively.

– λ := V ∪ A → �� is the labelling function for vertices and arcs defined by

λ(x) :={

labelN (x) if x ∈ V ,

p3(x) if x ∈ A.

Notice that the top-down Lokahi Knowledge Graph is not a “true” fuzzy graph,since all membership degrees are equal to 1.

7 The Merged Lokahi Knowledge Graph

This section discusses how bottom-up and top-down Lokahi knowledge graphs aremerged to form the graph structure that underlies the Lokahi Knowledge Browser.

7.1 Implementation

To build the merged Lokahi knowledge graph, we again use the Java-based in-memory graph representation introduced in subsection 6.1, which is based on theJava classes Node and Association. The merge is driven by the much smaller top-down graph: For each node of the top-down graph, an identical bottom-up node islooked up in the bottom-up graph. In the case of a match, a single “merged” node isbuilt and labeled with the top-down label11.

While we replace identical bottom-up and top-down entities (nodes) by a singleentity (node), bottom-up and top-down associations between identical node pairs aredifferentiated, and the merge is implemented by building multiple associations. Therationale for this approach is that bottom-up and top-down nodes can be identical(both represent unique entities in the world), while bottom-up and top-down asso-ciations always have a different semantic: A bottom-up association expresses therelevance of one entity for another based on likelihood of co-occurence in a docu-ment, and is expressed as a fuzzy degee of membership in the entity’s context. Incontrast, top-down associations are user-modeled crisp associations with a specifiedsemantic (such as “is project leader of”).

11 In practice, class and instance nodes represented by URIs will not realistically be matchedwith in the bottom-up graph, i.e., the matching focuses on literal notes (values of datatypeproperties such as a person’s first or last name). If a literal node has a match, the correspond-ing labels also match, since they are identical to the literals, and choosing the top-down labelis valid. If indeed a class or instance node, i.e., a URI, should have a match in the bottom-upgraph, the top-down label is a human-readable keyword, while the bottom-up label is theURI itself. In this case, it is sensible to choose the top-down label.

Page 13: Merging Bottom-Up and Top-Down Knowledge Graphs for Intuitive …static.tongtianta.site/paper_pdf/0025b2a2-5ced-11e9-9dd7... · 2019-04-12 · (GUI design) will be detailed in a separate

Merging Bottom-Up and Top-Down Knowledge Graphs 457

7.2 Graph Representation

The merging of the bottom-up and top-down knowledge extraction approaches asdescribed in the foregoing subsection can be formally represented as a “merge” ofthe corresponding Lokahi Knowledge Graphs:

Definition 4. Let L = (�, V , A, μA, σ , τ , λ) and L = (�, V , A, μ A, σ , τ , λ) bea bottom-up and a top-down Lokahi knowledge graph, respectively. We define themerge of L and L by

– �′ := � ∪ �.

– V ′ = V ∪ V .– A′ := A ∪ arc′( A), where arc′ is a bijection that makes A and arc′( A) disjoint.– μ′

A′ : A′ → [0, 1] with μ′A′ |A := μA and μ′

A′ | A := μ A.– σ ′ : A′ → V ′ with σ ′|A := σ and σ ′| A := σ .– λ′ := λ .

The output of the merging algorithm is again a Lokahi Knowledge Graph:

Corollary 1. Let L and L be a bottom-up and a top-down Lokahi knowledge graph,respectively. Then L′ as specified in definition 4 is a Lokahi Knowledge Graph, andwe call it the merged Lokahi Knowledge Graph induced by L and L.

Notice that L and L are not induced subgraphs of L′, since bottom-up and top-down arcs between identical node pairs are not identified, but included as multiplearcs in order to retain the different semantic of bottom-up and top-down associations.

8 Conclusions and Further Work

We introduced the Lokahi Knowledge Graph as the underlying data structure ofthe Lokahi Knowledge Browser. We also briefly recapitulated the design princi-ples introduced in [9] for the construction of a concrete instantiation of the LokahiKnowledge Graph. In accordance with these principles, we proposed three knowl-edge engineering steps: 1. the construction of a bottom-up knowledge graph fromunstructured documents by automatic information extraction based on fuzzy asso-ciation rule mining, 2. the semi-automatic construction of a top-down knowledgegraph from structured data using an enterprise meta ontology, and 3. the merging ofthe two graphs. In a separate publication, we will elaborate on the GUI design ofthe Lokahi Knowledge Browser, which is ongoing work. Future work comprises theactive use of the top-down ontology structure to enrich the Lokahi knowledge graphwith information that is inferred from bottom-up knowledge.

Page 14: Merging Bottom-Up and Top-Down Knowledge Graphs for Intuitive …static.tongtianta.site/paper_pdf/0025b2a2-5ced-11e9-9dd7... · 2019-04-12 · (GUI design) will be detailed in a separate

458 G. Wilke et al.

Acknowledgments Part of the presented research has been funded by Five Informatik AGand the Swiss Commission for Technology and Innovation (CTI) as part of the applied researchproject LOKAHI Inside, CTI-No. 16152.1 PFES-ES.

References

1. Agrawal, R., Imielinski, T., Swami, A.: Mining association rules between sets of itemsin large databases. In: Jajodia, P.B.S. (ed.) Proceedings of the 1993 ACM SIGMODInternational Conference on Management of Data, vol. 2. ACM, New York (1993)

2. Biezunski,M.: Introduction to topicmapping. In: SGMLEuropeGCAConference (1997)3. Chen, P.P.S.: The entity-relationship model - toward a unified view of data. ACM Trans.

Database Syst 1(1), 9–36 (1976)4. Emmenegger, S., Laurenzini, E., Thönssen, B.: Improving supply-chain-management

based on semantically enriched risk descriptions. In: Proceedings of the Interna-tional Conference on Knowledge Management and Information Sharing, KMIS 2012,pp. 70–80 (2012)

5. Hawthorne, J.: Inductive logic. In: StanfordEncyclopedia of Philosophy. TheMetaphysicsResearch Lab, Stanford University, Stanford (2008)

6. Hinkelmann, K., Merelli, E., Thönssen, B.: The role of content and context in enterpriserepositories. In: Proceedings of the 2nd International Workshop on Advanced EnterpriseArchitecture and Repositories, AER 2010 (2010)

7. Kang, D., Lee, J., Choi, S., Kim, K.: An ontology-based enterprise architecture. ExpertSyst. Appl. 37(2), 1456–1464 (2010). http://dx.doi.org/10.1016/j.eswa.2009.06.073

8. Kaufmann, M.: Inductive Fuzzy Classification in Marketing Analytics. Doctoral Thesis,Université de Fribourg, Fribourg, Switzerland (2012)

9. Kaufmann,M.,Wilke, G., Portmann, E., Hinkelmann, K.: Combining bottom-up and top-down generation of interactive knowledge maps for enterprise search. In: Buchmann, R.,Kifor,C.V.,Yu, J. (eds.)KSEM2014.LNCS, vol. 8793, pp. 186–197. Springer,Heidelberg(2014)

10. Knauer, U.: Algebraic Graph Theory: Morphisms, Monoids and Matrices, 1st edn. DeGruyter, Berlin (2011)

11. Martin, A., Emmenegger, S., Wilke, G.: Integrating an enterprise architecture ontologyin a case-based reasoning approach for project knowledge. In: Proceedings of the FirstEnterprise Systems Conference (ES 2013) (2013)

12. Masterman,M.: Semanticmessage detection formachine translation, using an interlingua.In: Proc. 1961 International Conf. on Machine Translation, pp. 438–475 (1961)

13. Novak, J.D., Gowin, D.B.: Learning How to Learn. Cornell University (1984). ISBN:9780521319263

14. Peirce, C.S.: On junctures and fractures in logic. In: Writings of Charles S. Peirce:1879–1884, p. 391. Harvard University Press (1882)

15. Portmann, E., Kaufmann, M.A., Graf, C.: A distributed, semiotic-inductive, and human-oriented approach to web-scale knowledge retrieval. In: Proceedings of the 2012 Inter-national workshop on Web-scale knowledge Representation, Retrieval and Reasoning,pp. 1–8. ACM, New York (2012)

16. Quillian, M.R.: Semantic Memory. Ph.D. thesis, Carnegie Institute of Technology (nowCMU) (1966), abridged version in Minsky, pp. 227–270 (1968)

Page 15: Merging Bottom-Up and Top-Down Knowledge Graphs for Intuitive …static.tongtianta.site/paper_pdf/0025b2a2-5ced-11e9-9dd7... · 2019-04-12 · (GUI design) will be detailed in a separate

Merging Bottom-Up and Top-Down Knowledge Graphs 459

17. Shadbolt, N.: Knowledge Technologies. Ingenia 8, 58–61 (2001)18. Siemens, G.: Connectivism: Learning as network creation (2005). http://www.

elearnspace.org/Articles/networks.htm19. Sowa, J.: Conceptual graphs. In: Handbook on Architectures of Information Systems,

pp. 287–311. Springer (1998)20. Thönssen, B.: An enterprise ontology building the bases for automatic metadata gen-

eration. In: Sánchez-Alonso, S., Athanasiadis, I.N. (eds.) MTSR 2010. CCIS, vol. 108,pp. 195–210. Springer, Heidelberg (2010)

21. Thönssen, B., Lutz, J.: Semantically enriched obligation management. In: Proceedingsof 4th Conference on Knowledge Management and Information Sharing (KMIS2012)(2012)