Speeding Up Batch Alignment of Large Ontologies Using MapReduce Uthayasanker Thayasivam and Prashant Doshi Dept. of Computer Science University of Georgia


Introduction

Ontology: formalizes the knowledge of a domain by defining concepts and the properties that relate them


Introduction: Ontology Alignment


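Problem Definition: Ontology Alignment

Ontology $O = \langle V, E, L \rangle$
  V: set of labeled vertices
  E: set of edges, a set of ordered 2-subsets of V
  L: mapping from each edge to its label

A correspondence $m_{a\alpha}$ between $x_a \in O_1$ and $y_\alpha \in O_2$:

$m_{a\alpha} = \langle x_a, y_\alpha, r_{a\alpha}, c_{a\alpha} \rangle$

  Relation: $r_{a\alpha}$, drawn from a set of three relations
  Confidence: $c_{a\alpha} \in [0, 1]$

The ontology alignment problem: find a set of correspondences between two ontologies $O_1 = \langle V_1, E_1, L_1 \rangle$ and $O_2 = \langle V_2, E_2, L_2 \rangle$.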
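The correspondence tuple above can be sketched as a small data structure. This is only an illustration: the class name, the field names, the entity identifiers and the use of "=" as a relation symbol are assumptions, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correspondence:
    """One correspondence m = <x, y, r, c> between two ontologies."""
    x: str             # entity from O1
    y: str             # entity from O2
    relation: str      # relation symbol; "=" below is only an illustrative value
    confidence: float  # confidence score, expected to lie in [0, 1]

    def __post_init__(self):
        # Reject confidence values outside the closed interval [0, 1].
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


# An alignment is a set of correspondences (hypothetical entities).
alignment = {
    Correspondence("O1#Heart", "O2#heart", "=", 0.93),
    Correspondence("O1#Lung", "O2#lung", "=", 0.88),
}
```

Freezing the dataclass makes each correspondence hashable, so an alignment can be held in a plain set.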

Ontology Alignment Challenges

Improving the alignment quality
  Structural & lexical disparity
Improving the alignment efficiency
  Quickly producing a quality alignment
Improving the scalability
  [Figure: efficiency / quality plotted against ontology sizes and against resources]

Space of Alignments

Alignment between the entities x of O1 (rows) & the entities y of O2 (columns):

           y1        y2        …    y|V2|
x1         m11       m12       …    m1|V2|
x2         m21       m22       …    m2|V2|
..         …         …         …    …
x|V1|      m|V1|1    m|V1|2    …    m|V1||V2|

Alignment space size:
  many-to-many: $2^{|V_1||V_2|}$
  one-to-many: $(|V_1|+1)^{|V_2|}$
  one-to-one: $\frac{(|V_1|+1)!}{(|V_1|-|V_2|)}$, with $|V_1| \geq |V_2|$

Evaluating an alignment: Cartesian product of entities, $|V_1| \times |V_2|$
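To give a feel for how quickly the alignment space grows, the sketch below evaluates the many-to-many and one-to-many sizes, read here as 2^(|V1||V2|) and (|V1|+1)^|V2|, along with the |V1| × |V2| candidate pairs. The function names are hypothetical and the example sizes are arbitrary.

```python
def many_to_many_space(n1: int, n2: int) -> int:
    """Every subset of the n1 * n2 candidate pairs is a possible alignment."""
    return 2 ** (n1 * n2)


def one_to_many_space(n1: int, n2: int) -> int:
    """Each of the n2 entities picks one of n1 partners, or none."""
    return (n1 + 1) ** n2


def candidate_pairs(n1: int, n2: int) -> int:
    """Size of the Cartesian product of the two entity sets."""
    return n1 * n2


# Even tiny ontologies (3 and 2 entities) give a non-trivial search space.
print(candidate_pairs(3, 2))     # 6
print(one_to_many_space(3, 2))   # 16
print(many_to_many_space(3, 2))  # 64
```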


Bipartite graph

Large Ontology Matching

Reduction of alignment space
  Early pruning of dissimilar element pairs
  aflood (Hanif and Masaki '09)
Partition-based matching
  Falcon-AO (Jian et al. '05)
Parallel matching
  MapPSO (Bock and Hettenhausen '10)
  VDoc+ (Zhang '12)

[Figure: O1 partitioned into P11, P12, P13 and O2 into P21, P22, P23; 4 blocks]

Batch Alignment of Large Ontologies

Scalability is challenging
  OAEI 2012 - Very Large Biomedical Ontology Track
  8 out of 21 tools completed
Ontology repositories (e.g., NCBO at Stanford)
  Batch alignment of ontologies
  New ontologies posted
  Ontologies get updated
Approach allows any alignment algorithm to be utilized on a MapReduce architecture

Contributions: Batch Alignment of Large Ontologies

General & novel approach
  To speed up batch alignment of large ontologies using MapReduce
No impact on alignment quality for some algorithms
Benefits ontology repositories

MapReduce Framework

Map output: Key -> Value
Reduce input: Key -> <Value1, Value2>
Reduce output: Key -> Output Value
Key identifies a subproblem

[Figure: MapReduce framework for an ontology pair O1, O2, with parts labeled O11, O21, O31, O12 and O22]

Mapper & Reducer Algorithms

MAP
1. ← parse the Value in the record
2. emit()
3. emit(,)

REDUCE
1. ← align using an alignment algorithm
2. emit(,)
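The map and reduce steps can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions rather than the authors' Hadoop code: the record layout (a subproblem id, a tab, then comma-separated entity labels), the function names, and the toy `exact_label_align` matcher are all hypothetical, and the arguments lost from the `emit` calls above are not reconstructed here.

```python
from collections import defaultdict


def map_record(key, value):
    """Parse one record and emit its ontology part under the subproblem key."""
    # Assumed layout: "<subproblem id>\t<comma-separated entity labels>".
    subproblem_id, labels = value.split("\t")
    yield subproblem_id, labels.split(",")


def reduce_subproblem(subproblem_id, parts, align):
    """Align the two ontology parts that share a subproblem key."""
    part1, part2 = parts
    yield subproblem_id, align(part1, part2)


def exact_label_align(part1, part2):
    """Toy stand-in for a real alignment algorithm: match equal labels."""
    shared = {label.lower() for label in part1} & {label.lower() for label in part2}
    return sorted(shared)


def run_job(records, align):
    """Simulate map, shuffle (group by key) and reduce in a single process."""
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_record(key, value):
            grouped[out_key].append(out_value)
    results = {}
    for subproblem_id, parts in grouped.items():
        for out_key, out_value in reduce_subproblem(subproblem_id, parts, align):
            results[out_key] = out_value
    return results


records = [
    (0, "p1\tHeart,Lung"),
    (1, "p1\theart,Liver"),
    (2, "p2\tPaper,Author"),
    (3, "p2\tauthor,Review"),
]
alignments = run_job(records, exact_label_align)
print(alignments)  # {'p1': ['heart'], 'p2': ['author']}
```

Because the reducer only receives an `align` callable, any alignment algorithm with the same two-argument shape could be swapped in, which mirrors the claim that the approach is independent of the matcher.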

Identifying Alignment Subproblems

Approach: Hamdi et al. 2010
  Identify anchors: entity pairs with identical names or labels
  Cluster concepts around the anchors, using structural neighborhood
  Entities from one cluster are predominantly in correspondence with entities in one other cluster
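The anchor-identification step can be sketched as a label lookup. This is a minimal illustration, not the procedure of Hamdi et al.: the function name, the dictionary-based input format and the entity ids are assumptions, and the clustering around anchors is not shown.

```python
def find_anchors(labels1, labels2):
    """Return (entity1, entity2) pairs whose labels are identical.

    labels1 and labels2 map entity ids to labels (an assumed input format).
    """
    # Index the second ontology by label so each lookup is constant time.
    by_label = {}
    for entity, label in labels2.items():
        by_label.setdefault(label, []).append(entity)
    return [
        (entity1, entity2)
        for entity1, label in labels1.items()
        for entity2 in by_label.get(label, [])
    ]


anchors = find_anchors(
    {"a1": "Heart", "a2": "Lung"},
    {"b1": "Heart", "b2": "Kidney"},
)
print(anchors)  # [('a1', 'b1')]
```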

Merging Subproblem Alignments

Crisscross mappings
  Correspondence1 and Correspondence2, together with the "is a subclass of" relations between their entities in each ontology, are inconsistent.
  We remove the one with the lower confidence score while merging.

Redundant mappings
  Correspondence1 and Correspondence2, together with an "is a subclass of" relation between their entities, are inconsistent.
  We remove
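The crisscross rule can be sketched as a greedy merge. The sketch assumes the common reading of a crisscross conflict, namely that one correspondence maps a subclass in O1 to a superclass in O2 while the other maps the superclass in O1 to the subclass in O2; the slide's exact definition did not survive extraction. The tuple layout `(x, y, confidence)`, the function names and the subclass sets are hypothetical.

```python
def is_crisscross(m1, m2, sub1, sub2):
    """True if m1 and m2 cross the subclass hierarchies of the two ontologies.

    sub1 and sub2 are sets of (child, parent) pairs for O1 and O2.
    """
    (x1, y1, _), (x2, y2, _) = m1, m2
    return ((x1, x2) in sub1 and (y2, y1) in sub2) or (
        (x2, x1) in sub1 and (y1, y2) in sub2
    )


def merge_alignments(partial_alignments, sub1, sub2):
    """Merge subproblem alignments, dropping the weaker crisscross partner."""
    merged = [m for part in partial_alignments for m in part]
    merged.sort(key=lambda m: m[2], reverse=True)  # highest confidence first
    kept = []
    for m in merged:
        # A correspondence survives only if it conflicts with nothing kept so far.
        if not any(is_crisscross(m, k, sub1, sub2) for k in kept):
            kept.append(m)
    return kept


sub1 = {("a_child", "a_parent")}
sub2 = {("b_child", "b_parent")}
parts = [
    [("a_child", "b_parent", 0.9)],
    [("a_parent", "b_child", 0.6), ("a_other", "b_other", 0.5)],
]
merged = merge_alignments(parts, sub1, sub2)
print(merged)  # [('a_child', 'b_parent', 0.9), ('a_other', 'b_other', 0.5)]
```

Processing correspondences in descending confidence order means that, in any conflicting pair, the lower-confidence one is the one discarded.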

Performance Evaluation

Datasets
  Conference track from OAEI (120 pairs)
  Large ontologies from OAEI (SNOMED, NCI, ... 5 pairs)
  New biomedical ontology testbed (50 pairs from NCBO)

Algorithms
  Falcon-AO, Optima+, LogMap, YAM++

Compare F-measure & runtime
  Default setup on a single node
  MapReduce setup using Hadoop (12 nodes each with 24 2GB & 2GHz Intel Xeon processors)

Results – 3 Datasets

Speedup per algorithm and dataset:

Algos.       Confer.    LargeOAEI    Biomed
Falcon       2          15           5
LogMap       9          16           5
Optima+      11         64           110
Yam++        4          22           7

[Figure: three panels, one per dataset: Conference, Large OAEI, Biomedical]

Results – Large OAEI ontologies

Conference Track
  No partitioning
  No change in output

Other Datasets
  LogMap & YAM++: tradeoff is in the alignment quality
  Falcon-AO & Optima+: no change in output

Precision (P), recall (R) and F-measure (F) per ontology pair:

Ontology Pairs   MapRed./Def.   MapReduce      MapRed./Def.   MapReduce      Default        Default
                 Falcon-AO      LogMap         Optima+        YAM++          LogMap         YAM++
                 P   R   F      P   R   F      P   R   F      P   R   F      P   R   F      P   R   F
mouse, human     73  74  73     96  75  84     78  73  76     95  77  85     92  85  88     94  86  90
STW, TheSoz      57  50  53     57  51  54     18  40  25     55  52  53     69  64  67     60  75  66
fma, nci         95  81  88     95  83  89     96  83  89     97  84  90     95  86  90     98  85  91
fma, snomed      85  63  72     85  63  72     84  61  71     86  63  73     97  66  78     97  70  81
snomed, nci      69  58  63     67  58  62     70  58  63     71  58  64     90  64  75     95  60  74
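For readers who want to relate the P, R and F columns, the sketch below computes the balanced F-measure, the harmonic mean of precision and recall. That the slides use this balanced form is an assumption; the reported values are rounded, so a recomputed F can differ from the table by a point.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# MapReduce LogMap on (mouse, human): P = 96, R = 75.
print(round(f_measure(96, 75)))  # 84
```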


Speedup with # of nodes in the Hadoop cluster

Discussion

First inter-matcher parallelization approach
  Especially using MapReduce
Exhibits significant speedup for batch alignment
  Some algorithms may see a small reduction in alignment quality due to the partitioning
Significant speedup for a single ontology pair
  Falcon-AO, Optima+ & YAM++
Any alignment algorithm can fit in our framework

Thank you

Questions?

Parallel Alignment of Large Ontologies on a Computing Cluster

Current divide-and-conquer approaches
  Heavily rely on structure
  Size-based partitioning techniques are not effective
Current parallel matching algorithms
  Parallelize the process within the algorithms
  Do not support a multi-node cluster architecture