Speeding Up Batch Alignment of Large Ontologies Using MapReduce
Uthayasanker Thayasivam and Prashant Doshi
Dept. of Computer Science, University of Georgia
Introduction
Ontology: formalizes the knowledge of a domain by defining concepts and the properties that relate them
Introduction: Ontology Alignment
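Problem Definition: Ontology Alignment
Ontology: $O = \langle V, E, L \rangle$
V: set of labeled vertices
E: set of edges, a set of ordered 2-subsets of V
L: mapping from each edge to its label
A correspondence $m_{a\alpha}$ between $x_a \in O_1$ and $y_\alpha \in O_2$:
$m_{a\alpha} = \langle x_a, y_\alpha, r_{a\alpha}, c_{a\alpha} \rangle$
Relation: $r_{a\alpha}$, drawn from a set of three relation types
Confidence: $c_{a\alpha} \in [0, 1]$
The ontology alignment problem: find a set of correspondences between two ontologies $O_1 = \langle V_1, E_1, L_1 \rangle$ and $O_2 = \langle V_2, E_2, L_2 \rangle$.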
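As a minimal sketch of the definitions on this slide, the tuples can be written as Python data classes. The class and field names (`Ontology`, `Correspondence`, `relation`, `confidence`) are illustrative choices, not names from the slides, and the relation is kept as an opaque string because the slide text does not preserve the relation symbols.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Correspondence:
    """m = <x, y, r, c>: entity x of O1, entity y of O2, relation, confidence."""
    x: str
    y: str
    relation: str      # opaque label for the relation type (assumption)
    confidence: float  # c in [0, 1]

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")


@dataclass
class Ontology:
    """O = <V, E, L>."""
    vertices: set = field(default_factory=set)        # V: labeled vertices
    edges: set = field(default_factory=set)           # E: ordered pairs (u, v)
    edge_labels: dict = field(default_factory=dict)   # L: edge -> label


# An alignment is a set of correspondences between O1 and O2.
# The entity names and relation label below are made up for illustration.
alignment = {Correspondence("Paper", "Article", "equivalent", 0.9)}
```

Making `Correspondence` frozen keeps it hashable, so an alignment can be stored as a plain set.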
Ontology Alignment Challenges
Improving the alignment quality: structural & lexical disparity
Improving the alignment efficiency: quickly producing a quality alignment
Improving the scalability
[Figure labels: Ontology Sizes, Resources, Efficiency / Quality]
Space of Alignments
Alignment matrix (rows: entities x of O1; columns: entities y of O2):

         y1        y2        …    y|V2|
x1       m11       m12       …    m1|V2|
x2       m21       m22       …    m2|V2|
…        …         …         …    …
x|V1|    m|V1|1    m|V1|2    …    m|V1||V2|

Alignment space size:
many-to-many: $2^{|V_1||V_2|}$
one-to-many: $(|V_1|+1)^{|V_2|}$
one-to-one: $\frac{(|V_1|+1)!}{(|V_1|-|V_2|)!}$, for $|V_1| \ge |V_2|$
Evaluating an alignment: Cartesian product of entities, $|V_1| \times |V_2|$
Bipartite graph
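To give a feel for how quickly the alignment space grows, here is a small sketch that evaluates the many-to-many size 2^(|V1||V2|), the one-to-many size (|V1|+1)^|V2|, and the |V1| × |V2| cost of evaluating one alignment. The function names are illustrative and not from the slides.

```python
def many_to_many_size(n1: int, n2: int) -> int:
    """Each of the n1 * n2 entity pairs is either in the alignment or not."""
    return 2 ** (n1 * n2)


def one_to_many_size(n1: int, n2: int) -> int:
    """Each of the n2 entities of O2 maps to one of n1 entities of O1, or to none."""
    return (n1 + 1) ** n2


def evaluation_pairs(n1: int, n2: int) -> int:
    """Evaluating an alignment touches the Cartesian product of entities."""
    return n1 * n2


# Even tiny ontologies with 3 and 2 entities give 64 many-to-many candidates.
print(many_to_many_size(3, 2), one_to_many_size(3, 2), evaluation_pairs(3, 2))
# prints: 64 16 6
```

Python integers are arbitrary precision, so the same functions can be called with realistic ontology sizes, although the many-to-many count then has far too many digits to be useful as anything but an illustration.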
Large Ontology Matching
Reduction of alignment space: early pruning of dissimilar element pairs
  aflood (Hanif and Masaki '09)
Partition-based matching
  Falcon-AO (Jian et al. '05)
Parallel matching
  MapPSO (Bock and Hettenhausen '10)
  VDoc+ (Zhang '12)
[Figure: O1 partitioned into P11, P12, P13 and O2 into P21, P22, P23; 4 blocks]
Batch Alignment of Large Ontologies
Scalability is challenging
  OAEI 2012 - Very Large Biomedical Ontology Track: 8 out of 21 tools completed
Ontology repositories (e.g., NCBO at Stanford)
  Batch alignment of ontologies
  New ontologies posted
  Ontologies get updated
The approach allows any alignment algorithm to be utilized on a MapReduce architecture
Contributions: Batch Alignment of Large Ontologies
General & novel approach to speed up batch alignment of large ontologies using MapReduce
No impact on alignment quality for some algorithms
Benefits ontology repositories
MapReduce Framework
Key -> Value
Key -> <Value1, Value2>
Key -> Output Value
Key identifies a subproblem
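The step from `Key -> Value` records to `Key -> <Value1, Value2>` is the grouping that a MapReduce framework performs between the map and reduce phases. A minimal in-memory sketch of that grouping, with made-up keys and values and a hypothetical `shuffle` helper:

```python
from collections import defaultdict


def shuffle(records):
    """Group (key, value) pairs by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in records:
        grouped[key].append(value)
    return dict(grouped)


# Two records share the key "sub1", so they end up in the same group.
records = [("sub1", "Value1"), ("sub2", "A"), ("sub1", "Value2")]
grouped = shuffle(records)
print(grouped["sub1"])  # prints: ['Value1', 'Value2']
```

Because the key identifies a subproblem, everything needed to solve that subproblem arrives at a single reducer call.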
MapReduce Framework
[Figure: ontologies O1 and O2 split into parts labeled O11, O21, O31, O12, O22]
Mapper & Reducer Algorithms
MAP
1. ← parse the Value in the record
2. emit()
3. emit(,)
REDUCE
1. ← align using an alignment algorithm
2. emit(,)
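The mapper and reducer roles can be sketched in plain Python with an in-memory stand-in for Hadoop. Everything concrete here is an assumption made for illustration: the record layout `(key, (side, text))`, the `parse_ontology_part` helper, and the `align` function, which is a trivial exact-label matcher standing in for whichever alignment algorithm is plugged in.

```python
from collections import defaultdict


def parse_ontology_part(text):
    """Hypothetical parser: one entity label per whitespace-separated token."""
    return set(text.split())


def mapper(key, value):
    """Parse the record's value and emit it under the subproblem key."""
    side, text = value  # side is "O1" or "O2" (assumed record layout)
    yield key, (side, parse_ontology_part(text))


def align(part1, part2):
    """Stand-in aligner: exact label match with confidence 1.0."""
    return [(label, label, 1.0) for label in sorted(part1 & part2)]


def reducer(key, values):
    """Align the two ontology parts that share a subproblem key."""
    parts = dict(values)  # {"O1": ..., "O2": ...}
    yield key, align(parts["O1"], parts["O2"])


def run(records):
    """Tiny in-memory driver: map, group by key, then reduce."""
    grouped = defaultdict(list)
    for key, value in records:
        for k, v in mapper(key, value):
            grouped[k].append(v)
    return {k: out for key, vals in grouped.items() for k, out in reducer(key, vals)}


records = [
    ("pair1-sub1", ("O1", "Heart Lung Liver")),
    ("pair1-sub1", ("O2", "Heart Kidney Liver")),
]
result = run(records)
print(result)
# prints: {'pair1-sub1': [('Heart', 'Heart', 1.0), ('Liver', 'Liver', 1.0)]}
```

The reducer never looks inside the aligner, which is the property that lets any alignment algorithm be dropped in.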
Identifying Alignment Subproblems
Approach: Hamdi et al. 2010
Identify anchors: entity pairs with identical names or labels
Cluster concepts around the anchors using structural neighborhood
Entities from one cluster are predominantly in correspondence with entities in one other cluster
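The two steps, finding anchors and clustering around them, can be illustrated with a simplified sketch. This is not the procedure of Hamdi et al.; it assumes case-insensitive label equality for anchors and assigns every other entity to its nearest anchor by breadth-first search over a neighbor map. The function names and the toy data are hypothetical.

```python
from collections import deque


def find_anchors(labels1, labels2):
    """Anchors: entity pairs whose labels are identical (case-insensitive here)."""
    by_label = {}
    for entity, label in labels2.items():
        by_label.setdefault(label.lower(), []).append(entity)
    return [(e1, e2)
            for e1, label in labels1.items()
            for e2 in by_label.get(label.lower(), [])]


def cluster_around(seeds, neighbors):
    """Assign each reachable entity to its nearest seed via multi-source BFS."""
    owner = {seed: seed for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in owner:
                owner[nxt] = owner[node]
                queue.append(nxt)
    return owner


labels1 = {"a1": "Heart", "a2": "Atrium", "a3": "Lung"}
labels2 = {"b1": "heart", "b2": "Lung", "b3": "Bronchus"}
anchors = find_anchors(labels1, labels2)

# Structural neighborhood of O1: a2 is adjacent to the anchor entity a1.
neighbors1 = {"a1": ["a2"], "a2": ["a1"], "a3": []}
clusters1 = cluster_around([e1 for e1, _ in anchors], neighbors1)
print(anchors)    # prints: [('a1', 'b1'), ('a3', 'b2')]
print(clusters1)  # prints: {'a1': 'a1', 'a3': 'a3', 'a2': 'a1'}
```

Running the same clustering on O2 with the matching anchor entities gives paired clusters, and each pair becomes one alignment subproblem.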
Merging Subproblem Alignments
Crisscross mappings
  Correspondence1: …  Correspondence2: …
  & … is a subclass of … and … is a subclass of …: inconsistent
  We remove the one with the lower confidence score while merging.
Redundant mappings
  Correspondence1: …  Correspondence2: …
  & … is a subclass of …: inconsistent
  We remove
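The rule of dropping the lower-confidence correspondence during merging can be sketched as a greedy filter. The crisscross test below is an assumed reading, since the slide text does not preserve the entity symbols: two correspondences (x1, y1) and (x2, y2) are treated as crisscross when x1 is a subclass of x2 in O1 while y2 is a subclass of y1 in O2, or the mirror case. The function names and the `parents` maps (entity to list of direct superclasses) are hypothetical.

```python
def is_subclass(child, parent, parents):
    """True if `parent` is reachable from `child` through the subclass-of map."""
    seen = set()
    stack = [child]
    while stack:
        node = stack.pop()
        for p in parents.get(node, ()):
            if p == parent:
                return True
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return False


def crisscross(m1, m2, parents1, parents2):
    """Assumed definition: subclass order is reversed between the two ontologies."""
    (x1, y1, _), (x2, y2, _) = m1, m2
    return ((is_subclass(x1, x2, parents1) and is_subclass(y2, y1, parents2)) or
            (is_subclass(x2, x1, parents1) and is_subclass(y1, y2, parents2)))


def merge(alignments, parents1, parents2):
    """Union subproblem alignments, keeping the higher-confidence side of a conflict."""
    candidates = sorted((m for a in alignments for m in a), key=lambda m: -m[2])
    merged = []
    for m in candidates:
        if not any(crisscross(m, kept, parents1, parents2) for kept in merged):
            merged.append(m)
    return merged


parents1 = {"a_child": ["a_parent"]}  # O1: a_child is a subclass of a_parent
parents2 = {"b_child": ["b_parent"]}  # O2: b_child is a subclass of b_parent
m1 = ("a_child", "b_parent", 0.9)
m2 = ("a_parent", "b_child", 0.6)
print(merge([[m1], [m2]], parents1, parents2))
# prints: [('a_child', 'b_parent', 0.9)]
```

Processing candidates in descending confidence means that whenever two correspondences conflict, the one already kept is the stronger of the two.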
Performance Evaluation
Datasets
  Conference track from OAEI (120 pairs)
  Large ontologies from OAEI (SNOMED, NCI, ... 5 pairs)
  New biomedical ontology testbed (50 pairs from NCBO)
Algorithms: Falcon-AO, Optima+, LogMap, YAM++
Compare F-measure & runtime
  Default setup on a single node
  MapReduce setup using Hadoop (12 nodes each with 24 2GB & 2GHz Intel Xeon processors)
Results – 3 Datasets
Speedup

Algos.     Confer.   Large OAEI   Biomed
Falcon     2         15           5
LogMap     9         16           5
Optima+    11        64           110
YAM++      4         22           7

[Charts: Conference, Large OAEI, Biomedical]
Results – Large OAEI ontologies
Conference Track: no partitioning, no change in output

                 MapRed./Def.   MapReduce   MapRed./Def.   MapReduce   Default    Default
                 Falcon-AO      LogMap      Optima+        YAM++       LogMap     YAM++
Ontology Pairs   P  R  F        P  R  F     P  R  F        P  R  F     P  R  F    P  R  F
mouse, human     73 74 73       96 75 84    78 73 76       95 77 85    92 85 88   94 86 90
STW, TheSoz      57 50 53       57 51 54    18 40 25       55 52 53    69 64 67   60 75 66
fma, nci         95 81 88       95 83 89    96 83 89       97 84 90    95 86 90   98 85 91
fma, snomed      85 63 72       85 63 72    84 61 71       86 63 73    97 66 78   97 70 81
snomed, nci      69 58 63       67 58 62    70 58 63       71 58 64    90 64 75   95 60 74
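The slides do not spell out how F is computed from P and R. Assuming the standard balanced F-measure (the harmonic mean of precision and recall), the reported values can be reproduced, for example the (mouse, human) entry with P = 96, R = 75, F = 84. The function name is illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (mouse, human) entry with P = 96 and R = 75, in percent.
print(round(f_measure(96, 75)))  # prints: 84
```

The same check works for the other entries, such as P = 18, R = 40, which rounds to the reported F of 25.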
Other Datasets
LogMap & YAM++: tradeoff is in the alignment quality
Falcon-AO & Optima+: no change in output
[Chart: speedup with # of nodes in the Hadoop cluster]
Discussion
First inter-matcher parallelization approach, especially using MapReduce
Exhibits significant speedup for batch alignment
  Some algorithms may see a small reduction in alignment quality due to the partitioning
Significant speedup for a single ontology pair: Falcon-AO, Optima+ & YAM++
Any alignment algorithm can fit in our framework
Thank you
Questions?
Parallel Alignment of Large Ontologies on a Computing Cluster
Current divide-and-conquer approaches
  Heavily rely on structure
  Size-based partitioning techniques are not effective
Current parallel matching algorithms
  Parallelize the process within the algorithms
  Do not support a multi-node cluster architecture