S. Nishimura (NEC Service Platforms Labs.), S. Das, D. Agrawal, A. Abbadi (University of California,...
![Page 1: S. Nishimura (NEC Service Platforms Labs.), S. Das, D. Agrawal, A. Abbadi (University of California, Santa Barbara) MD-HBase: A Scalable Multi-dimensional.](https://reader030.fdocuments.us/reader030/viewer/2022033108/56649f285503460f94c4149e/html5/thumbnails/1.jpg)
S. Nishimura (NEC Service Platforms Labs.), S. Das, D. Agrawal, A. Abbadi
(University of California, Santa Barbara)
MD-HBase: A Scalable Multi-dimensional Data Infrastructure for Location Aware Services
Presenter: Zhuo Liu
![Page 2: S. Nishimura (NEC Service Platforms Labs.), S. Das, D. Agrawal, A. Abbadi (University of California, Santa Barbara) MD-HBase: A Scalable Multi-dimensional.](https://reader030.fdocuments.us/reader030/viewer/2022033108/56649f285503460f94c4149e/html5/thumbnails/2.jpg)
Page 2
Overview
▐ A Motivating Story
▐ Existing Technologies
▐ Our proposal
▐ Evaluation
▐ Conclusion
![Page 3: S. Nishimura (NEC Service Platforms Labs.), S. Das, D. Agrawal, A. Abbadi (University of California, Santa Barbara) MD-HBase: A Scalable Multi-dimensional.](https://reader030.fdocuments.us/reader030/viewer/2022033108/56649f285503460f94c4149e/html5/thumbnails/3.jpg)
Page 3
Motivating Scenario: Mobile Coupon Distribution
[Figure labels: Coupon; Current Location (three devices); Mobile Coupon Distributor]
Distribution Policy
• Area
• # of coupons
![Page 4: S. Nishimura (NEC Service Platforms Labs.), S. Das, D. Agrawal, A. Abbadi (University of California, Santa Barbara) MD-HBase: A Scalable Multi-dimensional.](https://reader030.fdocuments.us/reader030/viewer/2022033108/56649f285503460f94c4149e/html5/thumbnails/4.jpg)
Page 4
Motivating Scenario: Mobile Coupon Distribution
[Figure labels: Current Location (many devices); Coupon (three coupons)]
Distribution Policy
• Area
• # of coupons
125,000,000 subscribers in Japan
System Scalability
• Large amounts of Data
• High Throughput
Efficient Complex Queries
• Multi-Dimensional Query
• Nearest Neighbors Query
![Page 5: S. Nishimura (NEC Service Platforms Labs.), S. Das, D. Agrawal, A. Abbadi (University of California, Santa Barbara) MD-HBase: A Scalable Multi-dimensional.](https://reader030.fdocuments.us/reader030/viewer/2022033108/56649f285503460f94c4149e/html5/thumbnails/5.jpg)
Page 5
Existing Technologies
Compared on: Multi-dimensional Queries, Scalability
Relational DBs
Spatial DBs
Commercial products, but expensive
Open source products
Key-Value Stores
What We Want, at a reasonable price
![Page 6: S. Nishimura (NEC Service Platforms Labs.), S. Das, D. Agrawal, A. Abbadi (University of California, Santa Barbara) MD-HBase: A Scalable Multi-dimensional.](https://reader030.fdocuments.us/reader030/viewer/2022033108/56649f285503460f94c4149e/html5/thumbnails/6.jpg)
Page 6
Ordered Key-Value Stores (ex. BigTable, HBase)
[Figure: an Index over key00, key11, …, keynn points to Buckets sorted by key: (key00, value00), (key01, value01), …, (key0X, value0X); (key11, value11), (key12, value12), …, (key1Y, value1Y); (keynn, valuenn)]
Good at 1-D Range Query
[Figure: a space with Longitude, Latitude, and Time axes]
But, our target is multi-dimensional…
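As a rough illustration of why a key-sorted store is good at 1-D range queries, the sketch below keeps keys in sorted order and answers a range scan with two binary searches. `OrderedKV`, `put`, and `scan` are hypothetical names for a toy in-memory stand-in, not the BigTable or HBase API.

```python
import bisect


class OrderedKV:
    """Toy in-memory stand-in for an ordered key-value store (hypothetical)."""

    def __init__(self):
        self._keys = []      # keys kept in sorted order
        self._values = {}    # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def scan(self, start, stop):
        """Return (key, value) pairs with start <= key < stop, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


store = OrderedKV()
for k in ["key11", "key00", "key12", "key01"]:
    store.put(k, k.replace("key", "value"))

result = store.scan("key01", "key12")
```

Because the keys are contiguous in sort order, the scan touches only the requested range; this is the property that breaks down once more than one dimension is involved.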
![Page 7: S. Nishimura (NEC Service Platforms Labs.), S. Das, D. Agrawal, A. Abbadi (University of California, Santa Barbara) MD-HBase: A Scalable Multi-dimensional.](https://reader030.fdocuments.us/reader030/viewer/2022033108/56649f285503460f94c4149e/html5/thumbnails/7.jpg)
Page 7
Naïve Solution: Linearization
[Figure: the same index and key-sorted buckets as in the ordered key-value store]
Projects n-D space to 1-D space
Simple, but problematic…
Apply a Z-ordering curve…
5 7 13 15
4 6 12 14
1 3 9 11
0 2 8 10
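The grid above can be reproduced with a small bit-interleaving sketch. It assumes the bit order used in the deck's own example (x=00, y=11 → 0101), i.e. each x bit is placed before the corresponding y bit; `interleave` and `grid` are illustrative names.

```python
def interleave(x: int, y: int, bits: int = 2) -> int:
    """Z-order value of (x, y): take bits from most to least significant,
    emitting the x bit first and then the y bit."""
    z = 0
    for i in reversed(range(bits)):
        z = (z << 1) | ((x >> i) & 1)
        z = (z << 1) | ((y >> i) & 1)
    return z


def grid(bits: int = 2):
    """Rows are printed top-down (largest y first), columns left to right by x."""
    size = 1 << bits
    return [[interleave(x, y, bits) for x in range(size)]
            for y in reversed(range(size))]


for row in grid():
    print(" ".join(str(z) for z in row))
```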
![Page 8: S. Nishimura (NEC Service Platforms Labs.), S. Das, D. Agrawal, A. Abbadi (University of California, Santa Barbara) MD-HBase: A Scalable Multi-dimensional.](https://reader030.fdocuments.us/reader030/viewer/2022033108/56649f285503460f94c4149e/html5/thumbnails/8.jpg)
Page 8
Problem: False positive scans
▐ MD-query on linearized space
Translate an MD-query to a linearized range query.
• ex. Query from 2 to 9.
Scan the queried linearized range.
Filter out points outside the queried area.
• ex. blue-hatched area (4 to 7)
Requires the boundary information of the original space.
5 7 13 15
4 6 12 14
1 3 9 11
0 2 8 10
[Figure: the queried area spans cells 2 to 9]
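A minimal sketch of the false-positive problem, assuming the query rectangle has cell 2 (x=1, y=0) and cell 9 (x=2, y=1) as its corners and that keys are built by x-then-y bit interleaving. The function names are illustrative.

```python
def deinterleave(z: int, bits: int = 2):
    """Inverse of x-then-y bit interleaving: recover (x, y) from a Z-value."""
    x = y = 0
    for i in reversed(range(bits)):
        x = (x << 1) | ((z >> (2 * i + 1)) & 1)
        y = (y << 1) | ((z >> (2 * i)) & 1)
    return x, y


def false_positives(z_lo, z_hi, x_range, y_range, bits=2):
    """Z-values inside the linear range [z_lo, z_hi] that fall outside the
    queried rectangle and therefore have to be filtered after the scan."""
    wasted = []
    for z in range(z_lo, z_hi + 1):
        x, y = deinterleave(z, bits)
        inside = (x_range[0] <= x <= x_range[1]
                  and y_range[0] <= y <= y_range[1])
        if not inside:
            wasted.append(z)
    return wasted


wasted = false_positives(2, 9, x_range=(1, 2), y_range=(0, 1))
print(wasted)  # [4, 5, 6, 7]
```

Half of the scanned range is discarded in this tiny case, and the filter itself needs the (x, y) boundaries of the query, which the linearized key range alone does not carry.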
![Page 9: S. Nishimura (NEC Service Platforms Labs.), S. Das, D. Agrawal, A. Abbadi (University of California, Santa Barbara) MD-HBase: A Scalable Multi-dimensional.](https://reader030.fdocuments.us/reader030/viewer/2022033108/56649f285503460f94c4149e/html5/thumbnails/9.jpg)
Page 9
Our Approach: MD-HBase
Build a Multi-dimensional Index Layer on top of an Ordered Key-Value store
[Figure: MD-HBase adds a Multi-Dimensional Index above the Single Dimensional Index of an Ordered Key-Value Store (ex. BigTable, HBase, …)]
![Page 10: S. Nishimura (NEC Service Platforms Labs.), S. Das, D. Agrawal, A. Abbadi (University of California, Santa Barbara) MD-HBase: A Scalable Multi-dimensional.](https://reader030.fdocuments.us/reader030/viewer/2022033108/56649f285503460f94c4149e/html5/thumbnails/10.jpg)
Page 10
Introduce Multi-dimensional Index
▐ Multi-dimensional Index (ex. the K-d tree, the Quad tree)
• Divide a space into subspaces containing almost the same # of points
• Organize subspaces as a tree
Efficient subspace pruning → to avoid false positive scans
[Figure: a space divided into subspaces, then organized as a tree]
![Page 11: S. Nishimura (NEC Service Platforms Labs.), S. Das, D. Agrawal, A. Abbadi (University of California, Santa Barbara) MD-HBase: A Scalable Multi-dimensional.](https://reader030.fdocuments.us/reader030/viewer/2022033108/56649f285503460f94c4149e/html5/thumbnails/11.jpg)
Page 11
Space Partition By the K-d tree
Binary Z-ordering space (bitwise interleaving, ex. x=00, y=11 → 0101):
y\x 00 01 10 11
11 0101 0111 1101 1111
10 0100 0110 1100 1110
01 0001 0011 1001 1011
00 0000 0010 1000 1010
[Figure: the same space partitioned by the K-d tree]
How do we represent these subspaces?
![Page 12: S. Nishimura (NEC Service Platforms Labs.), S. Das, D. Agrawal, A. Abbadi (University of California, Santa Barbara) MD-HBase: A Scalable Multi-dimensional.](https://reader030.fdocuments.us/reader030/viewer/2022033108/56649f285503460f94c4149e/html5/thumbnails/12.jpg)
Page 12
Key Idea: The longest common prefix naming scheme
[Figure: the binary Z-ordering grid with subspaces 000* and 1*** marked]
Subspaces represented as the longest common prefix of keys!
Remarkable Property
• Preserves boundary information of the original space
ex. subspace 1***
• Left-bottom corner: * → 0 gives 1000 = (10, 00)
• Right-top corner: * → 1 gives 1111 = (11, 11)
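The boundary reconstruction can be sketched as follows: pad the prefix with 0s for the left-bottom corner and with 1s for the right-top corner, then undo the bit interleaving. This assumes 4-bit keys with x-then-y interleaving as in the grid; `prefix_bounds` is an illustrative name.

```python
def deinterleave(z: int, bits: int = 2):
    """Recover (x, y) from a Z-value built by x-then-y bit interleaving."""
    x = y = 0
    for i in reversed(range(bits)):
        x = (x << 1) | ((z >> (2 * i + 1)) & 1)
        y = (y << 1) | ((z >> (2 * i)) & 1)
    return x, y


def prefix_bounds(name: str, bits: int = 2):
    """Return ((x_min, y_min), (x_max, y_max)) for a subspace name like '1***'."""
    prefix = name.rstrip("*")
    pad = 2 * bits - len(prefix)
    low = int(prefix + "0" * pad, 2)    # * -> 0: left-bottom corner
    high = int(prefix + "1" * pad, 2)   # * -> 1: right-top corner
    return deinterleave(low, bits), deinterleave(high, bits)


corners = prefix_bounds("1***")
print(corners)  # ((2, 0), (3, 3)), i.e. (10, 00) and (11, 11) in binary
```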
![Page 13: S. Nishimura (NEC Service Platforms Labs.), S. Das, D. Agrawal, A. Abbadi (University of California, Santa Barbara) MD-HBase: A Scalable Multi-dimensional.](https://reader030.fdocuments.us/reader030/viewer/2022033108/56649f285503460f94c4149e/html5/thumbnails/13.jpg)
Page 13
Build an index with the longest common prefix of keys
[Figure: the binary Z-ordering grid partitioned into subspaces 000*, 001*, 01**, 1***]
Index: 000*, 001*, 01**, 1***
Buckets: allocate per subspace (000*, 001*, 01**, 1***)
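One way such prefix-named buckets could arise is sketched below: a bucket that overflows is split on its next key bit, so every bucket stays identified by the common prefix of its keys. This is a trie-style simplification under assumed parameters (`CAPACITY`, the sample keys, and all function names are hypothetical); the slides do not give the actual split policy.

```python
WIDTH = 4      # key length in bits (2 bits per dimension, as in the grid)
CAPACITY = 2   # hypothetical bucket capacity, chosen only for illustration


def find_bucket(buckets, key):
    """Exactly one bucket prefix matches any key."""
    for prefix in buckets:
        if key.startswith(prefix):
            return prefix
    raise KeyError(key)


def split(buckets, prefix):
    """Split an overflowing bucket on its next bit, recursing if needed."""
    if len(buckets[prefix]) <= CAPACITY or len(prefix) == WIDTH:
        return
    keys = buckets.pop(prefix)
    for bit in "01":
        child = prefix + bit
        buckets[child] = [k for k in keys if k.startswith(child)]
        split(buckets, child)


def insert(buckets, key):
    prefix = find_bucket(buckets, key)
    buckets[prefix].append(key)
    split(buckets, prefix)


def name(prefix):
    return prefix + "*" * (WIDTH - len(prefix))


buckets = {"": []}   # start with one bucket covering the whole space
for key in ["0000", "0001", "0010", "0011", "0100", "1000"]:
    insert(buckets, key)

index = sorted(name(p) for p in buckets)
print(index)  # ['000*', '001*', '01**', '1***']
```

Because the interleaved key alternates x and y bits, splitting on the next bit alternates the split dimension, which is what makes the prefixes line up with a K-d style partition.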
![Page 14: S. Nishimura (NEC Service Platforms Labs.), S. Das, D. Agrawal, A. Abbadi (University of California, Santa Barbara) MD-HBase: A Scalable Multi-dimensional.](https://reader030.fdocuments.us/reader030/viewer/2022033108/56649f285503460f94c4149e/html5/thumbnails/14.jpg)
Page 14
Multi-dimensional Range Query
[Figure: the binary Z-ordering grid partitioned into subspaces; Index entries: 000*, 001*, 01**, 10**, 11**]
• Scan 0010-1001 on the index
• Filter (Subspace Pruning): reconstruct the boundary info. and check whether each subspace intersects the queried area
• Scan the buckets that remain (001* and 10** in the figure)
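The range-query steps can be sketched end to end on the five index entries from this slide. The query rectangle is assumed to be the one with corners 0010 (x=1, y=0) and 1001 (x=2, y=1); the function names and the list-based index are illustrative, not the MD-HBase implementation.

```python
import bisect

BITS = 2  # bits per dimension


def deinterleave(z: int, bits: int = BITS):
    """Recover (x, y) from a Z-value built by x-then-y bit interleaving."""
    x = y = 0
    for i in reversed(range(bits)):
        x = (x << 1) | ((z >> (2 * i + 1)) & 1)
        y = (y << 1) | ((z >> (2 * i)) & 1)
    return x, y


def bounds(name: str):
    """Corners of a subspace: * -> 0 for the minimum, * -> 1 for the maximum."""
    low = deinterleave(int(name.replace("*", "0"), 2))
    high = deinterleave(int(name.replace("*", "1"), 2))
    return low, high


def intersects(name, q_low, q_high):
    (x0, y0), (x1, y1) = bounds(name)
    return (x0 <= q_high[0] and q_low[0] <= x1
            and y0 <= q_high[1] and q_low[1] <= y1)


def candidates(index, z_low, z_high):
    """Index entries whose key range overlaps the linear range [z_low, z_high]."""
    starts = [int(n.replace("*", "0"), 2) for n in index]
    first = max(bisect.bisect_right(starts, z_low) - 1, 0)
    last = bisect.bisect_right(starts, z_high)
    return index[first:last]


index = ["000*", "001*", "01**", "10**", "11**"]
q_low, q_high = (1, 0), (2, 1)

scanned_index = candidates(index, 0b0010, 0b1001)
to_scan = [n for n in scanned_index if intersects(n, q_low, q_high)]
print(scanned_index)  # ['001*', '01**', '10**']
print(to_scan)        # ['001*', '10**']
```

Subspace 01** lies inside the linear range but is pruned before any data is read, which is the step a plain Z-order scan cannot perform.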
![Page 15: S. Nishimura (NEC Service Platforms Labs.), S. Das, D. Agrawal, A. Abbadi (University of California, Santa Barbara) MD-HBase: A Scalable Multi-dimensional.](https://reader030.fdocuments.us/reader030/viewer/2022033108/56649f285503460f94c4149e/html5/thumbnails/15.jpg)
Page 15
K Nearest Neighbors Query
▐ The best-first algorithm can be applied: the most efficient technique in practical cases.
▐ Check the details in our paper
[Figure: subspaces labeled 1 to 5]
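For readers unfamiliar with best-first search, here is a generic sketch over bucketed points; it is not taken from the paper, whose details the slides do not show. Buckets and points share one priority queue ordered by distance to the query, so a bucket is opened only when nothing closer is left. The bucket layout and all names are assumptions for illustration.

```python
import heapq
import math


def min_dist(point, box):
    """Smallest possible distance from point to any location inside box."""
    (x0, y0), (x1, y1) = box
    dx = max(x0 - point[0], 0, point[0] - x1)
    dy = max(y0 - point[1], 0, point[1] - y1)
    return math.hypot(dx, dy)


def knn(buckets, query, k):
    """buckets: list of (box, points). Returns the k nearest points, closest first."""
    # Heap entries: (distance, kind, bucket id, point); kind 0 = bucket, 1 = point.
    heap = [(min_dist(query, box), 0, i, None)
            for i, (box, _) in enumerate(buckets)]
    heapq.heapify(heap)
    result = []
    while heap and len(result) < k:
        _, kind, i, point = heapq.heappop(heap)
        if kind == 1:
            result.append(point)          # nothing unexplored can be closer
        else:
            for p in buckets[i][1]:       # open the bucket, queue its points
                heapq.heappush(heap, (math.dist(query, p), 1, i, p))
    return result


buckets = [
    (((0, 0), (1, 1)), [(0, 0), (1, 1)]),
    (((2, 0), (3, 1)), [(2, 1), (3, 0)]),
    (((0, 2), (3, 3)), [(0, 3), (3, 3)]),
]
nearest = knn(buckets, query=(2, 0), k=2)
```

In this example the third bucket is never opened, because its minimum distance to the query exceeds the distance of the two answers already found.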
![Page 16: S. Nishimura (NEC Service Platforms Labs.), S. Das, D. Agrawal, A. Abbadi (University of California, Santa Barbara) MD-HBase: A Scalable Multi-dimensional.](https://reader030.fdocuments.us/reader030/viewer/2022033108/56649f285503460f94c4149e/html5/thumbnails/16.jpg)
Variations of Storage Layer
▐ Table Share Model
• Uses a single table and maintains bucket boundaries
• Most space efficient
• Bucket co-location may cause disk access congestion
▐ Table per Bucket Model
• Allocates a table per bucket
• Most flexible mapping: one-to-one, one-to-many, many-to-one
• Bucket split is expensive: copies all points to the new buckets
▐ Region per Bucket Model
• Allocates a region per bucket
• Most efficient bucket split: asynchronous bucket split
• Requires modification of HBase
![Page 17: S. Nishimura (NEC Service Platforms Labs.), S. Das, D. Agrawal, A. Abbadi (University of California, Santa Barbara) MD-HBase: A Scalable Multi-dimensional.](https://reader030.fdocuments.us/reader030/viewer/2022033108/56649f285503460f94c4149e/html5/thumbnails/17.jpg)
Page 17
Experimental Results: Multi-dimensional Range Query
• Dataset: 400,000,000 points
• Queries: select objects within MD ranges while varying selectivity
• Cluster size: 4 nodes
• MD-HBase responds 10~100 times faster than the others, and its response time is proportional to selectivity.
[Figure: Response Time (Sec), 1 to 1000, vs. Selectivity (%), 0.01 to 10, for MD-HBase, HBase(ZOrder), and MapReduce]
![Page 18: S. Nishimura (NEC Service Platforms Labs.), S. Das, D. Agrawal, A. Abbadi (University of California, Santa Barbara) MD-HBase: A Scalable Multi-dimensional.](https://reader030.fdocuments.us/reader030/viewer/2022033108/56649f285503460f94c4149e/html5/thumbnails/18.jpg)
Page 18
Experimental Results: k Nearest Neighbors Query
• Dataset: 400,000,000 points
• Queries: choose a point and vary the number of neighbors
• Cluster size: 4 nodes
• MD-HBase responds in 1.5 sec where k ≦ 100, and in 11 sec even if k = 10,000
[Figure: Response Time (Sec), 0 to 12, vs. k: Number of Neighbors, 1 to 10000]
![Page 19: S. Nishimura (NEC Service Platforms Labs.), S. Das, D. Agrawal, A. Abbadi (University of California, Santa Barbara) MD-HBase: A Scalable Multi-dimensional.](https://reader030.fdocuments.us/reader030/viewer/2022033108/56649f285503460f94c4149e/html5/thumbnails/19.jpg)
Page 19
Experimental Results: Insert
• Dataset: spatially skewed data generated by a Zipfian distribution
• MD-HBase shows good scalability without significant overhead.
[Figure: Throughput (records/sec), 0 to 250,000, vs. Number of nodes, 0 to 20, for MD-HBase and HBase(ZOrder)]
![Page 20: S. Nishimura (NEC Service Platforms Labs.), S. Das, D. Agrawal, A. Abbadi (University of California, Santa Barbara) MD-HBase: A Scalable Multi-dimensional.](https://reader030.fdocuments.us/reader030/viewer/2022033108/56649f285503460f94c4149e/html5/thumbnails/20.jpg)
Page 20
Conclusions
▐ Designed a scalable multi-dimensional data store.
• Scalability & efficient multi-dimensional queries
• Key Idea: indexing the longest common prefix of keys
• Easily extends general ordered key-value stores.
▐ Demonstrated scalable insert throughput and excellent query performance.
• Range Query: 10-100 times faster than existing technologies.
• kNN Query: 1.5 s when k ≦ 100.
• Insert: 220K inserts/sec on a 16-node cluster without overhead
Thank you. Any Questions?