Secondary Indexing in Phoenix
Jesse Yates
HBase Committer
Software Engineer
LA HBase User Group – September 4, 2013
Agenda
• About
• Other Indexing Frameworks
• Immutable Indexes
• Mutable Indexes in Phoenix
• Mutable Indexing Internals
• Roadmap
LA HUG – Sept 2013
About me
• Developer at Salesforce
  – System of Record, Phoenix
• Open Source
  – Phoenix
  – HBase
  – Accumulo
Phoenix
• Open Source
  – https://github.com/forcedotcom/phoenix
• “SQL-skin” on HBase
  – Everyone knows SQL!
• JDBC Driver
  – Plug-and-play
• Faster than HBase
  – in some cases
Why Index?
• HBase is only sorted on 1 “axis”
• Great for search via a single pattern
Example!
Example
name:
type:
subtype:
date:
major:
minor:
quantity:
Secondary Indexes
• Sort on ‘orthogonal’ axis
• Save full-table scan
• Expected database feature
• Hard in HBase b/c of ACID considerations
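To make the "orthogonal axis" idea concrete, here is a minimal sketch that is not taken from the talk. It models the data table and the index table as two in-memory sorted maps. The class name, the method names, and the "value|rowKey" index key layout are all assumptions made for illustration, and the sketch ignores the consistency concerns raised in the last bullet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model: two in-memory sorted maps stand in for a data table and its index table.
// Assumes indexed values never contain the '|' separator.
final class SecondaryIndexSketch {
    // Data table, sorted by row key (the single "axis" HBase sorts on).
    private final TreeMap<String, String> dataTable = new TreeMap<>();
    // Index table, sorted by "value|rowKey" (the orthogonal axis).
    private final TreeMap<String, String> indexTable = new TreeMap<>();

    void put(String rowKey, String value) {
        String old = dataTable.put(rowKey, value);
        if (old != null) {
            indexTable.remove(old + "|" + rowKey); // clean up the stale index entry
        }
        indexTable.put(value + "|" + rowKey, rowKey);
    }

    // No index: every row must be examined.
    List<String> findByValueFullScan(String value) {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<String, String> e : dataTable.entrySet()) {
            if (e.getValue().equals(value)) {
                rows.add(e.getKey());
            }
        }
        return rows;
    }

    // With an index: a short range scan over keys that start with "value|".
    List<String> findByValueIndexed(String value) {
        // '}' is the character immediately after '|', so it bounds the prefix range.
        return new ArrayList<>(indexTable.subMap(value + "|", value + "}").values());
    }

    public static void main(String[] args) {
        SecondaryIndexSketch table = new SecondaryIndexSketch();
        table.put("row1", "Jesse");
        table.put("row2", "Lars");
        table.put("row3", "Jesse");
        System.out.println(table.findByValueFullScan("Jesse"));
        System.out.println(table.findByValueIndexed("Jesse"));
    }
}
```

Both lookups return the same rows; the indexed one touches only the matching key range, at the cost of a second write (and a cleanup) on every put.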
Other (Major) Indexing Frameworks
• HBase SEP
  – Side-Effects Processor
  – Replication-based
  – https://github.com/NGDATA/hbase-sep
• Huawei
  – Server-local indexes
  – Buddy regions
  – https://github.com/Huawei-Hadoop/hindex
Immutable Indexes
• Immutable Rows
• Much easier to implement
• Client-managed
• Bulk-loadable
Bulk Loading
phoenix-hbase.blogspot.com
Index Bulk Loading
Identity Mapper → Custom Phoenix Reducer → HFile Output Format
Index Bulk Loading
PreparedStatement statement = conn.prepareStatement(dmlStatement);
statement.execute();
String upsertStmt = "upsert into core.entity_history(organization_id, key_prefix, entity_history_id, created_by, created_date)\n" + "values(?,?,?,?,?)";
statement = conn.prepareStatement(upsertStmt);
… // set values
// Read the pending (uncommitted) KeyValues back from the connection
Iterator<Pair<byte[], List<KeyValue>>> dataIterator = PhoenixRuntime.getUncommittedDataIterator(conn);
The “fun” stuff…
1.5 years
Mutable Indexes
• Global Index
• Change row state
  – Common use-case
  – “expected” implementation
• Covered Columns
Usage
• Just SQL!
• Baby name popularity
• Mock demo
Usage
• Selects the most popular name for a given year
SELECT name,occurrences FROM baby_names WHERE year=2012 LIMIT 1;
• Selects the total occurrences of a given name across all years
SELECT /*+ NO_INDEX */ name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY name;
• Selects the total occurrences of a given name across all years allowing an index to be used
SELECT name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY NAME;
Usage
• Update rows due to census inaccuracy
  – Will only work if the mutable indexing is working
UPSERT INTO baby_names SELECT year,occurrences+3000,sex,name FROM baby_names WHERE name='Jesse';
• Selects the now updated data (from the index table)
SELECT name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY NAME;
• Index table still used in scans
EXPLAIN SELECT name,sum(occurrences) FROM baby_names WHERE name='Jesse' GROUP BY NAME;
Internals
• Index Management
  – Build index updates
  – Ensures index is ‘cleaned up’
• Recovery Mechanism
  – Ensures index updates are “ACID”
“There is no magic”
- Every programming hipster (chipster)
Mutable Indexing: Standard Write Path
[Diagram: Client → HRegion → RegionCoprocessorHost → WAL → RegionCoprocessorHost → MemStore]
Mutable Indexing
[Diagram labels: RegionCoprocessorHost, WAL, RegionCoprocessorHost, Indexer, Builder, Codec, WAL Updater (“Durable!”), Index Tables]
Index Management
• Lives within a RegionCoprocessorObserver
• Access to the local HRegion
• Specifies the mutations to apply to the index tables
public interface IndexBuilder {
  public void setup(RegionCoprocessorEnvironment env);
  public Map<Mutation, String> getIndexUpdate(Put put);
  public Map<Mutation, String> getIndexUpdate(Delete delete);
}
Why not write my own?
• Managing Cleanup
  – Efficient point-in-time correctness
  – Performance tricks
• Abstract access to HRegion
  – Minimal network hops
• Sorting correctness
  – Phoenix typing ensures correct index sorting
Example: Managing Cleanup
• Updates can arrive out of order
  – Client-managed timestamps
ROW FAMILY QUALIFIER TS VALUE
Row1 Fam Qual 10 val1
Row1 Fam2 Qual2 12 val2
Row1 Fam Qual 13 val3
Example: Managing Cleanup
Index Table
ROW FAMILY QUALIFIER TS
Val1|Row1 Index Fam:Qual 10
Val1|Val2|Row1 Index Fam:Qual, Fam2:Qual2 12
Val3|Val2|Row1 Index Fam:Qual, Fam2:Qual2 13
Example: Managing Cleanup
ROW FAMILY QUALIFIER TS VALUE
Row1 Fam Qual 10 val1
Row1 Fam2 Qual2 12 val2
Row1 Fam Qual 13 val3
Row1 Fam Qual 11 val4
Example: Managing Cleanup
ROW FAMILY QUALIFIER TS VALUE
Row1 Fam Qual 10 val1
Row1 Fam Qual 11 val4
Row1 Fam2 Qual2 12 val2
Row1 Fam Qual 13 val3
Example: Managing Cleanup
Index Table
ROW FAMILY QUALIFIER TS
Val1|Row1 Index Fam:Qual 10
Val4|Row1 Index Fam:Qual 11
Val4|Val2|Row1 Index Fam:Qual, Fam2:Qual2 12
Val1|Val2|Row1 Index Fam:Qual, Fam2:Qual2 12
Val3|Val2|Row1 Index Fam:Qual, Fam2:Qual2 13
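The tables above can be reproduced with a small sketch, offered as an illustration rather than as the Phoenix implementation. It keeps every timestamped version of the two indexed columns and recomputes, from scratch, which index row key is correct at each timestamp. The class and method names are hypothetical, and a real implementation would emit incremental puts and deletes instead of recomputing everything.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of point-in-time index state for a single data row ("Row1").
final class CleanupSketch {
    // Indexed columns, in the order they appear in the index row key.
    private static final List<String> INDEXED_COLUMNS = Arrays.asList("Fam:Qual", "Fam2:Qual2");
    // column -> (timestamp -> value); timestamps are client-managed.
    private final Map<String, TreeMap<Long, String>> cells = new HashMap<>();

    void put(String column, long ts, String value) {
        cells.computeIfAbsent(column, c -> new TreeMap<>()).put(ts, value);
    }

    // Index row key as of ts: latest value of each indexed column at or before ts, then the row key.
    String indexKeyAt(long ts) {
        StringBuilder key = new StringBuilder();
        for (String column : INDEXED_COLUMNS) {
            TreeMap<Long, String> versions = cells.get(column);
            Map.Entry<Long, String> e = (versions == null) ? null : versions.floorEntry(ts);
            if (e != null) {
                key.append(e.getValue()).append('|');
            }
        }
        return key.append("Row1").toString();
    }

    // Correct index contents: one entry per timestamp at which any indexed column changed.
    TreeMap<Long, String> indexEntries() {
        TreeMap<Long, String> entries = new TreeMap<>();
        for (TreeMap<Long, String> versions : cells.values()) {
            for (Long ts : versions.keySet()) {
                entries.put(ts, indexKeyAt(ts));
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        CleanupSketch row = new CleanupSketch();
        row.put("Fam:Qual", 10L, "val1");
        row.put("Fam2:Qual2", 12L, "val2");
        row.put("Fam:Qual", 13L, "val3");
        System.out.println("before late update: " + row.indexEntries());
        row.put("Fam:Qual", 11L, "val4"); // arrives last but carries an older timestamp
        System.out.println("after late update:  " + row.indexEntries());
    }
}
```

After the late update, the entry at timestamp 12 is built from val4 rather than val1, so the previously written val1|val2|Row1 entry at timestamp 12 is stale and has to be deleted from the index table.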
Managing Cleanup
• History “roll up”
• Out-of-order Updates
• Point-in-time correctness
• Multiple Timestamps per Mutation
• Delete vs. DeleteColumn vs. DeleteFamily
Surprisingly hard!
Phoenix Index Builder
• Much simpler than full index management
• Hides cleanup considerations
• Abstracted access to local state
public interface IndexCodec {
  public void initialize(RegionCoprocessorEnvironment env);
  public Iterable<IndexUpdate> getIndexDeletes(TableState state);
  public Iterable<IndexUpdate> getIndexUpserts(TableState state);
}
Phoenix Index Codec
Dude, where’s my data?
Ensuring Correctness
HBase ACID
• Does NOT give you:
  – Cross-row consistency
  – Cross-table consistency
• Does give you:
  – Durable data on success
  – Visibility on success without partial rows
Key Observation
“Secondary indexing is inherently an easier problem than full transactions… secondary index updates are idempotent.”
- Lars Hofhansl
Idempotent Index Updates
• Doesn’t need full transactions
• Replay as many times as needed
• Can tolerate a little lag
  – As long as we get the order right
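A minimal sketch of why replay is safe, assuming each index update carries an explicit timestamp so that re-applying it rewrites the same cell. The class, its methods, and the "indexRowKey@timestamp" cell key are hypothetical stand-ins for an index table, not Phoenix or HBase APIs.

```java
import java.util.TreeMap;

// Toy index table: "indexRowKey@timestamp" -> data row key.
final class IdempotentReplaySketch {
    private final TreeMap<String, String> indexTable = new TreeMap<>();

    // Re-applying the same update overwrites the same cell with the same content.
    void apply(String indexRowKey, long ts, String dataRowKey) {
        indexTable.put(indexRowKey + "@" + ts, dataRowKey);
    }

    // Copy of the current table state, for comparing before and after a replay.
    TreeMap<String, String> snapshot() {
        return new TreeMap<>(indexTable);
    }

    public static void main(String[] args) {
        IdempotentReplaySketch index = new IdempotentReplaySketch();
        index.apply("val1|Row1", 10L, "Row1");
        TreeMap<String, String> afterFirstApply = index.snapshot();
        index.apply("val1|Row1", 10L, "Row1"); // replayed, e.g. after a failure
        System.out.println(afterFirstApply.equals(index.snapshot()));
    }
}
```

Because the replayed update lands on the same key with the same value, the table state is unchanged no matter how many times recovery re-sends it.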
Failure Recovery
• Custom WALEditCodec
  – Encodes index updates
  – Supports compressed WAL
• Custom WAL Reader
  – Replay index updates from WAL
<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>
<property>
  <name>hbase.regionserver.hlog.reader.impl</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedHLogReader</value>
</property>
Failure Situations
• Any time before WAL, client replay
• Any time after WAL, HBase replay
• All-or-nothing
Failure #1: Before WAL
[Diagram: Client → HRegion → RegionCoprocessorHost → WAL → RegionCoprocessorHost → MemStore]
No problem! No data is stored in the WAL, client just retries entire update.
Failure #2: After WAL
[Diagram: Client → HRegion → RegionCoprocessorHost → WAL → RegionCoprocessorHost → MemStore]
WAL replayed via usual replay mechanisms
Roadmap
• Next release of Phoenix
• Performance testing
• Increased adoption
• Adding to HBase (?)
Open Source!
• Main: https://github.com/forcedotcom/phoenix
• Indexing: https://github.com/forcedotcom/phoenix/tree/mutable-si
(obligatory hiring slide)
We’re Hiring!