Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016


Transcript of Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016

Page 1: Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016

Coprocessors – Uses, Abuses, Solutions

• 26 // SEPTEMBER // 2016

COPYRIGHT 2016 BLOOMBERG FINANCE L.P. ALL RIGHTS RESERVED.

Esther Kundin (with guest appearance by Clay Baenziger)

Page 2: Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016

Coprocessors

Page 3: Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016

What is a coprocessor?
– Custom jar loaded into the HBase daemon process
– Endpoint – like a stored procedure
– Observer – like a trigger

Page 4: Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016

Observers
– Region Observer
  • preGet
  • postGet
  • prePut
  • postPut
– WAL Observer
– Master Observer
  • runs in HBase master
  • Create, Delete, Modify table

Page 5: Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016

Why use a coprocessor?
– Simple filter or aggregation run on your data
– Reduces amount of data being sent to the client
– NOT for complex data analysis
– Ex: Apache Phoenix (“We put the SQL back in NoSQL”)

Page 6: Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016

PORT – A sample use case

Page 7: Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016

Post-Get example (RegionServer, postGet)

Table Representation:
  Columns: Key, Col1, Col2, Col3, Col4, Col5
  Key1Abc: 1, 4, 5
  Key1Def: 2, 2, 2
  Key1Xyz: 10, 11, 12

Source row chosen per column:
  Key1: Abc-col1, Def-col2, Abc-col3, Abc-col4, Xyz-col5

Coprocessor Result:
  Key1: 1, 2, 4, 5, 12
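
The merge on this slide can be sketched in plain Java. This is only an illustration of the per-column selection, not the PORT coprocessor itself: it uses ordinary maps instead of HBase Result and Cell objects, and the class and method names (PostGetMergeSketch, merge, demo) are hypothetical. Only the cells whose column is recoverable from the slide are included.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PostGetMergeSketch {

    // For each column, copy the value from the row named in sourceByColumn.
    // The physical row key is assumed to be the logical key plus a suffix.
    static Map<String, Integer> merge(String key,
                                      Map<String, Map<String, Integer>> rows,
                                      Map<String, String> sourceByColumn) {
        Map<String, Integer> merged = new LinkedHashMap<>();
        for (Map.Entry<String, String> choice : sourceByColumn.entrySet()) {
            Map<String, Integer> row = rows.get(key + choice.getValue());
            Integer value = (row == null) ? null : row.get(choice.getKey());
            if (value != null) {
                merged.put(choice.getKey(), value);
            }
        }
        return merged;
    }

    // Rebuilds the slide's example: three physical rows, one logical result.
    static Map<String, Integer> demo() {
        Map<String, Map<String, Integer>> rows = new LinkedHashMap<>();
        rows.put("Key1Abc", Map.of("Col1", 1, "Col3", 4, "Col4", 5));
        rows.put("Key1Def", Map.of("Col2", 2));
        rows.put("Key1Xyz", Map.of("Col5", 12));

        Map<String, String> sourceByColumn = new LinkedHashMap<>();
        sourceByColumn.put("Col1", "Abc");
        sourceByColumn.put("Col2", "Def");
        sourceByColumn.put("Col3", "Abc");
        sourceByColumn.put("Col4", "Abc");
        sourceByColumn.put("Col5", "Xyz");

        return merge("Key1", rows, sourceByColumn);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Running demo() yields the values 1, 2, 4, 5, 12 for Col1 through Col5, matching the coprocessor result on the slide. The point of doing this server-side is that the client receives one merged row instead of three.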

Page 8: Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016

Problems and Solutions

Page 9: Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016

Coprocessors crash regionservers
– Exceptions (other than IOExceptions) in the coprocessor bring down the RegionServer
– In other cases, the coprocessor silently unloads

Page 10: Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016

Solution – catch all exceptions

    public final void prePut(...) throws IOException {
        try {
            prePutImpl(…);
        } catch (IOException ex) {
            // Allow IOExceptions to propagate
            // They won't cause an unload
            throw ex;
        } catch (Throwable ex) {
            // Wrap other exceptions as IOException
            LOG.error("prePut: caught ", ex);
            throw new IOException(ex);
        }
    }
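
Since the same wrapper is needed around every hook, it can be factored into a helper. The following is a minimal stand-alone sketch of that idea, assuming a hypothetical HookBody interface and guard method; it uses no HBase classes and logs to stderr instead of the coprocessor's logger.

```java
import java.io.IOException;

public class HookGuardSketch {

    // Hypothetical stand-in for the body of a hook such as prePut.
    @FunctionalInterface
    interface HookBody {
        void run() throws IOException;
    }

    // Runs the body, letting IOExceptions through and wrapping everything else.
    static void guard(String hookName, HookBody body) throws IOException {
        try {
            body.run();
        } catch (IOException ex) {
            throw ex; // propagate unchanged
        } catch (Throwable ex) {
            System.err.println(hookName + ": caught " + ex);
            throw new IOException(hookName + " failed", ex);
        }
    }

    // Small helper that reports what the caller would observe.
    static String outcome(HookBody body) {
        try {
            guard("prePut", body);
            return "ok";
        } catch (IOException ex) {
            return ex.getCause() == null
                    ? "io"
                    : "wrapped:" + ex.getCause().getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome(() -> { }));
        System.out.println(outcome(() -> { throw new IOException("disk"); }));
        System.out.println(outcome(() -> { throw new IllegalStateException("boom"); }));
    }
}
```

With a helper like this, each hook becomes a one-line call, which makes it harder to forget the wrapper on a newly added hook.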

Page 11: Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016

Coprocessors can hog memory
– Memory is shared between the RegionServer and the coprocessor
– Memory hogging slows RegionServer performance

Page 12: Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016

Solutions - defensive Java code
– Profile all coprocessor code for memory usage
  • Use a generic profiler with a driver for your coprocessor
– Use common Java tricks for limiting memory usage
  • Use primitive types and underlying arrays where possible
  • Use immutable objects
  • StringBuilder vs String concatenation
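
Two of these tricks can be illustrated with a short sketch. The class and method names below are hypothetical and the example is generic Java, not code from the talk: it sums over a primitive long[] rather than a List<Long> (no boxing, no per-element objects) and builds a key with one StringBuilder rather than repeated String concatenation in a loop.

```java
public class LowAllocationSketch {

    // Primitive array: no Long objects are allocated while summing.
    static long sum(long[] values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    // One StringBuilder instead of a new String per loop iteration.
    static String joinKey(String[] parts, char separator) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(separator);
            }
            sb.append(parts[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(sum(new long[] {1, 2, 4, 5, 12}));
        System.out.println(joinKey(new String[] {"Key1", "Abc"}, '-'));
    }
}
```

Whether either change matters in a given coprocessor is something only profiling can confirm, which is why the slide lists profiling first.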

Page 13: Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016

Problems with deployment
– Manual deployment
  • disable table
  • assign new coprocessor
  • enable table
– Rollout of a non-backward-compatible coprocessor is difficult

Page 14: Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016

Solutions
– HBASE-7639 – online schema update is enabled, perhaps it will work
– Hard-code jar path in hbase-site.xml
  • Used by Apache Phoenix
  • Not the best approach for user-defined coprocessors

Page 15: Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016

Logging and metrics tips
– Update log4j.properties file with a separate log parameter for coprocessors
– Use MDC context to pass parameters to all parts of the coprocessor (http://www.slf4j.org/api/org/slf4j/MDC.html)
– Create an extra column in a Result to pass back an object populated with metrics
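
The last tip needs the metrics object turned into bytes before it can travel as a column value. A minimal sketch of that step follows, assuming a hypothetical Metrics class with two counters; how the bytes are attached to the Result (column family, qualifier, cell construction) is not shown on the slide and is left out here.

```java
import java.nio.ByteBuffer;

public class MetricsColumnSketch {

    // Hypothetical metrics object; the field choice is an assumption.
    static final class Metrics {
        final long rowsRead;
        final long elapsedNanos;

        Metrics(long rowsRead, long elapsedNanos) {
            this.rowsRead = rowsRead;
            this.elapsedNanos = elapsedNanos;
        }

        // Fixed-width encoding: two big-endian longs, 16 bytes total.
        byte[] toBytes() {
            return ByteBuffer.allocate(2 * Long.BYTES)
                    .putLong(rowsRead)
                    .putLong(elapsedNanos)
                    .array();
        }

        // Client-side decoding of the extra column's value.
        static Metrics fromBytes(byte[] bytes) {
            ByteBuffer buf = ByteBuffer.wrap(bytes);
            long rows = buf.getLong();
            long nanos = buf.getLong();
            return new Metrics(rows, nanos);
        }
    }

    public static void main(String[] args) {
        Metrics decoded = Metrics.fromBytes(new Metrics(3, 1500).toBytes());
        System.out.println(decoded.rowsRead + " rows, " + decoded.elapsedNanos + " ns");
    }
}
```

A fixed-width layout keeps the extra column small and avoids pulling a serialization library into the RegionServer process.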

Page 16: Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016

Unsolved issues
– Bad request can bring down the whole cluster
– Missing jar will bring down the RegionServer

    ERROR org.apache.hadoop.hbase.coprocessor.CoprocessorHost: The coprocessor fooCoprocessor threw java.io.FileNotFoundException: File does not exist: /path/to/corprocessor.jar
    java.io.FileNotFoundException: File does not exist: /path/to/corprocessor.jar

Page 17: Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016

(Preventing) Abuses

Page 18: Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016

Load Failures
– Affects all region servers – one at a time
– Affects some operations and not others (e.g. scan works, get does not)
– HTable descriptors contain the coprocessor class:
  • Clean-up can be messy
  • HBASE-14190 - Assign system tables ahead of user region assignment
– Set table property: hbase.coprocessor.abortonerror to false

    2016-09-24 02:32:07,366 ERROR org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost: Failed to load coprocessor net.clayb.hbase.coprocessor.RegionObserver
    java.io.FileNotFoundException: File does not exist: hdfs://Test/user/foo/clayCoprocessor.jar

(Region server stays alive; only the table stays disabled)

Page 19: Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016

Handler Failure
– RPC starvation is a simple and non-obvious failure:

    public class RegionObserverInfinity extends BaseRegionObserver {
        public void preGetOp(…) throws IOException {
            // Never returns, so the RPC handler thread is never released
            for (;;) {
                LOG.trace("Off I go…");
            }
        }
    }

– Use jstack to see what is up in a region server:

    clay@hbase-regionserver:~$ sudo jstack 3990
    […]
    net.clayb.RegionObserverInfinity.preGetOp(…) @bci=12, line=28 (Compiled frame; information may be imprecise)
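
One defensive habit against this kind of starvation is to give every loop in a hook a time budget. The sketch below is an assumption about how that could look, not something shown in the talk: the class name, method names, and budget parameter are hypothetical, and the loop body is a placeholder for real per-item work.

```java
import java.io.IOException;

public class BoundedWorkSketch {

    // Processes up to `items` units of work, giving up once the budget is spent.
    static int processWithBudget(int items, long budgetNanos) throws IOException {
        long deadline = System.nanoTime() + budgetNanos;
        int done = 0;
        while (done < items) {
            // Overflow-safe comparison, as recommended for System.nanoTime().
            if (System.nanoTime() - deadline > 0) {
                throw new IOException("time budget exceeded after " + done + " items");
            }
            done++; // placeholder for the real per-item work
        }
        return done;
    }

    // Reports whether the call ran out of budget.
    static boolean timedOut(int items, long budgetNanos) {
        try {
            processWithBudget(items, budgetNanos);
            return false;
        } catch (IOException ex) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(timedOut(1000, 1_000_000_000L));
        System.out.println(timedOut(1000, -1L));
    }
}
```

Throwing an IOException keeps the failure on the path that, per the earlier slide, propagates to the caller without unloading the coprocessor or aborting the RegionServer.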

Page 20: Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016

Coprocessor Whitelisting
– Coprocessors are key to HBase operation:
  • security.access.AccessController
  • security.token.TokenProvider
  • security.access.SecureBulkLoadEndpoint
  • MultiRowMutationEndpoint
– hbase.coprocessor.user.enabled – disables all user coprocessors (e.g. Apache Phoenix)
– HBASE-16700 – “Allow for coprocessor whitelisting” or abuse HBASE-15686

Page 21: Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016

Recap
– Coprocessors are dangerous:
  “Coprocessors are an advanced feature of HBase and are intended to be used by system developers only.” – HBase Book
– Write defensive code!
– Needed from the community
  • Story for coprocessor deployment
  • Process isolation
  • JMX metrics

Page 22: Coprocessors - Uses, Abuses, Solutions - presented at HBaseCon East 2016

Thank you!