Connecting Hadoop and Oracle

Tanel Poder, a long time computer performance geek

gluent.com
Intro: About me

• Tanel Põder
• Oracle Database Performance geek (18+ years)
• Exadata Performance geek
• Linux Performance geek
• Hadoop Performance geek
• CEO & co-founder: Gluent

Expert Oracle Exadata book (2nd edition is out now!)

Instant promotion
All Enterprise Data Available in Hadoop!

[Diagram: Gluent connects Oracle, MSSQL, Teradata, IBM DB2, Big Data Sources and applications (App X, App Y, App Z) to Hadoop]
Gluent Offload Engine

• Access any data source in any DB & APP
• Push processing to Hadoop

[Diagram: the Gluent Offload Engine sits between Hadoop and Oracle, MSSQL, Teradata, IBM DB2, Big Data Sources and the App X / App Y / App Z applications]
Big Data Plumbers
Did you know?

• Cloudera Impala now has analytic functions and can spill workarea buffers to disk
  • … since v2.0 in Oct 2014
• Hive has "storage indexes" (Impala not yet)
  • Hive 0.11 with ORC format, Impala possibly with the new Kudu storage
• Hive supports row-level changes & ACID transactions
  • … since v0.14 in Nov 2014
  • Hive uses a base + delta table approach (not for OLTP!)
• Spark SQL is the new kid on the block
  • Actually, Spark is old news already -> Apache Flink
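To make the Hive ACID bullet concrete, here is a minimal sketch of what an ACID-enabled table looks like in Hive 0.14 (the table and column names are made up for illustration; the server also needs the transaction manager and compactor configured):

```sql
-- Hypothetical example: an ACID-enabled Hive 0.14 table.
-- Transactional tables must be bucketed and stored as ORC;
-- row-level changes are written to delta files, not applied in place.
CREATE TABLE call_detail_records_acid (
  call_id      BIGINT,
  phone_number STRING,
  duration     BIGINT
)
CLUSTERED BY (call_id) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- A row-level change; Hive records this as a delta that background
-- compaction later merges into the base files.
UPDATE call_detail_records_acid SET duration = 0 WHERE call_id = 42;
```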
Scalability vs. Features (Hive example)

$ hive (version 1.2.1 HDP 2.3.1)

hive> SELECT SUM(duration)
    > FROM call_detail_records
    > WHERE
    >     type = 'INTERNET'
    >     OR phone_number IN ( SELECT phone_number
    >                          FROM customer_details
    >                          WHERE region = 'R04' );

FAILED: SemanticException [Error 10249]: Line 5:17 Unsupported SubQuery Expression 'phone_number': Only SubQuery expressions that are top level conjuncts are allowed

We Oracle users have been spoiled with a very sophisticated SQL engine for years :)
Scalability vs. Features (Impala example)

$ impala-shell

SELECT SUM(order_total) FROM orders
WHERE order_mode = 'online'
   OR customer_id IN (SELECT customer_id FROM customers
                      WHERE customer_class = 'Prime');

ERROR: AnalysisException: Subqueries in OR predicates are not supported: order_mode = 'online' OR customer_id IN (SELECT customer_id FROM soe.customers WHERE customer_class = 'Prime')

Cloudera: CDH 5.3, Impala 2.1.0
Scalability vs. Features (Hive example 2)

$ hive (version 1.2.1 HDP 2.3.1)

hive> SELECT
    >     *
    > FROM
    >     table1
    > WHERE
    >     ID IN (SELECT id FROM table2 WHERE id < 100)
    >     AND ID IN (SELECT id FROM table3 WHERE id < 100)
    >     AND ID < 100;

FAILED: SemanticException [Error 10249]: Line 7:6 Unsupported SubQuery Expression 'ID': Only 1 SubQuery expression is supported.
Hadoop vs. Oracle Database: One way to look at it

• Hadoop: Cheap, Scalable, Flexible
• Oracle: Sophisticated, Out-of-the-box, Mature

Big Data Plumbing?
3 major big data plumbing scenarios

1. Load data from Hadoop to Oracle
2. Offload data from Oracle to Hadoop
3. Query Hadoop data in Oracle

There is a big difference between data copy and on-demand data query tools.
Right tool for the right problem! How to know what's right?
Use Case: Move Data to/from Hadoop: Sqoop

• A parallel JDBC <-> HDFS data copy tool
• Apache 2.0 licensed (like most other Hadoop components)
• Generates MapReduce jobs that connect to Oracle with JDBC

[Diagram: SQOOP launches parallel MapReduce jobs that talk to the Oracle DB over JDBC and read/write files in Hadoop/HDFS, with Hive metadata describing them]
Example: Copy Data to Hadoop with Sqoop

• Sqoop is "just" a JDBC DB client capable of writing to HDFS
• It is very flexible

sqoop import \
  --connect jdbc:oracle:thin:@oel6:1521/LIN112 --username system \
  --null-string '' --null-non-string '' \
  --target-dir=/user/impala/offload/tmp/ssh/sales \
  --append -m1 --fetch-size=5000 \
  --fields-terminated-by ',' --lines-terminated-by '\n' \
  --optionally-enclosed-by '\"' --escaped-by '\"' \
  --split-by TIME_ID \
  --query "\"SELECT * FROM SSH.SALES WHERE TIME_ID < TO_DATE('1998-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS','NLS_CALENDAR=GREGORIAN') AND \$CONDITIONS\""

Just an example of sqoop syntax
Sqoop use cases

1. Copy data from Oracle to Hadoop
   • Entire schemas, tables, partitions or any SQL query result
2. Copy data from Hadoop to Oracle
   • Read HDFS files, convert to JDBC arrays and insert
3. Copy changed rows from Oracle to Hadoop
   • Sqoop supports adding WHERE syntax to table reads:
     • WHERE last_chg > TIMESTAMP'2015-01-10 12:34:56'
     • WHERE ora_rowscn > 123456789
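For the changed-rows case, Sqoop can also track the high-water mark itself with its incremental import mode. A sketch (the connection string, column name and paths are illustrative, not from the slides):

```shell
# Hypothetical incremental import: only rows whose LAST_CHG timestamp
# is newer than the remembered high-water mark are copied on each run.
sqoop import \
  --connect jdbc:oracle:thin:@oel6:1521/LIN112 --username system \
  --table SSH.SALES \
  --incremental lastmodified \
  --check-column LAST_CHG \
  --last-value '2015-01-10 12:34:56' \
  --target-dir /user/impala/offload/incr/ssh/sales --append
```

When run as a saved job (sqoop job --create …), Sqoop stores the last-value for you between runs.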
Sqoop with Oracle performance optimizations - 1

• Is called OraOOP
  • Was a separate module by Guy Harrison's team at Quest (Dell)
  • Included in standard Sqoop v1.4.5+, and coming to Sqoop 2 too!
  • sqoop import --direct …
• Parallel Sqoop Oracle reads done by block ranges or partitions
  • Previously required wide index range scans to parallelize the workload
• Read Oracle data with full table scans and direct path reads
  • Much less IO, CPU and buffer cache impact on production databases
Sqoop with Oracle performance optimizations - 2

• Data loading into Oracle via direct path insert or merge
  • Bypass buffer cache and most redo/undo generation
• HCC compression on-the-fly for loaded tables
  • ALTER TABLE fact COMPRESS FOR QUERY HIGH
  • All following direct path inserts will be compressed
• Sqoop is probably the best tool for the data copy/move use case
  • For batch data movement needs
  • If used with the Oracle optimizations
What about real-time change replication?

• Oracle Change Data Capture
  • Supported in 11.2, but not recommended by Oracle anymore
  • Desupported in 12.1
• Oracle GoldenGate or Dbvisit Replicate
• Some manual work/scripting is involved with all these tools:
  1. Sqoop the source table snapshots to Hadoop
  2. Replicate changes to "delta" tables on HDFS or in HBase
  3. Use a "merge" view on Hive/Impala to merge the base table with the delta
"Applying" updates/changes in Hadoop

• HDFS files are append-only by design (scalability reasons)
  • No random writes or changing of existing HDFS file data
• Base table + delta table approach
  1. A base table contains the original (batch offloaded) data
  2. A delta table contains any new versions of the data
  3. A merge view performs an outer join to show the latest data from the base table or the delta table (if any)
  4. Optional "compaction": merging base + delta together in the background
• Hive ACID transactions use this approach under the hood
• Cloudera's KUDU project uses a similar concept without HDFS
  • … but deltas can be in memory and stored in flash (if available)
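A minimal sketch of such a merge view (table and column names are made up; real scripts depend on the replication tool). This variant uses ROW_NUMBER() instead of the outer-join formulation, a common alternative that picks the newest version per key, falling back to the base row when no delta exists:

```sql
-- Hypothetical base + delta merge view in Hive/Impala SQL.
-- Base rows carry change_seq = 0; replicated delta rows carry the
-- increasing change sequence from the source database.
CREATE VIEW sales_merged AS
SELECT sale_id, amount, change_seq
FROM (
  SELECT s.*,
         ROW_NUMBER() OVER (PARTITION BY sale_id
                            ORDER BY change_seq DESC) AS rn
  FROM (
    SELECT sale_id, amount, change_seq FROM sales_base
    UNION ALL
    SELECT sale_id, amount, change_seq FROM sales_delta
  ) s
) v
WHERE rn = 1;
```

Background compaction then simply rewrites sales_base as SELECT … FROM sales_merged and truncates the delta.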
References: Incremental data updates in Hadoop

• Cloudera (Impala)
  • http://blog.cloudera.com/blog/2015/11/how-to-ingest-and-query-fast-data-with-impala-without-kudu/
• Cloudera (Kudu)
  • https://blog.cloudera.com/blog/2015/09/kudu-new-apache-hadoop-storage-for-fast-analytics-on-fast-data/
• Hortonworks (Hive)
  • http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_dataintegration/content/incrementally-updating-hive-table-with-sqoop-and-ext-table.html
Use Case: Query Hadoop data as if it was in the (Oracle) RDBMS

The ultimate goal (for me at least :)

• Low cost and scalability of Hadoop …
• … together with the sophistication of the Oracle SQL engine
• Keep using your existing (Oracle) applications without re-engineering & rewriting SQL
What do SQL queries do?

1. Retrieve data
   • Disk reads, decompression, filtering, column extraction & pruning
2. Process data
   • Join, parallel distribute, aggregate, sort
3. Return results
   • Final result column projection, send data back over the network … or dump to disk
How does the query offloading work? (Ext. tab)

-----------------------------------------------------------
| Id  | Operation                       | Name            |
-----------------------------------------------------------
|   0 | SELECT STATEMENT                |                 |
|   1 |  SORT ORDER BY                  |                 |
|*  2 |   FILTER                        |                 |
|   3 |    HASH GROUP BY                |                 |
|*  4 |     HASH JOIN                   |                 |
|*  5 |      HASH JOIN                  |                 |
|*  6 |       EXTERNAL TABLE ACCESS FULL| CUSTOMERS_EXT   |
|*  7 |       EXTERNAL TABLE ACCESS FULL| ORDERS_EXT      |
|   8 |      EXTERNAL TABLE ACCESS FULL | ORDER_ITEMS_EXT |
-----------------------------------------------------------

The EXTERNAL TABLE ACCESS FULL operations are data retrieval nodes; everything above them is a data processing node.

• Data retrieval nodes do what they need to produce rows/cols
  • Read Oracle datafiles, read external tables, call ODBC, read memory
• In the rest of the execution plan, everything works as usual
How does the query offloading work?

• Data retrieval nodes produce rows/columns
• Data processing nodes consume rows/columns

Data processing nodes don't care how and where the rows were read from; the data flow format is always the same.
How does the query offloading work? (ODBC)

-----------------------------------------------------------
| Id  | Operation          | Name        | Inst   |IN-OUT|
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |             |        |      |
|   1 |  SORT ORDER BY     |             |        |      |
|*  2 |   FILTER           |             |        |      |
|   3 |    HASH GROUP BY   |             |        |      |
|*  4 |     HASH JOIN      |             |        |      |
|*  5 |      HASH JOIN     |             |        |      |
|   6 |       REMOTE       | orders      | IMPALA | R->S |
|   7 |       REMOTE       | customers   | IMPALA | R->S |
|   8 |      REMOTE        | order_items | IMPALA | R->S |
-----------------------------------------------------------

Remote SQL Information (identified by operation id):
----------------------------------------------------
6 - SELECT `order_id`,`order_mode`,`customer_id`,`order_status` FROM `soe`.`orders` WHERE `order_mode`='online' AND `order_status`=5 (accessing 'IMPALA.LOCALDOMAIN')

7 - SELECT `customer_id`,`cust_first_name`,`cust_last_name`,`nls_territory`,`credit_limit` FROM `soe`.`customers` WHERE `nls_territory` LIKE 'New%' (accessing 'IMPALA.LOCALDOMAIN')

8 - SELECT `order_id`,`unit_price`,`quantity` FROM `soe`.`order_items` (accessing 'IMPALA.LOCALDOMAIN')

The REMOTE operations are data retrieval nodes running remote Hadoop SQL; data processing is business as usual.
Evaluate Hadoop -> Oracle SQL processing

• Who reads disk?
• Who filters the rows and prunes the columns, and when?
• Who decompresses the data?
• Who converts the result set to Oracle internal datatypes?
• Who pays the CPU price?!
Sqooping data around

[Diagram: Sqoop runs MapReduce jobs in Hadoop (read IO, write IO + filter against HDFS) and ships rows over JDBC to the Oracle DB; labels on the Oracle side: scan, join, aggregate; datatype conversion; load table]
Oracle SQL Connector for HDFS

[Diagram: an Oracle external table reads HDFS through libhdfs/libjvm (a Java HDFS client) inside the Oracle DB; Oracle does the filtering + datatype conversion, plus joins, processing etc.]

• No filter pushdown into Hadoop
• Parse text file or read a pre-converted Data Pump file
Query using DB links & ODBC Gateway

[Diagram: Impala/Hive on Hadoop does the IO + filter, decompress, filter and project; rows travel over the Thrift protocol to the ODBC driver and ODBC Gateway; the Oracle DB does datatype conversion, joins, processing etc.]
Big Data SQL

[Diagram: on an Oracle BDA, Oracle Storage Cell software reads HDFS (IO + filter) and does decompress, filter, project; rows flow over iDB to the Oracle Exadata DB, which uses a Hive external table and does datatype conversion, joins, processing etc.]
gluent.com 31
Oracle<->Hadoopdataplumbingtools- capabilities
Tool LOADDATA
OFFLOADDATA
ALLOWQUERYDATA
OFFLOADQUERY
PARALLELEXECUTION
Sqoop Yes Yes Yes
OracleLoaderforHadoop
Yes Yes
OracleSQLConnectorforHDFS
Yes Yes Yes
ODBCGateway
Yes Yes Yes
Big DataSQL
Yes Yes Yes Yes
GluentOffloadEngine
Yes Yes Yes Yes Yes
gluent.com 32
Oracle<->Hadoopdataplumbingtools- overhead
Tool Load datatouse
DecompressCPU
FilteringCPU DatatypeConversion
Sqoop Oracle Hadoop Oracle Oracle
OracleLoaderforHadoop
Oracle Hadoop Oracle Hadoop
OracleSQLConnectorforHDFS
Oracle Oracle /Hadoop*
ODBCGateway
Hadoop Hadoop Oracle
Big DataSQL
Hadoop Hadoop Hadoop
GluentOffloadEngine
Hadoop Hadoop Hadoop
ParsetextfilesonHDFSorreadpre-convertedDataPumpbinary
Oracle12c+BDA+Exadataonly
OKforoccasionalarchivequery
Oracle11g,12c,noplatformrestrictions
Example Use Case: Data Warehouse Offload
Dimensional DW design

[Diagram: DW fact tables in terabytes, partitioned by RANGE(order_date) and HASH(customer_id); dimension tables (D) in gigabytes; hotness of data decreases with age]

• Fact tables are time-partitioned, with months to years of history
• Old fact data is rarely updated
• After multiple joins on the dimension tables, a full scan is done on the fact table
• Some filter predicates apply directly to the fact table (time range, country code etc.)
DW Data Offload to Hadoop

[Diagram: the hot partitions of the time-partitioned fact table and the dimension tables stay on expensive storage; the cold data moves to cheap scalable storage (e.g. HDFS + Hive/Impala) spread across many Hadoop nodes]
1. Copy Data to Hadoop with Sqoop

• Sqoop is "just" a JDBC DB client capable of writing to HDFS
• It is very flexible

sqoop import \
  --connect jdbc:oracle:thin:@oel6:1521/LIN112 --username system \
  --null-string '' --null-non-string '' \
  --target-dir=/user/impala/offload/tmp/ssh/sales \
  --append -m1 --fetch-size=5000 \
  --fields-terminated-by ',' --lines-terminated-by '\n' \
  --optionally-enclosed-by '\"' --escaped-by '\"' \
  --split-by TIME_ID \
  --query "\"SELECT * FROM SSH.SALES WHERE TIME_ID < TO_DATE('1998-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS','NLS_CALENDAR=GREGORIAN') AND \$CONDITIONS\""

Just an example of sqoop syntax
2. Load Data into Hive/Impala in read-optimized format

• The additional external table step gives better flexibility when loading into the target read-optimized table

CREATE TABLE IF NOT EXISTS SSH_tmp.SALES_ext (
    PROD_ID bigint, CUST_ID bigint, TIME_ID timestamp, CHANNEL_ID bigint,
    PROMO_ID bigint, QUANTITY_SOLD bigint, AMOUNT_SOLD bigint
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '"' LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION '/user/impala/offload/tmp/ssh/sales';

INSERT INTO SSH.SALES PARTITION (year)
SELECT t.*, CAST(YEAR(time_id) AS SMALLINT) FROM SSH_tmp.SALES_ext t;
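The read-optimized target table itself is not shown on the slide; a sketch of what it might look like (the Parquet format and the derived year partition column are assumptions, chosen to match the INSERT above):

```sql
-- Hypothetical read-optimized target table for the INSERT above:
-- columnar Parquet storage, partitioned by the derived year column
-- so queries can prune whole years of data.
CREATE TABLE IF NOT EXISTS SSH.SALES (
    PROD_ID bigint, CUST_ID bigint, TIME_ID timestamp, CHANNEL_ID bigint,
    PROMO_ID bigint, QUANTITY_SOLD bigint, AMOUNT_SOLD bigint
)
PARTITIONED BY (year smallint)
STORED AS PARQUET;
```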
3a. Query Data via a DB Link / HS / ODBC

• (I'm not covering the ODBC driver install and config here)

SQL> EXPLAIN PLAN FOR SELECT COUNT(*) FROM ssh.sales@impala;
Explained.

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
--------------------------------------------------------------
| Id  | Operation        | Name | Cost (%CPU)| Inst   |IN-OUT|
--------------------------------------------------------------
|   0 | SELECT STATEMENT |      |     0   (0)|        |      |
|   1 |  REMOTE          |      |            | IMPALA | R->S |
--------------------------------------------------------------

Remote SQL Information (identified by operation id):
----------------------------------------------------
1 - SELECT COUNT(*) FROM `SSH`.`SALES` A1 (accessing 'IMPALA')

The entire query gets sent to Impala thanks to Oracle Heterogeneous Services
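For reference, the plumbing behind such a DB link is a Heterogeneous Services (dg4odbc) agent plus a link definition. A sketch, not from the slides: the DSN, host, SID and credentials are placeholders you would replace with your own:

```sql
-- Hypothetical setup behind ssh.sales@impala.
-- $ORACLE_HOME/hs/admin/initimpala.ora (HS agent init file):
--   HS_FDS_CONNECT_INFO = impala_dsn          -- ODBC DSN for the Impala driver
--   HS_FDS_SHAREABLE_NAME = /usr/lib64/libodbc.so
-- tnsnames.ora entry pointing at the gateway listener:
--   impala = (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=oel6)(PORT=1521))
--             (CONNECT_DATA=(SID=impala))(HS=OK))

-- The DB link itself; Impala ignores the credentials, but Oracle
-- requires them syntactically.
CREATE DATABASE LINK impala
  CONNECT TO "dummy" IDENTIFIED BY "dummy"
  USING 'impala';
```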
gluent.com 39
3b.QueryDataviaOracleSQLConnectorforHDFSCREATE TABLE SSH.SALES_DP( "PROD_ID" NUMBER,
"CUST_ID" NUMBER,"TIME_ID" DATE,"CHANNEL_ID" NUMBER,
"PROMO_ID" NUMBER,"QUANTITY_SOLD" NUMBER,"AMOUNT_SOLD" NUMBER
)
ORGANIZATION EXTERNAL( TYPE ORACLE_LOADER
DEFAULT DIRECTORY "OFFLOAD_DIR"
ACCESS PARAMETERS( external variable data
PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream')
LOCATION( 'osch-tanel-00000', 'osch-tanel-00001', 'osch-tanel-00002', 'osch-tanel-00003',
'osch-tanel-00004', 'osch-tanel-00005' )
)REJECT LIMIT UNLIMITED
NOPARALLEL
Theexternalvariabledataoptionsaysit'saDataPumpformatinputstream
hdfs_stream istheJavaHDFSclientappinstalledintoyourOracleserveraspartoftheOracleSQLConnectorforHadoop
Thelocationfilesaresmall.xmlconfig filescreatedbyOracleLoaderforHadoop(tellingwhereinHDFSourfilesare)
3b. Query Data via Oracle SQL Connector for HDFS

• External table access is parallelizable!
  • Multiple HDFS location files must be created for PX (done by the loader)

SQL> SELECT /*+ PARALLEL(4) */ COUNT(*) FROM ssh.sales_dp;

----------------------------------------------------------------------------------
| Id  | Operation                       | Name     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |          |        |      |            |
|   1 |  SORT AGGREGATE                 |          |        |      |            |
|   2 |   PX COORDINATOR                |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)          | :TQ10000 |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE              |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR          |          |  Q1,00 | PCWC |            |
|   6 |       EXTERNAL TABLE ACCESS FULL| SALES_DP |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------

• Oracle Parallel Slaves can each access a different Data Pump file on HDFS
• Any filter predicates are not offloaded; all data/columns are read!
3c. Query Data via Big Data SQL

SQL> CREATE TABLE movielog_plus
  2  (click VARCHAR2(40))
  3  ORGANIZATION EXTERNAL
  4  (TYPE ORACLE_HDFS
  5   DEFAULT DIRECTORY DEFAULT_DIR
  6   ACCESS PARAMETERS (
  7     com.oracle.bigdata.cluster=bigdatalite
  8     com.oracle.bigdata.overflow={"action":"truncate"}
  9   )
 10   LOCATION ('/user/oracle/moviework/applog_json/')
 11  )
 12  REJECT LIMIT UNLIMITED;

• This is just one simple example
• ORACLE_HIVE would use Hive metadata to figure out data location and structure (but storage cells do the disk reading from HDFS directly)
4. Query Data via Gluent Smart Connector

• I will stay civilized and will not turn this into a sales presentation …
• … but check out http://gluent.com for more info ;-)
How to query the full data set?

• UNION ALL
  • Latest partitions in Oracle: (SALES table)
  • Offloaded data in Hadoop: (SALES@impala or the SALES_DP ext tab)

CREATE VIEW app_reporting_user.SALES AS
SELECT PROD_ID, CUST_ID, TIME_ID, CHANNEL_ID, PROMO_ID,
       QUANTITY_SOLD, AMOUNT_SOLD
FROM   SSH.SALES
WHERE  TIME_ID >= TO_DATE(' 1998-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS')
UNION ALL
SELECT "prod_id", "cust_id", "time_id", "channel_id", "promo_id",
       "quantity_sold", "amount_sold"
FROM   SSH.SALES@impala
WHERE  "time_id" < TO_DATE(' 1998-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS')
Hybrid table with selective offloading

SELECT   c.cust_gender, SUM(s.amount_sold)
FROM     ssh.customers c, sales_v s
WHERE    c.cust_id = s.cust_id
GROUP BY c.cust_gender

[Diagram: the plan tree SELECT STATEMENT -> GROUP BY -> HASH JOIN sits over the SALES union-all view; hot data and dimension tables on expensive storage are read via TABLE ACCESS FULL, while offloaded data on cheap scalable storage (e.g. HDFS + Hive/Impala) is read via EXTERNAL TABLE ACCESS / DB LINK across the Hadoop nodes]
gluent.com 45
PartiallyOffloadedExecutionPlan--------------------------------------------------------------------------| Id | Operation | Name |Pstart| Pstop | Inst |--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | || 1 | HASH GROUP BY | | | | ||* 2 | HASH JOIN | | | | || 3 | PARTITION RANGE ALL | | 1 | 16 | || 4 | TABLE ACCESS FULL | CUSTOMERS | 1 | 16 | || 5 | VIEW | SALES_V | | | || 6 | UNION-ALL | | | | |
| 7 | PARTITION RANGE ITERATOR| | 7 | 68 | || 8 | TABLE ACCESS FULL | SALES | 7 | 68 | || 9 | REMOTE | SALES | | | IMPALA |--------------------------------------------------------------------------
2 - access("C"."CUST_ID"="S"."CUST_ID")
Remote SQL Information (identified by operation id):----------------------------------------------------
9 - SELECT `cust_id`,`time_id`,`amount_sold` FROM `SSH`.`SALES` WHERE `time_id`<'1998-07-01 00:00:00' (accessing 'IMPALA' )
Performance

• Vendor X: "We load N Terabytes per hour"
• Vendor Y: "We load N Billion rows per hour"
• Q: What datatypes?
  • It is so much cheaper to load fixed CHARs
  • … but that's not what your real data model looks like
• Q: How wide are the rows?
  • 1 wide varchar column or 500 numeric columns?
• Datatype conversion is CPU hungry!

Make sure you know where in the stack you'll be paying the price, and whether you pay it once or every time the data is accessed.
Performance: Data Retrieval Latency (time to first row/byte)

• Cloudera Impala
  • Low latency (subsecond)
• Hive
  • Use the latest Hive 0.14+ with all the bells'n'whistles (Stinger, TEZ + YARN)
  • Otherwise you'll wait for jobs and JVMs to start up for every query
  • The next Hortonworks HDP release (2.4?) has Hive LLAP daemons
  • With LLAP, JVM containers will not be started for simple scans/filters
• Oracle SQL Connector for HDFS
  • Multi-second latency due to hdfs_stream JVM startup
• Oracle Big Data SQL
  • Low latency thanks to "Exadata storage cell software on Hadoop"
Big Data SQL Performance Considerations

• Only data retrieval (the TABLE ACCESS FULL etc.) is offloaded!
  • Filter pushdown etc.
• All data processing still happens in the DB layer
  • GROUP BY, JOIN, ORDER BY, Analytic Functions, PL/SQL etc.
• Just like on Exadata …
  • Storage cells only speed up data retrieval
Thanks!

We are hiring developers & data engineers!!!
http://gluent.com

Also, the usual:
http://[email protected]