Connecting Hadoop and Oracle

Tanel Poder, a long time computer performance geek

gluent.com
Intro: About me

• Tanel Põder
• Oracle Database Performance geek (18+ years)
• Exadata Performance geek
• Linux Performance geek
• Hadoop Performance geek
• CEO & co-founder: Gluent

Expert Oracle Exadata book (2nd edition is out now!)

Instant promotion
All Enterprise Data Available in Hadoop!

[Diagram: Gluent connects Oracle, MSSQL, Teradata, IBM DB2, Big Data Sources and applications (App X, App Y, App Z) to Hadoop]
Gluent Offload Engine

• Access any data source in any DB & APP
• Push processing to Hadoop

[Diagram: the Gluent Offload Engine sits between Hadoop and Oracle, MSSQL, Teradata, IBM DB2, Big Data Sources and the App X / App Y / App Z applications]
Big Data Plumbers
Did you know?

• Cloudera Impala now has analytic functions and can spill workarea buffers to disk
  • … since v2.0 in Oct 2014
• Hive has "storage indexes" (Impala not yet)
  • Hive 0.11 with ORC format, Impala possibly with the new Kudu storage
• Hive supports row-level changes & ACID transactions
  • … since v0.14 in Nov 2014
  • Hive uses a base + delta table approach (not for OLTP!)
• Spark SQL is the new kid on the block
  • Actually, Spark is old news already -> Apache Flink
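To make the Hive ACID bullet concrete, here is a minimal sketch of what an ACID-enabled table looks like in Hive 0.14 (the table and column names are made up for illustration; the server also needs the transaction manager and compactor configured):

```sql
-- Hypothetical example: an ACID-enabled Hive 0.14 table.
-- Transactional tables must be bucketed and stored as ORC;
-- row-level changes are written to delta files, not applied in place.
CREATE TABLE call_detail_records_acid (
  call_id      BIGINT,
  phone_number STRING,
  duration     BIGINT
)
CLUSTERED BY (call_id) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');

-- A row-level change; Hive records this as a delta that background
-- compaction later merges into the base files.
UPDATE call_detail_records_acid SET duration = 0 WHERE call_id = 42;
```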
Scalability vs. Features (Hive example)

$ hive (version 1.2.1 HDP 2.3.1)

hive> SELECT SUM(duration)
    > FROM call_detail_records
    > WHERE
    >     type = 'INTERNET'
    >     OR phone_number IN ( SELECT phone_number
    >                          FROM customer_details
    >                          WHERE region = 'R04' );

FAILED: SemanticException [Error 10249]: Line 5:17 Unsupported SubQuery Expression 'phone_number': Only SubQuery expressions that are top level conjuncts are allowed

We Oracle users have been spoiled with a very sophisticated SQL engine for years :)
Scalability vs. Features (Impala example)

$ impala-shell

SELECT SUM(order_total) FROM orders
WHERE order_mode = 'online'
   OR customer_id IN (SELECT customer_id FROM customers
                      WHERE customer_class = 'Prime');

ERROR: AnalysisException: Subqueries in OR predicates are not supported: order_mode = 'online' OR customer_id IN (SELECT customer_id FROM soe.customers WHERE customer_class = 'Prime')

Cloudera: CDH 5.3, Impala 2.1.0
Scalability vs. Features (Hive example 2)

$ hive (version 1.2.1 HDP 2.3.1)

hive> SELECT
    >     *
    > FROM
    >     table1
    > WHERE
    >     ID IN (SELECT id FROM table2 WHERE id < 100)
    >     AND ID IN (SELECT id FROM table3 WHERE id < 100)
    >     AND ID < 100;

FAILED: SemanticException [Error 10249]: Line 7:6 Unsupported SubQuery Expression 'ID': Only 1 SubQuery expression is supported.
Hadoop vs. Oracle Database: One way to look at it

• Hadoop: Cheap, Scalable, Flexible
• Oracle: Sophisticated, Out-of-the-box, Mature

Big Data Plumbing?
3 major big data plumbing scenarios

1. Load data from Hadoop to Oracle
2. Offload data from Oracle to Hadoop
3. Query Hadoop data in Oracle

There is a big difference between data copy and on-demand data query tools.
Right tool for the right problem! How to know what's right?
Use Case: Move Data to/from Hadoop: Sqoop

• A parallel JDBC <-> HDFS data copy tool
• Apache 2.0 licensed (like most other Hadoop components)
• Generates MapReduce jobs that connect to Oracle with JDBC

[Diagram: SQOOP launches parallel MapReduce jobs that talk to the Oracle DB over JDBC and read/write files in Hadoop/HDFS, with Hive metadata describing them]
Example: Copy Data to Hadoop with Sqoop

• Sqoop is "just" a JDBC DB client capable of writing to HDFS
• It is very flexible

sqoop import \
  --connect jdbc:oracle:thin:@oel6:1521/LIN112 --username system \
  --null-string '' --null-non-string '' \
  --target-dir=/user/impala/offload/tmp/ssh/sales \
  --append -m1 --fetch-size=5000 \
  --fields-terminated-by ',' --lines-terminated-by '\n' \
  --optionally-enclosed-by '\"' --escaped-by '\"' \
  --split-by TIME_ID \
  --query "\"SELECT * FROM SSH.SALES WHERE TIME_ID < TO_DATE('1998-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS','NLS_CALENDAR=GREGORIAN') AND \$CONDITIONS\""

Just an example of sqoop syntax
Sqoop use cases

1. Copy data from Oracle to Hadoop
   • Entire schemas, tables, partitions or any SQL query result
2. Copy data from Hadoop to Oracle
   • Read HDFS files, convert to JDBC arrays and insert
3. Copy changed rows from Oracle to Hadoop
   • Sqoop supports adding WHERE syntax to table reads:
     • WHERE last_chg > TIMESTAMP'2015-01-10 12:34:56'
     • WHERE ora_rowscn > 123456789
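For the changed-rows case, Sqoop can also track the high-water mark itself with its incremental import mode. A sketch (the connection string, column name and paths are illustrative, not from the slides):

```shell
# Hypothetical incremental import: only rows whose LAST_CHG timestamp
# is newer than the remembered high-water mark are copied on each run.
sqoop import \
  --connect jdbc:oracle:thin:@oel6:1521/LIN112 --username system \
  --table SSH.SALES \
  --incremental lastmodified \
  --check-column LAST_CHG \
  --last-value '2015-01-10 12:34:56' \
  --target-dir /user/impala/offload/incr/ssh/sales --append
```

When run as a saved job (sqoop job --create …), Sqoop stores the last-value for you between runs.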
Sqoop with Oracle performance optimizations - 1

• Is called OraOOP
  • Was a separate module by Guy Harrison's team at Quest (Dell)
  • Included in standard Sqoop v1.4.5+, and coming to Sqoop 2 too!
  • sqoop import --direct …
• Parallel Sqoop Oracle reads done by block ranges or partitions
  • Previously required wide index range scans to parallelize the workload
• Read Oracle data with full table scans and direct path reads
  • Much less IO, CPU and buffer cache impact on production databases
Sqoop with Oracle performance optimizations - 2

• Data loading into Oracle via direct path insert or merge
  • Bypass buffer cache and most redo/undo generation
• HCC compression on-the-fly for loaded tables
  • ALTER TABLE fact COMPRESS FOR QUERY HIGH
  • All following direct path inserts will be compressed
• Sqoop is probably the best tool for the data copy/move use case
  • For batch data movement needs
  • If used with the Oracle optimizations
What about real-time change replication?

• Oracle Change Data Capture
  • Supported in 11.2, but not recommended by Oracle anymore
  • Desupported in 12.1
• Oracle GoldenGate or Dbvisit Replicate
• Some manual work/scripting is involved with all these tools:
  1. Sqoop the source table snapshots to Hadoop
  2. Replicate changes to "delta" tables on HDFS or in HBase
  3. Use a "merge" view on Hive/Impala to merge the base table with the delta
"Applying" updates/changes in Hadoop

• HDFS files are append-only by design (scalability reasons)
  • No random writes or changing of existing HDFS file data
• Base table + delta table approach
  1. A base table contains the original (batch offloaded) data
  2. A delta table contains any new versions of the data
  3. A merge view performs an outer join to show the latest data from the base table or the delta table (if any)
  4. Optional "compaction": merging base + delta together in the background
• Hive ACID transactions use this approach under the hood
• Cloudera's KUDU project uses a similar concept without HDFS
  • … but deltas can be in memory and stored in flash (if available)
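A minimal sketch of such a merge view (table and column names are made up; real scripts depend on the replication tool). This variant uses ROW_NUMBER() instead of the outer-join formulation, a common alternative that picks the newest version per key, falling back to the base row when no delta exists:

```sql
-- Hypothetical base + delta merge view in Hive/Impala SQL.
-- Base rows carry change_seq = 0; replicated delta rows carry the
-- increasing change sequence from the source database.
CREATE VIEW sales_merged AS
SELECT sale_id, amount, change_seq
FROM (
  SELECT s.*,
         ROW_NUMBER() OVER (PARTITION BY sale_id
                            ORDER BY change_seq DESC) AS rn
  FROM (
    SELECT sale_id, amount, change_seq FROM sales_base
    UNION ALL
    SELECT sale_id, amount, change_seq FROM sales_delta
  ) s
) v
WHERE rn = 1;
```

Background compaction then simply rewrites sales_base as SELECT … FROM sales_merged and truncates the delta.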
References: Incremental data updates in Hadoop

• Cloudera (Impala)
  • http://blog.cloudera.com/blog/2015/11/how-to-ingest-and-query-fast-data-with-impala-without-kudu/
• Cloudera (Kudu)
  • https://blog.cloudera.com/blog/2015/09/kudu-new-apache-hadoop-storage-for-fast-analytics-on-fast-data/
• Hortonworks (Hive)
  • http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_dataintegration/content/incrementally-updating-hive-table-with-sqoop-and-ext-table.html
Use Case: Query Hadoop data as if it was in the (Oracle) RDBMS

The ultimate goal (for me at least :)

• Low cost and scalability of Hadoop …
• … together with the sophistication of the Oracle SQL engine
• Keep using your existing (Oracle) applications without re-engineering & rewriting SQL
What do SQL queries do?

1. Retrieve data
   • Disk reads, decompression, filtering, column extraction & pruning
2. Process data
   • Join, parallel distribute, aggregate, sort
3. Return results
   • Final result column projection, send data back over the network … or dump to disk
How does the query offloading work? (Ext. tab)

-----------------------------------------------------------
| Id  | Operation                       | Name            |
-----------------------------------------------------------
|   0 | SELECT STATEMENT                |                 |
|   1 |  SORT ORDER BY                  |                 |
|*  2 |   FILTER                        |                 |
|   3 |    HASH GROUP BY                |                 |
|*  4 |     HASH JOIN                   |                 |
|*  5 |      HASH JOIN                  |                 |
|*  6 |       EXTERNAL TABLE ACCESS FULL| CUSTOMERS_EXT   |
|*  7 |       EXTERNAL TABLE ACCESS FULL| ORDERS_EXT      |
|   8 |      EXTERNAL TABLE ACCESS FULL | ORDER_ITEMS_EXT |
-----------------------------------------------------------

The EXTERNAL TABLE ACCESS FULL operations are data retrieval nodes; everything above them is a data processing node.

• Data retrieval nodes do what they need to produce rows/cols
  • Read Oracle datafiles, read external tables, call ODBC, read memory
• In the rest of the execution plan, everything works as usual
How does the query offloading work?

• Data retrieval nodes produce rows/columns
• Data processing nodes consume rows/columns

Data processing nodes don't care how and where the rows were read from; the data flow format is always the same.
How does the query offloading work? (ODBC)

-----------------------------------------------------------
| Id  | Operation          | Name        | Inst   |IN-OUT|
-----------------------------------------------------------
|   0 | SELECT STATEMENT   |             |        |      |
|   1 |  SORT ORDER BY     |             |        |      |
|*  2 |   FILTER           |             |        |      |
|   3 |    HASH GROUP BY   |             |        |      |
|*  4 |     HASH JOIN      |             |        |      |
|*  5 |      HASH JOIN     |             |        |      |
|   6 |       REMOTE       | orders      | IMPALA | R->S |
|   7 |       REMOTE       | customers   | IMPALA | R->S |
|   8 |      REMOTE        | order_items | IMPALA | R->S |
-----------------------------------------------------------

Remote SQL Information (identified by operation id):
----------------------------------------------------
6 - SELECT `order_id`,`order_mode`,`customer_id`,`order_status` FROM `soe`.`orders` WHERE `order_mode`='online' AND `order_status`=5 (accessing 'IMPALA.LOCALDOMAIN')

7 - SELECT `customer_id`,`cust_first_name`,`cust_last_name`,`nls_territory`,`credit_limit` FROM `soe`.`customers` WHERE `nls_territory` LIKE 'New%' (accessing 'IMPALA.LOCALDOMAIN')

8 - SELECT `order_id`,`unit_price`,`quantity` FROM `soe`.`order_items` (accessing 'IMPALA.LOCALDOMAIN')

The REMOTE operations are data retrieval nodes running remote Hadoop SQL; data processing is business as usual.
Evaluate Hadoop -> Oracle SQL processing

• Who reads disk?
• Who filters the rows and prunes the columns, and when?
• Who decompresses the data?
• Who converts the result set to Oracle internal datatypes?
• Who pays the CPU price?!
Sqooping data around

[Diagram: Sqoop runs MapReduce jobs in Hadoop (read IO, write IO + filter against HDFS) and ships rows over JDBC to the Oracle DB; labels on the Oracle side: scan, join, aggregate; datatype conversion; load table]
Oracle SQL Connector for HDFS

[Diagram: an Oracle external table reads HDFS through libhdfs/libjvm (a Java HDFS client) inside the Oracle DB; Oracle does the filtering + datatype conversion, plus joins, processing etc.]

• No filter pushdown into Hadoop
• Parse text file or read a pre-converted Data Pump file
Query using DB links & ODBC Gateway

[Diagram: Impala/Hive on Hadoop does the IO + filter, decompress, filter and project; rows travel over the Thrift protocol to the ODBC driver and ODBC Gateway; the Oracle DB does datatype conversion, joins, processing etc.]
Big Data SQL

[Diagram: on an Oracle BDA, Oracle Storage Cell software reads HDFS (IO + filter) and does decompress, filter, project; rows flow over iDB to the Oracle Exadata DB, which uses a Hive external table and does datatype conversion, joins, processing etc.]
gluent.com 31
Oracle<->Hadoopdataplumbingtools- capabilities
Tool LOADDATA
OFFLOADDATA
ALLOWQUERYDATA
OFFLOADQUERY
PARALLELEXECUTION
Sqoop Yes Yes Yes
OracleLoaderforHadoop
Yes Yes
OracleSQLConnectorforHDFS
Yes Yes Yes
ODBCGateway
Yes Yes Yes
Big DataSQL
Yes Yes Yes Yes
GluentOffloadEngine
Yes Yes Yes Yes Yes
gluent.com 32
Oracle<->Hadoopdataplumbingtools- overhead
Tool Load datatouse
DecompressCPU
FilteringCPU DatatypeConversion
Sqoop Oracle Hadoop Oracle Oracle
OracleLoaderforHadoop
Oracle Hadoop Oracle Hadoop
OracleSQLConnectorforHDFS
Oracle Oracle /Hadoop*
ODBCGateway
Hadoop Hadoop Oracle
Big DataSQL
Hadoop Hadoop Hadoop
GluentOffloadEngine
Hadoop Hadoop Hadoop
ParsetextfilesonHDFSorreadpre-convertedDataPumpbinary
Oracle12c+BDA+Exadataonly
OKforoccasionalarchivequery
Oracle11g,12c,noplatformrestrictions
Example Use Case: Data Warehouse Offload
Dimensional DW design

[Diagram: DW fact tables in terabytes, partitioned by RANGE(order_date) and HASH(customer_id); dimension tables (D) in gigabytes; hotness of data decreases with age]

• Fact tables are time-partitioned, with months to years of history
• Old fact data is rarely updated
• After multiple joins on the dimension tables, a full scan is done on the fact table
• Some filter predicates apply directly to the fact table (time range, country code etc.)
DW Data Offload to Hadoop

[Diagram: the hot partitions of the time-partitioned fact table and the dimension tables stay on expensive storage; the cold data moves to cheap scalable storage (e.g. HDFS + Hive/Impala) spread across many Hadoop nodes]
1. Copy Data to Hadoop with Sqoop

• Sqoop is "just" a JDBC DB client capable of writing to HDFS
• It is very flexible

sqoop import \
  --connect jdbc:oracle:thin:@oel6:1521/LIN112 --username system \
  --null-string '' --null-non-string '' \
  --target-dir=/user/impala/offload/tmp/ssh/sales \
  --append -m1 --fetch-size=5000 \
  --fields-terminated-by ',' --lines-terminated-by '\n' \
  --optionally-enclosed-by '\"' --escaped-by '\"' \
  --split-by TIME_ID \
  --query "\"SELECT * FROM SSH.SALES WHERE TIME_ID < TO_DATE('1998-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS','NLS_CALENDAR=GREGORIAN') AND \$CONDITIONS\""

Just an example of sqoop syntax
2. Load Data into Hive/Impala in read-optimized format

• The additional external table step gives better flexibility when loading into the target read-optimized table

CREATE TABLE IF NOT EXISTS SSH_tmp.SALES_ext (
    PROD_ID bigint, CUST_ID bigint, TIME_ID timestamp, CHANNEL_ID bigint,
    PROMO_ID bigint, QUANTITY_SOLD bigint, AMOUNT_SOLD bigint
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '"' LINES TERMINATED BY '\n'
STORED AS TEXTFILE LOCATION '/user/impala/offload/tmp/ssh/sales';

INSERT INTO SSH.SALES PARTITION (year)
SELECT t.*, CAST(YEAR(time_id) AS SMALLINT) FROM SSH_tmp.SALES_ext t;
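The read-optimized target table itself is not shown on the slide; a sketch of what it might look like (the Parquet format and the derived year partition column are assumptions, chosen to match the INSERT above):

```sql
-- Hypothetical read-optimized target table for the INSERT above:
-- columnar Parquet storage, partitioned by the derived year column
-- so queries can prune whole years of data.
CREATE TABLE IF NOT EXISTS SSH.SALES (
    PROD_ID bigint, CUST_ID bigint, TIME_ID timestamp, CHANNEL_ID bigint,
    PROMO_ID bigint, QUANTITY_SOLD bigint, AMOUNT_SOLD bigint
)
PARTITIONED BY (year smallint)
STORED AS PARQUET;
```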
3a. Query Data via a DB Link / HS / ODBC

• (I'm not covering the ODBC driver install and config here)

SQL> EXPLAIN PLAN FOR SELECT COUNT(*) FROM ssh.sales@impala;
Explained.

SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
--------------------------------------------------------------
| Id  | Operation        | Name | Cost (%CPU)| Inst   |IN-OUT|
--------------------------------------------------------------
|   0 | SELECT STATEMENT |      |     0   (0)|        |      |
|   1 |  REMOTE          |      |            | IMPALA | R->S |
--------------------------------------------------------------

Remote SQL Information (identified by operation id):
----------------------------------------------------
1 - SELECT COUNT(*) FROM `SSH`.`SALES` A1 (accessing 'IMPALA')

The entire query gets sent to Impala thanks to Oracle Heterogeneous Services
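For reference, the plumbing behind such a DB link is a Heterogeneous Services (dg4odbc) agent plus a link definition. A sketch, not from the slides: the DSN, host, SID and credentials are placeholders you would replace with your own:

```sql
-- Hypothetical setup behind ssh.sales@impala.
-- $ORACLE_HOME/hs/admin/initimpala.ora (HS agent init file):
--   HS_FDS_CONNECT_INFO = impala_dsn          -- ODBC DSN for the Impala driver
--   HS_FDS_SHAREABLE_NAME = /usr/lib64/libodbc.so
-- tnsnames.ora entry pointing at the gateway listener:
--   impala = (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=oel6)(PORT=1521))
--             (CONNECT_DATA=(SID=impala))(HS=OK))

-- The DB link itself; Impala ignores the credentials, but Oracle
-- requires them syntactically.
CREATE DATABASE LINK impala
  CONNECT TO "dummy" IDENTIFIED BY "dummy"
  USING 'impala';
```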
gluent.com 39
3b.QueryDataviaOracleSQLConnectorforHDFSCREATE TABLE SSH.SALES_DP( "PROD_ID" NUMBER,
"CUST_ID" NUMBER,"TIME_ID" DATE,"CHANNEL_ID" NUMBER,
"PROMO_ID" NUMBER,"QUANTITY_SOLD" NUMBER,"AMOUNT_SOLD" NUMBER
)
ORGANIZATION EXTERNAL( TYPE ORACLE_LOADER
DEFAULT DIRECTORY "OFFLOAD_DIR"
ACCESS PARAMETERS( external variable data
PREPROCESSOR "OSCH_BIN_PATH":'hdfs_stream')
LOCATION( 'osch-tanel-00000', 'osch-tanel-00001', 'osch-tanel-00002', 'osch-tanel-00003',
'osch-tanel-00004', 'osch-tanel-00005' )
)REJECT LIMIT UNLIMITED
NOPARALLEL
Theexternalvariabledataoptionsaysit'saDataPumpformatinputstream
hdfs_stream istheJavaHDFSclientappinstalledintoyourOracleserveraspartoftheOracleSQLConnectorforHadoop
Thelocationfilesaresmall.xmlconfig filescreatedbyOracleLoaderforHadoop(tellingwhereinHDFSourfilesare)
3b. Query Data via Oracle SQL Connector for HDFS

• External table access is parallelizable!
  • Multiple HDFS location files must be created for PX (done by the loader)

SQL> SELECT /*+ PARALLEL(4) */ COUNT(*) FROM ssh.sales_dp;

----------------------------------------------------------------------------------
| Id  | Operation                       | Name     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                |          |        |      |            |
|   1 |  SORT AGGREGATE                 |          |        |      |            |
|   2 |   PX COORDINATOR                |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)          | :TQ10000 |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE              |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR          |          |  Q1,00 | PCWC |            |
|   6 |       EXTERNAL TABLE ACCESS FULL| SALES_DP |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------

• Oracle Parallel Slaves can each access a different Data Pump file on HDFS
• Any filter predicates are not offloaded; all data/columns are read!
3c. Query Data via Big Data SQL

SQL> CREATE TABLE movielog_plus
  2  (click VARCHAR2(40))
  3  ORGANIZATION EXTERNAL
  4  (TYPE ORACLE_HDFS
  5   DEFAULT DIRECTORY DEFAULT_DIR
  6   ACCESS PARAMETERS (
  7     com.oracle.bigdata.cluster=bigdatalite
  8     com.oracle.bigdata.overflow={"action":"truncate"}
  9   )
 10   LOCATION ('/user/oracle/moviework/applog_json/')
 11  )
 12  REJECT LIMIT UNLIMITED;

• This is just one simple example
• ORACLE_HIVE would use Hive metadata to figure out data location and structure (but storage cells do the disk reading from HDFS directly)
4. Query Data via Gluent Smart Connector

• I will stay civilized and will not turn this into a sales presentation …
• … but check out http://gluent.com for more info ;-)
How to query the full data set?

• UNION ALL
  • Latest partitions in Oracle: (SALES table)
  • Offloaded data in Hadoop: (SALES@impala or the SALES_DP ext tab)

CREATE VIEW app_reporting_user.SALES AS
SELECT PROD_ID, CUST_ID, TIME_ID, CHANNEL_ID, PROMO_ID,
       QUANTITY_SOLD, AMOUNT_SOLD
FROM   SSH.SALES
WHERE  TIME_ID >= TO_DATE(' 1998-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS')
UNION ALL
SELECT "prod_id", "cust_id", "time_id", "channel_id", "promo_id",
       "quantity_sold", "amount_sold"
FROM   SSH.SALES@impala
WHERE  "time_id" < TO_DATE(' 1998-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS')
Hybrid table with selective offloading

SELECT   c.cust_gender, SUM(s.amount_sold)
FROM     ssh.customers c, sales_v s
WHERE    c.cust_id = s.cust_id
GROUP BY c.cust_gender

[Diagram: the plan tree SELECT STATEMENT -> GROUP BY -> HASH JOIN sits over the SALES union-all view; hot data and dimension tables on expensive storage are read via TABLE ACCESS FULL, while offloaded data on cheap scalable storage (e.g. HDFS + Hive/Impala) is read via EXTERNAL TABLE ACCESS / DB LINK across the Hadoop nodes]
gluent.com 45
PartiallyOffloadedExecutionPlan--------------------------------------------------------------------------| Id | Operation | Name |Pstart| Pstop | Inst |--------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | || 1 | HASH GROUP BY | | | | ||* 2 | HASH JOIN | | | | || 3 | PARTITION RANGE ALL | | 1 | 16 | || 4 | TABLE ACCESS FULL | CUSTOMERS | 1 | 16 | || 5 | VIEW | SALES_V | | | || 6 | UNION-ALL | | | | |
| 7 | PARTITION RANGE ITERATOR| | 7 | 68 | || 8 | TABLE ACCESS FULL | SALES | 7 | 68 | || 9 | REMOTE | SALES | | | IMPALA |--------------------------------------------------------------------------
2 - access("C"."CUST_ID"="S"."CUST_ID")
Remote SQL Information (identified by operation id):----------------------------------------------------
9 - SELECT `cust_id`,`time_id`,`amount_sold` FROM `SSH`.`SALES` WHERE `time_id`<'1998-07-01 00:00:00' (accessing 'IMPALA' )
Performance

• Vendor X: "We load N Terabytes per hour"
• Vendor Y: "We load N Billion rows per hour"
• Q: What datatypes?
  • It is so much cheaper to load fixed CHARs
  • … but that's not what your real data model looks like
• Q: How wide are the rows?
  • 1 wide varchar column or 500 numeric columns?
• Datatype conversion is CPU hungry!

Make sure you know where in the stack you'll be paying the price, and whether you pay it once or every time the data is accessed.
Performance: Data Retrieval Latency (time to first row/byte)

• Cloudera Impala
  • Low latency (subsecond)
• Hive
  • Use the latest Hive 0.14+ with all the bells'n'whistles (Stinger, TEZ + YARN)
  • Otherwise you'll wait for jobs and JVMs to start up for every query
  • The next Hortonworks HDP release (2.4?) has Hive LLAP daemons
  • With LLAP, JVM containers will not be started for simple scans/filters
• Oracle SQL Connector for HDFS
  • Multi-second latency due to hdfs_stream JVM startup
• Oracle Big Data SQL
  • Low latency thanks to "Exadata storage cell software on Hadoop"
Big Data SQL Performance Considerations

• Only data retrieval (the TABLE ACCESS FULL etc.) is offloaded!
  • Filter pushdown etc.
• All data processing still happens in the DB layer
  • GROUP BY, JOIN, ORDER BY, Analytic Functions, PL/SQL etc.
• Just like on Exadata …
  • Storage cells only speed up data retrieval
Thanks!

We are hiring developers & data engineers!!!
http://gluent.com

Also, the usual:
http://[email protected]