Brisk hadoop june2011_sfjava
![Page 1: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/1.jpg)
Brisk: Truly peer-to-peer Hadoop
srisatish.ambati AT gmail.com Apache Cassandra/OpenJDK @srisatish
![Page 2: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/2.jpg)
Brisk: Hive + Hadoop + Cassandra
![Page 3: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/3.jpg)
Map Reduce
![Page 4: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/4.jpg)
You have large sets of data, and you can work on small pieces in parallel.
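The idea on this slide can be illustrated with a small sketch. This is not Hadoop code; it is a minimal in-process word count, assuming the input has already been split into pieces. The class name `MiniMapReduce` and the sample strings are made up for illustration. Each piece is mapped to words in parallel, and the reduce step sums the counts per word.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of Hadoop or Brisk.
final class MiniMapReduce {

    // Map: split each piece into words (pieces are processed in parallel).
    // Reduce: group identical words and count them.
    static Map<String, Long> wordCount(List<String> pieces) {
        return pieces.parallelStream()
                .flatMap(piece -> Arrays.stream(piece.toLowerCase().split("\\s+")))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Three small "pieces" standing in for blocks of a large data set.
        List<String> pieces = Arrays.asList(
                "brisk hive hadoop",
                "hadoop cassandra",
                "cassandra brisk hadoop");
        System.out.println(wordCount(pieces));
    }
}
```

Running `main` prints `{brisk=2, cassandra=2, hadoop=3, hive=1}`. A real MapReduce job distributes the pieces across machines rather than threads, but the map and reduce roles are the same.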
![Page 5: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/5.jpg)
Map Reduce
![Page 6: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/6.jpg)
Multicore map reduce framework, Kunle, et al
![Page 7: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/7.jpg)
Parallel Execution View
![Page 8: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/8.jpg)
![Page 9: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/9.jpg)
![Page 10: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/10.jpg)
JobTracker
NameNode
HDFS
![Page 11: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/11.jpg)
Write-once-read-many!
A file, once created, written, and closed, need not change.
![Page 12: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/12.jpg)
Move computation, not data
![Page 13: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/13.jpg)
![Page 14: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/14.jpg)
DataNodes: Read, Write Blocks
![Page 15: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/15.jpg)
NameNode:
Single master node
Single machine address space
Single point of failure
![Page 16: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/16.jpg)
Enter the Cassandra:
High Scale
Peer-to-peer
When “it” does not fit in a single node!… Enter the distributed dragon!
![Page 17: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/17.jpg)
NameNode
DataNodes
![Page 18: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/18.jpg)
One kind of node!
![Page 19: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/19.jpg)
Cassandra:
High Scale
Peer-to-peer
![Page 20: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/20.jpg)
Portfolio Demo
Low latency: live tick prices for stocks.
Batch analytics: historical EOD prices, Value at Risk.
http://www.datastax.com/docs/0.8/brisk/brisk_demo
![Page 21: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/21.jpg)
http://ec250194143.compute1.amazonaws.com:8888/opscenter/index.html
http://ec26720212176.compute1.amazonaws.com:50030/jobdetails.jsp?jobid=job_201105310219_0008&refresh=30
http://ec250194143.compute1.amazonaws.com:8983/portfolio/
Demo URLs (good for this demo only)
![Page 22: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/22.jpg)
Bigtable, 2006
Dynamo, 2007
OSS, 2008
Incubator, 2009
TLP, 2010
![Page 23: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/23.jpg)
[Ring diagram: nodes A, F, L, P, T, U, W, Y; key “C”]
Cassandra:
High Scale
Peer-to-peer
No SPOF
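The ring on this slide can be sketched in a few lines. This is a rough illustration of ring-based placement, not Cassandra's implementation: it assumes each node owns one token, uses the node letters from the slide directly as tokens for readability (a hashing partitioner would derive the token from the key instead), and walks clockwise to pick replicas. The class name `TokenRing` and the `node-X` names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical illustration class; not Cassandra code.
final class TokenRing {
    // token -> node name, kept sorted so the ring can be walked in order
    private final TreeMap<String, String> ring = new TreeMap<>();

    void addNode(String token, String node) {
        ring.put(token, node);
    }

    // The first node whose token is >= the key's token owns the key;
    // further replicas are the next nodes clockwise, wrapping at the end.
    List<String> replicasFor(String keyToken, int replicas) {
        List<String> owners = new ArrayList<>();
        if (ring.isEmpty()) {
            return owners;
        }
        String token = ring.ceilingKey(keyToken);
        if (token == null) {
            token = ring.firstKey(); // wrap around the ring
        }
        int wanted = Math.min(replicas, ring.size());
        while (owners.size() < wanted) {
            owners.add(ring.get(token));
            token = ring.higherKey(token);
            if (token == null) {
                token = ring.firstKey(); // wrap around the ring
            }
        }
        return owners;
    }

    public static void main(String[] args) {
        TokenRing ring = new TokenRing();
        for (String t : new String[] {"A", "F", "L", "P", "T", "U", "W", "Y"}) {
            ring.addNode(t, "node-" + t);
        }
        System.out.println(ring.replicasFor("C", 3));
    }
}
```

With these assumed tokens, key “C” lands on `node-F` and its two clockwise neighbours, so `main` prints `[node-F, node-L, node-P]`. Every node runs the same logic, which is why no single node is a point of failure.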
![Page 24: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/24.jpg)
“dynamic” column families
[Diagram: “Following” column family; row keys zznate, driftx, thobbs, jbellis; column names driftx, thobbs, mdennis, zznate, pcmanus, xedin]
![Page 25: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/25.jpg)
![Page 26: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/26.jpg)
![Page 27: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/27.jpg)
![Page 28: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/28.jpg)
Brisk
![Page 29: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/29.jpg)
Brisk
HowStuffWorks version
![Page 30: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/30.jpg)
YDH security edition (soon to be Apache)
Apache Hive – Access via SQL-like
Cassandra 0.8
CQL Interface
Apache Thrift
![Page 31: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/31.jpg)
Use ColumnFamilies:
inode
sblock
![Page 32: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/32.jpg)
String keyspace = "cfs";
CfDef cf = new CfDef();
cf.setName(inodeDefaultCf);
cf.setComparator_type("BytesType");
…
cf.setName(sblockDefaultCf);
cf.setKey_cache_size(1000000); // "1M" on the slide
cf.setComment("Stores blocks of information associated with a inode");
cf.setKeyspace(keyspace);
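To make the inode/sblock split concrete, here is a toy sketch. It is an in-memory stand-in, not the Brisk file system code: two maps play the role of the two column families, the block size is arbitrary, and the class name `TinyCfs` and its methods are hypothetical. The inode entry records which blocks make up a file; the sblock entries hold the bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical illustration class; maps stand in for column families.
final class TinyCfs {
    private final Map<String, List<UUID>> inode = new HashMap<>(); // path -> ordered block ids
    private final Map<UUID, byte[]> sblock = new HashMap<>();      // block id -> block bytes
    private final int blockSize;

    TinyCfs(int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        this.blockSize = blockSize;
    }

    // Split the data into fixed-size blocks, store each block, then record the ids.
    void write(String path, byte[] data) {
        List<UUID> ids = new ArrayList<>();
        for (int off = 0; off < data.length; off += blockSize) {
            UUID id = UUID.randomUUID();
            int end = Math.min(off + blockSize, data.length);
            sblock.put(id, Arrays.copyOfRange(data, off, end));
            ids.add(id);
        }
        inode.put(path, ids);
    }

    // Look up the block ids for the path and concatenate the blocks in order.
    byte[] read(String path) {
        List<UUID> ids = inode.get(path);
        if (ids == null) {
            throw new IllegalArgumentException("no such file: " + path);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (UUID id : ids) {
            byte[] block = sblock.get(id);
            out.write(block, 0, block.length);
        }
        return out.toByteArray();
    }

    int blockCount(String path) {
        return inode.get(path).size();
    }

    public static void main(String[] args) {
        TinyCfs fs = new TinyCfs(4);
        fs.write("/demo.txt", "hello brisk".getBytes(StandardCharsets.UTF_8));
        String text = new String(fs.read("/demo.txt"), StandardCharsets.UTF_8);
        System.out.println(fs.blockCount("/demo.txt") + " blocks: " + text);
    }
}
```

With a 4-byte block size the 11-byte sample is stored as three blocks, so `main` prints `3 blocks: hello brisk`. Because both structures are ordinary column families, they are replicated across the ring like any other data, with no separate NameNode.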
![Page 33: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/33.jpg)
![Page 34: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/34.jpg)
Consistency: R + W > N
"brisk.consistencylevel.read", "QUORUM";
"brisk.consistencylevel.write", "QUORUM";
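A short worked example of the R + W > N rule, where N is the replication factor and R and W are the number of replicas that must answer a read or acknowledge a write. The sketch assumes QUORUM means a majority of the N replicas, i.e. floor(N/2) + 1; the class and method names are made up for illustration.

```java
// Hypothetical illustration class; just the arithmetic behind the slide.
final class QuorumMath {

    // QUORUM: a majority of the N replicas.
    static int quorum(int n) {
        return n / 2 + 1;
    }

    // When R + W > N, any R replicas read and any W replicas written
    // must share at least one replica, so a read sees the latest write.
    static boolean overlaps(int r, int w, int n) {
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;
        int q = quorum(n);
        System.out.println("N=" + n + " QUORUM=" + q + " overlap=" + overlaps(q, q, n));
        System.out.println("ONE/ONE overlap=" + overlaps(1, 1, n));
    }
}
```

For N = 3, QUORUM is 2 and 2 + 2 > 3, so QUORUM reads and QUORUM writes always overlap; reading and writing at ONE gives 1 + 1 = 2, which does not exceed 3. The program prints `N=3 QUORUM=2 overlap=true` followed by `ONE/ONE overlap=false`.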
![Page 35: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/35.jpg)
Hadoop: job tracker, task tracker
![Page 36: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/36.jpg)
BriskSnitch: brisk nodes, cassandra nodes
![Page 37: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/37.jpg)
BriskSimpleSnitch.java
if (TrackerInitializer.isTrackerNode) {
    myDC = BRISK_DC;
    logger.info("Detected Hadoop trackers are enabled, setting my DC to " + myDC);
} else {
    myDC = CASSANDRA_DC;
    logger.info("Looks like Vanilla Cassandra nodes, setting my DC to " + myDC);
}
![Page 38: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/38.jpg)
Hive: SQL-like access
cli, hwi, jdbc, metastore
Pushdown predicates (v beta2)
![Page 39: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/39.jpg)
hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
hive> LOAD DATA LOCAL INPATH '$BRISK_HOME/resources/hive/examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='20080815');
hive> SELECT count(*), ds FROM invites GROUP BY ds;
http://www.datastax.com/docs/0.8/brisk/about_hive
![Page 40: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/40.jpg)
ETL
Realtime
Cassandra CFs
DataCenters
Scale
![Page 41: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/41.jpg)
![Page 42: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/42.jpg)
No me in team!
● Ben Coverston
● Ben Werther
● Brandon Williams
● Cathy Daw
● Daria Hutchinson
● Eric Gilmore
● Jackson Chung
● Jake Luciani
● Joaquin Casares
● Jonathan Ellis
● Michael Allen
● Mike Bulman
● Nate McCall
● Nick M Bailey
● Patricio Echague
● Tyler Hobbs
● SriSatish Ambati
● Yewei Zhang
![Page 43: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/43.jpg)
100-node Brisk Cluster on Opscenter
![Page 44: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/44.jpg)
[Timeline diagram]
Cassandra: Bigtable, 2006; Dynamo, 2007; OSS, 2008; Incubator, 2009; TLP, 2010
Brisk
![Page 45: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/45.jpg)
Git started:
git clone [email protected]:riptano/brisk.git
http://www.datastax.com/product/brisk
Getting Started via Brisk AMI.
Thank You.
![Page 46: Brisk hadoop june2011_sfjava](https://reader034.fdocuments.us/reader034/viewer/2022051610/548532b1b47959d30c8b4de0/html5/thumbnails/46.jpg)
References
● MapReduce: Simplified Data Processing on Large Clusters, 2004, Jeffrey Dean and Sanjay Ghemawat, http://bit.ly/googmr_pdf
● Multicore MapReduce, Kunle, et al. http://bit.ly/iRJd1n