Apache Bigtop Working Group, 7/14/2013


Transcript of the Apache Bigtop Working Group session, 7/14/2013

  1. Apache Bigtop Working Group, 7/14/2013. Agenda: basic skills; Hadoop pipelines (Roman's/Ron's idea); career positioning.
  2. Basic Skills. In a working group, you set your own goals. Structure: do a demo in front of the class; focus on skills employers are looking for. Cluster skills using AWS: create instances with the ec2-api tools, then extend this with scripts or your own code. Everyone has to demo some skill. Goal: manage multiple instances. You can do this manually, but the number of keystrokes grows quickly as you add new components, so you need some automation or code. Bash scripts are a good fit because they are what Bigtop's init.d files and Roman's code use, e.g. copy the mkdir commands into a script and run them.
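The "copy the mkdir commands into a script" idea above can be sketched as a small bash script. The HDFS directory layout and usernames below are hypothetical; adjust them to your cluster:

```shell
#!/usr/bin/env bash
# Sketch of automating per-user HDFS setup instead of typing it by hand.
# With DRY_RUN=1 (the default) the commands are only printed for review;
# set DRY_RUN=0 to actually run them against a live cluster.

setup_user_dirs() {
  local user="$1"
  local cmd="echo"                      # default: dry run, just print
  [ "${DRY_RUN:-1}" = "0" ] && cmd=""   # DRY_RUN=0 really executes
  $cmd hdfs dfs -mkdir -p "/user/$user"
  $cmd hdfs dfs -chown "$user" "/user/$user"
}

# Doing this by hand for every user on every instance is where the
# keystroke count explodes; a loop keeps the effort constant.
for u in alice bob; do
  setup_user_dirs "$u"
done
```

The same dry-run pattern extends naturally to starting daemons or copying configs across instances.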
  3. Basic Skills: Hadoop*, all the features of 2.0.0. No training course can give you this; you will have to work through it yourself. Use the 2.0.X unit-test code as a base.
  4. Hadoop 2.0.0 Basic FS Review: copy-on-write; write-through vs. write-back; FSCK; inodes/B-trees; NameNode (NN) / DataNode (DN).
  5. Working Group. Not a class that gives you answers; the answers classes give are too simple to be valuable. E.g.: does YARN/Hadoop 2.0.X support multitenancy? True multitenancy means multiple users/companies can't see each other's data, and if they run a query they can't crash the cluster for other users. This isn't the case today.
  6. Hadoop 2.0.X. ZooKeeper in HDFS requires some administration. Do you need to roll back the ZooKeeper logs when a ZK cluster fails?
  7. Bigtop Basic Skills. Run Bigtop in AWS in distributed mode, starting with HDFS. Create Hadoop* pipelines (Roman's/Ron's idea). Ron: a book. Great idea! Run mvn verify; learn to debug and write tests here. This will take months and is demo driven: people do demos.
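The `mvn verify` step above might look like the sketch below. The smoke-test module path is an assumption (it has moved around between Bigtop releases), so check your own checkout before running:

```shell
#!/usr/bin/env bash
# Sketch: invoking Bigtop's smoke tests with Maven.
# The bigtop-tests/smoke-tests/<module> layout is an assumption -- verify
# the path in your Bigtop checkout (older trees used bigtop-tests/test-execution).

# Build the mvn invocation for a given smoke-test module; echoing it first
# makes it easy to review before running against a live cluster.
smoke_test_cmd() {
  local module="$1"
  echo "mvn verify -f bigtop-tests/smoke-tests/$module/pom.xml"
}

# Example: a (hypothetical) HDFS smoke-test module.
smoke_test_cmd hdfs
```

Pipe the printed command to `bash` once you are satisfied it points at the right module.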
  8. Career positioning. Choose where to spend your time. Big data = 1) DevOps, 2) app development (e.g. Astyanax), 3) internals. Don't get distracted into 3); there isn't enough time to do all three well. Let the Cloudera people handle that. Do something new that people care about; don't try to be better than people with the same job skill. Learn efficiently: practice, practice, practice. You can't learn by watching.
  9. Big Company vs. Small. Big: interpolate Cloudera's strategy. Hadoop 2.0.X runs in the cloud; users access it from the desktop via a browser and can run Hive/Pig on YOUR data; if you need to ingest data, e.g. with Flume, a sysadmin has to set that up. So don't spend time getting Flume to work in Hue, but do make sure you know the 2.0.X security models/LDAP, pipeline debugging when things get stuck, failover, and application development. Hue != Ambari. Why? There is value in building apps in or with Hue, but the approach for webapps is shifting away from Hue toward something like Ambari, which follows a simpler user-defined MVC pattern. Why is user-defined MVC better? Think like a manager: what happens as Django adds more complicated features? Compare the Jetty vs. J2EE example.
  10. Small. Do everything, use Bigtop, and get to a working app as fast as possible. 1) and 2) are very important. You have to do things quickly, and you decide how to spend your own time.
  11. Structure. Schedule: 3 more meetings after this one, every 2 weeks. Individual demos: install Bigtop; demo WordCount (WC) and PI; demo components and pipelines. Turn the pipeline demos into integration tests. Test in pseudo-distributed mode and on a cluster. Listen to Roman: Hue....
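The WC and PI demos above use the stock examples jar that ships with Hadoop. The jar location and the HDFS input/output paths below are assumptions; locate your own jar under `$HADOOP_HOME/share/hadoop/mapreduce/`:

```shell
#!/usr/bin/env bash
# Sketch of the WordCount and PI demos via hadoop-mapreduce-examples.
# EXAMPLES_JAR defaults to a guessed path -- override it for your install.

EXAMPLES_JAR="${EXAMPLES_JAR:-${HADOOP_HOME:-/usr/lib/hadoop}/share/hadoop/mapreduce/hadoop-mapreduce-examples.jar}"

run_demo() {
  # Prints the command for review; pipe to bash to actually execute.
  echo hadoop jar "$EXAMPLES_JAR" "$@"
}

run_demo pi 4 1000                  # estimate pi: 4 maps, 1000 samples each
run_demo wordcount /input /wc-out   # wordcount: HDFS input dir -> output dir
```

Running the same two commands first in pseudo-distributed mode and then on the AWS cluster is an easy way to turn a demo into a repeatable check.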
  12. HBase/Hadoop sizing. HBase requirements: RegionServer nodes with 48 GB RAM, 8-12 cores/node. Memory budget: MapReduce tasks 1-2 GB+ each, RegionServer 32 GB+, OS 4-8 GB, plus the HDFS daemons. Disk: reserve 25% for shuffle files alongside HDFS.
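A quick worked budget using the slide's numbers shows how tight a 48 GB RegionServer node is. The concurrent task count and the midpoint OS figure are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Worked memory budget for one HBase/Hadoop node, in GB.
# NODE_RAM and REGION_SERVER come from the slide; MR_TASKS and the
# OS midpoint are illustrative assumptions.

NODE_RAM=48          # slide: R/S nodes have 48 GB
OS=6                 # slide: OS 4-8 GB; take the middle
REGION_SERVER=32     # slide: RegionServer 32 GB+
MR_TASKS=4           # assumed concurrent MapReduce tasks...
MR_PER_TASK=2        # ...at the slide's 1-2 GB each

MR_TOTAL=$((MR_TASKS * MR_PER_TASK))
LEFT_FOR_HDFS=$((NODE_RAM - OS - REGION_SERVER - MR_TOTAL))
echo "MapReduce total:        ${MR_TOTAL} GB"
echo "Left for HDFS daemons:  ${LEFT_FOR_HDFS} GB"
```

With only a couple of GB left over for the DataNode, it is clear why the slide treats 48 GB as a floor rather than a comfortable target.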