Big data Hadoop an Introduction Online Training @ Training Icon
HADOOP ONLINE TRAINING @ TRAINING ICON
Online Training | Corporate Training
CONTACT US: TRAINING ICON
INDIA: +91-9666900051
USA: [email protected]
Why Hadoop, and what is it ?
A tool to process big data
What is BIG Data ?
Social networks such as Facebook and Google+ generate huge volumes of data.
Machines generate lots of data too.
Even this online discussion produces data: for example, the number of participants in this conference is recorded.
What is BIG Data ? ..continued
Exponential data growth poses challenges for Google, Yahoo, Microsoft and Amazon.
They need to go through TBs and PBs of data to answer questions such as:
Which websites and books were popular ? What kinds of ads appeal to users ?
Existing tools became inadequate for processing such large data sets.
Why is the data so BIG ?
A couple of decades ago: floppy disks
After that: CD/DVD drives
Half a decade ago: hard drives (500 GB)
Now: hard drives (1 TB) are available in abundance
Why is the data so BIG ?
So WHAT ?
Even the technology to read has taken a leap.
Why is the data so BIG ?
Year   Device             Volume    Data transfer speed   Time to process
1990   Optical Drive      1370 MB   4.4 MB/s              5 minutes
2012   1 TB SATA Drives   1 TB      100 MB/s              2.5 Hrs
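The "time to process" figures above follow from dividing the drive volume by its transfer speed. The short sketch below redoes that arithmetic. The function name `read_time_seconds` is made up for illustration, and it assumes 1 TB = 1,000,000 MB, so the 2012 result comes out somewhat above the rounded 2.5 hours quoted above.

```python
def read_time_seconds(volume_mb: float, speed_mb_per_s: float) -> float:
    """Time to read a full drive sequentially, in seconds."""
    return volume_mb / speed_mb_per_s

# 1990 drive: 1370 MB at 4.4 MB/s
t_1990 = read_time_seconds(1370, 4.4)

# 2012 drive: 1 TB at 100 MB/s (assumption: 1 TB = 1,000,000 MB)
t_2012 = read_time_seconds(1_000_000, 100)

print(f"1990: {t_1990 / 60:.1f} minutes")  # about 5 minutes
print(f"2012: {t_2012 / 3600:.1f} hours")  # well over 2 hours
```

Capacity grew by a factor of roughly 700, while transfer speed grew by a factor of roughly 20, which is why reading a whole drive now takes so much longer.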
How to handle such BIG ?
One BIG elephant, or numerous small chickens ?
How to handle such BIG ? Concept of Torrents
Reduce the time to read by reading from multiple sources simultaneously.
Imagine if we had 100 drives, each holding one hundredth of the data. Working in parallel, we could read the data in less than two minutes.
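The idea of reading from many sources at once can be sketched as follows. This is only an illustration: the "drives" are in-memory byte chunks, `read_chunk` is a hypothetical stand-in for a real disk read, and the 10,000-second single-drive figure assumes 1 TB = 1,000,000 MB read at 100 MB/s.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_chunks(data: bytes, n_sources: int) -> list:
    """Split data into contiguous chunks, one per source."""
    size = -(-len(data) // n_sources)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def read_chunk(chunk: bytes) -> bytes:
    """Hypothetical stand-in for reading one chunk from one drive."""
    return bytes(chunk)

def parallel_read(chunks: list) -> bytes:
    """Read all chunks concurrently and reassemble them in order."""
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        # pool.map returns results in the same order as its inputs
        return b"".join(pool.map(read_chunk, chunks))

data = bytes(range(256)) * 40          # 10,240 bytes of sample data
chunks = split_into_chunks(data, 100)  # one chunk per "drive"
restored = parallel_read(chunks)

# With ideal parallelism, a 10,000 s single-drive read shrinks to:
ideal_seconds = 10_000 / 100
print(len(chunks), restored == data, ideal_seconds)
```

Under these assumptions the ideal parallel read takes 100 seconds, which matches the "less than two minutes" estimate. Real systems lose some of that gain to coordination and network overhead.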
How to handle such BIG ? -- Issues
How to handle systems going up and down ?
How to combine the data from all the systems ?
Problem 1 : System's Ups and Downs
Commodity hardware is used for data storage and analysis
Chances of failure are very high
So, keep redundant copies of the same data across several machines
If one machine fails, another still has the data
Google came up with a file system, GFS (Google File System), which implemented all these details.
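A minimal sketch of the redundancy idea, assuming a simple round-robin placement and a replication factor of 3 chosen for illustration. This is not GFS's actual placement policy, and the helper names `place_replicas` and `readable_blocks` are hypothetical.

```python
def place_replicas(block_ids, machines, replication=3):
    """Assign each block to `replication` distinct machines, round-robin."""
    if replication > len(machines):
        raise ValueError("need at least as many machines as replicas")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [machines[(i + k) % len(machines)]
                            for k in range(replication)]
    return placement

def readable_blocks(placement, failed):
    """Blocks that still have at least one copy on a live machine."""
    return {block for block, hosts in placement.items()
            if any(host not in failed for host in hosts)}

machines = ["m1", "m2", "m3", "m4", "m5"]
blocks = [f"block-{i}" for i in range(10)]
placement = place_replicas(blocks, machines, replication=3)

# Even with two machines down, every block keeps a live copy.
survivors = readable_blocks(placement, failed={"m2", "m4"})
print(len(survivors), "of", len(blocks), "blocks still readable")
```

With three copies on distinct machines, any two failures leave at least one copy of every block. With a single copy, losing one machine would lose data.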
Problem 2 : How to combine the data ?
Data is analyzed across different machines, but how do we merge the results to get a meaningful outcome ?
Yes, all (or some) of the data has to travel across the network. Only then can the merging occur.
Doing this is notoriously challenging
Again, Google's answer: MapReduce
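To make the merge problem concrete, here is a small sketch in which each machine counts website visits in its own slice of the data, and the partial counts are then brought together and summed. The machine names, log contents and helper functions are all made up for illustration.

```python
from collections import Counter

# Hypothetical per-machine logs: each machine sees only part of the traffic.
machine_logs = {
    "m1": ["a.com", "b.com", "a.com"],
    "m2": ["b.com", "c.com"],
    "m3": ["a.com", "c.com", "c.com", "c.com"],
}

def local_counts(visits):
    """Analysis step, run independently on each machine."""
    return Counter(visits)

def merge_counts(partials):
    """Merge step: partial results travel to one place and are summed."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

partials = [local_counts(visits) for visits in machine_logs.values()]
totals = merge_counts(partials)
most_popular = totals.most_common(1)[0]
print(most_popular)
```

No single machine can name the most popular site on its own. Only the merged totals can, and producing them requires moving the partial results across the network.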
MapReduce
Provides a programming model that abstracts the problem away from disk reads and writes, transforming it into a computation over keys and values.
Two phases
Map
Reduce
So what is Hadoop ? An operating system ?
Provides
1. A reliable shared storage system (HDFS)
2. An analysis system (MapReduce)
History of Hadoop
Google was the first to launch GFS and MapReduce
It published papers describing them (GFS in 2003, MapReduce in 2004), announcing a brand new technology to the world
The technology was already well proven inside Google by 2004
MapReduce paper by Google
History of Hadoop
Doug Cutting saw an opportunity and led the charge to develop an open source version of this MapReduce system, called Hadoop.
Soon after, Yahoo and others rallied around to support this effort.
Hadoop is now a core component at: Facebook, Yahoo, LinkedIn, Twitter …
History of Hadoop
Google        Hadoop
GFS        -> HDFS
MapReduce  -> MapReduce
HDFS -- A Brief
Design: streaming very large files on a commodity cluster.
1. Very Large Files
   MBs to PBs
2. Streaming
   Write once, read many approach
   Once a huge data set is in place, we tend to use the data, not modify it
   Time to read the whole data set is more important
3. Commodity Cluster
   No high-end servers
   Yes, high chance of failure (but HDFS is tolerant enough)
   Replication is done
MapReduce -- A Brief
Large scale data processing in parallel.
MapReduce provides:
   Automatic parallelization and distribution
   Fault-tolerance
   I/O scheduling
   Status and monitoring
Two phases in MapReduce:
   Map
   Reduce
MapReduce -- A Brief
Map phase
   map (in_key, in_value) -> list(out_key, intermediate_value)
   Processes an input key/value pair
   Produces a set of intermediate pairs
Reduce phase
   reduce (out_key, list(intermediate_value)) -> list(out_value)
   Combines all intermediate values for a particular key
   Produces a set of merged output values (usually just one)
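The two signatures above can be illustrated with a word count, written here as a single-process Python sketch. It is not Hadoop's API: `run_mapreduce` is a hypothetical driver that stands in for the framework, and the grouping step it performs in memory is what a real cluster does by moving intermediate pairs across the network.

```python
from collections import defaultdict

def map_fn(in_key, in_value):
    """map(in_key, in_value) -> list of (out_key, intermediate_value)."""
    return [(word.lower(), 1) for word in in_value.split()]

def reduce_fn(out_key, intermediate_values):
    """reduce(out_key, list(intermediate_value)) -> list(out_value)."""
    return [sum(intermediate_values)]

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Hypothetical in-memory driver: map, group by key, then reduce."""
    groups = defaultdict(list)
    for in_key, in_value in inputs:                    # map phase
        for out_key, value in map_fn(in_key, in_value):
            groups[out_key].append(value)              # group values by key
    return {key: reduce_fn(key, values)                # reduce phase
            for key, values in sorted(groups.items())}

documents = [
    ("doc1", "big data needs big tools"),
    ("doc2", "Hadoop processes big data"),
]
result = run_mapreduce(documents, map_fn, reduce_fn)
print(result)
```

Here the input key is a document name and the input value is its text. Each reduce call returns a list with a single merged value, matching the "usually just one" note above.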
http://www.excelonlineclasses.co.nr/
MapReduce -- A Brief
Hadoop Cluster
Hadoop Ecosystems
Pre-Requisites
Core Java
Acquaintance with Linux will help.
For a better experience: install Linux on your machine.
We offer you:
1. Interactive Learning at Learner's Convenience
2. Industry Savvy Trainers
3. "Real Time" Practical Scenarios
4. Learn Right from Your Place
5. Customized Course Curriculum
6. 24/7 Server Access
7. Support after Training and Certification Guidance
8. Resume Preparation and Interview Assistance
9. Recorded Version of Sessions
Thank you. Your feedback is highly important for improving our course material.
For a free demo, please contact Rathan. INDIA: +91-9666900051, US: +1 408-791-8864, Email: [email protected], http://www.trainingicon.com
Disclaimer
Training Icon Online Classes acknowledges the proprietary rights of the trademarks and product names of other companies mentioned in any of the training material, including but not limited to handouts, written material, videos and PowerPoint presentations. All such training materials are provided to our students for learning purposes only. Students shall not use such materials for their private gain, nor may they sell any such materials to a third party. Some of the examples provided in such training materials may not be owned by us, and as such we do not claim any proprietary rights over them. We do not guarantee, and are not responsible for, such products and projects. We acknowledge that any such information or product that has been lawfully received from any third party source is free from restriction and without any breach or violation of law whatsoever.