Introduction to Big Data Hadoop
![Page 1: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/1.jpg)
Dr. Sandeep G. Deshmukh
Introduction to
![Page 2: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/2.jpg)
Contents
❑ Big Data
❑ Distributed Systems
❑ Hadoop
    ➢ Hadoop Distributed File System (HDFS)
    ➢ MapReduce
![Page 3: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/3.jpg)
Show of Hands
![Page 4: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/4.jpg)
Introduction to Big Data
![Page 5: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/5.jpg)
Big data is data that exceeds the processing capacity of
conventional database systems.
The data is too big, moves too fast, or doesn’t fit the strictures of
your database architectures.
To gain value from this data, you must choose an alternative way
to process it.
https://www.oreilly.com/ideas/what-is-big-data
Definition
![Page 6: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/6.jpg)
Quantity of data
Data sets too large to store and analyze using traditional databases
Volume
![Page 7: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/7.jpg)
Velocity
Speed at which data is generated
Speed at which data is moving around and analyzed
Analyze data while it is being generated without even putting it into databases
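As a minimal sketch of that last point, the snippet below analyzes readings as they arrive instead of storing them first. The `sensor_readings` generator and its values are hypothetical stand-ins for a live source; only the running-mean logic matters here.

```python
from typing import Iterable, Iterator


def sensor_readings() -> Iterator[float]:
    """Hypothetical live source; a real one would read from a socket or a queue."""
    for value in [21.0, 22.5, 19.5, 25.0]:
        yield value


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Yield the mean of all values seen so far, keeping only two numbers in memory."""
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count


if __name__ == "__main__":
    for mean in running_mean(sensor_readings()):
        print(f"mean so far: {mean:.2f}")
```

Nothing is written to a database: each reading updates the result and is then discarded.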
![Page 8: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/8.jpg)
Variety
Different types of data that we can use
![Page 9: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/9.jpg)
Veracity
Messiness or trustworthiness of the data
Volume makes up for quality
E.g., tweets with spelling mistakes and shortened words (u -> you, thr -> there)
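Messy tokens like these can be partly cleaned before analysis. The sketch below expands known short forms; the `SHORT_FORMS` table is a hypothetical two-entry example built from the slide, not a real dictionary.

```python
import re

# Hypothetical lookup table; a real one would be far larger and domain-specific.
SHORT_FORMS = {"u": "you", "thr": "there"}


def normalize(tweet: str) -> str:
    """Replace known short forms with full words, leaving every other token untouched."""

    def expand(match: "re.Match[str]") -> str:
        word = match.group(0)
        return SHORT_FORMS.get(word.lower(), word)

    # Match whole alphabetic words so that "thr" inside "three" is not rewritten.
    return re.sub(r"[A-Za-z]+", expand, tweet)


if __name__ == "__main__":
    print(normalize("r u thr?"))
```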
![Page 10: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/10.jpg)
Value
Getting value out of Big Data!!!
![Page 11: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/11.jpg)
Definition
“Big data” is
high-volume, -velocity and -variety information assets
that demand cost-effective, innovative forms of information processing
for enhanced insight and decision making
By Gartner
![Page 12: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/12.jpg)
Definition
Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate.
Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying, updating and information privacy.
The term often refers simply to the use of predictive analytics or certain other advanced methods to extract value from data, and seldom to a particular size of data set.
Accuracy in big data may lead to more confident decision making, and better decisions can result in greater operational efficiency, cost reduction and reduced risk.
Wikipedia
![Page 13: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/13.jpg)
Use Case: Big Data in Oil & Gas Drilling
http://analytics-magazine.org/images/stories/novdec12/big-data.jpg
![Page 14: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/14.jpg)
Use Case: Uber - Pay Surge Pricing if Battery is Low
![Page 15: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/15.jpg)
● A Brief History of Big Data Everyone Should Read
● Beyond Volume, Variety and Velocity is the Issue of Big Data Veracity
● What is big data? - OpenSource.com
● What is big data? - O’Reilly
● 5 Big Data Use Cases To Watch
● Best Big Data Analytics Use Cases
● The 5 game changing big data use cases
● Big Data - The 5 Vs Everyone Must Know
● Top SlideShare Presentations on Big Data
● Google Data Center 360° Tour
Further Reading
![Page 16: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/16.jpg)
Distributed Systems
![Page 17: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/17.jpg)
A distributed system is a collection of independent computers that appears to its users as a single coherent system.
Distributed Systems: Principles and Paradigms, 2nd Edition, Andrew S. Tanenbaum, Maarten Van Steen, 2006
http://www.mypearsonstore.com/bookstore/distributed-systems-principles-and-paradigms-9780132392273?xid=PSED
Definition
![Page 18: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/18.jpg)
Distributed Systems: Principles and Paradigms, 2nd Edition, Andrew S. Tanenbaum, Maarten Van Steen, 2006
![Page 19: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/19.jpg)
| Transparency | Description |
|---|---|
| Access | Hide differences in data representation and how a resource is accessed |
| Location | Hide where a resource is located |
| Migration | Hide that a resource may move to another location |
| Relocation | Hide that a resource may be moved to another location while in use |
| Replication | Hide that a resource is replicated |
| Concurrency | Hide that a resource may be shared by several competitive users |
| Failure | Hide the failure and recovery of a resource |
Forms of Transparency in Distributed Systems
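Two of these forms, replication and failure transparency, can be illustrated with a small in-memory sketch. The `Replica` and `TransparentStore` classes are hypothetical and simulate nodes inside one process; a real system would talk to separate machines over a network.

```python
from typing import Dict, List


class Replica:
    """Hypothetical storage node that can be switched off to simulate a crash."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.alive = True

    def put(self, key: str, value: str) -> None:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value

    def get(self, key: str) -> str:
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]


class TransparentStore:
    """Single entry point that hides how many replicas exist and which are down."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.replicas = replicas

    def put(self, key: str, value: str) -> None:
        # Replication transparency: the caller writes once, every live replica stores it.
        written = 0
        for replica in self.replicas:
            try:
                replica.put(key, value)
                written += 1
            except ConnectionError:
                continue
        if written == 0:
            raise ConnectionError("no replica available")

    def get(self, key: str) -> str:
        # Failure transparency: try replicas in turn until one answers.
        for replica in self.replicas:
            try:
                return replica.get(key)
            except ConnectionError:
                continue
        raise ConnectionError("no replica available")


if __name__ == "__main__":
    first, second = Replica("node-1"), Replica("node-2")
    store = TransparentStore([first, second])
    store.put("greeting", "hello")
    first.alive = False            # simulate a crash of one node
    print(store.get("greeting"))   # the caller never learns that node-1 failed
```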
![Page 20: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/20.jpg)
● A distributed system consists of components (i.e., computers) that are autonomous
● Users (be they people or programs) think they are dealing with a single system. This means that one way or the other the autonomous components need to collaborate. How to establish this collaboration lies at the heart of developing distributed systems.
![Page 21: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/21.jpg)
A distributed system is a model in which components located on networked computers communicate and coordinate their actions by passing messages.
The components interact with each other in order to achieve a common goal.
Three significant characteristics of distributed systems are: concurrency of components, lack of a global clock, and independent failure of components.
Wikipedia
Definition
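Without a global clock, processes cannot order events by wall-clock time. One well-known workaround, not covered in these slides, is a Lamport logical clock: each process keeps a counter and attaches it to every message it sends. The `Process` class below is a hypothetical single-machine sketch of that idea.

```python
class Process:
    """A process with a Lamport logical clock (a counter, not wall-clock time)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.clock = 0

    def local_event(self) -> int:
        """Tick the clock for an event that involves no other process."""
        self.clock += 1
        return self.clock

    def send(self) -> int:
        """Tick, then return the timestamp to attach to the outgoing message."""
        self.clock += 1
        return self.clock

    def receive(self, message_timestamp: int) -> int:
        """Jump past the sender's timestamp so the receive is ordered after the send."""
        self.clock = max(self.clock, message_timestamp) + 1
        return self.clock


if __name__ == "__main__":
    p, q = Process("p"), Process("q")
    p.local_event()          # p's clock becomes 1
    stamp = p.send()         # p's clock becomes 2 and travels with the message
    q.local_event()          # q's clock becomes 1
    print(q.receive(stamp))  # q's clock becomes max(1, 2) + 1 = 3
```

The counters give a consistent ordering of causally related events even though the two processes never agree on the actual time.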
![Page 22: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/22.jpg)
● Distributed Computing - Wikipedia
● Distributed computing
● Characteristics of distributed system
Further Reading
![Page 23: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/23.jpg)
Miscellaneous Concepts
![Page 24: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/24.jpg)
Big Data Primers: Size does matter
![Page 25: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/25.jpg)
Big Data Primers: Vertical Vs Horizontal Scaling
Vertical Scaling (a bigger machine) vs. Horizontal Scaling (more machines)
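Horizontal scaling requires deciding which machine holds which piece of data. The sketch below uses simple modulo hashing as one possible scheme; the key names and node counts are hypothetical.

```python
import hashlib
from typing import Dict, List


def node_for(key: str, node_count: int) -> int:
    """Map a key to a node index with a stable hash.

    Python's built-in hash() is salted per run for strings, so a
    cryptographic digest is used to keep placement repeatable.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count


def partition(keys: List[str], node_count: int) -> Dict[int, List[str]]:
    """Group keys by the node that would store them."""
    placement: Dict[int, List[str]] = {node: [] for node in range(node_count)}
    for key in keys:
        placement[node_for(key, node_count)].append(key)
    return placement


if __name__ == "__main__":
    keys = [f"user-{i}" for i in range(10)]
    print(partition(keys, 3))  # the same keys spread over three machines
    print(partition(keys, 4))  # adding a fourth machine changes many placements
```

With modulo hashing, adding a node moves most keys to a different machine; consistent hashing is the usual way to limit that movement.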
![Page 26: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/26.jpg)
Big Data Primers: The scale of infrastructure
![Page 27: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/27.jpg)
Resources
• Apache Apex - http://apex.apache.org/
• Subscribe - http://apex.apache.org/community.html
• Download - https://www.datatorrent.com/download/
• Twitter
    ᵒ @ApacheApex; Follow - https://twitter.com/apacheapex
    ᵒ @DataTorrent; Follow - https://twitter.com/datatorrent
• Meetups - http://www.meetup.com/topics/apache-apex
• Webinars - https://www.datatorrent.com/webinars/
• Videos - https://www.youtube.com/user/DataTorrent
• Slides - http://www.slideshare.net/DataTorrent/presentations
• Startup Accelerator Program - Full featured enterprise product
    ᵒ https://www.datatorrent.com/product/startup-accelerator/
![Page 28: Introduction to Big Data Hadoop](https://reader035.fdocuments.us/reader035/viewer/2022081505/58efe9ff1a28ab6a0d8b45d7/html5/thumbnails/28.jpg)
We Are Hiring
• [email protected]
• Developers/Architects
• QA Automation Developers
• Information Developers
• Build and Release
• Community Leaders