Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access...
![Page 1: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/1.jpg)
Wenming Ye
Sr. Research Program Manager
Microsoft Research Connections
Twitter: @wenmingye
![Page 2: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/2.jpg)
![Page 3: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/3.jpg)
![Page 4: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/4.jpg)
![Page 5: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/5.jpg)
![Page 6: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/6.jpg)
![Page 7: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/7.jpg)
![Page 8: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/8.jpg)
![Page 9: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/9.jpg)
![Page 10: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/10.jpg)
![Page 11: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/11.jpg)
http://www.windowsazure.com/en-us/develop/nodejs/how-to-guides/command-line-tools/
![Page 12: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/12.jpg)
![Page 13: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/13.jpg)
![Page 14: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/14.jpg)
![Page 15: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/15.jpg)
![Page 16: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/16.jpg)
![Page 17: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/17.jpg)
![Page 18: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/18.jpg)
![Page 19: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/19.jpg)
![Page 20: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/20.jpg)
Gallery Images Available
Microsoft
• Windows Server 2008 R2
• SQL Server Eval 2012
• Windows Server 2012
• BizTalk Server 2013 Beta
Open Source
• OpenSUSE 12.2
• CentOS 6.3
• Ubuntu 12.04/12.10
• SUSE Linux Enterprise Server 11 SP2
![Page 21: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/21.jpg)
![Page 22: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/22.jpg)
![Page 23: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/23.jpg)
![Page 24: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/24.jpg)
![Page 25: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/25.jpg)
VM with persistent drive
![Page 26: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/26.jpg)
VM with persistent drive
![Page 27: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/27.jpg)
VM with persistent drive
![Page 28: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/28.jpg)
![Page 29: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/29.jpg)
![Page 30: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/30.jpg)
Server Rack 1 Server Rack 2
![Page 31: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/31.jpg)
![Page 32: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/32.jpg)
![Page 33: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/33.jpg)
![Page 34: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/34.jpg)
![Page 35: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/35.jpg)
![Page 36: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/36.jpg)
![Page 37: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/37.jpg)
![Page 38: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/38.jpg)
![Page 39: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/39.jpg)
![Page 40: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/40.jpg)
![Page 41: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/41.jpg)
Blobs, Disks, Tables and Queues
8.5 trillion stored objects
900K requests/sec on average (2.3+ trillion per month)
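The two request figures on this slide agree with each other, as a quick back-of-the-envelope check suggests. The 30-day month is an assumption; the slide does not say which month length was used.

```python
# Sanity-check the slide's request-rate figures.
# Assumption: a 30-day month (not stated on the slide).
REQUESTS_PER_SECOND = 900_000
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30

requests_per_month = REQUESTS_PER_SECOND * SECONDS_PER_DAY * DAYS_PER_MONTH
print(f"{requests_per_month / 1e12:.2f} trillion requests per month")
```

Under that assumption the product lands just above 2.3 trillion, matching the "2.3+ trillion per month" figure.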
![Page 42: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/42.jpg)
![Page 43: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/43.jpg)
    # Create container (account_name and account_key are your storage credentials)
    from azure.storage import BlobService
    blob_service = BlobService(account_name, account_key)
    blob_service.create_container('taskcontainer')

    # Upload a local file as a block blob
    from azure.storage import BlobService
    blob_service = BlobService(account_name, account_key)
    with open('task1-upload.txt', 'rb') as f:
        blob_service.put_blob('taskcontainer', 'task1', f.read(), 'BlockBlob')

    # Download the blob contents
    from azure.storage import BlobService
    blob_service = BlobService(account_name, account_key)
    blob = blob_service.get_blob('taskcontainer', 'task1')
![Page 44: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/44.jpg)
![Page 45: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/45.jpg)
Data centers
![Page 46: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/46.jpg)
Account
• Container (holds Blobs): https://<account>.blob.core.windows.net/<container>
• Table (holds Entities): https://<account>.table.core.windows.net/<table>
• Queue (holds Messages): https://<account>.queue.core.windows.net/<queue>
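The three URL patterns above share one shape, which a small helper can make explicit. This is a minimal sketch: the function name `storage_endpoint` and the sample account and container names are hypothetical, and the helper only formats strings without contacting any service.

```python
def storage_endpoint(account: str, service: str, resource: str) -> str:
    """Build the HTTPS endpoint for a blob container, table, or queue.

    Hypothetical helper: it only mirrors the URL patterns on the slide.
    """
    if service not in ("blob", "table", "queue"):
        raise ValueError(f"unknown storage service: {service!r}")
    return f"https://{account}.{service}.core.windows.net/{resource}"


# 'myaccount' and 'taskcontainer' are placeholder names.
example_url = storage_endpoint("myaccount", "blob", "taskcontainer")
print(example_url)
```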
![Page 47: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/47.jpg)
![Page 48: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/48.jpg)
![Page 49: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/49.jpg)
![Page 50: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/50.jpg)
Design Goals
• “Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency”, ACM Symposium on Operating Systems Principles (SOSP), Oct. 2011
![Page 51: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/51.jpg)
[Diagram: a Storage Location Service sits above two Storage Stamps. Each stamp contains a load balancer (LB), Front-Ends, a Partition Layer, and a DFS Layer with intra-stamp replication. Inter-stamp (geo) replication runs between the two stamps.]
Data access: blob storage is reached via the URL http://<account>.blob.core.windows.net/
![Page 52: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/52.jpg)
Index
Partition Layer
![Page 53: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/53.jpg)
Partition Layer
![Page 54: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/54.jpg)
Partition Layer
![Page 55: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/55.jpg)
• Does not move data around, only reassigns what part of the index a partition server is responsible for
Partition Layer
Index
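The point that load balancing only reassigns index ranges, without moving data, can be illustrated with a toy range map. This is a sketch under stated assumptions, not the Windows Azure Storage implementation: the class `PartitionMap`, the server names, and the split keys are all hypothetical.

```python
from bisect import bisect_right


class PartitionMap:
    """Toy index map: contiguous key ranges assigned to partition servers."""

    def __init__(self, split_keys, servers):
        # split_keys[i] is the first key of range i + 1; range 0 starts at "".
        if len(servers) != len(split_keys) + 1:
            raise ValueError("need exactly one server per key range")
        self.split_keys = list(split_keys)
        self.servers = list(servers)

    def server_for(self, key):
        """Return the server currently responsible for this key."""
        return self.servers[bisect_right(self.split_keys, key)]

    def reassign(self, range_index, new_server):
        """Hand a key range to another server; only the map changes."""
        self.servers[range_index] = new_server


# Stand-in for data held in the underlying storage layer.
stored = {"apple": 1, "mango": 2, "zebra": 3}
snapshot = dict(stored)

pmap = PartitionMap(split_keys=["h", "r"], servers=["PS1", "PS2", "PS3"])
before = pmap.server_for("mango")
pmap.reassign(1, "PS4")  # rebalance the middle range onto a new server
after = pmap.server_for("mango")
print(before, "->", after, "| data untouched:", stored == snapshot)
```

The lookup answer changes while the stored dictionary is never touched, which is the property the slide highlights.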
![Page 56: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/56.jpg)
Partition Layer
![Page 57: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/57.jpg)
• “Windows Azure Storage: A Highly Available Cloud Storage Service with Strong Consistency”, ACM Symposium on Operating Systems Principles (SOSP), Oct. 2011
![Page 58: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/58.jpg)
![Page 59: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/59.jpg)
![Page 60: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/60.jpg)
![Page 61: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/61.jpg)
and Queues (NEW)
Geo-replication pairs:
• North Europe and West Europe
• South Central US and North Central US
• East Asia and South East Asia
• West US and East US
![Page 62: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/62.jpg)
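[Diagram: geo-failover between West US and East US]
• DNS lookup: Azure DNS resolves http://account.blob.core.windows.net/
• Hostname to IP address mapping: account.blob.core.windows.net points to West US
• Data access goes to West US, which is geo-replicated to East US
• Failover: update DNS so the hostname points to East US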
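The failover flow on this slide can be mimicked with a toy lookup table. This is only an illustrative sketch: the dictionary, the `resolve` and `fail_over` functions, and the use of region names in place of IP addresses are assumptions, not the actual Azure mechanism.

```python
# Toy model of DNS-based geo-failover (hypothetical names throughout).
HOST = "account.blob.core.windows.net"

# Hostname -> location currently serving it (stands in for an IP address).
dns_table = {HOST: "West US"}

# Primary location -> its geo-replicated secondary.
geo_secondary = {"West US": "East US"}


def resolve(hostname):
    """DNS lookup: return the location that serves this hostname."""
    return dns_table[hostname]


def fail_over(hostname):
    """Update DNS so the hostname points at the secondary location."""
    dns_table[hostname] = geo_secondary[dns_table[hostname]]
    return dns_table[hostname]


before = resolve(HOST)
fail_over(HOST)
after = resolve(HOST)
print(f"{HOST}: {before} -> {after}")
```

In this model the client keeps using the same hostname before and after failover; only the DNS mapping changes.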
![Page 63: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/63.jpg)
![Page 64: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/64.jpg)
![Page 65: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/65.jpg)
Windows Azure Storage
![Page 66: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/66.jpg)
![Page 67: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/67.jpg)
![Page 68: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/68.jpg)
[Chart: Average of TransactionCount and Average of TPS; axis ranges 660,000 to 700,000 and 180 to 200]
![Page 69: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/69.jpg)
[Chart: Average of TransactionCount and Average of TPS over time, 6/24/2013 0:00 to 1:00 in 3-minute steps; axis ranges 0 to 20,000 and 0 to 350]
![Page 70: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/70.jpg)
JSON
![Page 71: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/71.jpg)
http://www.nuget.org/packages/WindowsAzure.Storage
![Page 72: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/72.jpg)
![Page 73: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/73.jpg)
![Page 74: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/74.jpg)
![Page 75: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/75.jpg)
![Page 76: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/76.jpg)
![Page 77: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/77.jpg)
XL VM uploading 512 blobs of 256 MB each (total upload size = 128 GB)
• C=1, P=1 => averaged ~13.2 MB/s
• C=1, P=30 => averaged ~50.72 MB/s
• C=30, P=1 => averaged ~96.64 MB/s
• A single TCP connection is bound by TCP rate control & RTT
• P=30 vs. C=30: test completed almost twice as fast!
• A single blob is bound by the limits of a single partition
• Accessing multiple blobs concurrently scales
[Chart: time (s), axis 0 to 10,000, for P=1, C=1; P=30, C=1; P=1…]
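The upload throughput figures on this slide can be turned into elapsed times to see where "almost twice as fast" comes from. This is a rough sketch: it assumes 1 GB = 1024 MB (which matches 512 x 256 MB = 128 GB) and treats each average rate as constant over the whole run.

```python
# Assumption: 1 GB = 1024 MB, consistent with 512 x 256 MB = 128 GB.
TOTAL_MB = 512 * 256

# Average upload throughput per configuration, as reported on the slide.
throughput_mb_s = {
    "C=1, P=1": 13.2,
    "C=1, P=30": 50.72,
    "C=30, P=1": 96.64,
}

# Elapsed time if the average rate held for the whole transfer.
elapsed_s = {cfg: TOTAL_MB / rate for cfg, rate in throughput_mb_s.items()}
for cfg, seconds in elapsed_s.items():
    print(f"{cfg}: about {seconds:,.0f} s")

speedup = elapsed_s["C=1, P=30"] / elapsed_s["C=30, P=1"]
print(f"C=30 finishes about {speedup:.2f}x sooner than P=30")
```

The slowest configuration comes out just under 10,000 s, in line with the chart's time axis, and the C=30 run is roughly 1.9 times quicker than the P=30 run.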
![Page 78: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/78.jpg)
• XL VM downloading 50 blobs of 256 MB each (total download size = 12.5 GB)
• C=1, P=1 => averaged ~96 MB/s
• C=30, P=1 => averaged ~130 MB/s
[Chart: time (s), axis 0 to 140, for C=1, P=1 and C=30, P=1]
![Page 79: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/79.jpg)
![Page 80: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/80.jpg)
![Page 81: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/81.jpg)
[Chart: data volume plotted against velocity, variety, and variability]
Volume: Gigabytes (10^9), Terabytes (10^12), Petabytes (10^15), Exabytes (10^18)
Storage cost per GB: 1980: $190,000; 1990: $9,000; 2000: $15; 2010: $0.07
• ERP / CRM: Sales Pipeline, Payables, Payroll, Inventory, Contacts, Deal Tracking
• WEB 2.0: Mobile, Advertising, Collaboration, eCommerce, Digital Marketing, Search Marketing, Web Logs, Recommendations
• Internet of things: Audio / Video, Log Files, Text/Image, Social Sentiment, Data Market Feeds, eGov Feeds, Weather, Wikis / Blogs, Click Stream, Sensors / RFID / Devices, Spatial & GPS Coordinates
![Page 82: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/82.jpg)
Big Data, BIG OPPORTUNITY
Big Data is a top priority for institutions
49% of CEOs and CIOs are planning big data projects
Software Growth ($ billions): 2012: 1.8; 2013: 2.5; 2014: 3.4; 2015: 4.6 (34% compound annual growth rate [2])
Services Growth ($ billions): 2012: 2.7; 2013: 3.9; 2014: 5.1; 2015: 6.5 (39% compound annual growth rate [2])
1. McKinsey & Company, McKinsey Global Survey Results, Minding Your Digital Business, 2012
2. IDC Market Analysis, Worldwide Big Data Technology and Services 2012–2015 Forecast, 2012
![Page 83: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/83.jpg)
![Page 84: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/84.jpg)
![Page 85: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/85.jpg)
![Page 86: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/86.jpg)
• How do I optimize my services based on patterns of weather and traffic?
• How do I build a recommendation engine?
• What’s the social sentiment of my product?
• How do I better predict future outcomes?
![Page 87: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/87.jpg)
![Page 88: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/88.jpg)
![Page 89: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/89.jpg)
![Page 90: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/90.jpg)
![Page 91: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/91.jpg)
![Page 92: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/92.jpg)
![Page 93: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/93.jpg)
Distributed Storage (HDFS)
Query (Hive)
Distributed Processing (MapReduce)
ODBC
Legend:
Red = Core Hadoop
Blue = Data processing
Purple = Microsoft integration points and value adds
Orange = Data Movement
Green = Packages
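The Distributed Processing (MapReduce) layer in the stack above can be illustrated with a small in-memory sketch. This is not Hadoop code and it does not run on a cluster; it only mimics the map, shuffle, and reduce steps on a Python list. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are hypothetical, chosen for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an in-memory list."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, not taken from the slides.
    print(word_count(["big data on Azure", "Big Data and Hadoop"]))
```

On a real cluster the map and reduce functions would run in parallel on many nodes, with the input read from distributed storage; here everything runs in one process to keep the programming model visible.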
![Page 94: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/94.jpg)
![Page 95: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/95.jpg)
Front ends, Partition Layer, Stream Layer
Name Node, Data Node, Data Node
HDFS API
DFS (1 Data Node per Worker Role) and Compute Cluster
Azure Storage (ASV)
…
Azure Blob Storage
![Page 96: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/96.jpg)
![Page 97: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/97.jpg)
Hive, Pig, Mahout, Cascading, Scalding, Scoobi, Pegasus…
C#, F# Map/Reduce, LINQ to Hive, .NET management clients
JavaScript Map/Reduce, Browser hosted console, Node.js management clients
PowerShell, Cross Platform CLI tools
![Page 98: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/98.jpg)
![Page 100: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/100.jpg)
Demo: Deploying and Interacting With HDInsight Service
![Page 101: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/101.jpg)
![Page 102: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/102.jpg)
| | Batch processing | Interactive analysis | Stream processing |
|---|---|---|---|
| Query runtime | Minutes to hours | Milliseconds to minutes | Never-ending |
| Data volume | TBs to PBs | GBs to PBs | Continuous stream |
| Programming model | MapReduce | Queries | DAG |
| Users | Developers | Analysts and developers | Developers |
| Originating project | Google MapReduce | Google Dremel | Twitter Storm |
| Open source project | Hadoop / Spark | Drill / Shark / Impala / HBase / Cassandra | Storm / Apache S4 / Kafka |
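The difference between the batch and stream columns in the comparison above can be sketched in a few lines. This is a toy illustration under stated assumptions, not code for Hadoop or Storm: `batch_count` and `stream_count` are hypothetical names, and the event list stands in for a real data source.

```python
from collections import Counter


def batch_count(events):
    """Batch: read the whole finite data set, then return one final answer."""
    return Counter(events)


def stream_count(events):
    """Stream: consume events one at a time, emitting an updated count after each."""
    running = Counter()
    for event in events:
        running[event] += 1
        yield event, running[event]


if __name__ == "__main__":
    # Hypothetical events, not taken from the slides.
    clicks = ["home", "cart", "home"]
    print(batch_count(clicks))
    for page, count in stream_count(clicks):
        print(page, count)
```

The batch function produces nothing until all input has been read, which matches a query runtime of minutes to hours. The streaming generator yields a result per event and would keep running for as long as the input keeps arriving.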
![Page 103: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/103.jpg)
![Page 104: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/104.jpg)
![Page 105: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/105.jpg)
![Page 106: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/106.jpg)
http://www.windowsazure.com/en-us/develop/net/
http://blogs.msdn.com/b/windowsazurestorage/
http://blogs.msdn.com/b/windowsazurestorage/archive/2011/11/20/windows-azure-storage-a-highly-available-cloud-storage-service-with-strong-consistency.aspx
![Page 107: Introduction to Big Data and Hadoop on Windows Azure data and cloud at Microsoft.pdf · Data access Hostname IP Address account.blob.core.windows.net West US Failover Update DNS East](https://reader033.fdocuments.us/reader033/viewer/2022051808/600aadb64ec0a5430962eed1/html5/thumbnails/107.jpg)
Windows Azure Python SDK
Windows Azure
How to use Service Management from Python
http://www.windowsazure.com/en-us/manage/linux/other-resources/command-line-tools/
http://research.microsoft.com/en-us/projects/azure/