Alan Smith, Active Solution, cloudcasts@gmail, @alansmith, cloudcasts
Alan Smith
Active Solution
@alansmith
www.cloudcasts.net
Handling Big Data in Windows Azure Storage
On-Premise
Replication
On-Premise
Time      Data
30 Days   1.6 TB
10 Days   4.8 TB
2 Days    24.4 TB
MSDN Universal - $150
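The three rows of the Time/Data table share a nearly constant product of days and terabytes. The sketch below only performs that arithmetic on the slide's numbers. Reading the product as a fixed storage budget tied to the $150 figure is an assumption, since the slide does not state a price per GB.

```python
# Rows copied from the slide: (days, terabytes).
rows = [(30, 1.6), (10, 4.8), (2, 24.4)]


def tb_days(days, terabytes):
    """Return the storage-time product in TB-days, rounded to one decimal."""
    return round(days * terabytes, 1)


products = [tb_days(days, terabytes) for days, terabytes in rows]

for (days, terabytes), product in zip(rows, products):
    print(f"{days:>2} days x {terabytes:>4} TB = {product} TB-days")
```

Each row works out to roughly 48 TB-days, which is why shortening the retention period lets proportionally more data fit.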
Implementation Challenges
Number of Articles               4,356,508
Number of Indexed Words          27,765,188
Total number of Index Entries    1,003,489,254
Total Text Content File Size     41.4 GB
Text Search Implementation
Windows Azure Storage
Windows Azure Websites
Table Storage – Text Index
Blob Storage – Pages
Azure Wiki Website
Text Index Table Design
PartitionKey   Word
RowKey         (10,000 – word count on page)_PageId
PageId         Numeric page ID (Integer)
PageTitle      Title of Page (String)
• Query on PartitionKey (word)
• Ordered by RowKey (word count on page)
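The RowKey design can be sketched as follows. Subtracting the word count from a fixed base makes the default ascending RowKey order return the pages with the most occurrences first. The slide gives the base as 10,000, but the example keys elsewhere in the deck (such as 999913_19961416) have six-digit prefixes and eight-digit page IDs, so this sketch assumes a base of 1,000,000 and zero-padding to those widths. The names `ROW_KEY_BASE` and `make_row_key` are hypothetical.

```python
# Assumption: base and padding widths are inferred from the example keys,
# not stated on the design slide.
ROW_KEY_BASE = 1_000_000


def make_row_key(word_count, page_id):
    """Build an inverted-count RowKey such as '999913_19961416'."""
    if not 0 < word_count < ROW_KEY_BASE:
        raise ValueError("word_count must be between 1 and ROW_KEY_BASE - 1")
    return f"{ROW_KEY_BASE - word_count:06d}_{page_id:08d}"


# Keys sort as plain strings, so the highest word count comes first.
keys = [
    make_row_key(5, 111),
    make_row_key(396, 33300527),
    make_row_key(87, 19961416),
]
ordered = sorted(keys)
print(ordered)
```

Zero-padding matters here: without it, string ordering would no longer agree with numeric ordering.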
Text Index Table Example
PartitionKey RowKey PageId PageTitle
azure 999604_33300527 33300527 Capetian Armorial
azure 999635_23352685 23352685 Morphological classification of Czech verbs
azure 999790_25148196 25148196 Armorial of the Communes of Seine-Maritime
azure 999901_00864847 864847 Azure (color)
azure 999913_19961416 19961416 Windows Azure
azure 999913_31687088 31687088 Ministry of Defence (Spain)
azure 999920_14011854 14011854 Coats of arms of the Holy Roman Empire
azure 999926_25312186 25312186 Armorial of the Communes of Eure
azure 999930_01317679 1317679 Lancia Aurelia
azure 999935_00717434 717434 Ordinary (heraldry)
azure 999935_04644383 4644383 Characters of The Order of the Stick
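To read the example rows, the RowKey can be decoded back into a word count and a page ID. This is a minimal sketch that assumes the prefix is 1,000,000 minus the word count, which is what the six-digit prefixes above suggest. `parse_row_key` is a hypothetical helper.

```python
def parse_row_key(row_key, base=1_000_000):
    """Split 'NNNNNN_PPPPPPPP' into (word_count, page_id).

    Assumption: the prefix is base minus the word count on the page.
    """
    inverted, page_id = row_key.split("_", 1)
    return base - int(inverted), int(page_id)


# Two row keys taken from the example table.
for key in ("999604_33300527", "999913_19961416"):
    count, page_id = parse_row_key(key)
    print(f"{key}: page {page_id} contains the word {count} times")
```

Under that assumption the first row, "Capetian Armorial", holds 396 occurrences of "azure" and "Windows Azure" holds 87, which matches the table being ordered by descending word count.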
Uploading Page Data
Upload Page Content to Blob Storage
27 XML Content Files (41.4 GB - 4,356,508 Pages)
Windows Azure Storage
Blob Storage (4,356,508 Blobs)
Creating Text Index Data
Parse Page Text
27 XML Content Files (41.4 GB - 4,356,508 Pages)
Page IDs and Titles (124 MB)
Index Entries (19,277 Files - 9.83 GB)
Index Data Files
typical#2356523,1|2356987,1|2357098,1|2357186,1|2357237,1|2357704,1|2357705,1
history#2375229,1|2375230,1|2375232,1|2375279,1|2375293,3|2375300,1|2375314,2
renowned#2338682,1|2338841,2|2339194,1|2339509,1|2339791,1|2340298,1|2340408,1
line#2372733,1|2372749,2|2372774,2|2372784,2|2372790,1|2372796,1|2372813,1
varies#2316134,1|2317202,1|2318782,1|2319263,1|2319437,1|2319766,1|2319969,1
moore#2348931,2|2349076,2|2349268,1|2349746,8|2349903,1|2350368,2|2350437,1
journal#2371460,2|2371490,1|2371518,2|2371524,1|2371565,3|2371591,6|2371609,2
elderly#2300000,2|2300127,1|2301060,1|2301207,1|2301873,1|2302199,1|2302733,1
bearing#2331971,1|2332125,1|2332422,1|2332610,1|2333094,1|2333854,1|2334189,1
• Contains 1,000 lines
• Each line contains 100 entries for a word (1 transaction)
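Each line has the shape `word#pageId,count|pageId,count|...`, so one line maps naturally onto one batch of table entities that share a PartitionKey. The 100-entry line length also matches the Table service limit of 100 entities per entity group transaction. The sketch below parses a line and builds entity dictionaries. The RowKey format (base 1,000,000, eight-digit page IDs) is an assumption inferred from the example table, and `parse_index_line` and `to_entities` are hypothetical names.

```python
def parse_index_line(line):
    """Parse 'word#pageId,count|pageId,count|...' into (word, [(page_id, count)])."""
    word, _, payload = line.strip().partition("#")
    entries = []
    for item in payload.split("|"):
        page_id, count = item.split(",")
        entries.append((int(page_id), int(count)))
    return word, entries


def to_entities(word, entries, base=1_000_000):
    """Build one entity per entry; all share the word as PartitionKey."""
    # Assumption: RowKey is (base - count) padded to 6 digits, then the
    # page ID padded to 8 digits. PageTitle would come from a separate
    # page ID/title lookup and is left out here.
    return [
        {
            "PartitionKey": word,
            "RowKey": f"{base - count:06d}_{page_id:08d}",
            "PageId": page_id,
        }
        for page_id, count in entries
    ]


# Shortened version of the 'moore' line from the slide.
word, entries = parse_index_line("moore#2348931,2|2349076,2|2349268,1|2349746,8")
entities = to_entities(word, entries)
print(entities[3])
```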
Insert Index Entries
Windows Azure Storage
Windows Azure Storage
Blob Storage
Queue
Table Storage
Windows Azure Services
Worker Roles
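One plausible reading of this diagram is a queue-driven pipeline: index files sit in blob storage, the queue holds one message per file, and worker roles pull messages and write each line to table storage as a batch. The slide does not spell out the message contents, so that part is an assumption. The sketch below simulates the flow with in-memory stand-ins (a dict for blobs and the table, `queue.Queue` for the storage queue, threads for worker roles) and makes no storage calls.

```python
import queue
import threading


def run_workers(index_files, worker_count=4):
    """Simulate worker roles draining a queue of index-file names.

    index_files maps a file name to its list of index lines and stands in
    for blob storage. Returns a dict standing in for table storage.
    """
    work = queue.Queue()      # stand-in for the storage queue
    table = {}                # stand-in for table storage
    lock = threading.Lock()

    for name in index_files:
        work.put(name)        # assumption: one queue message per index file

    def worker():
        while True:
            try:
                name = work.get_nowait()
            except queue.Empty:
                return        # queue drained, worker stops
            for line in index_files[name]:
                word, _, payload = line.partition("#")
                batch = {}
                for item in payload.split("|"):
                    page_id, count = item.split(",")
                    batch[(word, int(page_id))] = int(count)
                with lock:
                    table.update(batch)   # one "batch insert" per line

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return table


files = {"index-0001.txt": ["typical#1,1|2,3"], "index-0002.txt": ["history#5,2"]}
result = run_workers(files, worker_count=2)
print(sorted(result.items()))
```

Because every worker pulls from the same queue, adding workers raises throughput without any coordination beyond the queue itself.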
Insert Index Entries
Windows Azure
On-Premise
Windows Azure Storage
Tables Blobs Queues
http://azurespeedtest.azurewebsites.net/
Windows Azure
On-Premise
Windows Azure Storage
Tables Blobs Queues
Windows Azure Virtual Machines
VM VM
http://azurespeedtest.azurewebsites.net/
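// Raise the per-host limit on concurrent HTTP connections (the .NET client default is 2).
ServicePointManager.DefaultConnectionLimit = 100;
// Disable Nagle's algorithm, which delays small TCP packets and adds latency to small requests.
ServicePointManager.UseNagleAlgorithm = false;
// Skip the "Expect: 100-Continue" handshake to avoid an extra round trip per request.
ServicePointManager.Expect100Continue = false;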
Block Blob Operations
Single HTTP request for blob
Sequential HTTP requests for blocks
Parallel HTTP requests for blocks
Blob Upload
Block Upload
Block Commit
Tuning Block Blob Operations
Single HTTP request for blob
Sequential HTTP requests for blocks
Parallel HTTP requests for blocks
SingleBlobUploadThresholdInBytes
ParallelOperationThreadCount
StreamWriteSizeInBytes
Blob Upload
Block Upload
Block Commit
Tuning Blob Operations

CloudBlobClient
Property                          Default  Range             Description
SingleBlobUploadThresholdInBytes  32 MB    1-64 MB           Maximum size of a blob in bytes that may be uploaded as a single blob.
ParallelOperationThreadCount      1        1-64              Number of blocks that may be simultaneously uploaded.

CloudBlockBlob
Property                          Default  Range             Description
StreamWriteSizeInBytes (Block)    4 MB     1-4 MB            Block size for writing to a block blob.
StreamWriteSizeInBytes (Page)              512 bytes – 4 MB  Number of bytes to buffer when writing to a page blob stream.
StreamMinimumReadSizeInBytes               1-4 MB            Minimum number of bytes to buffer when reading from a blob stream.
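How these properties interact can be approximated with a small model: a blob at or below the single-upload threshold goes up in one request, while a larger blob is split into blocks, uploaded in rounds of up to `parallel_threads` blocks, and then committed with one more request. This is a simplified sketch built from the property descriptions above and is not the SDK's actual decision logic, which may apply further rules. `plan_upload` is a hypothetical helper.

```python
import math

MB = 1024 * 1024


def plan_upload(blob_size, single_upload_threshold=32 * MB,
                block_size=4 * MB, parallel_threads=1):
    """Estimate how many HTTP requests an upload needs (simplified model)."""
    if blob_size <= single_upload_threshold:
        # Small enough to send as one "Blob Upload" request.
        return {"mode": "single", "blocks": 0, "requests": 1, "upload_rounds": 1}
    blocks = math.ceil(blob_size / block_size)
    return {
        "mode": "blocks",
        "blocks": blocks,
        # One "Block Upload" per block plus one "Block Commit".
        "requests": blocks + 1,
        # Rounds of block uploads when parallel_threads run at once.
        "upload_rounds": math.ceil(blocks / parallel_threads),
    }


print(plan_upload(10 * MB))
print(plan_upload(100 * MB))
print(plan_upload(100 * MB, parallel_threads=4))
```

In this model a 100 MB blob with the default 4 MB block size needs 25 block uploads, so raising the thread count from 1 to 4 cuts the upload rounds from 25 to 7.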
Parallel and Asynchronous Uploads
Parallel Blobs
Blob Container
Files
Blob Blob Blob
Parallel Blocks
Blob Container
Files
Blob
Parallel Blobs & Blocks
Blob Container
Files
Blob Blob Blob
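The "Parallel Blobs" case, where several files upload to separate blobs at once, can be sketched with a thread pool. The `upload_blob` function is a hypothetical stand-in that only reports the byte count it would send, so a real upload call would take its place. The default worker count is likewise an assumption.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_blob(name, data):
    """Hypothetical stand-in for a real blob upload; returns (name, bytes sent)."""
    return name, len(data)


def upload_all(files, max_workers=8):
    """Upload every file in parallel, one task per blob."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(upload_blob, name, data)
                   for name, data in files.items()]
        # result() re-raises any exception from the worker thread.
        return dict(future.result() for future in futures)


sent = upload_all({"page-1.xml": b"<page/>", "page-2.xml": b"<page>text</page>"})
print(sent)
```

Parallel blocks within one blob are a separate setting, so the two forms of parallelism can be combined, as the third diagram shows.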
Storage Monitoring Tables
• $MetricsCapacityBlob
• $MetricsTransactionsBlob
• $MetricsTransactionsTable
• $MetricsTransactionsQueue
Handling Outages
• 29th February 2012 – Major outage due to certificate error
  – MVP Summit 2012 – February 28th – March 2nd
• 22nd February 2013 – Storage outage due to certificate error
  – MVP Summit 2013 – February 18th – 22nd
• MVP Summit November 2013 – November 18th – 21st
  – Correlation does not mean causation!
• Consider processing “In the Cloud”
• Modify ServicePointManager Settings
• Use Parallel and Asynchronous Actions
• Tune CloudBlobClient and CloudBlockBlob properties
• Fiddler is Your Friend (Especially the Timeline)
• Use the Source (Windows Azure SDK on GitHub)
• Understand Storage Emulator Limitations
• Understand transient faults
• Understand Pricing Implications
• Leverage Storage Analytics
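For the transient-faults point, a common approach is to retry the failed operation with exponential backoff. The sketch below is a generic illustration and not the storage SDK's retry policy. `TransientError`, `with_retries`, and the delay values are hypothetical.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for a fault that is worth retrying."""


def with_retries(operation, attempts=4, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying on TransientError with doubling delays."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise                      # out of attempts, surface the fault
            sleep(base_delay * 2 ** attempt)


# Usage: an operation that fails twice and then succeeds.
calls = {"count": 0}
delays = []


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TransientError("server busy")
    return "ok"


outcome = with_retries(flaky, sleep=delays.append)
print(outcome, delays)
```

Passing `sleep` as a parameter keeps the example fast to run, since the recorded delays replace real waiting.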
Thanks!
http://wikisearch.azurewebsites.net/