4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var...

28
LINQ to HPC: Developing “Big Data” Applications on Windows HPC Server WSV205 Saptak Sen Senior Product Manager Microsoft Technical Computing

Transcript of 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var...

Page 1: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")

LINQ to HPC: Developing “Big Data” Applications on Windows HPC ServerWSV205

Saptak SenSenior Product ManagerMicrosoft Technical Computing

Page 2: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")

Session Objectives and TakeawaysSession Objective(s):

Understand Microsoft solution for Big DataHow to use and develop LINQ to HPC applicationsDemo of LINQ to HPC/DSC on HPC, Microsoft’s solutions for unstructured Big Data

Key Takeaways:1. LINQ to HPC/DSC provide a highly productive stack for writing

big data applications.2. Demos

1. Data management with DSC2. Application development in LINQ to HPC3. Application management of LINQ to HPC applications

Page 3: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")
Page 4: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")

Characteristics of Big Data

Large Data Volume 100s of TBs to 10s of PBs

Large scale processing and analytics at unprecedented low cost (hardware and software)

New Economics

Distributed Parallel Processing Frameworks

Easy to Scale on commodity hardware MapReduce-style programming models

New Technologies

Unstructured Weak relational schema Text, Images, Videos, Logs

Non-Traditional data Types

Sensors Devices Traditional applications Web Servers Public data

New Data Sources

How popular is my product? What is the best ad to serve? Is this a fraudulent transaction?

New Questions & New Insights

4

Page 5: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")

Example: Traditional e-commerce data flow

5

Page 6: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")

New exploratory e-commerce data flow

6

Page 7: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")

Introduction to LINQ to HPC

Developing Big Data applications for HPC Server

Page 8: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")

Example: find web pages from many log files

var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line);var user = from access in logentries where access.user.EndsWith(@"\sen") select access;var accesses = from access in user group access by access.page into pages select new UserPageCount(“sen", pages.Key, pages.Count());var htmAccesses = from access in accesses where access.page.EndsWith(".htm") orderby access.count descending select access;

LINQ query transformed into computation graph

Input

Compute

Compute and resort

Compute and resort

Output

2

1

3

4 5

Page 9: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")

LINQ to HPC Job Directed Acyclic Graph (DAG) of vertices

Processingvertices

Edges(files)

Inputs

Outputs

Page 10: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")

Executes DAGs by mapping vertices to Distributed Vertex Hosts

Processingvertices

Edges(files)

Inputs

Outputs

Free Compute Resources

Page 11: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")

HPC + LINQ to HPC Job Overview

Application that calls LINQ to

HPC APIs

HPC Head Node

DSC

Submit LINQ to HPC Job

1

1

The LINQ to HPC job also starts a set of parametric

sweep tasks across the rest of the nodes as DVH

2b

A LINQ to HPC job starts 1 basic task

assigning a node as the DGM

2a

2a

LINQ to HPC Vertices read and write files

3b

Graph Manager starts/stops Vertices

3a

HPC Compute Nodes

3a

3b

2b

Graph Manager

Vertex Host

Page 12: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")

HPC + LINQ to HPC Job Overview

Vertices read and write files

3b

Graph Manager starts/stops Dryad Vertices

3a

HPC Compute Nodes

3a

3b

Graph Manager

Vertex Host

Vertices in logical computation graph

• Graph manager starts vertices on Vertex Hosts

• Preferentially schedules vertices near input files

When input is already on cluster, can make local IO the common case

Page 13: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")

More on HPC + LINQ to HPC mechanics

Application that calls LINQ to

HPC APIs

HPC Head Node

DSC

Publish to share:1. binaries for LINQ to HPC job2. XML description of LINQ to

HPC graph

1

1

DVH loads binaries for this LINQ to HPC job from share, executes them according

to commands from DGM

DGM reads XML description of graph from share, calls DSC to locate files referenced in

XML

2a

3b

3a

HPC Compute Nodes

3a

3b

2b

LINQ to HPC Graph Manager

LINQ to HPC Vertex Host

The LINQ to HPC job also starts a set of parametric

sweep tasks across the rest of the nodes as DVH

2b

A LINQ to HPC job starts 1 basic task

assigning a node as the DGM

2a

Page 14: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")

Deployment Steps

DSC NODE ADD sen-cn1 /TEMPPATH:c:\Dryad\HpcTemp /DATAPATH:c:\Dryad\HpcData /SERVICE:sen-hn

Page 15: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")

Demo adding a new Node

demo Using the HPC Management Tool

Page 16: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")

LINQ to HPC Object Model

Page 17: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")

Hello World!

using System;using System.Linq;using Microsoft.Hpc.Linq; namespace MyProgram { class Program { static void Main(string[] args) { var config = new HpcLinqConfiguration(“MyHpcClusterHeadNode”); var context = new HpcLinqContext(config);  var lengths = context.FromDsc<LineRecord>("MyTextData") .Select(r => r.Line.Length); Console.WriteLine("The maximum line length is {0}", lengths.Max()); } }}

Page 18: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")

Analyzing data using LINQ to HPC

demo

Page 19: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")

Managing data and HPC cluster

HPC Server administration basics: Managing the job queueHow to identify the user that submitted jobsCanceling a runaway job

Data Storage Catalog specific tasks: Monitor disk usage tracked by DSC on each nodeView how the DSC file set maps to NTFS across nodesIdentify the nodes where files are replicated

Page 20: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")

Quick overview of the software components that made this possible.

HPC provisioning, management, etc.

MPI SOALINQ to HPC

runtime

Windows Server

Azure*

Distributed runtimes

Cluster and cloud services

Platform

DSC (Distributed Storage Catalog)

Bind individual NTFS shares together to support the LINQ to

HPC distributed runtime

Programming models LINQ to HPC NEW

* Future support planned

Page 21: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")

How LINQ to HPC and Parallel Data Warehouse complement each other

Customer needs for Big Data lie on a spectrumOne extreme is analytics targeting a traditional data warehouse. The analyst knows the cube he or she wants to build, and the analyst knows the data sources.Another extreme is analyzing raw unstructured data. The analyst does not know exactly what the data contains, nor what cube would be justified. The analyst needs to do ad-hoc analyses that may never be run again.

HPC Server targets the raw unstructured data extreme.

Page 22: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")

Microsoft already has great data platform assets

PowerPivot, SQL Server Integration Services (SSIS), Parallel Data Warehouse (PDW), …HPC+LINQ to HPC’s focus on raw unstructured data analytics enables new solutions that incorporate multiple assets

E.g., analyze raw unstructured data using HPC+LINQ to HPC then pipe it to SSIS and apply rest of BI stack

Page 24: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")

For more information

Download HPC Server 2008 R2 Evaluation Copy Today – microsoft.com/hpc

Download Service Pack 2 Beta - connect.microsoft.com

HPC Server Hands-on Labs – microsoft.com/hpc -> Technical Resources

Product Demo Station – in the Server and Cloud Section

HPC Server Certification Exam - microsoft.com/learning/en/us/exam.aspx?ID=70-690

Find Me Later At… twitter: @saptak

Page 25: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")

Resources

www.microsoft.com/teched

Sessions On-Demand & Community Microsoft Certification & Training Resources

Resources for IT Professionals Resources for Developers

www.microsoft.com/learning

http://microsoft.com/technet http://microsoft.com/msdn

Learning

http://northamerica.msteched.com

Connect. Share. Discuss.

Page 26: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")

Complete an evaluation on CommNet and enter to win!

Page 27: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")

© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to

be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS

PRESENTATION.

Page 28: 4 5 6 var logentries = from line in logs where !line.StartsWith("#") select new LogEntry(line); var user = from access in logentries where access.user.EndsWith(@"\sen")