The Microsoft BigData Story
Transcript of The Microsoft BigData Story
Microsoft’s BigData Story
@LynnLangit
April 2013 – Big Data Tech Con
Data Expertise / Lynn Langit
• Industry awards
– Microsoft – MVP for SQL Server
– Google – GDE for Cloud Platform
– 10Gen – Master for MongoDB
• Practicing Architect
• Technical author / trainer
– Pluralsight – Google Cloud Series
– DevelopMentor – SQL Server Series
– 2 books on SQL Server BI
• Former MSFT FTE
– 4 years
In a Relationship?
BigData
NoSQL
BigData, NoSQL… => No Microsoft?
• Cheap Storage
• Cloud Storage
• Open Source data projects (Hadoop)
Big Data => keeping / getting more data
• NoSQL data projects
• Mostly open source
• Sharded replicas
NoSQL => schema-lite, scalable storage
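The "sharded replicas" idea above can be sketched in a few lines. This is a minimal illustration, not how any particular NoSQL product places data: the function names `shard_for` and `replicas_for`, the MD5-based hash, and the "next N shards" replica rule are all assumptions made for the example.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index using a stable hash (hypothetical helper)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def replicas_for(key: str, num_shards: int, replication_factor: int) -> list:
    """Return the shards holding copies of the key: the primary plus the next ones in order."""
    if replication_factor > num_shards:
        raise ValueError("replication_factor cannot exceed num_shards")
    first = shard_for(key, num_shards)
    return [(first + i) % num_shards for i in range(replication_factor)]

# Example: place one document on 3 of 8 shards.
placement = replicas_for("customer:42", num_shards=8, replication_factor=3)
```

Because the hash is deterministic, any node can compute where a key lives without a central lookup table, which is one reason this style of storage scales out easily.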
In a (Open Source) Relationship?
NoSQL
Hadoop
MongoDB
Neo4j
Riak
Cassandra
Cloud
AWS Heroku RackSpace OpenStack
DEMO HDINSIGHT (HADOOP)
Data Services
The Reality
BigData
Small BigData
BigData Lifecycle Management
Locate
Quantify
Qualify
Replicate
Process
Present
Locating the data
Your source
• in SQL Server
• on desktops
Public source
• you find it
Private source
• you buy it
Finding Data in Data Markets
• Windows Azure Data Market
• DataMarket.com
• Factual.com
• InfoChimps
DEMO AZURE DATAMARKET
Data Services
Database Lifecycle Management
• Evaluating current processes
• Improving processes
• Adding new tools
– SSDT
• Data synchronization processes
Storing the data
• SQL Server – can use partitioning for scalability
Relational
• Specialized data types
– XML, Hierarchy, Filestream/Filetable, Geospatial
• Columnstore index
Beyond relational via relational
• OLAP cubes / Mining Models
• Tabular models
Multi-dimensional / in-memory
DEMO COLUMNSTORE, XML, FILETABLE
Big Data in SQL Server 2012 – Relational Enhancements
Data Processing
Raw data
Pre-processed data
Detail data
Aggregate data
Views
Valuing the data
• De-duplicating
• Validating
• Correcting errors
• Aggregating
• Ranking / rating
– Social rating, e.g. Yelp-like
– Social scoring, e.g. Freebase-like
DEMO DATA QUALITY SERVICES
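As a rough illustration of the valuing steps listed above (de-duplicating, validating, aggregating, ranking), here is a minimal Python sketch. The record layout, the sample rows, the 1 to 5 rating range, and the helper names are all made up for the example; a real project would do this with Data Quality Services or similar tooling.

```python
from collections import defaultdict

def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so near-identical values compare equal."""
    return " ".join(text.lower().split())

def dedupe(records):
    """Keep the first record for each normalized (name, city) pair."""
    seen, unique = set(), []
    for r in records:
        key = (normalize(r["name"]), normalize(r["city"]))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique

def is_valid(record) -> bool:
    """Assumed rule for this sketch: non-empty name and a rating from 1 to 5."""
    return bool(record["name"].strip()) and 1 <= record["rating"] <= 5

def average_rating_by_city(records):
    """Aggregate detail rows into one average rating per city."""
    totals = defaultdict(lambda: [0, 0])  # city -> [sum, count]
    for r in records:
        city = normalize(r["city"])
        totals[city][0] += r["rating"]
        totals[city][1] += 1
    return {city: s / n for city, (s, n) in totals.items()}

# Hypothetical sample data with one duplicate and one invalid rating.
records = [
    {"name": "Joe's Diner", "city": "Irvine", "rating": 4},
    {"name": "joe's  diner", "city": "irvine ", "rating": 4},
    {"name": "Cafe Uno", "city": "Irvine", "rating": 9},
    {"name": "Taco Hut", "city": "Tustin", "rating": 5},
    {"name": "Noodle Bar", "city": "Irvine", "rating": 2},
]

clean = [r for r in dedupe(records) if is_valid(r)]
averages = average_rating_by_city(clean)
ranked = sorted(clean, key=lambda r: r["rating"], reverse=True)
```

The order matters: de-duplicating and validating before aggregating keeps repeated or out-of-range rows from skewing the averages.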
Data Services
Types of Data Quality Projects
T-SQL scripts (boolean match)
• Exact matches: WHERE =, WHERE <>, WHERE IN
• LIKE % -- string matching
Full-text matching (semantic word match)
• CONTAINS
Semantic Search (semantic phrase match)
• SEMANTICSIMILARITYTABLE
SSIS tasks (transactional, multi-valued matching)
• List below
DQS (KB matching)
• Knowledge Base - rules/matches
• Data Quality project - clean / correct data
MDS (One view of truth)
• Versioned Entities, Attributes and Rules
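The simplest tier above, boolean matching with `WHERE =`, `WHERE <>`, `WHERE IN` and `LIKE`, can be tried without a SQL Server instance. The sketch below uses Python's built-in `sqlite3` module as a stand-in; the `customers` table and its rows are invented for the example. `CONTAINS` and `SEMANTICSIMILARITYTABLE` depend on SQL Server's full-text and semantic search features and are not reproduced here.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [
        ("Contoso Ltd", "Seattle"),
        ("Contoso Limited", "Seattle"),
        ("Fabrikam Inc", "Portland"),
        ("Northwind Traders", "Boise"),
    ],
)

def names(sql, params=()):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql, params)]

# Exact match: only the identical string qualifies.
exact = names("SELECT name FROM customers WHERE name = ? ORDER BY id", ("Contoso Ltd",))

# Not-equal and IN: still boolean, all-or-nothing comparisons.
not_seattle = names("SELECT name FROM customers WHERE city <> ? ORDER BY id", ("Seattle",))
in_list = names(
    "SELECT name FROM customers WHERE city IN (?, ?) ORDER BY id", ("Portland", "Boise")
)

# LIKE with a trailing wildcard: catches both spellings of the same company.
like = names("SELECT name FROM customers WHERE name LIKE ? ORDER BY id", ("Contoso%",))
```

Here the exact match finds one "Contoso" row while the `LIKE` pattern finds both, which is the kind of near-duplicate the later tiers (full-text, semantic search, DQS) are meant to handle more systematically.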
Data Presentation
• View-only client
• View & manipulate (hide-only) client
• View & query (aggregate) client
• View & query (drill through) client
• View & mash-up (add new data) client
• View & update client
• Timeliness of data (latency)
• Beauty of data
But, does it work in Excel?
Import Data
Mash-up data with PowerPivot – including Hadoop via ODBC
Clean up data with Data Quality Services
Extract-Transform-Load with Data Explorer
Authorize with Master Data Services
3rd party – Mine with Predixion
DEMO THE POWER OF EXCEL
From Pivot tables to Visualized Data Mash-ups with Mining
What about the UDM?
• UDM / Data Mining is fully supported in SSAS
• Must be installed in this mode
– Mutually exclusive with Tabular mode
• But, should you use it anymore?
DEMO TABULAR MODELS, DATA MINING
Big Data in SQL Server 2012 – Non-Relational Features
Data Consumability
Appropriate
Recognizable
Beautiful
Valid
Enjoyable
(Accurate)
(Meaningful)
(Useful)
(Appealing)
(Satisfying)
DEMO POWERVIEW
PowerView for Tabular Models
Data Fluency and Job Roles
Consumer
• View and understand
Analyzer
• View, manipulate and decide
Cleaner
• Validate and update
Artist
• Visualize and present
BigData in SQL Server 2012
• Scaling via
– Partitioning for tables, indexes
– PDW
– Columnstore indexes
• Special Data Types
– XML, Hierarchy, Filetable
Relational engine
• OLAP Cubes
• Tabular Models
• Data Mining Models
Analysis service engines
• Data Quality Services
• Master Data Services
• StreamInsight
Other services
Other Data Services from Microsoft
Windows Azure Marketplace
SQL Azure
Data Explorer
Power Pivot
NoSQL – New Products / Betas
SSRS on Azure
HDInsight (Hadoop on Azure)
Cloud-based Data Explorer
PowerView
Semantic Search
Announced Futures
Hekaton
Polybase
Cloud Numerics
Infer.NET Fun (F#)
Many MSR Data Mining Projects
The Changing Data Landscape
NoSQL
RDBMS
Other Services
www.TeachingKidsProgramming.org
• Free Courseware
• Do a Recipe Teach a Kid (Ages 10+)
• Java or Microsoft SmallBasic
• C# on Pluralsight
• recipes)
Toward Data Craftsmanship…
Follow me
• @LynnLangit
• www.LynnLangit.com
• YouTube - SoCalDevGal
Hire me
• To help build your BI/Big Data solution
• To teach your team next gen BI
• To learn more about using NoSQL solutions