Microsoft Technologies for Data Science 201601
Microsoft Technologies for Data Science
Mark Tabladillo, Ph.D.
Senior Data Scientist
LogicBlox/Predictix
Networking
Interactive
http://www.bizjournals.com/atlanta/subscriber-only/2015/07/31/top-tech-employers-2015.html
Name                   Atlanta   Georgia    Total
McKesson Corporation     3,455     3,525   36,868
Verizon Wireless         3,525     4,839       NA
Lockheed Martin          5,800     5,800   24,000
Cox Enterprises Inc.     7,484     7,685   50,000
AT&T Inc.               16,794    21,084  250,790
Terms: Data Science, Machine Learning, Data Mining, Applied Statistics
Definition: the automated or semi-automated process of discovering patterns in data
Applied scientific method
http://www.kdnuggets.com/polls/2015/analytics-data-mining-data-science-software-used.html
http://products.office.com/en-us/excel
http://www.microsoft.com/en-us/server-cloud/products/sql-server/
http://pytools.codeplex.com/
http://azure.microsoft.com/en-us/services/hdinsight/
http://www.revolutionanalytics.com/
Technology Choices
SQL SERVER ANALYSIS SERVICES: Enterprise; Business Intelligence
EXCEL ADD-IN FOR SSAS: Office 365; Office 2013 or Higher x64
SEMANTIC SEARCH: Enterprise; Business Intelligence; Standard; Web; Express with Advanced Services
MICROSOFT AZURE ML: Free (Size Limited); Paid (Web Service): Experiment + Query
F#: Open Source
http://download.microsoft.com/download/F/C/2/FC21C981-4351-4434-A78A-3384CA7515BF/SQL_Server_2016_Deeper_Insights_Across_Data_White_Paper.pdf
[Diagram labels: SS, SQL, AS, NoSQL]
Data mining add-in for business analysts
• Ease of use
• Rich data mining
• Scalable
Rowset Output with Scores
Varchar
NVarchar
Office Documents
Full-Text Keyword Index “FTI”
iFilters
Semantic Document Similarity Index “DSI”
Semantic Database
Semantic Key Phrase Index – Tag Index “TI”
Simplified Chinese
British English
Portuguese
Chinese (Hong Kong SAR, PRC)
Spanish
Chinese (Singapore)
Chinese (Macau SAR)
Time in Seconds vs. Number of Documents
(2011 – K. Mukerjee, T. Porter, S. Gherman – Microsoft)
http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
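The semantic search slides name a key phrase index and a document similarity index. As a rough illustration of what "document similarity" means, here is a minimal Python sketch that scores documents against each other with TF-IDF weights and cosine similarity. This is an assumed, generic technique chosen for illustration, not the algorithm SQL Server uses; the function names (tokenize, tfidf_vectors, cosine) and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, split on whitespace, keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document.

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1, so that terms
    appearing in every document still carry a small positive weight.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical sample documents, for illustration only.
    docs = [
        "sql server data mining",
        "data mining with sql",
        "azure machine learning",
    ]
    vecs = tfidf_vectors(docs)
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            print(i, j, round(cosine(vecs[i], vecs[j]), 3))
```

Documents that share weighted terms score closer to 1, and documents with no terms in common score 0. A production index would also handle stemming, stop words, and language-specific word breaking, which this sketch leaves out.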
http://download.microsoft.com/download/3/B/9/3B9FBA69-8AAD-4707-830F-6C70A545C389/Introducing_Azure_Machine_Learning.pdf
http://datacamp.com
https://github.com/jakevdp/sklearn_pycon2015
http://azure.microsoft.com/en-us/services/machine-learning/
                          Mutable            Immutable
Classic Open Source       Java               Scala
.NET (Now Open Source)    C#, C++, VB.NET    F#
Conference: http://www.kdd.org/
http://www.kdnuggets.com/2015/09/free-data-science-books.html
https://channel9.msdn.com/Blogs/Windows-Azure
https://mva.microsoft.com/
http://blogs.technet.com/b/machinelearning/
http://social.msdn.microsoft.com/forums/azure/en-US/home?forum=MachineLearning
http://sqlserverdatamining.com
http://marktab.net
http://curah.microsoft.com/342704/azure-machine-learning-videos-february-2015