Experiences Evolving a New Analytical Platform: What Works and What's Missing
Evolving a New Analytical Platform: What Works and What’s Missing
Jeff Hammerbacher
Chief Scientist, Cloudera
June 8, 2010
Saturday, June 12, 2010
My Background: Thanks for Asking
▪ [email protected]
▪ Studied Mathematics at Harvard
▪ Worked as a Quant on Wall Street
▪ Conceived, built, and led Data team at Facebook
▪ Nearly 30 amazing engineers and data scientists
▪ Several open source projects and research papers
▪ Founder of Cloudera
▪ Chief Scientist
▪ Also, check out the book “Beautiful Data”
Presentation Outline
▪ BI: Science for Profit
▪ Need tools for whole research cycle
▪ SQL Server 2008 R2: defining the platform
▪ State of the Platform Ecosystem
▪ New Foundations: Hadoop
▪ Boiling the Frog
▪ Future developments
▪ Questions and Discussion
Jim Gray: Science entering Fourth Paradigm
“We have to do better at producing tools to support the whole research cycle”
RDBMS: SQL Server
ETL: SQL Server Integration Services
Reporting: SQL Server Reporting Services
Analysis: SQL Server Analysis Services
Search: Full-Text Search
CEP: StreamInsight
OLAP: PowerPivot
MDM: Master Data Services
Collaboration: SharePoint
Platform Providers
Infrastructure Providers
Application Developers
End Users
Content Providers
Content Providers
1. > 95% of enterprise data is unstructured
2. Data volumes growing rapidly
End Users
1. Move beyond reporting to analytics
2. Make use of all enterprise data
2007: Make Hadoop scale
Jim Gray’s “Fourth Paradigm” lecture
Yahoo! makes Pig open source
Randy Bryant’s “DISC” lecture
Powerset makes HBase open source
2008: Make Hadoop fast
First Hadoop Summit
Yahoo! wins Daytona terabyte sort benchmark
Yahoo! builds production webmap with Hadoop
Facebook makes Hive open source
“MapReduce: A Major Step Backwards”
2009: Insert Hadoop into the enterprise
Cloudera releases CDH
First Hadoop World NYC
Yahoo! sorts a petabyte with Hadoop
Cloudera adds training, support, services
“The Unreasonable Effectiveness of Data”
2010: Integrate Hadoop into the enterprise
IBM announces InfoSphere BigInsights
Yahoo! completes enterprise-class security
Datameer and Karmasphere funded
Teradata, Pentaho, and others integrate
Hive adds JDBC and ODBC