Same Problems, More Zeroes: Why the Spreadsheet (and PowerPivot) Will Dominate Big Data Usage

Rob Collie

Presentation to NYC MSBIgData Group

Me

13+ years at Microsoft in Redmond

Technical design, strategic direction, and project management

Office 97, Windows Installer (MSI) v1, Excel 2003, Excel 2007, Bing

Designed much of PowerPivot v1

CTO, Pivotstream.com

PowerPivotPro.com, PowerPivotFAQ.com

“Dominate” is a dramatic word

Back-end storage and processing isn’t going anywhere (but it will change slightly)

Not a threat – an opportunity

Two Agendas

What the opportunity looks like & where Excel/PowerPivot fits

How Excel earned its stigmas and how PowerPivot dispels most of them

Will swap back and forth between them

Why Excel “Sucks”

Let’s be analytical! What are the precise problems?

Think of it as a BI environment…
1. 1M row capacity (and slow long before you max it out)
2. Only works on single flat tables – no dimensional modeling
3. Tempting place to perform “ETL” – and no autorefresh

Think of it as a programming environment…
1. Files are the source projects!
2. Files are the distribution method!
3. The runtime environment IS the development environment!
4. Self-obfuscating! “AvgSales?” You wish. You will call it H7 & like it!
5. No separation of presentation and logic. No “portability.”

DEMO 1: MILES AND MILES OF DATA

300 Million Rows in One Workbook

If Printed Out, Those 300M Rows Would Stretch 1,000 Miles!
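
As a back-of-the-envelope check on the 1,000-mile figure, here is a small sketch. It assumes (my assumption, not the slide's) that every row prints at Excel's default height of 15 points, with 72 points to the inch, and it ignores margins, headers, and page breaks. It also counts how many maxed-out 1,048,576-row worksheets the same rows would fill in the plain Excel grid, which is the "1M row capacity" problem from the checklist. The function names are hypothetical, chosen for illustration.

```python
# Back-of-the-envelope check for "300M rows would stretch 1,000 miles".
# Assumptions (not from the slide): every row prints at Excel's default
# height of 15 points; margins, headers, and page breaks are ignored.

EXCEL_MAX_ROWS = 2 ** 20        # 1,048,576 rows per worksheet (Excel 2007 and later)
POINTS_PER_INCH = 72
INCHES_PER_MILE = 12 * 5280     # 63,360 inches


def sheets_needed(rows: int) -> int:
    """Number of worksheets needed to hold `rows` rows in the plain Excel grid."""
    return -(-rows // EXCEL_MAX_ROWS)  # integer ceiling division


def printed_miles(rows: int, row_height_pts: float = 15.0) -> float:
    """Length of one continuous printout of `rows` rows, in miles."""
    inches = rows * row_height_pts / POINTS_PER_INCH
    return inches / INCHES_PER_MILE


if __name__ == "__main__":
    rows = 300_000_000
    print(sheets_needed(rows))         # 287
    print(round(printed_miles(rows)))  # 986
```

Under those assumptions, 300M rows come to roughly 986 miles of paper, consistent with the slide's round 1,000, and would need 287 full worksheets in plain Excel.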

Want Billions of Rows? Import to Tabular BISM

Want Billions? Import to Tabular BISM

Import Results – Same Formulas and UX as PowerPivot, Just a Different Frame (Visual Studio vs. Excel)

Updating The Checklist – PowerPivot “Fixes” Excel

Let’s be analytical! What are the precise problems?

Think of it as a BI environment…
1. 1M row capacity (and slow long before you max it out)
2. Only works on single flat tables – no dimensional modeling
3. Tempting place to perform “ETL” – and no autorefresh

Think of it as a programming environment…
1. Files are the source projects!
2. Files are the distribution method!
3. The runtime environment IS the development environment!
4. Self-obfuscating! “AvgSales?” You wish. You will call it H7 & like it!
5. No separation of presentation and logic. No “portability.”

OPPORTUNITY

Trend #1: Data Explosion

Library of Congress:

530 miles of bookshelves

10 Terabytes (That’s it???)

[Chart: Worldwide Data Storage (EB), 2006 to 2011; vertical axis 0 to 2,000 EB]

Worldwide Data in Storage:

~180 Million TB in 2006

10x increase in 5 years!

~3 Libraries of Congress per US Household
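
The unit conversions behind this slide can be checked in a few lines. This sketch uses only the slide's own figures (about 180 million TB in 2006, a 10x increase over 5 years, and 10 TB for the Library of Congress) and assumes decimal units, so 1 EB = 1,000,000 TB. The variable names are illustrative.

```python
# Unit conversions behind the "Data Explosion" slide, using the slide's own
# figures. Assumption: decimal units, so 1 EB = 10**6 TB.

TB_PER_EB = 10 ** 6
LIBRARY_OF_CONGRESS_TB = 10

storage_2006_tb = 180_000_000
storage_2006_eb = storage_2006_tb / TB_PER_EB                # 180 EB
storage_2011_eb = storage_2006_eb * 10                       # ~1,800 EB after "10x in 5 years"
libraries_2006 = storage_2006_tb / LIBRARY_OF_CONGRESS_TB    # Library of Congress equivalents

# A 10x increase over 5 years implies a compound annual growth factor of 10 ** (1/5).
annual_growth_pct = (10 ** (1 / 5) - 1) * 100

print(f"2006: {storage_2006_eb:,.0f} EB = {libraries_2006:,.0f} Libraries of Congress")
print(f"2011: ~{storage_2011_eb:,.0f} EB")
print(f"Implied growth: ~{annual_growth_pct:.0f}% per year")
```

The roughly 1,800 EB for 2011 fits within the chart's 2,000 EB scale. The ~58% per year figure is simply the compound rate that "10x in 5 years" implies, not a separately reported number.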

Trend #2: BI Spending ACCELERATES in Recessions

If Big Data is not accessible via the right tools, you might as well not even be storing it.

DEMO: WHAT NEW YORKERS DRINK

Demo Screenshot: Corona Dominates NYC Beer Sales

Note that this demo is running in my browser!
– No Excel or PowerPivot install required
– Even runs on Mac and iPad

Very “Fisher-Price” UX, not scary like Excel – just a friendly website

But Stella Artois Rules Manhattan

Note that the report is sliced to Manhattan only, with one click

Also note that the user cannot download the workbook, just interact with it – secure and controlled

VERY Different Bestseller list in the Bronx

“Cordina” brand holds spots 2, 3, 5, and 7

This report automatically refreshes itself with the latest data on a regular schedule – no human intervention required!

The Checklist

Let’s be analytical! What are the precise problems?

Think of it as a BI environment…
1. 1M row capacity (and slow long before you max it out)
2. Only works on single flat tables – no dimensional modeling
3. Tempting place to perform “ETL” – and no autorefresh

Think of it as a programming environment…
1. Files are the source projects, with no enforced “blessed” version
2. Files are the distribution method!
3. The runtime environment IS the development environment!
4. Self-obfuscating! “AvgSales?” You wish. You will call it H7 & like it!
5. No separation of presentation and logic. No “portability.”

Big Data is a Matter of Opinion

The V’s

<Went looking for supporting articles>

Confirmation!

Important Points/My Opinions

Decisionmakers don’t care how data is stored

Decisionmakers don’t care how big the data is
– Even 1,000 rows is bigger than they can digest
– Humans can digest one screen at most
– They need us to give them SMALL data

Decisionmakers don’t like to learn new tools

It is pointless and counterproductive to fight any of this

Opinion: At the place where it matters, there is no difference between Big Data and BI – it’s all Insight, consumed primarily by non-technical humans

But Decisionmakers are an Obstacle

Only they know what they know

Only they know what they need
– They don’t even know what they want until they see what they don’t want!

They don’t know how to explain either of the above

They don’t understand your language at all – what’s easy, what’s difficult

They budget only about 10% of the time with you that the work actually requires

True Story: How a week became an hour

In 2006, I hired a top-notch BI pro for a project at Microsoft

I was the domain expert (the “decisionmaker”) but knew nothing of the toolset.

He was the technical pro (the “doer”) and knew nothing of the domain.

Writing and debugging a single formula took a full week of iteration and communication.

In 2009 I revisited the same project
– But thanks to PowerPivot, this time I was both decisionmaker and doer

The same formula process now took LESS THAN ONE HOUR!
– This was true even though I had forgotten every last detail of the 2006 project

Why did a week become an hour? HOW???

Communication: The “Dark Matter” of BI Projects

Knowledge Worker: Internal Communication at “Broadband” Speed…

BI Pro: Internal Communication at “Broadband” Speed…

…but person-to-person communication at “2400 Baud Dialup” speed. Where the Time Gets Spent:

Never budgeted or accounted for or rewarded… so they don’t commit

Excel Pros – Data Pros’ New Allies

300M Users

Of which, 10% create PivotTables: 30M Pros
– Every org has them
– ~7M Java devs, 2M SQL pros
– Each supports an average of 15 BDMs (business decision makers)
– They support the majority of informed decisions in the business world

But Even Better…

They intrinsically know the business as well as the decisionmakers (often, they ARE decisionmakers)

They share your (IT, development) mindset more than you’d expect

They can and will pick up PowerPivot quickly

They NEED you

They’re great teammates and are thrilled to cooperate with you

Traditional Model Bottlenecks on BI Pro, and Coming Soon to Big Data

Knowledge Worker / Analyst / Excel Pro

BI Pro

BI Pro Intensely Engaged with One Project at a Time

Everyone Else Waits:
– Make uninformed decisions/guesses
– Burn time inefficiently with spreadsheets
– Make costly spreadsheet mistakes
– Leak sensitive information
– Become entrenched in spreadsheet process, resistant to improvement once BI resources become available

BIgData Pro Can Now Address Multiple Projects

BUDGET VS ACTUALS

The Demo

See blog posts:

– http://ppvt.pro/BudgetActuals1
– http://ppvt.pro/BudgetActuals2

The Checklist

Let’s be analytical! What are the precise problems?

Think of it as a BI environment…
1. 1M row capacity (and slow long before you max it out)
2. Only works on single flat tables – no dimensional modeling
3. Tempting place to perform “ETL” – and no autorefresh

Think of it as a programming environment…
1. Files are the source projects, with no enforced “blessed” version
2. Files are the distribution method!
3. The runtime environment IS the development environment!
4. Self-obfuscating! “AvgSales?” You wish. You will call it H7 & like it!
5. No separation of presentation and logic. No “portability.”

Bonus Demos

Weather

Power View

Connection to Hadoop

UFOs