BI/Analytics on NoSQL: Review of Architectures

What we'll answer in 50 minutes

• Who is this guy?
• How do I enable ad hoc, self-service reporting on NoSQL?
• How do I improve the performance of dashboards on top of NoSQL?
• How do I integrate NoSQL data with my other data that isn't in NoSQL?
• How do I enable simple, easy-to-build reports while preserving the ability to run rich NoSQL queries?

Nicholas Goodman

• Open Source BI thought leader
  – 50+ Open Source BI customer projects
  – Blogger, whitepapers, etc.
• Entrepreneur
  – DynamoBI Corporation
  – Bayon Technologies, Inc.
• Data geek, hacker, tinkerer, committer

GOAL: Share perspectives, research, opinions.
DISCLAIMER: Your Mileage ...

How do we answer those Q's?

Promise of “Big Data”

• NoSQL/Hadoop/MapReduce systems
  – Keep more of it
  – Cost-effective analysis
  – “Massive scale” data, now accessible to everyone (elastic)
  – Not just SQL queries: more complex analysis

ACCOMPLISHED: WEB-SCALE, NEVER-BEFORE-SEEN SCALE OF DATA STORAGE AND PROCESSING

Reality Check!

• Petabytes? Y
• Cheap storage? Y
• Raw processing? Y
• Rich query languages? Y
• Flexible data structures? Y
• Reliable, fault tolerant? Y

• Fast queries? N
• Ad hoc access? N
• Accessibility to commodity BI tools? N
• Easy report authoring? N
• Levels of aggregation? N
• Integrated data? N

Big Data has solved the INFRASTRUCTURE problem of raw/core data storage and processing but has delivered less of what BUSINESS users want from analytics.

Data Gaps too!

NoSQL side:
• Code, developers
• MapReduce (MR), rich graph/access
• Hierarchical, unstructured data

BI side:
• Analysts with Excel, dashboards
• Simple 2D (tables, charts)
• Filtering and easy analytics

Levels of Aggregation

[Figure: the same data at several levels of aggregation, with row counts labeled 100 billion, 100 million, 1 million, and 10K; range labeled “1 row to 1 billion rows”]

THE SAME DATA AT VARIOUS LEVELS OF AGGREGATION IS HUGELY IMPORTANT IN REAL-LIFE IMPLEMENTATIONS!
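
To make the point concrete, here is a minimal Python sketch that rolls the same raw rows up to three progressively coarser levels. The event tuples, field layout, and function names are hypothetical, invented for illustration; a real system would do this at far larger scale inside the store, an ETL job, or a database.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw event rows: (ISO timestamp, country, revenue)
RAW_EVENTS = [
    ("2011-03-01T09:15:00", "US", 10.0),
    ("2011-03-01T17:40:00", "US", 5.0),
    ("2011-03-01T11:05:00", "DE", 7.5),
    ("2011-03-02T08:30:00", "US", 2.5),
    ("2011-03-02T19:55:00", "DE", 1.0),
]


def day_of(ts):
    """Truncate an ISO timestamp to its calendar day."""
    return datetime.fromisoformat(ts).date().isoformat()


def rollup(rows, key_fn):
    """Sum revenue per group; key_fn decides the grain of the summary."""
    totals = defaultdict(float)
    for ts, country, revenue in rows:
        totals[key_fn(ts, country)] += revenue
    return dict(totals)


# The same data at three levels of aggregation, from finest to coarsest.
by_day_country = rollup(RAW_EVENTS, lambda ts, c: (day_of(ts), c))
by_country = rollup(RAW_EVENTS, lambda ts, c: c)
grand_total = rollup(RAW_EVENTS, lambda ts, c: "ALL")

if __name__ == "__main__":
    levels = [("raw", RAW_EVENTS), ("day x country", by_day_country),
              ("country", by_country), ("total", grand_total)]
    for name, level in levels:
        print(f"{name}: {len(level)} rows")
```

Each level shrinks the row count, which is why dashboards usually read a summary level rather than the raw rows.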

Architectures

• NoSQL reports
• NoSQL thru and thru
• NoSQL + MySQL
• NoSQL as ETL Source
• NoSQL programs in BI Tools
• NoSQL via BI Database (SQL)

NoSQL reports

• Pay a developer to build applications for reports

[Diagram: Apps]

Pros:
• 100% of the richness of NoSQL
• Up to date, current
• Excellent performance on large datasets
• Custom-built, beautiful reports/dashboards
• Single system to manage

Cons:
• $$, developer-driven process
• No commodity BI tools
• Managing rollups/summaries yourself
• Schema-less = harder!
• Hard to integrate other reporting information
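
The “Schema-less = harder!” point shows up quickly in hand-built report code. This minimal sketch assumes documents arrive as Python dicts whose fields may be missing or inconsistently typed; the documents, field names, and the `visits_by_plan` report are hypothetical.

```python
from collections import Counter
from typing import Any, Iterable

# Hypothetical documents as a schema-less store might return them:
# fields can be missing, and the same field can hold different types.
DOCS = [
    {"user": "a", "plan": "pro", "visits": 3},
    {"user": "b", "visits": "7"},   # plan missing, visits stored as a string
    {"user": "c", "plan": "free"},  # visits missing
    {"user": "d", "plan": "pro", "visits": 4},
]


def as_int(value: Any) -> int:
    """Coerce a loosely typed field to int, treating missing/bad values as 0."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return 0


def visits_by_plan(docs: Iterable[dict]) -> dict:
    """A developer-built report: total visits per plan, with defaults for gaps."""
    totals = Counter()
    for doc in docs:
        totals[doc.get("plan", "unknown")] += as_int(doc.get("visits"))
    return dict(totals)


if __name__ == "__main__":
    print(visits_by_plan(DOCS))
```

Every default here (the `"unknown"` plan, treating bad values as 0) is a reporting decision the developer has to make and maintain by hand, which is part of the $$ on this slide.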

NoSQL thru and thru

• Pay a developer to build FLEXIBLE applications for reports

[Diagram: Advanced Apps, Indices, Aggs]

Pros:
• All of the NoSQL reports advantages
• Managed aggregations, rollups
• “Guided ad hoc” available inside the application
• Higher performance for dashboards/summaries

Cons:
• $$, developer-driven process
• $$, app required for aggs
• No commodity BI tools
• Hard to integrate other reporting information
• Limited ad hoc (only developer-built combinations)
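
A rough sketch of “managed aggregations, rollups” in this architecture: the application updates predefined rollups on every write, so dashboards read small precomputed results instead of scanning raw documents. `RollupIndex` and its fields are hypothetical and held in memory; a real deployment would keep these in the NoSQL store's own indices or views.

```python
from collections import defaultdict


class RollupIndex:
    """Hypothetical store that keeps predefined aggregates current on every write."""

    def __init__(self):
        self.docs = []
        # Two predefined rollups: (day, page) -> hits, and day -> hits.
        self.hits_by_day_page = defaultdict(int)
        self.hits_by_day = defaultdict(int)

    def write(self, doc):
        self.docs.append(doc)
        # Update every rollup incrementally so dashboards never scan raw docs.
        self.hits_by_day_page[(doc["day"], doc["page"])] += 1
        self.hits_by_day[doc["day"]] += 1

    def dashboard(self, day):
        """Guided view: only the combinations the developer indexed are available."""
        pages = {page: n for (d, page), n in self.hits_by_day_page.items() if d == day}
        return {"day": day, "total": self.hits_by_day.get(day, 0), "pages": pages}


index = RollupIndex()
for day, page in [("2011-03-01", "/home"), ("2011-03-01", "/home"),
                  ("2011-03-01", "/pricing"), ("2011-03-02", "/home")]:
    index.write({"day": day, "page": page})

if __name__ == "__main__":
    print(index.dashboard("2011-03-01"))
```

The trade-off from the slide is visible in the code: a question such as “hits per browser” has no rollup, so it needs new developer work rather than an ad hoc query.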

NoSQL + MySQL

• Pay a developer to build FLEXIBLE applications for reports

[Diagram: MySQL, ETL, App]

Pros:
• Less IT $$, since developers aren't “building reports”
• Rich NoSQL analysis left in place (ETL + NoSQL)
• Easy, ad hoc reporting via commodity BI tools
• Easier-to-understand data for self-service reports

Cons:
• Data freshness (24 hrs old)
• Once data is in MySQL, no rich NoSQL application use (M/R)
• The BI tool can connect ONLY to data in MySQL, not NoSQL
• Aggregations are still self-managed in MySQL
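
A minimal sketch of the NoSQL + MySQL flow: summarize a nightly extract and load it into a flat table that a BI tool can query with plain SQL. Python's built-in `sqlite3` stands in for MySQL so the example runs without a server; the extract, table, and column names are hypothetical. `INSERT OR REPLACE` is SQLite syntax; in MySQL, `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE` play a similar role.

```python
import sqlite3

# Hypothetical nightly extract from the NoSQL store (nested, schema-less docs).
EXTRACT = [
    {"day": "2011-03-01", "campaign": {"name": "spring"}, "clicks": 120},
    {"day": "2011-03-01", "campaign": {"name": "spring"}, "clicks": 30},
    {"day": "2011-03-01", "campaign": {"name": "launch"}, "clicks": 50},
]


def load_summary(conn, docs):
    """Summarize the extract and load it into a flat table (the T and L of ETL)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_clicks "
        "(day TEXT, campaign TEXT, clicks INTEGER, PRIMARY KEY (day, campaign))"
    )
    summary = {}
    for doc in docs:
        key = (doc["day"], doc["campaign"]["name"])
        summary[key] = summary.get(key, 0) + doc["clicks"]
    conn.executemany(
        "INSERT OR REPLACE INTO daily_clicks (day, campaign, clicks) VALUES (?, ?, ?)",
        [(day, campaign, clicks) for (day, campaign), clicks in summary.items()],
    )
    conn.commit()


# sqlite3 stands in for MySQL here so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
load_summary(conn, EXTRACT)

# The kind of plain SQL a commodity BI tool would issue against the summary table.
ROWS = conn.execute(
    "SELECT campaign, SUM(clicks) FROM daily_clicks GROUP BY campaign ORDER BY campaign"
).fetchall()

if __name__ == "__main__":
    print(ROWS)
```

Note that the BI query only ever sees `daily_clicks`: anything not summarized during the nightly load is invisible to it, matching the slide's cons.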

NoSQL as ETL Data Source

• NoSQL treated like any other data source

[Diagram: Informatica, Teradata]

Pros:
• Allows use of a consolidated BI tool for ad hoc reporting
• Enables integrated (combined) datasets for reporting
• Aggregations often “managed”
• Best-of-breed tools

Cons:
• ETL development expense
• Data latency
• Loss of NoSQL language richness
• Traditional DW tools are $$
• Scaling issues with the DW database
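
Integrated datasets are the main payoff of treating NoSQL as one more ETL source. This sketch assumes both sources share an `account_id` key and conforms NoSQL-derived usage rows with account attributes from another system (a hypothetical CRM) before reporting. All data and names are made up; in practice an ETL tool would perform this step and load the result into the warehouse.

```python
from typing import Dict, List, Tuple

# Hypothetical rows extracted from the NoSQL store: (account_id, page_views)
NOSQL_USAGE: List[Tuple[str, int]] = [("acct-1", 900), ("acct-2", 40), ("acct-3", 15)]

# Hypothetical rows from a separate system, e.g. a CRM: account_id -> (name, region)
CRM_ACCOUNTS: Dict[str, Tuple[str, str]] = {
    "acct-1": ("Acme", "West"),
    "acct-2": ("Globex", "East"),
}


def integrate(usage, accounts):
    """Left-join usage onto accounts by account_id, keeping unmatched usage rows."""
    combined = []
    for account_id, views in usage:
        name, region = accounts.get(account_id, ("(unknown)", "(unknown)"))
        combined.append({"account": name, "region": region, "page_views": views})
    return combined


def views_by_region(rows):
    """A report that needs attributes from both sources at once."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0) + row["page_views"]
    return totals


COMBINED = integrate(NOSQL_USAGE, CRM_ACCOUNTS)

if __name__ == "__main__":
    print(views_by_region(COMBINED))
```

Keeping unmatched rows under `(unknown)` rather than dropping them is one design choice; it makes data-quality gaps between the two sources visible in the report.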

NoSQL programs in BI Tools

• Write a program in the BI tool that flattens the data and outputs it into a report

Pros:
• Rich use of the NoSQL native language
• Direct, up-to-date access
• Access to 100% of the dataset
• Leverages “guided” report parameter pages
• Less expensive than apps

Cons:
• Developer required to write the program ($$)
• Slower (aggs, summaries)
• Lacks integration with other datasets
• Still (usually) no ad hoc access
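
The core of such a program is flattening hierarchical documents into the simple 2D rows that reports expect. This is one possible sketch, assuming documents are nested Python dicts with at most one list field to expand; the dotted column naming and the `flatten`/`explode` helpers are hypothetical choices, not the API of any particular BI tool.

```python
from typing import Any, Dict, List


def flatten(doc: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested dicts into dotted column names, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    row: Dict[str, Any] = {}
    for key, value in doc.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{column}."))
        else:
            row[column] = value
    return row


def explode(doc: Dict[str, Any], list_field: str) -> List[Dict[str, Any]]:
    """Emit one flat row per element of a nested list (e.g. one row per line item)."""
    base = {k: v for k, v in doc.items() if k != list_field}
    items = doc.get(list_field) or [{}]
    return [flatten({**base, list_field: item}) for item in items]


# Hypothetical order document with a nested object and a nested list.
ORDER = {
    "order_id": 7,
    "customer": {"name": "Ann", "city": "Boston"},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}],
}
ROWS = explode(ORDER, "items")

if __name__ == "__main__":
    for row in ROWS:
        print(row)
```

Repeating the parent fields on every line-item row is what lets a report tool filter and group the result like an ordinary table.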

NoSQL via BI Database (SQL)

• Enable NoSQL data access via SQL (gasp!)

[Diagram: Live Query; Cached, 24hr data]

Pros:
• Easy reports, easy (SQL)
• Integration with other data
• ETL is simple INSERT/MERGEs
• Live, up-to-date access
• High-performance, cached data
• Ad hoc access to live + cached data
• Aggregations/summaries

Cons:
• Another system in between
• Still needs to be refreshed nightly
• Not all of NoSQL's richness is available via SQL
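
“ETL is simple INSERT/MERGEs” can be illustrated with an upsert into the BI database's cached copy of the data. In this sketch, Python's `sqlite3` stands in for the BI database, and SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` (available since SQLite 3.24) plays the role of a SQL `MERGE`; the table and documents are hypothetical.

```python
import sqlite3

# sqlite3 stands in for the BI database; SQLite's upsert plays the role of MERGE.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cached_users (user_id TEXT PRIMARY KEY, plan TEXT, visits INTEGER)")

UPSERT = (
    "INSERT INTO cached_users (user_id, plan, visits) VALUES (?, ?, ?) "
    "ON CONFLICT(user_id) DO UPDATE SET plan = excluded.plan, visits = excluded.visits"
)


def refresh(conn, docs):
    """Nightly refresh: new keys are inserted, existing keys are updated in place."""
    conn.executemany(UPSERT, [(d["user_id"], d.get("plan"), d.get("visits", 0)) for d in docs])
    conn.commit()


# First load from hypothetical NoSQL documents.
refresh(conn, [{"user_id": "u1", "plan": "free", "visits": 3},
               {"user_id": "u2", "plan": "pro", "visits": 9}])
# A later run: u1 changed plans, u3 is new (and has no visits field yet).
refresh(conn, [{"user_id": "u1", "plan": "pro", "visits": 5},
               {"user_id": "u3", "plan": "free"}])

ROWS = conn.execute("SELECT user_id, plan, visits FROM cached_users ORDER BY user_id").fetchall()

if __name__ == "__main__":
    print(ROWS)
```

Because each refresh is idempotent per key, rerunning a failed nightly load does not create duplicate rows.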

Mozilla: NoSQL thru and thru (DB)

• Socorro Project: crash reports, optionally sent to Mozilla
• https://crash-stats.mozilla.com

X: NoSQL via SQL

• Using “Splunk” (i.e., a commercial NoSQL-ish data aggregator, etc.)
• Desire to use Tableau for advanced analytics/visualization

Meteor Solutions: NoSQL thru and thru

• Using Cloudant BigCouch solution (SaaS)
• High-performance set of multipurpose indices on predefined aggregations
• Up-to-date aggregations/reports
• Better fit for social media graph structures than a relational DB
• Custom-built BI applications (dashboards/reports) providing a flexible, guided view through the data

[Diagram: Advanced Apps]

A, B, C: NoSQL + MySQL

[Diagram: MySQL, ETL, App]

• Many, many companies (3 we've worked with)
• All “web-related” companies (some semi-structured data, mostly volume)
• Heavy lifting, storage, and “ETL/data preparation” inside Hadoop
• Push summarized, aggregated data into MySQL for analysis by easy dashboarding/BI tools
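
The “heavy lifting in Hadoop, push summaries to MySQL” pattern boils down to a map/reduce summarization followed by a bulk load. This single-process Python sketch only imitates the MapReduce phases (map, shuffle, reduce) to show the shape of the computation; it is not Hadoop code, and the log format and names are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical raw web log lines: "<day> <url> <status>"
LOG_LINES = [
    "2011-03-01 /home 200",
    "2011-03-01 /home 200",
    "2011-03-01 /missing 404",
    "2011-03-02 /home 200",
]


def map_line(line):
    """Map step: emit ((day, status), 1) for each log line."""
    day, _url, status = line.split()
    yield (day, status), 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce step: sum the counts for one key."""
    return key, sum(values)


def summarize(lines):
    groups = shuffle(chain.from_iterable(map_line(line) for line in lines))
    return sorted(reduce_counts(k, v) for k, v in groups.items())


# Flattened summary rows, ready for a bulk INSERT into a MySQL table.
SUMMARY_ROWS = [(day, status, hits) for (day, status), hits in summarize(LOG_LINES)]

if __name__ == "__main__":
    for row in SUMMARY_ROWS:
        print(row)
```

Only the few summary rows cross into MySQL; the raw log lines stay in Hadoop, which is what keeps the MySQL side small enough for dashboarding tools.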