BI/Analytics for NoSQL: Review of Architectures
What we'll answer in 50 minutes
• Who is this guy?
• How do I enable AdHoc, self-service reporting on NoSQL?
• How do I improve the performance of dashboards on top of NoSQL?
• How do I integrate NoSQL data with my other data not inside NoSQL?
• How do I enable easy-to-build simple reports but also preserve the ability for rich NoSQL queries?
Nicholas Goodman
• Open Source BI thought leader
  – 50+ Open Source BI customer projects
  – Blogger, whitepapers, etc.
• Entrepreneur
  – DynamoBI Corporation
  – Bayon Technologies, Inc.
• Data Geek, hacker, tinkerer, committer
GOAL: Share perspectives, research, opinions.
DISCLAIMER: Your Mileage ...
How do we answer those Q's?
Promise of “Big Data”
• NoSQL/Hadoop/MapReduce Systems
  – Keep more of it
  – Cost effective analysis
  – “Massive scale” data, now accessible to everyone (elastic)
  – Not just SQL queries, more complex analysis
ACCOMPLISHED: WEB SCALE, MASSIVE NEVER BEFORE SEEN SCALE OF DATA STORAGE AND PROCESSING
Reality Check!
• Petabytes? Y
• Cheap Storage? Y
• Raw Processing? Y
• Rich Query Languages? Y
• Flexible data structures? Y
• Reliable, Fault Tolerant? Y
• Fast Queries? N
• Ad Hoc access? N
• Accessibility to commodity BI tools? N
• Easy report authoring? N
• Levels of Aggregation? N
• Integrated Data? N
Big Data has solved the INFRASTRUCTURE of raw/core data storage but has provided less value to what BUSINESS users want for analytics.
Data Gaps too!
• Code, Developers
• MR, Rich Graph/Access
• Hierarchical, Unstructured
• Analysts w/ Excel, Dashboards
• Simple 2D (tables, charts)
• Filtering and easy analytics
Levels of Aggregation
[Figure: the same data at successive row counts: 100 BILLION, 100 MILLION, 1 MILLION, 10K; 1 ROW TO 1 BILLION ROWS]
SAME DATA AT VARIOUS LEVELS OF AGGREGATION HUGELY IMPORTANT IN REAL LIFE IMPLEMENTATIONS!
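A minimal sketch of what "same data at various levels of aggregation" can mean in practice. The event tuples, column order, and the `rollup` helper are hypothetical and only illustrate the idea; they are not taken from any system discussed in this talk.

```python
from collections import Counter

# Hypothetical detail rows: (day, country, page). Real data would be far larger.
events = [
    ("2011-05-01", "US", "/home"),
    ("2011-05-01", "US", "/pricing"),
    ("2011-05-01", "DE", "/home"),
    ("2011-05-02", "US", "/home"),
    ("2011-05-02", "DE", "/home"),
    ("2011-05-02", "DE", "/pricing"),
]


def rollup(rows, key_indexes):
    """Count rows grouped by the columns at the given positions."""
    counts = Counter()
    for row in rows:
        counts[tuple(row[i] for i in key_indexes)] += 1
    return dict(counts)


# Each level answers coarser questions with fewer rows.
by_day_country = rollup(events, (0, 1))  # finer summary
by_day = rollup(events, (0,))            # coarser summary
total = len(events)                      # single grand-total row
```

Every level is derived from the same detail rows, so a dashboard can read the small summary while a drill-down still reaches the detail.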
Architectures
• NoSQL reports
• NoSQL thru and thru
• NoSQL + MySQL
• NoSQL as ETL Source
• NoSQL programs in BI Tools
• NoSQL via BI Database (SQL)
NoSQL reports
• Pay Developer to build applications for reports
[Diagram label: Apps]
Advantages:
• 100% Richness of NoSQL
• Up to date, current
• Excellent performance on large datasets
• Custom built, beautiful reports/dashboards
• Single system to manage
Drawbacks:
• $$, developer driven process
• No commodity BI tools
• Managing rollups/summaries
• Schema-less = Harder!
• Hard to integrate other reporting information
NoSQL thru and thru
• Pay Developer to build FLEXIBLE applications for reports
[Diagram labels: Advanced Apps, Indices, Aggs]
Advantages:
• All of NoSQL report advantages
• Managed aggregations, rollups
• “Guided Adhoc” available inside application
• Higher performance for dashboards/summaries
Drawbacks:
• $$, developer driven process
• $$, app required for aggs
• No commodity BI tools
• Hard to integrate other reporting information
• Limited AdHoc (only developer built combinations)
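A rough sketch of application-managed aggregations, assuming a toy in-memory store. The `ReportStore` class, its field names, and the sample documents are hypothetical; a real deployment would rely on the NoSQL system's own indices or views.

```python
from collections import defaultdict


class ReportStore:
    """Toy document store that keeps predefined rollups current on every write."""

    def __init__(self):
        self.documents = []
        # Only these developer-chosen combinations are available to reports.
        self.by_product = defaultdict(int)
        self.by_product_day = defaultdict(int)

    def insert(self, doc):
        """Store the document and update each managed aggregate."""
        self.documents.append(doc)
        self.by_product[doc["product"]] += 1
        self.by_product_day[(doc["product"], doc["day"])] += 1

    def count_for_product(self, product):
        """Dashboard read: a lookup in the rollup, no scan of the documents."""
        return self.by_product[product]


store = ReportStore()
store.insert({"product": "browser", "day": "2011-05-01", "os": "linux"})
store.insert({"product": "browser", "day": "2011-05-02", "os": "mac"})
store.insert({"product": "mail", "day": "2011-05-02", "os": "linux"})
```

Reads against the rollups stay fast and up to date, but a question nobody built a rollup for (here, counts by `os`) needs new developer work, which is the "Limited AdHoc" drawback.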
NoSQL + MySQL
• Pay Developer to build FLEXIBLE applications for reports
Advantages:
• Less IT $$ since developers aren't “building reports”
• Rich NoSQL analysis left in place (ETL + NoSQL)
• Easy, Ad Hoc reporting via commodity BI tools
• Easier to understand data for self service reports
Drawbacks:
• Data freshness (24 hrs old)
• Once in MySQL, no rich NoSQL application use (M/R)
• BI Tool can connect ONLY to data in MySQL, not NoSQL
• Aggregations still self managed in MySQL
[Diagram labels: MySQL, ETL, App]
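A sketch of the NoSQL + MySQL flow: summarize documents, then load the summary into a SQL table that a BI tool can query. To keep it runnable, `sqlite3` stands in for MySQL; the documents, the `daily_events` table, and the `summarize` helper are all hypothetical.

```python
import sqlite3
from collections import Counter

# Hypothetical semi-structured documents as they might sit in the NoSQL store.
documents = [
    {"user": "a", "day": "2011-05-01", "events": [{"type": "view"}, {"type": "click"}]},
    {"user": "b", "day": "2011-05-01", "events": [{"type": "view"}]},
    {"user": "a", "day": "2011-05-02", "events": [{"type": "view"}]},
]


def summarize(docs):
    """ETL step: collapse nested events into (day, event_type, count) rows."""
    counts = Counter()
    for doc in docs:
        for event in doc["events"]:
            counts[(doc["day"], event["type"])] += 1
    return [(day, etype, n) for (day, etype), n in sorted(counts.items())]


# sqlite3 is only a stand-in here; the talk's architecture uses MySQL.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_events (day TEXT, event_type TEXT, event_count INTEGER)"
)
conn.executemany("INSERT INTO daily_events VALUES (?, ?, ?)", summarize(documents))
conn.commit()

# What a commodity BI tool would issue against the summary table.
rows = conn.execute(
    "SELECT day, SUM(event_count) FROM daily_events GROUP BY day ORDER BY day"
).fetchall()
```

The SQL side sees only the summarized rows, so the per-user detail and any richer NoSQL analysis stay behind in the source system.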
[Diagram label: Informatica]
NoSQL as ETL Data Source
• NoSQL treated like any other data source
Advantages:
• Allows use of a consolidated BI tool for AdHoc
• Enables integrated (combined) datasets for reporting
• Aggregations often “managed”
• Best of Breed tools
Drawbacks:
• ETL Development Expense
• Data Latency
• Loss of NoSQL language richness
• Traditional DW tools are $$
• Scaling issues with DW Database
[Diagram label: Teradata]
NoSQL programs in BI Tools
• Write a program in the BI tool that flattens data and outputs it into a report
Advantages:
• Rich use of NoSQL native language
• Direct, up to date access
• Access to 100% of dataset
• Leverage “guided” report parameter pages
• Less expensive than apps
Drawbacks:
• Developer required to write program ($$)
• Slow-er (aggs, summaries)
• Lacks integration with other datasets
• Still (usually) no AdHoc access
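A minimal sketch of the "flatten data, output into report" step, assuming nested dictionaries as the document shape. The `flatten` and `to_table` helpers and the sample documents are hypothetical; an actual BI tool would host this logic in its own scripting layer.

```python
def flatten(doc, prefix=""):
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


def to_table(docs):
    """Turn schema-less documents into a 2D table: (columns, rows)."""
    flat_docs = [flatten(d) for d in docs]
    columns = sorted({col for d in flat_docs for col in d})
    # Missing fields become None so every row has the same width.
    rows = [[d.get(col) for col in columns] for d in flat_docs]
    return columns, rows


# Hypothetical documents with differing structure.
docs = [
    {"id": 1, "user": {"name": "ann", "geo": {"country": "US"}}},
    {"id": 2, "user": {"name": "bob"}, "plan": "pro"},
]
columns, rows = to_table(docs)
```

The result is the simple 2D shape that report tables and charts expect, at the cost of a developer writing and maintaining the flattening program.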
NoSQL via BI Database (SQL)
• Enable NoSQL data access via SQL (gasp!)
Advantages:
• Easy reports, easy (SQL)
• Integration with other data
• ETL is simple INSERT/MERGEs
• Live, up to date access
• High performance, cached data
• AdHoc access to Live + Cached
• Aggregations/Summaries
Drawbacks:
• Another system in between
• Still needs to be refreshed, nightly
• Not all capabilities for NoSQL richness available via SQL
[Diagram labels: Live Query; Cached, 24hr data]
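A sketch of two points from this slide: refreshing a cached copy with a simple INSERT/MERGE, and joining it with other data in SQL. Here `sqlite3` and its `INSERT OR REPLACE` statement stand in for the BI database and its MERGE; the tables, the `refresh_cache` helper, and the order documents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Relational data that lives outside the NoSQL store.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT)")
# Cached copy of NoSQL documents, keyed by document id.
conn.execute(
    "CREATE TABLE cached_orders (order_id TEXT PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "EMEA"), (2, "APAC")])


def refresh_cache(conn, docs):
    """Merge documents into the cache: new ids are inserted, existing ids replaced."""
    conn.executemany(
        "INSERT OR REPLACE INTO cached_orders VALUES (?, ?, ?)",
        [(d["_id"], d["customer_id"], d["amount"]) for d in docs],
    )
    conn.commit()


# First nightly load.
refresh_cache(conn, [
    {"_id": "o1", "customer_id": 1, "amount": 10.0},
    {"_id": "o2", "customer_id": 2, "amount": 5.0},
])
# Next refresh: o1 changed, o3 is new.
refresh_cache(conn, [
    {"_id": "o1", "customer_id": 1, "amount": 12.0},
    {"_id": "o3", "customer_id": 1, "amount": 3.0},
])

# Integrated report: NoSQL-sourced orders joined with relational customers.
report = conn.execute(
    "SELECT c.region, SUM(o.amount) FROM cached_orders o "
    "JOIN customers c ON c.customer_id = o.customer_id "
    "GROUP BY c.region ORDER BY c.region"
).fetchall()
```

The cache is only as fresh as its last refresh, which is the "still needs to be refreshed" drawback; the live query path is not modeled in this sketch.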
Mozilla: NoSQL thru and thru (DB)
• Socorro Project: Crash reports, optionally sent to Mozilla
• https://crash-stats.mozilla.com
X: NoSQL via SQL
• Using “Splunk” (i.e., a commercial NoSQL-ish data aggregator, etc.)
• Desire to use Tableau for advanced analytics/visualization
Meteor Solutions: NoSQL thru and thru
• Using Cloudant BigCouch solution (SaaS)
• High performance set of multi purpose indices on pre defined aggregations
• Up to date aggregation/reports
• Better fit for Social Media graph structures over relational DB
• Custom built BI applications (dashboards/reports) providing a flexible guided view through data
[Diagram label: Advanced Apps]
A, B, C: NoSQL + MySQL
[Diagram labels: MySQL, ETL, App]
• Many, many companies (3 we've worked with)
• All “web related” companies (some semi-structured data, mostly volume)
• Heavy lifting and storage, and “ETL/Data preparation” inside Hadoop
• Push summarized, aggregated data into MySQL for analysis by easy dashboarding/BI Tools