RailsWayCon: Multidimensional Data Analysis with JRuby

Multidimensional Data Analysis with JRuby Raimonds Simanovskis

description

Presentation at RailsWayCon 2011 conference

Transcript of RailsWayCon: Multidimensional Data Analysis with JRuby

Page 1: RailsWayCon: Multidimensional Data Analysis with JRuby

Multidimensional Data Analysis with JRuby

Raimonds Simanovskis

Page 2: RailsWayCon: Multidimensional Data Analysis with JRuby

Raimonds Simanovskis

github.com/rsim

@rsim


Page 3: RailsWayCon: Multidimensional Data Analysis with JRuby

Relational data model

Page 4: RailsWayCon: Multidimensional Data Analysis with JRuby

SQL is good for detailed data queries

Get all sales transactions in USA, California

SELECT customers.fullname, products.product_name,
       sales.sales_date, sales.unit_sales, sales.store_sales
FROM sales
  LEFT JOIN products ON sales.product_id = products.id
  LEFT JOIN customers ON sales.customer_id = customers.id
WHERE customers.country = 'USA'
  AND customers.state_province = 'CA'

Page 5: RailsWayCon: Multidimensional Data Analysis with JRuby

SQL becomes complex for analytical queries

SELECT product_class.product_family,
       SUM(sales.unit_sales) unit_sales_sum,
       SUM(sales.store_sales) store_sales_sum
FROM sales
  LEFT JOIN product ON sales.product_id = product.product_id
  LEFT JOIN product_class ON product.product_class_id = product_class.product_class_id
  LEFT JOIN time_by_day ON sales.time_id = time_by_day.time_id
  LEFT JOIN customer ON sales.customer_id = customer.customer_id
WHERE time_by_day.the_year = 2011
  AND time_by_day.quarter = 'Q1'
  AND customer.country = 'USA'
  AND customer.state_province = 'CA'
GROUP BY product_class.product_family

Get total sales in USA, California in Q1, 2011 by main product groups

Page 6: RailsWayCon: Multidimensional Data Analysis with JRuby

If SQL is not good then we need

NoSQL!

Page 7: RailsWayCon: Multidimensional Data Analysis with JRuby

Maybe write a distributed map-reduce function?

http://browsertoolkit.com/fault-tolerance.png

Page 8: RailsWayCon: Multidimensional Data Analysis with JRuby

Multidimensional Data Model

Multidimensional cubes

Dimensions

Hierarchies and levels

Measures
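The terms on this slide can be illustrated with a minimal in-memory sketch in plain Ruby. It is not how an OLAP engine stores data: the fact rows, their values, and the `aggregate` helper are all made up for illustration. Each fact carries one member path per dimension (a path through the hierarchy levels) plus numeric measures, and a query sums a measure by the top-level members of one dimension while slicing on others.

```ruby
# Hypothetical fact rows: dimension members are paths through hierarchy
# levels (Year > Quarter, Country > State), measures are plain numbers.
facts = [
  { time: [2011, 'Q1'], customer: ['USA', 'CA'], product: ['Food'],  unit_sales: 10, store_sales: 25.0 },
  { time: [2011, 'Q1'], customer: ['USA', 'CA'], product: ['Drink'], unit_sales: 4,  store_sales: 9.5 },
  { time: [2011, 'Q1'], customer: ['USA', 'OR'], product: ['Food'],  unit_sales: 7,  store_sales: 17.0 },
  { time: [2011, 'Q2'], customer: ['USA', 'CA'], product: ['Food'],  unit_sales: 3,  store_sales: 8.0 }
]

# Sum `measure` per top-level member of `row_dimension`, keeping only facts
# whose member path starts with the given prefix for every sliced dimension.
def aggregate(facts, measure:, row_dimension:, slicer: {})
  selected = facts.select do |fact|
    slicer.all? { |dimension, prefix| fact[dimension].first(prefix.size) == prefix }
  end
  selected.group_by { |fact| fact[row_dimension].first }
          .transform_values { |rows| rows.sum { |row| row[measure] } }
end

result = aggregate(facts, measure: :unit_sales, row_dimension: :product,
                   slicer: { time: [2011, 'Q1'], customer: ['USA', 'CA'] })
p result
```

The slicer plays the role of a WHERE clause, and the row dimension plays the role of GROUP BY; a real cube would do the same over precomputed or database-backed aggregates.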

Page 9: RailsWayCon: Multidimensional Data Analysis with JRuby

OLAP technologies

On-Line Analytical Processing

Page 10: RailsWayCon: Multidimensional Data Analysis with JRuby

Commercial Vendors

Oracle OLAP

Oracle Essbase

SAP BusinessObjects

Cognos

Analysis Services

Page 11: RailsWayCon: Multidimensional Data Analysis with JRuby
Page 12: RailsWayCon: Multidimensional Data Analysis with JRuby

MDX query language

SELECT {[Measures].[Unit Sales], [Measures].[Store Sales]} ON COLUMNS,
       [Product].children ON ROWS
FROM [Sales]
WHERE ([Time].[2011].[Q1], [Customers].[USA].[CA])

Get total units sold and sales amount in USA, California in Q1, 2011 by main product groups

Page 14: RailsWayCon: Multidimensional Data Analysis with JRuby

(R)OLAP schema

Dimensional model: cubes, dimensions (hierarchies & levels), measures, calculated measures

Relational model: fact tables, dimension tables joined by foreign keys

Mapping
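To make the idea of a mapping concrete, here is a rough sketch in plain Ruby. It is not Mondrian's SQL generator: the `mapping` hash layout and the `rollup_sql` helper are hypothetical, and only the table and column names are taken from the SQL example earlier in the talk. It shows how a dimension level and a measure could be translated into a join between the fact table and a dimension table.

```ruby
# Hypothetical mapping of dimensional names onto relational tables/columns.
mapping = {
  fact_table: 'sales',
  dimensions: {
    'Time' => {
      table: 'time_by_day', foreign_key: 'time_id', primary_key: 'time_id',
      levels: { 'Year' => 'the_year', 'Quarter' => 'quarter' }
    }
  },
  measures: { 'Unit Sales' => { column: 'unit_sales', aggregator: 'sum' } }
}

# Build a GROUP BY query that rolls one measure up to one dimension level.
def rollup_sql(mapping, dimension:, level:, measure:)
  dim = mapping[:dimensions].fetch(dimension)
  msr = mapping[:measures].fetch(measure)
  fact = mapping[:fact_table]
  level_column = "#{dim[:table]}.#{dim[:levels].fetch(level)}"
  [
    "SELECT #{level_column}, #{msr[:aggregator].upcase}(#{fact}.#{msr[:column]})",
    "FROM #{fact}",
    "JOIN #{dim[:table]} ON #{fact}.#{dim[:foreign_key]} = #{dim[:table]}.#{dim[:primary_key]}",
    "GROUP BY #{level_column}"
  ].join("\n")
end

sql = rollup_sql(mapping, dimension: 'Time', level: 'Year', measure: 'Unit Sales')
puts sql
```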

Page 15: RailsWayCon: Multidimensional Data Analysis with JRuby

OLAP schema definition

schema = Mondrian::OLAP::Schema.define do
  cube 'Sales' do
    table 'sales'
    dimension 'Gender', :foreign_key => 'customer_id' do
      hierarchy :has_all => true, :primary_key => 'customer_id' do
        table 'customer'
        level 'Gender', :column => 'gender', :unique_members => true
      end
    end
    dimension 'Time', :foreign_key => 'time_id' do
      hierarchy :has_all => false, :primary_key => 'time_id' do
        table 'time_by_day'
        level 'Year', :column => 'the_year', :type => 'Numeric', :unique_members => true
        level 'Quarter', :column => 'quarter', :unique_members => false
        level 'Month', :column => 'month_of_year', :type => 'Numeric', :unique_members => false
      end
    end
    measure 'Unit Sales', :column => 'unit_sales', :aggregator => 'sum'
    measure 'Store Sales', :column => 'store_sales', :aggregator => 'sum'
  end
end

Page 16: RailsWayCon: Multidimensional Data Analysis with JRuby

Query Builder in Ruby

olap.from('Sales').
  columns('[Measures].[Unit Sales]', '[Measures].[Store Sales]').
  rows('[Product].children').
  where('[Time].[2011].[Q1]', '[Customers].[USA].[CA]').
  execute

Get total units sold and sales amount in USA, California in Q1, 2011 by main product groups
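How a chained query like this can turn into MDX text is easy to sketch. The class below is a hypothetical, heavily simplified stand-in (`TinyMdxQuery` and its `to_mdx` method are invented here and are not the mondrian-olap implementation): each method stores its arguments and returns `self`, and the final step assembles the MDX string.

```ruby
# Hypothetical, simplified query builder; NOT the mondrian-olap internals.
class TinyMdxQuery
  def initialize(cube)
    @cube = cube
    @columns = []
    @rows = []
    @where = []
  end

  # Each setter stores its members and returns self so calls can be chained.
  def columns(*members)
    @columns = members
    self
  end

  def rows(*members)
    @rows = members
    self
  end

  def where(*members)
    @where = members
    self
  end

  # Assemble the MDX text; the WHERE tuple is omitted when no slicer is set.
  def to_mdx
    mdx = "SELECT #{set(@columns)} ON COLUMNS, #{set(@rows)} ON ROWS FROM [#{@cube}]"
    mdx += " WHERE (#{@where.join(', ')})" unless @where.empty?
    mdx
  end

  private

  # A single expression is used as is; several members are wrapped in {}.
  def set(members)
    members.size == 1 ? members.first : "{#{members.join(', ')}}"
  end
end

mdx = TinyMdxQuery.new('Sales').
  columns('[Measures].[Unit Sales]', '[Measures].[Store Sales]').
  rows('[Product].children').
  where('[Time].[2011].[Q1]', '[Customers].[USA].[CA]').
  to_mdx
puts mdx
```

Under these assumptions the builder yields the same statement as the hand-written MDX shown earlier, just on a single line.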

Page 17: RailsWayCon: Multidimensional Data Analysis with JRuby

Also more complex queries

olap.from('Sales').
  with_member('[Measures].[ProfitPct]').
    as('(Measures.[Store Sales] - Measures.[Store Cost]) / Measures.[Store Sales]',
       :format_string => 'Percent').
  columns('[Measures].[Store Sales]', '[Measures].[ProfitPct]').
  rows('[Product].children').
    crossjoin('[Customers].[Canada]', '[Customers].[USA]').
    top_count(50, '[Measures].[Store Sales]').
  where('[Time].[2011].[Q1]').
  execute

Get sales amount and profit % of top 50 products sold in USA and Canada during Q1, 2011
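What the calculated member and the top-count step compute can be shown on a few in-memory rows. This is only a sketch in plain Ruby: the product names and figures are made up, and the `top_count` helper is a hypothetical local function, not the builder method from the slide.

```ruby
# Made-up per-product totals standing in for aggregated cube cells.
products = [
  { name: 'Wine',   store_sales: 200.0, store_cost: 150.0 },
  { name: 'Bread',  store_sales: 80.0,  store_cost: 60.0 },
  { name: 'Cheese', store_sales: 120.0, store_cost: 60.0 }
]

# Calculated measure: (Store Sales - Store Cost) / Store Sales
with_profit = products.map do |product|
  profit_pct = (product[:store_sales] - product[:store_cost]) / product[:store_sales]
  product.merge(profit_pct: profit_pct)
end

# Keep the `count` rows with the largest value of `measure`, largest first.
def top_count(rows, count, measure)
  rows.max_by(count) { |row| row[measure] }
end

top = top_count(with_profit, 2, :store_sales)
top.each do |row|
  puts format('%-6s %8.2f %5.1f%%', row[:name], row[:store_sales], row[:profit_pct] * 100)
end
```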

Page 18: RailsWayCon: Multidimensional Data Analysis with JRuby

Demo

Page 19: RailsWayCon: Multidimensional Data Analysis with JRuby

Used in eazybi.com