RailsWayCon: Multidimensional Data Analysis with JRuby
Transcript of RailsWayCon: Multidimensional Data Analysis with JRuby
Multidimensional Data Analysis with JRuby
Raimonds Simanovskis
github.com/rsim
@rsim
Relational data model
SQL is good for detailed data queries
Get all sales transactions in USA, California
SELECT customers.fullname, products.product_name,
       sales.sales_date, sales.unit_sales, sales.store_sales
FROM sales
  LEFT JOIN products ON sales.product_id = products.id
  LEFT JOIN customers ON sales.customer_id = customers.id
WHERE customers.country = 'USA'
  AND customers.state_province = 'CA'
SQL becomes complex for analytical queries
SELECT product_class.product_family,
       SUM(sales.unit_sales) unit_sales_sum,
       SUM(sales.store_sales) store_sales_sum
FROM sales
  LEFT JOIN product ON sales.product_id = product.product_id
  LEFT JOIN product_class ON product.product_class_id = product_class.product_class_id
  LEFT JOIN time_by_day ON sales.time_id = time_by_day.time_id
  LEFT JOIN customer ON sales.customer_id = customer.customer_id
WHERE time_by_day.the_year = 2011
  AND time_by_day.quarter = 'Q1'
  AND customer.country = 'USA'
  AND customer.state_province = 'CA'
GROUP BY product_class.product_family
Get total sales in USA, California in Q1, 2011 by main product groups
If SQL is not good then we need
NoSQL!
Maybe write distributed map reduce function?
http://browsertoolkit.com/fault-tolerance.png
Multidimensional Data Model
Multidimensional cubes
Dimensions
Hierarchies and levels
Measures
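To make cubes, dimensions and measures concrete, here is a minimal pure-Ruby sketch that sums measures per member of one dimension while slicing on others. The fact rows, the FACTS constant and the aggregate helper are all made up for illustration; a real OLAP engine does far more (hierarchies, caching, SQL generation).

```ruby
# Hypothetical fact rows: each row carries dimension members and measure values.
FACTS = [
  { product_family: 'Food',  year: 2011, quarter: 'Q1', country: 'USA', state: 'CA', unit_sales: 3, store_sales: 7.5 },
  { product_family: 'Drink', year: 2011, quarter: 'Q1', country: 'USA', state: 'CA', unit_sales: 2, store_sales: 4.0 },
  { product_family: 'Food',  year: 2011, quarter: 'Q1', country: 'USA', state: 'CA', unit_sales: 1, store_sales: 2.5 },
  { product_family: 'Food',  year: 2011, quarter: 'Q2', country: 'USA', state: 'CA', unit_sales: 5, store_sales: 9.0 },
  { product_family: 'Food',  year: 2011, quarter: 'Q1', country: 'USA', state: 'WA', unit_sales: 4, store_sales: 8.0 }
].freeze

# Sum the given measures per member of one dimension (`by`),
# keeping only rows that match every dimension/member pair in `slice`.
def aggregate(facts, by:, measures:, slice: {})
  selected = facts.select { |row| slice.all? { |dim, member| row[dim] == member } }
  selected.group_by { |row| row[by] }.transform_values do |rows|
    measures.to_h { |m| [m, rows.sum { |row| row[m] }] }
  end
end

result = aggregate(
  FACTS,
  by: :product_family,
  measures: [:unit_sales, :store_sales],
  slice: { year: 2011, quarter: 'Q1', country: 'USA', state: 'CA' }
)
puts result.inspect
```

The slice plays the role of a WHERE condition on dimension members, and the grouping dimension decides which members appear as result rows.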
OLAP technologies
On-Line Analytical Processing
Commercial Vendors
Oracle OLAP
Oracle Essbase
SAP BusinessObjects
Cognos
Analysis Services
MDX query language
SELECT {[Measures].[Unit Sales], [Measures].[Store Sales]} ON COLUMNS,
       [Product].children ON ROWS
FROM [Sales]
WHERE ([Time].[2011].[Q1], [Customers].[USA].[CA])
Get total units sold and sales amount in USA, California in Q1, 2011 by main product groups
http://github.com/rsim/mondrian-olap
(R)OLAP schema
Dimensional model: cubes, dimensions (hierarchies & levels), measures, calculated measures
Relational model: fact tables, dimension tables joined by foreign keys
Mapping
OLAP schema definition
schema = Mondrian::OLAP::Schema.define do
  cube 'Sales' do
    table 'sales'
    dimension 'Gender', :foreign_key => 'customer_id' do
      hierarchy :has_all => true, :primary_key => 'customer_id' do
        table 'customer'
        level 'Gender', :column => 'gender', :unique_members => true
      end
    end
    dimension 'Time', :foreign_key => 'time_id' do
      hierarchy :has_all => false, :primary_key => 'time_id' do
        table 'time_by_day'
        level 'Year', :column => 'the_year', :type => 'Numeric', :unique_members => true
        level 'Quarter', :column => 'quarter', :unique_members => false
        level 'Month', :column => 'month_of_year', :type => 'Numeric', :unique_members => false
      end
    end
    measure 'Unit Sales', :column => 'unit_sales', :aggregator => 'sum'
    measure 'Store Sales', :column => 'store_sales', :aggregator => 'sum'
  end
end
Query Builder in Ruby
olap.from('Sales').
  columns('[Measures].[Unit Sales]', '[Measures].[Store Sales]').
  rows('[Product].children').
  where('[Time].[2011].[Q1]', '[Customers].[USA].[CA]').
  execute
Get total units sold and sales amount in USA, California in Q1, 2011 by main product groups
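To illustrate how a chainable builder like the one above can map onto MDX text, here is a minimal pure-Ruby sketch. MdxSketch is a hypothetical class written for this example and is not the mondrian-olap implementation; it only assembles a query string and never talks to an OLAP engine.

```ruby
# Hypothetical, simplified stand-in for a chainable MDX query builder.
class MdxSketch
  def initialize(cube)
    @cube = cube
    @axes = {}    # axis name => list of member/set expressions
    @slicer = []  # members for the WHERE tuple
  end

  # Each builder method records its arguments and returns self,
  # which is what makes the calls chainable.
  def columns(*members)
    @axes['COLUMNS'] = members
    self
  end

  def rows(*members)
    @axes['ROWS'] = members
    self
  end

  def where(*members)
    @slicer = members
    self
  end

  def to_mdx
    axes = @axes.map { |name, members| "#{format_set(members)} ON #{name}" }
    mdx = "SELECT #{axes.join(', ')} FROM [#{@cube}]"
    mdx += " WHERE (#{@slicer.join(', ')})" unless @slicer.empty?
    mdx
  end

  private

  # A single set expression such as [Product].children is passed through as is;
  # several members are wrapped in braces to form an MDX set.
  def format_set(members)
    members.size == 1 ? members.first : "{#{members.join(', ')}}"
  end
end

mdx = MdxSketch.new('Sales')
               .columns('[Measures].[Unit Sales]', '[Measures].[Store Sales]')
               .rows('[Product].children')
               .where('[Time].[2011].[Q1]', '[Customers].[USA].[CA]')
               .to_mdx
puts mdx
```

The generated string has the same shape as the hand-written MDX query shown earlier, which is the point of the builder: the Ruby calls mirror the MDX clauses one to one.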
Also more complex queries
olap.from('Sales').
  with_member('[Measures].[ProfitPct]').
    as('(Measures.[Store Sales] - Measures.[Store Cost]) / Measures.[Store Sales]',
       :format_string => 'Percent').
  columns('[Measures].[Store Sales]', '[Measures].[ProfitPct]').
  rows('[Product].children').
    crossjoin('[Customers].[Canada]', '[Customers].[USA]').
    top_count(50, '[Measures].[Store Sales]').
  where('[Time].[2011].[Q1]').
  execute
Get sales amount and profit % of top 50 products sold in USA and Canada during Q1, 2011
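The two ideas in this query, a calculated measure and a top-count, can be sketched in plain Ruby as follows. PRODUCT_TOTALS and both helper methods are hypothetical and use invented numbers; they only illustrate the arithmetic, not how Mondrian evaluates the query.

```ruby
# Hypothetical per-product totals, assumed to be already aggregated for the chosen period.
PRODUCT_TOTALS = [
  { product: 'A', store_sales: 200.0, store_cost: 100.0 },
  { product: 'B', store_sales: 100.0, store_cost: 50.0 },
  { product: 'C', store_sales: 400.0, store_cost: 300.0 }
].freeze

# Calculated measure: derived from other measures rather than stored in a column.
def profit_pct(row)
  (row[:store_sales] - row[:store_cost]) / row[:store_sales]
end

# Top-count: keep the n rows with the largest value of the given measure,
# ordered from largest to smallest.
def top_count(rows, n, measure)
  rows.max_by(n) { |row| row[measure] }
end

top = top_count(PRODUCT_TOTALS, 2, :store_sales).map do |row|
  { product: row[:product], store_sales: row[:store_sales], profit_pct: profit_pct(row) }
end
puts top.inspect
```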
Demo
Used in eazybi.com