Why Atrion

Page 1

Challenges to designing financial warehouses, lessons learnt

Steve Simon, MVP SQL Server BI, http://www.infogoldusa.com

PASS Data Architecture Virtual Chapter

Page 2

Steve Simon is a SQL Server MVP and a Senior Business Intelligence Development Engineer with Atrion Networking Corporation, Providence, RI, USA. He has been involved with database design and analysis for over 25 years. Steve has presented numerous papers at PASS Summits over the years, including PASS Europe, as well as many sessions at SQL Saturday events. He is the chairperson of the Oracle/SQL Server virtual chapter and a PASS regional mentor.

Steve Simon

Page 3
Page 4

Cold facts

Warehouse design will change over time.

Two practical examples are used in this presentation.

Page 5

The FDR (Financial Data Warehouse)

or how things can run amuck!

Page 6

The Michael Jackson Design Technique

Page 7

So what is wrong with all of this?

Page 8

mf_job: job_name, job_category, job_status, job_owner, monitor_flag, ffc_enabled, last_start_date, last_end_date, avg_start_time, avg_end_time, avg_cpu_time, average_job_cost, total_job_cost, schd_frequency, job_description

refclient: Fund, Client_ID, CLIENT_LONG_NM

mf_oncall_list: job_name, frequency, cics_critical_ind, bus_critical_ind, tso_caller_id_1, tso_caller_id_2, tso_manager_id, dr_critical_ind, job_description

mf_client_job: client_short_name, job_name, gen_comment

tmp_all_successor_jobs: extract_job, successor_job

mf_job_successor: application_id, job_name, mf_sys_name, succ_appl_id, succ_job_name, succ_sys_name

mf_job_dataset_fund *: dataset_name, fund, job_name

Page 9

Relational Spaghetti

SELECT DISTINCT t.ip_address, f.fund
FROM mf_job_dataset_fund f
INNER JOIN mf_job j ON f.job_name = j.job_name
INNER JOIN mf_oncall_list o ON j.job_name = o.job_name
LEFT JOIN mf_client_job c ON c.job_name = j.job_name
LEFT JOIN refclient RC ON f.fund = RC.Fund
LEFT JOIN tmp_all_successor_jobs s ON s.extract_job = j.job_name
LEFT JOIN mf_transmission t ON s.successor_job = t.job_name AND t.from_dataset NOT LIKE 'KKKK'
LEFT JOIN mf_job j2 ON t.job_name = j2.job_name
WHERE f.fund IN ('AAAA') AND j.job_name = f.job_name
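
A minimal sketch of why this query counts as spaghetti, using Python's built-in sqlite3 module and made-up toy rows (the mf_transmission columns and every data value are assumptions; the slides do not show them). Because the query returns only t.ip_address and f.fund under DISTINCT, the LEFT JOINs to mf_client_job, refclient and the second mf_job alias (j2) can add duplicate rows but can never remove one, so dropping them leaves the result unchanged. The INNER JOIN to mf_oncall_list does filter rows and has to stay.

```python
import sqlite3

# Hypothetical, trimmed schema: only the columns the query touches.
# mf_transmission is not shown on the slides; its columns are assumed.
SCHEMA = """
CREATE TABLE mf_job_dataset_fund (dataset_name TEXT, fund TEXT, job_name TEXT);
CREATE TABLE mf_job (job_name TEXT);
CREATE TABLE mf_oncall_list (job_name TEXT);
CREATE TABLE mf_client_job (client_short_name TEXT, job_name TEXT);
CREATE TABLE refclient (Fund TEXT, Client_ID TEXT, CLIENT_LONG_NM TEXT);
CREATE TABLE tmp_all_successor_jobs (extract_job TEXT, successor_job TEXT);
CREATE TABLE mf_transmission (job_name TEXT, from_dataset TEXT, ip_address TEXT);
"""

# Made-up rows; the duplicate mf_client_job/refclient rows fan out the join.
ROWS = {
    "mf_job_dataset_fund": [("DS1", "AAAA", "JOB1"), ("DS2", "AAAA", "JOB2"), ("DS3", "BBBB", "JOB1")],
    "mf_job": [("JOB1",), ("JOB2",), ("SUCC1",), ("SUCC2",)],
    "mf_oncall_list": [("JOB1",)],  # JOB2 is not on call, so the INNER JOIN drops it
    "mf_client_job": [("C1", "JOB1"), ("C2", "JOB1")],
    "refclient": [("AAAA", "1", "Client A"), ("AAAA", "2", "Client A2")],
    "tmp_all_successor_jobs": [("JOB1", "SUCC1"), ("JOB1", "SUCC2"), ("JOB2", "SUCC1")],
    "mf_transmission": [("SUCC1", "X.DATA", "10.0.0.1"), ("SUCC2", "KKKK", "10.0.0.2")],
}

SPAGHETTI = """
SELECT DISTINCT t.ip_address, f.fund
FROM mf_job_dataset_fund f
INNER JOIN mf_job j ON f.job_name = j.job_name
INNER JOIN mf_oncall_list o ON j.job_name = o.job_name
LEFT JOIN mf_client_job c ON c.job_name = j.job_name
LEFT JOIN refclient RC ON f.fund = RC.Fund
LEFT JOIN tmp_all_successor_jobs s ON s.extract_job = j.job_name
LEFT JOIN mf_transmission t ON s.successor_job = t.job_name AND t.from_dataset NOT LIKE 'KKKK'
LEFT JOIN mf_job j2 ON t.job_name = j2.job_name
WHERE f.fund IN ('AAAA') AND j.job_name = f.job_name
"""

# Unused LEFT JOINs and the redundant WHERE predicate removed.
PRUNED = """
SELECT DISTINCT t.ip_address, f.fund
FROM mf_job_dataset_fund f
INNER JOIN mf_job j ON f.job_name = j.job_name
INNER JOIN mf_oncall_list o ON j.job_name = o.job_name
LEFT JOIN tmp_all_successor_jobs s ON s.extract_job = j.job_name
LEFT JOIN mf_transmission t ON s.successor_job = t.job_name AND t.from_dataset NOT LIKE 'KKKK'
WHERE f.fund = 'AAAA'
"""


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    for table, rows in ROWS.items():
        placeholders = ",".join("?" * len(rows[0]))
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return conn


def run(conn: sqlite3.Connection, sql: str) -> set:
    """Return the result as a set, since DISTINCT output has no guaranteed order."""
    return set(conn.execute(sql).fetchall())


if __name__ == "__main__":
    conn = build_db()
    print(run(conn, SPAGHETTI) == run(conn, PRUNED))
```

The same reasoning does not hold for SELECT lists that read c, RC or j2 columns; the pruning is only safe because none of those aliases reach the output.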

Page 10

inventory file: client_long_nm, client_id, receiving_party, standard_or_custom, extract_reformatter, [Direction Flow], [SSC / Extract File Description], [SSC Transmission Job], [Transmission job desciption], frequency, extract_time, [From Transmission file], [To:Output file], transmission_type, ip_address, fund

1,500 queries and extracts

Page 11

The challenges were:

Users expect reports to be rendered in under 30 seconds.

Users re-submit reports when no results come back, which leads to middleware failures and tie-ups.

So what is the solution?

Page 12

Back to Michael Jackson

Tables based upon subject areas AND report type (denormalized).

Well indexed.

Easy to populate with what is required.
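
One way to picture a report-shaped table, sketched with Python's sqlite3. The table names positions, ref_asset and rpt_position_by_fund, the index name and all rows are hypothetical illustrations, not the FDR schema: the normalized tables are flattened once with CREATE TABLE ... AS SELECT, the result is indexed on the columns the report filters by, and the report then reads a single table with no joins.

```python
import sqlite3


def build_report_table(conn: sqlite3.Connection) -> None:
    """Flatten the normalized tables into one report-shaped (denormalized) table."""
    conn.executescript("""
        DROP TABLE IF EXISTS rpt_position_by_fund;
        CREATE TABLE rpt_position_by_fund AS
        SELECT p.fund_id, p.calen_dt, a.asset_id, a.asset_name, a.asset_class, p.quantity
        FROM positions p
        JOIN ref_asset a ON a.asset_id = p.asset_id;
        -- Index on the report's filter columns so each report reads a narrow range.
        CREATE INDEX ix_rpt_fund_dt ON rpt_position_by_fund (fund_id, calen_dt);
    """)


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ref_asset (asset_id TEXT PRIMARY KEY, asset_name TEXT, asset_class TEXT);
        CREATE TABLE positions (fund_id TEXT, asset_id TEXT, calen_dt TEXT, quantity REAL);
    """)
    conn.executemany("INSERT INTO ref_asset VALUES (?, ?, ?)",
                     [("A1", "Bond A", "Fixed income"), ("A2", "Stock B", "Equity")])
    conn.executemany("INSERT INTO positions VALUES (?, ?, ?, ?)",
                     [("F1", "A1", "2008-01-31", 100.0),
                      ("F1", "A2", "2008-01-31", 50.0),
                      ("F2", "A1", "2008-01-31", 75.0)])
    build_report_table(conn)
    # The report query now hits one table and no joins.
    return conn.execute(
        "SELECT asset_name, quantity FROM rpt_position_by_fund "
        "WHERE fund_id = ? AND calen_dt = ? ORDER BY asset_name",
        ("F1", "2008-01-31"),
    ).fetchall()


if __name__ == "__main__":
    print(demo())
```

The trade-off is the usual one for denormalization: the report gets cheaper, and the load process takes on the work of keeping the flattened copy complete and current.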

Page 13

How do we do this?

Data Access Layers (DAL)

Page 14

Before we start

A DAL function ‘joins’ 2 or more tables and returns a table result set containing a myriad of data fields.

Page 15

A ‘DAL’ is like a bowling ball

Parameters in

Results out

Process the ‘goodies’
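
The bowling-ball idea can be sketched in Python with sqlite3 as a function that owns one tuned join: callers pass parameters in, get a result set out, and process the "goodies" themselves instead of writing their own joins. This is only an analogue under assumptions: the deck implements DALs as SQL Server user-defined functions, while the gl_entry and ref_account tables, the gl_dal name and the sample rows here are hypothetical.

```python
import sqlite3
from collections import defaultdict


def gl_dal(conn: sqlite3.Connection, fund_id: str, start_dt: str, end_dt: str) -> list[tuple]:
    """Hypothetical GL 'DAL': parameters in, one fixed index-friendly join, rows out."""
    return conn.execute(
        """
        SELECT g.fund_id, g.post_dt, a.account_id, a.account_name, g.amount
        FROM gl_entry g
        JOIN ref_account a ON a.account_id = g.account_id
        WHERE g.fund_id = ? AND g.post_dt BETWEEN ? AND ?
        """,
        (fund_id, start_dt, end_dt),
    ).fetchall()


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ref_account (account_id TEXT PRIMARY KEY, account_name TEXT);
        CREATE TABLE gl_entry (fund_id TEXT, account_id TEXT, post_dt TEXT, amount REAL);
        CREATE INDEX ix_gl_fund_dt ON gl_entry (fund_id, post_dt);
    """)
    conn.executemany("INSERT INTO ref_account VALUES (?, ?)",
                     [("1000", "Cash"), ("2000", "Payables")])
    conn.executemany("INSERT INTO gl_entry VALUES (?, ?, ?, ?)", [
        ("F1", "1000", "2008-01-02", 500.0),
        ("F1", "1000", "2008-01-15", -200.0),
        ("F1", "2000", "2008-01-20", 300.0),
        ("F1", "1000", "2009-03-01", 999.0),  # outside the date range
        ("F2", "1000", "2008-01-05", 50.0),   # different fund
    ])
    return conn


def balances_by_account(rows: list[tuple]) -> dict[str, float]:
    """Caller side: 'process the goodies' returned by the DAL."""
    totals: dict[str, float] = defaultdict(float)
    for _fund, _dt, _acct_id, acct_name, amount in rows:
        totals[acct_name] += amount
    return dict(totals)


if __name__ == "__main__":
    rows = gl_dal(setup(), "F1", "2008-01-01", "2008-12-31")
    print(balances_by_account(rows))
```

Keeping the join inside one function means the execution plan only has to be tuned once, which is the point the following slides make about users writing their own queries.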

Page 16

We have a data warehouse.

Users accessed data via views (prior to DAL).

Users created their own SQL to extract their data.

Queries were not structured in an optimal manner.

Page 17

Joins that you would never expect

[Diagram: GLDAL with GL, ObscureGL View1, ObscureGL View2 and ObscureGL View3; POSDAL with POS, ObscureREF View1, ObscureREF View2 and ObscureREF View3]

Page 18

In short…

Joins were being made ‘willy-nilly’.

CPU time went through the ceiling.

Few understood execution plans.

The queries that were sent to us were optimized.

Page 19

In short:

Across 10 sites we found a lot of commonality.

We looked for ways to pull data with the most efficient execution plans across the board.

Millions of records in most tables.

M.J. to the rescue.

Page 20

In short:

Pull from tables, BUT with an optimal execution plan.

Take advantage of the TABLE indices.

Avoid pulling one lone field from a view.

Hence the ‘Birth of the DAL’.
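
A small demonstration of "avoid pulling one lone field from a view", using SQLite's EXPLAIN QUERY PLAN as a stand-in for a SQL Server execution plan (all table, view and index names are hypothetical). A query that needs only fund_id from a wide view still pays for every inner join inside the view, while the same column read straight from the table is answered from that table's index alone.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Return the 'detail' text (last column) of each EXPLAIN QUERY PLAN row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE positions (fund_id TEXT, asset_id TEXT, client_id TEXT, quantity REAL);
        CREATE TABLE ref_asset (asset_id TEXT PRIMARY KEY, asset_name TEXT);
        CREATE TABLE ref_client (client_id TEXT PRIMARY KEY, client_long_nm TEXT);
        CREATE INDEX ix_positions_fund ON positions (fund_id);
        -- A 'wide' view in the spirit of the 101-column example, cut down to two joins.
        CREATE VIEW v_positions_wide AS
        SELECT positions.fund_id, positions.quantity,
               ref_asset.asset_name, ref_client.client_long_nm
        FROM positions
        JOIN ref_asset ON ref_asset.asset_id = positions.asset_id
        JOIN ref_client ON ref_client.client_id = positions.client_id;
    """)
    return conn


if __name__ == "__main__":
    conn = setup()
    print(plan(conn, "SELECT fund_id FROM v_positions_wide WHERE fund_id = 'F1'"))
    print(plan(conn, "SELECT fund_id FROM positions WHERE fund_id = 'F1'"))
```

Inner joins cannot simply be skipped by the optimizer, because they may filter rows; that is why the lone field from the view still drags the reference tables into the plan.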

Page 21

Example of view hell

Column Name          ID    Data Type     Null?
FUND_ID              1     VARCHAR(8)    N
AGNT_BANK_FINS_NUM   2     VARCHAR(5)    Y
ASOF_CLIENT_SW       3     VARCHAR(1)    Y
BASE_CNTRY_CD        4     VARCHAR(2)    Y
BNFCY_TAX_ID         5     VARCHAR(9)    Y
BOND_SRC_CD          6     VARCHAR(2)    Y
CASH_COST_MTHD_IND   7     VARCHAR(1)    Y
CASH_SELL_TRANS_CD   8     VARCHAR(2)    Y
CLIENT_ACCT_NUM      9     VARCHAR(12)   Y
…
CLIENT_FUND_NUM      100   VARCHAR(4)    Y
CLIENT_ID            101   VARCHAR(8)    Y

Page 22

Our plan of action

Page 23

Architecture changes

Page 24

Architecture prior to DAL [diagram: DATABASE, TABLE, TABLE, VIEW, VIEW]

Page 25

Architecture with DAL [diagram: DATABASE, TABLE, TABLE, DAL]

Page 26

Boils down to efficient use of TABLE indices

Page 27

DAL Coverage

Positions

General Ledger

Transactions

Lot-level data

Page 28

Sample user-defined function

USE DAL
GO
SELECT fund_id, asset_id, calen_dt
FROM Get_Pos_Sum('m1te|fdr1|pat2', '203900105|IEP', '2006-01-01', '2008-12-31')
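
The sample call passes lists as pipe-delimited strings such as 'm1te|fdr1|pat2'. The slides do not show how Get_Pos_Sum unpacks them; as an assumption, the first list holds fund IDs and the second asset IDs. The Python/sqlite3 sketch below follows that assumed convention: split each list, bind every item as its own parameter, and filter on the date range. The positions table and the get_pos_sum and split_pipe_list names are hypothetical.

```python
import sqlite3


def split_pipe_list(value: str) -> list[str]:
    """Split 'm1te|fdr1|pat2' into ['m1te', 'fdr1', 'pat2'], ignoring empty items."""
    return [item.strip() for item in value.split("|") if item.strip()]


def get_pos_sum(conn: sqlite3.Connection, fund_list: str, asset_list: str,
                start_dt: str, end_dt: str) -> list[tuple]:
    """Hypothetical Python analogue of Get_Pos_Sum(funds, assets, start, end)."""
    funds, assets = split_pipe_list(fund_list), split_pipe_list(asset_list)
    if not funds or not assets:
        raise ValueError("at least one fund and one asset are required")
    sql = (
        "SELECT fund_id, asset_id, calen_dt FROM positions "
        f"WHERE fund_id IN ({','.join('?' * len(funds))}) "
        f"AND asset_id IN ({','.join('?' * len(assets))}) "
        "AND calen_dt BETWEEN ? AND ? "
        "ORDER BY fund_id, asset_id, calen_dt"
    )
    # Each list item is bound as a parameter, never concatenated into the SQL text.
    return conn.execute(sql, [*funds, *assets, start_dt, end_dt]).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE positions (fund_id TEXT, asset_id TEXT, calen_dt TEXT)")
    conn.executemany("INSERT INTO positions VALUES (?, ?, ?)",
                     [("m1te", "IEP", "2007-06-30"), ("fdr1", "203900105", "2006-03-31")])
    print(get_pos_sum(conn, "m1te|fdr1|pat2", "203900105|IEP", "2006-01-01", "2008-12-31"))
```

Dates are stored as ISO yyyy-mm-dd strings here, so BETWEEN compares them correctly as text.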

Page 29

Demo 1

Page 30

The Michael Jackson Design Technique

Page 31

Meanwhile, back in the grocery business

Page 32

Never less than 900 million rows

Partitioned & event data

2.3 billion
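
The slides do not say how the event data is partitioned. Assuming monthly range partitions on an integer yyyymmdd DateKey (the key format used in the queries on the following slides), this Python sketch lists which partitions a DateKey range can touch. That is the idea behind partition elimination: only those partitions need to be read, however many rows the others hold. The monthly scheme and the function names are hypothetical.

```python
def month_partition(date_key: int) -> int:
    """Map a yyyymmdd DateKey (e.g. 20120619) to its assumed yyyymm partition."""
    return date_key // 100


def partitions_touched(start_key: int, end_key: int) -> list[int]:
    """List every monthly partition that DateKey BETWEEN start_key AND end_key can read."""
    first, last = month_partition(start_key), month_partition(end_key)
    year, month = divmod(first, 100)
    touched = []
    while year * 100 + month <= last:
        touched.append(year * 100 + month)
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
    return touched


if __name__ == "__main__":
    # The date range used by the FactDailySales queries later in the deck.
    print(partitions_touched(20120619, 20120715))  # [201206, 201207]
```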

Page 33
Page 34

GUIDs ain’t so great!

Page 35

Session Data
CustomerID (int)   Advert GUID     Customer GUID
1                  NULL            NULL
2                  1KWW-9POIU-R2   NULL
3                  NULL            NULL

Warehouse Data
CustomerKey   Advert GUID
2             1KWW-9POIU-R2

End Client Data
Customer GUID ($$$$ required for the report)
12345
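
The tables suggest that the GUIDs needed to reach the end-client data are mostly NULL in the session data. One consequence, shown with Python's sqlite3 and the values from the slide (table and column names are simplified and hypothetical): an equality join never matches a NULL, because NULL = NULL is not true in SQL, so only one of the three sessions can be linked to a warehouse CustomerKey.

```python
import sqlite3


def linked_sessions() -> list[tuple]:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE session_data (CustomerID INTEGER, AdvertGUID TEXT, CustomerGUID TEXT);
        CREATE TABLE warehouse_data (CustomerKey INTEGER, AdvertGUID TEXT);
    """)
    # Values copied from the slide.
    conn.executemany("INSERT INTO session_data VALUES (?, ?, ?)",
                     [(1, None, None), (2, "1KWW-9POIU-R2", None), (3, None, None)])
    conn.execute("INSERT INTO warehouse_data VALUES (2, '1KWW-9POIU-R2')")
    # Rows whose AdvertGUID is NULL never satisfy the equality predicate.
    return conn.execute("""
        SELECT s.CustomerID, w.CustomerKey
        FROM session_data s
        JOIN warehouse_data w ON w.AdvertGUID = s.AdvertGUID
        ORDER BY s.CustomerID
    """).fetchall()


if __name__ == "__main__":
    print(linked_sessions())
```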

Page 36

…even tried CTEs

Page 37

;WITH customerKeys AS (
    SELECT customerKey, customerID
    FROM [DataWarehouse].[dbo].[CustomerHelper_MWG] ch
    JOIN AcmePath.dbo.tempAcmeCoupontCustomers_1plusSessions t
        ON t.fkCustomerID = ch.CustomerID
)
SELECT SUM(sales) AS sales, basketID, k.CustomerID
FROM [DataWarehouse].[dbo].[FactDailySales] fds
JOIN customerKeys k ON fds.customerKey = k.customerKey
WHERE DateKey BETWEEN 20120619 AND 20120715
GROUP BY basketID, k.CustomerID

3 hr 21 minutes

Page 38

Indices and the super warehouse

Page 39
Page 40

;WITH customerKeys AS (
    SELECT customerKey, customerID
    FROM [DataWarehouse].[dbo].[CustomerHelper_MWG] ch
    JOIN AcmePath.dbo.tempAcmeCoupontCustomers_1plusSessions t
        ON t.fkCustomerID = ch.CustomerID
)
SELECT SUM(sales) AS sales, basketID, k.CustomerID
FROM [SuperWarehouse].[dbo].[FactDailySales] fds
JOIN customerKeys k ON fds.customerKey = k.customerKey
WHERE DateKey BETWEEN 20120619 AND 20120715
GROUP BY basketID, k.CustomerID

Page 41
Page 42

Metrics avoid insanity

Page 43

Monitoring performance issues using Reporting Services

Page 44
Page 45
Page 46

Green text box

Page 47

Red text box
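
The slides show green and red text boxes but not the rule behind them. One hedged reading, sketched in Python: total each execution's time from the Reporting Services execution log (the ExecutionLog3 view in the ReportServer database exposes ItemPath plus TimeDataRetrieval, TimeProcessing and TimeRendering in milliseconds) and flag anything over the users' 30-second expectation as red. The threshold, the report paths and the sample timings are assumptions.

```python
from dataclasses import dataclass

THRESHOLD_MS = 30_000  # assumed: the users' 30-second expectation


@dataclass
class ExecutionRow:
    """Subset of the columns exposed by ReportServer's ExecutionLog3 view."""
    ItemPath: str
    TimeDataRetrieval: int  # milliseconds
    TimeProcessing: int     # milliseconds
    TimeRendering: int      # milliseconds

    @property
    def total_ms(self) -> int:
        return self.TimeDataRetrieval + self.TimeProcessing + self.TimeRendering


def text_box_color(row: ExecutionRow, threshold_ms: int = THRESHOLD_MS) -> str:
    """Return the background colour a monitoring report might use for this run."""
    return "Red" if row.total_ms > threshold_ms else "Green"


if __name__ == "__main__":
    runs = [
        ExecutionRow("/Finance/PositionSummary", 4_000, 1_500, 500),  # hypothetical
        ExecutionRow("/Finance/GLDetail", 28_000, 3_000, 1_200),       # hypothetical
    ]
    for run in runs:
        print(run.ItemPath, run.total_ms, text_box_color(run))
```

Inside the report itself, the same rule would typically be a BackgroundColor expression such as =IIF(Fields!TotalMs.Value > 30000, "Red", "Green"), where TotalMs is a hypothetical calculated field.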

Page 48
Page 49

Queries with aggregations

Page 50

SELECT fc.CustomerID, fc.OrderID, fc.DateKey,
       pr.Level1CategoryID, pr.Level2CategoryID, pr.Level3CategoryID, pr.Level4CategoryID,
       SUM(fc.Dollars) AS Dollars,
       SUM(fc.Units) AS Units,
       SUM(fc.TotalWeight) AS [Weight],
       SUM(fc.Units + (CASE fc.TotalWeight WHEN 0 THEN 0 ELSE 1 END)) AS TotalUnits
INTO rpt.Acme_Order
FROM dwh.SalesOrderDetail fc
INNER JOIN dwh.DimProduct pr ON fc.ProductKey = pr.ProductKey
GROUP BY fc.CustomerID, fc.OrderID, fc.DateKey,
         pr.Level1CategoryID, pr.Level2CategoryID, pr.Level3CategoryID, pr.Level4CategoryID
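
The TotalUnits expression adds 1 for every row with a non-zero TotalWeight on top of that row's Units. A quick check of the arithmetic with Python's sqlite3 and three made-up rows (only the two columns involved are modelled; the rows are hypothetical):

```python
import sqlite3


def total_units(rows: list[tuple[float, float]]) -> tuple:
    """Evaluate SUM(Units), SUM(TotalWeight) and the slide's TotalUnits expression."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE SalesOrderDetail (Units REAL, TotalWeight REAL)")
    conn.executemany("INSERT INTO SalesOrderDetail VALUES (?, ?)", rows)
    return conn.execute("""
        SELECT SUM(Units),
               SUM(TotalWeight),
               SUM(Units + (CASE TotalWeight WHEN 0 THEN 0 ELSE 1 END))
        FROM SalesOrderDetail
    """).fetchone()


if __name__ == "__main__":
    # Two counted items, one weighed item (1.5 weight units), one more counted item.
    print(total_units([(2, 0), (0, 1.5), (1, 0)]))  # (3.0, 1.5, 4.0)
```

The weighed row contributes 1 to TotalUnits and nothing to Units, which is what the CASE expression is for.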

Page 51

3:41:00 to complete

Page 52

SSIS is an answer

Page 53
Page 54

DMVs as a monitoring tool
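
As a hedged example of a DMV used for monitoring: sys.dm_exec_query_stats reports cumulative total_worker_time (CPU time in microseconds) and execution_count per cached plan, so average CPU per execution is one division. The T-SQL below is kept as a string for reference and is not executed; running it needs a SQL Server connection, which is not shown. The Python ranking runs on hypothetical sample rows so the sketch works without a server.

```python
from typing import NamedTuple

# T-SQL for SQL Server (reference only; not executed in this sketch).
DMV_QUERY = """
SELECT TOP (10)
       qs.total_worker_time / qs.execution_count AS avg_cpu_us,
       qs.execution_count,
       SUBSTRING(st.text, 1, 200) AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY avg_cpu_us DESC;
"""


class QueryStat(NamedTuple):
    query_text: str
    total_worker_time: int  # cumulative CPU time, microseconds
    execution_count: int


def top_by_avg_cpu(stats: list[QueryStat], n: int = 10) -> list[tuple[str, int]]:
    """Rank cached queries by average CPU per execution, highest first."""
    usable = [s for s in stats if s.execution_count > 0]
    ranked = sorted(usable, key=lambda s: s.total_worker_time / s.execution_count, reverse=True)
    return [(s.query_text, s.total_worker_time // s.execution_count) for s in ranked[:n]]


if __name__ == "__main__":
    sample = [  # hypothetical rows shaped like the DMV output
        QueryStat("SELECT ... FactDailySales", 9_000_000, 3),
        QueryStat("SELECT ... DimProduct", 500_000, 100),
        QueryStat("SELECT ... Get_Pos_Sum", 4_000_000, 1),
    ]
    print(top_by_avg_cpu(sample, n=2))
```

Ranking by average rather than total CPU surfaces expensive one-off report queries; ranking by total_worker_time instead would surface cheap queries that run very often.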

Page 55
Page 56

Demo 2

Page 57

The take-aways

DW table structure should mirror reporting patterns.

Data must be cleansed and complete across reporting areas.

DALs may be a solution to your problem.

Page 58

The take-aways

DMVs.

Reporting Services is a great tool to ‘show’ problematic areas.

Finally, revisit your overall architecture regularly.

Page 59

Which at the end of the day

Resulting in a better understanding of the

Page 60

Steve Simon, http://www.infogoldusa.com

Challenges to designing financial warehouses.

PASS Data Architecture Virtual Chapter