D Maeda Bi Portfolio
In the Beginning …
• “Put all your eggs in one basket, and … watch the basket.”
Mark Twain
• “Data is only valuable if it can be accessed in a timely fashion.”
An IMS/DC Axiom
Table of Contents
• An Introduction
• A Problem Sampler
– Diagnostician at Play
– A Little Dirty Data
– A SQL Query
• SSIS and ETL Options
– SSIS and Data Management
• BIDS, SSAS, and MDX
– New Tools, Growing Arsenal
• At Your Service …
David Maeda: An Introduction
• Completing an intensive 10-week course on Microsoft Business Intelligence technologies: SQL Server, T-SQL, SSIS, SSAS, SSRS, and the Visual Studio interfaces.
• Broad background in IT including expertise in database and transaction management systems.
• Experience includes leadership and project management positions.
• An accomplished diagnostician and software engineer.
Diagnostician At Play
• Earlier this year, I got a good deal on a nice fly reel intended for 9 and 10 weight lines. While using the reel for striped bass on the Roanoke River several weeks later, I noticed that the drag did not tighten down to a point where it was useful.
• An exchange of emails with the US distributor got me a new one-way clutch bearing, but it did not fix the issue.
• Examining the parts diagram for the reel, I decided to add a 7-cent wave lock washer to the drag assembly. Tested the reel on the Roanoke. Problem resolved.
• Notified the distributor. After an evaluation, the fix was adopted by the manufacturer several days later.
A Little Dirty Data Problem
• In dealing with a national organization, membership information was found to have the following issues:
– 30% to 60% of the email addresses were bad
– 10% of the regular mail addresses were bad
– Inconsistent data formats in downloaded CSV files
– Multiple entries per member
• The Problem: How to work around the “questionable” data and maintain effective membership communications with the following criteria:
– Minimize expenses
– Needs less than 4 hours per week, on average, to manage
A Little Dirty Data Problem
• The Solution:
o Design a database to allow downloads to update existing data without affecting “local” data.
o The Members table is what gets downloaded.
o The MemberExtension table is the repository for “local” data.
o Manage both tables via a web-based user interface (UI).
o UI is implemented with PHP and JavaScript.
o Automate as much as possible.
A Little Dirty Data Problem
• Implementation:
– A Nasty Surprise: The CSV data as downloaded would not import cleanly into MySQL, because MySQL LOAD DATA INFILE processing requires certain characters to be escaped.
• A short Java program was written to transform the downloaded CSV file into the necessary format prior to importing it into MySQL.
– Any downloaded data is considered “questionable”.
• MySQL LOAD DATA INFILE processing overlays existing records.
• Restrict downloaded updates to only affect the Members table.
– The Members and MemberExtension tables are synchronized as part of the update process invoked from the UI.
• Every Members entry has a corresponding MemberExtension entry.
• A new MemberExtension entry is created if necessary and initialized with date and email information, if present.
• Existing MemberExtension entries are not touched.
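The synchronization rules above can be sketched in a few lines of Java. This is only an illustration, not the PHP code behind the UI: the two tables are modeled as in-memory maps keyed by a member id, and the names `MemberSync`, `Member`, `MemberExtension`, and `synchronize` are hypothetical. It also assumes that the "date" stored in a new extension entry is the date of the synchronization run, which the slide does not state.

```java
import java.time.LocalDate;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory stand-ins for the Members and MemberExtension tables. */
class MemberSync {

    /** One downloaded Members row; email is null when the download has none. */
    static final class Member {
        final String email;
        Member(String email) { this.email = email; }
    }

    /** One locally maintained MemberExtension row. */
    static final class MemberExtension {
        final LocalDate created;   // assumption: date of the sync run
        final String email;        // may be null if the download had no email
        MemberExtension(LocalDate created, String email) {
            this.created = created;
            this.email = email;
        }
    }

    /**
     * Ensures every Members entry has a MemberExtension entry.
     * Existing extension entries are never modified.
     * Returns the number of extension entries created.
     */
    static int synchronize(Map<Integer, Member> members,
                           Map<Integer, MemberExtension> extensions,
                           LocalDate today) {
        int created = 0;
        for (Map.Entry<Integer, Member> entry : members.entrySet()) {
            Integer id = entry.getKey();
            if (!extensions.containsKey(id)) {
                extensions.put(id, new MemberExtension(today, entry.getValue().email));
                created++;
            }
        }
        return created;
    }

    public static void main(String[] args) {
        Map<Integer, Member> members = new LinkedHashMap<>();
        members.put(1, new Member("downloaded@example.org"));
        members.put(2, new Member(null));

        Map<Integer, MemberExtension> extensions = new LinkedHashMap<>();
        extensions.put(1, new MemberExtension(LocalDate.of(2009, 1, 15), "corrected@example.org"));

        int created = synchronize(members, extensions, LocalDate.now());
        System.out.println("Extension entries created: " + created);
        System.out.println("Member 1 local email kept: " + extensions.get(1).email);
    }
}
```

Because the loop only ever adds missing entries, a later download with bad email addresses cannot overwrite a locally corrected one.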
A Little Dirty Data Problem
o A Utilitarian UI
• Apache
• HTML Frames
• AJAX
• PHP
A Little Dirty Data Problem
• In Summary:
– We were able to circumvent most of the dirty data issues by isolating the “questionable” data.
– The MySQL RDBMS supports ad hoc SQL queries should the need arise to alter tables, etc.
– Expenses were minimized by:
• Using freely available components, i.e. Java, Apache 2.2, PHP 5, MySQL 5.2, and JavaScript.
• Using volunteer labor to write the ETL code.
– A download and update sequence takes less than 10 minutes.
– A typical request to update the email distribution takes less than 5 minutes.
– Managing the database and generating the necessary distribution lists via the provided UI typically takes less than 4 hours per week.
A SQL Query
• In a recent phone interview, I was asked:
– How would you construct an SQL query to find the second highest sales total?
• My answer was:
– Use a pair of nested queries. The inner query would ascertain the top 2 totals. The outer query would return the lower of the two totals.
• In T-SQL this looks something like the following (it may look somewhat different in other SQL dialects):
select top 1 orderid, (unitprice * quantity) as 'totalsale'
from [order details]
where (unitprice * quantity) in
(
    select top 2 (unitprice * quantity) as 'ordertotal'
    from [order details]
    group by (unitprice * quantity)
    order by ordertotal desc
)
order by totalsale asc
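The same two-step logic can be sketched outside the database. The Java below is an illustration only, with a hypothetical class and method name (`SecondHighest.secondHighest`) and a plain list of totals standing in for the `[order details]` rows: the first stream plays the role of the inner query (the top two distinct totals), and the second plays the role of the outer query (the lower of those two).

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

/** Hypothetical helper mirroring the nested-query approach on an in-memory list. */
class SecondHighest {

    static Optional<Double> secondHighest(List<Double> totals) {
        // "Inner query": the two highest distinct totals, highest first.
        List<Double> topTwo = totals.stream()
                .distinct()
                .sorted(Comparator.reverseOrder())
                .limit(2)
                .collect(Collectors.toList());

        // "Outer query": the lower of the totals kept above.
        return topTwo.stream().min(Comparator.naturalOrder());
    }

    public static void main(String[] args) {
        List<Double> totals = Arrays.asList(10.0, 40.0, 40.0, 25.0);
        System.out.println(secondHighest(totals));
    }
}
```

Like the query, this sketch falls back to the single highest total when only one distinct total exists, so a caller that needs a strict second place would have to check for that case separately.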
ETL Options and SSIS
o Not all CSV files are created equal. Neither are the ETL tools used to prepare and load them into a database. Compare:
o To the left is a more traditional approach (as used for the Dirty Data problem).
o To the right is an approach utilizing Microsoft’s SSIS facility.
o SSIS has Data Management applications beyond ETL.
package appCSV;
import java.io.*;
import java.util.StringTokenizer;
/**
* @author Dave Maeda
*
* Class to convert csv field form
*
* Invoke as: java appCSV.Convert
*
* Where: filename is the name of
* ext is the file extension.
*
* Output: A file named <filename>.
* Note: ext will default to "csv" if
*/
public class Convert
{
private static void usage()
{
System.out.println("\n");
System.out.println(" >> Usage:
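The slide shows only the opening lines of the converter, so its actual logic is not visible here. As a rough sketch of this kind of pre-processing, and not the author's code, the example below assumes the troublesome character is the backslash, which MySQL LOAD DATA treats as its escape character by default. The class name `CsvEscapeSketch` and its methods are hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical pre-processor: doubles backslashes before a LOAD DATA import. */
class CsvEscapeSketch {

    /** Doubles every backslash so the loader reads it as a literal character. */
    static String escapeLine(String line) {
        return line.replace("\\", "\\\\");
    }

    /** Copies the input file to the output file, escaping each line. */
    static void convert(Path in, Path out) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(escapeLine(line));
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("Usage: java CsvEscapeSketch <input.csv> <output.csv>");
            return;
        }
        convert(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Which characters really need attention depends on the FIELDS and LINES options given to LOAD DATA, so a real converter would be matched to the import statement it feeds.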
Data Management 101: DID
• Three basic principles:
– Disclosure
• Viewing of data
– Who’s viewing your data and are they authorized to do so?
– Integrity
• Accuracy and currency of data
– Data is only meaningful if it is accurate and up to date.
– Durability
• Data loss prevention
– More data is lost to accidents than malicious actions.
BIDS, SSAS, and MDX
o Business Intelligence Development Studio (BIDS)
• Ships as part of MS SQL Server
o SQL Server Analysis Services (SSAS)
• OLAP store and engine
• Builds multi-dimensional cubes
o Multi-Dimensional eXpressions (MDX)
• Used to retrieve cube data
• Used in SSAS Calculations and KPIs
SSRS
o Web Enabled
• Report Management
• Distribution
o Charts
• Conditional Fonts
• Calculated Members
• Multiple Charting Options
• Custom Colors
o Tables
• Multiple Formatting Options
• Data
• Calculated Members
• Conditional Fonts
MOSS, PPS, Dashboards, and KPIs
o MOSS
• Microsoft Office SharePoint Server
o PPS
• PerformancePoint Server
o Dashboard
• Scorecard
o KPIs
• Parameters
• Values
• Goals and Status
• Trends (not shown)
Excel Services
o Excel Local Client
• Parameters
• Pivot Table
• Associated Chart
o Excel Services
• MOSS
• PPS Dashboard
• PPS Report
New Tools, Growing Arsenal
• Latest additions: BIDS, SSIS, SSAS, SSRS, and MDX
• Arsenal already includes:
– OS platforms: z/OS, Windows, Unix (AIX and Sun), and Linux (Red Hat and SUSE)
– Databases: IMS, DB2, Oracle, MySQL, and SQL Server
– Languages: Assembler (IBM and Intel), C/C++, Java, JavaScript, PHP, Smalltalk, SQL, and REXX.
– Core competencies: Leadership, process improvement, team facilitation, interpersonal communications, client relations, and project management.
At Your Service …
• David Maeda
– Software Engineer
• Business Intelligence Analyst
• Diagnostician/Programmer
– Hard-working and Persevering
• Personal Integrity and High Standards
– Team Leader and Team Player
• “Your prime directive as a leader is to position your team for success.”
The End