Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG...

22
Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG December 10, 2002 Joan Falcão [email protected]

Transcript of Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG...

Page 1: Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG December 10, 2002 Joan Falcão jmfalcao@starpower.net.

Query DIFF Utility

Comparing the data

In 2 tables (or queries)

-- show the differences --

MDCFUG

December 10, 2002

Joan Falcã[email protected]

Page 2: Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG December 10, 2002 Joan Falcão jmfalcao@starpower.net.

“Why would you want to do that?”

Specifically – verify that data transferred from one system to another is identical before deleting it at the source.

Generally – have a quality-assurance tool to analyze data differences: Verify expected matches Report on expected differences Identify surprise differences and matches

Page 3: Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG December 10, 2002 Joan Falcão jmfalcao@starpower.net.

Applications Find Differences in 2 similar tables

See how a table has changed over the week QA for data entry effort (before and after) Verify expected changes due to code mods

Compare the column names from MS-SQL and Oracle schemas Or compare length of text columns

Analyze whether duplicate keys are duplicate rows (heavy hand on the “submit” button?)

See the results of modifying a query Inner join vs outer join Null predicate

Page 4: Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG December 10, 2002 Joan Falcão jmfalcao@starpower.net.

More Applications: QA for Ported Systems or Regression Testing

dataconversion

routine

oldsystemupdate

newsystemupdate updated

porteddbms

dbms

dataconversion

routineupdated

dbms

porteddbms

portedupdated

dbms

Difference?

Page 5: Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG December 10, 2002 Joan Falcão jmfalcao@starpower.net.

Reinventing the wheel?

http://devex.macromedia.com/developer/gallery/index.cfm Cf_Venn: Remember Venn diagrams from school? This

Custom Tag compares two lists, performing any or all of these operations: Union, Intersection, and Differences.

TDataSourceCompare: Compares all tables, columns, datatypes of 2 MS SQL database objects that are registered as datasources on your server. Used for checking your Production database and Development database to make sure your columns all match before a critical launch. DB1 First Datasource name. DB2 Second Datasource name.

CF_List_Compare is a custom tag that will compare the values of two given lists (List 1, and List 2)

Page 6: Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG December 10, 2002 Joan Falcão jmfalcao@starpower.net.

Reinventing the wheel?

August 7, 2002SQL Script for Comparing the Contents of Two Tables

Here's some SQL code from Eli Leiba that will all you to compare two tables -- say, table A and table B -- to determine if their content is the same. Assuming that A and B have the same structure, here's how it works. First, from set theory, recall that:

If ((|A| = |B|) && |A U B| = |A|)) ====>>> A = B |A| = NUMBER of rows in A |B| = NUMBER of rows in B

http://www.databasejournal.com/features/mssql/article.php/1441271

Page 7: Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG December 10, 2002 Joan Falcão jmfalcao@starpower.net.

Reinventing the wheel?

http://www.databasejournal.com/features/mssql/article.php/1441271(continued)

Here's the SQL code

(with T-SQL syntax):

declare @cnt1 int

declare @cnt2 int

declare @cnt3 int

declare @res bit

select @cnt1 = count(*) from A

select @cnt2 = count(*) from B

select @cnt3 = count('x')

from (select * from A

UNION

select * from B) as t

if (@cnt1 = @cnt2)

and (@cnt2 = @cnt3)

begin

set @res = 1

print 'A = B'

end

else

begin

set @res = 0

print 'A <> B'

end

go

Page 8: Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG December 10, 2002 Joan Falcão jmfalcao@starpower.net.

Previous implementation – UNION/JOIN query in Access

Query is messy: query for each column name query for extra rows in each of the 2 tables extra predicates for nulls extra predicates to flag duplicate rows

(should table not have a primary key)

Access permits union across dissimilar columns (different data types)

Access DOES permit inner join and union across linked tables from different DBMS, but I have not tested this.

Page 9: Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG December 10, 2002 Joan Falcão jmfalcao@starpower.net.

Previous implementation – UNIVAC’s Symbolic Stream Generator

Generates source code (to be compiled and executed) according to parameters

Old technology (no longer available) Required a lot of parameters

One spec per column (location and type)

Compared 2 files – very limited A query is much more flexible

Page 10: Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG December 10, 2002 Joan Falcão jmfalcao@starpower.net.

General ColdFusion Approach

Define 2 queries With the same column names

rename with AS, if needed With a common key (possibly multi-column) Sort on the common key

Merge queries on the common key Report extra keys Report data differences when keys match

Feedback for user Column names Notification of errors

Page 11: Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG December 10, 2002 Joan Falcão jmfalcao@starpower.net.

Parameters in General-Purpose Differencing

In a FORM, prompt for: Datasource Table specification (query) Key specification Sort order Login parameters (username/password)

Page 12: Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG December 10, 2002 Joan Falcão jmfalcao@starpower.net.

Demos

Page 13: Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG December 10, 2002 Joan Falcão jmfalcao@starpower.net.

Challenges in General-Purpose Programming

Column name(s) not known for SELECT *

Number of columns not known Number of key columns not known Column name syntax in ORDER BY

clause differs across databases

Page 14: Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG December 10, 2002 Joan Falcão jmfalcao@starpower.net.

ColdFusion essential features

CFQUERY’s returned variable: queryname.ColumnList

ListGetAt(ISQLquery1.ColumnList,ii)

CFQUERY column indexing: queryname.columnname[index]

Evalute function:Evaluate(‘queryname.columnname’)

Evaluate("ISQLquery1."&ListGetAt(ISQLquery1,ii)&"["&q1&"]")

Page 15: Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG December 10, 2002 Joan Falcão jmfalcao@starpower.net.

Unexpected problems False matches

SQL Case-sensitive vs CF case-insensitive ColdFusion null same as empty string

(unless query uses COALESCE to give a value to nulls)

False differences Extra rows on both sides (“ships passing in the night”) Trailing blanks (do not count in SQL, do in CF)

Sort sequences differ Across database products Across database configurations of same product ColdFusion has its own sort sequence

SQL query special characters hard to store need URLEncodedFormat

Page 16: Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG December 10, 2002 Joan Falcão jmfalcao@starpower.net.

MoreDemos

Page 17: Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG December 10, 2002 Joan Falcão jmfalcao@starpower.net.

Workarounds to sort problems

Sort by UPPER(columnname) Replace known special characters with

something that sorts the same UPPER(REPLACE(columnname,’_’,’ ’))

ReSort in ColdFusion, via CF5 query-on-query

Page 18: Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG December 10, 2002 Joan Falcão jmfalcao@starpower.net.

Additional Problems In CFFORM my own javascript

inadvertantly disables CFINPUT validation onSubmit="return SubmitDiffForm(_CF_this);“

Column names with special characters won’t “evaluate” (I warn and abort)

My javascript function to uppercase character sort columns was too confusing (now disabled)

Page 19: Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG December 10, 2002 Joan Falcão jmfalcao@starpower.net.

Additional Implementation Notes <CFSETTING EnableCFOUTPUTONLY="Yes">

To limit extraneous white space (lots of it) Hidden radio button trick – form variable is always

present and has values "Y,N“ or “N” thisTag.generatedContent gives “playback” Empty username/password attributes mean “use

defaults” (instead of ODBC error) PreserveSingleQuotes needed in CFQUERY Sandbox security may limit DIFFing Written/tested in CF 4.5; enhanced with optional

CF5 features

Page 20: Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG December 10, 2002 Joan Falcão jmfalcao@starpower.net.

Security concerns This utility must be protected;

Otherwise, it’s a hacker’s dream NOTE: I use a session variable to hold a login

Security can be disabled (in application.cfm, set BypassSecurity=“Yes”) OK, iff protection already exists

Security must be configured Use an existing username/password to login

Avoids password management Provide parameters in LoginParam.cfm

Database passwords for queries are omitted in stored DIFF specification

Page 21: Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG December 10, 2002 Joan Falcão jmfalcao@starpower.net.

Feature Creep (extra funcionality) Javascript function cascades merge key

to sort key input boxes (less data entry) Option to save entire DIFF specification

in a file Batch execution (can be scheduled in CF admin) Email the differences report Resort (CF5 query-on-query) to avoid sort

sequence variance

Page 22: Query DIFF Utility Comparing the data In 2 tables (or queries) -- show the differences -- MDCFUG December 10, 2002 Joan Falcão jmfalcao@starpower.net.

Future Enhancements?

Case-sensitive search Parameters to limit output Numeric tolerances in a match Encrypted passwords for datasources CfScript to speed it up