Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I...

32
Database Design (for IQ-M)

Transcript of Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I...

Page 1: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Database Design (for IQ-M)

Page 2: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Introduction

• This section has been re-vamped for the 12.4.3 course• I have removed all the design bits that are not

absolutely for IQ• This is to allow me to add more information directly

related to IQ• So this is not a Data Warehouse Design or Methodology

course, I have colleagues that do it much better then me!

Page 3: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Aggregation

• This was a dirty word in IQ-M

• Not anymore

• It is not sensible to trawl through 10 billion rows of data, when a simple aggregation table can reduce the queried data to 10 thousand rows – or less

• Even with IQ-M we can improve performance by as much as 100 times using aggregation correctly

Page 4: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

A Statement

Just because ASIQ-M is fast does not give us the right to generate slow

databases or applications

Page 5: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Primary Key/Foreign Key

• IQ-M does not enforce Primary Key/Foreign Key relationships – but it will do soon (12.5 Q2 2002)

• The optimiser does use the PK/FK relationship for query planning

• Only specify this relationship if the relationship does exist

• Incorrect specification can result in query plan errors (performance degradation) and possibly errors

• ASA does modify a join that is defined as PK/FK to an ANSI NATURAL join – this can cause problems with orphan rows

Page 6: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Key Specification

• In a Data Warehouse the production key is not, generally, used as the warehouse key

• It is more acceptable practice to use a generated key

• Make this key an Unsigned INT or BIGINT• This is the absolutely most efficient key datatype in IQ-

M

Page 7: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Primary Keys

• In IQ-M a Primary Key is an ANSI standard Primary Key– It is UNIQUE– It must not be null

• If specified as a table or column constraint then a specialised form of the HG index is created

Page 8: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Foreign Keys

• Always generate an HG index on a Foreign Key

• If the relationship is 1:1 then generate the Foreign Key column as a UNIQUE– This will force auto generation of a unique HG index

• Again try to specify join columns as Unsigned INT or BIGINT

Page 9: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

A digression on Datatypes

• There are some very important issues concerning datatypes

• We have discussed the actions of the indexes – there are areas where an index can be forced to run slowly if the datatype is specified wrongly

• Always consider the requirements for the datatype• In correct datatype specification is as bad as incorrect

index selection

Page 10: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Signed vs. Unsigned

• If you don’t need signed data in an int or bigint – use UNSIGNED

• This will speed up the accessing of the HNG index sometimes doubling the performance– HNG stores negative data as 1s complement

This means SUM() AVG() etc. run quickly But range checks require another set of scans

– If we stored as 2s complement then Range checks would run with 1 scan But SUM() AVG() would be slower!!

Page 11: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Other Datatype Issues

• Signed vs. Unsigned does not affect the other indexes to any great degree

• But…

• The selection of datatypes does

• We have already discussed keys but some other areas are worth commenting on…

Page 12: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Long Varchar() - 1

• A long varchar() is defined as a varchar() with a length greater than 255.

• If you can avoid this please try to– Only FP and word index index is allowed– No enumerated indexes or HNG

• We have seen a number of customers who use varchar(1024) as Primary Keys– please DO NOT DO THIS!!

Page 13: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Long Varchar() - 2

• Long varchar() are stored as 256 byte chunks, so using 4 bytes in a varchar(32000) only uses 256 bytes

• By default these 256 byte chunks are memset (set to zeros to improve compression)

• There is an upgrade option to memset existing 12.4.0 varchar() – this is worth doing, if you have the time!

Page 14: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Char() vs. Varchar()

• Always, if you can use char()

• Generally this will improve performance, at the modest cost of storing some small number of extra bytes

• Query performance on retrieval of char() vs. Varchar() indicates that there can be a 2-3% performance hit per column, and we have seen 10% degradation on single columns

Page 15: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Memset - 1

Initialize_Memory_To_Ones_On_Allocation

• Some slides ago I mentioned memset

• This is a function of IQ that allows varchar() – and other objects – to have their unused portions set to one on initialization

• This will improve the performance of the compression algorithms (usually)

• It may possible slow down loads – but I have not found this to be the case

Page 16: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Memset - 2

• To memset existing varchar() when upgrading to 12.4.2 set the following optionConvert_Varchar_To_1242

• To bring the HG indexes to 12.4.2 standard when upgrading a pre 12.4.2 database set the optionConvert_HG_To_1242

• In the 12.4.3 upgrade, both these options happen by default

Page 17: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Float, Real and Double

• Unless you really need them – please do not use– FLOAT– REAL– DOUBLE

• They can only have Flat FP indexes – no others

• The do not store “exact” values – only approximate

• Please try to use– NUMERIC– DECIMAL

Page 18: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

NUMERIC and DECIMAL

• Numeric and Decimal are aliases of each other

• Any numeric or decimal with a precision of less than 12 will be stored as an INT (with conversions)

• Any numeric or decimal with a precision of between 12 and 20 will be stored as a BIGINT (with conversions)

Page 19: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Join Columns

• You must generate the database schema with the table join columns having the same datatype.

• INT, UINT and BIGINT are best, but the column datatypes for each join must be the same

• Conversion cost is horrendous

Page 20: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Case and Collation Sequences

• In terms of RAW performance the fastest IQ database is one where CASE is set to RESPECT and the collation sequence is BINARY (ISO_bineng)

• This is probably not suitable for the general application of the database or warehouse server

• CASE set to IGNORE is the next fastest, then changes in the collation sequence

• The performance hits can be quite high (around 10-20% - we think!)

Page 21: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

String Searches

• String Searches such as substr(1,3,col_name) are really very slow, they rely on FP searches

• With low cardinality (1 and 2 byte FP) data the search is faster, but this can still be a restriction

• Create a new column which is the first 3 characters of the col_name column, then search on this

• This way there is no function call, so no projection, so the optimiser can use a fast index LF or HG (or if it is a range query an HNG)

Page 22: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Telephone Numbers

• A classic example of the above is the telephone number+1-301-896-1733+1 -> Country Code301 -> Area Code896 -> Sub Area Code1733 -> Local Number

• Make this 4 columns (actually 5 - the whole number), then searches use fast indexes

Page 23: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Date time

• As with telephone numbers, try storing a data time as as series of columns (or a dimension table)

• Try creating columnsDD MM YY

HH MM SS

DoWeek DoYear Quarter

etc.

Page 24: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Date vs. Datetime

• A slightly better solution to the above can be considered in the light of the 1 and 2 byte FP indexes

• Try storing the date part of a datetime as a date and the time part as hh mm ss

• So:Datetime -> date_col, hh_col, mm_col, ss_col

Page 25: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Loading Dates

• There is NO default date or datetime format for loads into IQ

• The format must be explicitly set for the load/insert to get the best performance

• However some formats are conversion enhanced

Page 26: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Enhanced Conversion formats

• DD/MM/YYYY

• DD.MM.YYYY

• DD-MM-YYYY

• HH:NN:SS

• HHNNSS

• HH:NN:SS.S

• HH:NN:SS.SS

• HH:NN:SS.SSS

• HH:NN:SS.SSSS

• HH:NN:SS.SSSSS

• YYYY-MM-DD HH:NN:SS

• YYYYMMDD HHNNSS

• YYYY-MM-DD HH:NN:SS

• YYYY-MM-DD HH:NN:SS.S

• YYYY-MM-DD HH:NN:SS.SS

• YYYY-MM-DD HH:NN:SS.SSS

• YYYY-MM-DD HH:NN:SS.SSSS

• YYYY-MM-DD HH:NN:SS.SSSSS

Page 27: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Date Load

• So it is better to useCol1 DATE(‘YYYY-MM-DD’)

• thanCol1 ASCII(10)

• The performance enhancement can be as much as a 100 fold speed up in loads (for small tables)

Page 28: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

UNION

• In IQ-M 12.4.3 the UNION clause has very few disadvantages

• Generally UNIONs are all processed in parallel

• so if you have a low user count they work well

• Also the delete question now can be solved

• Do not use DISTINCT in the UNION clause, or in the SELECT statement

Page 29: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

UNION and Delete

• If you are storing a fixed (in time) amount of data e.g.. 6 months– Then every month you delete 1/6th of the data in the table

– This is expensive

• It is better to split the fact table into 6 x one month tables– At the end of the month you truncate the oldest table

– And possibly rename the table sets

– Remember for Multiplex table rename is DDL and hence can only be done in simplex mode!

Page 30: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

What is wrong Here ?

SELECT col1, col2

FROM table1, table2, table3

WHERE table1.col1 < table2.col1

AND table1.col1 > table3.col1

Page 31: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Cartesian Joins

• These are expensive – they involve the join of every row in one table to every row in a second table.– Table A 1,000,000 rows– Table B 100,000 rows– Worktable 100,000,000,000 rows

Select * from T, R where T.a = 10 Cartesian

Select * from T, R where T.a between R.b and T.b CartesianSelect * from T, R where ABS(T.a * R.b) = T.b Cartesian

• But

Select * from T, R where ABS(T.a * T.b) = R.bNot Cartesian

Page 32: Database Design (for IQ-M). Introduction This section has been re-vamped for the 12.4.3 course I have removed all the design bits that are not absolutely.

Database Design - End