Decision support System

3
Surrogate keys Surrogate keys join the dimension tables to the fact table. Surrogate keys serve as an important means of identifying each instance or entity inside of a dimension table. Reasons for using surrogate keys Data tables in various OLTP source systems may use different keys for the same entity It may also be possible that a single key is being used by different instances of the same entity. This means that different customers might be represented using the same key across different OLTP systems. This can be a major problem when trying to consolidate information from various source systems. Or for companies trying to create/modify data warehouses after mergers and acquisitions. Existing systems that provide historical data might have used a different numbering system than a current OLTP system. Moreover, systems developed independently may not use the same keys, or they may use keys that conflict with data in the systems of other divisions. This situation may not cause problems when each department independently reports summary data, but can when trying to achieve an enterprise-wide view of the data. This means that we cannot rely on using the natural primary keys of the source system as dimension primary keys because there is no guarantee that the natural primary keys will be unique for each instance. A surrogate key uniquely identifies each entity in the dimension table, regardless of its natural source key. This is primarily because a surrogate key generates a simple integer value for every new entity. Surrogate keys provide the means to maintain data warehouse information when dimensions change Surrogate keys are necessary to handle changes in dimension table attributes.

description

Intro

Transcript of Decision support System

Page 1: Decision support System

Surrogate keysSurrogate keys join the dimension tables to the fact table. Surrogate keys serve as an important means of identifying each instance or entity inside of a dimension table.Reasons for using surrogate keysData tables in various OLTP source systems may use different keys for the same entityIt may also be possible that a single key is being used by different instances of the same entity. This means that different customers might be represented using the same key across different OLTP systems.This can be a major problem when trying to consolidate information from various source systems. Or for companies trying to create/modify data warehouses after mergers and acquisitions. Existing systems that provide historical data might have used a different numbering system than a current OLTP system. Moreover, systems developed independently may not use the same keys, or they may use keys that conflict with data in the systems of other divisions. This situation may not cause problems when each department independently reports summary data, but can when trying to achieve an enterprise-wide view of the data.This means that we cannot rely on using the natural primary keys of the source system as dimension primary keys because there is no guarantee that the natural primary keys will be unique for each instance. A surrogate key uniquely identifies each entity in the dimension table, regardless of its natural source key. This is primarily because a surrogate key generates a simple integer value for every new entity.Surrogate keys provide the means to maintain data warehouse information when dimensions changeSurrogate keys are necessary to handle changes in dimension table attributes.Natural OLTP system keys may change or be reused in the source data systemsSome systems have reuse keys belonging to obsolete data or for data that has been purged. However, the key may still be in use in historical data in the data warehouse, and the same key cannot be used to identify different entities.The design, implementation, and administration of surrogate keys is the responsibility of the data warehouse team. Surrogate keys are maintained in the data preparation area during the data transformation process.Improve performance of queriesThe narrow integer surrogate keys mean a thinner fact table.

Page 2: Decision support System

The thinner the fact table, the better the performance.Handle exception casesIf requirements need to be determined or do not apply, use a surrogate key.Changes or realignment of measures should be carried in a separate column in the tableIf data must be reviewed or summarized, regardless of the number of times it appears in a table, you should use that column as part of the surrogate key.Avoid globally unique identifiers as surrogate keysGlobally unique identifiers are known to work well in the source OLTP systems, but they are difficult to use when it comes to data warehouses. This is primarily because of two reasons:• Globally unique identifiers use a significant amount of space

compared to their integer counterparts. Globally unique identifiers take about 16 bytes each, where an integer takes about 4 bytes.

Indexes on globally unique identifiers columns are relatively slower than indexes on integer keys because globally unique identifiers are four times larger.

What is surrogate key? Explain it with an example.AnswerData warehouses commonly use a surrogate key to uniquely identify an entity. A surrogate is not generated by the user but by the system. A primary difference between a primary key and surrogate key in few databases is that PK uniquely identifies a record while a SK uniquely identifies an entity.E.g. an employee may be recruited before the year 2000 while another employee with the same name may be recruited after the year 2000. Here, the primary key will uniquely identify the record while the surrogate key will be generated by the system (say a serial number) since the SK is NOT derived from the data.