Data modeling fundamentals

Posted on 22-May-2015


Version 1.1, Cristi Salcescu

Subjects

• Relational Modeling
• Dimensional Modeling
• Object Modeling

What is data modeling?

• Applying structure to data
• Organizing data

Relational Modeling

• Tables
  – Columns
  – Rows

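• Keys
  – Primary key: uniquely identifies each row of a table
  – Foreign key: refers to the primary key of another (or the same) table, enforcing referential integrity
  – Surrogate key: a system-generated key with no business meaning
  – Composite key: a key that contains more than one column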
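A minimal sketch of these key types using Python's built-in sqlite3 module. The tables follow the Persons and Policies example used later in this deck; treating (Serial, Number) as a composite key and the sample names are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE Persons (
    Id INTEGER PRIMARY KEY,               -- surrogate primary key (no business meaning)
    LastName TEXT NOT NULL,
    FirstName TEXT NOT NULL
);
CREATE TABLE Policies (
    Id INTEGER PRIMARY KEY,               -- surrogate primary key
    Serial TEXT NOT NULL,
    Number INTEGER NOT NULL,
    IdPerson INTEGER NOT NULL REFERENCES Persons(Id),  -- foreign key
    UNIQUE (Serial, Number)               -- assumed composite key: two columns together
);
""")

conn.execute("INSERT INTO Persons (LastName, FirstName) VALUES ('Popescu', 'Ana')")
conn.execute("INSERT INTO Policies (Serial, Number, IdPerson) VALUES ('AB', 1, 1)")

# Referential integrity: a policy for a non-existent person is rejected.
try:
    conn.execute("INSERT INTO Policies (Serial, Number, IdPerson) VALUES ('AB', 2, 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# Composite key: the same (Serial, Number) pair cannot be stored twice.
try:
    conn.execute("INSERT INTO Policies (Serial, Number, IdPerson) VALUES ('AB', 1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(fk_rejected, duplicate_rejected)  # True True
```

Note that SQLite needs `PRAGMA foreign_keys = ON` per connection; other database engines enforce foreign keys by default.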

Types of Relations

• One-to-Many
• Many-to-One
• Many-to-Many
• One-to-One
• Recursive

One-to-Many

Persons (Id, LastName, FirstName)
Policies (Id, Serial, Number, IssuedDate, BeginDate, EndDate, IdPerson, IdPolicyType, IdUser)

One person can hold many policies: Policies.IdPerson refers to Persons.Id.
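The one-to-many relation can be sketched with sqlite3: the foreign key sits on the "many" side (Policies). Only some Policies columns are kept, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons (Id INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT);
CREATE TABLE Policies (
    Id INTEGER PRIMARY KEY,
    Serial TEXT, Number INTEGER,
    IssuedDate TEXT, BeginDate TEXT, EndDate TEXT,
    IdPerson INTEGER REFERENCES Persons(Id)  -- the "many" side points to the "one" side
);
INSERT INTO Persons VALUES (1, 'Popescu', 'Ana'), (2, 'Ionescu', 'Dan');
INSERT INTO Policies (Serial, Number, IdPerson) VALUES ('AB', 1, 1), ('AB', 2, 1), ('CD', 7, 2);
""")

# One person, many policies: count each person's policies with a join.
rows = conn.execute("""
    SELECT p.LastName, COUNT(pol.Id)
    FROM Persons p LEFT JOIN Policies pol ON pol.IdPerson = p.Id
    GROUP BY p.Id
    ORDER BY p.Id
""").fetchall()
print(rows)  # [('Popescu', 2), ('Ionescu', 1)]
```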

Many-to-Many
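A many-to-many relation is usually implemented with a junction (link) table that holds a foreign key to each side. A minimal sketch, assuming a hypothetical Coverages table (for example Fire, Theft) and a hypothetical PolicyCoverages junction table; neither is part of the original model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Policies  (Id INTEGER PRIMARY KEY, Serial TEXT, Number INTEGER);
CREATE TABLE Coverages (Id INTEGER PRIMARY KEY, Name TEXT);
-- Junction table: one row per (policy, coverage) pair.
-- Its composite primary key prevents storing the same pair twice.
CREATE TABLE PolicyCoverages (
    IdPolicy   INTEGER REFERENCES Policies(Id),
    IdCoverage INTEGER REFERENCES Coverages(Id),
    PRIMARY KEY (IdPolicy, IdCoverage)
);
INSERT INTO Policies  VALUES (1, 'AB', 1), (2, 'AB', 2);
INSERT INTO Coverages VALUES (1, 'Fire'), (2, 'Theft');
INSERT INTO PolicyCoverages VALUES (1, 1), (1, 2), (2, 1);
""")

# A policy has many coverages ...
coverages_of_1 = [name for (name,) in conn.execute("""
    SELECT c.Name FROM Coverages c
    JOIN PolicyCoverages pc ON pc.IdCoverage = c.Id
    WHERE pc.IdPolicy = 1 ORDER BY c.Name""")]

# ... and a coverage belongs to many policies.
policies_with_fire = [pid for (pid,) in conn.execute("""
    SELECT pc.IdPolicy FROM PolicyCoverages pc
    JOIN Coverages c ON c.Id = pc.IdCoverage
    WHERE c.Name = 'Fire' ORDER BY pc.IdPolicy""")]

print(coverages_of_1, policies_with_fire)  # ['Fire', 'Theft'] [1, 2]
```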

One-to-One

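PoliciesHousehold (Id, IdAddress, Age, Surface, RoomsNo)
Policies (Id, Serial, Number, IssuedDate, BeginDate, EndDate, IdPerson, IdPolicyType, IdUser)
PoliciesMotor (Id, ConstructionYear, CylCap, ChassisNo, PlateNo)

Each row in PoliciesHousehold or PoliciesMotor holds the type-specific details of exactly one policy.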
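One common way to implement this one-to-one relation is a shared primary key: the subtype table's Id is also a foreign key to Policies.Id. The sketch below assumes that design (the deck does not state how the tables are linked) and uses made-up sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Policies (Id INTEGER PRIMARY KEY, Serial TEXT, Number INTEGER);
-- Assumed shared primary key: Id is both the PK and an FK to Policies.Id,
-- so each policy has at most one motor-specific row.
CREATE TABLE PoliciesMotor (
    Id INTEGER PRIMARY KEY REFERENCES Policies(Id),
    ConstructionYear INTEGER, CylCap INTEGER, ChassisNo TEXT, PlateNo TEXT
);
CREATE TABLE PoliciesHousehold (
    Id INTEGER PRIMARY KEY REFERENCES Policies(Id),
    IdAddress INTEGER, Age INTEGER, Surface REAL, RoomsNo INTEGER
);
INSERT INTO Policies VALUES (1, 'AB', 1), (2, 'AB', 2);
INSERT INTO PoliciesMotor VALUES (1, 2012, 1600, 'CH123', 'B-01-ABC');
INSERT INTO PoliciesHousehold VALUES (2, 10, 15, 80.5, 3);
""")

# A second motor row for the same policy violates the one-to-one rule.
try:
    conn.execute("INSERT INTO PoliciesMotor (Id, PlateNo) VALUES (1, 'B-02-XYZ')")
    second_row_rejected = False
except sqlite3.IntegrityError:
    second_row_rejected = True

motor = conn.execute("""
    SELECT p.Serial, p.Number, m.PlateNo
    FROM Policies p JOIN PoliciesMotor m ON m.Id = p.Id""").fetchall()
print(second_row_rejected, motor)  # True [('AB', 1, 'B-01-ABC')]
```

Splitting the type-specific columns out this way also keeps Policies free of columns that would be NULL for every other policy type.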

Many-to-One

Self-Referencing

Categories (IdCategory, Name, IdParent)

IdParent refers to the IdCategory of the parent row in the same table.
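A self-referencing table can be traversed with a recursive query. A sketch using SQLite's `WITH RECURSIVE` common table expression, with hypothetical category names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Categories (
    IdCategory INTEGER PRIMARY KEY,
    Name TEXT NOT NULL,
    IdParent INTEGER REFERENCES Categories(IdCategory)  -- NULL for a root category
);
INSERT INTO Categories VALUES
    (1, 'Insurance', NULL),
    (2, 'Property', 1),
    (3, 'Household', 2),
    (4, 'Motor', 1);
""")

# Start at the root rows, then repeatedly join children onto the rows found so far.
tree = conn.execute("""
    WITH RECURSIVE Tree(IdCategory, Name, Depth) AS (
        SELECT IdCategory, Name, 0 FROM Categories WHERE IdParent IS NULL
        UNION ALL
        SELECT c.IdCategory, c.Name, t.Depth + 1
        FROM Categories c JOIN Tree t ON c.IdParent = t.IdCategory
    )
    SELECT Name, Depth FROM Tree ORDER BY IdCategory
""").fetchall()
print(tree)  # [('Insurance', 0), ('Property', 1), ('Household', 2), ('Motor', 1)]
```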

Normalization

• Creates granularity
• Removes duplication
• Is a set of cumulative rules, the normal forms: 1st, 2nd, 3rd Normal Form
• Good for saving space, although I/O costs are cheap
• Bad for performance: reading the data back requires joins
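The duplication that normalization removes can be shown with a small decomposition in plain Python: a flat list of policy rows, in which the holder's name repeats, is split into a persons lookup and a policies list that refers to it by Id. The data is hypothetical, and using the name as a person's identity is a simplification for this sketch only.

```python
# A flat, unnormalized list: the person's name is repeated on every policy row.
flat_rows = [
    ("Popescu", "Ana", "AB", 1),
    ("Popescu", "Ana", "AB", 2),
    ("Ionescu", "Dan", "CD", 7),
]

persons = {}   # (LastName, FirstName) -> Id: each person is stored once
policies = []  # (Id, Serial, Number, IdPerson): refers to the person by Id
for last, first, serial, number in flat_rows:
    # Simplification: the name identifies the person; a real model needs a proper key.
    id_person = persons.setdefault((last, first), len(persons) + 1)
    policies.append((len(policies) + 1, serial, number, id_person))

print(persons)   # {('Popescu', 'Ana'): 1, ('Ionescu', 'Dan'): 2}
print(policies)  # [(1, 'AB', 1, 1), (2, 'AB', 2, 1), (3, 'CD', 7, 2)]

# Getting the original shape back needs a join (here, a dictionary lookup).
names_by_id = {i: name for name, i in persons.items()}
rejoined = [names_by_id[id_person] + (serial, number)
            for _, serial, number, id_person in policies]
print(rejoined == flat_rows)  # True
```

The last step is the trade-off listed above: less duplicated data, but every read of the combined view costs a join.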

1st Normal Form

• Each column holds a single, atomic value; no repeating groups of columns
• Creates a Many-to-One relation
• Removes duplication that occurs horizontally

2nd Normal Form

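• In 1st Normal Form, and every non-key column depends on the whole primary key (no partial dependency on part of a composite key)
• Creates a One-to-Many relation
• Removes duplication that occurs vertically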

3rd Normal Form

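• In 2nd Normal Form, and no non-key column depends on another non-key column (no transitive dependencies)
• Creates a Many-to-Many relation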

4th Normal Form

• Creates a One-to-One relation
• Separates NULL values

Insurance Policies - Car, Home and Life

OLTP vs OLAP

• OLTP: On-line Transaction Processing
• OLAP: On-line Analytical Processing

Why does the relational model fail for reporting?

• Too granular
• Built for high concurrency (lots of users sharing small pieces of data at the same time)
• Too many tables: joins are too big and the SQL is too slow

OLTP

– Recent data
– Daily basis
– Hundreds of millions of users
– High concurrency
– Designed for working with a single record/entity at a time
– Highly “normalized”
– Getting data for a report involves many joins

OLAP

– Huge amount of (historical) data
– Fast access to huge amounts of data
– Access to many tables
– Low concurrency: few users (top executives)
– The number of tables is reduced, which reduces the number of joins
– Data is de-normalized

Dimensional Modeling

• Data Warehouse
  – A gigantic storehouse of data
  – Holds all the data
  – Provides long-term storage of data
  – Aggregates data from multiple systems
  – Reduces the load on the production system

• Facts
  – Transactional information
  – Hold numeric measures

• Dimensions
  – Hold the values that describe facts
  – Static or slowly changing information
  – Answer questions like: who, what, when, where?
  – Lookup values
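A minimal star-schema sketch of facts and dimensions with sqlite3. The table names (FactPolicies, DimDate, DimPolicyType), the Premium measure and the values are hypothetical, chosen only to match the insurance theme of this deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions: descriptive lookup values (when? what?)
CREATE TABLE DimDate (IdDate INTEGER PRIMARY KEY, Year INTEGER, Month INTEGER);
CREATE TABLE DimPolicyType (IdPolicyType INTEGER PRIMARY KEY, Name TEXT);
-- Fact table: foreign keys to the dimensions plus a numeric measure
CREATE TABLE FactPolicies (
    IdDate INTEGER REFERENCES DimDate(IdDate),
    IdPolicyType INTEGER REFERENCES DimPolicyType(IdPolicyType),
    Premium REAL
);
INSERT INTO DimDate VALUES (1, 2014, 12), (2, 2015, 1), (3, 2015, 2);
INSERT INTO DimPolicyType VALUES (1, 'Motor'), (2, 'Household');
INSERT INTO FactPolicies VALUES (1, 1, 300), (2, 1, 250), (3, 1, 200), (2, 2, 120), (3, 2, 80);
""")

# Typical analytical query: aggregate the measure, sliced by dimension attributes.
report = conn.execute("""
    SELECT d.Year, t.Name, SUM(f.Premium)
    FROM FactPolicies f
    JOIN DimDate d       ON d.IdDate = f.IdDate
    JOIN DimPolicyType t ON t.IdPolicyType = f.IdPolicyType
    GROUP BY d.Year, t.Name
    ORDER BY d.Year, t.Name
""").fetchall()
print(report)  # [(2014, 'Motor', 300.0), (2015, 'Household', 200.0), (2015, 'Motor', 450.0)]
```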

Fact table example

Denormalization

• Removes normal forms
• Removes granularity
• Uses lots of space, which increases I/O costs
• Good for performance
• Reduces the number of joins
• Good for large databases
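Denormalization can be sketched as paying for the join once, when a reporting table is built, instead of on every query. The PoliciesReport table and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons  (Id INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT);
CREATE TABLE Policies (Id INTEGER PRIMARY KEY, Serial TEXT, Number INTEGER,
                       IdPerson INTEGER REFERENCES Persons(Id));
INSERT INTO Persons  VALUES (1, 'Popescu', 'Ana'), (2, 'Ionescu', 'Dan');
INSERT INTO Policies VALUES (1, 'AB', 1, 1), (2, 'AB', 2, 1), (3, 'CD', 7, 2);

-- Denormalize: join once and copy the person columns onto every policy row.
CREATE TABLE PoliciesReport AS
SELECT pol.Id AS IdPolicy, pol.Serial, pol.Number, p.LastName, p.FirstName
FROM Policies pol JOIN Persons p ON p.Id = pol.IdPerson;
""")

# Reports now read a single table with no joins, at the cost of duplicated names.
rows = conn.execute(
    "SELECT Serial, Number, LastName FROM PoliciesReport ORDER BY IdPolicy"
).fetchall()
print(rows)  # [('AB', 1, 'Popescu'), ('AB', 2, 'Popescu'), ('CD', 7, 'Ionescu')]
```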

3rd Normal Form

Denormalized

Relational Model

Denormalize fact tables

Snowflake Schema

Star Schema
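The difference between the two schemas can be sketched on a single dimension. In the snowflake version the policy-type dimension is normalized into a separate category table; in the star version it is flattened into one table, so the report needs one join fewer. All table names, categories and values here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FactPolicies (IdPolicyType INTEGER, Premium REAL);
INSERT INTO FactPolicies VALUES (1, 250), (2, 120), (3, 90);

-- Snowflake: the dimension is normalized into a chain of tables.
CREATE TABLE DimCategory (IdCategory INTEGER PRIMARY KEY, Category TEXT);
CREATE TABLE DimPolicyType (IdPolicyType INTEGER PRIMARY KEY, Name TEXT,
                            IdCategory INTEGER REFERENCES DimCategory(IdCategory));
INSERT INTO DimCategory VALUES (1, 'Non-life'), (2, 'Life');
INSERT INTO DimPolicyType VALUES (1, 'Motor', 1), (2, 'Household', 1), (3, 'Life', 2);

-- Star: the same dimension flattened into one denormalized table.
CREATE TABLE DimPolicyTypeStar AS
SELECT t.IdPolicyType, t.Name, c.Category
FROM DimPolicyType t JOIN DimCategory c ON c.IdCategory = t.IdCategory;
""")

snowflake = conn.execute("""
    SELECT c.Category, SUM(f.Premium) FROM FactPolicies f
    JOIN DimPolicyType t ON t.IdPolicyType = f.IdPolicyType
    JOIN DimCategory c   ON c.IdCategory = t.IdCategory      -- extra join
    GROUP BY c.Category ORDER BY c.Category""").fetchall()

star = conn.execute("""
    SELECT s.Category, SUM(f.Premium) FROM FactPolicies f
    JOIN DimPolicyTypeStar s ON s.IdPolicyType = f.IdPolicyType  -- single join
    GROUP BY s.Category ORDER BY s.Category""").fetchall()

print(snowflake == star, star)  # True [('Life', 90.0), ('Non-life', 370.0)]
```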

Object Modeling

• A layer of objects that models the business area you are working in

UML

Unified Modeling Language

The most basic UML diagram is the class diagram. It describes classes and shows the relationships among them.

Types of Relations

• Inheritance
• Association
• Aggregation
• Composition

Inheritance

A generalizes B; B derives from A.
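In Python, the inheritance relation between A and B can be sketched as follows (the `describe` method is illustrative only):

```python
class A:
    def describe(self) -> str:
        return "A"

class B(A):  # B derives from A; A generalizes B
    def describe(self) -> str:
        # B can extend the behavior it inherits from A.
        return "B, which is also an " + super().describe()

b = B()
print(isinstance(b, A), b.describe())  # True B, which is also an A
```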

Association

A uses B. B can appear in A as a:

• Class field
• Method parameter
• Method return type
• Local variable
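The four ways A can use B can be sketched in Python (method names and values are illustrative):

```python
class B:
    def __init__(self, name: str) -> None:
        self.name = name

class A:
    def __init__(self, b: B) -> None:
        self.b = b                      # 1. class field of type B

    def greet(self, other: B) -> str:   # 2. method parameter of type B
        return f"{self.b.name} meets {other.name}"

    def make_b(self) -> B:              # 3. method return type B
        return B("new")

    def count_chars(self) -> int:
        temp = B("local")               # 4. local variable of type B
        return len(temp.name)

a = A(B("first"))
print(a.greet(B("second")), a.make_b().name, a.count_chars())  # first meets second new 5
```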

Aggregation

Aggregation (shared association)

A aggregates B; B is part of A.

Example: an Airport aggregates Aircraft.
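A minimal Python sketch of aggregation with the Airport/Aircraft example. The `park` method, the `registration` attribute and the codes are hypothetical. The aircraft is created outside any airport and can be referenced by more than one:

```python
class Aircraft:
    def __init__(self, registration: str) -> None:
        self.registration = registration

class Airport:
    def __init__(self, code: str) -> None:
        self.code = code
        self.aircraft = []  # references to Aircraft created elsewhere

    def park(self, plane: Aircraft) -> None:
        # Aggregation: the airport only holds a reference; it does not create the aircraft.
        self.aircraft.append(plane)

plane = Aircraft("YR-ABC")   # exists independently of any airport
otp, cdg = Airport("OTP"), Airport("CDG")
otp.park(plane)
cdg.park(plane)              # the same aircraft is shared by two airports
print(plane.registration, otp.aircraft[0] is cdg.aircraft[0])  # YR-ABC True
```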

Composition

Composition (non-shared association)

A is composed of B; each part belongs to exactly one whole and does not outlive it.

Example: a Person is composed of Legs.
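A Python sketch of the Person/Leg composition. Python does not enforce object lifetimes, so the sketch relies on convention: the whole creates its parts itself and does not hand them out to other objects. The `side` attribute and `leg_sides` method are hypothetical.

```python
class Leg:
    def __init__(self, side: str) -> None:
        self.side = side

class Person:
    def __init__(self, name: str) -> None:
        self.name = name
        # Composition: the Person creates and owns its Legs; they are not shared
        # with other objects, so they become unreachable together with the Person.
        self._legs = (Leg("left"), Leg("right"))

    def leg_sides(self) -> list:
        return [leg.side for leg in self._legs]

ana = Person("Ana")
print(ana.leg_sides())  # ['left', 'right']
```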

Domain Layer

– Introduced by Eric Evans in his book “Domain-Driven Design: Tackling Complexity in the Heart of Software” (2003)
– Entities
  • An object that is not defined by its attributes, but rather by its identity
– Value Objects
  • An object that contains attributes but has no conceptual identity
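Entities and value objects can be sketched in Python: the value object compares by its attributes (a frozen dataclass), while the entity compares by its identity. The Address and Person classes and their fields are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Address:
    # Value object: no identity; two addresses with equal attributes are equal.
    street: str
    city: str

class Person:
    # Entity: identity comes from person_id, not from the (changeable) attributes.
    def __init__(self, person_id: int, last_name: str, address: Address) -> None:
        self.person_id = person_id
        self.last_name = last_name
        self.address = address

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Person) and other.person_id == self.person_id

    def __hash__(self) -> int:
        return hash(self.person_id)

a1 = Address("Main St 1", "Cluj")
a2 = Address("Main St 1", "Cluj")
p1 = Person(1, "Popescu", a1)
p2 = Person(1, "Ionescu", Address("Other St 2", "Iasi"))  # same person, changed data
p3 = Person(2, "Popescu", a1)                              # same data, another person
print(a1 == a2, p1 == p2, p1 == p3)  # True True False
```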

Insurance – Relational Model

Persons (Id, LastName, FirstName)
PoliciesHousehold (Id, IdAddress, Age, Surface, RoomsNo)
Policies (Id, Serial, Number, IssuedDate, BeginDate, EndDate, IdPerson, IdPolicyType, IdUser)
PoliciesMotor (Id, ConstructionYear, CylCap, ChassisNo, PlateNo)

Insurance – Object Model
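One possible object model for the insurance tables, sketched with Python dataclasses under these assumptions: a Policy references its holder Person instead of storing IdPerson, and the one-to-one subtype tables become MotorPolicy and HouseholdPolicy subclasses. Only a few columns are kept. The hand-written `load_policies` function (hypothetical) shows the row-to-object mapping that an ORM automates.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Person:
    id: int
    last_name: str
    first_name: str

@dataclass
class Policy:
    id: int
    serial: str
    number: int
    holder: Person            # an object reference replaces the IdPerson foreign key

@dataclass
class MotorPolicy(Policy):    # the PoliciesMotor subtype table becomes a subclass
    plate_no: str

@dataclass
class HouseholdPolicy(Policy):
    rooms_no: int

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons (Id INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT);
CREATE TABLE Policies (Id INTEGER PRIMARY KEY, Serial TEXT, Number INTEGER, IdPerson INTEGER);
CREATE TABLE PoliciesMotor (Id INTEGER PRIMARY KEY, PlateNo TEXT);
CREATE TABLE PoliciesHousehold (Id INTEGER PRIMARY KEY, RoomsNo INTEGER);
INSERT INTO Persons VALUES (1, 'Popescu', 'Ana');
INSERT INTO Policies VALUES (1, 'AB', 1, 1), (2, 'AB', 2, 1);
INSERT INTO PoliciesMotor VALUES (1, 'B-01-ABC');
INSERT INTO PoliciesHousehold VALUES (2, 3);
""")

def load_policies(conn):
    """Map rows to objects; each Person is built once and shared by its policies."""
    persons = {r[0]: Person(*r)
               for r in conn.execute("SELECT Id, LastName, FirstName FROM Persons")}
    rows = conn.execute("""
        SELECT p.Id, p.Serial, p.Number, p.IdPerson, m.PlateNo, h.RoomsNo
        FROM Policies p
        LEFT JOIN PoliciesMotor m ON m.Id = p.Id
        LEFT JOIN PoliciesHousehold h ON h.Id = p.Id
        ORDER BY p.Id""")
    result = []
    for pid, serial, number, id_person, plate_no, rooms_no in rows:
        holder = persons[id_person]
        if plate_no is not None:
            result.append(MotorPolicy(pid, serial, number, holder, plate_no))
        elif rooms_no is not None:
            result.append(HouseholdPolicy(pid, serial, number, holder, rooms_no))
        else:
            result.append(Policy(pid, serial, number, holder))
    return result

policies = load_policies(conn)
print([type(p).__name__ for p in policies])  # ['MotorPolicy', 'HouseholdPolicy']
```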

Data Flow between the 3 Models

• Domain Model: Entities, Value Objects
• Relational Model: Tables
• Dimensional Model: Facts, Dimensions

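ORM / ETL

• ORM (object-relational mapping): maps the objects of the domain model to relational tables and back. http://www.agiledata.org/essays/mappingObjects.html

• ETL (extract, transform and load): extracts data from the operational systems, transforms it, and loads it into the data warehouse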
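A minimal ETL sketch with two in-memory SQLite databases standing in for the operational (OLTP) system and the warehouse (OLAP). The PolicyType and Premium columns, the FactMonthlyPremium table and the monthly aggregation are assumptions for illustration.

```python
import sqlite3

# Extract: read transactional rows from the operational database.
oltp = sqlite3.connect(":memory:")
oltp.executescript("""
CREATE TABLE Policies (Id INTEGER PRIMARY KEY, IssuedDate TEXT, PolicyType TEXT, Premium REAL);
INSERT INTO Policies VALUES
    (1, '2015-01-10', 'Motor', 250), (2, '2015-01-22', 'Motor', 200),
    (3, '2015-02-03', 'Household', 120);
""")
rows = oltp.execute("SELECT IssuedDate, PolicyType, Premium FROM Policies").fetchall()

# Transform: derive the month from the date and aggregate premiums per month and type.
totals = {}
for issued, policy_type, premium in rows:
    key = (issued[:7], policy_type)  # 'YYYY-MM'
    totals[key] = totals.get(key, 0.0) + premium

# Load: write the aggregated facts into the warehouse database.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE FactMonthlyPremium (Month TEXT, PolicyType TEXT, Premium REAL)")
dw.executemany("INSERT INTO FactMonthlyPremium VALUES (?, ?, ?)",
               [(m, t, p) for (m, t), p in sorted(totals.items())])
dw.commit()

print(dw.execute(
    "SELECT Month, PolicyType, Premium FROM FactMonthlyPremium ORDER BY Month, PolicyType"
).fetchall())
# [('2015-01', 'Motor', 450.0), ('2015-02', 'Household', 120.0)]
```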

Summary

• Relational Modeling
  – Tables (columns, rows)
  – Types of Relations
  – Normal Forms

• Dimensional Modeling
  – Facts and Dimensions
  – De-Normalization

• Object Modeling
  – Entities and Value Objects
  – Inheritance, Association, Aggregation, Composition
