Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed...

Databases 1

Daniel POP

Week 12

Agenda

1. Introduction to NoSQL databases

2. Data representation in NoSQL

– XML

– JSON

3. Key-value database

4. (Wide) Column database

5. Document database

6. Graph database

Introduction to NoSQL databases

Relational databases advantages

1. Reliability

2. Safety

3. ACID: Atomicity, Consistency, Isolation,

Durability

4. Easy, standardized query language (SQL)

Relational databases limitations

1. Scalability

• usually costly scale-up approach

• problems with distributed databases (difficult to

join distributed tables)

2. Impedance mismatch

• different representations in memory and database

3. Complexity

• data as tables, schema-bound

4. Query language (SQL) only for structured data

5. Dealing with semi-/unstructured data

Unstructured data explosition

Semi-structured data examples

Un-structured data examples

What is Big Data?

From a technology perspective, Big Data is defined as

those data sets whose size, type, and speed-of-creation

make them impractical to process and analyse

with traditional database technologies and related tools in

a cost- or time-effective way.

Source: wikibon.org

4 Vs of BigData

• Volume: – 12TB of tweets / day => product sentiment analysis

• Velocity: – fraud detection

– predict customer churn faster (anlyse 500 million daily calls in real-time)

• Variety: – exploit 80% growth of un&semi-structured data for customer satisfaction

• Variability / Veracity:– variance in meaning (1 in 3 business leader don‘t trust the information they

use to make decisions)

Source: Brian Hopkins (Forrester)

Distributed systemsDistributed system is a “collection of independent computers that appear to

the users of the system as a single computer” (Tanenbaum)

Requirements of distributed applications

1. Consistency: all nodes ‘see’ the same data at the same

2. Availability: guarantee that every request receives a

response about whether it succeeded or failed

3. Partition tolerance: the system continues to operate

despite arbitrary message loss or failure of part of the

system

Brewer’s CAP Theorem

1. Consistency: all nodes ‘see’ the same data at the same

2. Availability: guarantee that every request receives a

response about whether it succeeded or failed

3. Partition tolerance: the system continues to operate

despite arbitrary message loss or failure of part of the

system

Brewer’s (CAP) Theorem: it is impossible for

a distributed computer system to

simultaneously provide all three requirements.

Consistency vs. Availability

More at https://www.youtube.com/watch?v=ASiU89Gl0F0

BASE instead of ACID

1. Basic Availability

2. Soft-state

3. Eventual consistency: informally guarantees that, if

no new updates are made to a given data item,

eventually all accesses to that item will return the last

updated value

NoSQL = Not Only SQL

• Not based on relational model, not using SQL

• No fixed schema, new attributes can be added to

data items at any time (schema-less)

• Designed for distributed, large clusters (scale-

out), including support for distributed processing

(MapReduce)

• Deliver eventual consistency

• Data-model specific query languages

NoSQL: classification and usage pattern

Complexity

Key-value DB

(Wide) Column DB

Document DB

Graph DB

Data model

Visual guide to NoSQL systems

Trends

Source: http://db-engines.com/en/ranking_trend

NoSQL brief history

• Carlo Strozzi (1998) – to name his lightweight, open-source,

relational database, but without SQL interface

• Johan Oskarsson and Eric Evans (2009) – meetup on distributed structured data storage; they used #nosql as Twitter hashtag

for this

• Google Bigtable (2006)

• Amazon DynamoDB (2007)

• ....

• Used on large scale at Amazon, Facebook, Google, Twitter etc.

Bibliography

Pramod J. Sadalage, Martin Fowler. NoSQL Distilled.

Addison Wesley, 2012

Watch M. Fowler @ NoSQL matters Conference in

Cologne, Germany 2013

http://martinfowler.com/nosql.html

1. Lorenzo Alberton. NoSQL Databases: why, what and when

2. Yousof Alsatom. Introduction to NoSQL

3. Wikipedia

4. Introduction to NoSQL on w3resource.com

5. http://www.nosql-database.org/ - list, by data model, of NoSQL databases

XML Data

Extensible Markup Language

• Standard for data representation and exchange

• Document format similar to HTML

– Tags describe content instead of formatting

• Also streaming format Nested Elements

•described by tag

•may have attributes

•text

Rules of well-formed XML:

• Single root element

• Matched tags and nesting

• Unique attributes within

elements

Working with XML documents

• XML Parser

– Returns an error message (if XML document is not well

formed) or a ‘hook’ to manipulate the document

– 3 types of parsers

• Document Object Model (DOM) based (e.g. Xerces, libxml2)

• Event base, such as Simple API for XML (SAX) (e.g. Expat, JAXP

• In the middle, Streaming API for XML (StAX) (e.g. Sun Java StAX

XML Processor, Woodstox)

• Displaying XML

– Convert XML document to HTML using CSS (Cascading Style

Sheets) or XSLT (Extensible Stylesheet Language

Transformations)

Validating XML documents

Content-specific specifications

- Document Type Descriptor (DTD)

- XML Schema Definition (XSD)

Tools: LibXML 2 (XML Lint)

Document Type Descriptor (DTD)

Can specify

• Elements,

• Attributes,

• Nesting,

• Ordering,

• Number of occurrences,

• ID and IDREF

Document Type Descriptor (DTD)* = 0 or more

| = or

? = optional

+ = 1 or more

CDATA = string

#PCDATA = text

Document Type Descriptor (DTD)* = 0 or more

| = or

? = optional

+ = 1 or more

CDATA = string

#PCDATA = text

ID = identifier

IDREF = refers to

one identifier

IDREFS = refers to

one or more

identifiers

Source: Jennifer Widom – Database Course, Stanford University

XML Schema Definition (XSD)

Can specify

• Elements,

• Attributes,

• Nesting,

• Ordering,

• Data types,

• Keys,

• References (typed pointers),

• … and many more

XML Schema Definition (XSD)

Can specify

• Elements,

• Attributes,

• Nesting,

• Ordering,

• Data types,

• Keys,

• References (typed pointers),

• … and many more

Typed values, occurrences constraints

• Typed values

<xsd:attribute name=“Price” type=“xsd:integer” use=“required”>

• Occurrences constraints

<xsd:element name="Book"

type="BookType”

minOccurs="0"

maxOccurs="unbounded" />

Remark: If minOccurs/maxOccurs is not specified, the default

value used is 1.

Keys and references

• Key declarations

<xsd:key name="BookKey">

<xsd:selector xpath="Book" />

<xsd:field xpath="@ISBN" />

</xsd:key>

• References (typed pointers)

<xsd:keyref name="BookKeyRef" refer="BookKey">

<xsd:selector xpath="Book/Remark/BookRef" />

<xsd:field xpath="@book" />

</xsd:keyref>

Relational vs. XML

Relational XML

Structure

Schema

Queries

Ordering

Implementation

Relational vs. XML

Relational XML

Structure Tables Hierarchical

(tree, graph)

Schema

Queries

Ordering

Implementation

Relational vs. XML

Relational XML

(tree, graph)

Schema Fixed Flexible

Queries

Ordering

Implementation

Relational vs. XML

Relational XML

(tree, graph)

Queries Simple (SQL) More cumbersome

(XPath / XQuery)

Ordering

Implementation

Relational vs. XML

Relational XML

(tree, graph)

(XPath / XQuery)

Ordering None Implied

Implementation

Relational vs. XML

Relational XML

(tree, graph)

(XPath / XQuery)

Ordering None Implied

Implementation Native 1. XML-enabled

(extensions to

RDBMS)

2. Native XML

Bibliography

NoSQL Distilled

by Pramod J.

Sadalage and

Martin Fowler,

Addison

Wesley, 2012

Chapter 1

A First Course in

Database Systems

(3rd edition) by

Jeffrey Ullman and

Jennifer Widom,

Prentice Hall, 2007

Chapter 11

Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed...

Documents

Transcript of Design Patterns: Case Study · • Not based on relational model, not using SQL • No fixed...

Transforming an ER Model into a Relational Schema

MDBS Schema Integration: The Relational Integration Model

Page 1 MDBS Schema Integration: The Relational Integration Model Ramon Lawrence MDBS Schema Integration: The Relational Integration Model Candidacy Exam.

UNIVERSITI PUTRA MALAYSIA TRANSLATING RELATIONAL CONCEPTUAL SCHEMA … · 2016-08-03 · UNIVERSITI PUTRA MALAYSIA TRANSLATING RELATIONAL CONCEPTUAL SCHEMA TO OBJECT-ORIENTED SCHEMA

XML-to-Relational Schema Mapping Algorithm ODTDMap

Outline: Relational Data Model Relational Data Model -relation schema, relations

Translating ER Schema to Relational Model

3. Schema Design - Freie Universität · 3. Schema Design: Logical Design using the Relational Data Model 3.1 Logical Schema Design 3.1.1 The Relational Data Model in a nutshell 3.1.2

CS34311 Translating ER Schema to Relational Model.

Database Relational Model. The Relational Data Model Data Modeling Data Modeling Relational Schema Relational Schema Physical storage Physical storage.

A geospatial relational database schema for interdependent ... · A geospatial relational database schema for interdependent network analysis and modelling . David Alderson. 1, Stuart

Evaluation of Relational Operations Relational Operations Schema ...

Chapter 3 Design Theory for Relational Databasesli-fang/chapter3-functionalDependency...Relational Schema Design Goal of relational schema design is to avoid anomalies and redundancy.

ER DIAGRAM TO RELATIONAL SCHEMA MAPPING

From Big data to Data Lakes 2015 Big Data To Date Lakes.pdfCopyright Mark Whitehorn “Not waving but drowning” Stevie Smith Atomic Relational Schema Schema first Early binding schema

Converting ER to Relational Schema

Relational Data Model Winter 2007Ron McFadyen ACS-39021 Outline Relational Data Model -relation schema, relations -database schema, database state -integrity.

6 Relational Schema Design

Relational Data Model Sept. 2014Yangjun Chen ACS-39021 Outline: Relational Data Model Relational Data Model -relation schema, relations -database schema,

A geospatial relational database schema for …leeds.gisruk.org/abstracts/GISRUK2015_submission_25.pdfA geospatial relational database schema for interdependent network analysis and