CS317 File and Database Systems
mercury.pr.erau.edu/~siewerts/cs317/documents/Lectures/...

September 23, 2015 Sam Siewert CS317 File and Database Systems Lecture 5, Part-2 – ORDBMS http://www.ibmbigdatahub.com/video/ibm-big-data-minute-drowning-petabytes

SQL Theory and Standards

DBMS Design (Connolly-Begg Chapter 10)

Part-2 Development Lifecycle

For Discussion…

Big Data – Velocity, volume, variety, veracity [2014]

1. Daily – 2.5 quintillion bytes (2,500,000,000,000,000,000), about 2 Exabytes, or 46,566,128 50GB Blu-Ray Discs, IBM Estimate

2. Annually – 7.5 billion in global population, produce/consume 2.25 unique Blu-Rays per person per Year, or 23 DVDs (assuming even distribution – unlikely)

3. Annually – If produced/consumed by US population alone – 53 Blu-Rays per Year or 564 DVDs per person

4. Data in Total is 40 trillion gigabytes or 800 billion Blu-Rays for just over 100 (unique) Blu-Rays per person globally

5. Data by Powers of 10 and 2 – 2^64 bytes is 16 Exabytes of Addressable Data [PC limit]

6. Data Max Velocity – 100 Gbps is Fastest Ethernet [8b/10b – 10 billion bytes per second]

7. How much is Truly Unique Data vs. Duplicated?

8. What is the Quality (Veracity) of this Data?
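The estimates in items 1 to 6 can be rechecked with a few lines of arithmetic. This is only a sketch under stated assumptions that are not on the slide: a 50GB Blu-Ray is taken as 50 * 2^30 bytes, a year as 365 days, and the US population as roughly 320 million.

```python
# Rough check of the Big Data estimates listed above.
# Assumptions (not from the slides): a 50GB Blu-Ray holds 50 * 2**30 bytes,
# a year has 365 days, and the US population is about 320 million.

DAILY_BYTES = 2.5e18            # 2.5 quintillion bytes per day (IBM estimate)
BLU_RAY_BYTES = 50 * 2**30      # assumed capacity of one 50GB Blu-Ray
WORLD_POPULATION = 7.5e9        # figure used in item 2
US_POPULATION = 320e6           # assumption for illustration only
TOTAL_GIGABYTES = 40e12         # 40 trillion gigabytes (item 4)

# Items 1 to 3: discs per day, then per person per year
discs_per_day = DAILY_BYTES / BLU_RAY_BYTES
discs_per_year = discs_per_day * 365
per_person_world = discs_per_year / WORLD_POPULATION
per_person_us = discs_per_year / US_POPULATION

# Item 4: total data expressed in 50GB discs
total_discs = TOTAL_GIGABYTES / 50
total_per_person = total_discs / WORLD_POPULATION

# Item 5: 2^64 bytes in units of 2^60 bytes
addressable_exabytes = 2**64 / 2**60

# Item 6: with 8b/10b coding, 10 line bits carry one data byte
ethernet_bytes_per_s = 100e9 / 10

print(f"Blu-Rays per day:            {discs_per_day:,.0f}")
print(f"Blu-Rays per person (world): {per_person_world:.2f} per year")
print(f"Blu-Rays per person (US):    {per_person_us:.1f} per year")
print(f"Total Blu-Rays per person:   {total_per_person:.1f}")
print(f"Addressable exabytes:        {addressable_exabytes:.0f}")
print(f"100 Gbps in bytes/second:    {ethernet_bytes_per_s:,.0f}")
```

The per-person figures depend on the assumed populations, so they should be read as order-of-magnitude estimates.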

Big Data Volume and Velocity Can Be Estimated as Shown
– Disk drives shipped and in use
– Online data only, or removable and archive media as well?
– Bit-rot (media eventually fails, limited storage lifetime)

Variety, Depends on Level of Data Duplication
– Enterprise Storage System Deduplication
– E.g. EMC Deduplication
– Internet Archive [petabytes] and Wayback machine, http://www.loc.gov/about/general-information/ [traditional volumes], Stanford Digital Repository, National Archives, National A/V Conservation

Veracity, perhaps Most Challenging Part
– Is the Data Correct – Not Corrupted
– Is it Valid – From a Known, Trusted Source, Corresponding to Metadata Description
– Has the Data Been Processed and if so, How?
– Is it Raw Data (from a sensor, user, other)?
– Veracity is difficult – E.g. http://berkeleyearth.org/about-data-set

Quiz #2

Let’s Go Over it …

Quiz #2 Average was 68.3, Std. Deviation was 17.5 - Primarily Need to Study Book More
Quiz #1 – 81.5, 8.5 (Ideal) – Mostly from In-Class Notes
Let’s Go Over Solutions Now with Book Citations
Solutions Provide References Back to the Book – Posted on Canvas as Well

Quiz #2 - Review

Equi-join is a specific type of Theta-Join where the Predicate tests for EQUIVALENCE ONLY

Review BOOK citations for Correct Answer Carefully before Next Quiz and Exam

Quiz #2 - Review

See p. 119, 132:
1) Selection [Restriction]
2) Projection [Projection]
3) Union [Join – Specific Union]
4) Set Difference [Codd Omits]
5) Cartesian Product [Permutation]

Encouraged! See Class Notes and Example of TC, RA, and Use of DISTINCT

Review BOOK citations for Correct Answer Carefully before Next Quiz and Exam

Required [Except Intersection]

Pearson Education © 2014

Intersection can be composed from Set Difference: R ∩ S = R – (R – S)
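A quick way to convince yourself of this identity is to treat relations as sets of tuples. The sketch below uses Python sets and made-up sample tuples; R and S are hypothetical union-compatible relations, not data from the book.

```python
# Sets of tuples stand in for two union-compatible relations.
# The sample tuples are hypothetical.
R = {(1, "Ann"), (2, "Bob"), (3, "Cy")}
S = {(2, "Bob"), (3, "Cy"), (4, "Di")}

only_in_r = R - S            # Set Difference: tuples in R that are not in S
composed = R - only_in_r     # R - (R - S)

# The composed result matches the built-in set intersection.
assert composed == R & S
print(sorted(composed))
```

Removing from R everything that is not in S leaves exactly the tuples the two relations share, which is why Intersection is not needed as a fundamental operation.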

Nice to Have! - Relational Algebra Operations – Composed from Required

Quiz #2 - Review

Review BOOK citations for Correct Answer Carefully before Next Quiz and Exam

PK, FK EQUIVALENCE

Book Says that EQUIVALENCE for Equi-Join is Predicate that Uses “=” – p. 126 (bottom)
This is Simplistic, especially for Multi-table Joins and PKs formed from more than One Attribute
E.g. if(X == Y) Can in Fact Involve a Complex Comparison
– E.g. if X is a vector = [1, 1, 3] and Y is a vector, then EQUIVALENCE requires Comparison of Each Component
– if((X[0] == Y[0]) && (X[1] == Y[1]) && (X[2] == Y[2]))
Likewise, Consider Simple Tuples of FirstName, LastName, DoB [PK=FirstName, LastName]
Another Relation [FK=FirstName, LastName] with Street Address, City, Zipcode
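The composite-key case can be sketched with Python's built-in sqlite3 module. SQLite is used here only as a convenient stand-in for MySQL, and the table names Person and Address plus all sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (
    FirstName TEXT, LastName TEXT, DoB TEXT,
    PRIMARY KEY (FirstName, LastName)
);
CREATE TABLE Address (
    FirstName TEXT, LastName TEXT,
    Street TEXT, City TEXT, Zipcode TEXT,
    FOREIGN KEY (FirstName, LastName) REFERENCES Person (FirstName, LastName)
);
-- Hypothetical sample rows: two people who share a first name
INSERT INTO Person VALUES ('Ann', 'Lee', '1990-01-01'), ('Ann', 'Ray', '1985-05-05');
INSERT INTO Address VALUES ('Ann', 'Lee', '1 Main St', 'Prescott', '86301'),
                           ('Ann', 'Ray', '2 Oak Ave', 'Phoenix', '85001');
""")

# Equi-join on the full composite key: every component must be equal.
full_key = conn.execute("""
    SELECT p.FirstName, p.LastName, a.City
    FROM Person p JOIN Address a
      ON p.FirstName = a.FirstName AND p.LastName = a.LastName
    ORDER BY p.LastName
""").fetchall()

# Comparing only one component of the key pairs up the wrong rows.
partial_key_count = conn.execute("""
    SELECT COUNT(*)
    FROM Person p JOIN Address a ON p.FirstName = a.FirstName
""").fetchone()[0]

print(full_key)
print(partial_key_count)
```

With the full key the join returns one row per person, while the partial comparison returns every pairing of the two people named Ann.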

Join Cheat Sheet http://www.codeproject.com/KB/database/Visual_SQL_Joins/Visual_SQL_JOINS_orig.jpg

JOINS You Must Know

MySQL Join Support – Inner, Cross, Left, Right, Outer, Natural, Multi-table with Predicates (Theta and Equi-Join)
Cross-Join [p. 171, Matches Theory p. 126]
Theta-Join [p. 170 – 3 Table Join]
Equi-Join [p. 168-169]
Natural-Join (Rarely Used, but Matches Theory on p. 127)
Inner-Join (Not in Book! But, Common in MySQL)
Alternative Form – Nested Queries [p. 164]
Other Joins You are Not Responsible For (Less Useful)
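To see how the join types in this list differ, the sketch below counts the rows each one produces over the same two small tables. It uses Python's sqlite3 module as a stand-in for MySQL (the join syntax shown is common to both), and the Branch and Staff tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, city TEXT);
CREATE TABLE Staff  (staffNo TEXT PRIMARY KEY, lName TEXT, branchNo TEXT);
-- Hypothetical rows: branch B3 has no staff, staff member S3 has no branch
INSERT INTO Branch VALUES ('B1', 'London'), ('B2', 'Glasgow'), ('B3', 'Bristol');
INSERT INTO Staff  VALUES ('S1', 'White', 'B1'), ('S2', 'Beech', 'B2'), ('S3', 'Ford', NULL);
""")

def count_rows(from_clause):
    """Return how many rows the given FROM clause produces."""
    return conn.execute(f"SELECT COUNT(*) FROM {from_clause}").fetchone()[0]

counts = {
    # Cartesian product: every Staff row paired with every Branch row
    "cross": count_rows("Staff CROSS JOIN Branch"),
    # Equi-join: a theta-join whose predicate uses only "="
    "equi": count_rows("Staff s INNER JOIN Branch b ON s.branchNo = b.branchNo"),
    # Theta-join with a non-equality predicate
    "theta": count_rows("Staff s INNER JOIN Branch b ON s.branchNo < b.branchNo"),
    # Natural join: equi-join on the identically named column branchNo
    "natural": count_rows("Staff NATURAL JOIN Branch"),
    # Left outer join: keeps Staff rows that have no matching Branch
    "left": count_rows("Staff s LEFT JOIN Branch b ON s.branchNo = b.branchNo"),
}

for kind, n in counts.items():
    print(f"{kind:8s}{n}")
```

The staff member with a NULL branchNo drops out of every inner join, because a comparison with NULL is never true, and reappears only in the left outer join.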

Connolly-Begg Chapter 9

ORDBMS Extensions to SQL (SQL:2011)

Part-2

Unstructured Data

BLOBs - Binary Large Objects
– Images
– Digital Video and Audio
– Digital Media
– Binary Data (Documents and Code), Perhaps Proprietary
– http://mercury.pr.erau.edu/~siewerts/extra/images/example-images/Moose-to-Skeleton.png
– http://mercury.pr.erau.edu/~siewerts/extra/images/example-images/Sled-Dogs.jpg
– http://mercury.pr.erau.edu/~siewerts/extra/images/example-images/korean-air-profile.jpg

CLOBs – Character Large Objects
– Log files and Traces (IT)
– Transaction Logs
– XML, HTML, XDS, etc. [Web documents typically via HTTP, HTTPS]
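As a small illustration of the two large-object kinds, the sketch below stores binary and character data side by side using Python's sqlite3 module. SQLite is an assumption here, not the course DBMS. It accepts BLOB and CLOB as declared column types, mapping CLOB to its TEXT storage class. The Media table and its contents are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one binary large object and one character large object
conn.execute("CREATE TABLE Media (name TEXT PRIMARY KEY, image BLOB, log CLOB)")

# The 8-byte PNG file signature stands in for real image data.
image_bytes = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])
# A few made-up log lines stand in for a transaction log.
log_text = "2015-09-23 12:00:00 INFO transaction committed\n" * 3

conn.execute("INSERT INTO Media VALUES (?, ?, ?)",
             ("example.png", image_bytes, log_text))

stored_image, stored_log = conn.execute(
    "SELECT image, log FROM Media WHERE name = ?", ("example.png",)
).fetchone()

# typeof() reports the storage class SQLite actually used for each value.
storage_classes = conn.execute(
    "SELECT typeof(image), typeof(log) FROM Media"
).fetchone()

print(len(stored_image), "bytes of binary data")
print(len(stored_log.splitlines()), "lines of character data")
print(storage_classes)
```

The binary value comes back as Python bytes and the character value as str, which mirrors the BLOB versus CLOB distinction: one is opaque to the DBMS, the other is text it can search and compare.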

OO Concepts – “Real World”

OOA – Object Oriented Analysis
– Define Class Hierarchies (Abstract Classes with Attributes) and Interfaces (Public, Private) and Methods (Operations)
– Inheritance and Multiple Inheritance

OOD – OO Design
– Encapsulation of Methods with Data (Attributes) for Abstract and Derived Classes
– Instantiation and Use of Objects [Use Cases]

OOP – Object Oriented Programming (Java, C++, …)
– Programming Language
– Direct Implementation of OOD
– Implementation of Re-useable OO Code Libraries
Boost - http://www.boost.org/
OpenCV [C++ version]
Many More … in other OOPLs

Classes Useful in Real World

E.g. Biology – Kingdom, Phylum, Class, Order, Genus, Species [Multiple Inheritance Examples], Proven Use
Parts – Components compose Sub-system(s) compose System(s) compose System of Systems
Supports Re-Use of Objects Instantiated from Class Hierarchy
Multiple Inheritance – Odd?
Can be Abstract, Derived and Concrete
– E.g. Mathematical, Data Structures, Image Processing
– Organization of Information (Classes in Web Ontology Language, OWL)
– Simulation of Physical Systems
– Most Often Software Libraries

http://en.wikipedia.org/wiki/Platypus#mediaviewer/File:Wild_Platypus_4.jpg

https://www.youtube.com/watch?v=kDay5OWDPn4#t=26

Quick Review of OO [not just C++]

Encapsulation of Data and Methods in an Instantiated Object
Objects are Instances from a Class Hierarchy
– Classes Define Encapsulated Data and Methods
Virtual Functions can Be Refined
Pure Virtual Functions Defined in Abstract Classes must be Refined
– Can Inherit Data and Methods from Parent Classes
– Can In Fact Have Multiple Inheritance
– Instantiated Objects Call Dynamically Bound Methods [Determined at Runtime]

Enables Semantic Overload [Can be Done without OO too]
– Overloaded Functions (Methods), Resolved by Type Signatures or Subtype/Sub-class
– Overloaded Operators (E.g. math operators work not only on integers and real numbers, but also vectors, matrices, and complex numbers)
– Derived Data Types from Base types

Polymorphism
– Parametric – Re-useable Templates (E.g. Ada and Java Generic, C++ Template)
– Functional Semantic Overloading
– Dynamic or Subtype or Subclass Polymorphism using Late Binding

OOPLs – Smalltalk to more current Java, C++, Ada95, … CLOS
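The review points above can be seen together in a few lines of Python, used here only as a compact OOPL. All class names are invented for illustration: Shape plays the role of an abstract class with a pure virtual function, and Vector shows an overloaded operator.

```python
import math
from abc import ABC, abstractmethod

class Shape(ABC):
    """Abstract class: encapsulates data (name) with methods."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def area(self):
        """Counterpart of a pure virtual function: subclasses must refine it."""

    def describe(self):
        # self.area() is bound at runtime to the subclass implementation.
        return f"{self.name} with area {self.area():.2f}"

class Circle(Shape):
    def __init__(self, radius):
        super().__init__("circle")
        self.radius = radius

    def area(self):
        return math.pi * self.radius ** 2

class Rectangle(Shape):
    def __init__(self, width, height):
        super().__init__("rectangle")
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height

class Vector:
    """Operator overloading: '+' and '==' work component by component."""

    def __init__(self, *components):
        self.components = tuple(components)

    def __add__(self, other):
        return Vector(*(a + b for a, b in zip(self.components, other.components)))

    def __eq__(self, other):
        return isinstance(other, Vector) and self.components == other.components

# Subtype polymorphism: the same call works on any Shape subclass.
descriptions = [shape.describe() for shape in (Circle(1.0), Rectangle(2.0, 3.0))]
vector_sum = Vector(1, 1, 3) + Vector(1, 0, 0)

print(descriptions)
print(vector_sum.components)
```

Trying to instantiate Shape directly raises a TypeError, which is Python's way of enforcing that the abstract method must be refined in a derived class.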

Operator and Function Overloading

What is Required to Be OO?
Common Consensus is – Encapsulation, Class Hierarchy, Polymorphism (Parametric & Subtype or Subclass with Late Binding), Inheritance
Operator Overloading Not Required (E.g. Java Frowns Upon, No Support)
Some PLs have OO Features, but not All

http://en.wikipedia.org/wiki/Operator_overloading

Storing Objects in Relational Databases

One approach to achieving persistence with an OOPL is to use an RDBMS as the underlying storage engine.
– O2 – merged with Informix and acquired by IBM
– ObjectStore - http://www.objectstore.com/
– Objectivity - http://www.objectivity.com/products/objectivitydb
– Versant - http://www.actian.com/products/operational-databases/

Requires mapping class instances (i.e. objects) to one or more tuples distributed over one or more relations.
To handle class hierarchy, have two basic tasks to perform:
(1) design relations to represent class hierarchy;
(2) design how objects will be accessed.

Pearson Education © 2009

Storing Objects in Relational Databases

Mapping Classes to Relations

Number of strategies for mapping classes to relations, although each results in a loss of semantic information.

(1) Map each class or subclass to a relation:
Staff (staffNo, fName, lName, position, sex, DOB, salary)
Manager (staffNo, bonus, mgrStartDate)
SalesPersonnel (staffNo, salesArea, carAllowance)
Secretary (staffNo, typingSpeed)

Mapping Classes to Relations

(2) Map each subclass to a relation:
Manager (staffNo, fName, lName, position, sex, DOB, salary, bonus, mgrStartDate)
SalesPersonnel (staffNo, fName, lName, position, sex, DOB, salary, salesArea, carAllowance)
Secretary (staffNo, fName, lName, position, sex, DOB, salary, typingSpeed)

(3) Map the hierarchy to a single relation:
Staff (staffNo, fName, lName, position, sex, DOB, salary, bonus, mgrStartDate, salesArea, carAllowance, typingSpeed, typeFlag)
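The trade-off between the mapping strategies shows up as soon as an object has to be read back. The sketch below contrasts strategy (1) with strategy (3) using Python's sqlite3 module. The attribute names follow the relations above, while the table name StaffFlat, the column types, and every sample value are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Strategy (1): one relation per class or subclass
CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT,
                    position TEXT, sex TEXT, DOB TEXT, salary REAL);
CREATE TABLE Manager (staffNo TEXT PRIMARY KEY REFERENCES Staff (staffNo),
                      bonus REAL, mgrStartDate TEXT);
CREATE TABLE Secretary (staffNo TEXT PRIMARY KEY REFERENCES Staff (staffNo),
                        typingSpeed INTEGER);

-- Strategy (3): the whole hierarchy in a single relation with a type flag
-- (named StaffFlat here only so both strategies fit in one database)
CREATE TABLE StaffFlat (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT,
                        position TEXT, sex TEXT, DOB TEXT, salary REAL,
                        bonus REAL, mgrStartDate TEXT, salesArea TEXT,
                        carAllowance REAL, typingSpeed INTEGER, typeFlag TEXT);

-- Hypothetical sample objects: one manager and one secretary
INSERT INTO Staff VALUES
  ('S1', 'John', 'White', 'Manager', 'M', '1965-10-01', 30000),
  ('S2', 'Mary', 'Howe', 'Secretary', 'F', '1970-02-19', 9000);
INSERT INTO Manager VALUES ('S1', 2000, '2010-01-01');
INSERT INTO Secretary VALUES ('S2', 60);

INSERT INTO StaffFlat VALUES
  ('S1', 'John', 'White', 'Manager', 'M', '1965-10-01', 30000,
   2000, '2010-01-01', NULL, NULL, NULL, 'Manager'),
  ('S2', 'Mary', 'Howe', 'Secretary', 'F', '1970-02-19', 9000,
   NULL, NULL, NULL, NULL, 60, 'Secretary');
""")

# Strategy (1): rebuilding a Manager object needs a join with its superclass.
managers_by_join = conn.execute("""
    SELECT s.fName, m.bonus
    FROM Staff s JOIN Manager m ON s.staffNo = m.staffNo
""").fetchall()

# Strategy (3): no join, but the subclass is only recoverable from typeFlag
# and attributes of other subclasses sit in the row as NULLs.
managers_by_flag = conn.execute(
    "SELECT fName, bonus FROM StaffFlat WHERE typeFlag = 'Manager'"
).fetchall()
unused_in_manager_row = conn.execute("""
    SELECT (salesArea IS NULL) + (carAllowance IS NULL) + (typingSpeed IS NULL)
    FROM StaffFlat WHERE typeFlag = 'Manager'
""").fetchone()[0]

print(managers_by_join)
print(managers_by_flag)
print(unused_in_manager_row)
```

Both queries return the same manager, but in neither schema does the DBMS itself know that Manager is a subclass of Staff. That knowledge lives in the join or in the flag value, which is the loss of semantic information the slides refer to.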

ORDBMSs

RDBMSs currently dominant database technology with estimated sales of US$24 billion in 2011, expected to grow to US$37 billion by 2016.
Vendors of RDBMSs conscious of threat and promise of OODBMS.
Agree that RDBMSs not currently suited to advanced database applications, and added functionality is required.
Reject claim that extended RDBMSs will not provide sufficient functionality or will be too slow to cope adequately with new complexity.
Can remedy shortcomings of relational model by extending model with OO features.

ORDBMSs - Features

OO features being added include:
– user-extensible types,
– encapsulation,
– inheritance,
– polymorphism,
– dynamic binding of methods,
– complex objects including non-1NF objects,
– object identity.

ORDBMSs - Features

However, no single extended relational model.
All models:
– share basic relational tables and query language,
– all have some concept of ‘object’,
– some can store methods (or procedures or triggers).

Some analysts predict ORDBMS will have 50% larger share of market than RDBMS.

Stonebraker’s View

Advantages of ORDBMSs

Resolves many of known weaknesses of RDBMS.
Reuse and sharing:
– reuse comes from ability to extend server to perform standard functionality centrally;
– gives rise to increased productivity both for developer and end-user.
Preserves significant body of knowledge and experience gone into developing relational applications.

Disadvantages of ORDBMSs

Complexity.
Increased costs.
Proponents of relational approach believe simplicity and purity of relational model are lost.
Some believe RDBMS is being extended for what will be a minority of applications.
OO purists not attracted by extensions either.
SQL now extremely complex.

SQL:2011 - New OO Features

Type constructors for row types and reference types.
User-defined types (distinct types and structured types) that can participate in supertype/subtype relationships.
User-defined procedures, functions, methods, and operators.
Type constructors for collection types (arrays, sets, lists, and multisets).
Support for large objects – BLOBs and CLOBs.
Recursion.
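Two of these features, recursion and user-defined functions, can be tried out with Python's sqlite3 module. This is a sketch, not SQL:2011 itself: SQLite supports recursive queries through WITH RECURSIVE, but its user-defined functions are registered from the host language rather than with CREATE FUNCTION. The Staff table, the function name initials, and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical management hierarchy: managerNo refers back to staffNo
CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, fName TEXT, lName TEXT,
                    managerNo TEXT REFERENCES Staff (staffNo));
INSERT INTO Staff VALUES ('S1', 'John', 'White', NULL),
                         ('S2', 'Ann', 'Beech', 'S1'),
                         ('S3', 'David', 'Ford', 'S1'),
                         ('S4', 'Mary', 'Howe', 'S2');
""")

# Recursion: walk the hierarchy from the top, tracking depth.
reporting_levels = conn.execute("""
    WITH RECURSIVE Reports (staffNo, depth) AS (
        SELECT staffNo, 0 FROM Staff WHERE managerNo IS NULL
        UNION ALL
        SELECT s.staffNo, r.depth + 1
        FROM Staff s JOIN Reports r ON s.managerNo = r.staffNo
    )
    SELECT staffNo, depth FROM Reports ORDER BY depth, staffNo
""").fetchall()

# User-defined function: register a Python callable under an SQL name.
def initials(first_name, last_name):
    return first_name[0] + last_name[0]

conn.create_function("initials", 2, initials)
top_initials = conn.execute(
    "SELECT initials(fName, lName) FROM Staff WHERE staffNo = 'S1'"
).fetchone()[0]

print(reporting_levels)
print(top_initials)
```

The recursive query answers a question that plain relational algebra cannot express for a hierarchy of unknown depth, which is the motivation for adding recursion to the standard.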
