SURVEY OF RELATIONAL DATABASES
ME535 Final Paper
Rameen Taeb
June 9, 2014


Table of Contents
Relational Database Overview
  History
  Relational Database Structure
  Database Normalization
  Relational Algebra
Comparison to Alternative Database Models
  Introduction
  Non-Relational Commonly Used Database Models
  Hierarchical Database Model Overview
  Network Database Overview
  Comparison Summary
Application
  Relational Database Management Systems (RDBMSs)
  Industry Use
  Application Example
References


Introduction

A database is generally defined as an organized collection of data. The need to store data in an organized, repeatable, and stable manner arose with the invention and development of computers. As computing power increased, so did the need to manage and use larger sets of data. Databases were first used in the 1960s, following the availability of rudimentary storage hardware such as disks. The relational database model, invented in 1970, was adopted as computer hardware developed and computing resources became more abundant and affordable. A relational database enables a large amount of data to be stored in a highly structured manner, which simplifies data management, analysis, and search. It also allows relationships to be established between data in a seemingly unlimited number of tables contained within the database. These capabilities proved extremely useful in industry. Relational databases do have disadvantages compared with non-relational databases, which are less structured, less resource-demanding, and faster; nevertheless, relational databases have continued to see rapid growth across a multitude of applications. This document provides an overview of relational databases, compares them with alternative database models, and details applications and use cases.

Relational Database Overview

History

Relational databases were first defined in June 1970 by Edgar Codd as an alternative to existing forms of data storage such as the hierarchical and network models. The relational model differed from these models chiefly in that it located data by content rather than by following a series of links. This was accomplished via sets of tables, each of which stores a different parameter or dataset; the structure and application of relational databases are described in detail below. Unfortunately, despite the apparent value of relational databases, the computer hardware of the time was largely not powerful enough to use them as intended or to their theoretical capability. It was not until the mid-1980s that relational databases were widely used in industry. With the rapid development of computer hardware, by the early 1990s relational databases had become the primary model for large-scale data processing applications. Since then, their use has continued to grow rapidly, supported by growth in hardware and software capabilities. Relational databases are developed and managed via a variety of database languages and specialized applications, typically referred to as Relational Database Management Systems (RDBMSs). Two of the most widely used systems are Oracle Database and Microsoft SQL Server, which were initially developed in the late 1970s and the 1980s. End users use these applications to store, manage, query (or search), and analyze data.


Relational Database Structure

As previously stated, a relational database contains a collection of tables, each of which stores data for specific parameters or attributes. In the formal terminology of the relational model, each table is called a relation. Each table must have a primary key: a column or group of columns whose values uniquely identify each row. A relationship between two tables is established through a foreign key, a column (or group of columns) in one table whose values refer to the primary key of another table. Database languages such as SQL are used to define these relationships and to query across the related tables. The terminology and structure of a table are shown in Figure 1. An example of how relationships are established between tables through their keys is shown in Figure 2.

Figure 1 http://en.wikipedia.org/wiki/Relational_database

Figure 2 http://worldacademyonline.com/bookimages/25/pict_23.jpg
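As a minimal sketch of the primary key / foreign key relationship described above, the following Python uses the standard-library sqlite3 module with an in-memory SQLite database. The department and employee tables, their columns, and the rows are hypothetical, chosen only for illustration. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

# In-memory database; the schema is a hypothetical illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
CREATE TABLE department (
    depno INTEGER PRIMARY KEY,   -- primary key: uniquely identifies each row
    dname TEXT NOT NULL
);
CREATE TABLE employee (
    empno INTEGER PRIMARY KEY,
    ename TEXT NOT NULL,
    depno INTEGER NOT NULL REFERENCES department(depno)  -- foreign key
);
""")

conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [(1, "Accounting"), (2, "Engineering")])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(10, "Ada", 2), (11, "Ben", 1), (12, "Cy", 2)])

# The relationship is used in a query by joining on the key columns.
rows = conn.execute("""
    SELECT e.ename, d.dname
    FROM employee AS e JOIN department AS d ON e.depno = d.depno
    ORDER BY e.empno
""").fetchall()
print(rows)  # [('Ada', 'Engineering'), ('Ben', 'Accounting'), ('Cy', 'Engineering')]

# A row that references a non-existent department is rejected.
try:
    conn.execute("INSERT INTO employee VALUES (13, 'Dee', 99)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)  # True
```

The join shows the relationship being used in a query, and the rejected insert shows the foreign key keeping the two tables consistent.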

While the high-level structure of a relational database has been defined above, it describes only a single-level relationship. The value of relational databases lies in managing vast amounts of data with complex relationships that can be easily queried, or searched. The relational model offers nearly endless room for refining how the database is organized and structured. In large databases, this refinement is referred to as database normalization: the process of organizing the fields and tables of a relational database to minimize redundancy. Database normalization is a critical concept in industrial applications, which require large and complex relational databases. (Maier, 1983)


Database Normalization

The original inventor of the relational model, Edgar F. Codd, also introduced the concept of normalization in 1970. Codd defined several normal forms for relational databases, including:

• First Normal Form (1NF): the domain of each attribute contains only atomic (indivisible) values, and each attribute of a tuple holds a single value.
• Second Normal Form (2NF): in 1NF, and every non-prime attribute depends on the entirety of each candidate key (not on only part of one).
• Third Normal Form (3NF): in 2NF, and every non-prime attribute depends directly on the candidate keys, not transitively through another non-prime attribute.
• Boyce-Codd Normal Form (BCNF): a slightly stronger version of 3NF, in which the determinant of every nontrivial functional dependency must be a superkey; it handles certain corner cases that 3NF allows.

A relational database is typically considered normalized when it is in 3NF. E. F. Codd stated that the objectives of normalization beyond 1NF were:

1. To free the collection of relations from undesirable insertion, update and deletion dependencies;
2. To reduce the need for restructuring the collection of relations, as new types of data are introduced, and thus increase the life span of application programs;
3. To make the relational model more informative to users;
4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by.

(Jatana, Puri, Ahuja, Kathuria, & Gosain, 2012)
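The first of Codd's objectives, freeing relations from undesirable update dependencies, can be illustrated with a small Python sketch. The employee relation below is hypothetical: its key is empno, but dname depends on depno rather than directly on empno, so the relation is in 2NF but not 3NF. Splitting it into employee and department relations (modeled here as plain Python sets of tuples, not database tables) removes the repeated department names without losing information.

```python
# Hypothetical relation (empno, ename, depno, dname). The key is empno, and
# dname depends on depno rather than directly on empno (a transitive
# dependency), so the relation is in 2NF but not 3NF.
flat = {
    (10, "Ada", 2, "Engineering"),
    (11, "Ben", 1, "Accounting"),
    (12, "Cy", 2, "Engineering"),
}

def decompose(rows):
    """Split into employee(empno, ename, depno) and department(depno, dname)."""
    employee = {(e, n, d) for e, n, d, _ in rows}
    department = {(d, dn) for _, _, d, dn in rows}
    return employee, department

def rejoin(employee, department):
    """Natural join on depno, to confirm the split loses no information."""
    dname_of = dict(department)
    return {(e, n, d, dname_of[d]) for e, n, d in employee}

employee, department = decompose(flat)
print(sorted(department))                     # [(1, 'Accounting'), (2, 'Engineering')]
print(rejoin(employee, department) == flat)   # True

# Renaming department 2 must touch every matching row of the flat relation
# (missing one leaves the data inconsistent), but only one row after the split.
print(sum(1 for r in flat if r[2] == 2),
      sum(1 for r in department if r[0] == 2))  # 2 1
```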

Relational Algebra

Relational algebra is another tool for organizing data as sets: it is a formal system of operations on relations (sets of tuples). It is typically applied by first reducing the data to 1NF and then performing the main relational algebra operations on the resulting relations. Its most important application is providing the theoretical foundation for relational databases.

Relational algebra consists of a collection of operators for specifying queries; a relational algebra expression describes a step-by-step procedure for computing the answer. Basic operators include:

• Union ($R \cup S$): the set of all distinct tuples that appear in $R$, in $S$, or in both. For example, $A \cup B$ is the collection of points that are in $A$, in $B$, or in both.
• Projection ($\pi$): returns column values. $\pi_{a_1, \ldots, a_n}(R)$, where $\{a_1, \ldots, a_n\}$ is a set of attribute names, is the set obtained when every tuple of $R$ is restricted to those attributes.
• Selection ($\sigma$): returns the rows that meet some condition. $\sigma_{\varphi}(R)$, where $\varphi$ is a condition on the attributes of $R$, is the set of tuples of $R$ for which $\varphi$ is true.
• Cross product ($R \times S$): every combination of a tuple from $R$ with a tuple from $S$.
• Difference ($R - S$): the tuples that are in $R$ but not in $S$.
• Other operators can also be defined in terms of these basic operators.

Operation           Symbol   Text notation
Projection          π        PROJECT
Selection           σ        SELECT
Renaming            ρ        RENAME
Union               ∪        UNION
Intersection        ∩        INTERSECTION
Assignment          ←        <-
Cartesian product   ×        X
Join                ⋈        JOIN
Left outer join     ⟕        LEFT OUTER JOIN
Right outer join    ⟖        RIGHT OUTER JOIN
Full outer join     ⟗        FULL OUTER JOIN
Semijoin            ⋉        SEMIJOIN
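The basic operators, and the point that other operators can be defined in terms of them, can be sketched in Python. This is a minimal illustration under stated assumptions: a relation is modeled as a tuple of attribute names plus a set of row tuples, and the class and function names (Relation, union, project, equijoin, and so on) are hypothetical, not any library's API. Intersection, join, and semijoin are built only from the basic operators.

```python
from itertools import product

class Relation:
    """A relation: a tuple of attribute names plus a set of row tuples."""
    def __init__(self, attrs, rows):
        self.attrs = tuple(attrs)
        # A frozenset, so duplicate rows are eliminated as in set semantics.
        self.rows = frozenset(tuple(r) for r in rows)

    def __repr__(self):
        return f"Relation({self.attrs}, {sorted(self.rows)})"

def _check_compatible(r, s):
    if r.attrs != s.attrs:
        raise ValueError("operands must have the same attributes")

def union(r, s):         # tuples in r, in s, or in both
    _check_compatible(r, s)
    return Relation(r.attrs, r.rows | s.rows)

def difference(r, s):    # tuples in r but not in s
    _check_compatible(r, s)
    return Relation(r.attrs, r.rows - s.rows)

def project(r, attrs):   # restrict every tuple to the listed attributes
    idx = [r.attrs.index(a) for a in attrs]
    return Relation(attrs, {tuple(row[i] for i in idx) for row in r.rows})

def select(r, condition):  # keep tuples for which condition(tuple as dict) holds
    return Relation(r.attrs,
                    {row for row in r.rows if condition(dict(zip(r.attrs, row)))})

def rename(r, mapping):  # rename attributes, e.g. {"depno": "d_depno"}
    return Relation([mapping.get(a, a) for a in r.attrs], r.rows)

def cross(r, s):         # every pairing of a tuple of r with a tuple of s
    if set(r.attrs) & set(s.attrs):
        raise ValueError("rename clashing attributes first")
    return Relation(r.attrs + s.attrs, {a + b for a, b in product(r.rows, s.rows)})

# Derived operators, built only from the basic ones above.
def intersection(r, s):
    return difference(r, difference(r, s))

def equijoin(r, s, left, right):
    """r JOIN[left = right] s, as a selection over the cross product."""
    return select(cross(r, s), lambda t: t[left] == t[right])

def semijoin(r, s, left, right):
    """Tuples of r that match at least one tuple of s."""
    return project(equijoin(r, s, left, right), r.attrs)

A = Relation(["x"], [(1,), (2,)])
B = Relation(["x"], [(2,), (3,)])
emp = Relation(["empno", "depno"], [(10, 2), (11, 1), (12, 2)])
dept = rename(Relation(["depno", "dname"], [(1, "Accounting"), (3, "Sales")]),
              {"depno": "d_depno"})

print(union(A, B))         # Relation(('x',), [(1,), (2,), (3,)])
print(intersection(A, B))  # Relation(('x',), [(2,)])
print(project(select(emp, lambda t: t["depno"] == 2), ["empno"]))
# Relation(('empno',), [(10,), (12,)])
print(semijoin(emp, dept, "depno", "d_depno"))
# Relation(('empno', 'depno'), [(11, 1)])
```

Representing rows as a set mirrors the set semantics of relational algebra: projection automatically drops duplicate tuples, which SQL only does when DISTINCT is requested.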

These operators can be used to query the relational database and to establish relationships as designated by the user and/or software application. Relational algebra can also be used to represent SQL queries. One such example, from http://db.grussell.org/section011.html, follows.

Consider the following SQL, which finds the departments that have had employees on the 'Further Accounting' course.

SELECT DISTINCT dname
FROM department, course, empcourse, employee
WHERE cname = 'Further Accounting'
  AND course.courseno = empcourse.courseno
  AND empcourse.empno = employee.empno
  AND employee.depno = department.depno;

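The equivalent relational algebra is:

$$
\mathrm{PROJECT}_{dname}\Big(department \;\mathrm{JOIN}_{depno = depno}\;
  \mathrm{PROJECT}_{depno}\big(employee \;\mathrm{JOIN}_{empno = empno}\;
    \mathrm{PROJECT}_{empno}\big(empcourse \;\mathrm{JOIN}_{courseno = courseno}\;
      \mathrm{PROJECT}_{courseno}(\mathrm{SELECT}_{cname = \text{'Further Accounting'}}(course))\big)\big)\Big)
$$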
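To check that the SQL and the relational algebra above compute the same answer, the sketch below loads a few hypothetical rows into an in-memory SQLite database through Python's sqlite3 module. The column layout, department(depno, dname), employee(empno, depno), course(courseno, cname), and empcourse(empno, courseno), is inferred from the query and is an assumption, as is all of the data. The second query is a hand translation of the algebra into nested SQL subqueries, one per PROJECT/JOIN step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (depno INTEGER PRIMARY KEY, dname TEXT);
CREATE TABLE employee   (empno INTEGER PRIMARY KEY, depno INTEGER);
CREATE TABLE course     (courseno INTEGER PRIMARY KEY, cname TEXT);
CREATE TABLE empcourse  (empno INTEGER, courseno INTEGER,
                         PRIMARY KEY (empno, courseno));
INSERT INTO department VALUES (1, 'Accounting'), (2, 'Engineering'), (3, 'Sales');
INSERT INTO employee   VALUES (10, 1), (11, 2), (12, 2), (13, 3);
INSERT INTO course     VALUES (100, 'Further Accounting'), (101, 'Welding');
INSERT INTO empcourse  VALUES (10, 100), (11, 100), (12, 100), (13, 101);
""")

# The SQL from the text, with a standard single-quoted string literal.
sql_query = """
SELECT DISTINCT dname
FROM department, course, empcourse, employee
WHERE cname = 'Further Accounting'
  AND course.courseno = empcourse.courseno
  AND empcourse.empno = employee.empno
  AND employee.depno = department.depno
"""

# Hand translation of the algebra, innermost step first:
# SELECT on course -> PROJECT courseno -> JOIN empcourse -> PROJECT empno
# -> JOIN employee -> PROJECT depno -> JOIN department -> PROJECT dname.
# Each PROJECT uses DISTINCT because algebra results are sets.
algebra_query = """
SELECT DISTINCT d.dname
FROM department AS d
JOIN (SELECT DISTINCT e.depno AS depno
      FROM employee AS e
      JOIN (SELECT DISTINCT ec.empno AS empno
            FROM empcourse AS ec
            JOIN (SELECT DISTINCT courseno FROM course
                  WHERE cname = 'Further Accounting') AS c
              ON ec.courseno = c.courseno) AS ec2
        ON e.empno = ec2.empno) AS e2
  ON d.depno = e2.depno
"""

from_sql = sorted(row[0] for row in conn.execute(sql_query))
from_algebra = sorted(row[0] for row in conn.execute(algebra_query))
print(from_sql)                  # ['Accounting', 'Engineering']
print(from_sql == from_algebra)  # True
```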


Comparison to Alternative Database Models

Introduction

Several alternatives to the relational database model exist. Given the wide spectrum of computing power, resource limitations, applications, and user priorities, it is important to fully understand alternative database models and their respective advantages and disadvantages. This understanding ultimately enables the development of a database that is not only well suited to data storage but also optimized for the data analysis, processing, and other applications the end user requires.

Non-Relational Commonly Used Database Models

Several non-relational database models have been, and remain, widely used. Each model has advantages and disadvantages, so it is critical for end users to truly understand their needs before selecting a model. It is also important to understand that a database model fundamentally dictates the logical structure in which data is stored and manipulated. While the relational model arguably provides the most value, its high level of functionality may not be needed, and could even be detrimental, in specific scenarios such as those with limited resources. Below is a summary of commonly used database models.

Common logical data models:
• Hierarchical database model
• Network model
• Relational model

Common physical data models:
• Inverted Matrix
• Flat File

Other:
• Associative Model
• Semantic Model

This paper focuses on the common logical data models, specifically the hierarchical and network models, as they relate to relational databases. That said, it is worth noting that flat files include spreadsheets such as Excel, which are widely used across nearly every industry.


Hierarchical Database Model Overview

A hierarchical database model organizes data into a tree structure that establishes parent-to-child relationships. Each such relationship is one-to-many: a parent can have many children, but each child can have only one parent. This 1:N mapping is very simple to understand and interpret; however, it has significant limitations for representing complex relationships. (Mohamed, Altrafi, & Ismail, 2014)

Figure 3 http://en.wikipedia.org/wiki/Database_model
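A hierarchical model can be sketched in Python as a mapping from each record to its single parent. The record names are hypothetical, and a dictionary is only a stand-in for how a hierarchical DBMS actually stores records; the point is that allowing one parent field per record is exactly what enforces the 1:N rule.

```python
# Hypothetical records: each stores exactly ONE parent, which is what makes
# the structure a tree (a parent may have many children: 1:N).
parent_of = {
    "Company": None,           # the root has no parent
    "Engineering": "Company",
    "Sales": "Company",
    "Ada": "Engineering",
    "Cy": "Engineering",
    "Ben": "Sales",
}

def children(record):
    """All records whose single parent is `record`."""
    return sorted(c for c, p in parent_of.items() if p == record)

def path_from_root(record):
    """Records are reached by walking the tree from the root down."""
    path = []
    while record is not None:
        path.append(record)
        record = parent_of[record]
    return path[::-1]

print(children("Engineering"))  # ['Ada', 'Cy']
print(path_from_root("Ben"))    # ['Company', 'Sales', 'Ben']

# Limitation: if Ada also worked for Sales, a single parent field could not
# record it; the record would have to be duplicated under Sales.
```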

Network Database Overview

Unlike the hierarchical model, the network model allows many-to-many relationships: a record can have more than one parent as well as many children. This produces a graph-like structure of linked nodes rather than a strict tree. (Mohamed, Altrafi, & Ismail, 2014)


Figure 4 http://en.wikipedia.org/wiki/Database_model
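The many-to-many links of the network model can be sketched in Python as two adjacency maps, one from each owner record to its members and one from each member to all of its owners. The project and employee names are hypothetical, and this is only an illustration of the graph shape, not how any network DBMS stores its records. The traversal function mirrors navigational access, where a program follows links from record to record and therefore needs to know the structure.

```python
from collections import defaultdict

# Hypothetical owner -> member links. "Ada" belongs to two projects, which a
# tree (one parent per record) could not express without duplication.
links = [
    ("Project A", "Ada"), ("Project A", "Ben"),
    ("Project B", "Ada"), ("Project B", "Cy"),
]

members = defaultdict(set)  # owner record  -> its member records
owners = defaultdict(set)   # member record -> all of its owner records
for owner, member in links:
    members[owner].add(member)
    owners[member].add(owner)

def reachable(start):
    """Follow links record to record (depth-first) from `start`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(members.get(node, set()) | owners.get(node, set()))
    return seen

print(sorted(owners["Ada"]))         # ['Project A', 'Project B']
print(sorted(members["Project B"]))  # ['Ada', 'Cy']
print(sorted(reachable("Ben")))      # ['Ada', 'Ben', 'Cy', 'Project A', 'Project B']
```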

Figure 5 shows that the computation time for the relational model is significantly greater than that for the network model. These data come from a real-world application and a DOE intended to measure the computational power and resources required to support relational versus network databases.

Figure 5 http://raima.com/network-model-vs-relational-model/

Comparison Summary

To assess more clearly the advantages and disadvantages of the relational model versus the network and hierarchical models, refer to Figure 6, which concisely summarizes the discussion above and compares the models, including their ideal use cases.


Network Model
  Pros: Supports many-to-many relationships; faster than relational; lower space requirements
  Cons: Harder to query (user needs to know the database structure)
  Use case: Use when resources are expensive and many-to-many relationships are necessary

Relational Model
  Pros: Supports many-to-many relationships; easy to use (user doesn't need to know the structure); well-defined tables; SQL is powerful when working with structured data; data independence (any one table contains data independently of others)
  Cons: Lots of additional tables to track keys; additional columns on every table for keys; slower to query; updating keys requires updating every reference to that key; lower scalability due to resource intensity; complex data can't be represented in tables; reliance on SQL makes dealing with unstructured data difficult
  Use case: Use when resources are cheap; use when data is structured well

Hierarchical Model
  Pros: Supports one-to-many relationships; faster than relational
  Cons: Hard to query (user needs to know the database structure)
  Use case: Use when resources are expensive. Network is preferred over hierarchical models

Figure 6: Comparison table of database models

Application

Relational Database Management Systems (RDBMSs)

Relational databases are widely used in industry. Typically, however, it is software applications, Relational Database Management Systems (RDBMSs), that ultimately enable users to work with these databases to meet their needs. RDBMSs can be considered tools for the user, and they add a great deal of value. Examples include:


• Oracle Database
• Oracle's MySQL
• Microsoft's SQL Server
• IBM's DB2
• SAP's Sybase Adaptive Server Enterprise

Let's briefly discuss Oracle Database. Oracle is typically used in industries requiring complex problem solving with large sets of data. It is a high-cost system that comes with a multitude of analysis and data management add-ons. Oracle supports many different syntaxes and provides strong security. It also includes a partitioning feature that allows tables to be partitioned based on different sets of keys. The capabilities of such a database are extensive and far greater than described here. If the functionality of Oracle Database is not required, several cheaper options such as MySQL exist, which require fewer resources and can be faster for specific applications. Conceptually simpler non-relational management systems are also available for hierarchical and network databases.

Industry Use

Relational databases are used in a variety of industries that have large amounts of resources and need highly structured data. Some key industries in which relational databases are predominantly used include:

• Insurance
• Banking
• Movies/Hollywood
• Manufacturing

These industries typically have the luxury of a larger resource base in terms of hardware and storage, and they require highly structured many-to-many relationships that are easy to query and manage. They also stand to benefit from the data analysis capabilities of relational databases: large data sets may contain a variety of correlations that, once understood, can deliver significant organizational value.

Application Example

To better understand how a relational database looks and works on a system level, it's best to introduce an example. Scenario: Imagine we are running several branches of video stores like Blockbuster©. There are three categories of data that we need to keep track of:


1. Customers
2. Stores
3. Inventory

That's easy enough; let's dig a little deeper into Inventory. Inventory is made up of films. Each film is identified by a film_id (for simplicity, think of it as the film's name); this is the primary key of the film table. Each film has many actors, and each actor has appeared in many films. This is a many-to-many relationship, which hierarchical databases do not support. So we also have an actor table whose primary key is actor_id (again, think of it as the actor's name). In a relational database, we need to "relate" actor_ids to film_ids, so a third table is required; let's call it the film_actor table. Each row of film_actor pairs one actor_id with one film_id, and the two columns together form the table's composite primary key (each is also a foreign key to the actor and film tables, respectively). Through this table, each actor can be related to many films and each film to many actors.

Given the above structure for our database, let's go through a typical user request. A woman walks in and asks for a movie with Brad Pitt. The employee types Brad Pitt into the "actor" field of his search tool, and all the movies featuring Brad Pitt show up. Behind the GUI, a query along the following lines is run against the database:

SELECT film.film_id
FROM film
JOIN film_actor ON film.film_id = film_actor.film_id
JOIN actor ON actor.actor_id = film_actor.actor_id
WHERE actor.actor_id = 'Brad Pitt';

This query selects (shows) the film_id (film name) of every film that film_actor links to the actor Brad Pitt. The search would be impossible without the film_actor table: the film and actor tables share no column, so the query must explicitly join through film_actor to find the films related to Brad Pitt.
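The film/actor example can be tried out with a minimal sketch in Python's standard-library sqlite3 module. Following the simplification in the text, film and actor names serve directly as film_id and actor_id (real schemas usually use numeric IDs with separate name columns). The rows use real films and cast members purely as sample data, and the helper functions films_with and actors_in are hypothetical names for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE film  (film_id  TEXT PRIMARY KEY);  -- film name doubles as the key
CREATE TABLE actor (actor_id TEXT PRIMARY KEY);  -- actor name doubles as the key
-- Junction table: ONE composite primary key made of two foreign keys.
CREATE TABLE film_actor (
    actor_id TEXT NOT NULL REFERENCES actor(actor_id),
    film_id  TEXT NOT NULL REFERENCES film(film_id),
    PRIMARY KEY (actor_id, film_id)
);
INSERT INTO film  VALUES ('Fight Club'), ('Ocean''s Eleven'), ('Gravity');
INSERT INTO actor VALUES ('Brad Pitt'), ('Edward Norton'),
                         ('George Clooney'), ('Sandra Bullock');
INSERT INTO film_actor VALUES
    ('Brad Pitt', 'Fight Club'),      ('Edward Norton', 'Fight Club'),
    ('Brad Pitt', 'Ocean''s Eleven'), ('George Clooney', 'Ocean''s Eleven'),
    ('George Clooney', 'Gravity'),    ('Sandra Bullock', 'Gravity');
""")

def films_with(actor):
    """Films related to an actor: the join must pass through film_actor."""
    rows = conn.execute("""
        SELECT film.film_id
        FROM film
        JOIN film_actor ON film.film_id = film_actor.film_id
        JOIN actor      ON actor.actor_id = film_actor.actor_id
        WHERE actor.actor_id = ?
        ORDER BY film.film_id
    """, (actor,)).fetchall()
    return [r[0] for r in rows]

def actors_in(film):
    """The same junction table answers the reverse question."""
    rows = conn.execute(
        "SELECT actor_id FROM film_actor WHERE film_id = ? ORDER BY actor_id",
        (film,)).fetchall()
    return [r[0] for r in rows]

print(films_with("Brad Pitt"))      # ['Fight Club', "Ocean's Eleven"]
print(actors_in("Ocean's Eleven"))  # ['Brad Pitt', 'George Clooney']

# The composite primary key forbids recording the same pairing twice.
try:
    conn.execute("INSERT INTO film_actor VALUES ('Brad Pitt', 'Fight Club')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```

The ? placeholders pass the actor or film name as a bound parameter rather than pasting it into the SQL string, which is the usual way to run user-entered searches such as the one typed at the store counter.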


References

Jatana, N., Puri, S., Ahuja, M., Kathuria, I., & Gosain, D. (2012). A Survey and Comparison of Relational and Non-Relational Database. International Journal of Engineering Research & Technology.

Maier, D. (1983). The Theory of Relational Databases. Computer Science Press.

Mohamed, M. A., Altrafi, O. G., & Ismail, M. O. (2014). Relational vs. NoSQL Databases: A Survey. International Journal of Computer and Information Technology.

Shanmugasundaram, J., Tufte, K., He, G., Zhang, C., DeWitt, D., & Naughton, J. (2008). Relational Databases for Querying XML Documents: Limitations and Opportunities. MIT OpenCourseWare.

Halpin, T., & Morgan, T. (2008). Information Modeling and Relational Databases.