Week 1. Before we start Course website Discussion Forum HW.

Post on 01-Jan-2016

219 views 2 download

Tags:

Transcript of Week 1. Before we start Course website Discussion Forum HW.

Week 1

Before we start

• Course website• http://www.cs.ucla.edu/classes/spring15/cs143/

• Discussion Forum• https://piazza.com/ucla/spring2015/cs143/home

• HW and Project Submisssion• https://ccle.ucla.edu/course/view/15S-COMSCI143-1

• Contact:• Hsuan Chiu cherylautumn@cs.ucla.edu• Yuxi Chenyuxichen@cs.ucla.edu

• Come to my office hour if you have questions• Hsuan Chiu BH 2432 Wed 4:30pm~5:30pm, Fri 12:00pm~2:00pm• Yuxi ChenBH 2432 Mon 1:30pm~2:30pm, Fri 12:00pm~2:00pm

• Start from next week, we will use Discussion 2B session as another office hour. Students enrolled in 2B please leverage the other two sessions.

The discussion class

• We usually try to• Go over material covered in the lecture very quickly• Answer your questions• Do more exercises• Go over the previous homework• Discuss projects• …

Today’s plan

• Introduction to DBMSs• Relational Algebra• Project 1

IntroductionChapter 1

Let’s start with a question

• How would you store the information of the books you have in your bookshelf at home?

• You can use a simple text file, or spreadsheet in your file system, and it’s going to work perfectly

• What if you need to store the information of all books in all the libraries in UCLA?

• Why it’s not wise to use the same approach?

Drawbacks of using file systems to store data

• Data redundancy and inconsistency• Multiple file formats, duplication of information in different files

• Difficulty in accessing data • Need to write a new program to carry out each new task

• Data isolation - multiple files and formats• Integrity problems

• Integrity constraints (e.g., account balance > 0) become “buried” in program code rather than being stated explicitly

• Hard to add new constraints or change existing ones

Drawbacks of using file systems (cont.)

• Atomicity of updates• Failures may leave database in an inconsistent state with partial updates

carried out• Example: Transfer of funds from one account to another should either

complete or not happen at all

• Concurrent access by multiple users• Concurrent access needed for performance• Uncontrolled concurrent accesses can lead to inconsistencies

• Example: Two people reading a balance (say 100) and updating it by withdrawing money (say 50 each) at the same time

• Security problems• Hard to provide user access to some, but not all, data

Solution?

• We need a system to support all the previous needs• That system is called Database Management System (DBMS)

Database management system (DBMS)

• DBMS contains information about a particular enterprise• Collection of interrelated data• Set of programs to access the data • An environment that is both convenient and efficient to use

• Database Applications:• Banking: transactions• Airlines: reservations, schedules• Universities: registration, grades• Sales: customers, products, purchases• Online retailers: order tracking, customized recommendations• Manufacturing: production, inventory, orders, supply chain• Human resources: employee records, salaries, tax deductions

• Databases touch all aspects of our lives

Instances and schemas

• Similar to types and variables in programming languages• Schema – the logical structure of the database

• Example: The database consists of information about a set of customers and accounts and the relationship between them

• Analogous to type information of a variable in a program• Instance – the actual content of the database at a particular point in time

• Analogous to the value of a variable

Schema for LibBook system

• Book (BID, BTitle, BPrice, BPubDate, …)• Library (LID, LName, LDept, …)• BoLi(BID, LID, Date, Number, …)• …

Instances for LibBook system

• Book

• Library

• BoLi

BID BTitle BPrice BPubDate

12 DB Concept 153 2009

32 OS Concept 102 2008

42 AI in practice 100 2005

15 Intro. To DB 50 2010

LID LName LDept

23 CSLib CS

12 Powell

65 Lib65 Humanities

BID LID Date Number

42 23 2009 3

12 23 2009 1

12 12 2005 2

Today: Relational Algebra

• Way to ask queries of relations.• Operators

Operation Symbol Explanation

Selection σ Selects rows

Projection π Selects columns

Union ⋃

Intersection ⋂

Cross-product ×

Join ⋈ Join tables on some condition

Division / Later..

Introduction to the Relational Algebra

Relation schema and instance

• A1, A2, …, An are attributes• R = (A1, A2, …, An ) is a relation schema

Example:

instructor = (ID, name, dept_name, salary)

• Formally, given sets D1, D2, …., Dn, a relation r is a subset of

D1 x D2 x … x Dn

• One (set) of the attributes should be key (identifier)

• Order of tuples is irrelevant

Models = Relations (almost)

Attribute = ColumnTuple = RowDomain = Allowed values for each columnKey = Row’s Identifier in the tableCardinality = Number of rowsDegree = Number of columns

• So from now on, we use table and relation interchangeably. • Although tables are ordered and can have duplicate items.

Database

• A database consists of multiple relations (tables)

• Information about an enterprise is broken up into parts

instructor student advisor

• Bad design: univ (instructor -ID, name, dept_name, salary, student_Id, ..)results in

• repetition of information (e.g., two students have the same instructor)

• the need for null values (e.g., represent an student with no advisor)

• Normalization theory (Chapter 7) deals with how to design “good” relational schemas

Relational query languages

• Now that we have a collection of relations as a relational database, we need a language to access (query) our database:

• Procedural vs non-procedural, or declarative• Relational operators

Selection

• Select rows• Example: σbirth_year < 1950(R)

Username Email Birth Year

meep meep@gmail.com 1920

notmeep notMeep@gmail.com 1920

beep beep@gmail.com 1980

beepboop blitzcrank@gmail.com 1985

Selection

• Select rows• Example: σbirth_year < 1950(R)

Username Email Birth Year

meep meep@gmail.com 1920

notmeep notMeep@gmail.com 1920

beep beep@gmail.com 1980

beepboop blitzcrank@gmail.com 1985

Projection

• Select columns• Example: πusername, email(R)

Username Email Birth Year

meep meep@gmail.com 1920

notmeep notMeep@gmail.com 1920

beep beep@gmail.com 1980

beepboop blitzcrank@gmail.com 1985

Projection

• Select columns• Example: πusername, email(R)

Username Email Birth Year

meep meep@gmail.com 1920

notmeep notMeep@gmail.com 1920

beep beep@gmail.com 1980

beepboop blitzcrank@gmail.com 1985

Selection & Projection

• What’s the difference between the following:– σbirth_year < 1950(πusername, email(R))– πusername, email(σbirth_year < 1950(R))

Username Email Birth Year

meep meep@gmail.com 1920

notmeep notMeep@gmail.com 1920

beep beep@gmail.com 1980

beepboop blitzcrank@gmail.com 1985

Selection & Projection

• What’s the difference between the following:– σbirth_year < 1950(πusername, email(R))– πusername, email(σbirth_year < 1950(R))

Username Email Birth Year

meep meep@gmail.com 1920

notmeep notMeep@gmail.com 1920

beep beep@gmail.com 1980

beepboop blitzcrank@gmail.com 1985

Selection & Projection

• What’s the difference between the following:– σbirth_year < 1950(πusername, email(R))– πusername, email(σbirth_year < 1950(R))

Username Email Birth Year

meep meep@gmail.com 1920

notmeep notMeep@gmail.com 1920

beep beep@gmail.com 1980

beepboop blitzcrank@gmail.com 1985

Selection & Projection

• What’s the difference between the following:– σbirth_year < 1950(πusername, email(R))– πusername, email(σbirth_year < 1950(R))

Username Email Birth Year

meep meep@gmail.com 1920

notmeep notMeep@gmail.com 1920

beep beep@gmail.com 1980

beepboop blitzcrank@gmail.com 1985

Selection & Projection

• What’s the difference between the following:– σbirth_year < 1950(πusername, email(R))– πusername, email(σbirth_year < 1950(R))

Username Email Birth Year

meep meep@gmail.com 1920

notmeep notMeep@gmail.com 1920

beep beep@gmail.com 1980

beepboop blitzcrank@gmail.com 1985

Union

• Set union between two relations with same fields

• Example: σbirth_year < 1950(R) U σbirth_year%2==0(R)

Username Email Birth Year

meep meep@gmail.com 1920

notmeep notMeep@gmail.com 1920

beep beep@gmail.com 1980

beepboop blitzcrank@gmail.com 1985

Union

• Set union between two relations with same fields• Example: σbirth_year < 1950(R) U σbirth_year%2==0(R)

Username Email Birth Year

meep meep@gmail.com 1920

notmeep notMeep@gmail.com 1920

Username Email Birth Year

meep meep@gmail.com 1920

notmeep notMeep@gmail.com 1920

beep beep@gmail.com 1980

Union

• Set union between two relations with same fields

• Example: σbirth_year < 1950(R) U σbirth_year%2==0(R)

Username Email Birth Year

meep meep@gmail.com 1920

notmeep notMeep@gmail.com 1920

beep beep@gmail.com 1980

Intersection

• Set intersection between two relations with same fields

• Example: σbirth_year < 1950(R) σ⋂ birth_year%2==0(R)

Username Email Birth Year

meep meep@gmail.com 1920

notmeep notMeep@gmail.com 1920

beep beep@gmail.com 1980

beepboop blitzcrank@gmail.com 1985

Intersection

• Set intersection between two relations with same fields

• Example: σbirth_year < 1950(R) σ⋂ birth_year%2==0(R)Username Email Birth Year

meep meep@gmail.com 1920

notmeep notMeep@gmail.com 1920

Username Email Birth Year

meep meep@gmail.com 1920

notmeep notMeep@gmail.com 1920

beep beep@gmail.com 1980

Intersection

• Set intersection between two relations with same fields

• Example: σbirth_year < 1950(R) σ⋂ birth_year%2==0(R)

Username Email Birth Year

meep meep@gmail.com 1920

notmeep notMeep@gmail.com 1920

Set Difference

• A – B takes out all rows in B from A.• Example: σbirth_year%2==0(R) - σbirth_year < 1950(R)

Username Email Birth Year

meep meep@gmail.com 1920

notmeep notMeep@gmail.com 1920

beep beep@gmail.com 1980

beepboop blitzcrank@gmail.com 1985

Set Difference

• A – B takes out all rows in B from A.• Example: σbirth_year%2==0(R) - σbirth_year < 1950(R)

Username Email Birth Year

meep meep@gmail.com 1920

notmeep notMeep@gmail.com 1920

Username Email Birth Year

meep meep@gmail.com 1920

notmeep notMeep@gmail.com 1920

beep beep@gmail.com 1980

-

Set Difference

• A – B takes out all rows in B from A.• Example: σbirth_year%2==0(R) - σbirth_year < 1950(R)

Username Email Birth Year

beep beep@gmail.com 1980

Cross product

• A x B takes all rows from A and combines them with all rows from B.

• Example: πusername(R) x πbirth_year(R)

Username Email Birth Year

meep meep@gmail.com 1920

notmeep notMeep@gmail.com 1920

beep beep@gmail.com 1980

beepboop blitzcrank@gmail.com 1985

Cross product

• A x B takes all rows from A and combines them with all rows from B.

• Example: πusername(R) x πbirth_year(R)

Username

meep

notmeep

beep

beepboop

Birth Year

1920

1980

1985X =

Cross product

• A x B takes all rows from A and combines them with all rows from B.

• Example: πusername(R) x πbirth_year (R)

Username Birth Year

meep 1920

meep 1980

meep 1985

notmeep 1920

notmeep 1980

notmeep 1985

beep 1920

beep 1980

beep 1985

beepboop 1920

beepboop 1980

beepboop 1985

Username

meep

notmeep

beep

beepboop

Birth Year

1920

1980

1985X =

Join

• A B joins A and B based on some column.⋈– Natural join: Default, joins on matching

columns.– Can also specify a particular join predicate.

• Example: πusername,email(R) π⋈ username,birth_year(R)

Username Email

meep meep@gmail.com

notmeep notMeep@gmail.com

Username Birth Year

meep 1920

beep 1980

notmeep 1985

Join

• A B joins A and B based on some column.⋈• Example: πusername,email(R) π⋈ username,birth_year(R)• 1. cross product

Username_1 Email Username_2 Birth_Year

meep meep@gmail.com meep 1920

meep meep@gmail.com beep 1980

meep meep@gmail.com notmeep 1985

notmeep notMeep@gmail.com meep 1920

notmeep notMeep@gmail.com beep 1980

notmeep notMeep@gmail.com notmeep 1985

Join

• A B joins A and B based on some column.⋈• Example: πusername,email(R) π⋈ username,birth_year(R)• 2. selection on username_1=username_2

Username_1 Email Username_2 Birth_Year

meep meep@gmail.com meep 1920

meep meep@gmail.com beep 1980

meep meep@gmail.com beepboop 1985

notmeep notMeep@gmail.com meep 1920

notmeep notMeep@gmail.com beep 1980

notmeep notMeep@gmail.com notmeep 1985

Join

• A B joins A and B based on some column.⋈• Example: πusername,email(R) π⋈ username,birth_year(R)• 3. project attrubutes

Username Email Birth_Year

meep meep@gmail.com 1920

notmeep notMeep@gmail.com 1985

Division● A / B projects the columns from A that are

not in B, while selecting elements of A that fully intersect with B.

● Think of it as opposite of cross-product!● Example: who takes both cs186 and cs162?

Person Class

meep cs186

meep cs162

moop cs186

moop cs61

beep cs186

beep cs162

Class

cs186

cs162

=

Person

meep

beep/

Division

● Can be expressed with the following equation:

● Things to eliminate: Things that are in the cross product of πx(A) and B that are NOT in A.

● Subtract the left hand side (projection) of those candidates from πx(A).

πx(A) - πx((πx(A) x B) - A)

Division example 2

pno weight

p1 22

p2 17

p3 17

P4 6

P5 30

sno pno

s1 p1

s2 p3

s2 p5

s3 p3

s3 p4

s4 p6

s5 p1

s5 p2

s5 p3

s5 p4

s5 p5

s5 p6

Table: spj

Table: p

Query: Find supplier id ‘sno’ from table spj that they supply all components of weight equal to 17

Division

Idea: Find the values that do not belong in the answer,

and remove them from the list of possible answers.

πx((πx(A) x B) - A)

πx(A) - πx((πx(A) x B) - A)

Division

• 1. formulate all possible answer

(πx(A) x B) : (πsno(spj) x πpno(σweight = 17(p) ))

sno

s1

s2

s3

s4

s5

pno

p2

p3

X =

Sno pno

s1 p2

s1 p3

s2 p2

s2 p3

s3 p2

s3 p3

s4 p2

s4 p3

s5 p2

s5 p3

Division• 2. find out disqualified answer (πx(A) x B) - A

Sno pno

s1 p2

s1 p3

s2 p2

s2 p3

s3 p2

s3 p3

s4 p2

s4 p3

s5 p2

s5 p3

sno pno

s1 p1

s2 p3

s2 p5

s3 p3

s3 p4

s4 p6

s5 p1

s5 p2

s5 p3

s5 p4

s5 p5

s5 p6

Sno pno

s1 p2

s1 p3

s2 p2

s3 p2

s4 p2

s4 p3

Find which supplier who do not support both p2 and p3

- =

Temp table:tmp

Division

• 3. remove those disqualified answer from the list of possible answers.

πx(A) - πx((πx(A) x B) - A)=πsno(spj)-πsno(tmp)

sno

s1

s2

s3

s4

s5

-

sno

s1

s2

s3

s4

sno

s5=

Division

• Since division operator is not implemented in SQL, you need to use more complex SQL query to present.

• But the basic concept is if you see a query expressed with “All”, you may need to consider about using division.

• “Which students are registered on ALL the courses given by Soini?” • “Which boys are registered on those courses that are taken by ALL t

he girls?”

Exercise

• Consider these relations in database. Primary key attributes are underlined. • employee ( eid, street, city)• works ( eid, c-name, salary)• company ( cid, c-name, city)• manager ( pid, mgrid)• Assume the companies may be located in several cities. Find all companies

located in every city in which U-Bank is located. • Find the eids and cities of residence of all employees who work for U-Bank. • Find the eid, streets, and cities of residence of all employees who work for U-

Bank and earn more than $10,000 per annum.• Find the eids of all employees in this database who live in the same city as the

company for which they work.• Assume the companies may be located in several cities. Find all companies

located in every city in which U-Bank is located.

Project 1A

• Set up environment. • VM Installation. http://yellowstone.cs.ucla.edu/~hchiu/cs143/• Download CS143.ova

• Run a test.• 1. start VM. • 2. create a php file.• 3. put under folder www.

• Demo: bmi and calculator.

Form example.

<html><body><form action="handle.php" method="GET">Name: <input type="text" name="name" />Age: <input type="text" name="age" /><input type="submit" /></form><?phpif($_GET["name"]){echo "Welcome ".$_GET["name"]."<br/>";echo "You are ".$_GET["age"]." years old.";}?></body></html>

Hint for project 1A

evalExample:<html><body><?php$equ= "3+4*5";eval("\$ans= $equ;");echo "ans= ".$ans;?></body></html>

Hint for project 1A

preg_match<html><body><? preg_match("/^(http:\/\/)?([^\/]+)/i", "http://www.php.net/index.html", $matches); print $host = $matches[0]."<BR>"; print $host = $matches[1]."<BR>"; print $host = $matches[2]."<BR>"; // get last two segments of host name preg_match("/[^\.\/]+\.[^\.\/]+$/","http://www.php.net",$matches); echo "domain name is: ".$matches[0]."\n"; ?> </body></html>

matches

• If matches is provided, then it is filled with the results of search. • $matches[0] will contain the text that matched the full pattern, • $matches[1] will have the text that matched the first captured

parenthesized subpattern, and so on.