CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D....

38
CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. [email protected]

Transcript of CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D....

Page 1: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

CSCI 1200Introduction to Computing for

Non Majors

CHAPTER 7

Tami Meredith, [email protected]

Page 2: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Chapter 7 Objectives Explain what a database is and

describe its basic structure Identify the kinds of problems that can

be best solved with database software Describe different kinds of database

software Describe database operations for

storing, sorting, updating, querying, and summarizing information

2

Page 3: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

The Google Guys Searchfor Tomorrow Handles hundreds of

millions of queries a day

Launched by two Ph.D. students (Larry Page and Sergey Brin)

In 2004, Google offered shares to the public

Google is an example of a "text database"

3

Page 4: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Get to know your search engines

Be specific when you search

Ask questions Use quotes to

narrow your search Know your pluses

and minuses

More is usually better than less

Refine your queries and try again – search is a process, not an action

Explore keywords Dig, don't just look

at the first page of results

4

Web Search

Page 5: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

What is a Database?

Database: an organized or structured collection of information/data that can easily be accessed (retrieval/search), managed, and updated

Databases make it easier to:› Store large quantities of information› Retrieve information quickly and flexibly › Organize and reorganize information › Print and distribute information in a variety of

ways

5

Page 6: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Types of DatabasesMedia Databases: music, movie, and photo

libraries – often based on keywords, annotations, and meta information

PIM (personal information manager): electronic organizer

GIS (geographical information systems): include geographic and demographic data in map form

Web databases: large scale textual databases for HTML and related content – also consider the tags as well as the content

Relational Databases: General purpose systems based on SQL

Custom systems for "Big Data"6

Page 7: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Database Anatomy

Database: organized collection of information stored in a computer

Database program: software tool for organizing storage and retrieval of that information

7

Page 8: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Database Anatomy

Table: collection of related information› Like a single page of a spreadsheet

Record: a single row in a table› information related to one person, product, or event› Each discrete chunk of information in a record is a

field Field: A cell in a table (intersection of a row and

a column)› holds different types of data: numeric (height,

income), text (title, name), structured (date, address), computed (age), reference (parent, sibling)

8

Page 9: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Database Operations Import: receive data in some form (often as text or

CSV) Add: New records Update: Change existing records Delete: Unneeded records Browse: navigate through information (not very useful) Query: find records that match specific criteria (search

and retrieve) Sort: rearrange records (alphabetically or numerically) Print: reports, labels, and form letters Export: to another program

9

Page 10: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Database Access

Structured Query Language (SQL): standard language for programming complex queries› Graphical user interfaces allow point-and-click queries

SQL statements are understood by MS Access, MS SQL Server, DB2, Oracle, Sybase, MySQL and others.

SQL is a sublanguage that:› Uses high-level, easy-to-understand statements› Has a relational database orientation› Is portable across a wide range of systems

10

Page 11: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Customers: A Simple Table

ID CustomerName

ContactName

Address City PostCode

Country

1 Alfreds Futterkiste

Maria Anders Obere Str. 57 Berlin 12209 Germany

2 Ana Trujillo Emparedados y helados

Ana Trujillo Avda. de la Constitución 2222

México D.F.

05021 Mexico

3 Antonio Moreno Taquería

Antonio Moreno Mataderos 2312 México D.F.

05023 Mexico

4 Around the Horn Thomas Hardy 120 Hanover Sq. London WA1 1DP UK

5 Berglunds snabbköp

Christina Berglund

Berguvsvägen 8 Luleå S-958 22 Sweden

11

Page 12: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

SQL Queries

Describe desired result – let the system figure out how to get the data for you

You need to know how the tables are structured or you'll be unable to do anything

Format:SELECT (which columns) FROM (which table) WHERE (conditions that apply)

Example:SELECT CustomerName FROM CustomersWHERE Country=`Mexico`;

Page 13: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

SELECT: Notes

Conditions use: =, <, <=, >, >=, <> Conditions combined using AND, OR,

NOT Domain-specific conditions also exist: e.g.,

LIKE SQL implements multi-sets, duplicate

records are possible unless DISTINCT is used

SELECT * identifies all columnsE.g., SELECT * FROM STUDENTS;

Page 14: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Simple SQL Queries

SELECT * FROM Customers; SELECT CustomerName, City FROM

Customers; SELECT DISTINCT City FROM

Customers; SELECT * FROM Customers

WHERE Country='Mexico'; SELECT * FROM Customers WHERE

ID=1;14

Page 15: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

SQL Queries with AND & OR

SELECT * FROM CustomersWHERE Country='Germany'AND City='Berlin';

SELECT * FROM CustomersWHERE City='Berlin'OR City='London';

SELECT * FROM CustomersWHERE Country='Germany'AND (City='Berlin' OR City='München');

15

Page 16: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Sorting Results in SQL

SELECT * FROM CustomersORDER BY Country;

SELECT * FROM CustomersORDER BY Country, CustomerName;

SELECT * FROM CustomersORDER BY Country ASC, CustomerName DESC;

16

Page 17: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Joins

Getting and merging data from two tables

The reason that Databases can do so much more than a spreadsheet

Done with multiple FROM clausesSELECT EMPLOYEE.Name , DEPARTMENT.LocationFROM EMPLOYEE, DEPARTMENT WHERE EMPLOYEE.Dnumber = DEPARTMENT.Dnumber;

Page 18: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Complex SQL

Basic SQL will do a lot, but to handle industrial tasks, SQL is a lot more complex

Entire "programs" can be written in SQL

Used in conjunction with other programming languages

E.g., SQL + PHP is the basis of most WWW content management systems (CMS) 18

Page 19: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Pattern Matching

SELECT Fname FROM EMPLOYEE WHERE Address LIKE `%Houston%`;

% Matches zero or more characters SELECT Fname FROM EMPLOYEE WHERE

PostalCode LIKE `B_____`; _ matches one character SELECT Fname FROM EMPLOYEE WHERE

(Salary BETWEEN 30000 AND 50000); BETWEEN = ((S >= 30000) AND (S

<=50000))

Page 20: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Aliasing

SELECT E.Name FROM EMPLOYEE AS E WHERE E.Age > 30;

SELECT E.Name FROM EMPLOYEE E WHERE …

INSERT INTO `menu` (`parentid`,`display`,`link`,`module`,`target`,`status`,`menuid`) VALUES((SELECT `id` FROM (SELECT * FROM `menu`) AS

something WHERE `display`='Projects'),'EMWerx',(SELECT `id` from `content_pages` WHERE `name`='EMWerx'),'Content','same','1','1');

Page 21: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Operations

SELECT .15*Salary AS TaxRate FROM EMPLOYEE;

Operations include: +, -, * , / for numbers

| | is string concatenation +,- increment or decrement a date,

time, or timestamp by an interval - of two times, dates, or timestamps

yields an interval

Page 22: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Nested Queries

IN returns TRUE if a value is a member of a multiset

SELECT DISTINCT PnumberFROM PROJECTWHERE Pnumber IN(SELECT Pnumber FROM PROJECT, DEPARTMENT,

EMPLOYEE WHERE Dnum=Dnumber AND Mgr_ssn = SSN AND Lname=`Smith`);

ALL, ANY (also SOME) also work on multisets SELECT Lname FROM EMPLOYEE WHERE

Salary > ALL (SELECT Salary FROM EMPLOYEE WHERE Dno=5);

Page 23: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Evaluation

Unqualified attributes refer to the innermost nested query, use an alias to rename a result and refer to its attributes for outer queries

If a WHERE clause contains a nested query the inner query is evaluated once for each outer query

Page 24: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Exists

Returns TRUE if there is at least one tuple/row in the argument

Can be used with NOT SELECT Lname FROM EMPLOYEE

WHERE NOT EXISTS (SELECT Pnumber FROM PROJECT);

UNIQUE returns TRUE if a multiset is a set --- i.e., has no duplicates

Page 25: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Explicit Joins

Possible to move a join condition to a FROM clause for making queries clearer

SELECT Fname FROM(EMPLOYEE JOIN DEPARTMENT ON

Dno=Dnumber)WHERE Dname=`Research`;

Page 26: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Aggregate Functions

COUNT, SUM, MIN, MAX, AVG SELECT AVG(SALARY), MIN(SALARY),

MAX(SALARY) FROM EMPLOYEE; SELECT COUNT(*) FROM EMPLOYEE,

DEPARTMENT WHERE Dno=Dnumber AND Dname<>‘Research’;

SELECT COUNT (DISTINCT Salary) FROM EMPLOYEE;

Page 27: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Database-Management Systems

Database Management Systems (DBMS) help manage the tables of a database

Conciseness: redundancy wastes space and may cause inaccuracies if duplicates not updated at the same time

Correctness: requires all tables to contain the correct information

Completeness: all tables should contain all available information

27

Page 28: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

3 Valued Logic Databases do not use traditional

"Boolean" logic based on True and False

They add in a third value, "Unknown," to handle missing data values

Adds a lot of complexity to database systems but also a lot of power and flexibility to prevent errors due to unknown/missing values

Page 29: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

RDBMS What makes a database relational?

› Tables related› Information spans a single table› Complex relationships are possible› Data may have attributes & values

29

Page 30: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Database Trends Downsizing and decentralizing

› Client/server approach: uses requests to access database

› Distributed databases: spread data across networks Data mining: discovery of hidden predictive

information› Helps to predict trends and patterns in data› Not always correct, use with care

Data scrubbing: Repairing incorrect (dirty) data Data warehouses: large, centralized systems

housing enterprise data

30

Page 31: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Fads Object-oriented databases: store objects

› Every object is an instance of a class.› The class includes the data and the operations› Not been as successful as hoped

Multidimensional database technology stores data in more than two dimensions› Data is organized in cubes› More hype and terminology than innovation

(basic theory always supported this idea)

31

Page 32: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Database Trends

Intelligent searches› More artificial

intelligence technology

› Natural language queries

› Contextual search tools will classify results

› Clustering of results by similarity

32

Page 33: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Database Management

Design and creation of a database is a highly specialised task

Database administration is an IT career SQL seems simple, but yet, is very

complex to fully master Companies are highly dependent on

their databases and can fail if errors are made

33

Page 34: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Get a professional, it's easy to screw up and the costs are very high if you do

Choose the right tool for the job

Knowing how you'll do retrieval is the key to successful design

Start with a plan; be prepared to change your plan

Make your data:› consise› complete› correct

Databases are only as good as their data

Query with care Too much data is

better than too little

34

Database Tips

Page 35: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

"Identity Theft" The new bogeyman Applies to any case of someone accessing your

accounts as "you" (e.g., boy/girlfriend using your bank card, cell phone, etc.)

Not as common as media makes it out to be Can be triggered by being careless online or by

errors in judgment Usually just a simple case of someone using

your account after you share access information No different to lending your bank card to the

wrong person

35

Page 36: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Privacy Concerns

Record matching uses a unique field to combine information from different databases

Privacy is decreasing› Data can be transmitted almost instantly› Workplace monitoring› Surveillance cameras

36

Page 37: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Don’t give out specifically identifying information: SIN, DoB, Driver's license number

Minimise other information you distribute

Be stingy with your email address – though spam filtering and multiple addresses make this less important now

Say no to sharing your personal information

Think before you post – everything lasts forever

Pay attention to privacy preferences

If you think there’s. incorrect or damaging information about you in a file, find out

Don't believe companies – they can be infiltrated, use dubious techniques

37

Privacy Concerns

Page 38: CSCI 1200 Introduction to Computing for Non Majors CHAPTER 7 Tami Meredith, Ph.D. tami.meredith@live.com.

Summary Database programs enable users to store,

organize, retrieve, communicate, and manage large amounts of information

There are many specialized databases such as media libraries, GIS, and PIMs

Organizations are moving toward a client/server approach that enables users to have access to data stored in servers throughout the organization’s network

The accumulation of data by government agencies and businesses is a growing threat to our right to privacy

38