Introduction to SQL · SQL is used in a lot of places Big database servers: SQL Server, MySQL,...

Post on 12-Aug-2020



Introduction to SQL

Ben Smith

Washington State University

SQL is used in a lot of places

Big database servers:

SQL Server, MySQL, Oracle, DB2

But programs can also connect to those servers:

SAS, Python, R
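As a rough illustration of a program talking to a database, here is a minimal Python sketch. It assumes an in-memory SQLite database as a stand-in for a real server; the `student` table and its row are made up for the example.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a real server.
conn = sqlite3.connect(":memory:")

# Hypothetical table and data, only for illustration.
conn.execute("CREATE TABLE student (emplid INTEGER, name TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Ada')")

# The program sends SQL text and gets rows back as Python tuples.
rows = conn.execute("SELECT emplid, name FROM student").fetchall()
print(rows)

conn.close()
```

Connecting to SQL Server, MySQL, Oracle, or DB2 follows the same pattern, but needs a driver for that server instead of `sqlite3`.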

Four Examples

SQLite in Firefox

MySQL & SQLite in R (Omitted)

MS SQL (using ODBC) in SAS

MS SQL Server Management Studio

Let’s talk about data types

Data types

Chars, Varchars, Text (strings)

Ints, Floats (binary numbers)

Decimal (base 10 number)

I’m proposing there is in fact only one datatype

Memory really looks like this

0 0 0 0 0 1 1 0 0 1 1 1 1 0 0 1 1 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 1 1 1 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 1 1 1 0 0 0 1 1 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0 1 0 0 1 0 1 0 1 1 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 1 1 1 1 0 0 1 1 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 1 1 1 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 1 1 0 0 1 1 1 0 0 0 1 0 0 0 0 0 0 1 1

Data types, really

If we group 8 bits together we can represent 256 different things. Let’s say we map those to characters of the alphabet

Using this method, a bunch of “bytes” (8-bit groups) make up a “string”
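The idea of bytes mapping to characters can be sketched in Python. The bit pattern below is made up for the example, and ASCII is assumed as the mapping from byte values to characters.

```python
# Hypothetical bit pattern: two groups of 8 bits.
bits = "01001000 01101001"

# Each group of 8 bits is one byte, a number from 0 to 255 (256 possible values).
byte_values = [int(group, 2) for group in bits.split()]

# Assumption: the bytes are mapped to characters using ASCII.
text = bytes(byte_values).decode("ascii")

print(byte_values)  # [72, 105]
print(text)         # Hi
print(2 ** 8)       # 256
```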

Floats (binary floating-point numbers) can be represented by taking 1 bit for the sign, some number of bits for the exponent (e.g. 8), and the rest for the fraction (e.g. 23)
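The 1 / 8 / 23 split can be inspected from Python. This is a sketch that assumes the common IEEE 754 single-precision layout; the helper name `float32_fields` is invented for the example.

```python
import struct

def float32_fields(x):
    """Split a number, stored as a 32-bit float, into (sign, exponent, fraction)."""
    # Pack as a big-endian 32-bit float, then reread the same 4 bytes as an integer.
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    sign = raw >> 31                # 1 bit
    exponent = (raw >> 23) & 0xFF   # 8 bits, stored with a bias of 127
    fraction = raw & 0x7FFFFF       # 23 bits
    return sign, exponent, fraction

sign, exponent, fraction = float32_fields(-6.5)
print(sign, exponent, fraction)  # 1 129 5242880

# Rebuild the value from its three parts.
value = (-1) ** sign * (1 + fraction / 2 ** 23) * 2 ** (exponent - 127)
print(value)  # -6.5
```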

Decimal Data Type

Just like 1/3 can’t be written exactly in base 10, there are numbers that can’t be written exactly in base 2 (0.1, for example)

The decimal data type solves this issue by storing each individual digit in multiple bits

About 100 times slower than float
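The rounding issue can be seen outside a database as well. The sketch below uses Python's `decimal` module as a stand-in for a SQL decimal column; that substitution is an assumption, and nothing here measures speed.

```python
from decimal import Decimal

# Binary floats: 0.1 and 0.2 have no exact base-2 representation.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)  # False

# Decimal values keep the base-10 digits exactly.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))  # True
```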

So what is a query

It’s a question with three parts:

What do I want

Where is it located

Under what conditions

Syntax

SELECT column, ...

FROM table

... JOIN table ON ...

WHERE column = VALUE

OR column LIKE 'VALUE'

AND column >= VALUE OR column IN(...)
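The three parts of a query can be tried out with a small runnable sketch. It assumes SQLite through Python's `sqlite3` module; the `student` table, its columns, and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, only for illustration.
conn.executescript("""
CREATE TABLE student (emplid INTEGER, name TEXT, major TEXT, credits INTEGER);
INSERT INTO student VALUES
  (1, 'Ada',   'MATH', 15),
  (2, 'Alan',  'CS',   12),
  (3, 'Grace', 'CS',    0);
""")

query = """
SELECT emplid, name            -- what do I want
FROM student                   -- where is it located
WHERE major IN ('CS', 'MATH')  -- under what conditions
  AND name LIKE 'A%'
  AND credits >= 12
ORDER BY emplid
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'Ada'), (2, 'Alan')]

conn.close()
```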

Think about Joins as Merges

That is, you are executing on each table independently, then merging the results
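The merge picture can be checked with a small sketch: run the join, then query each table on its own and merge the rows by hand. This assumes SQLite through Python's `sqlite3`, with invented `student` and `enrollment` tables. It illustrates the mental model only and is not a description of how a database engine actually executes a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and data, only for illustration.
conn.executescript("""
CREATE TABLE student (emplid INTEGER, name TEXT);
CREATE TABLE enrollment (emplid INTEGER, course TEXT);
INSERT INTO student VALUES (1, 'Ada'), (2, 'Alan'), (3, 'Grace');
INSERT INTO enrollment VALUES (1, 'STAT 101'), (1, 'CS 201'), (2, 'CS 201');
""")

# The join, written in SQL.
joined = conn.execute("""
SELECT s.name, e.course
FROM student AS s
JOIN enrollment AS e ON e.emplid = s.emplid
ORDER BY s.name, e.course
""").fetchall()

# The same result, built by querying each table independently and merging.
students = conn.execute("SELECT emplid, name FROM student").fetchall()
enrollments = conn.execute("SELECT emplid, course FROM enrollment").fetchall()
merged = sorted(
    (name, course)
    for s_id, name in students
    for e_id, course in enrollments
    if s_id == e_id  # plays the role of the ON condition
)

print(joined == merged)  # True
conn.close()
```

Grace has no enrollment rows, so she does not appear in either result; that is the behaviour of a plain (inner) JOIN.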

Demo

Considering Complicated Conditions

Conditions can be nested with parentheses, just like in math

Example

WHERE

(

(student.degree_program_1_major_code IN (@majorone,@majortwo, @majorthree, @majorfour) AND student.[degree_program_1_level_code]=@levelcode AND student.degree_program_1_obj_start_date>bCensus.date AND student.degree_program_1_obj_start_date<=aCensus.date AND student.[center_1_code]=@center_code)

OR

(student.degree_program_2_major_code IN (@majorone,@majortwo, @majorthree, @majorfour) AND student.[degree_program_2_level_code]=@levelcode AND student.degree_program_2_obj_start_date>bCensus.date AND student.degree_program_2_obj_start_date<=aCensus.date AND student.[center_2_code]=@center_code)

)

AND student.enrollment_status_code=3 AND student.total_credits>0 AND student.term_code=CAST((CAST((@myyear+1) AS char(4))+'3') AS INT) AND student.class_standing_code=6
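A much smaller version of the same pattern shows why the outer parentheses matter: AND binds more tightly than OR, so leaving them out changes which rows come back. This sketch assumes SQLite through Python's `sqlite3`; the simplified `student` table and its columns are invented and are not the columns used in the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified table: two possible majors and an enrolled flag.
conn.executescript("""
CREATE TABLE student (emplid INTEGER, major_1 TEXT, major_2 TEXT, enrolled INTEGER);
INSERT INTO student VALUES
  (1, 'CS',   'ART', 1),
  (2, 'MATH', 'CS',  1),
  (3, 'CS',   'ART', 0),
  (4, 'MATH', 'CS',  0);
""")

# Parentheses: (either major is CS) AND enrolled.
grouped = conn.execute("""
SELECT emplid FROM student
WHERE (major_1 = 'CS' OR major_2 = 'CS') AND enrolled = 1
ORDER BY emplid
""").fetchall()

# No parentheses: read as major_1 = 'CS' OR (major_2 = 'CS' AND enrolled = 1).
ungrouped = conn.execute("""
SELECT emplid FROM student
WHERE major_1 = 'CS' OR major_2 = 'CS' AND enrolled = 1
ORDER BY emplid
""").fetchall()

print(grouped)    # [(1,), (2,)]
print(ungrouped)  # [(1,), (2,), (3,)]
conn.close()
```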

Demo

Groups

So you are always getting a set of results THEN grouping them

Aggregation functions work WITH the group

Example

SELECT COUNT(DISTINCT emplid) AS c,

acad_prog, sex

...

GROUP BY acad_prog, sex

Example

SELECT COUNT(DISTINCT emplid) AS c,

acad_prog, sex, MAX(term_gpa)

...

GROUP BY acad_prog, sex
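To see the grouping in action, here is a runnable sketch of the second example. The slides elide the FROM clause, so the table name `term_enrollment` and all of its rows are assumptions; SQLite through Python's `sqlite3` is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per student per term.
conn.executescript("""
CREATE TABLE term_enrollment (emplid INTEGER, acad_prog TEXT, sex TEXT, term_gpa REAL);
INSERT INTO term_enrollment VALUES
  (1, 'UG', 'F', 3.5),
  (1, 'UG', 'F', 3.9),
  (2, 'UG', 'F', 3.0),
  (3, 'UG', 'M', 2.8),
  (4, 'GR', 'M', 3.7);
""")

rows = conn.execute("""
SELECT COUNT(DISTINCT emplid) AS c,
       acad_prog, sex, MAX(term_gpa)
FROM term_enrollment
GROUP BY acad_prog, sex
ORDER BY acad_prog, sex
""").fetchall()

for row in rows:
    print(row)
# (1, 'GR', 'M', 3.7)
# (2, 'UG', 'F', 3.9)
# (1, 'UG', 'M', 2.8)
conn.close()
```

Student 1 has two rows in the UG/F group but is counted once because of DISTINCT, while MAX looks at every row in the group.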

Demo

Subquery

It is a query inside of a query

It can exist anywhere in the query

Can be a slow method; if an approach with Joins is available, use that instead

Example

SELECT emplid, (

SELECT TOP 1 gpa

FROM student WHERE s.emplid=emplid

ORDER BY gpa DESC

)

FROM student AS s
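The subquery above can be compared with a join-based version in a small sketch. It assumes SQLite through Python's `sqlite3`, so `LIMIT 1` replaces SQL Server's `TOP 1`; the rows, and the aliases `inner_s`, `m`, and `best_gpa`, are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: a student can have several gpa rows.
conn.executescript("""
CREATE TABLE student (emplid INTEGER, gpa REAL);
INSERT INTO student VALUES (1, 3.2), (1, 3.8), (2, 2.9);
""")

# Subquery version: the inner query runs once per outer row.
via_subquery = conn.execute("""
SELECT s.emplid, (
    SELECT inner_s.gpa
    FROM student AS inner_s
    WHERE inner_s.emplid = s.emplid
    ORDER BY inner_s.gpa DESC
    LIMIT 1
) AS best_gpa
FROM student AS s
ORDER BY s.emplid
""").fetchall()

# Join version: compute each student's best gpa once, then join to it.
via_join = conn.execute("""
SELECT s.emplid, m.best_gpa
FROM student AS s
JOIN (
    SELECT emplid, MAX(gpa) AS best_gpa
    FROM student
    GROUP BY emplid
) AS m ON m.emplid = s.emplid
ORDER BY s.emplid
""").fetchall()

print(via_subquery)              # [(1, 3.8), (1, 3.8), (2, 2.9)]
print(via_subquery == via_join)  # True
conn.close()
```

Both versions return one row per row of `student`; whether the join is actually faster depends on the database and the data, so it is worth measuring.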

Demo