Displaying Query Results

Displaying Query Results Group 8- Last but not the Least Arturo Cantillep/Greg Lynch/Nora Troy-Shaw/Yanxin Zhao


Page 1: Displaying Query Results

Displaying Query Results

• Group 8 - Last but not the Least

• Arturo Cantillep / Greg Lynch / Nora Troy-Shaw / Yanxin Zhao

Page 2: Displaying Query Results

Displaying Query Results

.1 Introduction: Presenting Data / What the Boss Wants

.2 Its Power: Ordering Data / Producing an Ordered Report / Enhancing Query Output / Usage (Business Scenario)

.3 Summarizing Data / Summary Functions

.4 Count Functions

.5 Data Grouping: Groups and Subgroups

.6 Using the WHERE clause

.7 Using the HAVING clause

.8 Using the FIND function

.9 Using Boolean expressions

.10 Summary

Page 3: Displaying Query Results

.1 Introduction

• SQL: a one-stop-shop alternative to the following:

• DATA step: creates new tables, creates new variables within a table, and adds (inserts) data

• PROC REPORT: produces summaries, counts, and means, in a nice format too

• PROC PRINT: presents the results to the user
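The following minimal sketch makes the "one-stop shop" point concrete. It assumes a small hypothetical WORK.DEMO table built from rows that appear in this deck's sample output. The traditional route needs PROC SORT followed by PROC PRINT. PROC SQL selects, sorts, and displays the same rows in a single step.

```sas
/* Hypothetical sample table; values are copied from this deck's output */
data work.demo;
   input Employee_ID Salary Tenure;
   datalines;
120166 30660 36
121081 30235 34
120129 30070 24
120125 32040 31
;
run;

/* Traditional route: sort a copy of the table, then print it */
proc sort data=work.demo out=work.demo_sorted;
   by descending Salary;
run;

proc print data=work.demo_sorted noobs;
   var Employee_ID Salary;
run;

/* PROC SQL route: one step selects, sorts, and displays */
proc sql;
   select Employee_ID, Salary
      from work.demo
      order by Salary desc;
quit;
```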

Page 4: Displaying Query Results

.1 Presenting Data

Objectives

• Display a query's result in a specified order.

• Use SAS formats, labels, and titles to enhance the appearance and usability of a query's output.

Business Scenario (a very realistic one)

• Your boss has instructed you to cut company costs by producing a list of older, highly paid sales staff who have served the longest in the company.

Page 5: Displaying Query Results

.1 What the Boss Wants

• Data on each employee's years of service, ranked in descending order

• Data on each employee's salary, in descending order

• Data on each employee's age, in descending order

Sample Report

1. Mr. X   25 (years of service)   $50,000 (salary)   40 (age) …

Page 6: Displaying Query Results

.2 Ordering Data

Use the ORDER BY clause to sort query results in a specific order.

• Ascending order is the default. For descending order, follow the column name with the DESC keyword.

• Order the query results by specifying any of the following:

– any column name from any table in the FROM clause, even if the column is not in the SELECT list

– a column name, or a number representing the position of an item in the SELECT list

– an expression

– a combination of any of the above, separated by commas
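The ORDER BY options above can be sketched as follows. The sketch again assumes a hypothetical WORK.DEMO table whose values are copied from this deck's sample output. When a query sorts by an item that is not in the SELECT list, PROC SQL writes a note to the log, but the query still runs.

```sas
/* Hypothetical sample table; values are copied from this deck's output */
data work.demo;
   input Employee_ID Salary Tenure;
   datalines;
120166 30660 36
121081 30235 34
120129 30070 24
120125 32040 31
;
run;

proc sql;
   /* 2 = the second item in the SELECT list (Salary), descending */
   select Employee_ID, Salary
      from work.demo
      order by 2 desc;

   /* Tenure is not selected, but it can still drive the sort */
   select Employee_ID, Salary
      from work.demo
      order by Tenure desc;

   /* Sort on an expression (salary per year of tenure), then break ties by ID */
   select Employee_ID, Salary, Tenure
      from work.demo
      order by Salary / Tenure desc, Employee_ID;
quit;
```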

Page 7: Displaying Query Results

.2 Ordering Data (Our Boss's Wish List…)

From the sales staff database, list the employee ID, employee name, years of tenure with the company, salary, and age.

Example 1: Arrange the list by descending years of tenure (Chopping Block 1)

proc sql;
   /* tenure and Age are computed in whole years up to today's date */
   select employee_id, employee_name,
          int(yrdif(emp_Hire_date, today(), 'ACTUAL')) as tenure,
          Salary,
          int(yrdif(Birth_date, today(), 'ACTUAL')) as Age
      from arturo.salesstaff
      order by tenure desc;
quit;

Page 8: Displaying Query Results

.2 Ordering Data

PARTIAL PROC SQL OUTPUT ….

The SAS System 17:38 Wednesday, May 26, 2010 48

Employee ID  Employee_Name           tenure    Salary  Age
----------------------------------------------------------
120172       Comber, Edwin               36   $28,345   66
120174       Simms, Doungkamol           36   $26,850   66
121086       Plybon, John-Michael        36   $26,820   65
120151       Phaiyakounh, Julianna       36   $26,520   65
121035       Blackley, James             36   $26,460   66
120166       Nowd, Fadi                  36   $30,660   65
121060       Spofford, Elizabeth         36   $28,800   65
121073       Court, Donald               36   $27,100   61
121075       Sugg, Kasha                 36   $28,395   65
120167       Tilley, Kimiko              36   $25,185   56
121138       Tolley, Hershell            36   $27,265   61
120154       Hayawardhana, Caterina      36   $30,490   65

** Tenure is in descending order.

Page 9: Displaying Query Results

.2 Producing an Ordered Report

• This time, keep only employees who earn more than $30,000, and sort the output in descending order of tenure and then in descending order of salary.

proc sql;
   select employee_id, employee_name,
          int(yrdif(emp_Hire_date, today(), 'ACTUAL')) as tenure,
          Salary,
          int(yrdif(Birth_date, today(), 'ACTUAL')) as Age
      from arturo.salesstaff
      where salary gt 30000
      order by tenure desc, salary desc;
quit;

Page 10: Displaying Query Results

Producing an Ordered Report: Output Sample

The SAS System 17:38 Wednesday, May 26, 2010 53

Employee ID  Employee_Name           tenure  Employee Annual Salary  Age
------------------------------------------------------------------------
120166       Nowd, Fadi                  36                 $30,660   65
120154       Hayawardhana, Caterina      36                 $30,490   65
121081       Knudson, Susie              34                 $30,235   61
120125       Hofmeister, Fong            31                 $32,040   55
120129       Roebuck, Alvin              24                 $30,070   45
120159       Phoumirath, Lynelle         23                 $30,765   46
120158       Pilgrim, Daniel             22                 $36,605   45
121080       Chinnis, Kumar              22                 $32,235   51
121021       Farren, Priscilla           16                 $32,985   35

Page 11: Displaying Query Results

.2a Enhancing Query Output

You can use SAS formats and labels to customize PROC SQL output. In the SELECT list, after the column name but before the comma that separates it from the next column, you can include the following:

• Text in quotation marks (ANSI) or the LABEL= column modifier (SAS enhancement) to alter the column heading, i.e., to display a label instead of the variable name

• The FORMAT= column modifier to alter the appearance of the values in that column, e.g., formatting cash amounts with a dollar sign and commas

PROC SQL:

proc sql;
   select employee_id label="Employee Identifier",
          employee_name label="Employee Name",
          int(yrdif(emp_Hire_date, today(), 'ACTUAL')) as tenure label="Years of Service",
          Salary label="Income" format=dollar12.,
          int(yrdif(Birth_date, today(), 'ACTUAL')) as Age label="Employee Age"
      from learn.salesstaff
      where salary gt 30000
      order by tenure desc, salary desc;
quit;
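This sketch compares the two heading styles side by side. It assumes a two-row hypothetical WORK.DEMO table. The quoted text after Employee_ID is the ANSI-style label, and LABEL= and FORMAT= are the SAS column modifiers.

```sas
/* Hypothetical two-row table; values are copied from this deck's output */
data work.demo;
   input Employee_ID Salary;
   datalines;
120166 30660
120125 32040
;
run;

proc sql;
   select Employee_ID 'Employee Identifier',     /* ANSI style: quoted text follows the column */
          Salary label='Income' format=dollar12.  /* SAS style: LABEL= and FORMAT= modifiers */
      from work.demo;
quit;
```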

Page 12: Displaying Query Results

Enhancing Query Output

PARTIAL PROC SQL OUTPUT ….

The SAS System 13:22 Thursday, May 27, 2010 2

Employee Identifier  Employee Name           Years of Service   Income  Employee Age
------------------------------------------------------------------------------------
120166               Nowd, Fadi                            36  $30,660            65
120154               Hayawardhana, Caterina                36  $30,490            65
121081               Knudson, Susie                        34  $30,235            61
120125               Hofmeister, Fong                      31  $32,040            55
120129               Roebuck, Alvin                        24  $30,070            45

** Tenure is in descending order.

Page 13: Displaying Query Results

.2b Business Scenario

Produce a salary listing of active employees who earn more than $30,000, showing each salary with a 7% bonus added. The requestor provided this sketch of the desired report:

Proposed Annual Savings

Employee Number

9999   Salary + bonus: $32012.32

Additional techniques to use:

• Define a new column containing the same constant character value for every row.

• Use SAS titles and footnotes.

• Use a combination of these techniques to produce the proposed Annual Savings report.
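The sketch below shows one way to combine these techniques. It assumes a hypothetical WORK.DEMO table holding two salaries taken from this deck's output. The constant 'Salary + bonus:' is selected as its own column, so the same text appears on every row. The footnote text is an illustrative assumption, not part of the original request.

```sas
/* Hypothetical two-row table; salaries are copied from this deck's output */
data work.demo;
   input Employee_ID Salary;
   datalines;
120158 36605
121021 32985
;
run;

title 'Proposed Annual Savings';
footnote 'Salaries include a 7% bonus';       /* hypothetical footnote text */

proc sql;
   select Employee_ID label='Employee Number',
          'Salary + bonus:',                  /* constant character column with a blank heading */
          Salary * 1.07 as NewSalary format=dollar12.2
      from work.demo
      order by NewSalary desc;
quit;

title;      /* clear the title and footnote so that later output is unaffected */
footnote;
```

With these two salaries, the computed values are $39,167.35 and $35,293.95. These match the corresponding rows of the Annual Savings Plan output.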

Page 14: Displaying Query Results

.2 Enhancing Query Output

The code:

title 'Annual Savings Plan 2011';

proc sql;
   select employee_id label="Employee ID",
          Salary*1.07 as NewSalary label="Salary 2011" format=dollar12.2
      from arturo.salesstaff
      where Salary > 30000 and emp_term_date is missing   /* active employees only */
      order by NewSalary desc;
quit;

• TITLE and FOOTNOTE statements must precede the SELECT statement that they apply to.

• PROC SQL has an option, DQUOTE=, which specifies whether PROC SQL treats values within double quotation marks (" ") as variables or as strings.

• With the default, DQUOTE=SAS, values within double quotation marks are treated as text strings.

• With DQUOTE=ANSI, PROC SQL treats a double-quoted value as a variable. This feature enables you to use reserved words such as AS, JOIN, GROUP, or DBMS names and other names that are not normally permissible in SAS as table names, column names, or aliases. The quoted value can contain any character.

• Values in single quotation marks are always treated as text strings.
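The following minimal sketch shows the DQUOTE= difference. It assumes a two-row hypothetical WORK.DEMO table. The same double-quoted "Salary" is a text constant under the default setting and a column reference under DQUOTE=ANSI.

```sas
/* Hypothetical two-row table */
data work.demo;
   input Employee_ID Salary;
   datalines;
120158 36605
121021 32985
;
run;

/* Default DQUOTE=SAS: "Salary" is a character constant, so every row shows the text Salary */
proc sql;
   select Employee_ID, "Salary" as Text
      from work.demo;
quit;

/* DQUOTE=ANSI: "Salary" is an identifier, so the values of the Salary column are shown */
proc sql dquote=ansi;
   select Employee_ID, "Salary"
      from work.demo;
quit;
```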

Page 15: Displaying Query Results

Enhancing Query Output

PARTIAL PROC SQL OUTPUT ….

Annual Savings Plan 2011 14:51 Thursday, May 27, 2010 13

Employee ID  Salary 2011
------------------------
120158        $39,167.35
121063        $38,509.30
121021        $35,293.95
121099        $35,015.75
120135        $34,764.30
121085        $34,491.45
121080        $34,491.45
121022        $34,464.70
120166        $34,446.51

Page 16: Displaying Query Results

.3 Summarizing Data

Objectives

• Use functions to create summary queries.

• Group data and produce summary statistics for each group.

Summary Functions

How a summary function works in SQL depends on the number of columns specified.

• If the summary function specifies only one column, the statistic is calculated for the column (using values from one or more rows).

• If the summary function specifies more than one column, the statistic is calculated for the row (using values from the listed columns).
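The column-versus-row distinction above can be sketched with SUM. This sketch assumes a hypothetical WORK.DEMO table and a hypothetical Bonus column computed as 7% of salary. With one argument, SUM totals down the column. With two arguments, it adds across each row.

```sas
/* Hypothetical table; Bonus is an assumed 7% bonus column */
data work.demo;
   input Employee_ID Salary;
   Bonus = round(Salary * 0.07, 0.01);
   datalines;
120158 36605
121021 32985
;
run;

proc sql;
   /* One argument: SUM works down the column and returns one total for the table */
   select sum(Salary) as Total_Salary format=dollar12.2
      from work.demo;

   /* Two arguments: SUM works across the row and returns one value per employee */
   select Employee_ID, sum(Salary, Bonus) as Pay_With_Bonus format=dollar12.2
      from work.demo;
quit;
```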

Page 17: Displaying Query Results

The SUM Function (Review)

The SUM function returns the sum of the non-missing arguments.

General form of the SUM function:

SUM(argument1<,argument2, ...>)

argument includes numeric constants, expressions, or variable names. The SUM function returns a missing value only when all arguments are missing.

We need to find the total salary for all active employees in the company.

/* Find total salary for all active employees */
proc sql;
   title 'Total Salary For Active Employees';
   select "TOTAL:", sum(Salary) format=comma12.2
      from greg.salesstaff
      where Emp_Term_Date is missing;
quit;
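The missing-value rule for SUM can be checked with a short DATA step. The constants below are arbitrary. The sketch also contrasts SUM with the + operator, which propagates a missing value instead of ignoring it.

```sas
data _null_;
   a = sum(5, ., 2);   /* missing argument is ignored: 7 */
   b = sum(., .);      /* all arguments missing: result is missing */
   c = 5 + . + 2;      /* the + operator propagates the missing value */
   put a= b= c=;
run;
```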

Page 18: Displaying Query Results

Total Salary For Active Employees 23 08:24 Saturday, May 29, 2010

-------------------------------
TOTAL:             3,512,777.25

.4 The COUNT Function

The COUNT function returns the number of rows returned by a query.

General form of the COUNT function:

COUNT(*|argument)

argument can be the following:

• * (asterisk), which counts all rows

• a column name, which counts the number of non-missing values in that column
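The two forms of COUNT can be compared in this sketch. It assumes three hypothetical employees, one of whom has a termination date. NMISS from the summary function list is included to count the missing dates.

```sas
/* Hypothetical employees; only 1003 has a termination date */
data work.demo;
   input Employee_ID Emp_Term_Date :date9.;
   format Emp_Term_Date date9.;
   datalines;
1001 .
1002 .
1003 31DEC2009
;
run;

proc sql;
   select count(*)             as All_Rows,    /* 3: every row */
          count(Emp_Term_Date) as Terminated,  /* 1: non-missing dates only */
          nmiss(Emp_Term_Date) as Active       /* 2: missing dates */
      from work.demo;
quit;
```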

Page 19: Displaying Query Results

Summary Functions

A few commonly used summary functions are listed below. Both ANSI SQL and SAS functions can be used in PROC SQL.

SQL     SAS       Description
------  --------  ----------------------------------------
AVG     MEAN      returns the mean (average) value
COUNT   FREQ, N   returns the number of non-missing values
MAX     MAX       returns the largest value
MIN     MIN       returns the smallest non-missing value
SUM     SUM       returns the sum of non-missing values
        NMISS     counts the number of missing values
        STD       returns the standard deviation
        VAR       returns the variance
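Several of the functions in the table can be combined in one query, as sketched here. The sketch assumes a hypothetical three-row WORK.DEMO table whose salaries are copied from this deck's output. Because there is no GROUP BY, each function summarizes the whole table.

```sas
/* Hypothetical three-row table; salaries are copied from this deck's output */
data work.demo;
   input Employee_ID Salary;
   datalines;
120158 36605
121021 32985
120125 32040
;
run;

proc sql;
   select avg(Salary)   as Mean_Salary format=dollar12.2,  /* ANSI AVG, same as SAS MEAN */
          min(Salary)   as Lowest      format=dollar12.,
          max(Salary)   as Highest     format=dollar12.,
          std(Salary)   as Std_Dev     format=dollar12.2,
          count(Salary) as N_Salaries
      from work.demo;
quit;
```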

Summary Functions

Example: Determine the total number of current employees.

proc sql;
   select count(*) as Count
      from learn.salesstaff
      where emp_term_date is missing;
quit;

PROC SQL Output

   Count
--------
     308

Page 20: Displaying Query Results

Proc SQL: Data Grouping

Page 21: Displaying Query Results

.5 Grouping Data

ods rtf file="grouping1.rtf";

proc sql;
   title "Grouping by Gender";
   select Gender, avg(Salary) as Average_Salary format=dollar8.
      from learn.salesstaff
      group by gender;
quit;

ods rtf close;

We can produce output calculated by group using SQL. Here, we calculate the average salary by gender:

Gender  Average_Salary
F              $27,924
M              $27,955

Page 22: Displaying Query Results

Groups and Subgroups

We can produce output calculated by group and subgroup. Here, we calculate counts for each gender by job title:

ods rtf file="grouping2.rtf";

proc sql;
   title "Counts by Job Title";
   select Job_Title as Title, Gender, count(*) as Counts
      from staff
      group by Job_Title, gender;
quit;

ods rtf close;

Job_Title        Gender  Counts
Sales Rep. I     F           21
Sales Rep. I     M           42
Sales Rep. II    F           25
Sales Rep. II    M           25
Sales Rep. III   F           15
Sales Rep. III   M           19
Sales Rep. IV    F            7
Sales Rep. IV    M            9

Page 23: Displaying Query Results

.6 Using the WHERE Clause

We can restrict the output to rows that meet particular conditions by using the WHERE clause. Here, we calculate the average salaries of women employees by job title:

ods rtf file="grouping3.rtf";

proc sql;
   title "Average Salary of Women Employees";
   select job_title as Title, avg(salary) as Average format=dollar8.
      from staff
      where gender="F"
      group by job_title;
quit;

ods rtf close;

Job_Title        Average
Sales Rep. I     $26,339
Sales Rep. II    $27,368
Sales Rep. III   $29,436
Sales Rep. IV    $31,420

Page 24: Displaying Query Results

.7 Using the HAVING Clause

Alternatively, we can use the HAVING clause to specify which groups we want to display. Here, we count the number of women employees by job title:

ods rtf file="grouping4.rtf";

proc sql;
   title "Count by Title of Women Employees";
   select job_title as Title, count(*) as Women
      from staff
      group by Job_title, gender
      having gender eq "F"
      order by Women desc;
quit;

ods rtf close;

Job_Title        Women
Sales Rep. II       25
Sales Rep. I        21
Sales Rep. III      15
Sales Rep. IV        7
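HAVING is most often used with a summary function, which WHERE cannot reference. The sketch below assumes a few hypothetical rows: the job titles appear in this deck, but the genders and salaries are made up. WHERE filters individual rows before grouping, and HAVING then filters whole groups by their count.

```sas
/* Hypothetical rows: job titles from the deck, genders and salaries made up */
data work.staff_demo;
   infile datalines dlm=',';
   input Job_Title :$15. Gender :$1. Salary;
   datalines;
Sales Rep. I,F,26000
Sales Rep. I,M,26500
Sales Rep. I,F,26800
Sales Rep. IV,F,31400
;
run;

proc sql;
   select Job_Title, count(*) as Women
      from work.staff_demo
      where Gender = 'F'        /* row filter, applied before grouping */
      group by Job_Title
      having count(*) > 1;      /* group filter: keep titles with more than one woman */
quit;
```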

Page 25: Displaying Query Results

.8 Using the FIND Function

The FIND function is a useful tool for locating and counting data. Here, we use the FIND function to count the number of women employees and display their average salaries by job title:

ods rtf file="grouping5.rtf";

proc sql;
   title "Summary Information of Women Employees";
   select Job_title,
          count(find(gender, "F", "I")>0) as Women,
          avg(salary) as Average format=dollar8.
      from staff
      group by Job_title, gender
      having gender="F";
quit;

ods rtf close;

Note that COUNT counts every non-missing value of the Boolean expression, including zeros. The count is correct here only because the HAVING clause keeps just the female groups.

Job_Title        Women  Average
Sales Rep. I        21  $26,339
Sales Rep. II       25  $27,368
Sales Rep. III      15  $29,436
Sales Rep. IV        7  $31,420
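This sketch illustrates the COUNT caveat above. It assumes five hypothetical Gender values. COUNT of the Boolean expression returns the number of rows, because both 0 and 1 are non-missing. SUM of the same expression adds up only the 1s, which is the approach used with Boolean expressions in the next section.

```sas
/* Hypothetical Gender values: two F, three M */
data work.staff_demo;
   input Gender $ @@;
   datalines;
F M F M M
;
run;

proc sql;
   select count(find(Gender, 'F', 'i') > 0) as Count_Of_Flags,  /* 5: counts every 0 and 1 */
          sum(find(Gender, 'F', 'i') > 0)   as Women            /* 2: adds up the 1s */
      from work.staff_demo;
quit;
```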

Page 26: Displaying Query Results

The find function

• The FIND function returns the starting position of a substring within a string, or 0 if the substring is not found. NOTICE: the string must be a character value.

• The general form of the FIND function is:

FIND(string,substring<,modifier(s)><,startpos>)

STRING --- constant, variable, or expression to be searched.

SUBSTRING --- constant, variable, or expression sought within the string.

MODIFIER(S) --- i=ignore case, t=trim trailing blanks.

STARTPOS --- an integer specifying the start position and direction of the search (a negative value searches from right to left).
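The modifiers and the start position can be checked in a DATA step. The sketch uses one job title from this deck as the string. When FIND receives a numeric third argument, it treats that argument as the start position.

```sas
data _null_;
   title_text = 'Sales Rep. II';
   p1 = find(title_text, 'rep');        /* 0: case-sensitive, 'rep' does not match 'Rep' */
   p2 = find(title_text, 'rep', 'i');   /* 7: the i modifier ignores case */
   p3 = find(title_text, 'e');          /* 4: first 'e', searching left to right */
   p4 = find(title_text, 'e', -13);     /* 8: negative startpos searches right to left from position 13 */
   put p1= p2= p3= p4=;
run;
```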

Page 27: Displaying Query Results

The find function

• Example: find the starting position of the substring "F" in the character variable Gender.

proc sql;
   select gender,
          find(Gender, "F", "t") "female_employee"
      from learn.Salesstaff;
quit;

For a row whose Gender is "F", FIND returns 1, because "F" is in the first position. For a row whose Gender is "M", "F" does not appear, so FIND returns 0.

Page 28: Displaying Query Results

.9 Using Boolean Expressions

• Boolean expressions evaluate to TRUE (1) or FALSE (0).

• They are used in this SELECT list to distinguish rows that have "F" in the Gender column.

proc sql;
   select Job_Title, Gender,
          (find(Gender, "F", "i") > 0) "female_employee"
      from learn.Salesstaff;
quit;

The Boolean expression produces the value 1 when Gender contains "F" and 0 when it does not.

Page 29: Displaying Query Results

Using Boolean Expressions

Female_Employee to Male_Employee: partial output

Employee Job Title  Employee Gender  female_employee
====================================================
Sales Rep. I        F                              1
Sales Rep. I        F                              1
Sales Rep. II       M                              0
Sales Rep. III      F                              1
Sales Rep. I        F                              1
Sales Rep. II       F                              1
Sales Rep. II       M                              0
Sales Rep. II       M                              0
Sales Rep. IV       M                              0
Sales Rep. I        M                              0
Sales Rep. II       M                              0
Sales Rep. II       M                              0
Sales Rep. II       F                              1

Page 30: Displaying Query Results

Using Boolean Expressions

Use the counted values to calculate the female-to-male ratio for each job title.

proc sql;
   title "Female_Employee to Male_Employee Ratios";
   select job_title,
          sum((find(Gender, "F", "i") > 0)) as Female_employee,
          sum((find(Gender, "F", "i") = 0)) as Male_employee,
          calculated Female_employee / calculated Male_employee "F/M Ratio" format=percent8.1
      from learn.salesstaff
      group by job_title;
quit;

Page 31: Displaying Query Results

Using Boolean Expressions: Female_Employee to Male_Employee Ratios

Employee Job Title  Female_employee  Male_employee  F/M Ratio
=============================================================
Sales Rep. I                     21             42      50.0%
Sales Rep. II                    25             25     100.0%
Sales Rep. III                   15             19      78.9%
Sales Rep. IV                     7              9      77.8%

Page 32: Displaying Query Results

Summary

• In summary, the SQL procedure provides the following capabilities:

– ordering of reports

– enhanced report production through labels and formats

– summarization of data

– use of counts and data grouping

– use of selection criteria such as WHERE and HAVING

– use of the FIND function and Boolean expressions

These capabilities make data analysis a breeze, thanks to PROC SQL.