s-chapter 16personalsiteofannchong.weebly.com/uploads/1/9/0/7/19074109/chap_16.pdfvalues, these...

>chapter 16 Exploring, Displaying, and Examining Data

>learning objectives

After reading this chapter, you should understand ...

1 That exploratory data analysis techniques provide insights and data diagnostics by emphasizing visual representations of the data.

2 How cross-tabulation is used to examine relationships involving categorical variables, serves as a framework for later statistical testing, and makes table-based analysis using one or more control variables an efficient tool for data visualization and decision making.



>bringing research to life

Myra and Jason are wrapping up their review of the materials Myra delivered for MindWriter's latest partnership with Henry and Associates. Jason, knowing Myra is eager to hear any tidbits on the City Center for Performing Arts project, escorts her through the outer office. Sammye, Henry and Associates' newest intern, is busy poring over cross-tabs. He decides it is the perfect time to test Sammye on the rules of data confidentiality he broached with the interns last week.

"[...] wine during intermissions. But we're not [...] enough [...]"

"When the board approved your prop[...] into the data? [...] of the board's assumptions [...] plan," queries Myra, "[...] are fully correct and which might have to be modified [...] reference to those boxlike diagrams [...]"

"[...] as a client, I appreciate it. No harm done this time, though. What Jason failed to tell you is I'm on CCPA's board and part of the project team. Before Jason stopped you, things were getting interesting. Please continue."

"[...] during the preliminary analysis phase for MindWriter's CompleteCare study. I didn't give them to you because [...] Jason would have had to explain how to interpret them and [...]"

"[...] and anything you have to [...] though," finishes Myra.


>part IV Analysis and Presentation of Data

>Exploratory Data Analysis

The convenience of data entry via spreadsheet, optical mark recognition (OMR), or the data editor of a statistical program makes it tempting to move directly to statistical analysis. That temptation is even stronger when the data can be entered and viewed in real time. Why waste time finding out if the data confirm the hypothesis that motivated the study? Why not obtain descriptive statistical summaries (based on our discussion in Appendix 15a) and then test hypotheses?

Exploratory data analysis is both a data analysis perspective and a set of techniques. In this chapter, we will present unique and conventional techniques including graphical and tabular devices to visualize the data. Exhibit 16-1 reminds you of the importance of data visualization as an integral element in the data analysis process and as a necessary step prior to hypothesis testing. In Chapter 3, we said research conducted scientifically is a puzzle-solving activity as well as an attitude of curiosity, suspicion, and imagination essential to discovery. It is natural, then, that exploration and examination of the data would be an integral part of our data analysis perspective.

In exploratory data analysis (EDA) the researcher has the flexibility to respond to the patterns revealed in the preliminary analysis of the data. Thus, patterns in the collected data guide the data analysis or suggest revisions to the preliminary data analysis plan. This flexibility is an important attribute of this approach. When the researcher is attempting to prove causation, however, confirmatory data analysis is required. Confirmatory data analysis is an analytical process guided by classical statistical inference in its use of significance testing and confidence.

One authority has compared exploratory data analysis to the role of police detectives and other investigators and confirmatory analysis to that of judges and the judicial system. The former are involved

As this ad from Radius Global Market Research (formerly Data Development Worldwide) suggests, "pushing data into a template gets the job done." But the company argues that it isn't an effective way to analyze data. Every project deserves a research question-specific data analysis plan. The way the data from a particular project are massaged is often determined by the skill of the researcher, just as the skill of the potter determines the design of the pot.

It's not the clay,


It's the potter.



>Exhibit 16-1 Data Exploration, Examination, and Analysis in the Research Process

[Flowchart stages: Research Design; Data Analysis; Cross-Tabulation of Variables; Preparation of Data Displays (histograms, boxplots, Pareto, stem-and-leaf, AID, etc.); Hypothesis Testing; Determine Recommendations]

in the search for clues and evidence; the latter are preoccupied with evaluating the strength of the evidence that is found. Exploratory data analysis is the first step in the search for evidence, without which confirmatory analysis has nothing to evaluate. Consistent with that analogy, EDA shares a commonality with exploratory designs, not formalized ones. Because it doesn't follow a rigid structure, it is free to take many paths in unraveling the mysteries in the data, sifting the unpredictable from the predictable.

A major contribution of the exploratory approach lies in the emphasis on visual representations and graphical techniques over summary statistics. Summary statistics, as you will see momentarily, may obscure, conceal, or even misrepresent the underlying structure of the data. When numerical summaries are used exclusively and accepted without visual inspection, the selection of confirmatory models may be based on flawed assumptions. For these reasons, data analysis should begin with visual inspection. After that, it is not only possible but also desirable to cycle between exploratory and confirmatory approaches.

Frequency Tables, Bar Charts, and Pie Charts

Several useful techniques for displaying data are not new to EDA. They are essential to any examination of the data. For example, a frequency table is a simple device for arraying data. An example is presented in Exhibit 16-2. It arrays data by assigned numerical value, with columns for percent, valid percent (percent adjusted for missing data), and cumulative percent. Ad recall, a nominal variable, describes the ads that participants remembered seeing or hearing without being prompted by the researcher or the measurement instrument. Although there are 100 observations,


According to New York Times columnist Steve Lohr, in the digital age statisticians "are changing the image of the profession as a place for dronish number-nerds. They are finding themselves increasingly in demand, and even cool." Lohr asserts that the rapid ascendancy of statisticians, who can earn $125,000 in their first year after getting a doctorate, is the result of the recent explosion of digital data. With Web-based data expanding rapidly, up to fivefold by 2012, there are myriad opportunities for exploration and problem solving.

Google's chief economist, Hal Varian, explains the importance of the Internet's free and ubiquitous data this way: "The ability to take data - to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it - that's going to be a hugely important skill in the next decades."

While statisticians are in high demand, Varian emphasizes the need for managers to understand data themselves. In old organizations, you had an "army of people digesting data and feeding it to decision makers at the top." Today, it is essential that people can access, understand, and communicate insights from data analysis that affect everyday decisions. Using statistical models, multivariate analysis, and data mining, Internet-age statisticians operate as "bridge scientists" engaged in the quest to find meaningful patterns in information while they advance business opportunities and identify risks.

So where are the newly fashionable "number-nerds" going? Even in a bad economy, Wall Street, finance, pharmaceuticals, insurance, research labs, and the government are hiring. Many of those being hired are part of multi-billion-dollar acquisitions. Keeping pace with the expanding market for "business intelligence" software, SPSS offers software and data tools designed to help countless companies understand their consumers. SPSS was acquired by IBM for $1.2 billion. IBM also offered nearly $5 billion for the purchase of Cognos. Additionally, Oracle purchased Hyperion Solutions for roughly $3.3 billion, while SAP purchased Business Objects for $4.8 billion. And Microsoft bought Farecast.com for $115 million to support its new venture, Bing Travel, which reveals whether you should purchase an airline ticket now or wait for a time closer to your desired flight's departure.

So, dive into data analysis and learn as much as you can. It's a skill that is much in demand.

www.google.com; www.SAP.com; www.spss.com

the small number of media placements makes the variable easily tabled. The same data are presented in Exhibit 16-3 using a pie chart and a bar chart. The values and percentages are more readily understood in this graphic format, and visualization of the media placements and their relative sizes is improved.

>Exhibit 16-2 A Frequency Table of Ad Recall

Value Label         Value   Frequency   Percent   Valid Percent   Cumulative Percent
TV program A          1        10         10.0        10.0              10.0
TV program B          2         8          8.0         8.0              18.0
TV program C          3         7          7.0         7.0              25.0
TV program D          4        13         13.0        13.0              38.0
Radio program A       5        24         24.0        24.0              62.0
Radio program B       6         4          4.0         4.0              66.0
Radio program C       7        11         11.0        11.0              77.0
Magazine A            8         6          6.0         6.0              83.0
Magazine B            9         7          7.0         7.0              90.0
Outdoor billboard    10        10         10.0        10.0             100.0
Total                         100        100.0       100.0

Valid cases 100   Missing cases 0
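The percent, valid percent, and cumulative percent columns of a frequency table like Exhibit 16-2 can be derived from the raw counts alone. The following is a minimal sketch, not the procedure of any particular statistical package; the function name `frequency_table` and its `missing` parameter are illustrative assumptions, and the counts are those read from Exhibit 16-2.

```python
def frequency_table(counts, missing=0):
    """Return rows of (label, frequency, percent, valid percent, cumulative percent).

    percent uses all cases (valid + missing); valid percent uses valid cases only;
    cumulative percent accumulates the valid percent column.
    """
    valid = sum(counts.values())
    total = valid + missing
    rows = []
    cumulative = 0.0
    for label, freq in counts.items():
        percent = 100 * freq / total
        valid_percent = 100 * freq / valid
        cumulative += valid_percent
        rows.append((label, freq, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows


# Counts as read from Exhibit 16-2 (ad recall, 100 valid cases, 0 missing).
ad_recall = {
    "TV program A": 10, "TV program B": 8, "TV program C": 7, "TV program D": 13,
    "Radio program A": 24, "Radio program B": 4, "Radio program C": 11,
    "Magazine A": 6, "Magazine B": 7, "Outdoor billboard": 10,
}

rows = frequency_table(ad_recall)
for label, freq, pct, valid_pct, cum_pct in rows:
    print(f"{label:<18}{freq:>5}{pct:>8}{valid_pct:>8}{cum_pct:>8}")
```

With no missing cases the percent and valid percent columns coincide, as they do in the exhibit; passing a nonzero `missing` value shows how the two columns diverge.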


>Exhibit 16-3 Nominal Variable Displays (Ad Recall)

[Pie chart of MEDIA by percent: TV program A 10; TV program B 8; TV program C 7; TV program D 13; Radio program A 24; Radio program B 4; Radio program C 11; Magazine A 6; Magazine B 7; Outdoor billboard 10; total 100]

[Bar chart of the same data: horizontal axis MEDIA (TV-A, TV-B, TV-C, TV-D, RAD-A, RAD-B, RAD-C, MAG-A, MAG-B, OutD); vertical axis percent, scaled from 5 to 25]

When the variable of interest is measured on an interval-ratio scale and is one with many potential values, these techniques are not particularly informative. Exhibit 16-4 is a condensed frequency table of the average annual purchases of PrimeSell's top 50 customers. Only two values, 59.9 and 66, have a frequency greater than 1. Thus, the primary contribution of this table is an ordered list of values. If the table were converted to a bar chart, it would have 48 bars of equal length and two bars with two occurrences. Bar charts do not reserve spaces for values where no observations occur within the range. Constructing a pie chart for this variable would also be pointless.

Histograms

The histogram is a conventional solution for the display of interval-ratio data. Histograms are used when it is possible to group the variable's values into intervals. Histograms are constructed with bars (or asterisks) that represent data values, where each value occupies an equal amount of area within the


• Rounded numbers can be most easily compared, enabling us to more easily determine the ratio or relationship of one number to another.
• If precision is critical to the number (e.g., you are researching taxes or design specifications or drug interactions), don't round the numbers.

Arrange the numbers to reveal patterns.
• Order numbers from largest to smallest number.
• In a vertically arranged table, order the largest number at the top.
• In a horizontal arrangement, order the largest numbers on the left.
• When looking for changes over time, order the numbers by year, from most distant (left or top) to most recent.

Use averages, totals, or percentages to achieve focus.
• An average provides a point for comparison.
• Don't use an average if the raw data reveal a bimodal distribution.
• Totals emphasize the big picture.
• Percentages show proportionate relationships more easily than raw data.

Compare like scales in a single table.
• Convert numbers to a common scale when the numbers reflect different scales (e.g., grams versus ounces of cereal consumption; monthly salary data versus hourly wage data).

Choose simplicity over complexity.
• Several smaller tables reveal patterns better than one large, complex table.
• Complex tables are used as a convenient reference source for multiple elements of data.
• Design a table with a smaller number of columns than rows.

Use empty space and design to guide the eye to numbers that must be compared and to make patterns and exceptions stand out.
• Single-space numbers that must be compared.
• Use gridlines to group numbers within a table; avoid gridlines between numbers that must be compared.
• Use empty space to create gutters between numbers in simple tables.
• Right-align column headers and table numbers.

Summarize each data display.
• Write a phrase or sentence that summarizes your interpretation of the data presented; don't leave interpretation to chance.
• Summary statements might be used as the title of a table or chart in the final research report.
• The summary need not mention any numbers.

Label and title tables for clarity of message.
• Titles should be comprehensive: Include what (subject of the title or message), where (if data have a geographic base), when (date or time period covered), and unit of measure.
• Include common information in the title: It lengthens a title but shortens the table's column headings.
• Avoid abbreviations in column headings unless well known by your audience.
• Avoid footnotes; if used, use symbols, like the asterisk, rather than numbers (numbers used as footnotes can be confused with the content numbers of the table).
• Provide an undertable source line for later reference.


Online Spending and Purchases by Internet Users in Select Countries in Western Europe, September 2006 (average)

                  Spending    Purchases
Belgium           Eur 790         6
Denmark           Eur 1159       11
France            Eur 509         8
Germany           Eur 521        10
Italy             Eur 454         7
Netherlands       Eur 681         7
Norway            Eur 1406        7
Spain             Eur 452         5
Sweden            Eur 1013        9
United Kingdom    Eur 1201       18

Source: Synovate and SPA Market Research-UK for the European Interactive Advertising Association (EIAA), January 2007.

www.eMarketer.com

The next table recasts the data using Bigwood and Spore's guidelines. First, the table title has changed; now the six-month period on which the spending data are based is more obvious. We've also changed the column headers to reflect that each is an average, and we have right-justified the headers and the numbers. We've arranged the table by Average Spending (Euros) in descending order and interpreted the euro column by adding a dollar conversion column. We might not need the dollar column if we were euro spenders ourselves but, if we are more familiar with another currency, the addition of this column helps us interpret the data. With this arrangement, the Scandinavian countries are looking more attractive, as is the U.K.

Western European Six-Month Online Spending and Purchases

                  Average Spending   Average Number   Average Spending
                       (Euros)        of Purchases       (Dollars)*
Norway                   1406               7            [illegible]
United Kingdom           1201              18            [illegible]
Denmark                  1159              11            [illegible]
Sweden                   1013               9            [illegible]
Belgium                   790               6              1,024.63
Netherlands               681               7                883.26
Germany                   521              10            [illegible]
France                    509               8            [illegible]
Italy                     454               7            [illegible]
Spain                     452               5                586.24

Source: Synovate and SPA Market Research-UK for the European Interactive Advertising Association (EIAA), January 2007.

*1 euro = 1.2967 dollars.


                  Average Spending   Average Number
                       (Euros)        of Purchases
United Kingdom           1201              18          Most frequent buyers
Denmark                  1159              11
Germany                   521              10
Sweden                   1013               9
France                    509               8          Average frequency buyers
Norway                   1406               7
Netherlands               681               7
Italy                     454               7
Belgium                   790               6
Spain                     452               5          Least frequent buyers
Average                   818.6             8.8

Source: Synovate and SPA Market Research-UK for the European Interactive Advertising Association (EIAA), January 2007.

Finally, we offer a recasting of the data based on a newly calculated column, Average Transaction (Euros). Germany doesn't look so attractive now.

One last note on tables: as a researcher you want to strive for consistency. If you are ordering from most to least, choose this arrangement for every table.

Western European Six-Month Online Spending and Purchases

                  Average Spending   Average Number   Average Transaction   Average Transaction
                       (Euros)        of Purchases         (Euros)              (Dollars)*
Norway                   1406               7                201                   261         specialty shoppers
Belgium                   790               6                132                   171
Sweden                   1013               9                113                   146
Denmark                  1159              11                105                   137
Netherlands               681               7                 97                   126
Spain                     452               5                 90                   117
United Kingdom           1201              18                 67                    87
Italy                     454               7                 65                    84
France                    509               8                 64               [illegible]
Germany                   521              10                 52                    68         bargain hunters

Source: Synovate and SPA Market Research-UK for the European Interactive Advertising Association (EIAA), January 2007.

*1 euro = 1.2967 dollars.
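The Average Transaction (Euros) column is simply average spending divided by average number of purchases, and the table is then ordered from largest to smallest on that new column. A minimal sketch of the calculation follows; the function name `average_transactions` is an illustrative assumption, and the spending and purchase figures are those shown in the tables above.

```python
# (average spending in euros, average number of purchases), from the tables above.
spending_and_purchases = {
    "Belgium": (790, 6), "Denmark": (1159, 11), "France": (509, 8),
    "Germany": (521, 10), "Italy": (454, 7), "Netherlands": (681, 7),
    "Norway": (1406, 7), "Spain": (452, 5), "Sweden": (1013, 9),
    "United Kingdom": (1201, 18),
}


def average_transactions(data):
    """Return (country, spending, purchases, euros per purchase), largest first."""
    rows = [(country, spend, buys, spend / buys)
            for country, (spend, buys) in data.items()]
    # Sort on the unrounded value so near-ties are ordered correctly.
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows


ranked = average_transactions(spending_and_purchases)
for country, spend, buys, per_purchase in ranked:
    print(f"{country:<16}{spend:>6}{buys:>4}{round(per_purchase):>6}")
```

Sorting on the derived column is what moves Germany from the middle of the purchase-frequency ranking to the bottom of this one.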


                  Average Spending   Average Number   Average Transaction   Average Transaction
                       (Euros)        of Purchases         (Euros)              (Dollars)*
Germany                   521              10                 52                    68
France                    509               8                 64               [illegible]
Italy                     454               7                 65                    84
United Kingdom           1201              18                 67                    87
Spain                     452               5                 90                   117
Netherlands               681               7                 97                   126
Denmark                  1159              11                105                   137
Sweden                   1013               9                113                   146
Belgium                   790               6                132                   171
Norway                   1406               7                201                   261

Source: Synovate and SPA Market Research-UK for the European Interactive Advertising Association (EIAA), January 2007.

*1 euro = 1.2967 dollars.

enclosed area. Data analysts find histograms useful for (1) displaying all intervals in a distribution, even those without observed values, and (2) examining the shape of the distribution for skewness, kurtosis, and the modal pattern. When looking at a histogram, one might ask: Is there a single hump (a mode)? Are subgroups identifiable when multiple modes are present? Are straggling data values detached from the central concentration?

The values for the average annual purchases variable presented in Exhibit 16-4 were measured on a ratio scale and are easily grouped. Other variables possessing an underlying order are similarly appropriate for histograms. A histogram would not be used for a nominal variable like ad recall (Exhibit 16-3) that has no order to its categories.

A histogram of the average annual purchases is shown in Exhibit 16-5. The midpoint for each interval for the variable of interest, average annual purchases, is shown on the horizontal axis; the frequency or number of observations in each interval, on the vertical axis. We erect a vertical bar above the midpoint of each interval on the horizontal scale. The height of the bar corresponds with the frequency of observations in the interval above which it is erected. This histogram was constructed with intervals 20 increments wide, and the last interval contains only two observations, 206.9 and 218.2. These values are found in PrimeSell's average annual purchases frequency table (Exhibit 16-4). Intervals with 0 counts show gaps in the data and alert the analyst to look for problems with spread. When the upper tail of the distribution is compared with the frequency table, we find three extreme values (183.2, 206.9, and 218.2). Along with the peaked midpoint and reduced number of observations in the upper tail, this histogram warns us of irregularities in the data.
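Grouping values into fixed-width intervals, while keeping intervals that have no observations, can be sketched in a few lines. This is an illustrative sketch only: the function name `fixed_width_histogram`, the starting point of 100, and the choice to show just the upper-tail values from Exhibit 16-4 are assumptions made for the example, not the construction used for Exhibit 16-5.

```python
import math


def fixed_width_histogram(values, width, start):
    """Count observations in consecutive intervals [start, start + width), ...

    Intervals with a zero count are kept so that gaps in the data stay visible.
    """
    n_bins = math.floor((max(values) - start) / width) + 1
    counts = [0] * n_bins
    for value in values:
        counts[int((value - start) // width)] += 1
    return [(start + i * width, start + (i + 1) * width, count)
            for i, count in enumerate(counts)]


# Upper-tail values (100 and above) as read from Exhibit 16-4.
upper_tail = [102.5, 104.1, 110.4, 111.9, 118.6, 123.8, 131.2, 140.9,
              146.2, 153.2, 163.2, 166.7, 183.2, 206.9, 218.2]

wide = fixed_width_histogram(upper_tail, width=20, start=100)
narrow = fixed_width_histogram(upper_tail, width=10, start=100)

for low, high, count in narrow:
    print(f"{low:>3}-{high:<3} {'*' * count}")
```

With intervals 20 wide, the last interval holds the two observations 206.9 and 218.2; with intervals 10 wide, the empty 170-180 and 190-200 intervals appear as gaps, which a bar chart of observed values alone would not show.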


>Exhibit 16-4 Average Annual Purchases of PrimeSell's Top 50 Customers

                              Cumulative  |                               Cumulative
Value   Frequency   Percent    Percent    |  Value   Frequency   Percent    Percent
 54.9       1          2           2      |   75.6       1          2          54
 55.4       1          2           4      |   76.4       1          2          56
 55.6       1          2           6      |   77.5       1          2          58
 56.4       1          2           8      |   78.9       1          2          60
 56.8       1          2          10      |   80.9       1          2          62
 56.9       1          2          12      |   82.2       1          2          64
 57.8       1          2          14      |   82.5       1          2          66
 58.1       1          2          16      |   86.4       1          2          68
 58.2       1          2          18      |   88.3       1          2          70
 58.3       1          2          20      |  102.5       1          2          72
 58.5       1          2          22      |  104.1       1          2          74
 59.9       2          4          26      |  110.4       1          2          76
 61.5       1          2          28      |  111.9       1          2          78
 62.6       1          2          30      |  118.6       1          2          80
 64.8       1          2          32      |  123.8       1          2          82
 66.0       2          4          36      |  131.2       1          2          84
 66.3       1          2          38      |  140.9       1          2          86
 67.6       1          2          40      |  146.2       1          2          88
 69.1       1          2          42      |  153.2       1          2          90
 69.2       1          2          44      |  163.2       1          2          92
 70.5       1          2          46      |  166.7       1          2          94
 72.7       1          2          48      |  183.2       1          2          96
 72.9       1          2          50      |  206.9       1          2          98
 73.5       1          2          52      |  218.2       1          2         100
                                          |  Total      50        100

>Exhibit 16-5 Histogram of PrimeSell's Top 50 Customers' Average Annual Purchases

[Histogram: horizontal axis, average annual purchases; vertical axis, frequency, scaled from 0 to 15]


Stem-and-Leaf Displays

The stem-and-leaf display is a technique that is closely related to the histogram. It shares some of the histogram's features but offers several unique advantages. It is easy to construct by hand for small samples or may be produced by computer programs. In contrast to histograms, which lose information by grouping data values into intervals, the stem-and-leaf presents actual data values that can be inspected directly, without the use of enclosed bars or asterisks as the representation medium. This feature reveals the distribution of values within the interval and preserves their rank order for finding the median, quartiles, and other summary statistics. It also eases linking a specific observation back to the data file and to the subject that produced it.

Visualization is the second advantage of stem-and-leaf displays. The range of values is apparent at a glance, and both shape and spread impressions are immediate. Patterns in the data, such as gaps where no values exist, areas where values are clustered, or outlying values that differ from the main body of the data, are easily observed.

To develop a stem-and-leaf display for the data in Exhibit 16-4, the first digits of each data item are arranged to the left of a vertical line. Next, we pass through the average annual purchases values in the order they were recorded and place the last digit for each item (the unit position, 1.0) to the right of the vertical line. Note that the digit to the right of the decimal point is ignored. The last digit for each item is placed on the horizontal row corresponding to its first digit(s). Now it is a simple matter to rank-order the digits in each row, creating the stem-and-leaf display shown in Exhibit 16-6.

Each line or row in this display is referred to as a stem, and each piece of information on the stem is called a leaf. The first line or row is

5 | 455666788889

The meaning attached to this line or row is that there are 12 items in the data set whose first digit is five: 54, 55, 55, 56, 56, 56, 57, 58, 58, 58, 58, and 59. The second line,

6 | 12466799

shows that there are eight average annual purchase values whose first digit is six: 61, 62, 64, 66, 66, 67, 69, and 69.

When the stem-and-leaf display shown in Exhibit 16-6 is turned upright (rotated 90 degrees to the left), the shape is the same as that of the histogram shown in Exhibit 16-5.

>Exhibit 16-6 A Stem-and-Leaf Display of PrimeSell's Average Annual Purchases Data

 5 | 455666788889
 6 | 12466799
 7 | 02235678
 8 | 02268
 9 |
10 | 24
11 | 018
12 | 3
13 | 1
14 | 06
15 | 3
16 | 36
17 |
18 | 3
19 |
20 | 6
21 | 8
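The construction steps described above (drop the decimal digit, use the leading digits as the stem and the unit digit as the leaf, then rank-order each row) can be sketched as follows. The function name `stem_and_leaf` is an illustrative assumption, the sketch assumes positive values, and the sample is a slice of the Exhibit 16-4 values from 70.5 through 104.1.

```python
def stem_and_leaf(values):
    """Return (stem, leaves) rows; stems with no leaves are kept to show gaps.

    The digit after the decimal point is ignored, the unit digit becomes the
    leaf, and the remaining leading digits become the stem.
    """
    truncated = sorted(int(value) for value in values)  # ignore the decimal digit
    stems = {}
    for number in truncated:
        stems.setdefault(number // 10, []).append(number % 10)
    lowest, highest = min(stems), max(stems)
    return [(stem, "".join(str(leaf) for leaf in stems.get(stem, [])))
            for stem in range(lowest, highest + 1)]


# Values from 70.5 through 104.1, as read from Exhibit 16-4.
sample = [70.5, 72.7, 72.9, 73.5, 75.6, 76.4, 77.5, 78.9,
          80.9, 82.2, 82.5, 86.4, 88.3, 102.5, 104.1]

display = stem_and_leaf(sample)
for stem, leaves in display:
    print(f"{stem:>2} | {leaves}")
```

For this slice the rows match the corresponding rows of Exhibit 16-6, including the empty stem 9 that marks the gap between 88.3 and 102.5.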


>Exhibit 16-7 Pareto Diagram of MindWriter Repair Complaints

[Pareto diagram: bars, left to right, for Problem not resolved, Multiple repairs needed, Shipping delay, Repair time, and Phone access delay; left axis scaled from 0 to 800; right axis in percent with the cumulative line marked at 50%, 80%, 95%, 99%, and 100%]

Pareto Diagrams

Pareto diagrams derive their name from a 19th-century Italian economist. In quality management, J. M. Juran first applied this concept by noting that only a vital few defects account for most problems evaluated for quality and that the trivial may explain the rest. Historically, this has come to be known as the 80/20 rule; that is, an 80 percent improvement in quality or performance can be expected by eliminating 20 percent of the causes of unacceptable quality or performance.

The Pareto diagram is a bar chart whose percentages sum to 100 percent. The data are derived from a multiple-choice, single-response scale; a multiple-choice, multiple-response scale; or frequency counts of words (or themes) from content analysis. The respondents' answers are sorted in decreasing importance, with bar height in descending order from left to right. The pictorial array that results reveals the highest concentration of improvement potential in the fewest number of remedies. An analysis of MindWriter customer complaints is depicted as a Pareto diagram in Exhibit 16-7. The cumulative frequency line in this exhibit shows that the top two problems (the repair did not resolve the customer's problem, and the product was returned multiple times for repair) accounted for 80 percent of the perceptions of inadequate repair service.
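The ordering and the cumulative line of a Pareto diagram can be computed directly from category counts. The sketch below is illustrative: the function name `pareto_table` is an assumption, and the complaint counts are hypothetical values chosen only so that the cumulative percentages match the 50, 80, 95, 99, and 100 percent marks on the exhibit's axis; the actual MindWriter counts are not given in the text.

```python
def pareto_table(counts):
    """Return (category, count, cumulative percent) rows, largest count first."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for category, count in ordered:
        running += count
        rows.append((category, count, round(100 * running / total, 1)))
    return rows


# HYPOTHETICAL counts (not from the source), deliberately given out of order.
complaints = {
    "Repair time": 40,
    "Problem not resolved": 500,
    "Phone access delay": 10,
    "Multiple repairs needed": 300,
    "Shipping delay": 150,
}

pareto = pareto_table(complaints)
for category, count, cumulative in pareto:
    print(f"{category:<26}{count:>5}{cumulative:>8}%")
```

Reading down the cumulative column shows where the "vital few" end: in this hypothetical data the first two categories reach 80 percent.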


Boxplots

The boxplot, or box-and-whisker plot, is another technique used frequently in exploratory data analysis. A boxplot reduces the detail of the stem-and-leaf display and provides a different visual image of the distribution's location, spread, shape, tail length, and outliers. Boxplots are extensions of the five-number summary of a distribution. This summary consists of the median, the upper and lower quartiles, and the largest and smallest observations. The median and quartiles are used because they are particularly resistant statistics. Resistance is a characteristic that "provides insensitivity to localized misbehavior in data." Resistant statistics are unaffected by outliers and change only slightly in response to the replacement of small portions of the data set.

Recall the discussion of the mean and standard deviation in Appendix 15a. Now assume we take a data set [5, 6, 6, 7, 7, 7, 8, 8, 9] and calculate its mean. The mean of the set is 7; the standard deviation, 1.22. If the 9 is replaced with 90, the mean becomes 16 and the standard deviation increases to 27.77. The mean is now two times larger than most of the numbers in the distribution, and the standard


deviation is more than 22 times its original size. Changing only one of nine values has disturbed the location and spread summaries to the point where they no longer represent the other eight values. Both the mean and the standard deviation are considered nonresistant statistics; they are susceptible to the effects of extreme values in the tails of the distribution and do not represent typical values well under conditions of asymmetry. The standard deviation is particularly problematic because it is computed from the squared deviations from the mean. In contrast, the median and quartiles are highly resistant to change. When we changed the 9 to 90, the median remained at 7 and the lower and upper quartiles stayed at 6 and 8, respectively. Because of the nature of quartiles, up to 25 percent of the data can be made extreme without perturbing the median, the rectangular composition of the plot, or the quartiles themselves. These characteristics of resistance are incorporated into the construction of boxplots.

Boxplots may be constructed easily by hand or by computer programs. The basic ingredients of the plot are:

1. The rectangular plot that encompasses 50 percent of the data values.
2. A center line (or other notation) marking the median and going through the width of the box.
3. The edges of the box, called hinges.
4. The "whiskers" that extend from the right and left hinges to the largest and smallest values.

The whiskers end at the largest and smallest values found within 1.5 times the interquartile range (IQR) from either edge of the box. These components and their relationships are shown in Exhibit 16-8.
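The resistance example above (replacing the 9 with 90) can be checked numerically. The following is a minimal sketch that uses only Python's standard `statistics` module; the helper name `summarize` is an illustrative assumption, and the quartiles come from `statistics.quantiles` with its default method, which may differ slightly from hand-computed hinges on other data sets.

```python
import statistics

def summarize(values):
    """Return nonresistant (mean, sd) and resistant (median, quartiles) summaries."""
    # quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "median": median,
        "q1": q1,
        "q3": q3,
    }

original = [5, 6, 6, 7, 7, 7, 8, 8, 9]
altered = original[:-1] + [90]  # replace the 9 with 90

before = summarize(original)
after = summarize(altered)

# Print each statistic before and after the single replacement.
for name in before:
    print(f"{name:>6}: {before[name]:8.2f} -> {after[name]:8.2f}")
```

The mean and standard deviation move sharply, while the median and both quartiles stay where they were, which is the behavior the text calls resistance.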

When you are examining data, it is important to separate legitimate outliers from errors in measurement, editing, coding, and data entry. Outliers, data points that lie more than 1.5 times the interquartile range beyond either hinge, reflect unusual cases and are an important source of information for the study. They are displayed or given special statistical treatment, or other portions of the data set are sometimes shielded from their effects. Outliers that are entry mistakes should be corrected or removed during editing.
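The fence arithmetic behind the boxplot components can be sketched in a few lines. This is an illustrative sketch, not the procedure of any particular statistics package: it assumes the hinges can be approximated by the quartiles from `statistics.quantiles`, and the function name `boxplot_fences` and the dictionary keys are hypothetical.

```python
import statistics

def boxplot_fences(values):
    """Compute hinges, IQR, fences, whisker ends, and outside values."""
    ordered = sorted(values)
    # Assumption: hinges are approximated by the quartile cut points.
    lower_hinge, median, upper_hinge = statistics.quantiles(ordered, n=4)
    iqr = upper_hinge - lower_hinge
    inner = (lower_hinge - 1.5 * iqr, upper_hinge + 1.5 * iqr)
    outer = (lower_hinge - 3 * iqr, upper_hinge + 3 * iqr)
    # Whiskers stop at the most extreme observations inside the inner fences.
    inside = [v for v in ordered if inner[0] <= v <= inner[1]]
    # Outside values lie between an inner and an outer fence.
    outside = [v for v in ordered
               if outer[0] <= v < inner[0] or inner[1] < v <= outer[1]]
    # Far outside values lie beyond an outer fence.
    far_outside = [v for v in ordered if v < outer[0] or v > outer[1]]
    return {
        "hinges": (lower_hinge, upper_hinge),
        "median": median,
        "iqr": iqr,
        "inner_fences": inner,
        "outer_fences": outer,
        "whiskers": (inside[0], inside[-1]),
        "outside": outside,
        "far_outside": far_outside,
    }

result = boxplot_fences([5, 6, 6, 7, 7, 7, 8, 8, 90])
for key, value in result.items():
    print(f"{key}: {value}")
```

Flagging a value this way only identifies it as unusual; deciding whether it is a legitimate case or an entry mistake still requires going back to the source record.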

Exhibit 16-9 summarizes several comparisons that are of help to the analyst. Boxplots are an excellent diagnostic tool, especially when graphed on the same scale. The upper two plots in the exhibit are both symmetric, but one is larger than the other. Larger box widths are sometimes used when the second variable, from the same measurement scale, comes from a larger sample size. The box widths should be proportional to the square root of the sample size, but not all plotting programs account for this. Right- and left-skewed distributions and those with reduced spread are also presented clearly in the plot comparison. Finally, groups may be compared by means of multiple plots. One variation, in which a notch at the median marks off a confidence interval to test the equality of group medians, takes us a step closer to hypothesis testing. Here the sides of the box return to full width at the upper and lower confidence limits. When the intervals do not overlap, we can be confident, at a specified confidence level, that the medians of the two populations are different.

[Sidebar: The percent boost in company revenue created by best practices in data quality.]

>Exhibit 16-8 Boxplot Components

[Figure: annotated boxplot. Labeled components:]
• Hinges: the lower quartile and the upper quartile; 50% of observed values are within the box.
• IQR: the distance between the hinges.
• Inner fences: lower hinge minus 1.5 IQR; upper hinge plus 1.5 IQR.
• Outer fences: lower hinge minus 3 IQR; upper hinge plus 3 IQR.
• Whisker ends: the smallest observed value within 1.5 IQR of the lower hinge and the largest observed value within 1.5 IQR of the upper hinge.
• Outside value or outlier: a value beyond an inner fence.
• Extreme or far outside value: a value beyond an outer fence.

>Exhibit 16-9 Diagnostics with Boxplots

[Figure: boxplots illustrating each of the following cases:]
• Symmetric
• Symmetric, larger relative size in proportion to sample size
• Right skewed
• Left skewed
• Small spread
• Notched at the median for a test of the equality of population medians
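The two plotting conventions described above, box widths proportional to the square root of the sample size and notches that mark an interval around the median, can be sketched as follows. The chapter does not give a notch formula; the sketch assumes the commonly used approximation median ± 1.57 × IQR / √n, and the function names and both data sets are hypothetical.

```python
import math
import statistics

def notch_interval(values, factor=1.57):
    """Approximate notch limits around the median.

    Assumption: median +/- factor * IQR / sqrt(n), a common convention
    that is not taken from this chapter.
    """
    q1, median, q3 = statistics.quantiles(values, n=4)
    half_width = factor * (q3 - q1) / math.sqrt(len(values))
    return (median - half_width, median + half_width)

def intervals_overlap(a, b):
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

def relative_box_widths(sample_sizes):
    """Box widths proportional to sqrt(n), scaled so the largest is 1.0."""
    largest = max(sample_sizes)
    return [math.sqrt(n / largest) for n in sample_sizes]

# Hypothetical groups measured on the same scale.
group_a = [12, 14, 15, 15, 16, 17, 18, 19, 21]
group_b = [22, 24, 25, 26, 27, 27, 28, 30, 31, 33, 34, 36]

notch_a = notch_interval(group_a)
notch_b = notch_interval(group_b)
print("notch A:", notch_a)
print("notch B:", notch_b)
print("notches overlap:", intervals_overlap(notch_a, notch_b))
print("relative widths:", relative_box_widths([len(group_a), len(group_b)]))
```

Non-overlapping notches are a visual cue that the medians differ; they do not replace the formal tests taken up in later chapters.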

In Exhibit 16-10, multiple boxplots compare five sectors of PrimeSell's customers by their average annual purchases data. The overall impression is one of potential problems for the analyst: unequal variances, skewness, and extreme outliers. Note the similarities of the profiles of finance and retailing in contrast to the high-tech and insurance sectors. If hypothesis tests are planned, further examination of this plot for each sector would require a stem-and-leaf display and a five-number summary. From this, we could make decisions on the types of tests to select for confirmatory analysis (see Chapters 17, 18, and 19).

Increasingly, participant data are being attached to their geographic dimension as Geographic Information System (GIS) software and coordinate measuring devices have become more affordable and easier to use. Essentially a GIS works by linking data sets to each other with at least one common data field (e.g., a household's street address). The GIS allows the researcher to connect target and classification variables from a survey to specific geographic-based databases like U.S. Census data, to develop a richer understanding of the sample's attitudes and behavior. When radio frequency identification (RFID) data become more prevalent, much behavioral data will be able to connect with these new geographically rich databases.


>Exhibit 16-10 Boxplot Comparison of Customer Sectors

[Figure: one boxplot per customer sector (Financial, Health, High-tech, Insurance, Retailing) on a shared vertical axis running from -500 to 2,500; the horizontal axis is labeled Sector.]

[Map: foreclosure actions to housing units, by state.]

Sometimes there is no better way to display data than with a map. Home foreclosures in the United States affected different parts of the country differently during the mortgage crisis. At a quick glance, when geographical data are mapped you can tell which states were most affected. This map from RealtyTrac shows home foreclosure listings by state.


The most common way to display such data is with a map. Colors and patterns denoting knowledge, attitude, behavior, or demographic data arrays are superimposed over street maps (finest-level GIS), block-group maps, or county, state, or country maps to help identify the best locations for stores based on demographic, psychographic, and life-stage segmentation data. Florists array promotional response information geographically and use the map to plan targeted promotions. Consumer and business-to-business researchers use mapping of data on ownership, usage level, and price sensitivity in plotting geographic rollouts of new products. Although this is an attractive option for exploratory analysis, it does take specialized software and hardware, as well as the expertise to operate it. Students are encouraged to take specialized courses on GIS to expand their skill set in this growing area.

Throughout this section we have exploited the visual techniques of exploratory data analysis to look beyond numerical summaries and gain insight into the patterns of the data. Few of the approaches have stressed the need for advanced mathematics, and all have an intuitive appeal for the analyst. When the more common ways of summarizing location, spread, and shape have conveyed an inadequate picture of the data, we have used more resistant statistics to protect us from the effects of extreme scores and occasional errors. We have also emphasized the value of transforming the original scale of the data during preliminary analysis rather than at the point of hypothesis testing.

>Cross-Tabulation

Depending on the management question, we can gain valuable insights by examining the data with cross-tabulation. Cross-tabulation is a technique for comparing data from two or more categorical variables such as gender and selection by one's company for an overseas assignment. Cross-tabulation is used with demographic variables and the study's target variables (operationalized measurement questions). The technique uses tables having rows and columns that correspond to the levels or code values of each variable's categories. Exhibit 16-11 is an example of a computer-generated cross-tabulation. This table has two rows for gender and two columns for assignment selection. The combination of the variables with their values produces four cells. Each cell contains a count of the cases of the joint classification and also the row, column, and total percentages. The number of row cells and column cells is often used to designate the size of the table, as in this 2 × 2 table. The cells are individually identified by their row and column numbers, as illustrated. Row and column totals, called marginals, appear at the bottom and right "margins" of the table. They show the counts and percentages of the separate rows and columns.

>Exhibit 16-11 SPSS Cross-Tabulation of Gender by Overseas Assignment Opportunity

                        OVERSEAS ASSIGNMENT
Cell content:
  Count
  Row Pct               Yes         No          Row
  Col Pct               1           2           Total
  Tot Pct
GENDER
  Male       1          22          40          62
                        35.5        64.5        62.0
                        78.6        55.6
                        22.0        40.0

  Female     2          6           32          38
                        15.8        84.2        38.0
                        21.4        44.4
                        6.0         32.0

  Column                28          72          100
  Total                 28.0        72.0        100.0

Cell 2,1 (row 2, column 1) is the Female/Yes cell. The Row Total column and the Column Total row are the marginals.
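Building a cross-tabulation from raw cases amounts to counting joint classifications and then summing across rows and columns for the marginals. The sketch below uses only the standard library; the function name `cross_tabulate` is hypothetical, and the case records are generated to match the cell counts of the gender by overseas assignment example (22, 40, 6, and 32).

```python
from collections import Counter

def cross_tabulate(records):
    """Count joint classifications and marginals for (row, column) pairs."""
    cells = Counter(records)  # one count per (row value, column value) pair
    row_totals = Counter()
    col_totals = Counter()
    for (row, col), count in cells.items():
        row_totals[row] += count
        col_totals[col] += count
    return cells, row_totals, col_totals, sum(cells.values())

# One (gender, overseas assignment) record per case.
records = (
    [("Male", "Yes")] * 22 + [("Male", "No")] * 40
    + [("Female", "Yes")] * 6 + [("Female", "No")] * 32
)

cells, row_totals, col_totals, total = cross_tabulate(records)

# Print the table with its marginals.
print(f"{'':8}{'Yes':>6}{'No':>6}{'Total':>7}")
for gender in ("Male", "Female"):
    print(f"{gender:8}{cells[(gender, 'Yes')]:>6}{cells[(gender, 'No')]:>6}"
          f"{row_totals[gender]:>7}")
print(f"{'Total':8}{col_totals['Yes']:>6}{col_totals['No']:>6}{total:>7}")
```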


When developing data displays, some research analysts turn not to heavy-duty statistical software packages like SPSS or SAS, but to the spreadsheet on their desktop. Two enterprising professors from Pennsylvania State University, Gary L. Lilien and Arvind Rangaswamy, have teamed to develop Microsoft Excel plug-ins that empower the spreadsheet to create some commonly desired displays with just a few mouse clicks. "With this software product, users will be able to use the power of world-class analytics from within Excel, an interface with which they are already comfortable," claims their website.

Each plug-in offers a unique template for data entry where the charting directions are embedded. Once the data are entered, a series of windows guides the user through the process of creating the display, such as the GE Matrix shown here. The user can practice the techniques on sample data from real companies or enter his or her own data.

The plug-ins currently facilitate displays for forecasting, conjoint analysis, customer choice analysis, customer lifetime value analysis, GE Portfolio Matrix plotting, positioning analysis, resource allocation analysis, new-product and service design, and segmentation/targeting analysis, all common analytical exercises for managers involved in marketing engineering. Thanks to Lilien and Rangaswamy, some sophisticated display techniques have been significantly simplified.

www.mktgeng.com

[Screenshot: GE Portfolio Matrix produced in Excel.]

Cross-tabulation is a first step for identifying relationships between variables. When tables are constructed for statistical testing, we call them contingency tables, and the test determines if the classification variables are independent of each other (see chi-square in Chapter 17). Of course, tables may be larger than 2 × 2.

The Use of Percentages

Percentages serve two purposes in data presentation. First, they simplify the data by reducing all numbers to a range from 0 to 100. Second, they translate the data into standard form, with a base of 100, for relative comparisons. In a sampling situation, the number of cases that fall into a category is meaningless unless it is related to some base. A count of 28 overseas assignees has little meaning unless we know it is from a sample of 100. Using the latter as a base, we conclude that 28 percent of this study's sample has an overseas assignment.

>Exhibit 16-12 Comparison of Percentages in Cross-Tabulation Studies by Overseas Assignment

Study 1
                        OVERSEAS ASSIGNMENT
  Count
  Row Pct               Yes         No          Row
  Col Pct               1           2           Total
  Tot Pct
GENDER
  Male       1          22          40          62
                        35.5        64.5        62.0
                        78.6        55.6
                        22.0        40.0

  Female     2          6           32          38
                        15.8        84.2        38.0
                        21.4        44.4
                        6.0         32.0

  Column                28          72          100
  Total                 28.0        72.0        100.0

Study 2
                        OVERSEAS ASSIGNMENT
  Count
  Row Pct               Yes         No          Row
  Col Pct               1           2           Total
  Tot Pct
GENDER
  Male       1          225         675         900
                        25.0        75.0        60.0
                        62.5        59.2
                        15.0        45.0

  Female     2          135         465         600
                        22.5        77.5        40.0
                        37.5        40.8
                        9.0         31.0

  Column                360         1140        1500
  Total                 24.0        76.0        100.0

Although a single percentage is useful, percentages are even more useful when the research problem calls for a comparison of several distributions of data. Assume the previously reported data were collected five years ago and the present study had a sample of 1,500, of which 360 were selected for overseas assignments. By using percentages, we can see the relative relationships and shifts in the data (see Exhibit 16-12).

With two-dimension tables, the selection of a row or column will accentuate a particular distribution or comparison. This raises the question of the direction in which the percentages should be calculated. Most computer programs offer options for presenting percentages in both directions and interchanging the rows and columns of the table. But in situations in which one variable is hypothesized to be the presumed cause, is thought to affect or predict a response, or is simply antecedent to the other variable, we label it the independent variable. Percentages should then be computed in the direction of this variable. Thus, if the independent variable is placed on the row, select row percentages; if it is on the column, select column percentages. In which direction should the percentages run in the previous example? If only the column percentages are reported, we imply that assignment status has some effect on gender. This is implausible. When percentages are reported by rows, the implication is that gender influences selection for overseas assignments.

Care should be taken in interpreting percentages from tables. Consider again the data in Exhibit 16-12. From the first to the second study, it is apparent that the percentage of females selected for overseas assignments rose from 15.8 to 22.5 percent of their respective samples. This should not be confused with the percentage within each sample who were women with overseas assignments, a number which increased from 6 percent (Study 1) to 9 percent (Study 2). Among all overseas selectees, in the first study 21.4 percent were women, while in the second study, 37.5 percent were women. Similar comparisons can be made for the other categories. The tables verify an increase in women with overseas assignments, but we cannot conclude that their gender had anything to do with the increase.
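The three percentages discussed above answer three different questions about the same cell, and the difference is easiest to see when they are computed side by side. The following sketch uses the cell counts from Exhibit 16-12; the function name `cell_percentages` and the nested-dictionary layout are illustrative assumptions.

```python
def cell_percentages(table, row, col):
    """Row, column, and total percentages for one cell of a two-way table."""
    count = table[row][col]
    row_total = sum(table[row].values())
    col_total = sum(cells[col] for cells in table.values())
    grand_total = sum(sum(cells.values()) for cells in table.values())
    return {
        "row": 100 * count / row_total,      # share of that gender selected
        "column": 100 * count / col_total,   # share of selectees of that gender
        "total": 100 * count / grand_total,  # share of the whole sample
    }

# Cell counts from Exhibit 16-12.
study_1 = {"Male": {"Yes": 22, "No": 40}, "Female": {"Yes": 6, "No": 32}}
study_2 = {"Male": {"Yes": 225, "No": 675}, "Female": {"Yes": 135, "No": 465}}

women_1 = cell_percentages(study_1, "Female", "Yes")
women_2 = cell_percentages(study_2, "Female", "Yes")

for kind in ("row", "column", "total"):
    print(f"{kind:>6} pct: {women_1[kind]:5.1f} -> {women_2[kind]:5.1f}")
```

Because gender is treated as the independent variable and sits on the rows, the row percentage is the one that speaks to selection rates.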

BusinessWeek recognized the Starbucks Card Duetto™ Visa as one of the important new products of 2003. In fact, it was the only financial product on the list. Starbucks Card Duetto™ Visa is a multifunction card that combines the features of a prepaid stored-value card, known as the Starbucks Card, with a regular credit card. Starbucks, in conjunction with Visa and Bank One (now Chase), did extensive research to determine if the proposed new payment option had appeal. Focus groups were used to determine the level of potential consumer confusion with the multifunction card, determine card attractiveness, and refine messaging. A series of online surveys were conducted both before and after the launch of the product to determine market receptivity. A press release about the partnership about eight months before launch generated news coverage resulting in traffic to Starbucks' website. Early purchase intent was determined by those Starbucks customers who took the initiative and signed up via the website to be prenotified by e-mail when the card became available. Among the postlaunch research questions guiding measurement of return on marketing investment (ROMI) are:

• Does the card enhance the Starbucks customer experience (how satisfied is each customer, and do customers feel appreciated)?
• Did the card prove valuable to all partners: Starbucks, Chase, and Visa?
• Did card activity, which is linked to charitable donations, permit Starbucks to give back to the communities in which it operates in a significant way?

If you were in charge of this research, what would you be looking for during exploratory data analysis?

www.chase.com; usa.visa.com; www.starbucks.com

To learn more about this research, read the case and download the video "Starbucks, Bank One, and Visa Launch the Starbucks Card Duetto™ Visa" from the Online Learning Center.

Percentages are used by virtually everyone dealing with numbers, but often incorrectly. The following guidelines, if used during analysis, will help to prevent errors in reporting:

• Averaging percentages. Percentages cannot be averaged unless each is weighted by the size of the group from which it is derived. Thus, a simple average will not suffice; it is necessary to use a weighted average.
• Use of too large percentages. This often defeats the purpose of percentages, which is to simplify. A large percentage is difficult to understand and is confusing. If a 1,000 percent increase is experienced, it is better to describe this as a 10-fold increase.
• Using too small a base. Percentages hide the base from which they have been computed. A figure of 60 percent when contrasted with 30 percent would appear to suggest a sizable difference. Yet if there are only three cases in the one category and six in the other, the differences would not be as significant as they have been made to appear with percentages.
• Percentage decreases can never exceed 100 percent. This is obvious, but this type of mistake occurs frequently. The higher figure should always be used as the base or denominator. For example, if a price was reduced from $1 to $.25, the decrease would be 75 percent (75/100).
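Two of the guidelines above are simple arithmetic and can be sketched directly: weighting each percentage by its base before averaging, and computing a decrease against the higher figure. The function names are hypothetical; the inputs reuse the two overseas assignment studies (28 percent of 100 and 24 percent of 1,500) and the $1 to $.25 price example.

```python
def weighted_average_percent(groups):
    """Average percentages, weighting each by the size of its base.

    groups: list of (percentage, base_size) pairs.
    """
    total_base = sum(size for _, size in groups)
    return sum(pct * size for pct, size in groups) / total_base

def percent_decrease(old, new):
    """Percentage decrease from old to new, using the higher figure as the base."""
    if new > old:
        raise ValueError("new exceeds old: this is an increase, not a decrease")
    return (old - new) / old * 100

# Share selected for overseas assignment in the two studies.
studies = [(28.0, 100), (24.0, 1500)]
simple = sum(pct for pct, _ in studies) / len(studies)
weighted = weighted_average_percent(studies)
print(f"simple average:   {simple:.2f}%")
print(f"weighted average: {weighted:.2f}%")

# Price reduced from $1.00 to $0.25.
print(f"price decrease:   {percent_decrease(1.00, 0.25):.0f}%")
```

The weighted figure is the same as pooling the cases (388 selected out of 1,600), which is why the simple average overstates the combined rate here.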

Other Table-Based Analysis

The recognition of a meaningful relationship between variables generally signals a need for further investigation. Even if one finds a statistically significant relationship, the questions of why and under what conditions remain. The introduction of a control variable to interpret the relationship is often necessary. Cross-tabulation tables serve as the framework.

Statistical packages like Minitab, SAS, and SPSS have among their modules many options for the construction of n-way tables with provision for multiple control variables. Suppose you are interested in creating a cross-tabulation of two variables with one control. Whatever the number of values in the primary variables, the control variable determines the number of tables; a control variable with five values produces five. For some applications, it is appropriate to have five separate tables; for others, it might be preferable to have adjoining tables or have the values of all the variables in one. Management reports are of the latter variety. Exhibit 16-13 presents an example in which all three variables are handled under the same banner. Programs such as this one can handle far more complex tables and statistical information.
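The idea that the control variable determines the number of tables can be sketched by grouping cases on the control value before counting cells. This is an illustrative standard-library sketch, not the output of any of the packages named above; the function name, the field names, and the six records (including the "division" control variable) are all hypothetical.

```python
from collections import Counter, defaultdict

def tables_by_control(records, row_var, col_var, control_var):
    """Build one two-way table of cell counts per value of the control variable."""
    tables = defaultdict(Counter)
    for record in records:
        cell = (record[row_var], record[col_var])
        tables[record[control_var]][cell] += 1
    return dict(tables)

# Hypothetical cases; "division" plays the role of the control variable.
records = [
    {"gender": "Male", "assignment": "Yes", "division": "East"},
    {"gender": "Male", "assignment": "No", "division": "East"},
    {"gender": "Female", "assignment": "No", "division": "East"},
    {"gender": "Female", "assignment": "Yes", "division": "West"},
    {"gender": "Male", "assignment": "No", "division": "West"},
    {"gender": "Male", "assignment": "No", "division": "West"},
]

tables = tables_by_control(records, "gender", "assignment", "division")

# One table per control value; each maps (row, column) to a count.
for division, cells in sorted(tables.items()):
    print(division)
    for (gender, assignment), count in sorted(cells.items()):
        print(f"  {gender:7} {assignment:4} {count}")
```

Whether these are then printed as separate tables or merged under one banner is a presentation choice, as the paragraph above notes.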

An advanced variation on n-way tables is automatic interaction detection (AID). AID is a computerized statistical process that requires that the researcher identify a dependent variable and a set of predictors or independent variables. The computer then searches among up to 300 variables for the best single division of the data according to each predictor variable, chooses one, and splits the sample using a statistical test to verify the appropriateness of this choice.

Exhibit 16-14 shows the tree diagram that resulted from an AID study of customer satisfaction with MindWriter's CompleteCare repair service. The initial dependent variable is the overall impression of the repair service. This variable was measured on an interval scale of 1 to 5. The variables that contribute to perceptions of repair effectiveness were also measured on the same scale but were rescaled to ordinal data for this example (1-2 = poor, 3 = average, and 4-5 = excellent). The top box shows that 62 percent of the respondents rated the repair service as excellent (41% + 21%). The best predictor of repair effectiveness is "resolution of the problem."

>Exhibit 16-13 SPSS Cross-Tabulation with Control and Nested Variables

[Layout schematic: the control variable (Category 1, Category 2) heads the banner; the nested variable (cat 1, cat 2, cat 3) repeats beneath each control category; the stub labels the rows and the cells fill the body.]

                        SEX OF EMPLOYEE
                        MALES                    FEMALES
                        MINORITY                 MINORITY
                        CLASSIFICATION           CLASSIFICATION
EMPLOYMENT CATEGORY     WHITE      NONWHITE      WHITE      NONWHITE
CLERICAL                16%        7%            18%        7%
OFFICE TRAINEE          7%         3%            17%        2%
SECURITY OFFICER        3%, 3%
COLLEGE TRAINEE         7%, 0%, 1%
EXEMPT EMPLOYEE         6%, 0%, 0%
MBA TRAINEE             1%, 0%, 0%
TECHNICAL               1%

Rows with fewer than four percentages contain blank cells whose column positions are not recoverable; their values are listed in left-to-right order.

>Exhibit 16-14 Automatic Interaction Detection Example (MindWriter's Repair Satisfaction)

[Figure: AID tree diagram. The top box, Overall CompleteCare Impression (expectations scale), is split first on "Resolution of the problem." Lower branches split on "Condition on arrival," "Service rep technical competence," and "Speed of repair." Each box reports the percentage of respondents whose expectations were met few, met some, met most, met all, or exceeded, along with the subgroup size n. Legible boxes include Resolution of the problem = Poor (44% met few, 35% met some, 15% met most, 6% met all, 0% exceeded; n = 54) and, beneath it, Condition on arrival = Poor (2% met all, 0% exceeded; n = 25).]

On the left side of the tree, customers who rated "resolution of the problem" as poor have fewer expectations being met or exceeded than the average for the sample (6 percent versus 62 percent). A poor rating on "condition on arrival" exacerbates this, reducing the total satisfied group to 2 percent. From this example you can see that the researcher separately studied (applied AID to) each subgroup to find the variable that when split again makes the next largest contribution to understanding the consumers' evaluation process, and to the reduction of unexplained variation in each subsample. This analysis alerts decision makers at MindWriter to the best- and worst-case scenarios for the CompleteCare service, how to recover during a problematic month, and which "key drivers," or independent variables influencing the process, should receive corrective resources.
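The search step at the heart of AID, finding the single split that most reduces unexplained variation, can be sketched for one level of the tree. This is a simplified illustration and not the original AID program: it assumes ordinal predictors coded poor/average/excellent as in the text, considers only binary splits at the two cut points, scores each split by its between-group sum of squares, and omits the statistical test that verifies the split. The function names and the eight respondent records are hypothetical.

```python
from statistics import mean

LEVELS = ["poor", "average", "excellent"]  # ordinal coding used in the text

def between_ss(left, right):
    """Between-group sum of squares for a two-way split of the dependent variable."""
    overall = mean(left + right)
    return (len(left) * (mean(left) - overall) ** 2
            + len(right) * (mean(right) - overall) ** 2)

def best_split(rows, dependent, predictors):
    """Return (predictor, low_levels, score) for the best single binary split."""
    best = None
    for name in predictors:
        for cut in range(1, len(LEVELS)):
            low = LEVELS[:cut]
            left = [r[dependent] for r in rows if r[name] in low]
            right = [r[dependent] for r in rows if r[name] not in low]
            if not left or not right:
                continue  # a split must leave cases on both sides
            score = between_ss(left, right)
            if best is None or score > best[2]:
                best = (name, low, score)
    return best

# Hypothetical respondents: overall impression on a 1-5 scale plus two predictors.
rows = [
    {"overall": 1, "resolution": "poor", "speed": "average"},
    {"overall": 2, "resolution": "poor", "speed": "excellent"},
    {"overall": 2, "resolution": "poor", "speed": "poor"},
    {"overall": 4, "resolution": "average", "speed": "poor"},
    {"overall": 4, "resolution": "excellent", "speed": "average"},
    {"overall": 5, "resolution": "excellent", "speed": "excellent"},
    {"overall": 5, "resolution": "excellent", "speed": "average"},
    {"overall": 3, "resolution": "average", "speed": "excellent"},
]

predictor, low_levels, score = best_split(rows, "overall", ["resolution", "speed"])
print(f"split on {predictor}: {low_levels} versus the remaining levels "
      f"(between-group SS = {score:.2f})")
```

Applying `best_split` again to the cases on each side of the chosen split corresponds to the subgroup-by-subgroup analysis described above.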

>summary

1 Exploratory data analysis (EDA) provides a perspective and set of tools to search for clues and patterns in the data. EDA augments rather than supplants traditional statistics. In addition to numerical summaries of location, spread, and shape, EDA uses visual displays to provide a complete and accurate impression of distributions and variable relationships.

Frequency tables array data from lowest to highest values with counts and percentages. They are most useful for inspecting the range of responses and their repeated occurrence. Bar charts and pie charts are appropriate for relative comparisons of nominal data. Histograms are optimally used with continuous variables where intervals group the responses. The Pareto diagram is a bar chart whose percentages sum to 100 percent. The causes of the problem under investigation are sorted in decreasing importance, with bar height descending from left to right. Stem-and-leaf displays and boxplots are EDA techniques that provide visual representations of distributions. The former present actual data values using a histogram-type device that allows inspection of spread and shape. Boxplots use the five-number summary to convey a detailed picture of a distribution's main body, tails, and outliers. Both stem-and-leaf displays and boxplots rely on resistant statistics to overcome the limitations of descriptive measures that are subject to extreme scores.

2 The examination of relationships involving categorical variables employs cross-tabulation. The tables used for this purpose consist of cells and marginals. The cells may contain combinations of count, row, column, and total percentages. The tabular structure is the framework for later statistical testing. Computer software for cross-classification analysis makes table-based analysis with one or more control variables an efficient tool for data visualization and later decision making. An advanced variation on n-way tables is automatic interaction detection (AID).

>keyterms

automatic interaction detection (AID)
boxplot
cell
confirmatory data analysis
contingency table
control variable
cross-tabulation
exploratory data analysis (EDA)
five-number summary
frequency table
histogram
interquartile range (IQR)
marginals
nonresistant statistics
outliers
Pareto diagram
resistant statistics
stem-and-leaf display

>discussionquestions

Terms in Review

1 Define or explain:

a Marginals.

b Pareto diagram.

c Nonresistant statistics.

d Lower control limit.

e The five-number summary.

Making Research Decisions

2 Suppose you were preparing two-way tables of percentages for the following pairs of variables. How would you run the percentages?

a Age and consumption of breakfast cereal.

b Family income and confidence about the family's future.

c Marital status and sports participation.

d Crime rate and unemployment rate.

3 You study the attrition of entering college freshmen (those students who enter college as freshmen but don't stay to graduate). You find the following relationships between attrition, aid, and distance of home from college. What is your interpretation? Consider all variables and relationships.

                                  Home Near             Home Far
              Aid                 Receiving Aid         Receiving Aid
              Yes       No        Yes       No          Yes       No
              (%)       (%)       (%)       (%)         (%)       (%)
Drop Out      25        20        5         15          30        40
Stay          75        80        95        85          70        60

4 A local health agency is experimenting with two appeal letters, A and B, with which to raise funds. It sends out 400 of the A appeal and 400 of the B appeal (each subsample is divided equally among working-class and middle-class neighborhoods). The agency secures the results shown in the following table.

a Which appeal is the best?

b Which class responded better to which letter?

c Is appeal or social class a more powerful independent variable?

                     Appeal A                 Appeal B
                     Middle     Working       Middle     Working
                     Class      Class         Class      Class
                     (%)        (%)           (%)        (%)
Contribution         20         40            15         30
No Contribution      80         60            85         70
                     100        100           100        100

5 Assume you have collected data on sales associates of a large retail organization in a major metropolitan area. You analyze the data by type of work classification, education level, and whether the workers were raised in a rural or urban setting. The results are shown here. How would you interpret them?

Annual Retail Employee Turnover per 100 Employees

                                    High Education         Low Education
          Salaried    Hourly        Salaried    Hourly     Salaried    Hourly
                      Wage                      Wage                   Wage
Rural     8           16            6           14         18          18
Urban     12          16            10          12         19          20

Bringing Research to Life

6 Identify the variables being cross-tabulated by Sammye. Identify some plausible reasons why such an exploration would be a good idea.

From Concept to Practice

7 Use the data in Exhibit 16-5 to construct a stem-and-leaf display.

a Where do you find the main body of the distribution?

b How many values reside outside the inner fence(s)?

From the Headlines

8 Asustek, the Taiwanese manufacturer that basically invented the netbook category, has been researching more radical design ideas, including a classy wrist-top computer, the Waveface Ultra. It is made from a bendable display that can connect to the Internet, make phone calls, and crunch data. Essentially, it's a bracelet that acts like a smartphone.

a How might you use such a device to display stimuli forrespondents?

b What is the interactive data exchange potential forresearchers?


>cases*

Agri Comp
NCRCC: Teeing Up and New Strategic Direction
Mastering Teacher Leadership
Proofpoint: Capitalizing on a Reporter's Love of Statistics

* You will find a description of each case in the Case Abstracts section of this textbook. Check the Case Index to determine whether a case provides data, the research instrument, video, or other supplementary material. Written cases are downloadable from the text website (www.mhhe.com/cooper11e). All video material and video cases are available from the Online Learning Center.