Next on OPRAH –
Bringing Data Out of the Closet
Walter Giesbrecht, Data Librarian
York University
Jeff Moon, Head, Documents Unit
Queen’s University
OLA SuperConference
Friday, 1 February, 2002
Not this Data …
… but these kinds!
Before we get all shaken up about data and statistics, with warnings that such and such a percent of people get such and such a disease
after following such and such a personal habit...
… it is useful to note that:
• 80% of those who go insane drink coffee, tea, or beer
• 98% of those who commit suicide sleep indoors
• and darned near 100% of those injured in traffic accidents are people who move from one place to another!
Let’s take a look at
Data and Statistical Analysis…
Have you ever seen the movie “Twins”?
Think of “Arnie” as the
“Data” continuum…
Tables, Charts, Graphs
(from books, journals, the web, etc...)
A ‘number’
Raw Survey Data
# French Mother Tongue (1996) in Ontario
Employment levels by
occupation class
Annual inflation rate from 1914 to present
Aggregate Data Microdata
Coded responses of
surveyed individuals
Canada - Employment, Telecommunication Equipment Industry
479,285
Year   Value
1914     7.2
1915     7.3
…        …
1990    93.3
1991    98.5
1992   100
1993   101.8
1994   102
1995   104.2
1996   105.8
1997   107.6
1998   108.6
1999   110.5
2000   112.1
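A time series like the one above can be turned into year-over-year percentage changes with a few lines of code. The sketch below is only an illustration: it assumes the listed values are index levels (the 1992 value is exactly 100, which suggests a base year), and it uses only the 1990 to 2000 figures because the years between 1915 and 1990 are elided on the slide. The names `index` and `annual_change` are made up for this example.

```python
# Values for 1990-2000 as listed on the slide.
# Assumption: these are index levels (1992 = 100), not rates.
index = {
    1990: 93.3, 1991: 98.5, 1992: 100.0, 1993: 101.8,
    1994: 102.0, 1995: 104.2, 1996: 105.8, 1997: 107.6,
    1998: 108.6, 1999: 110.5, 2000: 112.1,
}


def annual_change(series):
    """Return the percentage change from each year to the next, rounded to 0.1."""
    years = sorted(series)
    return {
        year: round((series[year] - series[prev]) / series[prev] * 100, 1)
        for prev, year in zip(years, years[1:])
    }


rates = annual_change(index)
for year, rate in rates.items():
    print(year, rate)
```

Each printed figure is the change relative to the previous year, so the first year in the series (1990) has no rate of its own.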
Aggregate Data:
A Number
Tables, Charts, Graphs Time Series
Sources of Aggregate Data…
Statistics Canada is generally the first stop for Canadian Data:
• The Canada Year Book (print)
• The Daily (web)
• Canadian Social Trends (web/print)
• CANSIM / E-Stat (web) – time series…
• “Canadian Statistics” (web)
• Beyond 20/20 Files – multidimensional tables…
Survey Data (microdata):
             Age  Sex  MarStat  Children  Income  Occ  Educ
Person 1     24   M    1        1         5       1    7
Person 2     34   F    1        0         3       5    3
Person 3     52   F    2        2         4       3    3
Person 4     64   F    1        3         6       4    4
Person 5     23   M    3        1         7       2    6
Person 6     63   F    4        1         5       6    3
…
Person "n"   29   M    1        0         5       2    2
Statistical analysis software is used to generate meaningful results… e.g. SPSS, SAS.
“variables” (columns)
“respondents” (rows)
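To give a feel for what statistical software does with a microdata file, here is a minimal sketch in Python using only the standard library. It is not SPSS or SAS syntax; it simply treats the six listed respondents as rows and the variables as columns. The meanings of the codes (for example, what MarStat = 1 stands for) are not given on the slide, so the sketch only counts them.

```python
from collections import Counter
from statistics import mean, stdev

# The six listed respondents, with codes exactly as shown in the table.
COLUMNS = ("Age", "Sex", "MarStat", "Children", "Income", "Occ", "Educ")
ROWS = [
    (24, "M", 1, 1, 5, 1, 7),
    (34, "F", 1, 0, 3, 5, 3),
    (52, "F", 2, 2, 4, 3, 3),
    (64, "F", 1, 3, 6, 4, 4),
    (23, "M", 3, 1, 7, 2, 6),
    (63, "F", 4, 1, 5, 6, 3),
]
respondents = [dict(zip(COLUMNS, row)) for row in ROWS]

# Descriptive statistics on one variable.
ages = [r["Age"] for r in respondents]
mean_age = mean(ages)
sd_age = stdev(ages)  # sample standard deviation

# A count of one variable and a cross-tabulation of two.
sex_counts = Counter(r["Sex"] for r in respondents)
crosstab = Counter((r["Sex"], r["MarStat"]) for r in respondents)

print(round(mean_age, 2), round(sd_age, 2))
print(sex_counts)
print(crosstab)
```

A real survey file would have thousands of respondents and usually a weight variable; both are left out here to keep the example short.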
Sources of Survey Data…
Once again, Statistics Canada is generally the first stop for Canadian Data:
• The “Data Liberation Initiative” (DLI) provides access to hundreds of publicly released survey data files.
Polling Companies (Environics, CROP, etc.) produce microdata files as well.
For US & international data, see the “Inter-university Consortium for Political & Social Research” (ICPSR).
Aggregate Data = Postcard (“Fixed”)
Survey Data = Camera (“Flexible”)
Think of “Danny” as the
“Statistical Analysis” continuum…
Descriptive Statistics
Counts
Percentages
Averages
Standard Deviations
Inferential Statistics
Tests of Significance
Data continuum…
A ‘number’ / Tables, Charts, Graphs / Raw Survey Data
Statistical Analysis continuum…
Counts / Percentages / Averages / Standard Deviations / Significance testing
Aggregate / Descriptive vs Microdata / Inferential
To review…
Data
Aggregate &
Survey Data
(Microdata)
Statistical Analysis
Counts, Percentages, Averages, Standard Deviations, Cross-tabulations, t-tests,
Regression, etc.
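The step from descriptive to inferential statistics can be sketched with a small example. The code below computes averages and variances for two groups (descriptive) and then a Welch t statistic comparing them (inferential). The data are entirely hypothetical and the function name `welch_t` is made up for this illustration; it is not taken from any particular package.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical data: weekly hours online for two small groups.
group_a = [2, 4, 6]
group_b = [1, 2, 3]


def welch_t(a, b):
    """Welch's t statistic: difference in means over its standard error."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


# Descriptive: summarize each group on its own.
print(mean(group_a), variance(group_a))
print(mean(group_b), variance(group_b))

# Inferential: compare the groups.
t_stat = welch_t(group_a, group_b)
print(round(t_stat, 3))
```

Deciding whether the t statistic is significant also needs degrees of freedom and a t distribution, which is what packages such as SPSS and SAS supply.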
Reference Question Example:
How many of you have had a patron arrive at the Reference Desk with a newspaper article reporting Statistics Canada data?
Globe & Mail, Dec 17, 2001, p A15
“…71% of 15- to 17-year-olds use online chat rooms, double the proportion of the only slightly older 20-24-year-olds.”
First, note that the article says:
“Statistics Canada, in a study released
last week…”
So… where do you go from here?
First… Let’s try:
http://www.statcan.ca/start.html
Which leads you to the following:
Canadian Social Trends,
Winter 2001
Which leads, in turn to:
Here is the statistic quoted in the Globe…
and here is the source…
So… how do we check out this source?
General Social Survey, 2000
DLI Web Site (or Local Data Centre)
http://www.statcan.ca/english/Dli/dli.htm
Documentation
and Data…
So… going to your campus “Data Centre”
http://library.queensu.ca/webdoc/ssdc/key.htm
AGEGR5 less than or equal to 3
Results…
79.9 %
65.9 %
71 %
48 %
Canadian Social Trends vs. our cross-tab?
Reply from Statistics Canada…
“An errata will be issued for the table appearing in CST because the table does not show percentages for those who used the Net in the last month but for those who used the Net in the last year.”
“The difference in the numbers is because I used the variable H19 while your client is using the variable H20. H19 asked respondents who had used the Internet in the last year, if they had ever used the Internet to connect to an ONLINE CHAT SERVICE. H20 asked respondents how often they used the Internet to connect to an online chat service in the last month.”
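The mismatch described in the reply is easy to reproduce in miniature: the same respondents, the same age filter, but a different question variable gives different percentages. The sketch below uses the variable names from the slides (AGEGR5, H19, H20), but every record and every code value is invented for illustration; the real code book defines the actual categories, and a real analysis would also apply the survey weights.

```python
from collections import defaultdict

# Hypothetical records. Assumed codes, NOT taken from the code book:
#   AGEGR5: age group code (lower = younger)
#   H19: 1 = has ever used a chat service, 2 = has not
#   H20: 1 = used a chat service in the last month, 2 = did not
records = [
    {"AGEGR5": 1, "H19": 1, "H20": 1},
    {"AGEGR5": 1, "H19": 1, "H20": 2},
    {"AGEGR5": 1, "H19": 2, "H20": 2},
    {"AGEGR5": 1, "H19": 1, "H20": 1},
    {"AGEGR5": 3, "H19": 1, "H20": 2},
    {"AGEGR5": 3, "H19": 2, "H20": 2},
    {"AGEGR5": 3, "H19": 1, "H20": 1},
    {"AGEGR5": 3, "H19": 2, "H20": 2},
    {"AGEGR5": 5, "H19": 1, "H20": 1},  # dropped by the age filter below
]


def percent_yes(rows, variable, yes_code=1):
    """Percentage of rows in each age group whose `variable` equals `yes_code`."""
    totals = defaultdict(int)
    yes = defaultdict(int)
    for row in rows:
        totals[row["AGEGR5"]] += 1
        if row[variable] == yes_code:
            yes[row["AGEGR5"]] += 1
    return {g: round(100 * yes[g] / totals[g], 1) for g in sorted(totals)}


# Same selection as on the slide: AGEGR5 less than or equal to 3.
subset = [r for r in records if r["AGEGR5"] <= 3]

by_h19 = percent_yes(subset, "H19")
by_h20 = percent_yes(subset, "H20")
print("H19:", by_h19)
print("H20:", by_h20)
```

The two dictionaries differ for every age group, which is the same kind of gap that showed up between the published table and the cross-tab.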
So… let’s try again with H19
So we need…
The numbers match!
AND… you’ll note the table now says “last 12 months”
Original Table…
Revised…
Dec 2001
Jan 2002
So…
We can use survey files to verify published results.
But…
We can also use survey files to expand on published results and explore new avenues of research.
For example…
1. What is the influence of gender, education, or income on Internet use?
2. Are there differences between provinces? Between URBAN and RURAL dwellers?
3. Or any number of other “dimensions”… any question asked in the survey.
Aggregate Data = Postcard (“Fixed”)
Survey Data = Camera (“Flexible”)
Sources of Aggregate Data…
• Print
– e.g., Canada Year Book, STC print publications
• CD-ROM
– e.g., 1996 Census Profiles, LFHR, other DSP products
• Web-based
– The Daily
– “Canadian Statistics”
– PDF versions of print publications
– Beyond 20/20 Files – multidimensional tables…
– CANSIM / E-Stat – time series
Beyond 20/20: what is it?
• used to display multidimensional data, i.e., more than 3 dimensions or characteristics at once
– e.g., age, sex (usually 3!), geography, date, etc. ...
• allows user to customize the display of the data
• very useful for aggregate data, less so for microdata
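The idea of a multidimensional table that the user can rearrange can be sketched in a few lines. This is not how Beyond 20/20 itself works internally; it is only an illustration of the concept, with hypothetical counts and made-up names (`cube`, `DIMENSIONS`, `collapse`).

```python
from collections import defaultdict

# Hypothetical counts keyed by (age_group, sex, geography, year).
DIMENSIONS = ("age_group", "sex", "geography", "year")
cube = {
    ("15-24", "F", "Ontario", 1996): 10,
    ("15-24", "M", "Ontario", 1996): 12,
    ("25-44", "F", "Ontario", 1996): 20,
    ("25-44", "M", "Ontario", 1996): 18,
    ("15-24", "F", "Quebec", 1996): 7,
    ("15-24", "M", "Quebec", 1996): 8,
}


def collapse(table, keep):
    """Sum the table over every dimension not named in `keep`."""
    positions = [DIMENSIONS.index(name) for name in keep]
    result = defaultdict(int)
    for key, count in table.items():
        result[tuple(key[i] for i in positions)] += count
    return dict(result)


# Two different "views" of the same underlying table.
by_geo_sex = collapse(cube, ("geography", "sex"))
by_age = collapse(cube, ("age_group",))
print(by_geo_sex)
print(by_age)
```

Because the cells are already aggregated counts, the user can choose which dimensions to display but cannot ask a question the table's dimensions do not cover, which is where microdata comes in.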
Beyond 20/20: what is it used for/in?
• used in an increasing number of STC products
– many CD-ROM DSP products, e.g., LFHR, ITC, Profiles, Nation Series, Dimensions, etc.
– one of the available formats on E-Stat
CANSIM
• acronym for CANadian Socio-Economic Information Management System
• time-series data
• available
– direct from STC ($)
– via E-Stat (free to registered institutions)
– via DLI (from UofT)
CANSIM II via E-Stat
Dealing with data really isn’t that hard ...
Don’t be afraid to ask for help!