Chuck Humphrey Data Library University of Alberta.

Quantitative Evidence Sociology 519 Chuck Humphrey Data Library University of Alberta

Page 1: Chuck Humphrey Data Library University of Alberta.

Quantitative Evidence
Sociology 519

Chuck Humphrey
Data Library
University of Alberta

Page 2: Chuck Humphrey Data Library University of Alberta.

Outline

Quantitative evidence
• Distinction between statistics and data
• Observational evidence
• Statistics are about definitions and classifications
• Aggregate data and microdata
• Understanding the Census

Access to evidence
• Statistical and aggregate data sources
• Microdata sources

Page 3: Chuck Humphrey Data Library University of Alberta.

Statistics and Data

Statistics
• numeric facts & figures
• derived from data, i.e., already processed
• presentation-ready
• need definitions
• published

Data
• numeric files created and organized for analysis/processing
• requires processing
• not display-ready
• need detailed documentation
• disseminated, not published

Page 4: Chuck Humphrey Data Library University of Alberta.

Statistics and Data

Six dimensions or variables in this table

The cells in the table are the estimated number of smokers.

• Geography: Region
• Time: Periods
• Social Content: Smokers, Education, Age, Sex

Page 5: Chuck Humphrey Data Library University of Alberta.

WHERE ARE THE DATA!

Page 6: Chuck Humphrey Data Library University of Alberta.

Statistics and Data

Page 7: Chuck Humphrey Data Library University of Alberta.

Stories are told through statistics

The National Population Health Survey in the previous example had over 80,000 respondents in its 1996-97 sample, and the Canadian Community Health Survey in 2005 had over 130,000 cases. How do we tell the stories about each of these respondents?

We use statistics to create summaries of these life experiences.

Data enable us to construct the tables or analyses to tell these summarized stories.
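
As a minimal sketch of that last point, the Python below builds a small summary table from a handful of microdata records. The records, field names and values are invented for illustration; they are not drawn from the National Population Health Survey or the Canadian Community Health Survey.

```python
from collections import Counter

# Hypothetical microdata: one record per respondent (invented values).
respondents = [
    {"sex": "Female", "smoker": "Yes"},
    {"sex": "Female", "smoker": "No"},
    {"sex": "Male", "smoker": "Yes"},
    {"sex": "Male", "smoker": "Yes"},
    {"sex": "Male", "smoker": "No"},
    {"sex": "Female", "smoker": "No"},
]


def cross_tabulate(records, row_var, col_var):
    """Count the records that fall in every (row, column) combination."""
    return Counter((r[row_var], r[col_var]) for r in records)


table = cross_tabulate(respondents, "sex", "smoker")

# Display the statistics as a small table: one row per sex.
print(f"{'Sex':<8}{'Smoker':>8}{'Non-smoker':>12}")
for sex in ("Male", "Female"):
    print(f"{sex:<8}{table[(sex, 'Yes')]:>8}{table[(sex, 'No')]:>12}")
```

Six individual stories become four cell counts. The same data could be tabulated by other variables to give a different view.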

Page 8: Chuck Humphrey Data Library University of Alberta.

Statistics are about definitions!

Statistics are dependent on definitions. You may think of statistics as numbers, but the numbers represent measurements or observations based on specific definitions.

Tables are structured around geography, time and social content based on attributes of the unit of observation. These properties all need definitions.

Page 9: Chuck Humphrey Data Library University of Alberta.

Statistics involve classifications!

Classifications

Sex
• Total
• Male
• Female

Periods
• 1994-1995
• 1996-1997

Page 10: Chuck Humphrey Data Library University of Alberta.

Statistics involve classifications!

Some classifications are based on standards while others are based on convention or practice.

For example, Standard Geography classifications

Page 11: Chuck Humphrey Data Library University of Alberta.

What about data?

It is helpful to understand some basics about the origins of data, especially since statistics are derived from data. As we will see later, having a good understanding of data can greatly help in the search for statistics.

There are three generic methods by which data are produced. Statistics are generated from the data produced out of all of these methods.

Observational Methods

Experimental Methods

Computational Methods

Page 12: Chuck Humphrey Data Library University of Alberta.

Methods for producing data

Observational Methods
• Focus is on developing observational instruments to collect data
• Correlation
• Replicate the analysis (same data or similar)
• Statistics summarize observations

Experimental Methods
• Focus is on manipulating causal agents to measure change in a response agent
• Causation
• Replicate the experiment
• Statistics summarize experiment results

Computational Methods
• Focus is on modeling phenomena through mathematical equations
• Prediction
• Replicate the simulation
• Statistics summarize simulation results

Page 13: Chuck Humphrey Data Library University of Alberta.

Facts about statistics and data

Statistics are derived from observational, experimental and simulated data.

A table is a format for displaying statistics and presents a summary or one view of the data.

Tables are structured around geography, time and attributes of the unit of observation.

Statistics are dependent on definitions.

Working with data requires some computing skills with analytic software.

Page 14: Chuck Humphrey Data Library University of Alberta.

Questions to ask about statistics

• Who published this statistic?
  Can you name the producer or distributor of the data?
  You need this information to provide a citation for each statistic.
  You should ask yourself what motive is behind this published statistic.

• What view of the data is shown in this statistic?
  What level of geography is shown?
  What time period is shown?
  What social characteristics are shown?

Page 15: Chuck Humphrey Data Library University of Alberta.

Questions to ask about statistics

• What concepts are represented in this statistic?
  Are definitions provided with the statistic for geography, time or the social characteristics?
  Was a standard classification system used for the categories of the statistic?

• Can you identify a data source for the statistic?
  If there isn’t a data source, the statistic isn’t real.
  Is there enough information that you could find the data?
  Can you name the data source itself?

Page 16: Chuck Humphrey Data Library University of Alberta.

The Canadian Census

The Census is the largest survey conducted in Canada and is taken every five years.

The last two censuses were in 2001 and 2006. Censuses in years ending in 1 are known as decennial censuses and contain certain questions asked only every ten years (e.g., religion).

Page 17: Chuck Humphrey Data Library University of Alberta.

Census of Population

Two forms are used to collect the Census: 2A, which goes to 80% of the households, and 2B, which goes to the other 20%.

In 2006, the 2A form contained 8 questions while the 2B form had these 8 and 53 additional questions.
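
The arithmetic behind the two forms can be checked with a short sketch. The question counts and percentages come from the slide. The weight of 5 per 2B household is a simplifying assumption for illustration (actual Census weights come from Statistics Canada's own estimation procedures), and the count of 150 households is invented.

```python
from fractions import Fraction

SHORT_FORM_QUESTIONS = 8              # form 2A in 2006
ADDITIONAL_LONG_QUESTIONS = 53        # asked only on form 2B
LONG_FORM_SHARE = Fraction(20, 100)   # 2B goes to 20% of households

# The 2B form repeats the 2A questions and adds its own.
long_form_questions = SHORT_FORM_QUESTIONS + ADDITIONAL_LONG_QUESTIONS

# Simplifying assumption: each 2B household stands in for 1 / 0.20 households.
approx_weight = 1 / LONG_FORM_SHARE

# Hypothetical example: 150 sampled households report some characteristic.
sampled_households = 150
estimated_households = sampled_households * approx_weight

print(long_form_questions)    # 61
print(approx_weight)          # 5
print(estimated_households)   # 750
```

Statistics on a 2B-only question are therefore estimates from a one-in-five sample, while the 8 common questions are asked of every household.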

Specific questions have a long history (see the Census Dictionary).

Need to understand the content of the Census to know what statistics are possible from the Census.

Page 18: Chuck Humphrey Data Library University of Alberta.

Census Definitions

The Census Dictionary is also important for understanding the current definitions of concepts as well as historical definitions.

Here is an example on Aboriginal identity: “The Aboriginal identity question was asked for the first time in the 1996 Census. It asked the respondent if he/she was an Aboriginal person, i.e., North American Indian, Métis or Inuit. The question is used to provide counts of persons who identify themselves as Aboriginal persons. The concept of 'Aboriginal identity' was first used in the 1991 Aboriginal Peoples Survey.”

Page 19: Chuck Humphrey Data Library University of Alberta.
Page 20: Chuck Humphrey Data Library University of Alberta.
Page 22: Chuck Humphrey Data Library University of Alberta.

Geographic Unit

Geo-code

Page 23: Chuck Humphrey Data Library University of Alberta.

Geo-referenced data

The unit of analysis makes up the rows in the data file and is the object being described by the other variables in the file. The values for this variable are geo-codes for Census tracts.

Page 24: Chuck Humphrey Data Library University of Alberta.

Geo-referenced data

This case in the data file represents Census Tract 0023.00, which was shown in the image two slides earlier.
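
A minimal Python sketch of such a geo-referenced file follows. Each row describes one Census tract, and the geo-code column identifies it. The tract name 0023.00 comes from this slide. The other tract names, the column names and all values are invented for illustration.

```python
import csv
import io

# Hypothetical geo-referenced data file: one row per Census tract.
# Column names and values are invented for illustration.
raw_file = io.StringIO(
    "ct_name,population,median_age\n"
    "0022.00,4100,36.5\n"
    "0023.00,3850,41.2\n"
    "0024.00,5200,33.8\n"
)

# Index the rows by geo-code. The code is kept as text so that the
# leading zeros and the decimal suffix are preserved.
tracts = {row["ct_name"]: row for row in csv.DictReader(raw_file)}

# Look up the single case that represents Census Tract 0023.00.
case = tracts["0023.00"]
print(case["ct_name"], case["population"], case["median_age"])
```

Because the geo-code is the key, the same value can be used to join these rows to a boundary file or to another table for the same tracts.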

Page 25: Chuck Humphrey Data Library University of Alberta.

The variety of geographic units

Statistics Canada groups the variety of geographic units associated with the Census into two categories:

Source for the graphics: Illustrated Glossary, 2006 Census Geography, Statistics Canada

Page 27: Chuck Humphrey Data Library University of Alberta.

Census geo-codes

Statistics Canada has two categories of geo-code systems:
• Standard Geographic Classification (SGC)
• Other geographic entities

Source for the graphic: Illustrated Glossary, 2006 Census Geography, Statistics Canada

Page 28: Chuck Humphrey Data Library University of Alberta.

Standard geographic classification

Source: Illustrated Glossary, 2006 Census Geography, Statistics Canada
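
The SGC is hierarchical: a census subdivision code is built from a 2-digit province or territory code, a 2-digit census division code and a 3-digit census subdivision code. The Python below is a hedged sketch of splitting such a code into its levels. The function and field names are hypothetical, and 4811061 is used as an example code for the city of Edmonton; verify it against the published classification before relying on it.

```python
from typing import NamedTuple


class SgcCode(NamedTuple):
    province: str            # 2-digit province or territory code
    census_division: str     # 2-digit code within the province
    census_subdivision: str  # 3-digit code within the census division


def parse_sgc(code: str) -> SgcCode:
    """Split a 7-digit census subdivision code into its three levels."""
    if len(code) != 7 or not code.isdigit():
        raise ValueError(f"expected a 7-digit code, got {code!r}")
    return SgcCode(code[:2], code[2:4], code[4:])


parsed = parse_sgc("4811061")
print(parsed.province, parsed.census_division, parsed.census_subdivision)
```

Keeping the code as a string lets the first two or four characters serve as the key for the province or census division that contains the subdivision.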

Page 29: Chuck Humphrey Data Library University of Alberta.
Page 30: Chuck Humphrey Data Library University of Alberta.

Standard geographic classification, 2006

The link to Definitions, data sources and methods on the main page of the Statistics Canada website provides a link to Standard Classifications, which includes Geography.

Page 31: Chuck Humphrey Data Library University of Alberta.

Other geographic entities

Census Metropolitan Areas

Source for the graphic: Illustrated Glossary, 2006 Census Geography, Statistics Canada

Metropolitan Areas: 2006 Map of Edmonton CMA

Page 32: Chuck Humphrey Data Library University of Alberta.

Online sources for statistics

For characteristics about Canadians, you need to become familiar with Statistics Canada’s website.

This is a complex website. Use the “Popular picks” list on the home page and search for statistics by browsing subject terms.

Historical Statistics

Page 33: Chuck Humphrey Data Library University of Alberta.

E-STAT

E-STAT is a portal to free CANSIM time series statistics and Census results from 1981 to 2006.

CANSIM on Statistics Canada’s website charges $3.00 per time series, while the same statistics accessed through CANSIM on E-STAT are free.

Page 34: Chuck Humphrey Data Library University of Alberta.

Online guide to published stats

The Library homepage has useful guides for locating statistics online and in print.

Page 35: Chuck Humphrey Data Library University of Alberta.

Microdata & aggregate data

Microdata
• from observational methods
• created from the respondents in a survey

Aggregate Data
• statistics organized in a data file structure
• derived from microdata sources
• used in GIS & time series analysis
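
The relationship between the two can be sketched in Python: microdata rows (one per respondent) are collapsed into aggregate rows (one per region and period) and written out in a data file structure. The region names, field names and values are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical microdata: one row per survey respondent (invented values).
microdata = [
    {"region": "Prairies", "period": "1994-1995", "smoker": True},
    {"region": "Prairies", "period": "1994-1995", "smoker": False},
    {"region": "Prairies", "period": "1996-1997", "smoker": False},
    {"region": "Atlantic", "period": "1994-1995", "smoker": True},
    {"region": "Atlantic", "period": "1996-1997", "smoker": True},
    {"region": "Atlantic", "period": "1996-1997", "smoker": False},
]

# Aggregate: one row per region and period instead of one per respondent.
counts = defaultdict(lambda: {"respondents": 0, "smokers": 0})
for person in microdata:
    cell = counts[(person["region"], person["period"])]
    cell["respondents"] += 1
    cell["smokers"] += int(person["smoker"])

aggregate_rows = [
    {"region": region, "period": period, **cell}
    for (region, period), cell in sorted(counts.items())
]

# Write the aggregate data in a file structure that GIS or
# time series software could read.
output = io.StringIO()
writer = csv.DictWriter(
    output, fieldnames=["region", "period", "respondents", "smokers"]
)
writer.writeheader()
writer.writerows(aggregate_rows)
print(output.getvalue())
```

The aggregate file keeps the geography and time dimensions as columns, so it can be mapped by region or followed across periods, but the individual respondents can no longer be recovered from it.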