Finding and Using Secondary Data and Resources for Research

Karen Whiteman, PhD
June 10, 2014

Overview of Presentation

• What is Secondary Data?
• Finding and Accessing Data
• Online Demo
• Creating a Personalized Dataset
• Support and Resources

Secondary Data Myths

• Secondary data in research is more time consuming and complicated than other methodologies.

False. While there is a certain degree of difficulty using secondary data, working with secondary data can and should be adapted to the skill level of the researcher.

• Secondary data is inferior to the alternative of collecting one’s own data.

False. Using secondary data is not a replacement for personal data collection; it is most useful in conjunction with other methodologies, such as experimentation, survey research, or clinical research.

Weighing the Pros and Cons

Pros
• Secondary data are often collected using well-established measures with known psychometric properties for the specific population being studied
• Many secondary datasets contain, or can be used to create, diverse samples that are likely to be representative of broader populations
• Secondary datasets are often large enough to provide good statistical power for most types of planned analyses
• Cost-effective (i.e., takes less time than collecting your own data)
• IRB review is likely to be expedited

Cons
• Inability to select specific questions or measurements
• Lack of control over precise timing of data collection

Questions to Consider…

• Are you interested in looking at data at one point in time or over time?
• Are you interested in qualitative, quantitative, or mixed methods?
• What group of people do you want to study (target population)?
  • Older adults
  • LGBT
  • Specific race/ethnicity
  • Rural/urban
  • City, State, National, International
• What topic area are you interested in?
  • Criminal justice
  • Education
  • HIV/AIDS
  • Mental health
  • Substance use

Approach to Successful Research with Large Datasets

1. Define your research topic and research questions
2. Select a database
3. Get to know your database
4. Structure your analysis and presentation of findings in a way that is clinically meaningful

Data Banks

• Inter-university Consortium for Political and Social Research (United States)

• The UK Data Service

• Council of European Social Science Data Archives

• Australian Social Science Data Archive

Inter-university Consortium for Political & Social Research

Types of Data

Quantitative
• Micro data are the coded numerical responses to surveys, with a separate record for each individual respondent
• Macro data are aggregate figures, for example country-level economic indicators
* Data formats include SAS, SPSS, Stata, R

Qualitative

Restricted files
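
To make the micro/macro distinction concrete, the sketch below collapses a few respondent-level records (micro data) into one figure per country (macro data). The field names and values are invented for illustration and do not come from any real dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical micro data: one record per survey respondent.
micro_data = [
    {"respondent_id": 1, "country": "US", "income": 42000},
    {"respondent_id": 2, "country": "US", "income": 58000},
    {"respondent_id": 3, "country": "UK", "income": 39000},
    {"respondent_id": 4, "country": "UK", "income": 45000},
]


def aggregate_by_country(records):
    """Collapse respondent-level records into one mean-income figure per country."""
    incomes = defaultdict(list)
    for record in records:
        incomes[record["country"]].append(record["income"])
    return {country: mean(values) for country, values in incomes.items()}


# Macro data: one aggregate figure per country.
macro_data = aggregate_by_country(micro_data)
print(macro_data)
```

Going from micro to macro is always possible; the reverse is not, which is one reason to download micro data when individual-level questions are of interest.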

Benefits of Large Scale Government Data

• Good quality data
  • Produced by experienced research organizations
  • Usually nationally representative with large samples
  • Good response rates
  • Very well documented
  • Can contact the agency with questions
• Hierarchical data
  • Treatment model effects on individual
  • Intra-household effects on individual
• Longitudinal data
  • Allows for comparisons over time
  • Experience working with longitudinal datasets

What Can I do With the Data?

• Comparative research, restudy or follow-up study

• Re-analysis/secondary analysis

• Research design and methodological advancement

• Replication of published statistics

• Teaching and learning

Find and Analyze Data

Data search (basic, advanced)
• Enter search term
• Browse by topic
• Browse by series
• Browse by geography
• Browse by investigator
• Browse by data format
• Browse international data
• View all studies

Special Conditions

• Anyone can access the data; however, there are special conditions:

• Need prior IRB approval from your institution

• Need to complete special licensure

• Complete Approved Researcher forms

Inter-university Consortium for Political & Social Research

• The Inter-university Consortium for Political and Social Research: http://www.icpsr.umich.edu/icpsrweb/landing.jsp

Finding data – Other related catalogues

• Health and Retirement Study
  • Labor force participation, health transitions at end of work life, income, pension plans, health insurance, cognitive function, assets, disability, health care costs.
• National Epidemiologic Survey on Alcohol and Related Conditions
  • Collects data on background, alcohol and drug consumption, abuse and dependence, treatment utilization, family history of alcoholism or drug abuse, tobacco use and dependence, and medicine use, as well as current and family mental health (e.g., depression, anxiety, and personality disorders), medical conditions, and victimization.
• Youth Risk Behavior Surveillance System
  • Responses include behaviors that contribute to unintentional injuries and violence, tobacco use, alcohol and other drug use, sexual behaviors, unintended pregnancy and sexually transmitted diseases (STDs), unhealthy dietary behaviors, and physical inactivity.

Now what?

Found a database… now what?

• Is the database in a format you understand? (SPSS, R, SAS)
• Check to see if the related variables exist by downloading the codebook
• Run a simple analysis to find out the sample size of the (1) full sample, (2) control variables, (3) independent variables, (4) dependent variables. Is the sample size adequate?
• Has your study been done before?
• Special populations
  • Race/ethnicity
  • Age
  • LGBT
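
The "simple analysis" of sample size can be as small as counting non-missing responses. The following is a minimal sketch, assuming the data have already been read into a list of records with `None` marking a missing response; the variable names, their roles, and the values are all hypothetical.

```python
# Hypothetical extract: None marks a missing response.
rows = [
    {"age": 71, "gender": 1, "treatment": 1, "depression": 12},
    {"age": 68, "gender": 2, "treatment": None, "depression": 9},
    {"age": None, "gender": 2, "treatment": 0, "depression": 15},
    {"age": 80, "gender": 1, "treatment": 0, "depression": None},
    {"age": 75, "gender": 2, "treatment": 1, "depression": 7},
]

# Assumed roles of the variables in the planned analysis.
variable_groups = {
    "control": ["age", "gender"],
    "independent": ["treatment"],
    "dependent": ["depression"],
}


def count_complete(records, variables):
    """Count records with a non-missing value for every listed variable."""
    return sum(all(r[v] is not None for v in variables) for r in records)


sample_sizes = {"full sample": len(rows)}
for group, variables in variable_groups.items():
    sample_sizes[group] = count_complete(rows, variables)

# Records usable in an analysis that needs every variable at once.
all_variables = [v for vs in variable_groups.values() for v in vs]
sample_sizes["complete cases"] = count_complete(rows, all_variables)

for label, n in sample_sizes.items():
    print(f"{label}: n = {n}")
```

The last count is often the most sobering one: each variable group can look adequately sized on its own while the set of respondents with complete data on all of them is much smaller.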

Creating a Personalized Dataset

• Organizing the project
  • Create a central repository of the following information:
    • Contact information of owners of database
    • Codebooks
    • Questionnaires
    • User Guides
    • Articles of interest published by others who used the data

Creating a Personalized Dataset

• Create a personal variable codebook

Variable | Baseline | 3-month | 6-month | Codes
Site identification | Site_ID | | | 01 = UCSF; 02 = Chinatown; 03 = Sunset Park; 04 = Rochester; 05 = U. Penn
Sociodemographics
Age | B_X1 | | |
Financial situation | b_a13 | | | 1=can’t make ends meet; 2=just enough to get along; 3=are comfortable; 4=DK
Race | | | | 1=White; 2=Latino/Hispanic; 3=African-American/Black
Moderator
Gender | S_A1 | | | 1=male; 2=female
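
A personal codebook can also be kept in machine-readable form so that the same labels are applied consistently in every analysis. The sketch below uses the variable names and codes from the codebook above; storing site codes as zero-padded strings, the `label_record` helper, and the sample participant are assumptions made for illustration.

```python
# Value labels taken from the codebook above.
# Assumption: site codes are stored as zero-padded strings ("01" to "05").
CODEBOOK = {
    "Site_ID": {"01": "UCSF", "02": "Chinatown", "03": "Sunset Park",
                "04": "Rochester", "05": "U. Penn"},
    "b_a13": {1: "can't make ends meet", 2: "just enough to get along",
              3: "are comfortable", 4: "DK"},
    "S_A1": {1: "male", 2: "female"},
}


def label_record(record, codebook):
    """Return a copy of the record with coded values replaced by their labels.

    Variables without a codebook entry (e.g. age) and codes that are not
    listed in the codebook are passed through unchanged.
    """
    labelled = {}
    for variable, value in record.items():
        labels = codebook.get(variable, {})
        labelled[variable] = labels.get(value, value)
    return labelled


# Hypothetical participant record, invented for illustration.
participant = {"Site_ID": "03", "B_X1": 72, "b_a13": 2, "S_A1": 2}
print(label_record(participant, CODEBOOK))
```

Passing unknown codes through unchanged, rather than raising an error, makes unexpected values easy to spot in the output during data cleaning.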

Creating a Personalized Dataset

• Structuring the data
  • Merge files
  • Clean variables
    • Coding (highest # is the reference)
    • Create new variables if necessary (restructuring, combining)
      • Example: Anxiety Disorders, Time, Depression diagnosis
  • Check everything TWICE
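
The merge, combine, and check steps can be sketched as follows. This is an illustration only: the two files, the participant IDs, and the diagnosis flags (`gad`, `panic`, `phobia`, `depression_dx`) are hypothetical, and combining separate diagnosis flags into one "any anxiety disorder" indicator is just one plausible reading of the Anxiety Disorders example.

```python
# Hypothetical files, already read into dicts keyed by participant ID.
baseline = {
    101: {"gad": 1, "panic": 0, "phobia": 0},
    102: {"gad": 0, "panic": 0, "phobia": 0},
    103: {"gad": 0, "panic": 1, "phobia": 1},
}
followup = {
    101: {"depression_dx": 0},
    103: {"depression_dx": 1},
}


def merge_files(left, right):
    """Left-join two files on participant ID; unmatched fields become None."""
    right_fields = sorted({field for rec in right.values() for field in rec})
    merged = {}
    for pid, record in left.items():
        extra = right.get(pid, {})
        merged[pid] = {**record, **{f: extra.get(f) for f in right_fields}}
    return merged


def add_any_anxiety(record):
    """Combine the separate diagnosis flags into one 0/1 indicator."""
    flags = (record["gad"], record["panic"], record["phobia"])
    return {**record, "any_anxiety": int(any(flags))}


dataset = {pid: add_any_anxiety(rec)
           for pid, rec in merge_files(baseline, followup).items()}

# Check everything twice: no participants lost, new variable in range.
assert set(dataset) == set(baseline)
assert all(rec["any_anxiety"] in (0, 1) for rec in dataset.values())
print(dataset)
```

A left join keeps every baseline participant even when the follow-up record is absent, so attrition shows up as missing values rather than as silently dropped rows.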

Creating a Personalized Dataset

• Statistical Considerations
  • Weighting the sample
    • Weights redistribute the sample to be representative of a larger, well-defined population
  • Treatment of missing data
    • Listwise deletion
    • Complete case analysis
    • Multiple imputation
    • Full maximum likelihood estimation
    • Restricted maximum likelihood estimation
    • Last Observation Carried Forward
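
The two simplest ideas in this list, weighting and dropping or carrying forward missing values, can be illustrated in a few lines. The sketch below is a toy under stated assumptions: the records and weights are invented, and the helper names are hypothetical. Multiple imputation and maximum likelihood estimation need dedicated statistical software and are not shown.

```python
def weighted_mean(values, weights):
    """Mean in which each respondent counts in proportion to their survey weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def listwise_delete(records):
    """Keep only records with no missing (None) values, i.e. the complete cases."""
    return [r for r in records if all(v is not None for v in r.values())]


def locf(series):
    """Last Observation Carried Forward: fill each gap with the previous value."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Hypothetical respondents; weight stands in for the survey weight
# supplied with the dataset.
respondents = [
    {"score": 10, "weight": 1.0},
    {"score": 20, "weight": 3.0},
    {"score": None, "weight": 2.0},
]
complete = listwise_delete(respondents)
scores = [r["score"] for r in complete]
weights = [r["weight"] for r in complete]

unweighted = sum(scores) / len(scores)
weighted = weighted_mean(scores, weights)
carried = locf([12, None, 9])  # baseline, 3-month, 6-month

print(unweighted, weighted, carried)
```

Listwise deletion and complete case analysis are two names for the same approach, which is why a single helper covers both. Note that `locf` leaves a missing first observation as `None`, since there is nothing earlier to carry forward.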

Support and Resources

• Statistical consulting
• Statistical forums
  • http://www.bristol.ac.uk/cmm/learning/support/jisc.html
• YouTube.com
  • https://www.youtube.com/user/ProfAndyField
• Webinars
  • www.theanalysisfactor.com
• Private consulting