Terminology for Statistics

How Can End Users Connect?

Stephanie W. Haas
School of Information and Library Science
University of North Carolina at Chapel Hill

[email protected]

Open Forum 2000

Overview

1. Terminology and End User Searching
   Characteristics of users and searches
   Types of queries
   Other sources of confusion

2. Ideas for Solutions
   Goals
   What needs to be solved
   Possible tools and structures

3. Final Points

Terminology and End User Searching

Characteristics of users and searches
Types of queries
Other sources of confusion

Searching isn’t easy

“Query matching is effective only when the search is specific, the searcher knows precisely what he or she wants, and the request can be expressed adequately in the language of the system” (Borgman, 1996, p. 494)

If you don’t know what to call it, you can’t find it.

If you don’t know what it means, you can’t use it.

The Mapping Problem

[Diagram labels: Data Element(s); Agency Term(s); User’s Term(s); User’s Information Need; Search]

Inside the System – Metadata Registry

Statistical experts’ understanding and usage

Crisp operational definitions (ideal)

Unambiguous terms (ideal)

Minimal or predictable contextual effects

[Diagram labels: Data Element(s); Agency Term(s)]

Outside the System

Choice of terms may depend on:
  user’s domain knowledge
  user’s search knowledge
  user’s notion of what is available
  terms seen elsewhere
  luck?

[Diagram labels: User’s Term(s); User’s Information Need]

Users’ Knowledge

Varying sophistication of questions

What is the universe for this survey question, given the questions leading up to it?

What is the current unemployment rate? Please send me the answer before my 9:00 class tomorrow.

Types of Queries

Correct (matching) term
  consumer price index -> consumer price index
Obvious synonym
  health care -> medical care (CPI)
Conceptual cluster of synonyms/near synonyms
  woman, female, girls -> women

Types of Queries (2)

“External” terms, common outside the agency, no direct data element equivalent inside the agency.
  inflation (generally use CPI or PPI)
  turnover (retention rate? job or profession tenure?)
  new jobs (first appearance on payroll?)

Types of Queries (3)

“Trendy” terms. Subset of external terms.
  cyberjobs (from magazine article)
  Webmaster (recent coinage)
  reinvention

Types of Queries (4)

Concept access
  “Give me everything you have about worker benefits”
  Good answer requires pulling together information from many sources (which may be more or less compatible).

(See MapStats for example. http://www.fedstats.gov/mapstats/)

Contributing Factors

Confusion about basic statistical concepts
  seasonal adjustment

“Indicates the adjustment of timeseries data to eliminate the effect of intrayear variations which tend to occur during the same period on an annual basis.” (BLS Selective Access)

“To seasonally adjust a given economic time series is to eliminate that part of the change in the series which can be ascribed to the normal seasonal variation”

“Seasonal adjustment is a mathematical process whereby the effects of recurring non-economic factors are removed from an economic time series.”

(Dictionary of U.S. Government Statistical Terms, 1991)

“A term applied to time series from which periodic oscillations with a period of one year have been removed.” (Cambridge Dictionary of Statistics, 1998)

What is this number, and what does it mean?
  rate, index, ratio, value
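As a rough illustration of the seasonal adjustment definitions quoted above, the sketch below removes a fixed within-year pattern from a made-up quarterly series by subtracting each quarter's average deviation from the overall mean. The data, the function name `seasonally_adjust`, and the simple additive method are all assumptions chosen for illustration; this is not the procedure used by BLS or any other agency.

```python
from statistics import mean

def seasonally_adjust(series, period):
    """Toy additive adjustment: subtract each season's average deviation
    from the overall mean. Illustrative only, not an agency method."""
    overall = mean(series)
    # Average value observed at each position in the cycle (e.g., each quarter).
    seasonal_means = [mean(series[i::period]) for i in range(period)]
    # Seasonal effect = how far that position sits from the overall mean.
    effects = [m - overall for m in seasonal_means]
    return [x - effects[i % period] for i, x in enumerate(series)]

# Hypothetical quarterly values over two years: a level of 100 in year one and
# 104 in year two, plus the same seasonal swing (+10, 0, -10, 0) each year.
raw = [110, 100, 90, 100, 114, 104, 94, 104]
adjusted = seasonally_adjust(raw, period=4)
print(adjusted)
```

In this made-up series the adjusted values are flat within each year, so the only change left visible is the step from one year to the next.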

Contributing Factors (2)

Major conceptual distinctions and when they apply.
  Different levels of geographical regions, and the data available at each level (nation, region, state, metropolitan area, county)
  Establishment data vs. household data
Note the importance of context in the use of these terms and data.

Contributing Factors (3)

Inherent ambiguity: the pay concept
  Carol Hert & John Fieber, search terms from FedStats Web Page (http://www.fedstats.gov/), 11/98, 28,248 unique queries
  Agency terms used for pay concept include: income, compensation, earnings, wage, salary

BLS/CPS Terms

Total combined income
  “includes money from jobs, net income from business, farm or rent, pensions, dividends, interest, social security payments and any other money income received” (CPS)
Compensation
  “sometimes used to encompass the entire range of wages and benefits” (BLS Glossary of Compensation Terms)

BLS/CPS Terms (2)

Usual weekly earnings
  “include any overtime pay, commissions, or tips usually received” (CPS concepts)
Hourly earnings
  “hourly rate as stated by the employer…does not include tips, commissions, or any other non-hourly wages.” (CPS interviewer manual)

What does this user want?

correction officer, income

Monetary income received - including that unrelated to job

Compensation, including benefits - total job package

Usual weekly earnings - including regular overtime

Hourly earnings - excluding overtime
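One way to picture this ambiguity is as a lookup that returns several candidate agency concepts for a single user word, so that an interface could ask the user which one is meant. The sketch below is a hypothetical illustration: the `PAY_CONCEPTS` table, the `PAY_WORDS` set, and the `candidates_for` function are invented names, and the short notes paraphrase the four readings listed above rather than any agency system.

```python
# Hypothetical table: each agency pay concept with a short plain-language note
# paraphrasing the four readings listed on this slide.
PAY_CONCEPTS = {
    "total combined income": "monetary income received, including that unrelated to job",
    "compensation": "total job package, including benefits",
    "usual weekly earnings": "includes regular overtime",
    "hourly earnings": "excludes overtime",
}

# Assumption for illustration: any of the pay words named earlier in the talk
# is treated as an ambiguous pointer to the whole pay concept.
PAY_WORDS = {"income", "compensation", "earnings", "wage", "salary"}

def candidates_for(query):
    """Return (agency concept, note) pairs if the query mentions a pay word."""
    words = {w.lower() for w in query.replace(",", " ").split()}
    if words & PAY_WORDS:
        return list(PAY_CONCEPTS.items())
    return []

for concept, note in candidates_for("correction officer, income"):
    print(f"{concept}: {note}")
```

Returning all four candidates, rather than silently picking one, leaves the choice with the user, who is the only one who knows which reading was intended.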

Ideas for Solutions

Goals
What needs to be solved
Possible tools and structures

Goals for Possible Solutions

Maintain the distinction between agency (authority) terms and user terms.
Note the distinction between a terminology and user vocabulary
  Often lack of structure, stability, or context (although patterns do exist)

Not equally weighted terminologies

[Diagram labels: T1; T2; Data Element Concepts; Data Elements]

Asymmetrical Structure

[Diagram labels: Agency Terms; User Terms; Data Element Concepts; Data Elements; registry contents]

Maintenance Issues

Indexing is not the primary function of the agency.

Less than total coverage will still help.
Can we assume:
  Agency terms are adopted/defined slowly?
  User terms are more volatile (especially the “trendy” ones)?

How often must mapping structures, procedures be updated?

Easing Users’ Pain

No problem
  same word(s), same meaning
  different word(s), different meaning
Support needed (thesaurus, definitions, explanation)
  different word(s), same meaning (synonyms)
  same word(s) or different word(s), some relationship between meanings (e.g., BT [broader term], NT [narrower term], part-of, domain specific)

Same word(s) or different word(s), some undefined overlap in meaning

??? Can these users be helped ???
  Same word(s), different meaning (if unnoticed by user)
  Same word(s) or different word(s), no relationship (wrong source of information?)

Providing Agency Information

Substituting agency term(s) for user term(s) and/or expanding user term(s)
  Hidden or overt?
  Automatic or interactive?

Displaying conceptual term clusters (e.g., gender, race, occupation)

Facilitating browsing

Giving definitions and examples
  source?
  “official” or basic?

Highlighting usage notes (the footnotes)
  Who needs to see them?
  When?

Crosswalk

Mapping between agency and user terms

Asymmetrical, build from users’ side
80/20 principle for coverage
Multiple sources of terms:
  Search sessions
  Interviews with consultants, intermediaries
  Media reports, textbooks, other “public” sources
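A minimal sketch of such a crosswalk, assuming it is stored as a one-way table keyed by user terms. The mappings reuse examples from the Types of Queries slides; the query counts, the `build_crosswalk` helper, and the 80% cutoff are hypothetical and only show how an 80/20 rule might limit which user terms get mapped first.

```python
# Hypothetical log of user terms with made-up query counts.
query_counts = {
    "inflation": 500,
    "health care": 300,
    "turnover": 120,
    "cyberjobs": 50,
    "reinvention": 30,
}

# Candidate mappings from user terms to agency terms, drawn from examples
# elsewhere in the talk. The direction is user -> agency only.
candidate_map = {
    "inflation": ["consumer price index", "producer price index"],
    "health care": ["medical care"],
    "turnover": ["retention rate", "job tenure"],
}

def build_crosswalk(counts, mapping, coverage=0.8):
    """Map the most frequent user terms until `coverage` of queries is reached."""
    total = sum(counts.values())
    covered = 0
    crosswalk = {}
    for term, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        if covered / total >= coverage:
            break
        # A frequent term with no known mapping is kept with an empty list,
        # so it shows up as work still to be done.
        crosswalk[term] = mapping.get(term, [])
        covered += n
    return crosswalk

crosswalk = build_crosswalk(query_counts, candidate_map)
print(crosswalk)
```

With these invented counts, the two most frequent user terms already account for 80% of queries, so the rarer and more volatile terms are left out of the first pass.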

Asymmetrical Structure

[Diagram labels: Agency Terms; User Terms; Data Element Concepts; Data Elements; Crosswalk]

“Enhanced Indexing”

Expanding agency pay terms, FedStats Web page (Hert & Haas, preliminary findings)

Assume that more overlap between terms increases users’ chances of success

Query sessions where 50% of terms were agency terms
  Without expansion = 89%
  With expansion = 73%
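An overlap measure of this kind could be computed along the following lines. This is a hypothetical sketch, not the procedure used in the study, which the slide does not describe: the `VARIANTS` table, the function names, and the sample session are invented, and only the five agency pay terms come from the talk.

```python
# Agency pay terms named earlier in the talk.
AGENCY_TERMS = {"income", "compensation", "earnings", "wage", "salary"}

# Hypothetical user variants added to the index for some agency terms.
VARIANTS = {
    "earnings": {"pay", "paycheck"},
    "compensation": {"benefits"},
}

def expanded_vocabulary(agency_terms, variants):
    """Agency terms plus every user variant attached to them."""
    vocab = set(agency_terms)
    for extra in variants.values():
        vocab |= extra
    return vocab

def matched_share(session_terms, vocabulary):
    """Fraction of a session's terms that appear in the given vocabulary."""
    if not session_terms:
        return 0.0
    return sum(t in vocabulary for t in session_terms) / len(session_terms)

# A made-up query session of four terms.
session = ["pay", "teacher", "salary", "benefits"]
before = matched_share(session, AGENCY_TERMS)
after = matched_share(session, expanded_vocabulary(AGENCY_TERMS, VARIANTS))
print(before, after)
```

For this invented session, one of four terms matches the unexpanded agency vocabulary and three of four match after expansion.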

Other Possibilities

Thesaurus, with relationships such as see and use for

Multilingual thesaurus or dictionary, treating terminologies as equal

Fully incorporate end-user terms into classification or data element concept entries (Desirable?)

Final Points

Users are inventive in term use.
Users discourage easily.
Maintenance is a crucial concern.
Is the 80/20 principle useful?