Terminology for Statistics
How Can End Users Connect?
Stephanie W. Haas
School of Information and Library Science
University of North Carolina at Chapel Hill
Open Forum 2000 2
Overview
1. Terminology and End User Searching
   Characteristics of users and searches
   Types of queries
   Other sources of confusion
2. Ideas for Solutions
   Goals
   What needs to be solved
   Possible tools and structures
3. Final Points
Terminology and End User Searching
Characteristics of users and searches
Types of queries
Other sources of confusion
Searching isn’t easy
“Query matching is effective only when the search is specific, the searcher knows precisely what he or she wants, and the request can be expressed adequately in the language of the system” (Borgman, 1996, p. 494)
If you don’t know what to call it, you can’t find it.
If you don’t know what it means, you can’t use it.
The Mapping Problem
[Diagram: Data Element(s), Agency Term(s), User’s Term(s), User’s Information Need, Search]
Inside the System – Metadata Registry
Statistical experts’ understanding and usage
Crisp operational definitions (ideal)
Unambiguous terms (ideal)
Minimal or predictable contextual effects
[Diagram: Data Element(s), Agency Term(s)]
Outside the System
Choice of terms may depend on:
  user’s domain knowledge
  user’s search knowledge
  user’s notion of what is available
  terms seen elsewhere
  luck?
[Diagram: User’s Term(s), User’s Information Need]
Users’ Knowledge
Varying sophistication of questions
What is the universe for this survey question, given the questions leading up to it?
What is the current unemployment rate? Please send me the answer before my 9:00 class tomorrow.
Types of Queries
Correct (matching) term
  consumer price index -> consumer price index
Obvious synonym
  health care -> medical care (CPI)
Conceptual cluster of synonyms/near synonyms
  woman, female, girls -> women
Types of Queries (2)
“External” terms, common outside the agency, no direct data element equivalent inside the agency.
  inflation (generally use CPI or PPI)
  turnover (retention rate? job or profession tenure?)
  new jobs (first appearance on payroll?)
Types of Queries (3)
“Trendy” terms. Subset of external terms.
  cyberjobs (from magazine article)
  Webmaster (recent coinage)
  reinvention
Types of Queries (4)
Concept access
  “Give me everything you have about worker benefits”
Good answer requires pulling together information from many sources (which may be more or less compatible).
(See MapStats for example. http://www.fedstats.gov/mapstats/)
Contributing Factors
Confusion about basic statistical concepts
  seasonal adjustment
“Indicates the adjustment of timeseries data to eliminate the effect of intrayear variations which tend to occur during the same period on an annual basis.” (BLS Selective Access)
“To seasonally adjust a given economic time series is to eliminate that part of the change in the series which can be ascribed to the normal seasonal variation”
“Seasonal adjustment is a mathematical process whereby the effects of recurring non-economic factors are removed from an economic time series.”
(Dictionary of U.S. Government Statistical Terms, 1991)
“A term applied to time series from which periodic oscillations with a period of one year have been removed.” (Cambridge Dictionary of Statistics, 1998)
What is this number, and what does it mean?
  rate, index, ratio, value
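The seasonal adjustment definitions quoted above all describe removing a recurring within-year pattern from a time series. The sketch below shows one very simple additive version of that idea: estimate the average effect of each position in the cycle and subtract it. It is an illustration only, not the procedure any statistical agency uses; the function name `seasonally_adjust` and the sample series are hypothetical.

```python
from statistics import mean

def seasonally_adjust(values, period=12):
    """Simple additive adjustment (illustrative assumption, not an agency method).

    For each position in the cycle, estimate the seasonal effect as the mean of
    all observations at that position minus the overall mean, then subtract it.
    """
    overall = mean(values)
    # Seasonal effect per position in the cycle (e.g. per month or quarter)
    seasonal = [mean(values[i::period]) - overall for i in range(period)]
    return [v - seasonal[i % period] for i, v in enumerate(values)]

# Hypothetical quarterly series: a flat level with a fixed seasonal swing
series = [90, 100, 110, 100] * 3
adjusted = seasonally_adjust(series, period=4)
```

With this made-up series the adjusted values are all equal, because the only variation in the input was the repeating quarterly pattern.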
Contributing Factors (2)
Major conceptual distinctions and when they apply.
  Different levels of geographical regions, and the data available at each level (nation, region, state, metropolitan area, county)
  Establishment data vs. household data
Note the importance of context in the use of these terms and data.
Contributing Factors (3)
Inherent ambiguity: the pay concept
  Carol Hert & John Fieber, search terms from FedStats Web Page (http://www.fedstats.gov/), 11/98, 28,248 unique queries
Agency terms used for pay concept include:
  income, compensation, earnings, wage, salary
BLS/CPS Terms
Total combined income
  “includes money from jobs, net income from business, farm or rent, pensions, dividends, interest, social security payments and any other money income received” (CPS)
Compensation
  “sometimes used to encompass the entire range of wages and benefits” (BLS Glossary of Compensation Terms)
BLS/CPS Terms (2)
Usual weekly earnings
  “include any overtime pay, commissions, or tips usually received” (CPS concepts)
Hourly earnings
  “hourly rate as stated by the employer…does not include tips, commissions, or any other non-hourly wages.” (CPS interviewer manual)
What does this user want?
  correction officer, income
Monetary income received - including that unrelated to job
Compensation, including benefits - total job package
Usual weekly earnings - including regular overtime
Hourly earnings - excluding overtime
Ideas for Solutions
Goals
What needs to be solved
Possible tools and structures
Goals for Possible Solutions
Maintain the distinction between agency (authority) terms and user terms.
Note the distinction between a terminology and user vocabulary
  Often lack of structure, stability, or context (although patterns do exist)
Not equally weighted terminologies
[Diagram: T1, T2, Data Element Concepts, Data Elements]
Asymmetrical Structure
[Diagram: Agency Terms, User Terms, Data Element Concepts, Data Elements, registry contents]
Maintenance Issues
Indexing is not the primary function of the agency.
Less than total coverage will still help.
Can we assume:
  Agency terms are adopted/defined slowly?
  User terms are more volatile (especially the “trendy” ones)?
How often must mapping structures, procedures be updated?
Easing Users’ Pain
No problem
  same word(s), same meaning
  different word(s), different meaning
Support needed (thesaurus, definitions, explanation)
  different word(s), same meaning (synonyms)
  same word(s) or different word(s), some relationship between meanings (e.g., BT, NT, part-of, domain specific)
Same word(s) or different word(s), some undefined overlap in meaning
??? Can these users be helped ???
  Same word(s), different meaning (if unnoticed by user)
  Same word(s) or different word(s), no relationship (wrong source of information?)
Providing Agency Information
Substituting agency term(s) for user term(s) and/or expanding user term(s)
  Hidden or overt?
  Automatic or interactive?
Displaying conceptual term clusters (e.g., gender, race, occupation)
Facilitating browsing
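As a rough illustration of expanding user terms with agency terms, the sketch below adds agency equivalents to a query and also returns notes that an interface could show (overt) or suppress (hidden). The mapping `USER_TO_AGENCY` and the function `expand_query` are hypothetical; the entries are taken from the examples earlier in these slides, not from any agency system.

```python
# Hypothetical user-term -> agency-term mapping, using examples from the slides
USER_TO_AGENCY = {
    "health care": ["medical care"],
    "inflation": ["CPI", "PPI"],
    "woman": ["women"],
    "female": ["women"],
    "girls": ["women"],
}

def expand_query(terms, mapping=USER_TO_AGENCY):
    """Return (expanded_terms, notes).

    expanded_terms keeps each user term and appends any mapped agency terms.
    notes describes each expansion so it can be displayed if expansion is overt.
    """
    expanded, notes = [], []
    for term in terms:
        key = term.lower().strip()
        agency_terms = mapping.get(key, [])
        for t in [key] + agency_terms:
            if t not in expanded:  # avoid duplicates, preserve order
                expanded.append(t)
        if agency_terms:
            notes.append(f"'{key}' also searched as: {', '.join(agency_terms)}")
    return expanded, notes

expanded, notes = expand_query(["Inflation", "women"])
```

An interactive variant could present `notes` as suggestions and let the user accept or reject each one before searching; an automatic, hidden variant would simply search on `expanded`.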
Giving definitions and examples
  source?
  “official” or basic?
Highlighting usage notes (the footnotes)
  Who needs to see them? When?
Crosswalk
Mapping between agency and user terms
Asymmetrical, build from users’ side
80/20 principle for coverage
Multiple sources of terms:
  Search sessions
  Interviews with consultants, intermediaries
  Media reports, textbooks, other “public” sources
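One way to read the 80/20 principle for a crosswalk built from the users’ side is: map the most frequent user terms first, and stop once they account for a target share of all queries. The sketch below assumes a plain list of logged query terms; the function name `terms_for_coverage` and the sample log are hypothetical.

```python
from collections import Counter

def terms_for_coverage(query_log, target=0.8):
    """Pick the most frequent user terms until they cover `target` of all queries."""
    counts = Counter(q.lower().strip() for q in query_log)
    total = sum(counts.values())
    selected, covered = [], 0
    for term, n in counts.most_common():
        if covered / total >= target:
            break  # enough query volume is already covered
        selected.append(term)
        covered += n
    return selected

# Hypothetical log: a few frequent terms and a rarely used "trendy" one
log = ["income"] * 6 + ["salary"] * 3 + ["cyberjobs"]
priority_terms = terms_for_coverage(log)
```

Terms left out of `priority_terms` are the long tail; the slides' point that less than total coverage will still help suggests they can be mapped later or not at all.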
Asymmetrical Structure
[Diagram: Agency Terms, User Terms, Data Element Concepts, Data Elements, Crosswalk]
“Enhanced Indexing”
Expanding agency pay terms, FedStats Web page (Hert & Haas, preliminary findings)
Assume that more overlap between terms increases users’ chances of success
Query sessions where 50% of terms were agency terms
  Without expansion = 89%
  With expansion = 73%
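The slides do not say how the share of agency terms in a session was computed. A minimal sketch of one plausible measure, assuming a session is a list of query terms and using the agency pay terms listed earlier, is shown below; `agency_term_share` is a hypothetical name.

```python
# Agency terms for the pay concept, as listed earlier in the slides
AGENCY_PAY_TERMS = {"income", "compensation", "earnings", "wage", "salary"}

def agency_term_share(session_terms, agency_terms=AGENCY_PAY_TERMS):
    """Fraction of the terms in one query session that are agency terms."""
    terms = [t.lower().strip() for t in session_terms]
    if not terms:
        return 0.0
    return sum(t in agency_terms for t in terms) / len(terms)

share = agency_term_share(["salary", "pay", "wage", "money"])
```

Expanding the agency term set (for example with mapped user terms) would raise this share for many sessions, which is the kind of overlap the assumption above refers to.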
Other Possibilities
Thesaurus, with relationships such as “see” and “use for”
Multilingual thesaurus or dictionary, treating terminologies as equal
Fully incorporate end-user terms into classification or data element concept entries (Desirable?)
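To make the thesaurus option above concrete, the sketch below stores “use for” lists on agency (preferred) terms and inverts them into a “see” lookup from user terms. This keeps the structure asymmetrical: only agency terms have entries of their own. The class `Entry` and function `build_see_index` are hypothetical, and the sample entries reuse examples from the slides.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    preferred: str                               # agency (authority) term
    use_for: list = field(default_factory=list)  # user terms it stands in for

def build_see_index(entries):
    """Invert 'use for' lists into a 'see' lookup: user term -> preferred term."""
    return {alt: e.preferred for e in entries for alt in e.use_for}

entries = [
    Entry("medical care", use_for=["health care"]),
    Entry("women", use_for=["woman", "female", "girls"]),
]
see = build_see_index(entries)
```

A lookup such as `see.get("health care")` then points the user to the agency term, while a term with no entry returns `None` and can be flagged for review.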
Final Points
Users are inventive in term use.
Users discourage easily.
Maintenance is a crucial concern.
Is the 80/20 principle useful?