Data discovery from a digital library perspective

Greg Janée, Darren Hardy, UC Santa Barbara




Outline

• Questions
  – grappling with granularity
  – struggling with search
  – dithering over distribution
  – pondering process

• Integrating search with access


Granularity

institution (NASA)

data center (GSFC)

program (MODIS)

product (sea surface temperature)

resolution (1 km)

space

time

granule

datum

type

organization


Approaches I

• ADL
  – uniform object (metadata) representation
  – flat list of collections (=containers)
  – possible extensions:
    • collections as first-order objects
    • nested containers
• THREDDS
  – hierarchical “collection” datasets
  – “coherent” datasets (=aggregation server?)
  – “direct” datasets


Approaches II

• Granularity on the Web...
  – webpage
  – multi-page document
  – website
• ...and sidestepping it
  – uniform representation (webpage)
  – page linking
  – visible, decomposable identifiers (URLs)
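As an illustration of "visible, decomposable identifiers", the sketch below truncates a URL one path segment at a time, yielding identifiers for progressively coarser units (page, document, site). The example URL and the function name `ancestors` are hypothetical, and the mapping from path segments to containers is a common Web convention rather than a guarantee.

```python
from urllib.parse import urlsplit, urlunsplit

def ancestors(url):
    """Return the URL followed by each enclosing 'container' URL.

    Assumption: path segments correspond to nested containers.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    result = []
    # Drop one trailing segment at a time, finest to coarsest.
    for n in range(len(segments), -1, -1):
        path = "/" + "/".join(segments[:n])
        if 0 < n < len(segments):
            path += "/"  # intermediate levels are treated as directories
        result.append(urlunsplit((parts.scheme, parts.netloc, path, "", "")))
    return result

# Hypothetical example URL: page -> document -> site.
levels = ancestors("http://example.org/report/chapter2/page5.html")
for level in levels:
    print(level)
```

No catalog or registry is consulted here: the coarser identifiers are derived from the identifier itself, which is what lets the Web sidestep an explicit granularity model.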


Flattening granularity

• Use heuristics to return “best” match

[Diagram: a dataset inherits descriptive metadata and aggregates intrinsic metadata]
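One possible reading of the flattening idea, sketched under stated assumptions: descriptive metadata is inherited downward from containers, and intrinsic metadata (here, a time range) is aggregated upward from contents. The `Node` class, its fields, and the `flatten` function are hypothetical and do not come from ADL or THREDDS. The "best match" heuristics are not modeled.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A hypothetical item at any granularity level (program, product, granule, ...)."""
    name: str
    descriptive: dict = field(default_factory=dict)  # e.g. keywords added by people
    time_range: tuple = None                         # intrinsic: (start_year, end_year)
    children: list = field(default_factory=list)

def flatten(node, inherited=None):
    """Return a flat list of (name, descriptive, time_range) records.

    Descriptive metadata is inherited from ancestors (a child's own values win);
    a container's time range is the union of its own and its children's ranges.
    """
    descriptive = {**(inherited or {}), **node.descriptive}
    records = []
    ranges = [node.time_range] if node.time_range else []
    for child in node.children:
        child_records = flatten(child, descriptive)
        records.extend(child_records)
        # The child's own record comes first in its list.
        if child_records[0][2]:
            ranges.append(child_records[0][2])
    span = (min(r[0] for r in ranges), max(r[1] for r in ranges)) if ranges else None
    return [(node.name, descriptive, span)] + records

# Hypothetical two-level hierarchy: one product containing two granules.
product = Node("sea surface temperature", {"topic": "oceans"}, children=[
    Node("granule A", time_range=(2000, 2001)),
    Node("granule B", {"quality": "provisional"}, time_range=(2002, 2003)),
])
flat = flatten(product)
for record in flat:
    print(record)
```

After flattening, every record carries a complete description regardless of its level, so a single uniform search can be run over all of them.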


Search

• Type
  – text, numeric, space, time, ...
• Source
  – data itself
  – intrinsic metadata
  – added (usually descriptive) metadata
  – 3rd party
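To make the search types concrete, here is a minimal sketch of a query that combines text, space, and time constraints against metadata records. The record keys (`title`, `bbox`, `years`) and the sample records are invented for illustration. The spatial test is a plain bounding-box overlap that ignores boxes crossing the antimeridian.

```python
def matches(record, text=None, bbox=None, years=None):
    """Return True if a metadata record satisfies every constraint given.

    record: dict with hypothetical keys 'title' (str),
            'bbox' (west, south, east, north) and 'years' (start, end).
    """
    if text is not None and text.lower() not in record["title"].lower():
        return False
    if bbox is not None:
        w, s, e, n = record["bbox"]
        qw, qs, qe, qn = bbox
        # Boxes overlap unless one lies entirely to one side of the other.
        if e < qw or qe < w or n < qs or qn < s:
            return False
    if years is not None:
        start, end = record["years"]
        # Intervals overlap unless one ends before the other starts.
        if end < years[0] or years[1] < start:
            return False
    return True

# Invented sample records.
records = [
    {"title": "Sea surface temperature, 1 km", "bbox": (-125, 30, -115, 40), "years": (2000, 2005)},
    {"title": "Land cover", "bbox": (0, 40, 10, 50), "years": (1995, 1999)},
]
hits = [r["title"] for r in records
        if matches(r, text="temperature", bbox=(-120, 32, -118, 35), years=(2004, 2006))]
print(hits)
```

The sketch is indifferent to where each field came from: the same test applies whether the bounding box was computed from the data itself or added later as descriptive metadata.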


Distribution

• Centralized system
  – e.g. Google, ECHO
  – SPOF (single point of failure); requires resources
• Peer-to-peer
  – e.g. BRICKS, built on P-GRID
  – MPOF (multiple points of failure); requires commitment

• ADL: incomplete peer-to-peer


A “textbook” search process

• Classic process (Lancaster 1979)
  – Information need
  – Stated request
  – Selection of database
  – Search strategy
  – Search in database
  – Screening of output
• Web search: about the same 25 years later


What’s the real process?

• Irrational search (Pharo & Järvelin 2006)
  – Textbook search processes insufficient
  – Disjointed incrementalism theory
    • Many smaller steps
    • Learning during a search
    • Subjective & dynamic information needs over time
• What’s the ideal for earth science data users?
  – How do you inform choices during search?
  – How do you formulate a search, and what’s the context?
  – When is enough enough?


Integrating search with access

• File menu
  – Open...
  – Search library...
  – Close
  – Quit

• Query results returned as a THREDDS catalog?
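A rough sketch of what returning query results as a THREDDS catalog could look like. The function name, the service base URL, and the sample result are hypothetical. The element and attribute names follow my understanding of the THREDDS InvCatalog 1.0 schema and should be checked against the specification before use.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for THREDDS InvCatalog 1.0 documents.
NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"

def results_to_catalog(query, results, service_base="http://example.org/data/"):
    """Wrap search results, given as (name, url_path) pairs, in a minimal catalog."""
    ET.register_namespace("", NS)  # serialize with a default namespace
    catalog = ET.Element(f"{{{NS}}}catalog", {"name": f"Search results: {query}"})
    # One access service shared by all datasets (hypothetical base URL).
    ET.SubElement(catalog, f"{{{NS}}}service",
                  {"name": "access", "serviceType": "HTTPServer", "base": service_base})
    for name, url_path in results:
        ET.SubElement(catalog, f"{{{NS}}}dataset",
                      {"name": name, "urlPath": url_path, "serviceName": "access"})
    return ET.tostring(catalog, encoding="unicode")

# Hypothetical search result.
xml_text = results_to_catalog("sea surface temperature",
                              [("MODIS SST 1 km", "modis/sst_1km.nc")])
print(xml_text)
```

A client that already reads THREDDS catalogs could then treat "Search library..." as another way of producing a catalog, so search results would open through the same path as browsed datasets.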


We’re funded to do this!