Digital libraries with superimposed information - Ph.D. Defense

Posted on 19-Jun-2015


Slides from my Ph.D. defense that took place on Jan 28, 2011.

Transcript of Digital libraries with superimposed information - Ph.D. Defense

Digital Libraries with Superimposed Information

PhD Defense, 28 January 2011

Uma Murthy

Supporting Scholarly Tasks that Involve Fine-grain Information


Acknowledgments
• My family
• Dr. Edward Fox, Dr. Manuel Pérez-Quiñones, Dr. Ricardo Torres, Dr. Lois Delcambre, Dr. Naren Ramakrishnan, Dr. Eric Hallerman, Lin Tzy Li, Dr. Marcos Goncalves, Yinlin Chen, Nadia Kozievitch, Evandro Ramos, Tiago Falcao, Kapil Ahuja, Dr. John Pitrelli, Dr. Ganesh Ramaswamy, Dr. Andrea Kavanaugh, Dr. Lillian Cassel, Dr. Deborah Tatar, Dr. Donald Orth, Seungwon Yang, Lokeya Venkatachalam, Seonho Kim, Doug Gorton, Ricardo Quintana-Castillo, Monika Akbar, Dave Archer, Susan Price, Rao Shen, Srinivas Vemuri, Xiaoyan Yang, Yonca Haciahmetoglu, Pardha Pyla, Manas Tungare, Sameer Ahuja, Ben Hanrahan, Laurian Vega, Stacy Branham, Tejinder Judge, Rhonda Phillips, Ramya Ravichandar, Hari Pyla, Manjula Iyer, Dr. Noel Greis, Dr. Jack Olin, Venkat Srinivasan, …
• NSF grants (Superimposed information, Digital Government, DL curriculum, CTR, ECDL), Microsoft Tablet PC grant, CS department, and Graduate School

Motivation: many scholarly tasks involve working with subdocuments


Problems
• Information is heterogeneous, voluminous, and distributed across locations, which makes it challenging to manage, organize, access, retrieve, and use.
• Tools and methods (both paper-based and digital) are not well integrated.

Result: ineffective and inefficient task execution

A digital library = repository of collections and metadata + services


Scenario
“Find me species that are darters that have a dorsal fin that looks like this, which is connected to another dorsal fin that looks like this, which might have an orange hue on its edge.”

Search for subdocuments in the context of other information, including other subdocuments

Use them in another task or context

Superimposed information enables working with contextualized subdocuments

base (existing) information

superimposed (new) information

marks
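To make the three terms above concrete, here is a minimal Python sketch of how a mark might address a region of a base image and carry superimposed annotations. All class and field names (BaseImage, Mark, Annotation, context) and the rectangular-region addressing are illustrative assumptions, not definitions taken from the SI-DL metamodel or from SuperIDR.

```python
from dataclasses import dataclass, field

# Hypothetical model: names and fields are illustrative assumptions,
# not the SI-DL metamodel's actual definitions.

@dataclass(frozen=True)
class BaseImage:
    """Base (existing) information: an image already in the collection."""
    image_id: str
    species: str
    width: int
    height: int

@dataclass(frozen=True)
class Mark:
    """A mark addresses a rectangular region (a subimage) of a base image."""
    base: BaseImage
    x: int
    y: int
    w: int
    h: int

    def __post_init__(self) -> None:
        # Reject empty regions and regions that fall outside the base image.
        if self.w <= 0 or self.h <= 0:
            raise ValueError("empty region")
        if (self.x < 0 or self.y < 0
                or self.x + self.w > self.base.width
                or self.y + self.h > self.base.height):
            raise ValueError("region outside base image")

@dataclass
class Annotation:
    """Superimposed (new) information attached to a mark."""
    mark: Mark
    text: str
    tags: list[str] = field(default_factory=list)

    def context(self) -> str:
        # Resolving the mark recovers the subdocument's context:
        # which species and which base image it came from, and where.
        m = self.mark
        return f"{m.base.species} / {m.base.image_id} @ ({m.x},{m.y},{m.w},{m.h})"

fin = BaseImage("img-001", "darter sp.", 800, 600)
note = Annotation(Mark(fin, 120, 40, 200, 90),
                  "orange hue on edge of first dorsal fin", ["dorsal fin", "color"])
print(note.context())  # darter sp. / img-001 @ (120,40,200,90)
```

Keeping the mark separate from the annotation means the same region can carry several notes, and the base image is never modified.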


Hypothesis

A digital library with superimposed information (SI-DL) provides enhanced support to scholarly tasks that involve working with subdocuments

Diagram: SI + DL provides enhanced support to scholarly tasks with subdocuments

Research questions

An SI-DL
• RQ1 – How can we model data and services in an SI-DL? How can we integrate SI into DLs? Can we formally define an SI-DL?
• RQ4 – Can we prototype an SI-DL to support working with subimages in fish identification?

Subdocument use in scholarly tasks
• RQ2 – What constitutes a subimage and SI in fish identification?
• RQ3 – What are the strategies/contexts of working with subimages in fish identification?

SI-DL support of scholarly tasks that involve working with subdocuments
• RQ5 – How does an SI-DL support fish identification, and how does that compare with traditional methods of fish identification?
• RQ6 – How does an SI-DL support working with subimages in fish ID?

Research approach


Research approach - theory


Research approach - practical/user


Review of work done and results

• SIMPEL
• Enhanced CMapTools
• Case studies of SI use (music & fisheries)
• Pilot user studies

SuperIDR-v1

SI-DL metamodel – v1
• 5S analysis of SI
• Initial metamodel

SuperIDR – formative evaluation
• SuperIDR improvements
• Classroom study design

SuperIDR – classroom study
• Fish ID: learning and identification
• SuperIDR improves on traditional methods
• Insufficient data on how SuperIDR was used

SI-DL metamodel – v2
• Improved metamodel
• Case study

SuperIDR – qualitative study

SI-DL design guidelines

SuperIDR-v2
• Lucene indexing
• Usability improvements

SuperIDR-v3
• Image management
• Combined search
• Comparison
• Usability improvements
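SuperIDR-v2 added Lucene indexing. Lucene is built around an inverted index that maps each term to the documents containing it; the Python sketch below shows that idea on a few annotation strings. It is a toy stand-in, not Lucene's API and not SuperIDR's indexing code; the tokenizer, the boolean-AND query semantics, and the sample annotations are assumptions for illustration.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    # Lowercase and split on any run of non-alphanumeric characters.
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

class InvertedIndex:
    """Toy inverted index: term -> set of annotation ids (not Lucene's API)."""

    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> set[str]:
        # Boolean AND: ids whose text contains every query term.
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result

# Hypothetical annotations attached to subimages.
idx = InvertedIndex()
idx.add("a1", "Orange edge on first dorsal fin")
idx.add("a2", "Second dorsal fin connected to first")
idx.add("a3", "Dark spots along lateral line")
print(sorted(idx.search("dorsal fin")))     # ['a1', 'a2']
print(sorted(idx.search("orange dorsal")))  # ['a1']
```

A real engine adds ranking (e.g., term weighting), stemming, and on-disk segments; the sketch keeps only the term-to-postings lookup that makes text search over annotations fast.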


Subimage and SuperIDR use – a qualitative study

How do people use subimages in fish identification and how does SuperIDR support that use?

Characteristics of subimages and related information

Contexts and strategies of working with subimages in fish identification

SuperIDR support for working with subimages in fish identification


Rationale: Maximize Use of SuperIDR

• Recruit people with interest in fish ID
• Have a longer duration of use, both in a natural setting and in targeted tasks
• Have them use SuperIDR on their own (data on use in the wild) and in targeted tasks (opportunity to observe use)
• Collect qualitative data, in multiple ways and from multiple sources, on subimage and SuperIDR use in fish ID

Study setup


Study procedures

Data collected
• Interview responses
• Diary entries
• Log data of SuperIDR use
• Screen captures of task execution
• Spoken thoughts during task execution
• Species ID materials
• Database image
• Species ID responses

Participants: 3 groups

P2 (male), P5 (female), P6 (male): Relatively less experienced, undergraduates (UG) or recent UG graduates

P1 (male), P5 (female): Moderately experienced Master’s students, working on theses and/or teaching/research

P3 (male), P4 (female): Highly experienced PhD students, working on research projects

Analyzed participants based on fisheries and fish identification experience, current projects and fish identification practices


Subimage/annotation characteristics

940 subimages and annotations, most focusing on a part of the fish (image)

Morphological description, size, color, presence, counts, location (examples shown: location, color)

Co-presence, morphological comparisons, multiple parts description, connections/relationships, comparison with other information objects (examples shown: connections/relationships, comparison with other information objects)

Information object as a whole, combination of types (examples shown: information object as a whole, combination of color and count)
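The three slides above together list 13 subimage/annotation types. A small Python sketch of this coding scheme follows. The PART/MULTI/WHOLE grouping simply mirrors the order of the slides and is an editorial assumption, as is the hypothetical helper code_annotation, which adds "combination of types" whenever an annotation is coded with more than one type.

```python
from collections import Counter
from enum import Enum

class Scope(Enum):
    # Grouping labels are an editorial assumption based on slide order.
    PART = "focuses on a part of the fish"
    MULTI = "relates several parts or other information objects"
    WHOLE = "whole object or combination of types"

# The 13 subimage/annotation types reported in the study.
ANNOTATION_TYPES: dict[str, Scope] = {
    "morphological description": Scope.PART,
    "size": Scope.PART,
    "color": Scope.PART,
    "presence": Scope.PART,
    "counts": Scope.PART,
    "location": Scope.PART,
    "co-presence": Scope.MULTI,
    "morphological comparisons": Scope.MULTI,
    "multiple parts description": Scope.MULTI,
    "connections/relationships": Scope.MULTI,
    "comparison with other information objects": Scope.MULTI,
    "information object as a whole": Scope.WHOLE,
    "combination of types": Scope.WHOLE,
}

counts = Counter(ANNOTATION_TYPES.values())

def code_annotation(labels: set[str]) -> set[str]:
    """Hypothetical helper: validate labels; several types imply 'combination of types'."""
    unknown = labels - set(ANNOTATION_TYPES)
    if unknown:
        raise ValueError(f"unknown types: {sorted(unknown)}")
    return (labels | {"combination of types"}) if len(labels) > 1 else set(labels)

print(len(ANNOTATION_TYPES), counts[Scope.PART], counts[Scope.MULTI], counts[Scope.WHOLE])  # 13 6 5 2
print(sorted(code_annotation({"color", "counts"})))
```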


Strategies and contexts that suggest subimage use in fish identification

• In learning methods
• In identification (top-down approach, comparing similar species)
• To help identify fishes quickly (identifying in the field versus in the lab or the classroom)
• In fishes of the same species (to deal with variability in appearance)
• To verify species using manual inspection

Subimage use in SuperIDR

• Marking and annotating subimages (940 subimages and annotations)

• Browsing subimages in species descriptions, in comparisons, and in search results

• Text, image, and combined search, including complex objects as queries
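Combined search mixes text and image evidence, and a complex object (for example, an image together with its annotation) can serve as the query. One common way to do this is a weighted linear fusion of a text score and an image-similarity score; the sketch below assumes that approach. The Jaccard text score, the cosine image score, the feature vectors, and the default weight alpha=0.5 are all illustrative assumptions, not SuperIDR's actual ranking.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class Item:
    item_id: str
    text: str              # annotation or description text
    features: list[float]  # hypothetical image descriptor, e.g. a color histogram

@dataclass
class Query:
    """A complex object used as a query: text, an image descriptor, or both."""
    text: Optional[str] = None
    features: Optional[list[float]] = None

def text_score(q: str, d: str) -> float:
    # Jaccard overlap of lowercase word sets (stand-in for a real text ranker).
    a, b = set(q.lower().split()), set(d.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def image_score(q: list[float], d: list[float]) -> float:
    # Cosine similarity of feature vectors (stand-in for a real CBIR descriptor).
    dot = sum(x * y for x, y in zip(q, d))
    norm = math.sqrt(sum(x * x for x in q)) * math.sqrt(sum(y * y for y in d))
    return dot / norm if norm else 0.0

def combined_search(query: Query, items: list[Item], alpha: float = 0.5) -> list[tuple[str, float]]:
    """Rank by alpha*text + (1-alpha)*image, renormalizing when only one part is present."""
    ranked = []
    for it in items:
        parts = []
        if query.text is not None:
            parts.append((alpha, text_score(query.text, it.text)))
        if query.features is not None:
            parts.append((1 - alpha, image_score(query.features, it.features)))
        weight = sum(w for w, _ in parts)
        score = sum(w * s for w, s in parts) / weight if weight else 0.0
        ranked.append((it.item_id, score))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

items = [
    Item("s1", "orange edge on dorsal fin", [0.9, 0.1, 0.0]),
    Item("s2", "dark spots along lateral line", [0.1, 0.2, 0.7]),
]
ranking = combined_search(Query(text="orange dorsal fin", features=[1.0, 0.0, 0.0]), items)
print(ranking[0][0])  # s1
```

Renormalizing by the weights actually used lets the same function serve text-only, image-only, and combined queries.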


Subimages in species learning methods


Manually inspecting subimages while comparing similar species

Complex object as a query


“It [SuperIDR] is pulling together different ways of getting to information ... So, not only do I have a taxonomy [and] dichotomous key, but it is also supported by images, many images that I have loaded in myself, that I can compare and contrast right there in the program [SuperIDR]. I can annotate the images, so I know that I kind of looking somewhat into their future [use]. And it kind of just pulls all those tools together, more so than [pulling together] information. It gives me many ways of accessing the same information. The more ways you can come to that information, the better [it is]. Because it is always going to make you more confident about the decision that you are making.” [P1 interview]

“It depends on how distinct that species [is] and how many other species are similar to that species, I guess … I would never trust the result, I guess, 100% …you know, based on just one picture and a little bit of written text. I would always want to pull up other species that are somewhat similar and just do a visual inspection myself to be sure that it just was not some bad [query] image that I used or a bad search term.” [P3 interview]

“... It would not work if you said that this fish has dark spots. You know you get hundreds of species with dark spots. But, if you got down to a few species and you need to know how many they have ...” [P1 interview]

Themes: context; manually working with information; SI-DL

Guidelines for design of an SI-DL
Dimensions: subdocument, context, multiple ways, manual and automatic (M = manual, A = automatic)

Theory
• Subdocument: subdocument
• Context: address, view in context
• Multiple ways: apply DL services on SI
• Manual and automatic: apply DL services on SI

Tools/applications
• Subdocument: subimages in SuperIDR, marks in SIMPEL and CMapTools
• Context: marked region, view in context
• Multiple ways: text retrieval, CBIR, combined search, browse, comparison
• Manual and automatic: M: browse, compare; A: CBIR

User studies
• Subdocument: snippets of music, parts of fishes
• Context: musical composition, whole fish/species/family; co-presence of parts
• Multiple ways: visual marking and notes on fishes; search, browse, and compare fish information (incl. subimages)
• Manual and automatic: M: browse, compare; A: annotation, CBIR, ontology for similar words
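The guidelines above list CBIR as an automatic (A) service for working with subimages. One classic CBIR technique is color histogram intersection (Swain and Ballard): quantize each pixel's color into bins, normalize the counts, and sum the bin-wise minima. The sketch below applies it to marked regions given as lists of RGB tuples; the bin count, the sample regions, and the choice of this descriptor are assumptions for illustration, not the descriptor SuperIDR used.

```python
def color_histogram(pixels: list[tuple[int, int, int]], bins_per_channel: int = 4) -> list[float]:
    """Quantize 0-255 RGB pixels into bins_per_channel**3 bins, normalized to sum to 1."""
    n = bins_per_channel
    hist = [0.0] * (n ** 3)
    for r, g, b in pixels:
        # Integer arithmetic maps 0..255 onto bin indices 0..n-1 per channel.
        idx = (r * n // 256) * n * n + (g * n // 256) * n + (b * n // 256)
        hist[idx] += 1
    total = len(pixels)
    return [h / total for h in hist] if total else hist

def histogram_intersection(h1: list[float], h2: list[float]) -> float:
    """Sum of bin-wise minima: 1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))

# Hypothetical marked regions: mostly orange fin edge, half orange/half pale, and blue.
region_a = [(240, 130, 20)] * 8 + [(30, 30, 30)] * 2
region_b = [(235, 140, 25)] * 5 + [(200, 200, 200)] * 5
region_c = [(20, 40, 220)] * 10

h_a, h_b, h_c = (color_histogram(p) for p in (region_a, region_b, region_c))
print(histogram_intersection(h_a, h_b))  # 0.5
print(histogram_intersection(h_a, h_c))  # 0.0
```

Because it works only on the pixels inside a mark, the same descriptor can compare a subimage against other subimages rather than whole photographs.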

Conclusions
• Working with subdocuments is important and necessary in many scholarly tasks
• An SI-DL provides enhanced support to such scholarly tasks
– Treating subdocuments as first-class objects facilitates management, access, retrieval, and use of subdocuments and associated information

Contributions
• Superimposed applications
• SI-DL definition (metamodel) and prototype (SuperIDR)
• Findings from user studies on use of SI in scholarly tasks
– Insights about subimage use in species identification
• Guidelines for SI-DL design
• Datasets (images, subimages, annotations)*

Future work

• Improved CBIR of subimages and improved combined search (e.g., transfer learning)

• Leverage existing collections to study applicability in other domains

• Crowdsourcing social media to study SI use in a social network context and the

• Participatory SI-DL, when personal and institutional DLs come together

• Comparison of various forms and functions of subdocuments and associated


Contributions and publications (columns: theory, tools/applications, user studies)

SI-DL (RQ1, RQ4)
• Theory: SI-DL metamodel [5, 8, 9, 15]
• Tools/applications: SIMPEL, Enhanced CMapTools, SuperIDR [1, 2, 3, 4, 5, 6, 9, 10, 11, 12, 13]

Use of subdocuments in scholarly tasks (RQ2, RQ3)
• User studies: qualitative study [14]

SI-DL support of scholarly tasks with subdocuments (RQ5, RQ6)
• Tools/applications: SIMPEL, Enhanced CMapTools, SuperIDR [1, 2, 3, 4, 6, 7, 10, 11, 12, 13, 14]
• User studies: case studies, pilot user studies, SuperIDR formative and classroom evaluation, SuperIDR qualitative study [1, 2, 3, 4, 8, 14]

Publications related to this research

Published
1. SuperIDR: A Tablet PC Tool for Image Description and Retrieval (WIPTE, 2010)
2. A Teaching Tool for Parasitology: Enhancing Learning with Annotation and Image Retrieval (ECDL, 2010)
3. Superimposed image description and retrieval for fish species identification (ECDL, 2009)
4. Species identification: fish images with CBIR and annotations (JCDL poster, 2009)
5. Superimposed information architecture for digital libraries (ECDL, 2008)
6. From concepts to implementation and visualization: tools from a team-based approach to IR (SIGIR demo, 2008)
7. Further development of a digital library curriculum: Evaluation approaches and new tools (ICADL, 2007)
8. A superimposed information-supported digital library (JCDL doctoral consortium, 2007)
9. Extending the 5S digital library (DL) framework: From a minimal DL towards a DL reference model (DLF workshop, JCDL, 2007)
10. Enhancing concept mapping tools below and above to facilitate the use of superimposed information (CMC, 2006)
11. Sierra - a superimposed application for enhanced image description and retrieval (ECDL demo, 2006)
12. Using superimposed and context information to find and re-find sub-documents (PIM, 2006)
13. SIMPEL: a superimposed multimedia presentation editor and player (JCDL demo, 2006)

Planned
14. A qualitative study on the use of subimages and of SuperIDR – a prototype digital library with superimposed information – in fish species identification (JCDL, 2011)
15. Extending the 5S framework to provide support for CBIR, complex objects, and superimposed information (journal paper)

Other published work
• Pedagogical Enhancements to a Course on Information Retrieval (TLIR, 2011)
• Sustainability of Bits, not just Atoms (CHI sustainability workshop, 2010)
• Using an iPhone Application for Diversity Recruitment (ASEE-SE, 2009)
• Building an ontology for crisis, tragedy and recovery (NKOS, 2009)
• Curatorial Work and Learning in Virtual Environments: A Virtual World Project to Support the NDIIPP Community (JCDL Digital Preservation workshop, 2009)
• A Methodology and Tool Suite for Evaluation of Accuracy of Interoperating Statistical Natural Language Processing Engines (Interspeech, 2008)
• VizBlog: a discovery tool for the blogosphere (DigGov, 2007)
• Re-finding from a Human Information Processing Perspective (PIM, 2006)

Thank you

?


Backup slides

Photo attributions (Flickr)

• A digital library by HacksHaven
• Art History With Chris And Mac 6/9: Manet: Lecture (Mme Manet and Leon) by moonflowerdragon
• Korean music by Homies In Heaven
• Old annotations by Lorianne DiSabato
• Reading Annotation by Rosa Say

SuperIDR architecture


Species learning methods

Variability in fishes of the same species


Summary of findings of qualitative study

• 13 types of subimages/annotations identified from 940 subimages/annotations
• Subimages are important and necessary in fish identification
• Identification proceeds in a top-down way
• Learning uses multiple methods
• Context is important
• Combined search and use of a complex object as a query
• SI-DL: bringing together capabilities

Morphological comparison


Participatory SI-DL [Marchionini, 2010]