Digital libraries with superimposed information - Ph.D. Defense



Slides from my Ph.D. defense that took place on Jan 28, 2011.

Page 1

Digital Libraries with Superimposed Information

PhD Defense, 28 January 2011

Uma Murthy

Supporting Scholarly Tasks that Involve Fine-grain Information

Page 2

Acknowledgments

• My family

• Dr. Edward Fox, Dr. Manuel Pérez-Quiñones, Dr. Ricardo Torres, Dr. Lois Delcambre, Dr. Naren Ramakrishnan, Dr. Eric Hallerman, Lin Tzy Li, Dr. Marcos Goncalves, Yinlin Chen, Nadia Kozievitch, Evandro Ramos, Tiago Falcao, Kapil Ahuja, Dr. John Pitrelli, Dr. Ganesh Ramaswamy, Dr. Andrea Kavanaugh, Dr. Lillian Cassel, Dr. Deborah Tatar, Dr. Donald Orth, Seungwon Yang, Lokeya Venkatachalam, Seonho Kim, Doug Gorton, Ricardo Quintana-Castillo, Monika Akbar, Dave Archer, Susan Price, Rao Shen, Srinivas Vemuri, Xiaoyan Yang, Yonca Haciahmetoglu, Pardha Pyla, Manas Tungare, Sameer Ahuja, Ben Hanrahan, Laurian Vega, Stacy Branham, Tejinder Judge, Rhonda Phillips, Ramya Ravichandar, Hari Pyla, Manjula Iyer, Dr. Noel Greis, Dr. Jack Olin, Venkat Srinivasan, …

• NSF grants (Superimposed information, Digital Government, DL curriculum, CTR, ECDL), Microsoft tablet PC grant, CS department, and Graduate school

Page 3

Motivation: many scholarly tasks involve working with subdocuments

Page 4

Problems

• Information is heterogeneous, voluminous, and distributed across locations, which makes it challenging to manage, organize, access, retrieve, and use.

• Tools/methods (including paper-based and digital) are not well-integrated.

Ineffective and inefficient task execution

Page 5

A digital library = repository of collections and metadata + services

Page 6

Scenario

Find me species that are darters that have a dorsal fin that looks like this, which is connected to another dorsal fin that looks like this, which might have an orange hue on its edge

Search for subdocuments, in context of other information, incl. other subdocuments

Use it in another task/context

Page 7

Superimposed information enables working with contextualized subdocuments

base (existing) information

superimposed (new) information

marks
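The relationship between base information, marks, and superimposed information on this slide can be illustrated with a minimal Python sketch. It is an illustration only, not the model or code used in the thesis: the class names `Mark` and `SuperimposedItem`, the helper functions, the character-span addressing, and the sample text are all assumptions. For images, a mark would more plausibly address a region than a character span.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    """Hypothetical address of a subdocument: a character span in a base document."""
    doc_id: str
    start: int
    end: int


@dataclass
class SuperimposedItem:
    """New information (here, a note) layered over a marked subdocument."""
    mark: Mark
    note: str


def resolve(mark, base_docs):
    """Return the marked subdocument itself."""
    return base_docs[mark.doc_id][mark.start:mark.end]


def view_in_context(mark, base_docs, window=15):
    """Return the subdocument with surrounding base text, bracketed in place."""
    text = base_docs[mark.doc_id]
    left = text[max(0, mark.start - window):mark.start]
    right = text[mark.end:mark.end + window]
    return f"{left}[{text[mark.start:mark.end]}]{right}"


# Base (existing) information stays untouched; the mark only points into it.
base_docs = {"darter-notes": "The first dorsal fin has an orange edge and dark spots."}
start = base_docs["darter-notes"].index("orange edge")
item = SuperimposedItem(
    mark=Mark("darter-notes", start, start + len("orange edge")),
    note="key trait for identification",
)

print(resolve(item.mark, base_docs))
print(view_in_context(item.mark, base_docs))
```

The design point the sketch tries to capture is that the superimposed layer stores only an address plus new information, so the base document is never copied or modified.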

Page 8

Hypothesis

A digital library with superimposed information (SI-DL) provides enhanced support to scholarly tasks that involve working with subdocuments

SI + DL → provides enhanced support to → scholarly tasks with subdocuments

Page 9

Research questions

An SI-DL

RQ1 – How can we model data and services in an SI-DL? How can we integrate SI into DLs? Can we formally define an SI-DL?

RQ4 – Can we prototype an SI-DL to support working with subimages in fish identification?

Subdocument use in scholarly tasks

RQ2 – What constitutes a subimage and SI in fish identification?

RQ3 – What are the strategies/contexts of working with subimages in fish identification?

SI-DL support of scholarly tasks that involve working with subdocuments

RQ5 – How does an SI-DL support fish identification and how does that compare with traditional methods of fish identification?

RQ6 – How does an SI-DL support working with subimages in fish identification?

Page 10

Research approach

Page 11

Research approach - theory

Page 12

Research approach - practical/user

Page 13

Review of work done and results

• SIMPEL
• Enhanced CMapTools

• Case studies of SI use (music & fisheries)
• Pilot user studies

SuperIDR-v1

SI-DL metamodel – v1
• 5S analysis of SI
• Initial metamodel

SuperIDR – formative evaluation
• SuperIDR improvements
• Classroom study design

SuperIDR – classroom study
• Fish id. – learning and identification
• SuperIDR improves on traditional methods
• Insufficient data on how SuperIDR was used

SI-DL metamodel – v2
• Improved metamodel
• Case study

SuperIDR – qualitative study

SI-DL design guidelines

SuperIDR-v2
• Lucene indexing
• Usability improvements

SuperIDR-v3
• Image mgmt.
• Combined search
• Comparison
• Usability improvements

Page 21

Subimage and SuperIDR use – a qualitative study

How do people use subimages in fish identification and how does SuperIDR support that use?

Characteristics of subimages and related information

Contexts and strategies of working with subimages in fish identification

SuperIDR support for working with subimages in fish identification

Page 22

Rationale: Maximize Use of SuperIDR

• Recruit people with interest in fish ID

• Have a longer duration of use in natural setting and in targeted tasks

• Have them use SuperIDR on their own (data on use in the wild) and in targeted tasks (opportunity to observe use)

• Collect qualitative data, in multiple ways and from multiple sources, on subimage and SuperIDR use in fish ID

Page 23

Study setup

Page 24

Study procedures

Data collected

• Interview responses
• Diary entries
• Log data of SuperIDR use
• Screen captures of task execution
• Spoken thoughts during task execution
• Species id materials
• Database image
• Species id responses

Page 25

Participants: 3 groups

P2 (male), P5 (female), P6 (male): Relatively less experienced, undergraduates (UG) or recent UG

P1 (male), P5 (female): Moderately experienced Master’s students, working on theses and/or teaching/research

P3 (male), P4 (female): Highly experienced PhD students, working on research projects

Analyzed participants based on fisheries and fish identification experience, current projects and fish identification practices

Page 26

Subimage/annotation characteristics

940 subimages and annotations, most focusing on a part of the fish (image)

Page 27

morphological description, size, color, presence, counts, location

Location

Color

Page 28

Co-presence, morphological comparisons, multiple parts description, connections/relationships, comparison with other information-objects

Connections/relationships

Comparison with other information objects

Page 29

information object as a whole, combination of types

Information object as a whole

Combination of color and count

Page 30

Strategies and contexts that suggest subimage use in fish identification

• In learning methods

• In identification (top-down approach, compare similar species)

• To help identify fishes quickly (identify in field versus the lab or the classroom)

• In fishes of the same species (to deal with variability in appearance)

• To verify species using manual inspection

Page 31

Subimage use in SuperIDR

• Marking and annotating subimages (940 subimages and annotations)

• Browsing through subimages in species description, subimages in comparison, subimages in search results

• Text, image, and combined search, complex objects as queries
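As a rough illustration of what a combined search could look like, the sketch below fuses a text-retrieval score and an image-similarity (CBIR) score per species into one ranking. The slides do not say how SuperIDR combines the two, so the weighted sum, the function name `combined_search`, the `text_weight` parameter, and the placeholder species names and scores are all assumptions made for this example.

```python
def combined_search(text_scores, image_scores, text_weight=0.5):
    """Rank items by a weighted sum of text and image similarity scores.

    Both score dictionaries map an item name to a score assumed to lie in [0, 1].
    An item missing from one of them contributes 0 for that modality.
    """
    items = set(text_scores) | set(image_scores)
    fused = {
        item: text_weight * text_scores.get(item, 0.0)
        + (1.0 - text_weight) * image_scores.get(item, 0.0)
        for item in items
    }
    # Highest fused score first; ties are broken alphabetically for determinism.
    return sorted(fused, key=lambda item: (-fused[item], item))


# Placeholder scores, not study data.
text_scores = {"species A": 0.9, "species B": 0.4}
image_scores = {"species B": 0.8, "species C": 0.7}

ranking = combined_search(text_scores, image_scores)
print(ranking)
```

With equal weights, an item that matches moderately well on both text and image can outrank one that matches strongly on text alone, which is one possible motivation for combining the two.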

Page 32

Subimages in species learning methods

Page 33

Manually inspecting subimages while comparing similar species

Page 34

Complex object as a query

Page 35

“It [SuperIDR] is pulling together different ways of getting to information ... So, not only do I have a taxonomy [and] dichotomous key, but it is also supported by images, many images that I have loaded in myself, that I can compare and contrast right there in the program [SuperIDR]. I can annotate the images, so I know that I kind of looking somewhat into their future [use]. And it kind of just pulls all those tools together, more so than [pulling together] information. It gives me many ways of accessing the same information. The more ways you can come to that information, the better [it is]. Because it is always going to make you more confident about the decision that you are making." [P1 interview]

“It depends on how distinct that species [is] and how many other species are similar to that species, I guess … I would never trust the result, I guess, 100% … you know, based on just one picture and a little bit of written text. I would always want to pull up other species that are somewhat similar and just do a visual inspection myself to be sure that it just was not some bad [query] image that I used or a bad search term." [P3 interview]

“... It would not work if you said that this fish has dark spots. You know you get hundreds of species with dark spots. But, if you got down to a few species and you need to know how many they have ..." [P1 interview]

Context

Manually working with information

SI-DL

Page 36

Guidelines for design of an SI-DL

Theory
– Subdocument: Subdocument
– Context: Address, view in context
– Multiple ways: Apply DL services on SI
– Manual and automatic: Apply DL services on SI

Tools/applications
– Subdocument: Subimages in SuperIDR, marks in SIMPEL and CMapTools
– Context: Marked region, view in context
– Multiple ways: Text retrieval, CBIR, combined search, browse, comparison
– Manual and automatic: Manual: browse, compare; automatic: CBIR

User studies
– Subdocument: Snippets of music, parts of fishes
– Context: Context of musical composition, whole fish/species/family; co-presence of parts
– Multiple ways: Visual marking and notes on fishes, search, browse, compare fish information (incl. subimages)
– Manual and automatic: Manual: browse, compare; automatic: annotation, CBIR, ontology for similar words

Page 37

Conclusions

• Working with subdocuments is important and necessary in many scholarly tasks

• An SI-DL provides enhanced support to such scholarly tasks

– Treating subdocuments as first-class objects facilitates management, access, retrieval, and use of subdocuments and associated information

Contributions

• Superimposed applications

• SI-DL definition (metamodel) and prototype (SuperIDR)

• Findings from user studies on use of SI in scholarly tasks

– Insights about subimage use in species identification

• Guidelines for SI-DL design

• Datasets (images, subimages, annotations)*

Page 38

Future work

• Improved CBIR of subimages and improved combined search (e.g. transfer learning)

• Leverage existing collections to study applicability in other domains

• Crowdsourcing social media to study SI use in a social network context and the

• Participatory SI-DL, when personal and institutional DLs come together

• Comparison of various forms and functions of subdocuments and associated

Page 39

Contributions and publications (by area: Theory, Tools/Applications, User studies)

SI-DL (RQ1, RQ4)
– Theory: SI-DL metamodel [5, 8, 9, 15]
– Tools/Applications: SIMPEL, Enhanced CMapTools, SuperIDR [1, 2, 3, 4, 5, 6, 9, 10, 11, 12, 13]

Use of subdocuments in scholarly tasks (RQ2, RQ3)
– User studies: Qualitative study [13]

SI-DL support of scholarly tasks with subdocuments (RQ5, RQ6)
– Tools/Applications: SIMPEL, Enhanced CMapTools, SuperIDR [1, 2, 3, 4, 6, 7, 10, 11, 12, 13, 14]
– User studies: Case studies, pilot user studies, SuperIDR formative and classroom evaluation, SuperIDR qualitative study [1, 2, 3, 4, 8, 14]

Page 40

Publications related to this research

Published

1. SuperIDR: A Tablet PC Tool for Image Description and Retrieval (WIPTE, 2010)
2. A Teaching Tool for Parasitology: Enhancing Learning with Annotation and Image Retrieval (ECDL, 2010)
3. Superimposed image description and retrieval for fish species identification (ECDL, 2009)
4. Species identification: fish images with CBIR and annotations (JCDL poster, 2009)
5. Superimposed information architecture for digital libraries (ECDL, 2008)
6. From concepts to implementation and visualization: tools from a team-based approach to IR (SIGIR demo, 2008)
7. Further development of a digital library curriculum: Evaluation approaches and new tools (ICADL, 2007)
8. A superimposed information-supported digital library (JCDL doctoral consortium, 2007)
9. Extending the 5S digital library (DL) framework: From a minimal DL towards a DL reference model (DLF workshop, JCDL, 2007)
10. Enhancing concept mapping tools below and above to facilitate the use of superimposed information (CMC, 2006)
11. Sierra - a superimposed application for enhanced image description and retrieval (ECDL demo, 2006)
12. Using superimposed and context information to find and re-find sub-documents (PIM, 2006)
13. SIMPEL: a superimposed multimedia presentation editor and player (JCDL demo, 2006)

Planned

14. A qualitative study on the use of subimages and of SuperIDR – a prototype digital library with superimposed information – in fish species identification (JCDL, 2011)
15. Extending the 5S framework to provide support for CBIR, complex objects, and superimposed information (journal paper)

Page 41

Other published work

• Pedagogical Enhancements to a Course on Information Retrieval (TLIR, 2011)

• Sustainability of Bits, not just Atoms (CHI sustainability workshop, 2010)

• Using an iPhone Application for Diversity Recruitment (ASEE-SE, 2009)

• Building an ontology for crisis, tragedy and recovery (NKOS, 2009)

• Curatorial Work and Learning in Virtual Environments: A Virtual World Project to Support the NDIIPP Community (JCDL Digital Preservation workshop, 2009)

• A Methodology and Tool Suite for Evaluation of Accuracy of Interoperating Statistical Natural Language Processing Engines (Interspeech, 2008)

• VizBlog: a discovery tool for the blogosphere (DigGov, 2007)

• Re-finding from a Human Information Processing Perspective (PIM, 2006)

Page 42

Thank you

?


Page 43

Backup slides

Page 44

Photo attributions (Flickr)

• A digital library by HacksHaven

• Art History With Chris And Mac 6/9: Manet: Lecture (Mme Manet and Leon) by moonflowerdragon

• Korean music by Homies In Heaven

• Old annotations by Lorianne DiSabato

• Reading Annotation by Rosa Say

Page 45

SuperIDR architecture

Page 46

Species learning methods

Variability in fishes of same species

Page 47

Summary of findings of qualitative study

• 13 types of subimages/annotations from 940 subimages/annotations

• Subimages are important and necessary in fish identification

• Identification proceeds in a top-down way

• Learning uses multiple methods

• Context is important

• Combined search and using a complex object as a query

• SI-DL – bringing together capabilities

Page 48

Morphological comparison

Page 49

Participatory SI-DL [Marchionini, 2010]