Digital Libraries with Superimposed Information
PhD Defense, 28 January 2011
Uma Murthy
Supporting Scholarly Tasks that Involve Fine-grain Information
2
Acknowledgments
• My family
• Dr. Edward Fox, Dr. Manuel Pérez-Quiñones, Dr. Ricardo Torres, Dr. Lois Delcambre, Dr. Naren Ramakrishnan, Dr. Eric Hallerman, Lin Tzy Li, Dr. Marcos Goncalves, Yinlin Chen, Nadia Kozievitch, Evandro Ramos, Tiago Falcao, Kapil Ahuja, Dr. John Pitrelli, Dr. Ganesh Ramaswamy, Dr. Andrea Kavanaugh, Dr. Lillian Cassel, Dr. Deborah Tatar, Dr. Donald Orth, Seungwon Yang, Lokeya Venkatachalam, Seonho Kim, Doug Gorton, Ricardo Quintana-Castillo, Monika Akbar, Dave Archer, Susan Price, Rao Shen, Srinivas Vemuri, Xiaoyan Yang, Yonca Haciahmetoglu, Pardha Pyla, Manas Tungare, Sameer Ahuja, Ben Hanrahan, Laurian Vega, Stacy Branham, Tejinder Judge, Rhonda Phillips, Ramya Ravichandar, Hari Pyla, Manjula Iyer, Dr. Noel Greis, Dr. Jack Olin, Venkat Srinivasan, …
• NSF grants (Superimposed information, Digital Government, DL curriculum, CTR, ECDL), Microsoft tablet PC grant, CS department, and Graduate school
3
Motivation: many scholarly tasks involve working with subdocuments
4
Problems
• Information is heterogeneous, voluminous, and distributed across locations, which makes it challenging to manage, organize, access, retrieve, and use.
• Tools and methods (both paper-based and digital) are not well integrated.
Result: ineffective and inefficient task execution
5
A digital library = repository of collections and metadata + services
6
Scenario
"Find me species that are darters that have a dorsal fin that looks like this, which is connected to another dorsal fin that looks like this, which might have an orange hue on its edge."
Search for subdocuments in the context of other information, including other subdocuments
Use them in another task or context
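The scenario above combines a taxonomic constraint (darters), two visual constraints on subimages (the two dorsal fins), a relationship between them (connected), and a soft color hint (an orange edge that "might" be there). A minimal Python sketch of such a query, assuming each species record carries marked subimages with a visual descriptor, a free-text annotation, and a list of connected subimage pairs. All names (Subimage, Species, find_species), the descriptor, and the 0.9/0.1 scoring weights are hypothetical illustrations, not SuperIDR's implementation.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Subimage:
    part: str                 # e.g. "dorsal fin"
    feature: list[float]      # hypothetical visual descriptor of the marked region
    annotation: str = ""      # free-text note attached to the subimage


@dataclass
class Species:
    name: str
    group: str                # e.g. "darter"
    subimages: list[Subimage]
    # (i, j) means subimage i is connected to subimage j
    connections: set[tuple[int, int]] = field(default_factory=set)


def similarity(a: list[float], b: list[float]) -> float:
    """Map Euclidean distance to a score in (0, 1]; 1.0 means identical descriptors."""
    return 1.0 / (1.0 + math.dist(a, b))


def find_species(collection: list[Species], group: str, query_a: list[float],
                 query_b: list[float], hint: str = "", part: str = "dorsal fin",
                 threshold: float = 0.5) -> list[tuple[str, float]]:
    """Rank species in `group` having a `part` like `query_a` connected to one like `query_b`.

    The optional `hint` (e.g. "orange") only adds a small bonus, mirroring the
    "might have" in the scenario rather than acting as a hard filter.
    """
    ranked = []
    for sp in collection:
        if sp.group != group:                      # taxonomic constraint
            continue
        best = 0.0
        for i, j in sp.connections:                # relationship constraint
            first, second = sp.subimages[i], sp.subimages[j]
            if first.part != part or second.part != part:
                continue
            visual = min(similarity(first.feature, query_a),
                         similarity(second.feature, query_b))
            bonus = 1.0 if hint and hint in second.annotation.lower() else 0.0
            best = max(best, 0.9 * visual + 0.1 * bonus)
        if best >= threshold:
            ranked.append((sp.name, round(best, 3)))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    species_a = Species("species_a", "darter",
                        [Subimage("dorsal fin", [0.9, 0.1]),
                         Subimage("dorsal fin", [0.2, 0.8], "Orange edge")], {(0, 1)})
    species_b = Species("species_b", "darter",
                        [Subimage("dorsal fin", [0.9, 0.1]),
                         Subimage("dorsal fin", [0.5, 0.5])], {(0, 1)})
    species_c = Species("species_c", "minnow", [Subimage("dorsal fin", [0.9, 0.1])])
    print(find_species([species_a, species_b, species_c], "darter",
                       [0.9, 0.1], [0.2, 0.8], hint="orange"))
    # prints [('species_a', 1.0), ('species_b', 0.632)]
```

Treating the color hint as a bonus rather than a filter keeps species whose annotations simply do not mention the edge color, which matches the hedged wording of the request.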
7
Superimposed information enables working with contextualized subdocuments
base (existing) information
superimposed (new) information
marks
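The three layers named in the figure (base information, superimposed information, and the marks that connect them) can be sketched as a small data model. This Python sketch assumes that a mark is an address consisting of a base object identifier plus a rectangular region, that superimposed items hold notes referencing marks, and that the base layer is a simple lookup table; the class and function names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Mark:
    """Address of a subdocument inside a base (existing) information object."""
    base_id: str                        # identifier of the base object, e.g. an image
    region: tuple[int, int, int, int]   # hypothetical (x, y, width, height) of the marked area


@dataclass
class SuperimposedItem:
    """New information layered on top of the base layer; the base is never modified."""
    note: str
    marks: list[Mark] = field(default_factory=list)


def view_in_context(mark: Mark, base_layer: dict[str, str]) -> str:
    """Resolve a mark back to its base object so the subdocument can be shown in context."""
    x, y, w, h = mark.region
    return f"{base_layer[mark.base_id]}: region ({x}, {y}) size {w}x{h}"


if __name__ == "__main__":
    base_layer = {"img-001": "darter_photo.jpg"}
    fin = Mark("img-001", (120, 40, 60, 30))
    note = SuperimposedItem("second dorsal fin, orange edge", [fin])
    print(view_in_context(note.marks[0], base_layer))
    # prints darter_photo.jpg: region (120, 40) size 60x30
```

Keeping the mark as a separate, immutable address is what lets the same subdocument be referenced from several superimposed items while the base image stays untouched.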
8
Hypothesis
A digital library with superimposed information (SI-DL) provides enhanced support to scholarly tasks that involve working with subdocuments
(Figure: SI-DL → provides enhanced support to → scholarly tasks with subdocuments)
9
Research questions
An SI-DL
• RQ1: How can we model data and services in an SI-DL? How can we integrate SI into DLs? Can we formally define an SI-DL?
• RQ4: Can we prototype an SI-DL to support working with subimages in fish identification?
Subdocument use in scholarly tasks
• RQ2: What constitutes a subimage and SI in fish identification?
• RQ3: What are the strategies and contexts of working with subimages in fish identification?
SI-DL support of scholarly tasks that involve working with subdocuments
• RQ5: How does an SI-DL support fish identification, and how does that compare with traditional methods of fish identification?
• RQ6: How does an SI-DL support working with subimages in fish identification?
10
Research approach
11
Research approach - theory
12
Research approach - practical/user
13
Review of work done and results
• SIMPEL
• Enhanced CMapTools
• Case studies of SI use (music & fisheries)
• Pilot user studies
SuperIDR-v1
SI-DL metamodel – v1
• 5S analysis of SI
• Initial metamodel
SuperIDR – formative evaluation
• SuperIDR improvements
• Classroom study design
SuperIDR – classroom study
• Fish identification: learning and identification
• SuperIDR improves on traditional methods
• Insufficient data on how SuperIDR was used
SI-DL metamodel – v2
• Improved metamodel
• Case study
SuperIDR – qualitative study
SI-DL design guidelines
SuperIDR-v2
• Lucene indexing
• Usability improvements
SuperIDR-v3
• Image management
• Combined search
• Comparison
• Usability improvements
21
Subimage and SuperIDR use – a qualitative study
How do people use subimages in fish identification and how does SuperIDR support that use?
Characteristics of subimages and related information
Contexts and strategies of working with subimages in fish identification
SuperIDR support for working with subimages in fish identification
22
Rationale: Maximize Use of SuperIDR
• Recruit people with an interest in fish identification
• Allow a longer duration of use, both in a natural setting and in targeted tasks
• Have participants use SuperIDR on their own (data on use in the wild) and in targeted tasks (opportunity to observe use)
• Collect qualitative data, in multiple ways and from multiple sources, on subimage and SuperIDR use in fish identification
23
Study setup
24
Study procedures
Data collected
• Interview responses
• Diary entries
• Log data of SuperIDR use
• Screen captures of task execution
• Spoken thoughts during task execution
• Species identification materials
• Database images
• Species identification responses
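Log data of SuperIDR use can be summarized per participant before it is triangulated with interviews, diaries, and screen captures. A minimal Python sketch, assuming a hypothetical one-event-per-line log of the form participant,timestamp,action; the actual SuperIDR log format and action names are not given in this talk, so both are illustrative only.

```python
from collections import Counter, defaultdict


def summarize_log(lines):
    """Count actions per participant.

    Assumes a hypothetical log format: participant,timestamp,action
    (the real SuperIDR log format may differ).
    """
    per_participant = defaultdict(Counter)
    for line in lines:
        line = line.strip()
        if not line:
            continue                                  # skip blank lines
        participant, _timestamp, action = line.split(",", 2)
        per_participant[participant][action] += 1
    return per_participant


if __name__ == "__main__":
    log = [
        "P1,t1,annotate_subimage",
        "P1,t2,combined_search",
        "P1,t3,annotate_subimage",
        "P3,t4,compare_species",
    ]
    summary = summarize_log(log)
    print(summary["P1"].most_common(1))   # prints [('annotate_subimage', 2)]
```

Counts like these only say what was done and how often; the interviews and think-aloud data are what explain why, which is the motivation for collecting several kinds of data.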
25
Participants: 3 groups
P2 (male), P5 (female), P6 (male): relatively less experienced undergraduates (UG) or recent UG graduates
P1 (male), P5 (female): moderately experienced Master's students working on theses and/or teaching/research
P3 (male), P4 (female): highly experienced PhD students working on research projects
Participants were grouped based on fisheries and fish identification experience, current projects, and fish identification practices
26
Subimage/annotation characteristics
940 subimages and annotations, most focusing on a part of the fish (image)
27
morphological description, size, color, presence, counts, location
Location
Color
28
Co-presence, morphological comparisons, multiple parts description, connections/relationships, comparison with other information objects
Connections/relationships
Comparison with other information objects
29
information object as a whole, combination of types
Information object as a whole
Combination of color and count
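The three slides above list the subimage/annotation types; together they add up to the 13 types reported in the summary of findings. A minimal Python sketch of that coding scheme as an enumeration, with a tally over coded annotations. The rule that an annotation carrying two or more codes (e.g. color and count) is also counted as a combination is an assumption made for illustration, as are the function name and the sample data.

```python
from collections import Counter
from enum import Enum


class AnnotationType(Enum):
    """The 13 subimage/annotation types from the qualitative study."""
    MORPHOLOGICAL_DESCRIPTION = "morphological description"
    SIZE = "size"
    COLOR = "color"
    PRESENCE = "presence"
    COUNTS = "counts"
    LOCATION = "location"
    CO_PRESENCE = "co-presence"
    MORPHOLOGICAL_COMPARISON = "morphological comparison"
    MULTIPLE_PARTS = "multiple parts description"
    CONNECTIONS = "connections/relationships"
    OTHER_OBJECT_COMPARISON = "comparison with other information objects"
    WHOLE_OBJECT = "information object as a whole"
    COMBINATION = "combination of types"


def tally(coded_annotations):
    """Count codes over annotations, each coded with a set of AnnotationType values.

    Assumption: an annotation carrying two or more codes is also counted as COMBINATION.
    """
    counts = Counter()
    for codes in coded_annotations:
        counts.update(codes)
        if len(codes) > 1:
            counts[AnnotationType.COMBINATION] += 1
    return counts


if __name__ == "__main__":
    sample = [{AnnotationType.COLOR, AnnotationType.COUNTS}, {AnnotationType.LOCATION}]
    print(tally(sample)[AnnotationType.COMBINATION])   # prints 1
```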
30
Strategies and contexts that suggest subimage use in fish identification
• In learning methods
• In identification (top-down approach, comparing similar species)
• To help identify fishes quickly (identifying in the field versus in the lab or the classroom)
• In fishes of the same species (to deal with variability in appearance)
• To verify species through manual inspection
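A dichotomous key, one of the tools P1 mentions alongside taxonomy and images, is a common way to proceed top-down: each couplet asks a yes/no question about a character and narrows the candidates. A minimal Python sketch of walking such a key, assuming characters are stored in a dictionary per specimen; the key, the characters, and the taxon names are hypothetical. In an SI-DL, each couplet could additionally link to a subimage that illustrates its character, which is a design suggestion rather than a description of SuperIDR.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Couplet:
    """One step of a dichotomous key: a yes/no question about a character."""
    question: str
    test: Callable[[dict], bool]
    yes: Node
    no: Node


Node = Union[Couplet, str]   # a leaf is the name of a taxon


def identify(node: Node, specimen: dict) -> tuple[str, list[tuple[str, bool]]]:
    """Walk the key top-down and return the taxon plus the questions answered."""
    path = []
    while isinstance(node, Couplet):
        answer = node.test(specimen)
        path.append((node.question, answer))
        node = node.yes if answer else node.no
    return node, path


if __name__ == "__main__":
    key = Couplet("Two separate dorsal fins?", lambda s: s["dorsal_fins"] == 2,
                  Couplet("Orange edge on a dorsal fin?", lambda s: s["orange_edge"],
                          "taxon_x", "taxon_y"),
                  "taxon_z")
    taxon, path = identify(key, {"dorsal_fins": 2, "orange_edge": True})
    print(taxon)   # prints taxon_x
```

Returning the path of answered questions keeps the reasoning inspectable, which supports the manual verification step participants described.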
31
Subimage use in SuperIDR
• Marking and annotating subimages (940 subimages and annotations)
• Browsing subimages in species descriptions, in comparisons, and in search results
• Text, image, and combined search; complex objects as queries
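Combined search merges a text ranking (SuperIDR-v2 added Lucene indexing) with a content-based image ranking. One simple way to do this, shown here as an assumption rather than SuperIDR's actual formula, is min-max normalization of each score list followed by a weighted sum (linear score fusion) with a weight `alpha`. A complex object used as a query could supply both parts: its annotations as the text query and its image or subimages as the image query. The function names and document identifiers are hypothetical.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so text and image scores become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combined_search(text_scores: dict[str, float], image_scores: dict[str, float],
                    alpha: float = 0.5) -> list[str]:
    """Fuse text and image (CBIR) results with a weighted sum; a missing score counts as 0."""
    text, image = min_max(text_scores), min_max(image_scores)
    fused = {doc: alpha * text.get(doc, 0.0) + (1 - alpha) * image.get(doc, 0.0)
             for doc in set(text) | set(image)}
    # Sort by descending fused score, breaking ties by document id for determinism.
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


if __name__ == "__main__":
    text_hits = {"a": 2.0, "b": 1.5, "c": 1.0}     # e.g. from a text index
    image_hits = {"b": 0.9, "c": 0.5, "d": 0.1}    # e.g. from CBIR
    print(combined_search(text_hits, image_hits))  # prints ['b', 'a', 'c', 'd']
```

Document "b" ranks first because it scores well on both sides, even though it is not the top text hit; that is the behavior one would want when a text term such as "dark spots" matches many species and the image narrows them down.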
32
Subimages in species learning methods
33
Manually inspecting subimages while comparing similar species
Complex object as a query
35
“It [SuperIDR] is pulling together different ways of getting to information ... So, not only do I have a taxonomy [and] dichotomous key, but it is also supported by images, many images that I have loaded in myself, that I can compare and contrast right there in the program [SuperIDR]. I can annotate the images, so I know that I kind of looking somewhat into their future [use]. And it kind of just pulls all those tools together, more so than [pulling together] information. It gives me many ways of accessing the same information. The more ways you can come to that information, the better [it is]. Because it is always going to make you more confident about the decision that you are making." [P1 interview]
“It depends on how distinct that species [is] and how many other species are similar to that species, I guess … I would never trust the result, I guess, 100% …you know, based on just one picture and a little bit of written text. I would always want to pull up other species that are somewhat similar and just do a visual inspection myself to be sure that it just was not some bad [query] image that I used or a bad search term." [P3 interview]
“... It would not work if you said that this fish has dark spots. You know you get hundreds of species with dark spots. But, if you got down to a few species and you need to know how many they have ..." [P1 interview]
Context
Manually working with information
SI-DL
36
Guidelines for design of an SI-DL
Derived from | Subdocument | Context | Multiple ways | Manual and automatic
Theory | Subdocument | Address, view in context | Apply DL services on SI | Apply DL services on SI
Tools/applications | Subimages in SuperIDR; marks in SIMPEL and CMapTools | Marked region, view in context | Text retrieval, CBIR, combined search, browse, comparison | Manual: browse, compare; automatic: CBIR
User studies | Snippets of music; parts of fishes | Context of the musical composition or of the whole fish/species/family; co-presence of parts | Visual marking of and notes on fishes; search, browse, and compare fish information (including subimages) | Manual: browse, compare; automatic: annotation, CBIR, ontology for similar words
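The automatic column relies on content-based image retrieval (CBIR) over marked regions. A deliberately simple Python sketch, assuming images are stored as rows of (r, g, b) tuples and using a joint color histogram with histogram intersection as the descriptor and similarity measure; SuperIDR's actual CBIR descriptors are not specified here, and the function names are hypothetical.

```python
def crop(image, x, y, w, h):
    """Extract a marked rectangular region from an image stored as rows of (r, g, b) pixels."""
    return [row[x:x + w] for row in image[y:y + h]]


def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram with `bins` levels per channel (bins**3 cells)."""
    hist = [0.0] * bins ** 3
    total = 0
    for row in pixels:
        for r, g, b in row:
            # Quantize each 0-255 channel value into one of `bins` levels.
            qr, qg, qb = r * bins // 256, g * bins // 256, b * bins // 256
            hist[(qr * bins + qg) * bins + qb] += 1
            total += 1
    return [count / total for count in hist] if total else hist


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


if __name__ == "__main__":
    red, blue = (255, 0, 0), (0, 0, 255)
    image = [[red, red, blue],
             [red, red, blue]]
    fin_mark = color_histogram(crop(image, 0, 0, 2, 2))     # the red 2x2 region
    other_mark = color_histogram(crop(image, 2, 0, 1, 2))   # the blue column
    print(intersection(fin_mark, fin_mark), intersection(fin_mark, other_mark))
    # prints 1.0 0.0
```

Computing the descriptor on the cropped mark rather than the whole image is what makes the retrieval subimage-level; a global color histogram ignores shape and texture, so it is only a starting point.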
37
Conclusions
• Working with subdocuments is important and necessary in many scholarly tasks
• An SI-DL provides enhanced support to such scholarly tasks
– Treating subdocuments as first-class objects facilitates management, access, retrieval, and use of subdocuments and associated information
Contributions
• Superimposed applications
• SI-DL definition (metamodel) and prototype (SuperIDR)
• Findings from user studies on the use of SI in scholarly tasks
– Insights about subimage use in species identification
• Guidelines for SI-DL design
• Datasets (images, subimages, annotations)*
38
Future work
• Improved CBIR of subimages and improved combined search (e.g., transfer learning)
• Leverage existing collections to study applicability in other domains
• Crowdsourcing social media to study SI use in a social network context and the
• Participatory SI-DL, when personal and institutional DLs come together
• Comparison of various forms and functions of subdocuments and associated information
39
Contributions and publications
Research area | Theory | Tools/applications | User studies
SI-DL (RQ1, RQ4) | SI-DL metamodel [5, 8, 9, 15] | SIMPEL, Enhanced CMapTools, SuperIDR [1, 2, 3, 4, 5, 6, 9, 10, 11, 12, 13] |
Use of subdocuments in scholarly tasks (RQ2, RQ3) | | | Qualitative study [14]
SI-DL support of scholarly tasks with subdocuments (RQ5, RQ6) | | SIMPEL, Enhanced CMapTools, SuperIDR [1, 2, 3, 4, 6, 7, 10, 11, 12, 13, 14] | Case studies, pilot user studies, SuperIDR formative and classroom evaluations, SuperIDR qualitative study [1, 2, 3, 4, 8, 14]
40
Publications related to this research
Published
1. SuperIDR: A Tablet PC Tool for Image Description and Retrieval (WIPTE, 2010)
2. A Teaching Tool for Parasitology: Enhancing Learning with Annotation and Image Retrieval (ECDL, 2010)
3. Superimposed image description and retrieval for fish species identification (ECDL, 2009)
4. Species identification: fish images with CBIR and annotations (JCDL poster, 2009)
5. Superimposed information architecture for digital libraries (ECDL, 2008)
6. From concepts to implementation and visualization: tools from a team-based approach to IR (SIGIR demo, 2008)
7. Further development of a digital library curriculum: Evaluation approaches and new tools (ICADL, 2007)
8. A superimposed information-supported digital library (JCDL doctoral consortium, 2007)
9. Extending the 5S digital library (DL) framework: From a minimal DL towards a DL reference model (DLF workshop, JCDL, 2007)
10. Enhancing concept mapping tools below and above to facilitate the use of superimposed information (CMC, 2006)
11. Sierra: a superimposed application for enhanced image description and retrieval (ECDL demo, 2006)
12. Using superimposed and context information to find and re-find sub-documents (PIM, 2006)
13. SIMPEL: a superimposed multimedia presentation editor and player (JCDL demo, 2006)
Planned
14. A qualitative study on the use of subimages and of SuperIDR – a prototype digital library with superimposed information – in fish species identification (JCDL, 2011)
15. Extending the 5S framework to provide support for CBIR, complex objects, and superimposed information (journal paper)
41
Other published work
• Pedagogical Enhancements to a Course on Information Retrieval (TLIR, 2011)
• Sustainability of Bits, not just Atoms (CHI sustainability workshop, 2010)
• Using an iPhone Application for Diversity Recruitment (ASEE-SE, 2009)
• Building an ontology for crisis, tragedy and recovery (NKOS, 2009)
• Curatorial Work and Learning in Virtual Environments: A Virtual World Project to Support the NDIIPP Community (JCDL Digital Preservation workshop, 2009)
• A Methodology and Tool Suite for Evaluation of Accuracy of Interoperating Statistical Natural Language Processing Engines (Interspeech, 2008)
• VizBlog: a discovery tool for the blogosphere (DigGov, 2007)
• Re-finding from a Human Information Processing Perspective (PIM, 2006)
42
Thank you
43
Back up slides
Photo attributions (Flickr)
• A digital library by HacksHaven
• Art History With Chris And Mac 6/9: Manet: Lecture (Mme Manet and Leon) by moonflowerdragon
• Korean music by Homies In Heaven
• Old annotations by Lorianne DiSabato
• Reading Annotation by Rosa Say
SuperIDR architecture
46
Species learning methods
Variability in fishes of the same species
47
Summary of findings of qualitative study
• 13 types of subimages/annotations identified among 940 subimages/annotations
• Subimages are important and necessary in fish identification
• Identification proceeds in a top-down way
• Learning uses multiple methods
• Context is important
• Combined search and the use of a complex object as a query
• SI-DL: bringing together capabilities
48
Morphological comparison
49
Participatory SI-DL [Marchionini, 2010]