BESDUI: A Benchmark for
End-User Structured Data User Interfaces
Persistent URI: http://w3id.org/BESDUI
Authors
Roberto García, GRIHO - HCI & Data Integration Research Group, Universitat de Lleida, Spain
Eirik Bakke, Computer Science and Artificial Intelligence Laboratory, MIT, USA
Rosa Gil, GRIHO - HCI & Data Integration Research Group, Universitat de Lleida, Spain
David R. Karger, Computer Science and Artificial Intelligence Laboratory, MIT, USA
Juan Manuel Gimeno, GRIHO - HCI & Data Integration Research Group, Universitat de Lleida, Spain
Motivation
• Inability to reach users traditionally alleged as one of the main barriers for Semantic Web uptake
• No killer app for the Semantic Web?
Desired outcome?
• Client applications should hide the complexities of semantic technologies
• For specific tasks, task-specific user interfaces better satisfy user needs without breaking user experience
Motivation
• Anyway, opportunity for Semantic Web user interfaces: datasets without dedicated user interface
  • New data collections or rarely used
  • Combination of existing datasets
• Provide users power of Web-wide connected data to explore and discover unforeseen connections…
  • Semantic Web killer app?
• Current proposals:
  • Linked Data browsers, Controlled Natural Language query engines, faceted browsers,…
• Difficult to compare from the user perspective
  • What ways of exploring the data do they provide?
  • How efficient are they from a Quality in Use perspective?
Proposal
• Benchmark for comparing user interfaces
  • Set of typical user tasks
  • Procedure for measuring performance per task
  • Low cost and easy to apply, not requiring the intervention of real users
  • For UI tools based on semantic or relational data
• Longer term
  • Trigger a community discussion leading to a framework for comparing, measuring,…
  • …encourage better semantic search/exploration tools
User Tasks
• Criteria:
  • Avoid introducing bias from our a priori conception of the problem or experience developing our own tools
  • Looked outward to find sets of typical end-user tasks related to structured data exploration
  • Applicable both to relational and semantic data
• Somewhere to start:
  • Berlin SPARQL Benchmark (BSBM), Explore Use Case
  • Intended for measuring the computational performance but based on a set of realistic queries inspired by common information needs
User Tasks
1. BSBM-1: Find products for a given set of generic features
2. ADDED: Find products for a given set of alternative features
3. BSBM-2: Retrieve basic information about a specific product for display purposes
4. BSBM-3: Find products having some specific features and not having one feature
5. BSBM-4: Find products matching two different sets of features
6. BSBM-5: Find products that are similar to a given product
7. BSBM-6: Find products having a label name that contains a specific string (some text)
8. BSBM-7: Retrieve in-depth information about a specific product including offers and reviews
9. BSBM-8: Give me recent reviews in English for a specific product
10. BSBM-9: Get information about a reviewer
11. BSBM-10 and BSBM-11 (COMBINED): Get offers for a given product which fulfill specific requirements; get all information about an offer
12. BSBM-12: Export the chosen offer into another information system which uses a different schema
User Tasks
• BESDUI includes for each Task, considering the sample dataset:
  • Information need:
    “List products of type sheeny with product features stroboscopes OR gadgeteers, and a productPropertyNumeric1 greater than 450”
  • Expected output:
    “aliter tiredest”, “auditoriums reducing pappies”, “boozed”, “byplay”, “closely jerries”
User Tasks
• Set of tasks is not closed, work in progress, contributions appreciated
• However, quite complete. References for evaluation:
• Information Seeking Strategies (Belkin et al., 1995)
  • All dimensions covered by the current tasks
  • Method of Interaction: Searching (known item) / Scanning (unknown)
  • Goal of Interaction: Learning / Selecting (for retrieval)
  • Mode of Retrieval: Recognition (by association) / Specification (identified items)
  • Resource Considered: Information / Meta-information
User Tasks
• Frameworks of Information Exploration - Towards the Evaluation of Exploration Systems (Nunes & Schwabe, 2016)
  • Work in progress… but complete for some operations and criteria
  • Boolean Expressivity
    • Conjunction of values, Same Relation and Different Relations:
      Product feature “A” and feature “B”
      Product feature “A” and price “100”
    • Disjunction of values, Same Relation and Different Relations:
      Product feature “A” or feature “B”
      Product feature “A” or price “100”
    • Negation
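The Boolean Expressivity criteria above can be illustrated as plain predicates over a product list. This is only a sketch: the four products, their field names and the `select` helper are hypothetical and are not part of the BSBM dataset or of BESDUI.

```python
# Hypothetical products; labels, features and prices are illustrative only.
PRODUCTS = [
    {"label": "p1", "features": {"A", "B"}, "price": 100},
    {"label": "p2", "features": {"A"}, "price": 250},
    {"label": "p3", "features": {"B"}, "price": 100},
    {"label": "p4", "features": set(), "price": 300},
]


def select(predicate):
    """Return the labels of the products that satisfy the predicate."""
    return [p["label"] for p in PRODUCTS if predicate(p)]


# Conjunction, same relation: feature "A" and feature "B"
conj_same = select(lambda p: {"A", "B"} <= p["features"])
# Conjunction, different relations: feature "A" and price "100"
conj_diff = select(lambda p: "A" in p["features"] and p["price"] == 100)
# Disjunction, same relation: feature "A" or feature "B"
disj_same = select(lambda p: bool({"A", "B"} & p["features"]))
# Disjunction, different relations: feature "A" or price "100"
disj_diff = select(lambda p: "A" in p["features"] or p["price"] == 100)
# Negation: not having feature "A"
negation = select(lambda p: "A" not in p["features"])

print(conj_same, conj_diff, disj_same, disj_diff, negation)
```

A tool covers a criterion when its user interface lets the user build the corresponding filter; the predicates only make explicit what each filter has to express.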
Metrics
• Measure Quality in Use (ISO/IEC 25010:2011)
Metrics
[Figure labels: BESDUI; Alpha Frontal Asymmetry related to Valence (Pleasure)]
• “Method for Improving EEG Based Emotion Recognition…” (López-Gil et al., 2016)
• “Using SWET-QUM to Compare the Quality in Use of Semantic Web Exploration Tools” (González et al., 2013) http://rhizomik.net/swet-qum/
Metrics
• Effectiveness: degree to which users can achieve the tasks with precision and completeness
  • BESDUI Metric:
    Capability: Is performing the task possible with the given system? 0% No – 100% Yes (50% if task has 2 parts)
• Efficiency: degree to which users can achieve tasks investing appropriate amount of resources
  • BESDUI Metrics:
    Operation Count: How many basic steps (mouse clicks, keyboard entry, scrolling) must be performed to carry out the task?
    Time: How quickly can these steps be executed? Map operations to time using Keystroke Level Model (Card et al., 1980)
    Time Efficiency: capability / time, “goals per second” measure
KLM Operator                                             Time (secs.)
K: button press or keystroke                             0.2
P: pointing to a target on a display with a mouse        1.1
H: homing the hand(s) on the keyboard or other device    0.4
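The mapping from operator counts to task time can be sketched as follows, using the three operator times listed above. The function and variable names are assumptions made for illustration; BESDUI itself provides a spreadsheet for this computation.

```python
# KLM operator times in seconds, as listed in the table above.
KLM_SECONDS = {"K": 0.2, "P": 1.1, "H": 0.4}


def operation_count(counts):
    """Total number of basic operations for a task."""
    return sum(counts.values())


def klm_time(counts):
    """Estimated task time in seconds: sum of count * operator time."""
    return sum(KLM_SECONDS[op] * n for op, n in counts.items())


# Operator totals reported for Virtuoso FCT, Task 1.
task = {"K": 28, "P": 18, "H": 5}
print(operation_count(task))     # 51
print(round(klm_time(task), 1))  # 27.4
```

The example reproduces the operation count and time reported for Virtuoso FCT on Task 1: 28 × 0.2 + 18 × 1.1 + 5 × 0.4 = 27.4 seconds.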
Applying BESDUI
1. Anyone, but preferably an experienced tool user, loads the dataset and performs the 12 Tasks
2. For each one, record whether the tool is capable of completing it. If so, detail all interaction steps required
3. Map interaction steps to task time (using provided spreadsheet)
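One possible way to record the outcome of the procedure above is sketched below. The `TaskResult` structure and its field names are hypothetical (BESDUI uses a spreadsheet); the capability values follow the Capability metric: 0% when the task is not possible, 100% when it is, 50% when only one of two parts is possible.

```python
from dataclasses import dataclass, field


@dataclass
class TaskResult:
    """One benchmark task as recorded for one tool (hypothetical structure)."""
    task: int
    capability: float  # 0.0 = not possible, 1.0 = possible, 0.5 = one of two parts
    steps: list = field(default_factory=list)  # (description, operators) pairs


def average_capability(results):
    """Mean capability over the recorded tasks, as a percentage."""
    return 100.0 * sum(r.capability for r in results) / len(results)


# Rhizomer on Tasks 1 and 2; only the first two steps of Task 2 are listed.
rhizomer = [
    TaskResult(task=1, capability=0.0),
    TaskResult(task=2, capability=1.0, steps=[
        ("Click menu ProductType and then Sheeny submenu", "2K, 2P, 1H"),
        ("Click Show values for facet Product Feature", "1K, 1P"),
    ]),
]
print(average_capability(rhizomer))  # 50.0
```

Keeping the steps next to the capability value means the operator counts and times can be derived later from the same record.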
Applying BESDUI
• Task 1:
  “Look for products of type sheeny with product features stroboscopes AND gadgeteers, and a productPropertyNumeric1 greater than 450”
• Tools
  • Rhizomer:
    • Capability: 0%, no support for conjunction of values of the same property
  • Virtuoso FCT (Faceted Browser):
    • Capability: 100%
Virtuoso FCT – Task 1
1. Type “sheeny” and “Enter”, then click “ProductType10”. (9K, 2P, 3H)
2. Click “Go” for “Start New Facet”, then click “Options”. (2K, 2P)
3. For “Inference Rule”, click and select the rules graph, then “Apply”. (2K, 2P)
4. Click “Attributes”, then “productFeature” and “stroboscopes”. (3K, 3P)
5. Click “Attributes”, then “productFeature” and “gadgeteers”. (3K, 3P)
6. Click “Attributes” and “productPropertyNumeric1”. (2K, 2P)
7. Click “Add condition: None” and select “>”. (2K, 2P)
8. Type “450” and click “Set Condition”. (5K, 2P, 2H)
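The per-step operator annotations for this task can be totalled with a small parser. This is a sketch under the assumption that each step is annotated in the form “9K, 2P, 3H”; the function names are hypothetical.

```python
import re
from collections import Counter

# Operator annotations for the eight Virtuoso FCT steps of Task 1.
STEP_OPERATORS = [
    "9K, 2P, 3H", "2K, 2P", "2K, 2P", "3K, 3P",
    "3K, 3P", "2K, 2P", "2K, 2P", "5K, 2P, 2H",
]


def parse_step(text):
    """Turn an annotation such as '9K, 2P, 3H' into operator counts."""
    counts = Counter()
    for number, operator in re.findall(r"(\d+)\s*([KPH])", text):
        counts[operator] += int(number)
    return counts


def total_operators(steps):
    """Sum the operator counts over all steps of a task."""
    total = Counter()
    for step in steps:
        total += parse_step(step)
    return total


totals = total_operators(STEP_OPERATORS)
print(totals["K"], totals["P"], totals["H"])  # 28 18 5
```

The totals match the 28K, 18P, 5H (51 operations) reported for Virtuoso FCT on Task 1.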
Applying BESDUI
• Task 2:
  “Look for products of type sheeny with product features stroboscopes OR gadgeteers, and a productPropertyNumeric1 greater than 450”
• Tools
  • Rhizomer:
    • Capability: 100%
  • Virtuoso FCT:
    • Capability: 100%
Rhizomer – Task 2
1. Click menu “ProductType” and then “Sheeny” submenu. (2K, 2P, 1H)
2. Click “Show values” for facet “Product Feature”. (1K, 1P)
3. Click facet value “stroboscopes”. (1K, 1P)
4. Type in input “Search Product Feature” “gad...” (4K, 1P, 1H)
5. Select “gadgeteers” from autocomplete. (1K, 1P, 1H)
6. Set left side of “Product Property Numeric1” slider to “450”. (1K, 2P)
Results
      Rhizomer                                        Virtuoso FCT
Task  Capability  Operation Count    Time (seconds)  Capability  Operation Count    Time (seconds)
1     0%          -                  -               100%        51 (28K, 18P, 5H)  27.4
2     100%        21 (10K, 8P, 3H)   12.0            100%        53 (29K, 19P, 5H)  28.7
…     …           …                  …               …           …                  …
* BESDUI provides a spreadsheet to compute these metrics
Results
• Currently, BESDUI applied to:
  • Rhizomer: a semantic data exploration tool with facets and pivoting
  • Virtuoso FCT: the faceted browser for the Virtuoso RDF data store
  • Sieuferd: a general-purpose user interface for relational databases
  • PepeSearch: a search interface for querying SPARQL endpoints
Results & Conclusions
• Sieuferd: the most capable but the least performant, with the most complex user interface
• PepeSearch: the least capable but the most performant, with the least complex user interface
• Rhizomer: best effectiveness/efficiency ratio, more “goals per second”
Averages per Tool  Capability  K (0.2s)  P (1.1s)  H (0.4s)  Operator Count  Time  Time Efficiency (Capability/Time)
Rhizomer           58%         15.9      10.9      2.6       29.3            16.1  3.60
Virtuoso FCT       54%         20.4      12.7      3.0       36.1            19.3  2.80
Sieuferd           96%         48.7      19.7      2.9       71.3            32.6  2.94
PepeSearch         25%         10.3      5.3       5.3       21.0            10.1  2.48
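The Time Efficiency column can be reproduced from the average Capability (as a percentage) and the average Time (in seconds). The sketch below uses only the averages reported above; the dictionary layout and function name are assumptions made for illustration.

```python
# (average capability in %, average time in seconds) per tool, from the table above.
AVERAGES = {
    "Rhizomer": (58, 16.1),
    "Virtuoso FCT": (54, 19.3),
    "Sieuferd": (96, 32.6),
    "PepeSearch": (25, 10.1),
}


def time_efficiency(capability_percent, seconds):
    """Capability / Time, the 'goals per second' measure (scaled by 100)."""
    return capability_percent / seconds


# Rank the tools from most to least time-efficient.
ranking = sorted(
    AVERAGES, key=lambda tool: time_efficiency(*AVERAGES[tool]), reverse=True
)
for tool in ranking:
    print(f"{tool}: {time_efficiency(*AVERAGES[tool]):.2f}")
```

With these figures Rhizomer ranks first (3.60), followed by Sieuferd (2.94), Virtuoso FCT (2.80) and PepeSearch (2.48).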
Conclusions
• Importance of benchmarks to drive research in a domain
• Simple benchmark (too much?) but adoption key
• BSBM useful source of tasks and data
  • Synthetic nature results in funny product names like “waterskiing sharpness horseshoes”… but no significant impact (no real users)
• Measure UI without having to involve users
  • Less reliable but cheaper
  • Ideal during early dev stages or to compare tools
Future Work
• Continue tasks review and extend set of user tasks
• Consider additional tools:
  • Direct manipulation (Explorator, Tabulator,…)
  • Interactive Query Building (YASGUI, iSPARQL…)
  • Relational data (Cipher, BrioQuery,…)
  • …
• Improve metrics to consider users' mental effort
  • SPARQL command line best UI from a KLM point of view
  • Considering GOMS, which includes cognitive and perceptual operators
• Compare results with real-user tests
• Available as GitHub repository: http://w3id.org/BESDUI
• Please, FORK and CONTRIBUTE!
Thank you for your attention
Questions?
http://rhizomik.net/~roberto/
BESDUI Persistent URI: http://w3id.org/BESDUI