Making Mashups with Marmite
Jeff Wong, Jason I. Hong
Carnegie Mellon University
The Big Picture Problem
• Lots of content out there on the web
– But not always in a form amenable to your needs
– Ex. Easy to get a list of hotels in San Jose, not so easy to sort by distance to convention center
• Two observations:
– In many cases, all of the data and services people need already exist, but not connected together
– Unlikely that a web site can predict all possible needs
A Solution: Mashups
• Rapidly growing community of users creating “mashups” combining content from multiple web sites
– Ex. Housingmaps.com
– Ex. MySpace child predators
– Ex. Friendster locations
– Ex. Most popular videos on YouTube, Yahoo Video, …
• ProgrammableWeb.com statistics
– ~1500 mashups created since April 2005
– 356 open web-based APIs available
But Creating Mashups is Hard
• Requires lots of skill to create a mashup
– Ex. Housingmaps creator has PhD in computer science
– Ex. MySpace child predator list took months
• Requires programming expertise in many areas
– Web crawling
– Text parsing
– Pattern matching
– Databases
– HTML
Marmite: End-User Programming for Mashups
• Main idea: make it easy to create web mashups
• Use a dataflow approach connecting small operators
– Inspired by Unix pipes and Apple’s Automator
• Example:
– Get all events from Upcoming.org
– Filter out events that are too old
– Put them all onto a map
• Runs inside of a standard web browser
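The example flow above can be sketched in a few lines of Python. This is only an illustration of the dataflow idea, not Marmite's implementation (which runs in the browser). The function names and the sample events are assumptions made up for the sketch, and `get_events` returns fixed rows instead of querying Upcoming.org.

```python
from datetime import date

def get_events():
    """Source (hypothetical): stands in for a query to an events site."""
    return [
        {"name": "Jazz night", "date": date(2007, 4, 20),
         "address": "5000 Forbes Ave, Pittsburgh, PA"},
        {"name": "Old meetup", "date": date(2007, 1, 5),
         "address": "4400 Fifth Ave, Pittsburgh, PA"},
    ]

def drop_old(rows, today):
    """Processor: keep only events on or after `today`."""
    return [r for r in rows if r["date"] >= today]

def to_map_markers(rows):
    """Sink: build marker labels (a real sink would call a map service)."""
    return [f'{r["name"]} @ {r["address"]}' for r in rows]

def run_flow(source, processors, sink):
    """Pass rows from the source through each processor in order, then to the sink."""
    rows = source()
    for process in processors:
        rows = process(rows)
    return sink(rows)

markers = run_flow(
    get_events,
    [lambda rows: drop_old(rows, date(2007, 4, 1))],
    to_map_markers,
)
print(markers)
```

Each step only consumes and produces rows, which is what lets small operators be chained like Unix pipes.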
[Screenshot labels: Set of Operators, Data Flow View, Data View]
Using Marmite (Envisioned)
• Extract content from one or more web pages
– names, addresses, dates, phone #, URLs
• Process it in a data flow manner
– filtering out values or adding metadata
– integrating with other data sources (similar to a database join operation)
• Direct the output to a variety of sinks
– databases, map services, text files, visualizations, web pages, or source code that can be further edited
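The "similar to a database join" step can be illustrated with a small inner join over rows. This is a sketch under assumptions: `join_rows`, the column names, and the sample data are invented for illustration and are not taken from Marmite.

```python
def join_rows(left, right, key):
    """Inner join of two lists of dicts on `key`.

    Columns from the matching right-hand row are added to the left-hand row.
    """
    index = {}
    for right_row in right:
        index.setdefault(right_row[key], []).append(right_row)

    joined = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            joined.append({**left_row, **right_row})
    return joined

# Rows extracted from one page (events) and from another source (venues).
events = [
    {"venue": "Club Cafe", "event": "Open mic"},
    {"venue": "Unknown Hall", "event": "Lecture"},
]
venues = [{"venue": "Club Cafe", "address": "56 S 12th St, Pittsburgh, PA"}]

result = join_rows(events, venues, "venue")
print(result)
```

Rows without a match in the second source are dropped here; keeping them with empty columns (an outer join) would be an equally reasonable choice.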
Marmite
• Motivation and Examples
• Features and Design Rationale
• User Evaluation
Features and Design Rationale
• Conducted a series of quick evaluations to understand design space and potential problems
– Automator
– Lo-fi prototypes
Automator
Informal Automator Evaluation
• Had three novices try three simple web-based tasks
– Warm-up task
– Traverse a set of web pages
– Download a set of images
• Some findings:
– Some difficulties knowing how to start and what to do next
– Little feedback about state of system between operations
– Difficult to iterate due to network speed issues
Lo-Fi Prototypes
• 6 paper prototypes with 20 participants
Design Solutions
• Problem: how to start and what to do next
• Solution: suggest next actions
– Weak data typing to find types (addresses, numbers, etc)
– Filter operators to only show relevant ones
– Suggest operators that might be applicable
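One way to picture weak data typing driving suggestions is the sketch below. The regular expressions, the 50% threshold, and the operator catalog are all assumptions chosen for illustration; the slides do not describe Marmite's actual rules.

```python
import re

# Assumed patterns; deliberately loose ("weak" typing).
PATTERNS = {
    "number": re.compile(r"^-?\d+(\.\d+)?$"),
    "url": re.compile(r"^https?://\S+$"),
    "address": re.compile(r"^\d+\s+\w+.*,\s*\w+"),
}

def guess_type(values):
    """Return the first type matching more than half of the values, else 'text'."""
    for name, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.match(v.strip()))
        if values and hits / len(values) > 0.5:
            return name
    return "text"

# Hypothetical operator catalog: operator name -> input type it needs.
OPERATORS = {"Geocode": "address", "Fetch page": "url", "Filter by range": "number"}

def suggest_operators(columns):
    """List operators whose input type appears among the guessed column types."""
    types = {col: guess_type(vals) for col, vals in columns.items()}
    return sorted(op for op, needed in OPERATORS.items() if needed in types.values())

columns = {
    "where": ["5000 Forbes Ave, Pittsburgh", "4400 Fifth Ave, Pittsburgh"],
    "price": ["12", "7.50"],
}
print(suggest_operators(columns))
```

With these columns the URL-based operator is filtered out because no column looks like a URL.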
Design Solutions
• Problem: little feedback about state of system between operations
• Solution: link data flow and data view together
– Many systems take program-centric view (ex. Automator) or data-centric view (ex. spreadsheets)
– Use hybrid data flow / data view, showing an operation and its effects together
– Data view usually “spreadsheet”, other views possible too (for example, maps)
Design Solutions
• Problem: difficult to iterate due to network speeds
• Solution: cache data, let people “replay” data
– Reload, pause, play
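The caching idea can be sketched as a wrapper that fetches once and replays the stored rows afterwards. `CachedOperator` and `slow_fetch` are hypothetical names, and the sketch covers only reload and replay, not pause.

```python
class CachedOperator:
    """Wraps a slow fetch function: `run` replays cached rows, `reload` refetches."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._cache = None
        self.fetch_count = 0  # how many times the slow fetch actually ran

    def reload(self):
        """Discard the cache and fetch fresh rows."""
        self._cache = list(self._fetch())
        self.fetch_count += 1
        return self._cache

    def run(self):
        """Return cached rows, fetching only on the first call."""
        if self._cache is None:
            return self.reload()
        return self._cache

def slow_fetch():
    # Placeholder for a network request.
    return [{"title": "Event A"}, {"title": "Event B"}]

op = CachedOperator(slow_fetch)
first = op.run()   # triggers the fetch
second = op.run()  # replayed from the cache
print(op.fetch_count)
```

Downstream operators can then be edited and re-run repeatedly without waiting on the network each time.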
Other Design Findings
• Screen real estate issues
– Collapsible operators, leaving a readable label
Extracting Generic Content
• Can’t have pre-defined extractor operators for every possible web site
– Need a more general way of extracting data from pages
• Developed a generic wizard UI for selecting links
– Content from that set could be extracted via other operators
– Uses Solvent (MIT), an XPath-based algorithm for finding patterns in web pages
• Finds “groups” of related web content based on how HTML is structured
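To give a feel for grouping content by HTML structure, the sketch below groups links by the tag path that leads to each one. It is a simplified stand-in written for illustration, not Solvent's algorithm, and the class name and sample page are assumptions.

```python
from collections import defaultdict
from html.parser import HTMLParser

class LinkGrouper(HTMLParser):
    """Groups link texts by the tag path leading to each <a> element."""

    def __init__(self):
        super().__init__()
        self.stack = []                  # currently open tags
        self.groups = defaultdict(list)  # tag path -> link texts
        self._current = None             # path of the <a> being read, if any

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        if tag == "a":
            self._current = "/" + "/".join(self.stack)
            self.groups[self._current].append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self._current = None
        if tag in self.stack:
            # Pop back to the matching open tag (tolerates unclosed tags).
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if self._current is not None:
            self.groups[self._current][-1] += data.strip()

page = """
<html><body>
<ul><li><a href="/e/1">Jazz night</a></li><li><a href="/e/2">Book fair</a></li></ul>
<div><a href="/about">About</a></div>
</body></html>
"""

grouper = LinkGrouper()
grouper.feed(page)
largest = max(grouper.groups.values(), key=len)
print(dict(grouper.groups))
```

Links that share a path, such as the list items here, are likely to be the same kind of content, so the largest group is a reasonable first guess at what the user wants to extract.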
Marmite
Operators
• Operators have input types
– Operator uses this to guess which columns it wants
• Operators have output types
Implementation
• JavaScript (for underlying code) and Extensible Binding Language (XBL, for the UI)
• Operators currently in JavaScript
– Ideally could be scriptable in any programming language
– Currently ~15 operators
Marmite
• Motivation and Examples
• Features and Design Rationale
• User Evaluation
Evaluation
• Informal user study with 6 people
– 2 novices
– 2 people with spreadsheet experience (formulas)
– 2 people with programming experience
• Tasks (in increasing difficulty)
– Warmup task showing how to retrieve a set of addresses and how to geocode an address
– Search for and filter out events further than a week away
– Compile a list of events from two event services and plot them on a map
– Recreate the housingmaps site
Results
• Three people able to complete all tasks in ~1 hour
– First two users confused about suggested actions (automatically popped up, made manual for other 4 users)
– Novice made some progress, not able to finish all tasks
• Able to re-create housingmaps in ~15 minutes
Marmite
More Results
• Biggest barrier was understanding the data flow
– Did not understand input and output concept
– Applied operators as one-off, did not realize that it was a static representation of flow
– Did not understand data flow and data view were linked
Future Directions
• Short-term
– Better screen-scraping operators
– More operators
– Better connection with web services (WSDL and REST)
– Better help for starting a data flow
• Long-term
– Intelligence analysis
– Better visualizations
– Location-based services
Conclusions
• Marmite, a tool for creating web-based mashups
– Extract content from one or more web pages
– Process it in a data flow manner
– Direct the output to a variety of sinks
• Hybrid data flow / data view
• User evaluation shows some promising results
Jeff Wong, Jason Hong, Making Mashups with Marmite: Re-purposing Web Content through End-User Programming, CHI 2007
Marmite
Types of Operators
• Sources
– Add data into Marmite by querying databases, extracting information from web pages, and so on.
• Processors
– Modify, combine, or delete existing rows. Example operators include geocoding (converting street addresses to latitude and longitude) and filtering. Processor operators might add or remove columns as well.
• Sinks
– Redirect the flow of data out of Marmite. Examples include showing data on a map or saving it to a file or a web page.
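The three operator types can be sketched as follows, with a geocoding processor that adds columns and a sink that writes CSV text. Everything here is illustrative: the lookup table replaces a real geocoding service, its coordinates are placeholder values, and none of the names come from Marmite.

```python
import csv
import io

# Hypothetical lookup table standing in for a geocoding service.
# The coordinates are illustrative placeholders.
FAKE_GEOCODER = {"5000 Forbes Ave, Pittsburgh, PA": (40.4443, -79.9436)}

def source_rows():
    """Source: adds rows to the flow (a fixed list instead of a web query)."""
    return [
        {"name": "Campus tour", "address": "5000 Forbes Ave, Pittsburgh, PA"},
        {"name": "Mystery event", "address": "nowhere"},
    ]

def geocode(rows):
    """Processor: adds lat/lon columns and drops rows it cannot resolve."""
    out = []
    for row in rows:
        coords = FAKE_GEOCODER.get(row["address"])
        if coords is not None:
            out.append({**row, "lat": coords[0], "lon": coords[1]})
    return out

def csv_sink(rows):
    """Sink: sends rows out of the flow as CSV text."""
    buf = io.StringIO()
    if rows:
        writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()),
                                lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
    return buf.getvalue()

output = csv_sink(geocode(source_rows()))
print(output)
```

The processor both removes a row and adds two columns, which matches the description of processors changing rows and columns alike.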