Harpers.org: a Semantic Web(ish) site for Harper’s Magazine
Paul Ford, Associate Web Editor, [email protected]
Harper’s is…
- A magazine of literature, politics, culture, and the arts published continuously from 1850
- A small non-profit
Available content
- The Weekly Review, an emailed summary of world events, from 2000
- The Harper’s Index, a statistical portrait of the world, from 1998
- Public domain, scanned-in archives from 1850-1982
- Readings
- Occasional features
And that’s it.
- Maybe full text of issues will be offered someday, but not soon. So…
- How do we get more value out of limited content?
Solution
- Hack up what we have into bits by content type, then…
- Reassemble it according to link targets…
- Which are arranged in a taxonomy…
- Creating a very small “Semantic Web” for Harpers.org
A quick demo…
- >>>
How it works
- Simple set of ontological relationships (partOf, supervisorOf)
- Taxonomy of content
- & narrative content
- that is split into smaller pieces
- & links into the taxonomy
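As a rough illustration of the relationships above, a taxonomy can be held as child-to-parent `partOf` pairs and walked upward. The topic names and the `ancestors` helper are hypothetical, not taken from the Harpers.org source.

```python
# Hypothetical taxonomy: each topic points to the topic it is partOf.
PART_OF = {
    "CountryY": "RegionA",
    "CountryZ": "RegionA",
    "RegionA": "World",
}


def ancestors(topic, part_of=PART_OF):
    """Return every topic that `topic` is (transitively) partOf."""
    chain = []
    seen = {topic}
    while topic in part_of:
        topic = part_of[topic]
        if topic in seen:  # guard against accidental cycles
            break
        seen.add(topic)
        chain.append(topic)
    return chain


print(ancestors("CountryY"))
```

An event linked to `CountryY` could then also be listed on the pages for each of its ancestors.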
Markup
- Text: “Country Y announced that it had cut off relations with country Z. On Wednesday, something happened to persons X and Y.”
Markup
<event>Country Y announced that it had cut off relations with country Z.</event>
<event>On Wednesday, something happened to persons X and Y.</event>
Markup
<event on="2004-03-12" id="24848">
Country Y announced that it had cut off relations with country Z.
</event>
Markup
<event on="2004-03-12" id="24848">
<link to="#CountryY">Country Y</link> announced that it had cut off relations with <link to="#CountryZ">country Z</link>.
</event>
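A minimal sketch of reading one marked-up event back out, using Python's standard `xml.etree.ElementTree` rather than the XSLT the site uses. The `parse_event` function and the dictionary keys are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

EVENT_XML = (
    '<event on="2004-03-12" id="24848">'
    '<link to="#CountryY">Country Y</link> announced that it had cut off '
    'relations with <link to="#CountryZ">country Z</link>.'
    '</event>'
)


def parse_event(xml_text):
    """Return the event's id, date, plain text, and link targets."""
    event = ET.fromstring(xml_text)
    return {
        "id": event.get("id"),
        "on": event.get("on"),
        # itertext() walks the text inside and between the <link> elements.
        "text": "".join(event.itertext()),
        "targets": [link.get("to").lstrip("#") for link in event.iter("link")],
    }


print(parse_event(EVENT_XML))
```

The link targets are what tie each event into the taxonomy; the plain text is what gets reused on other pages.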
Conditionals
- Some text required conditional markup
- Text: “Country Y announced that it had cut off relations with country Z, and on Wednesday, something happened to persons X and Y.”
Conditionals: ugly, but simple
<event>Country Y announced that it had cut off relations with country Z<cond is="id">, and</cond><cond not="id">.</cond></event>
<event> <cond is="id">on</cond><cond not="id">On</cond> Wednesday, something happened to persons X and Y.</event>
Conditionals: ugly, but simple
- Narrative version
  - Country Y announced that it had cut off relations with country Z, and on Wednesday, something happened to persons X and Y.
- Timeline-friendly version
  - Country Y announced that it had cut off relations with country Z.
  - On Wednesday, something happened to persons X and Y.
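One way the two versions might be produced from a single source is sketched below. It assumes that `is="id"` keeps a branch when the flag `id` is set and `not="id"` keeps it when the flag is absent, and that the narrative rendering is the one with `id` set; that reading is inferred from the sample output, not confirmed by the source. The `<review>` wrapper and the `render` function are hypothetical.

```python
import xml.etree.ElementTree as ET

SOURCE = (
    '<review>'
    '<event>Country Y announced that it had cut off relations with country Z'
    '<cond is="id">, and</cond><cond not="id">.</cond></event>'
    '<event> <cond is="id">on</cond><cond not="id">On</cond>'
    ' Wednesday, something happened to persons X and Y.</event>'
    '</review>'
)


def render(elem, flags):
    """Flatten `elem` to text, keeping only <cond> branches that match `flags`."""
    if elem.tag == "cond":
        wanted, unwanted = elem.get("is"), elem.get("not")
        if wanted is not None and wanted not in flags:
            return ""
        if unwanted is not None and unwanted in flags:
            return ""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(render(child, flags))
        parts.append(child.tail or "")  # tail text sits outside the child
    return "".join(parts)


root = ET.fromstring(SOURCE)
# Narrative: events run together as one sentence (assumed: flag "id" set).
narrative = "".join(render(e, {"id"}) for e in root.iter("event"))
# Timeline: each event stands alone as its own sentence (assumed: no flags).
timeline = [render(e, set()).strip() for e in root.iter("event")]

print(narrative)
print(timeline)
```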
All of it gets slurped up
- And turned into a set of triples
- Then processed in-memory
- With HTML pages spit out as a result
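The pipeline above could be sketched roughly as follows, in Python rather than the site's XSLT. The predicate names (`on`, `text`, `mentions`), the `<review>` wrapper, the second sample event, and both functions are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from html import escape

# Sample input; the second event and its date are invented for the example.
SOURCE = """<review>
<event on="2004-03-12" id="24848"><link to="#CountryY">Country Y</link> announced that it had cut off relations with <link to="#CountryZ">country Z</link>.</event>
<event on="2004-03-17" id="24849">Something happened in <link to="#CountryZ">country Z</link>.</event>
</review>"""


def to_triples(xml_text):
    """Yield (subject, predicate, object) triples for every event."""
    for event in ET.fromstring(xml_text).iter("event"):
        subject = "event:" + event.get("id")
        yield (subject, "on", event.get("on"))
        yield (subject, "text", "".join(event.itertext()))
        for link in event.iter("link"):
            yield (subject, "mentions", link.get("to").lstrip("#"))


def build_pages(triples):
    """Group events by the topics they mention and return {topic: html}."""
    triples = list(triples)
    text = {s: o for s, p, o in triples if p == "text"}
    date = {s: o for s, p, o in triples if p == "on"}
    by_topic = defaultdict(list)
    for s, p, o in triples:
        if p == "mentions":
            by_topic[o].append(s)
    pages = {}
    for topic, events in by_topic.items():
        items = "".join(
            "<li>%s: %s</li>" % (escape(date[e]), escape(text[e]))
            for e in sorted(events, key=date.get)  # oldest event first
        )
        pages[topic] = "<h1>%s</h1><ul>%s</ul>" % (escape(topic), items)
    return pages


pages = build_pages(to_triples(SOURCE))
print(sorted(pages))
```

Each topic that any event links to gets its own page, which is how a small set of source documents can fan out into many more generated ones.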
Hard, then easy
- Hard to get started (lots of events, facts, and links)
- Easy to keep going, if you don’t mind the markup and use a good text editor
Tools used
- Emacs, vi, BBEdit
- XSLT 2.0 (Saxon)
- CVS
Why not RDF?
- Not right for redundant content and conditionals
- Easy enough to transform arbitrary structured XML into RDF with XSLT, as needed
- (Or into RSS 1.0, RSS 2.0, Atom, etc.)
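The slides name XSLT for this step; the sketch below swaps in Python to show the same idea, emitting N-Triples lines from one event. The `http://example.org/` namespace, the predicate names, and both functions are hypothetical.

```python
import xml.etree.ElementTree as ET

BASE = "http://example.org/"  # hypothetical namespace, not a real Harpers.org URI


def literal(value):
    """Quote a string as an N-Triples literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"%s"' % escaped


def event_to_ntriples(xml_text):
    """Return N-Triples lines describing one <event> element."""
    event = ET.fromstring(xml_text)
    subject = "<%sevent/%s>" % (BASE, event.get("id"))
    lines = [
        "%s <%son> %s ." % (subject, BASE, literal(event.get("on"))),
        "%s <%stext> %s ." % (subject, BASE, literal("".join(event.itertext()))),
    ]
    for link in event.iter("link"):
        target = "<%stopic/%s>" % (BASE, link.get("to").lstrip("#"))
        lines.append("%s <%smentions> %s ." % (subject, BASE, target))
    return lines


sample = '<event on="2004-03-12" id="24848"><link to="#CountryY">Country Y</link> left.</event>'
for line in event_to_ntriples(sample):
    print(line)
```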
?
For free…
- From 300 individual pages…
- To 1100 pages of “remixed” content – all unique and relevant
- And Google-friendly
And also for free…
- Semantically relevant in-site advertising, if we want it
- Topic-sorted, reusable content
- Permanent, readable URIs
Do people get it?
- Some do, and others just navigate the site as usual
- Harper’s was fine with the learning curve
- “Odd but useful” – Gawker
Results
- Uptick in traffic and subscription revenues
- Low cost of maintenance
- Ever-increasing database of facts and events: adding one Weekly Review adds value to 50 different pages
- Happy client
Why the SemWeb(ish) framework?
- Leaves plenty of room to grow
  - Web-only content
  - Full text of issues
  - Subscriber services
  - Etc.
- Take advantage of new SemWeb tools
  - Incorporate RDF sources into the taxonomy
  - Anticipate Semantic Web browsers
Next?
Make it pretty
- Redesign
- Hide some of the navigation
- Turn links on and off
Make it scale
- Currently maxes out at about 20-30 megs of content, due to limits of in-memory DOM representation (10-12x XML document size)
- Use a publicly available storage layer (Kowari, Jena, etc)
- Go triple-crazy
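To put the in-memory limit in concrete terms, the multiplier quoted above can be applied directly. This is plain arithmetic on the slide's own figures; the function name is hypothetical.

```python
def dom_footprint_mb(xml_mb, low=10, high=12):
    """Return the (low, high) estimated DOM size in MB for `xml_mb` of XML."""
    return xml_mb * low, xml_mb * high


for size in (20, 30):
    print(size, dom_footprint_mb(size))
```

At 20 to 30 MB of content, the DOM would occupy roughly 200 to 360 MB of memory.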
Make it easy to query and navigate
- “Show me everything related to George Bush and Iraq.”
- or “Show me everything related to politicians and the Middle East.”
- New navigation
- ?
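A sketch of how such queries might work once events and taxonomy are in one store: expand each requested category through `partOf`, then keep the events that touch all of them. The data, event ids, and function names are hypothetical.

```python
# Hypothetical data: topic -> parent category, and event id -> topics mentioned.
PART_OF = {
    "GeorgeBush": "Politicians",
    "Iraq": "MiddleEast",
}
MENTIONS = {
    "e1": {"GeorgeBush", "Iraq"},
    "e2": {"Iraq"},
    "e3": {"GeorgeBush"},
}


def falls_under(topic, category):
    """True if `topic` is `category` or is (transitively) partOf it."""
    seen = set()
    while topic is not None and topic not in seen:
        if topic == category:
            return True
        seen.add(topic)
        topic = PART_OF.get(topic)
    return False


def related_to(*categories):
    """Return ids of events that mention something under every category."""
    return sorted(
        event
        for event, topics in MENTIONS.items()
        if all(any(falls_under(t, c) for t in topics) for c in categories)
    )


print(related_to("GeorgeBush", "Iraq"))
print(related_to("Politicians", "MiddleEast"))
```

The second query never names George Bush or Iraq, yet finds the same event because the taxonomy places them under the broader categories.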