CIVIUM: GIS For Everyone, The Information Commons, and The Universal Database CMU HCII 4/16/2003...
CIVIUM: GIS For Everyone,
The Information Commons, and
The Universal Database
CMU HCII
4/16/2003
Peter Lucas
MAYA Design
or:
Gnutella meets
Encyclopedia Galactica
Convergence of three lines of research:
• The Universal Database
• The Information Commons
• Information-Centric GIS
Toward the Universal Database
Topic: Persistent State in a Distributed World
Premise: If the Net is becoming One Huge Computer, don’t we also need One Huge Database?
Requirement: Information Liquidity
Toward the Universal Database
Impediments to Data Liquidity:
• Conflicting Identity Spaces
• Conflicting Schemata
Toward the Universal Database
Two new ideas: U-forms, Shepherds
One old idea: Layered Semantics
The VIA Repository
U-forms
A generic “container” for mobile data
The u-form is an abstract data type, not a representational format. A u-form is simply a bundle of name-value pairs associated with a universally-unique identifier (UUID).
[Figure: a u-form drawn as a list of attribute name/value pairs (attribute name 1 … attribute name n) labeled with its <UUID>]
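As a rough illustration of the abstract data type described above, a u-form can be modeled as nothing more than a UUID plus a dictionary of attribute names and values. This is a minimal sketch under that assumption; the class name `UForm`, its methods, and the sample place are hypothetical and do not reflect the VIA repository's actual API.

```python
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class UForm:
    """A bundle of name-value pairs tied to a universally-unique identifier.

    Hypothetical illustration: not the VIA repository's actual API.
    """
    uid: uuid.UUID = field(default_factory=uuid.uuid4)
    attrs: Dict[str, Any] = field(default_factory=dict)

    def get(self, name: str, default: Any = None) -> Any:
        return self.attrs.get(name, default)

    def set(self, name: str, value: Any) -> None:
        # No schema check: any attribute name may be attached at any time.
        self.attrs[name] = value


place = UForm()
place.set("NAME", "Exampleton")  # hypothetical place
place.set("LATITUDE", 40.44)
place.set("LONGITUDE", -79.99)
print(place.uid, place.get("NAME"))
```

Because identity lives in the UUID rather than in any storage location or format, the same u-form could be serialized as CSV, XML, or a native binding without changing what it is.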
Shepherds
[Figures: a single “Shepherd”; a Shepherd Space]
VIA Repository Ultrapeer Server
A terabyte-scale distributed repository server (JOSHUA.MAYA.COM)
Data stored uniformly according to the VIA data model:
• Each item has a universally-unique identifier
• Non-relational storage model
• Extremely modular
• Not dependent on a fixed schema
• Rich, mature type system with many language bindings
• Supports arbitrary collections and entity/relation style programming
Data may be accessed in many ways:
• Web browser
• CSV
• XML
• Native language bindings (C/C++, Java, Python, Palm OS)
• Future: web services
Peer-to-peer “Shepherds” replication architecture permits real-time distributed replication of some or all of database for high-availability access
Distributed Indexing supports disconnected operations
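The replication and indexing points above can be sketched as peers that each hold some or all of the u-forms, keyed by UUID, and answer queries from a local index. This is a minimal sketch assuming a simple pull model and an exact-match attribute index; the `Peer` class, its `wants` filter, and the sample records are hypothetical, and the slides do not say how the actual Shepherds protocol resolves conflicting updates.

```python
import uuid
from typing import Callable, Dict, List, Set, Tuple

Attrs = Dict[str, object]


class Peer:
    """Hypothetical replica holding some or all u-forms, keyed by UUID."""

    def __init__(self, wants: Callable[[Attrs], bool] = lambda attrs: True):
        self.wants = wants  # filter choosing which u-forms this peer replicates
        self.store: Dict[uuid.UUID, Attrs] = {}
        self.index: Dict[Tuple[str, object], Set[uuid.UUID]] = {}

    def put(self, uid: uuid.UUID, attrs: Attrs) -> None:
        # Drop stale index entries if this u-form is being replaced.
        for name, value in self.store.get(uid, {}).items():
            self.index.get((name, value), set()).discard(uid)
        self.store[uid] = dict(attrs)
        for name, value in attrs.items():
            self.index.setdefault((name, value), set()).add(uid)

    def pull_from(self, other: "Peer") -> int:
        """Copy the u-forms this peer wants and does not yet hold; return the count."""
        copied = 0
        for uid, attrs in other.store.items():
            if uid not in self.store and self.wants(attrs):
                self.put(uid, attrs)
                copied += 1
        return copied

    def find(self, name: str, value: object) -> List[uuid.UUID]:
        # Answered from the local index alone, so it works while disconnected.
        return sorted(self.index.get((name, value), set()))


server = Peer()  # full replica
a, b = uuid.uuid4(), uuid.uuid4()
server.put(a, {"TYPE": "airport", "ICAO": "KPIT"})
server.put(b, {"TYPE": "school", "NAME": "Example School"})

laptop = Peer(wants=lambda attrs: attrs.get("TYPE") == "airport")  # partial replica
print(laptop.pull_from(server))            # 1
print(laptop.find("ICAO", "KPIT") == [a])  # True
```

Since every item is addressed by its UUID, a peer can hold an arbitrary subset of the database without any renaming or key translation when it syncs.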
What if we wanted to do a large public-works project in Cyberspace?
Obvious answer: The Digital Commons (AKA virtual encyclopedia)
The Information Commons
Civium Information Space
Goals:
• Open, completely distributed public information space
• Information liberated from the machines (data are not lost when a web page goes away)
• Resolve the tension between rigid editorial control (e.g., Yahoo!) and spontaneous, user-contributed chaos (e.g., Everything2)
Civium Information Space
3 Rivers Connect: a regional nonprofit whose mission is to prototype the “Information Commons” in SW Pennsylvania
CIVIUM will be a worldwide generalization of this concept
Intended to create an enduring public information resource
Probably will have both for-profit and not-for-profit aspects
Bulk Imports of Open-source Data
Provides consistent points of reference for users (Places, Airlines, Corporations, Schools, Governments, etc.)
Uniform identifiers vastly simplify data fusion problems.
Real world data come with properties (populations, geolocations, census data…) and relationships (distances, transportation networks…) that impose structure and texture on the information space.
Each import enriches the semantic “web of facts”
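To make the data-fusion point concrete: when two bulk imports already refer to the same real-world thing by the same UUID, combining them reduces to merging attribute sets keyed by that UUID, with no fuzzy name matching ("Exampleton" vs. "Exampleton, PA"). This is a minimal sketch under that assumption; the place, its placeholder UUID, and its population value are made up, and `fuse` is a hypothetical helper.

```python
import uuid
from typing import Dict

Attrs = Dict[str, object]

# Hypothetical records from two independent bulk imports. Both refer to the
# same place by the same UUID, so no name matching is needed to join them.
PLACE = uuid.UUID("00000000-0000-0000-0000-000000000001")  # placeholder ID

gazetteer: Dict[uuid.UUID, Attrs] = {
    PLACE: {"NAME": "Exampleton", "LATITUDE": 40.44, "LONGITUDE": -79.99},
}
census: Dict[uuid.UUID, Attrs] = {
    PLACE: {"POPULATION": 12345},  # made-up value
}


def fuse(*sources: Dict[uuid.UUID, Attrs]) -> Dict[uuid.UUID, Attrs]:
    """Merge attribute sets that share a UUID; later sources win on clashes."""
    fused: Dict[uuid.UUID, Attrs] = {}
    for source in sources:
        for uid, attrs in source.items():
            fused.setdefault(uid, {}).update(attrs)
    return fused


print(fuse(gazetteer, census)[PLACE])
# {'NAME': 'Exampleton', 'LATITUDE': 40.44, 'LONGITUDE': -79.99, 'POPULATION': 12345}
```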
Civium Information Space
Existing data
Places (5.5 million items)
• All worldwide geopolitical entities (every last village!)
  – Locations (lat/long)
  – Populations (cities > 100,000)
  – Administrative units (countries, provinces)
  – Alternate feature names
• Physical features
  – Schools, parks, mountains, churches, cemeteries, etc.
  – Marked with feature type, lat/long, often nearby city
  – Worldwide coverage uneven
  – Not exhaustive (e.g., not all schools)
Airports
• Essentially all commercial and military airports, many airstrips
• Runway length
• ICAO/FAA codes
• Locality; lat/long/elevation
Civium Information Space
Existing data (cont’d)
• U.S. Military Bases
• U.S. National Parks
• Amtrak Stations
• Sample of detailed regional data (SW Pennsylvania)
  – Roads
  – Hydrography
  – Points of interest
  – Zip codes
  – Landmark buildings
  – Cultural/recreational items
  – much more
Civium Information Space
Will have soon:
• Complete US Census data
  – Block-level demographics
    » Incomes
    » Ethnicity
    » Population density
  – All roads/railroads, etc.
  – Landmark buildings
  – Street address ranges
  – Zip codes & Zip Code Tabulation Areas (ZCTAs)
  – Will form basis of ability to geocode by address and lat/long
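Street address ranges support geocoding through a common technique: find the street segment whose address range contains the house number, then interpolate linearly between the segment's endpoints. The sketch below assumes straight segments and ignores odd/even street sides and curved geometry; the `StreetSegment` fields and `geocode` helper are hypothetical, not a description of how Civium actually geocodes.

```python
from dataclasses import dataclass
from typing import Tuple

LatLon = Tuple[float, float]


@dataclass
class StreetSegment:
    """One street segment with an address range (hypothetical field names)."""
    street: str
    from_addr: int   # house number at the segment's start
    to_addr: int     # house number at the segment's end
    start: LatLon    # (lat, lon) of the start
    end: LatLon      # (lat, lon) of the end


def geocode(seg: StreetSegment, number: int) -> LatLon:
    """Place a house number by linear interpolation along the segment."""
    lo, hi = sorted((seg.from_addr, seg.to_addr))
    if not lo <= number <= hi:
        raise ValueError(f"{number} is outside {seg.street} range {lo}-{hi}")
    span = seg.to_addr - seg.from_addr
    t = 0.0 if span == 0 else (number - seg.from_addr) / span  # fraction along the segment
    lat = seg.start[0] + t * (seg.end[0] - seg.start[0])
    lon = seg.start[1] + t * (seg.end[1] - seg.start[1])
    return lat, lon


seg = StreetSegment("Example St", 100, 198, (40.000, -80.000), (40.010, -80.000))
print(geocode(seg, 149))  # roughly halfway along the segment
```

Reverse geocoding (lat/long to address) would run the same interpolation backwards once the nearest segment is found.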
Civium Information Space
Will have soon (cont’d):
• All US schools/colleges and libraries
  – Public/private
  – Demographics
  – District data
  – Budget data
• Worldwide transportation network
  – Highways
  – Rail
  – Ship
• Worldwide population density data
  – Square kilometer resolution
  – Independent of political borders
• Corporations
• Ships at sea
• Real-time and historical Weather Reports
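Gridded population data at square-kilometer resolution can be totaled over any rectangular region, regardless of where political borders fall. One standard way to make such region queries fast is a summed-area table (2-D prefix sums), sketched below; the choice of this technique and the 3 × 3 grid are assumptions for illustration, not part of the Civium design.

```python
from typing import List

Grid = List[List[float]]


def summed_area_table(grid: Grid) -> Grid:
    """sat[r][c] holds the total of all cells above and to the left of (r, c)."""
    rows, cols = len(grid), len(grid[0])
    sat = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            sat[r + 1][c + 1] = grid[r][c] + sat[r][c + 1] + sat[r + 1][c] - sat[r][c]
    return sat


def region_total(sat: Grid, r0: int, c0: int, r1: int, c1: int) -> float:
    """Population in rows r0..r1-1 and columns c0..c1-1 (half-open ranges)."""
    return sat[r1][c1] - sat[r0][c1] - sat[r1][c0] + sat[r0][c0]


# Made-up 3 x 3 km grid: people living in each 1 km^2 cell.
people = [[1, 2, 3],
          [4, 5, 6],
          [7, 8, 9]]
sat = summed_area_table(people)
print(region_total(sat, 1, 1, 3, 3))  # 28.0  (5 + 6 + 8 + 9)
```

After one pass to build the table, each region total costs four lookups, which suits interactive selection of arbitrary areas.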
Schemata
The underlying storage mechanism is completely schema-free
Schemata are “layered” using the VIA “Roles” mechanism
A “Role” is a specification of the semantics of a set of attribute names.
A u-form may play any number of roles (so long as they do not conflict with each other)
Roles may be added to a u-form at any time
Introspective: roles are represented as u-forms (and therefore have UUIDs)
U-forms may have overlapping role-sets, so ontology compatibility and evolution can be negotiated by partners incrementally
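The role rules above can be sketched as follows: each role carries its own UUID, a u-form collects roles at any time, and a new role is rejected only if it conflicts with one already held. The slides do not define "conflict"; this sketch assumes, hypothetically, that two roles conflict when they give the same attribute name different types. `Role`, `UForm.add_role`, and the `PLACE` and `CLASH` roles are illustrative names only, and roles here are simplified to carry a UUID rather than being full u-forms.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Role:
    """A role: the meaning (here just the type) of a set of attribute names."""
    name: str
    attributes: Dict[str, str]  # attribute name -> type name
    uid: uuid.UUID = field(default_factory=uuid.uuid4)  # roles have UUIDs too


@dataclass
class UForm:
    attrs: Dict[str, object] = field(default_factory=dict)
    roles: Dict[uuid.UUID, Role] = field(default_factory=dict)
    uid: uuid.UUID = field(default_factory=uuid.uuid4)

    def add_role(self, role: Role) -> None:
        # Assumed conflict rule: same attribute name, different type.
        for other in self.roles.values():
            for name, typ in role.attributes.items():
                if name in other.attributes and other.attributes[name] != typ:
                    raise ValueError(f"{role.name} conflicts with {other.name} on {name}")
        self.roles[role.uid] = role


entity = Role("ENTITY", {"LABEL": "string", "NAME": "string", "DESCRIPTION": "string"})
place = Role("PLACE", {"NAME": "string", "LATITUDE": "float", "LONGITUDE": "float"})
clash = Role("CLASH", {"NAME": "date"})

u = UForm(attrs={"NAME": "Exampleton"})
u.add_role(entity)
u.add_role(place)  # overlaps ENTITY on NAME, but agrees on its type
try:
    u.add_role(clash)
except ValueError as err:
    print(err)  # CLASH conflicts with ENTITY on NAME
```

Overlapping role-sets are allowed as long as they agree, which is what lets two partners share a u-form while each adds the roles it cares about.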
Schemata
Examples of Roles: Entity
Attribute     Type     Value
LABEL         string   A one-word description of the contents of this u-form
NAME          string   A multi-word description of the contents of this u-form
DESCRIPTION   string   A short text description of the contents of this u-form
Schemata
Examples of Roles: Person (simplified)
inherits from: ENTITY
Attribute     Type     Value
NAME          string   The person’s full name
FAMILYNAME    string   The person’s family name
GIVENNAME     string   The person’s given name
BIRTHDATE     date     The person’s birthdate
ADDRESS       UUID     Relation to a u-form of role “address”
TELEPHONE     UUID     Relation to a u-form of role “telephone_number”
TITLE         string   The person’s job title
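The two tables above can be read as a small role hierarchy: Person inherits Entity's attributes, and ADDRESS and TELEPHONE hold UUIDs of other u-forms rather than literal strings. The sketch below flattens inherited attributes and type-checks a u-form's values against them; the `ROLES` table layout, the mapping of role types to Python types, and the helper names are assumptions made for illustration.

```python
import datetime
import uuid
from typing import Dict, List, Optional, Tuple

# Hypothetical role table: role name -> (parent role, {attribute: Python type}).
ROLES: Dict[str, Tuple[Optional[str], Dict[str, type]]] = {
    "ENTITY": (None, {"LABEL": str, "NAME": str, "DESCRIPTION": str}),
    "PERSON": ("ENTITY", {
        "NAME": str, "FAMILYNAME": str, "GIVENNAME": str,
        "BIRTHDATE": datetime.date,
        "ADDRESS": uuid.UUID,    # relation: UUID of a u-form of role "address"
        "TELEPHONE": uuid.UUID,  # relation: UUID of a u-form of role "telephone_number"
        "TITLE": str,
    }),
}


def attributes_of(role: str) -> Dict[str, type]:
    """A role's attributes, including inherited ones (the child's definition wins)."""
    parent, own = ROLES[role]
    inherited = attributes_of(parent) if parent else {}
    return {**inherited, **own}


def type_errors(attrs: Dict[str, object], role: str) -> List[str]:
    """Names of attributes whose values have the wrong type for the role.
    Missing attributes are allowed and unknown ones are ignored."""
    spec = attributes_of(role)
    return [name for name, value in attrs.items()
            if name in spec and not isinstance(value, spec[name])]


person = {
    "NAME": "Ada Example", "GIVENNAME": "Ada", "FAMILYNAME": "Example",
    "BIRTHDATE": datetime.date(1970, 1, 1),
    "ADDRESS": uuid.uuid4(),  # points at another u-form, not a street string
}
print(type_errors(person, "PERSON"))                    # []
print(type_errors({"ADDRESS": "1 Main St"}, "PERSON"))  # ['ADDRESS']
print("LABEL" in attributes_of("PERSON"))               # True (inherited from ENTITY)
```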
Schemata
Relations to Formal Ontologies
The Role mechanism is not a substitute for formal ontology efforts (e.g., Cycorp)
MAYA limits its data-characterization efforts to only the most basic roles; the motivation is primarily to provide a controlled space of attribute names, not to organize knowledge
Roles provide a syntax for capturing external ontologies
Conflicting schemata can be incrementally reconciled by creating redundant attributes and mapping rules between their values
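Reconciliation by redundant attributes can be sketched as a set of mapping rules, each deriving a target attribute from source attributes, applied until nothing new can be derived. Existing values are never overwritten, so both schemata's attributes coexist on the same u-form. The external `SURNAME`/`FORENAME` schema and the `RULES`/`reconcile` names are hypothetical examples, not rules MAYA is known to use.

```python
from typing import Callable, Dict, List, Tuple

# A mapping rule: (target attribute, source attributes, function computing target).
Rule = Tuple[str, Tuple[str, ...], Callable[..., object]]

# Hypothetical rules reconciling an external SURNAME/FORENAME schema
# with the Person role's FAMILYNAME/GIVENNAME/NAME attributes.
RULES: List[Rule] = [
    ("FAMILYNAME", ("SURNAME",), lambda s: s),
    ("GIVENNAME", ("FORENAME",), lambda f: f),
    ("NAME", ("GIVENNAME", "FAMILYNAME"), lambda g, f: f"{g} {f}"),
]


def reconcile(attrs: Dict[str, object], rules: List[Rule] = RULES) -> Dict[str, object]:
    """Add redundant attributes derived by the rules; never overwrite existing ones.
    Rules are re-applied until nothing changes, so derived values can feed later rules."""
    out = dict(attrs)
    changed = True
    while changed:
        changed = False
        for target, sources, fn in rules:
            if target not in out and all(s in out for s in sources):
                out[target] = fn(*(out[s] for s in sources))
                changed = True
    return out


record = {"SURNAME": "Example", "FORENAME": "Ada"}
print(reconcile(record))
# {'SURNAME': 'Example', 'FORENAME': 'Ada', 'FAMILYNAME': 'Example', 'GIVENNAME': 'Ada', 'NAME': 'Ada Example'}
```

Because each rule fills a target only once, the loop always terminates, and new rules can be added incrementally as partners discover further correspondences.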
GeoBrowser Demo
Timeline Visualization
Encouraging Input
Must define a clear value proposition. Candidate:
• Give us your bits
• We will aggregate, organize, and visualize
• You get back topsight
Other vital considerations:
• Must establish a reputation as a venue that “matters”
• Low transaction costs are essential: people will contribute if it is easy
Encouraging Input
Potential model for user contributions:
“Every item a discussion forum”
Each data display (metaphorically) has a “front side” and a “back side”
Front side contains the data and/or visualization
Back side contains user comments, reviews, ratings, etc. (similar to Amazon user reviews)
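The "every item a discussion forum" model can be sketched as an item with two parts: a front side holding the data and a back side holding a list of comments with optional ratings. This is a minimal sketch under those assumptions; `Item`, `Comment`, the 1-to-5 rating scale, and the sample data are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Comment:
    author: str
    text: str
    rating: Optional[int] = None  # e.g. 1 to 5 stars; optional


@dataclass
class Item:
    """Hypothetical item: the front side holds the data, the back side the forum."""
    front: Dict[str, object]
    back: List[Comment] = field(default_factory=list)
    uid: uuid.UUID = field(default_factory=uuid.uuid4)

    def discuss(self, author: str, text: str, rating: Optional[int] = None) -> None:
        self.back.append(Comment(author, text, rating))

    def average_rating(self) -> Optional[float]:
        ratings = [c.rating for c in self.back if c.rating is not None]
        return sum(ratings) / len(ratings) if ratings else None


item = Item(front={"NAME": "Example dataset", "VALUE": 100})  # made-up data
item.discuss("alice", "Source looks out of date.", rating=2)
item.discuss("bob", "Agrees with my field notes.", rating=4)
item.discuss("carol", "Where does this figure come from?")
print(item.average_rating())  # 3.0
```

Keeping commentary on the back side leaves the front-side data untouched, so user contributions accrue without editing the underlying record.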
Zoom In to South-central Asia
View w/o Geo-political Borders
Drag-select Region
Pop-up Display: Refugee Data
Desktop “Front Side”
Desktop “Back Side”
Palmtop “Front Side”
Palmtop “Back Side”
Summary
Civium will comprise an open, public information space not controlled by anyone
Peer-to-peer architecture and replication decouple the data from any particular set of storage venues
A centrally maintained “armature” of bulk-imported public data serves as the trellis on which user-contributed data will accrue
An ultrapeer network of terabyte-scale machines provides the framework for access.