From Text and Data to Knowledge: The Social Semantic Web...

84
From Text and Data to Knowledge: The Social Semantic Web in Action Dr. Mark Greaves [email protected] © 2012 Vulcan Inc. Jesse Wang [email protected]

Transcript of From Text and Data to Knowledge: The Social Semantic Web...

From Text and Data to Knowledge

The Social Semantic Web in Action

Dr Mark Greaves markgvulcancom

copy 2012 Vulcan Inc

Jesse Wang jessewvulcancom

What is Vulcan

Vulcan is the asset management firm for Paul G Allen

ndash Vulcan (wwwvulcancom) creates and advances

a variety of world-class endeavors and high

impact initiatives that change and improve the

way we live learn do business and experience

the world

Major Operating Groups

ndash Vulcan Capital

ndash Vulcan Sports and Entertainment

bull Seattle Seahawks Portland Trail Blazers

ndash Vulcan Productions

ndash Vulcan Real Estate

ndash Paul G Allen Family Foundation

ndash Allen Institute for Brain Sciences

ndash Vulcan Technology (Stratolaunch Halo Marine)

2

Paul Allen and Bill Gates in 1968

The Founding of Microsoft

Paul Allenrsquos Concept of the Digital Aristotle

ldquoDigital Aristotlerdquo ndash Inspired by the broad science fiction vision

of Big AI

ndash The volume of scientific knowledge has outpaced our ability to manage it

ndash Need systems to reason and answer questions rather than simply retrieve 1000s of relevant documents

bull Example ldquoWhat are the reaction products if metallic copper is heated strongly with concentrated hydrochloric acidrdquo

Project Formation ndash Initial Workshops conducted in 2001

ndash Digital Aristotle vision formulated

ndash Project Halo created in 2002

The Digital Aristotle and Project Halo

Project Halo is a staged long-range research effort by Vulcan

Inc towards the development of a Digital Aristotle

Project Halo envisions two primary classes of application

1 A digital textbook capable of answering student questions and

helping students with their work

2 A research assistant with broad interdisciplinary skills to help

scientists make connections

A Digital Aristotle is a reasoning

system capable of answering novel

questions and solving advanced

problems in a broad range of

scientific disciplines

Where to Begin

AI Grand Challenge ndash Read a chapter and answer

questions in the back of the book (Reddy 2003)

Focus on Science ndash Focus on science where

knowledge is explicitly stated

US Advanced Placement Exam ndash Use this US-based test of human

competence as a metric

Systems AI ndash Include the knowledge knowledge

acquisition reasoning and question answering pieces

Project Halo Development Strategy

KR

Expert

Single Domain

Expert

Small Team of

Domain and KR

Experts

Community of

Scientists

Teachers and KR

Experts

Logic

Queries

AP Question

Answering

AP QA

General QA

Education

AP QA

General QA

Education

Research

Authors

Uses

Pilot Phase II Inquire DA

3 Partial

AP Syllabi

Single

Textbook Partial

AP Syllabus Complete

Domain

2002-2003 2004-2009 2010-2015 2016-

What is Inquire

Inquire is a concept for a new kind of electronic textbook ndash Based on Campbell and Reese

Biology

Inquire is a standard electronic textbook but also ndash Contains a full underlying knowledge

base of the bookrsquos contents

ndash Answers the readerrsquos questions and provides tailored instruction

ndash Core technology applies to other areas of biology and other exact sciences (chemistry physics geology etc)

bull Is less applicable in history art languages philosophy etc

ndash Should support other languages

Inquirersquos Design Goals

An Inquire user can radic READ the textbook contents

dynamically interactively

radic ASK questions and get explanations on any subject in the book

ndash LEARN and master the subject through individualized tutoring

ndash CREATE and explore their own conceptualizations of the material

A Calculator for Biology ndash Simple questions are answered

from specific knowledge in the text

ndash Fully embedded in the textbook

Building a Question-Answering Textbook

Inquire Application

Knowledge Authoring

Question Answering

Educational Impact

Factual

Relationship

Biology Core Themes

All Questions

Domain experts

(biologists) enter

knowledge from

Campbellrsquos Biology

textbook into a

structured logical

knowledge base

The knowledge

base is connected

to a electronic

version of the

textbook running

on an iPad that

enables students

to ask questions

about the material

as they read

Measure retention

and understanding

for students using

Inquire against

students using

either a paper or

electronic version

of the textbook

Inquire answers a

wide range of

questions by

performing logical

inference on the

knowledge entered

into the knowledge

base

Inquire Demo

Digital Textbook iPad Application ndash eBook reader (with standard features)

Knowledge-based features ndash Glossary pages (from KB)

ndash Suggested questions and answers

ndash Free-form QA dialog

AURA Knowledge Server ndash Concept taxonomy from Campbell Biology

ndash AURA Knowledge Base (21 chapters)

ndash Suggested question generation

ndash Reasoning and question answering system

ndash AAAI Best Video of 2012

httpwwwyoutubecomwatchv=fTiW31MBtfA

Inquire vs Siri vs Watson vs Search Engines

Project Halorsquos Inquire ndash Subject Matter biology

ndash Questions Question suggestion and simple English queries

ndash Answers Formatted data and textbook content exact answers

Applersquos Siri ndash Subject Matter Actions on iPhone plus Wolfram Alpha searches

ndash Questions Voice commands short dialogs

ndash Answers Performs tasks Alpha-formatted pages

IBMrsquos Watson ndash Subject Matter General knowledge trivia attempting medical

ndash Questions Jepoardy-style ldquoreversedrdquo questions

ndash Answers Single word or short phrase answers

Google Yahoo Bing Baidu ndash Subject Matter Open domain web knowledge

ndash Questions keywords and search queries

ndash Answers web page listings

13

Three Main Components of Inquire

Task Encoding knowledge sentence-by-sentence

ndash Define a computer language for capturing knowledge

ndash Represent background knowledge and explicit knowledge

Systems AURA Semantic Media Wiki (SMW)

Metrics ndash Coverage of book at

various depths

ndash Time per sentence

RampD thrusts ndash SMW Lower authoring

costs by crowdsourcing

ndash SILK Capturing more knowledge

Task Map a student question to an answerable query

ndash English language interpretation

ndash Suggest good questions to student

Systems AURA Question Processor

Metrics ndash Usefulness of suggested

questions

ndash Accuracy of free-form question processing

RampD thrusts ndash Rephrasing

ndash Textual Entailment

ndash Learning suggested question mappings

ndash Question generation from the knowledge base

Task Generate a useful answer to a student question

ndash Derive an answer from the knowledge base

ndash English generation and media selection

ndash Answer presentation and relevance pruning

Systems AURA Cyc Text Analysis

Metrics ndash Answer accuracy

ndash Answer usefulness to student users

ndash Time to answer

RampD thrusts ndash Hybrid system Answer

more questions

Knowledge Entry Question Processing Question Answering

Knowledge from Campbell Biology in Inquire

Most-used biology textbook in the US both high school and college

Total Core Sentences 28878

56 Chapters 262 Sections ndash Avg 47 sectionschapter 515

sentenceschapter 109 sentencessection

ndash Relevant sentences in book (extrapolation from Chaps 6910) 49 14150 sentences

750 Core library concepts

~6000 total concepts in taxonomy ndash ~3000 concepts in Chaps 2-22

~65000 knowledge base assertions

~50000 suggested questions

4 hourssentence authoring time ndash Design planning encoding testing

refining new relation development

60-70 new encoding relationsyear

Inquirersquos Existing Knowledge Entry Process

Multiple 3-person teams

Highly accurate but costly ndash ~4 person-hours per

sentence

ndash Intensive testing and knowledge refinement

Biology Teams and AI experts

represent each sentence

Senior

SME

Senior KR

Expert

Concept

Map

Planning

Concept

Map Entry

Concept

Map

Testing

Content

Review

KR

Review

Assess amp

Correct

Knowledge Entry Group

Delhi India

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

Universal Truth Authoring Concept Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

UT Authoring cMap Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Planning (51)

Testing (39)

Encoding (10)

This is the well-known Knowledge Acquisition Problem ndash AI authoring is too expensive too slow not scalable

Three Possible Solutions ndash Automatic Machine ParsingEditing

bull Not good enough for biology textbook sentences

bull Error rates are too high

bull Need humans in the loop

ndash Mechanical Turk Authoring bull Biology expertise is difficult to get

bull Mechanical Turk uses individuals but the Knowledge Entry task appears to require coordination judgment discussion and working together

ndash Social Authoring and Crowdsourcing bull Wikipedia showed this could work for text

bull In 2008 Vulcan started a pilot project to explore social knowledge authoring

How Can We Improve Knowledge Entry

19

Crowdsourcing for Better Knowledge Acquisition

20

edit wow I can change the web

letrsquos share and publish knowledge

to make an [[encyclopedia]]

Success of Wikis

27

Actual number of articles on enwikipediaorg (thick

blue line) compared with a Gompertz model that leads

eventually to a maximum of about 44 million articles

(thin green line)

A Key Feature of Wiki

28

This distinguishes wikis from other publication tools

Consensus in Wikis Comes from

Collaboration ndash ~17 editspage on average in

Wikipedia (with high variance)

ndash Wikipediarsquos Neutral Point of View

Convention ndash Users follow customs and

conventions to engage with articles effectively

29

Software Support Makes Wikis Successful

Trivial to edit by anyone Tracking of all changes one-

step rollback Every article has a ldquoTalkrdquo page

for discussion Notification facility allows anyone

to ldquowatchrdquo an article Sufficient security on pages

logins can be required A hierarchy of administrators

gardeners and editors Software Bots recognize certain

kinds of vandalism and auto-revert or recognize articles that need work and flag them for editors

30

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

What is Vulcan

Vulcan is the asset management firm for Paul G Allen

ndash Vulcan (wwwvulcancom) creates and advances

a variety of world-class endeavors and high

impact initiatives that change and improve the

way we live learn do business and experience

the world

Major Operating Groups

ndash Vulcan Capital

ndash Vulcan Sports and Entertainment

bull Seattle Seahawks Portland Trail Blazers

ndash Vulcan Productions

ndash Vulcan Real Estate

ndash Paul G Allen Family Foundation

ndash Allen Institute for Brain Sciences

ndash Vulcan Technology (Stratolaunch Halo Marine)

2

Paul Allen and Bill Gates in 1968

The Founding of Microsoft

Paul Allenrsquos Concept of the Digital Aristotle

ldquoDigital Aristotlerdquo ndash Inspired by the broad science fiction vision

of Big AI

ndash The volume of scientific knowledge has outpaced our ability to manage it

ndash Need systems to reason and answer questions rather than simply retrieve 1000s of relevant documents

bull Example ldquoWhat are the reaction products if metallic copper is heated strongly with concentrated hydrochloric acidrdquo

Project Formation ndash Initial Workshops conducted in 2001

ndash Digital Aristotle vision formulated

ndash Project Halo created in 2002

The Digital Aristotle and Project Halo

Project Halo is a staged long-range research effort by Vulcan

Inc towards the development of a Digital Aristotle

Project Halo envisions two primary classes of application

1 A digital textbook capable of answering student questions and

helping students with their work

2 A research assistant with broad interdisciplinary skills to help

scientists make connections

A Digital Aristotle is a reasoning

system capable of answering novel

questions and solving advanced

problems in a broad range of

scientific disciplines

Where to Begin

AI Grand Challenge ndash Read a chapter and answer

questions in the back of the book (Reddy 2003)

Focus on Science ndash Focus on science where

knowledge is explicitly stated

US Advanced Placement Exam ndash Use this US-based test of human

competence as a metric

Systems AI ndash Include the knowledge knowledge

acquisition reasoning and question answering pieces

Project Halo Development Strategy

KR

Expert

Single Domain

Expert

Small Team of

Domain and KR

Experts

Community of

Scientists

Teachers and KR

Experts

Logic

Queries

AP Question

Answering

AP QA

General QA

Education

AP QA

General QA

Education

Research

Authors

Uses

Pilot Phase II Inquire DA

3 Partial

AP Syllabi

Single

Textbook Partial

AP Syllabus Complete

Domain

2002-2003 2004-2009 2010-2015 2016-

What is Inquire

Inquire is a concept for a new kind of electronic textbook ndash Based on Campbell and Reese

Biology

Inquire is a standard electronic textbook but also ndash Contains a full underlying knowledge

base of the bookrsquos contents

ndash Answers the readerrsquos questions and provides tailored instruction

ndash Core technology applies to other areas of biology and other exact sciences (chemistry physics geology etc)

bull Is less applicable in history art languages philosophy etc

ndash Should support other languages

Inquirersquos Design Goals

An Inquire user can radic READ the textbook contents

dynamically interactively

radic ASK questions and get explanations on any subject in the book

ndash LEARN and master the subject through individualized tutoring

ndash CREATE and explore their own conceptualizations of the material

A Calculator for Biology ndash Simple questions are answered

from specific knowledge in the text

ndash Fully embedded in the textbook

Building a Question-Answering Textbook

Inquire Application

Knowledge Authoring

Question Answering

Educational Impact

Factual

Relationship

Biology Core Themes

All Questions

Domain experts

(biologists) enter

knowledge from

Campbellrsquos Biology

textbook into a

structured logical

knowledge base

The knowledge

base is connected

to a electronic

version of the

textbook running

on an iPad that

enables students

to ask questions

about the material

as they read

Measure retention

and understanding

for students using

Inquire against

students using

either a paper or

electronic version

of the textbook

Inquire answers a

wide range of

questions by

performing logical

inference on the

knowledge entered

into the knowledge

base

Inquire Demo

Digital Textbook iPad Application ndash eBook reader (with standard features)

Knowledge-based features ndash Glossary pages (from KB)

ndash Suggested questions and answers

ndash Free-form QA dialog

AURA Knowledge Server ndash Concept taxonomy from Campbell Biology

ndash AURA Knowledge Base (21 chapters)

ndash Suggested question generation

ndash Reasoning and question answering system

ndash AAAI Best Video of 2012

httpwwwyoutubecomwatchv=fTiW31MBtfA

Inquire vs Siri vs Watson vs Search Engines

Project Halorsquos Inquire ndash Subject Matter biology

ndash Questions Question suggestion and simple English queries

ndash Answers Formatted data and textbook content exact answers

Applersquos Siri ndash Subject Matter Actions on iPhone plus Wolfram Alpha searches

ndash Questions Voice commands short dialogs

ndash Answers Performs tasks Alpha-formatted pages

IBMrsquos Watson ndash Subject Matter General knowledge trivia attempting medical

ndash Questions Jepoardy-style ldquoreversedrdquo questions

ndash Answers Single word or short phrase answers

Google Yahoo Bing Baidu ndash Subject Matter Open domain web knowledge

ndash Questions keywords and search queries

ndash Answers web page listings

13

Three Main Components of Inquire

Task Encoding knowledge sentence-by-sentence

ndash Define a computer language for capturing knowledge

ndash Represent background knowledge and explicit knowledge

Systems AURA Semantic Media Wiki (SMW)

Metrics ndash Coverage of book at

various depths

ndash Time per sentence

RampD thrusts ndash SMW Lower authoring

costs by crowdsourcing

ndash SILK Capturing more knowledge

Task Map a student question to an answerable query

ndash English language interpretation

ndash Suggest good questions to student

Systems AURA Question Processor

Metrics ndash Usefulness of suggested

questions

ndash Accuracy of free-form question processing

RampD thrusts ndash Rephrasing

ndash Textual Entailment

ndash Learning suggested question mappings

ndash Question generation from the knowledge base

Task Generate a useful answer to a student question

ndash Derive an answer from the knowledge base

ndash English generation and media selection

ndash Answer presentation and relevance pruning

Systems AURA Cyc Text Analysis

Metrics ndash Answer accuracy

ndash Answer usefulness to student users

ndash Time to answer

RampD thrusts ndash Hybrid system Answer

more questions

Knowledge Entry Question Processing Question Answering

Knowledge from Campbell Biology in Inquire

Most-used biology textbook in the US both high school and college

Total Core Sentences 28878

56 Chapters 262 Sections ndash Avg 47 sectionschapter 515

sentenceschapter 109 sentencessection

ndash Relevant sentences in book (extrapolation from Chaps 6910) 49 14150 sentences

750 Core library concepts

~6000 total concepts in taxonomy ndash ~3000 concepts in Chaps 2-22

~65000 knowledge base assertions

~50000 suggested questions

4 hourssentence authoring time ndash Design planning encoding testing

refining new relation development

60-70 new encoding relationsyear

Inquirersquos Existing Knowledge Entry Process

Multiple 3-person teams

Highly accurate but costly ndash ~4 person-hours per

sentence

ndash Intensive testing and knowledge refinement

Biology Teams and AI experts

represent each sentence

Senior

SME

Senior KR

Expert

Concept

Map

Planning

Concept

Map Entry

Concept

Map

Testing

Content

Review

KR

Review

Assess amp

Correct

Knowledge Entry Group

Delhi India

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

Universal Truth Authoring Concept Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

UT Authoring cMap Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Planning (51)

Testing (39)

Encoding (10)

This is the well-known Knowledge Acquisition Problem ndash AI authoring is too expensive too slow not scalable

Three Possible Solutions ndash Automatic Machine ParsingEditing

bull Not good enough for biology textbook sentences

bull Error rates are too high

bull Need humans in the loop

ndash Mechanical Turk Authoring bull Biology expertise is difficult to get

bull Mechanical Turk uses individuals but the Knowledge Entry task appears to require coordination judgment discussion and working together

ndash Social Authoring and Crowdsourcing bull Wikipedia showed this could work for text

bull In 2008 Vulcan started a pilot project to explore social knowledge authoring

How Can We Improve Knowledge Entry

19

Crowdsourcing for Better Knowledge Acquisition

20

edit wow I can change the web

letrsquos share and publish knowledge

to make an [[encyclopedia]]

Success of Wikis

27

Actual number of articles on enwikipediaorg (thick

blue line) compared with a Gompertz model that leads

eventually to a maximum of about 44 million articles

(thin green line)

A Key Feature of Wiki

28

This distinguishes wikis from other publication tools

Consensus in Wikis Comes from

Collaboration ndash ~17 editspage on average in

Wikipedia (with high variance)

ndash Wikipediarsquos Neutral Point of View

Convention ndash Users follow customs and

conventions to engage with articles effectively

29

Software Support Makes Wikis Successful

Trivial to edit by anyone Tracking of all changes one-

step rollback Every article has a ldquoTalkrdquo page

for discussion Notification facility allows anyone

to ldquowatchrdquo an article Sufficient security on pages

logins can be required A hierarchy of administrators

gardeners and editors Software Bots recognize certain

kinds of vandalism and auto-revert or recognize articles that need work and flag them for editors

30

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Paul Allen and Bill Gates in 1968

The Founding of Microsoft

Paul Allenrsquos Concept of the Digital Aristotle

ldquoDigital Aristotlerdquo ndash Inspired by the broad science fiction vision

of Big AI

ndash The volume of scientific knowledge has outpaced our ability to manage it

ndash Need systems to reason and answer questions rather than simply retrieve 1000s of relevant documents

bull Example ldquoWhat are the reaction products if metallic copper is heated strongly with concentrated hydrochloric acidrdquo

Project Formation ndash Initial Workshops conducted in 2001

ndash Digital Aristotle vision formulated

ndash Project Halo created in 2002

The Digital Aristotle and Project Halo

Project Halo is a staged long-range research effort by Vulcan

Inc towards the development of a Digital Aristotle

Project Halo envisions two primary classes of application

1 A digital textbook capable of answering student questions and

helping students with their work

2 A research assistant with broad interdisciplinary skills to help

scientists make connections

A Digital Aristotle is a reasoning

system capable of answering novel

questions and solving advanced

problems in a broad range of

scientific disciplines

Where to Begin

AI Grand Challenge ndash Read a chapter and answer

questions in the back of the book (Reddy 2003)

Focus on Science ndash Focus on science where

knowledge is explicitly stated

US Advanced Placement Exam ndash Use this US-based test of human

competence as a metric

Systems AI ndash Include the knowledge knowledge

acquisition reasoning and question answering pieces

Project Halo Development Strategy

KR

Expert

Single Domain

Expert

Small Team of

Domain and KR

Experts

Community of

Scientists

Teachers and KR

Experts

Logic

Queries

AP Question

Answering

AP QA

General QA

Education

AP QA

General QA

Education

Research

Authors

Uses

Pilot Phase II Inquire DA

3 Partial

AP Syllabi

Single

Textbook Partial

AP Syllabus Complete

Domain

2002-2003 2004-2009 2010-2015 2016-

What is Inquire

Inquire is a concept for a new kind of electronic textbook ndash Based on Campbell and Reese

Biology

Inquire is a standard electronic textbook but also ndash Contains a full underlying knowledge

base of the bookrsquos contents

ndash Answers the readerrsquos questions and provides tailored instruction

ndash Core technology applies to other areas of biology and other exact sciences (chemistry physics geology etc)

bull Is less applicable in history art languages philosophy etc

ndash Should support other languages

Inquirersquos Design Goals

An Inquire user can radic READ the textbook contents

dynamically interactively

radic ASK questions and get explanations on any subject in the book

ndash LEARN and master the subject through individualized tutoring

ndash CREATE and explore their own conceptualizations of the material

A Calculator for Biology ndash Simple questions are answered

from specific knowledge in the text

ndash Fully embedded in the textbook

Building a Question-Answering Textbook

Inquire Application

Knowledge Authoring

Question Answering

Educational Impact

Factual

Relationship

Biology Core Themes

All Questions

Domain experts

(biologists) enter

knowledge from

Campbellrsquos Biology

textbook into a

structured logical

knowledge base

The knowledge

base is connected

to a electronic

version of the

textbook running

on an iPad that

enables students

to ask questions

about the material

as they read

Measure retention

and understanding

for students using

Inquire against

students using

either a paper or

electronic version

of the textbook

Inquire answers a

wide range of

questions by

performing logical

inference on the

knowledge entered

into the knowledge

base

Inquire Demo

Digital Textbook iPad Application ndash eBook reader (with standard features)

Knowledge-based features ndash Glossary pages (from KB)

ndash Suggested questions and answers

ndash Free-form QA dialog

AURA Knowledge Server ndash Concept taxonomy from Campbell Biology

ndash AURA Knowledge Base (21 chapters)

ndash Suggested question generation

ndash Reasoning and question answering system

ndash AAAI Best Video of 2012

httpwwwyoutubecomwatchv=fTiW31MBtfA

Inquire vs Siri vs Watson vs Search Engines

Project Halorsquos Inquire ndash Subject Matter biology

ndash Questions Question suggestion and simple English queries

ndash Answers Formatted data and textbook content exact answers

Applersquos Siri ndash Subject Matter Actions on iPhone plus Wolfram Alpha searches

ndash Questions Voice commands short dialogs

ndash Answers Performs tasks Alpha-formatted pages

IBMrsquos Watson ndash Subject Matter General knowledge trivia attempting medical

ndash Questions Jepoardy-style ldquoreversedrdquo questions

ndash Answers Single word or short phrase answers

Google Yahoo Bing Baidu ndash Subject Matter Open domain web knowledge

ndash Questions keywords and search queries

ndash Answers web page listings

13

Three Main Components of Inquire

Task Encoding knowledge sentence-by-sentence

ndash Define a computer language for capturing knowledge

ndash Represent background knowledge and explicit knowledge

Systems AURA Semantic Media Wiki (SMW)

Metrics ndash Coverage of book at

various depths

ndash Time per sentence

RampD thrusts ndash SMW Lower authoring

costs by crowdsourcing

ndash SILK Capturing more knowledge

Task Map a student question to an answerable query

ndash English language interpretation

ndash Suggest good questions to student

Systems AURA Question Processor

Metrics ndash Usefulness of suggested

questions

ndash Accuracy of free-form question processing

RampD thrusts ndash Rephrasing

ndash Textual Entailment

ndash Learning suggested question mappings

ndash Question generation from the knowledge base

Task Generate a useful answer to a student question

ndash Derive an answer from the knowledge base

ndash English generation and media selection

ndash Answer presentation and relevance pruning

Systems AURA Cyc Text Analysis

Metrics ndash Answer accuracy

ndash Answer usefulness to student users

ndash Time to answer

RampD thrusts ndash Hybrid system Answer

more questions

Knowledge Entry Question Processing Question Answering

Knowledge from Campbell Biology in Inquire

Most-used biology textbook in the US both high school and college

Total Core Sentences 28878

56 Chapters 262 Sections ndash Avg 47 sectionschapter 515

sentenceschapter 109 sentencessection

ndash Relevant sentences in book (extrapolation from Chaps 6910) 49 14150 sentences

750 Core library concepts

~6000 total concepts in taxonomy ndash ~3000 concepts in Chaps 2-22

~65000 knowledge base assertions

~50000 suggested questions

4 hourssentence authoring time ndash Design planning encoding testing

refining new relation development

60-70 new encoding relationsyear

Inquirersquos Existing Knowledge Entry Process

Multiple 3-person teams

Highly accurate but costly ndash ~4 person-hours per

sentence

ndash Intensive testing and knowledge refinement

Biology Teams and AI experts

represent each sentence

Senior

SME

Senior KR

Expert

Concept

Map

Planning

Concept

Map Entry

Concept

Map

Testing

Content

Review

KR

Review

Assess amp

Correct

Knowledge Entry Group

Delhi India

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

Universal Truth Authoring Concept Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

UT Authoring cMap Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Planning (51)

Testing (39)

Encoding (10)

This is the well-known Knowledge Acquisition Problem ndash AI authoring is too expensive too slow not scalable

Three Possible Solutions ndash Automatic Machine ParsingEditing

bull Not good enough for biology textbook sentences

bull Error rates are too high

bull Need humans in the loop

ndash Mechanical Turk Authoring bull Biology expertise is difficult to get

bull Mechanical Turk uses individuals but the Knowledge Entry task appears to require coordination judgment discussion and working together

ndash Social Authoring and Crowdsourcing bull Wikipedia showed this could work for text

bull In 2008 Vulcan started a pilot project to explore social knowledge authoring

How Can We Improve Knowledge Entry

19

Crowdsourcing for Better Knowledge Acquisition

20

edit wow I can change the web

letrsquos share and publish knowledge

to make an [[encyclopedia]]

Success of Wikis

27

Actual number of articles on enwikipediaorg (thick

blue line) compared with a Gompertz model that leads

eventually to a maximum of about 44 million articles

(thin green line)

A Key Feature of Wiki

28

This distinguishes wikis from other publication tools

Consensus in Wikis Comes from

Collaboration ndash ~17 editspage on average in

Wikipedia (with high variance)

ndash Wikipediarsquos Neutral Point of View

Convention ndash Users follow customs and

conventions to engage with articles effectively

29

Software Support Makes Wikis Successful

Trivial to edit by anyone Tracking of all changes one-

step rollback Every article has a ldquoTalkrdquo page

for discussion Notification facility allows anyone

to ldquowatchrdquo an article Sufficient security on pages

logins can be required A hierarchy of administrators

gardeners and editors Software Bots recognize certain

kinds of vandalism and auto-revert or recognize articles that need work and flag them for editors

30

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

The Founding of Microsoft

Paul Allenrsquos Concept of the Digital Aristotle

ldquoDigital Aristotlerdquo ndash Inspired by the broad science fiction vision

of Big AI

ndash The volume of scientific knowledge has outpaced our ability to manage it

ndash Need systems to reason and answer questions rather than simply retrieve 1000s of relevant documents

bull Example ldquoWhat are the reaction products if metallic copper is heated strongly with concentrated hydrochloric acidrdquo

Project Formation ndash Initial Workshops conducted in 2001

ndash Digital Aristotle vision formulated

ndash Project Halo created in 2002

The Digital Aristotle and Project Halo

Project Halo is a staged long-range research effort by Vulcan

Inc towards the development of a Digital Aristotle

Project Halo envisions two primary classes of application

1 A digital textbook capable of answering student questions and

helping students with their work

2 A research assistant with broad interdisciplinary skills to help

scientists make connections

A Digital Aristotle is a reasoning

system capable of answering novel

questions and solving advanced

problems in a broad range of

scientific disciplines

Where to Begin

AI Grand Challenge ndash Read a chapter and answer

questions in the back of the book (Reddy 2003)

Focus on Science ndash Focus on science where

knowledge is explicitly stated

US Advanced Placement Exam ndash Use this US-based test of human

competence as a metric

Systems AI ndash Include the knowledge knowledge

acquisition reasoning and question answering pieces

Project Halo Development Strategy

KR

Expert

Single Domain

Expert

Small Team of

Domain and KR

Experts

Community of

Scientists

Teachers and KR

Experts

Logic

Queries

AP Question

Answering

AP QA

General QA

Education

AP QA

General QA

Education

Research

Authors

Uses

Pilot Phase II Inquire DA

3 Partial

AP Syllabi

Single

Textbook Partial

AP Syllabus Complete

Domain

2002-2003 2004-2009 2010-2015 2016-

What is Inquire

Inquire is a concept for a new kind of electronic textbook ndash Based on Campbell and Reese

Biology

Inquire is a standard electronic textbook but also ndash Contains a full underlying knowledge

base of the bookrsquos contents

ndash Answers the readerrsquos questions and provides tailored instruction

ndash Core technology applies to other areas of biology and other exact sciences (chemistry physics geology etc)

bull Is less applicable in history art languages philosophy etc

ndash Should support other languages

Inquirersquos Design Goals

An Inquire user can radic READ the textbook contents

dynamically interactively

radic ASK questions and get explanations on any subject in the book

ndash LEARN and master the subject through individualized tutoring

ndash CREATE and explore their own conceptualizations of the material

A Calculator for Biology ndash Simple questions are answered

from specific knowledge in the text

ndash Fully embedded in the textbook

Building a Question-Answering Textbook

Inquire Application

Knowledge Authoring

Question Answering

Educational Impact

Factual

Relationship

Biology Core Themes

All Questions

Domain experts

(biologists) enter

knowledge from

Campbellrsquos Biology

textbook into a

structured logical

knowledge base

The knowledge

base is connected

to a electronic

version of the

textbook running

on an iPad that

enables students

to ask questions

about the material

as they read

Measure retention

and understanding

for students using

Inquire against

students using

either a paper or

electronic version

of the textbook

Inquire answers a

wide range of

questions by

performing logical

inference on the

knowledge entered

into the knowledge

base

Inquire Demo

Digital Textbook iPad Application ndash eBook reader (with standard features)

Knowledge-based features ndash Glossary pages (from KB)

ndash Suggested questions and answers

ndash Free-form QA dialog

AURA Knowledge Server ndash Concept taxonomy from Campbell Biology

ndash AURA Knowledge Base (21 chapters)

ndash Suggested question generation

ndash Reasoning and question answering system

ndash AAAI Best Video of 2012

httpwwwyoutubecomwatchv=fTiW31MBtfA

Inquire vs Siri vs Watson vs Search Engines

Project Halorsquos Inquire ndash Subject Matter biology

ndash Questions Question suggestion and simple English queries

ndash Answers Formatted data and textbook content exact answers

Applersquos Siri ndash Subject Matter Actions on iPhone plus Wolfram Alpha searches

ndash Questions Voice commands short dialogs

ndash Answers Performs tasks Alpha-formatted pages

IBMrsquos Watson ndash Subject Matter General knowledge trivia attempting medical

ndash Questions Jepoardy-style ldquoreversedrdquo questions

ndash Answers Single word or short phrase answers

Google Yahoo Bing Baidu ndash Subject Matter Open domain web knowledge

ndash Questions keywords and search queries

ndash Answers web page listings

13

Three Main Components of Inquire

Task Encoding knowledge sentence-by-sentence

ndash Define a computer language for capturing knowledge

ndash Represent background knowledge and explicit knowledge

Systems AURA Semantic Media Wiki (SMW)

Metrics ndash Coverage of book at

various depths

ndash Time per sentence

RampD thrusts ndash SMW Lower authoring

costs by crowdsourcing

ndash SILK Capturing more knowledge

Task Map a student question to an answerable query

ndash English language interpretation

ndash Suggest good questions to student

Systems AURA Question Processor

Metrics ndash Usefulness of suggested

questions

ndash Accuracy of free-form question processing

RampD thrusts ndash Rephrasing

ndash Textual Entailment

ndash Learning suggested question mappings

ndash Question generation from the knowledge base

Task Generate a useful answer to a student question

ndash Derive an answer from the knowledge base

ndash English generation and media selection

ndash Answer presentation and relevance pruning

Systems AURA Cyc Text Analysis

Metrics ndash Answer accuracy

ndash Answer usefulness to student users

ndash Time to answer

RampD thrusts ndash Hybrid system Answer

more questions

Knowledge Entry Question Processing Question Answering

Knowledge from Campbell Biology in Inquire

Most-used biology textbook in the US both high school and college

Total Core Sentences 28878

56 Chapters 262 Sections ndash Avg 47 sectionschapter 515

sentenceschapter 109 sentencessection

ndash Relevant sentences in book (extrapolation from Chaps 6910) 49 14150 sentences

750 Core library concepts

~6000 total concepts in taxonomy ndash ~3000 concepts in Chaps 2-22

~65000 knowledge base assertions

~50000 suggested questions

4 hourssentence authoring time ndash Design planning encoding testing

refining new relation development

60-70 new encoding relationsyear

Inquirersquos Existing Knowledge Entry Process

Multiple 3-person teams

Highly accurate but costly ndash ~4 person-hours per

sentence

ndash Intensive testing and knowledge refinement

Biology Teams and AI experts

represent each sentence

Senior

SME

Senior KR

Expert

Concept

Map

Planning

Concept

Map Entry

Concept

Map

Testing

Content

Review

KR

Review

Assess amp

Correct

Knowledge Entry Group

Delhi India

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

Universal Truth Authoring Concept Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

UT Authoring cMap Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Planning (51)

Testing (39)

Encoding (10)

This is the well-known Knowledge Acquisition Problem ndash AI authoring is too expensive too slow not scalable

Three Possible Solutions ndash Automatic Machine ParsingEditing

bull Not good enough for biology textbook sentences

bull Error rates are too high

bull Need humans in the loop

ndash Mechanical Turk Authoring bull Biology expertise is difficult to get

bull Mechanical Turk uses individuals but the Knowledge Entry task appears to require coordination judgment discussion and working together

ndash Social Authoring and Crowdsourcing bull Wikipedia showed this could work for text

bull In 2008 Vulcan started a pilot project to explore social knowledge authoring

How Can We Improve Knowledge Entry

19

Crowdsourcing for Better Knowledge Acquisition

20

edit wow I can change the web

letrsquos share and publish knowledge

to make an [[encyclopedia]]

Success of Wikis

27

Actual number of articles on enwikipediaorg (thick

blue line) compared with a Gompertz model that leads

eventually to a maximum of about 44 million articles

(thin green line)

A Key Feature of Wiki

28

This distinguishes wikis from other publication tools

Consensus in Wikis Comes from

Collaboration ndash ~17 editspage on average in

Wikipedia (with high variance)

ndash Wikipediarsquos Neutral Point of View

Convention ndash Users follow customs and

conventions to engage with articles effectively

29

Software Support Makes Wikis Successful

Trivial to edit by anyone Tracking of all changes one-

step rollback Every article has a ldquoTalkrdquo page

for discussion Notification facility allows anyone

to ldquowatchrdquo an article Sufficient security on pages

logins can be required A hierarchy of administrators

gardeners and editors Software Bots recognize certain

kinds of vandalism and auto-revert or recognize articles that need work and flag them for editors

30

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Paul Allenrsquos Concept of the Digital Aristotle

ldquoDigital Aristotlerdquo ndash Inspired by the broad science fiction vision

of Big AI

ndash The volume of scientific knowledge has outpaced our ability to manage it

ndash Need systems to reason and answer questions rather than simply retrieve 1000s of relevant documents

bull Example ldquoWhat are the reaction products if metallic copper is heated strongly with concentrated hydrochloric acidrdquo

Project Formation ndash Initial Workshops conducted in 2001

ndash Digital Aristotle vision formulated

ndash Project Halo created in 2002

The Digital Aristotle and Project Halo

Project Halo is a staged long-range research effort by Vulcan

Inc towards the development of a Digital Aristotle

Project Halo envisions two primary classes of application

1 A digital textbook capable of answering student questions and

helping students with their work

2 A research assistant with broad interdisciplinary skills to help

scientists make connections

A Digital Aristotle is a reasoning

system capable of answering novel

questions and solving advanced

problems in a broad range of

scientific disciplines

Where to Begin

AI Grand Challenge ndash Read a chapter and answer

questions in the back of the book (Reddy 2003)

Focus on Science ndash Focus on science where

knowledge is explicitly stated

US Advanced Placement Exam ndash Use this US-based test of human

competence as a metric

Systems AI ndash Include the knowledge knowledge

acquisition reasoning and question answering pieces

Project Halo Development Strategy

KR

Expert

Single Domain

Expert

Small Team of

Domain and KR

Experts

Community of

Scientists

Teachers and KR

Experts

Logic

Queries

AP Question

Answering

AP QA

General QA

Education

AP QA

General QA

Education

Research

Authors

Uses

Pilot Phase II Inquire DA

3 Partial

AP Syllabi

Single

Textbook Partial

AP Syllabus Complete

Domain

2002-2003 2004-2009 2010-2015 2016-

What is Inquire

Inquire is a concept for a new kind of electronic textbook ndash Based on Campbell and Reese

Biology

Inquire is a standard electronic textbook but also ndash Contains a full underlying knowledge

base of the bookrsquos contents

ndash Answers the readerrsquos questions and provides tailored instruction

ndash Core technology applies to other areas of biology and other exact sciences (chemistry physics geology etc)

bull Is less applicable in history art languages philosophy etc

ndash Should support other languages

Inquirersquos Design Goals

An Inquire user can radic READ the textbook contents

dynamically interactively

radic ASK questions and get explanations on any subject in the book

ndash LEARN and master the subject through individualized tutoring

ndash CREATE and explore their own conceptualizations of the material

A Calculator for Biology ndash Simple questions are answered

from specific knowledge in the text

ndash Fully embedded in the textbook

Building a Question-Answering Textbook

Inquire Application

Knowledge Authoring

Question Answering

Educational Impact

Factual

Relationship

Biology Core Themes

All Questions

Domain experts

(biologists) enter

knowledge from

Campbellrsquos Biology

textbook into a

structured logical

knowledge base

The knowledge

base is connected

to a electronic

version of the

textbook running

on an iPad that

enables students

to ask questions

about the material

as they read

Measure retention

and understanding

for students using

Inquire against

students using

either a paper or

electronic version

of the textbook

Inquire answers a

wide range of

questions by

performing logical

inference on the

knowledge entered

into the knowledge

base

Inquire Demo

Digital Textbook iPad Application ndash eBook reader (with standard features)

Knowledge-based features ndash Glossary pages (from KB)

ndash Suggested questions and answers

ndash Free-form QA dialog

AURA Knowledge Server ndash Concept taxonomy from Campbell Biology

ndash AURA Knowledge Base (21 chapters)

ndash Suggested question generation

ndash Reasoning and question answering system

ndash AAAI Best Video of 2012

httpwwwyoutubecomwatchv=fTiW31MBtfA

Inquire vs Siri vs Watson vs Search Engines

Project Halorsquos Inquire ndash Subject Matter biology

ndash Questions Question suggestion and simple English queries

ndash Answers Formatted data and textbook content exact answers

Applersquos Siri ndash Subject Matter Actions on iPhone plus Wolfram Alpha searches

ndash Questions Voice commands short dialogs

ndash Answers Performs tasks Alpha-formatted pages

IBMrsquos Watson ndash Subject Matter General knowledge trivia attempting medical

ndash Questions Jepoardy-style ldquoreversedrdquo questions

ndash Answers Single word or short phrase answers

Google Yahoo Bing Baidu ndash Subject Matter Open domain web knowledge

ndash Questions keywords and search queries

ndash Answers web page listings

13

Three Main Components of Inquire

Task Encoding knowledge sentence-by-sentence

ndash Define a computer language for capturing knowledge

ndash Represent background knowledge and explicit knowledge

Systems AURA Semantic Media Wiki (SMW)

Metrics ndash Coverage of book at

various depths

ndash Time per sentence

RampD thrusts ndash SMW Lower authoring

costs by crowdsourcing

ndash SILK Capturing more knowledge

Task Map a student question to an answerable query

ndash English language interpretation

ndash Suggest good questions to student

Systems AURA Question Processor

Metrics ndash Usefulness of suggested

questions

ndash Accuracy of free-form question processing

RampD thrusts ndash Rephrasing

ndash Textual Entailment

ndash Learning suggested question mappings

ndash Question generation from the knowledge base

Task Generate a useful answer to a student question

ndash Derive an answer from the knowledge base

ndash English generation and media selection

ndash Answer presentation and relevance pruning

Systems AURA Cyc Text Analysis

Metrics ndash Answer accuracy

ndash Answer usefulness to student users

ndash Time to answer

RampD thrusts ndash Hybrid system Answer

more questions

Knowledge Entry Question Processing Question Answering

Knowledge from Campbell Biology in Inquire

Most-used biology textbook in the US both high school and college

Total Core Sentences 28878

56 Chapters 262 Sections ndash Avg 47 sectionschapter 515

sentenceschapter 109 sentencessection

ndash Relevant sentences in book (extrapolation from Chaps 6910) 49 14150 sentences

750 Core library concepts

~6000 total concepts in taxonomy ndash ~3000 concepts in Chaps 2-22

~65000 knowledge base assertions

~50000 suggested questions

4 hourssentence authoring time ndash Design planning encoding testing

refining new relation development

60-70 new encoding relationsyear

Inquirersquos Existing Knowledge Entry Process

Multiple 3-person teams

Highly accurate but costly ndash ~4 person-hours per

sentence

ndash Intensive testing and knowledge refinement

Biology Teams and AI experts

represent each sentence

Senior

SME

Senior KR

Expert

Concept

Map

Planning

Concept

Map Entry

Concept

Map

Testing

Content

Review

KR

Review

Assess amp

Correct

Knowledge Entry Group

Delhi India

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

Universal Truth Authoring Concept Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

UT Authoring cMap Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Planning (51)

Testing (39)

Encoding (10)

This is the well-known Knowledge Acquisition Problem ndash AI authoring is too expensive too slow not scalable

Three Possible Solutions ndash Automatic Machine ParsingEditing

bull Not good enough for biology textbook sentences

bull Error rates are too high

bull Need humans in the loop

ndash Mechanical Turk Authoring bull Biology expertise is difficult to get

bull Mechanical Turk uses individuals but the Knowledge Entry task appears to require coordination judgment discussion and working together

ndash Social Authoring and Crowdsourcing bull Wikipedia showed this could work for text

bull In 2008 Vulcan started a pilot project to explore social knowledge authoring

How Can We Improve Knowledge Entry

19

Crowdsourcing for Better Knowledge Acquisition

20

edit wow I can change the web

letrsquos share and publish knowledge

to make an [[encyclopedia]]

Success of Wikis

27

Actual number of articles on enwikipediaorg (thick

blue line) compared with a Gompertz model that leads

eventually to a maximum of about 44 million articles

(thin green line)

A Key Feature of Wiki

28

This distinguishes wikis from other publication tools

Consensus in Wikis Comes from

Collaboration ndash ~17 editspage on average in

Wikipedia (with high variance)

ndash Wikipediarsquos Neutral Point of View

Convention ndash Users follow customs and

conventions to engage with articles effectively

29

Software Support Makes Wikis Successful

Trivial to edit by anyone Tracking of all changes one-

step rollback Every article has a ldquoTalkrdquo page

for discussion Notification facility allows anyone

to ldquowatchrdquo an article Sufficient security on pages

logins can be required A hierarchy of administrators

gardeners and editors Software Bots recognize certain

kinds of vandalism and auto-revert or recognize articles that need work and flag them for editors

30

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

The Digital Aristotle and Project Halo

Project Halo is a staged long-range research effort by Vulcan

Inc towards the development of a Digital Aristotle

Project Halo envisions two primary classes of application

1 A digital textbook capable of answering student questions and

helping students with their work

2 A research assistant with broad interdisciplinary skills to help

scientists make connections

A Digital Aristotle is a reasoning

system capable of answering novel

questions and solving advanced

problems in a broad range of

scientific disciplines

Where to Begin

AI Grand Challenge ndash Read a chapter and answer

questions in the back of the book (Reddy 2003)

Focus on Science ndash Focus on science where

knowledge is explicitly stated

US Advanced Placement Exam ndash Use this US-based test of human

competence as a metric

Systems AI ndash Include the knowledge knowledge

acquisition reasoning and question answering pieces

Project Halo Development Strategy

KR

Expert

Single Domain

Expert

Small Team of

Domain and KR

Experts

Community of

Scientists

Teachers and KR

Experts

Logic

Queries

AP Question

Answering

AP QA

General QA

Education

AP QA

General QA

Education

Research

Authors

Uses

Pilot Phase II Inquire DA

3 Partial

AP Syllabi

Single

Textbook Partial

AP Syllabus Complete

Domain

2002-2003 2004-2009 2010-2015 2016-

What is Inquire

Inquire is a concept for a new kind of electronic textbook ndash Based on Campbell and Reese

Biology

Inquire is a standard electronic textbook but also ndash Contains a full underlying knowledge

base of the bookrsquos contents

ndash Answers the readerrsquos questions and provides tailored instruction

ndash Core technology applies to other areas of biology and other exact sciences (chemistry physics geology etc)

bull Is less applicable in history art languages philosophy etc

ndash Should support other languages

Inquirersquos Design Goals

An Inquire user can radic READ the textbook contents

dynamically interactively

radic ASK questions and get explanations on any subject in the book

ndash LEARN and master the subject through individualized tutoring

ndash CREATE and explore their own conceptualizations of the material

A Calculator for Biology ndash Simple questions are answered

from specific knowledge in the text

ndash Fully embedded in the textbook

Building a Question-Answering Textbook

Inquire Application

Knowledge Authoring

Question Answering

Educational Impact

Factual

Relationship

Biology Core Themes

All Questions

Domain experts

(biologists) enter

knowledge from

Campbellrsquos Biology

textbook into a

structured logical

knowledge base

The knowledge

base is connected

to a electronic

version of the

textbook running

on an iPad that

enables students

to ask questions

about the material

as they read

Measure retention

and understanding

for students using

Inquire against

students using

either a paper or

electronic version

of the textbook

Inquire answers a

wide range of

questions by

performing logical

inference on the

knowledge entered

into the knowledge

base

Inquire Demo

Digital Textbook iPad Application ndash eBook reader (with standard features)

Knowledge-based features ndash Glossary pages (from KB)

ndash Suggested questions and answers

ndash Free-form QA dialog

AURA Knowledge Server ndash Concept taxonomy from Campbell Biology

ndash AURA Knowledge Base (21 chapters)

ndash Suggested question generation

ndash Reasoning and question answering system

ndash AAAI Best Video of 2012

httpwwwyoutubecomwatchv=fTiW31MBtfA

Inquire vs Siri vs Watson vs Search Engines

Project Halorsquos Inquire ndash Subject Matter biology

ndash Questions Question suggestion and simple English queries

ndash Answers Formatted data and textbook content exact answers

Applersquos Siri ndash Subject Matter Actions on iPhone plus Wolfram Alpha searches

ndash Questions Voice commands short dialogs

ndash Answers Performs tasks Alpha-formatted pages

IBMrsquos Watson ndash Subject Matter General knowledge trivia attempting medical

ndash Questions Jepoardy-style ldquoreversedrdquo questions

ndash Answers Single word or short phrase answers

Google Yahoo Bing Baidu ndash Subject Matter Open domain web knowledge

ndash Questions keywords and search queries

ndash Answers web page listings

13

Three Main Components of Inquire

Task Encoding knowledge sentence-by-sentence

ndash Define a computer language for capturing knowledge

ndash Represent background knowledge and explicit knowledge

Systems AURA Semantic Media Wiki (SMW)

Metrics ndash Coverage of book at

various depths

ndash Time per sentence

RampD thrusts ndash SMW Lower authoring

costs by crowdsourcing

ndash SILK Capturing more knowledge

Task Map a student question to an answerable query

ndash English language interpretation

ndash Suggest good questions to student

Systems AURA Question Processor

Metrics ndash Usefulness of suggested

questions

ndash Accuracy of free-form question processing

RampD thrusts ndash Rephrasing

ndash Textual Entailment

ndash Learning suggested question mappings

ndash Question generation from the knowledge base

Task Generate a useful answer to a student question

ndash Derive an answer from the knowledge base

ndash English generation and media selection

ndash Answer presentation and relevance pruning

Systems AURA Cyc Text Analysis

Metrics ndash Answer accuracy

ndash Answer usefulness to student users

ndash Time to answer

RampD thrusts ndash Hybrid system Answer

more questions

Knowledge Entry Question Processing Question Answering

Knowledge from Campbell Biology in Inquire

Most-used biology textbook in the US both high school and college

Total Core Sentences 28878

56 Chapters 262 Sections ndash Avg 47 sectionschapter 515

sentenceschapter 109 sentencessection

ndash Relevant sentences in book (extrapolation from Chaps 6910) 49 14150 sentences

750 Core library concepts

~6000 total concepts in taxonomy ndash ~3000 concepts in Chaps 2-22

~65000 knowledge base assertions

~50000 suggested questions

4 hourssentence authoring time ndash Design planning encoding testing

refining new relation development

60-70 new encoding relationsyear

Inquirersquos Existing Knowledge Entry Process

Multiple 3-person teams

Highly accurate but costly ndash ~4 person-hours per

sentence

ndash Intensive testing and knowledge refinement

Biology Teams and AI experts

represent each sentence

Senior

SME

Senior KR

Expert

Concept

Map

Planning

Concept

Map Entry

Concept

Map

Testing

Content

Review

KR

Review

Assess amp

Correct

Knowledge Entry Group

Delhi India

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

Universal Truth Authoring Concept Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

UT Authoring cMap Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Planning (51)

Testing (39)

Encoding (10)

This is the well-known Knowledge Acquisition Problem ndash AI authoring is too expensive too slow not scalable

Three Possible Solutions ndash Automatic Machine ParsingEditing

bull Not good enough for biology textbook sentences

bull Error rates are too high

bull Need humans in the loop

ndash Mechanical Turk Authoring bull Biology expertise is difficult to get

bull Mechanical Turk uses individuals but the Knowledge Entry task appears to require coordination judgment discussion and working together

ndash Social Authoring and Crowdsourcing bull Wikipedia showed this could work for text

bull In 2008 Vulcan started a pilot project to explore social knowledge authoring

How Can We Improve Knowledge Entry

19

Crowdsourcing for Better Knowledge Acquisition

20

edit wow I can change the web

letrsquos share and publish knowledge

to make an [[encyclopedia]]

Success of Wikis

27

Actual number of articles on enwikipediaorg (thick

blue line) compared with a Gompertz model that leads

eventually to a maximum of about 44 million articles

(thin green line)

A Key Feature of Wiki

28

This distinguishes wikis from other publication tools

Consensus in Wikis Comes from

Collaboration ndash ~17 editspage on average in

Wikipedia (with high variance)

ndash Wikipediarsquos Neutral Point of View

Convention ndash Users follow customs and

conventions to engage with articles effectively

29

Software Support Makes Wikis Successful

Trivial to edit by anyone Tracking of all changes one-

step rollback Every article has a ldquoTalkrdquo page

for discussion Notification facility allows anyone

to ldquowatchrdquo an article Sufficient security on pages

logins can be required A hierarchy of administrators

gardeners and editors Software Bots recognize certain

kinds of vandalism and auto-revert or recognize articles that need work and flag them for editors

30

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Where to Begin

AI Grand Challenge ndash Read a chapter and answer

questions in the back of the book (Reddy 2003)

Focus on Science ndash Focus on science where

knowledge is explicitly stated

US Advanced Placement Exam ndash Use this US-based test of human

competence as a metric

Systems AI ndash Include the knowledge knowledge

acquisition reasoning and question answering pieces

Project Halo Development Strategy

KR

Expert

Single Domain

Expert

Small Team of

Domain and KR

Experts

Community of

Scientists

Teachers and KR

Experts

Logic

Queries

AP Question

Answering

AP QA

General QA

Education

AP QA

General QA

Education

Research

Authors

Uses

Pilot Phase II Inquire DA

3 Partial

AP Syllabi

Single

Textbook Partial

AP Syllabus Complete

Domain

2002-2003 2004-2009 2010-2015 2016-

What is Inquire

Inquire is a concept for a new kind of electronic textbook ndash Based on Campbell and Reese

Biology

Inquire is a standard electronic textbook but also ndash Contains a full underlying knowledge

base of the bookrsquos contents

ndash Answers the readerrsquos questions and provides tailored instruction

ndash Core technology applies to other areas of biology and other exact sciences (chemistry physics geology etc)

bull Is less applicable in history art languages philosophy etc

ndash Should support other languages

Inquirersquos Design Goals

An Inquire user can radic READ the textbook contents

dynamically interactively

radic ASK questions and get explanations on any subject in the book

ndash LEARN and master the subject through individualized tutoring

ndash CREATE and explore their own conceptualizations of the material

A Calculator for Biology ndash Simple questions are answered

from specific knowledge in the text

ndash Fully embedded in the textbook

Building a Question-Answering Textbook

Inquire Application

Knowledge Authoring

Question Answering

Educational Impact

Factual

Relationship

Biology Core Themes

All Questions

Domain experts

(biologists) enter

knowledge from

Campbellrsquos Biology

textbook into a

structured logical

knowledge base

The knowledge

base is connected

to a electronic

version of the

textbook running

on an iPad that

enables students

to ask questions

about the material

as they read

Measure retention

and understanding

for students using

Inquire against

students using

either a paper or

electronic version

of the textbook

Inquire answers a

wide range of

questions by

performing logical

inference on the

knowledge entered

into the knowledge

base

Inquire Demo

Digital Textbook iPad Application ndash eBook reader (with standard features)

Knowledge-based features ndash Glossary pages (from KB)

ndash Suggested questions and answers

ndash Free-form QA dialog

AURA Knowledge Server ndash Concept taxonomy from Campbell Biology

ndash AURA Knowledge Base (21 chapters)

ndash Suggested question generation

ndash Reasoning and question answering system

ndash AAAI Best Video of 2012

httpwwwyoutubecomwatchv=fTiW31MBtfA

Inquire vs Siri vs Watson vs Search Engines

Project Halorsquos Inquire ndash Subject Matter biology

ndash Questions Question suggestion and simple English queries

ndash Answers Formatted data and textbook content exact answers

Applersquos Siri ndash Subject Matter Actions on iPhone plus Wolfram Alpha searches

ndash Questions Voice commands short dialogs

ndash Answers Performs tasks Alpha-formatted pages

IBMrsquos Watson ndash Subject Matter General knowledge trivia attempting medical

ndash Questions Jepoardy-style ldquoreversedrdquo questions

ndash Answers Single word or short phrase answers

Google Yahoo Bing Baidu ndash Subject Matter Open domain web knowledge

ndash Questions keywords and search queries

ndash Answers web page listings

13

Three Main Components of Inquire

Task Encoding knowledge sentence-by-sentence

ndash Define a computer language for capturing knowledge

ndash Represent background knowledge and explicit knowledge

Systems AURA Semantic Media Wiki (SMW)

Metrics ndash Coverage of book at

various depths

ndash Time per sentence

RampD thrusts ndash SMW Lower authoring

costs by crowdsourcing

ndash SILK Capturing more knowledge

Task Map a student question to an answerable query

ndash English language interpretation

ndash Suggest good questions to student

Systems AURA Question Processor

Metrics ndash Usefulness of suggested

questions

ndash Accuracy of free-form question processing

RampD thrusts ndash Rephrasing

ndash Textual Entailment

ndash Learning suggested question mappings

ndash Question generation from the knowledge base

Task Generate a useful answer to a student question

ndash Derive an answer from the knowledge base

ndash English generation and media selection

ndash Answer presentation and relevance pruning

Systems AURA Cyc Text Analysis

Metrics ndash Answer accuracy

ndash Answer usefulness to student users

ndash Time to answer

RampD thrusts ndash Hybrid system Answer

more questions

Knowledge Entry Question Processing Question Answering

Knowledge from Campbell Biology in Inquire

Most-used biology textbook in the US both high school and college

Total Core Sentences 28878

56 Chapters 262 Sections ndash Avg 47 sectionschapter 515

sentenceschapter 109 sentencessection

ndash Relevant sentences in book (extrapolation from Chaps 6910) 49 14150 sentences

750 Core library concepts

~6000 total concepts in taxonomy ndash ~3000 concepts in Chaps 2-22

~65000 knowledge base assertions

~50000 suggested questions

4 hourssentence authoring time ndash Design planning encoding testing

refining new relation development

60-70 new encoding relationsyear

Inquirersquos Existing Knowledge Entry Process

Multiple 3-person teams

Highly accurate but costly ndash ~4 person-hours per

sentence

ndash Intensive testing and knowledge refinement

Biology Teams and AI experts

represent each sentence

Senior

SME

Senior KR

Expert

Concept

Map

Planning

Concept

Map Entry

Concept

Map

Testing

Content

Review

KR

Review

Assess amp

Correct

Knowledge Entry Group

Delhi India

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

Universal Truth Authoring Concept Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

UT Authoring cMap Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Planning (51)

Testing (39)

Encoding (10)

This is the well-known Knowledge Acquisition Problem ndash AI authoring is too expensive too slow not scalable

Three Possible Solutions ndash Automatic Machine ParsingEditing

bull Not good enough for biology textbook sentences

bull Error rates are too high

bull Need humans in the loop

ndash Mechanical Turk Authoring bull Biology expertise is difficult to get

bull Mechanical Turk uses individuals but the Knowledge Entry task appears to require coordination judgment discussion and working together

ndash Social Authoring and Crowdsourcing bull Wikipedia showed this could work for text

bull In 2008 Vulcan started a pilot project to explore social knowledge authoring

How Can We Improve Knowledge Entry

19

Crowdsourcing for Better Knowledge Acquisition

20

edit wow I can change the web

letrsquos share and publish knowledge

to make an [[encyclopedia]]

Success of Wikis

27

Actual number of articles on enwikipediaorg (thick

blue line) compared with a Gompertz model that leads

eventually to a maximum of about 44 million articles

(thin green line)

A Key Feature of Wiki

28

This distinguishes wikis from other publication tools

Consensus in Wikis Comes from

Collaboration ndash ~17 editspage on average in

Wikipedia (with high variance)

ndash Wikipediarsquos Neutral Point of View

Convention ndash Users follow customs and

conventions to engage with articles effectively

29

Software Support Makes Wikis Successful

Trivial to edit by anyone Tracking of all changes one-

step rollback Every article has a ldquoTalkrdquo page

for discussion Notification facility allows anyone

to ldquowatchrdquo an article Sufficient security on pages

logins can be required A hierarchy of administrators

gardeners and editors Software Bots recognize certain

kinds of vandalism and auto-revert or recognize articles that need work and flag them for editors

30

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Project Halo Development Strategy

KR

Expert

Single Domain

Expert

Small Team of

Domain and KR

Experts

Community of

Scientists

Teachers and KR

Experts

Logic

Queries

AP Question

Answering

AP QA

General QA

Education

AP QA

General QA

Education

Research

Authors

Uses

Pilot Phase II Inquire DA

3 Partial

AP Syllabi

Single

Textbook Partial

AP Syllabus Complete

Domain

2002-2003 2004-2009 2010-2015 2016-

What is Inquire

Inquire is a concept for a new kind of electronic textbook ndash Based on Campbell and Reese

Biology

Inquire is a standard electronic textbook but also ndash Contains a full underlying knowledge

base of the bookrsquos contents

ndash Answers the readerrsquos questions and provides tailored instruction

ndash Core technology applies to other areas of biology and other exact sciences (chemistry physics geology etc)

bull Is less applicable in history art languages philosophy etc

ndash Should support other languages

Inquirersquos Design Goals

An Inquire user can radic READ the textbook contents

dynamically interactively

radic ASK questions and get explanations on any subject in the book

ndash LEARN and master the subject through individualized tutoring

ndash CREATE and explore their own conceptualizations of the material

A Calculator for Biology ndash Simple questions are answered

from specific knowledge in the text

ndash Fully embedded in the textbook

Building a Question-Answering Textbook

Inquire Application

Knowledge Authoring

Question Answering

Educational Impact

Factual

Relationship

Biology Core Themes

All Questions

Domain experts

(biologists) enter

knowledge from

Campbellrsquos Biology

textbook into a

structured logical

knowledge base

The knowledge

base is connected

to a electronic

version of the

textbook running

on an iPad that

enables students

to ask questions

about the material

as they read

Measure retention

and understanding

for students using

Inquire against

students using

either a paper or

electronic version

of the textbook

Inquire answers a

wide range of

questions by

performing logical

inference on the

knowledge entered

into the knowledge

base

Inquire Demo

Digital Textbook iPad Application ndash eBook reader (with standard features)

Knowledge-based features ndash Glossary pages (from KB)

ndash Suggested questions and answers

ndash Free-form QA dialog

AURA Knowledge Server ndash Concept taxonomy from Campbell Biology

ndash AURA Knowledge Base (21 chapters)

ndash Suggested question generation

ndash Reasoning and question answering system

ndash AAAI Best Video of 2012

httpwwwyoutubecomwatchv=fTiW31MBtfA

Inquire vs Siri vs Watson vs Search Engines

Project Halorsquos Inquire ndash Subject Matter biology

ndash Questions Question suggestion and simple English queries

ndash Answers Formatted data and textbook content exact answers

Applersquos Siri ndash Subject Matter Actions on iPhone plus Wolfram Alpha searches

ndash Questions Voice commands short dialogs

ndash Answers Performs tasks Alpha-formatted pages

IBMrsquos Watson ndash Subject Matter General knowledge trivia attempting medical

ndash Questions Jepoardy-style ldquoreversedrdquo questions

ndash Answers Single word or short phrase answers

Google Yahoo Bing Baidu ndash Subject Matter Open domain web knowledge

ndash Questions keywords and search queries

ndash Answers web page listings

13

Three Main Components of Inquire

Task Encoding knowledge sentence-by-sentence

ndash Define a computer language for capturing knowledge

ndash Represent background knowledge and explicit knowledge

Systems AURA Semantic Media Wiki (SMW)

Metrics ndash Coverage of book at

various depths

ndash Time per sentence

RampD thrusts ndash SMW Lower authoring

costs by crowdsourcing

ndash SILK Capturing more knowledge

Task Map a student question to an answerable query

ndash English language interpretation

ndash Suggest good questions to student

Systems AURA Question Processor

Metrics ndash Usefulness of suggested

questions

ndash Accuracy of free-form question processing

RampD thrusts ndash Rephrasing

ndash Textual Entailment

ndash Learning suggested question mappings

ndash Question generation from the knowledge base

Task Generate a useful answer to a student question

ndash Derive an answer from the knowledge base

ndash English generation and media selection

ndash Answer presentation and relevance pruning

Systems AURA Cyc Text Analysis

Metrics ndash Answer accuracy

ndash Answer usefulness to student users

ndash Time to answer

RampD thrusts ndash Hybrid system Answer

more questions

Knowledge Entry Question Processing Question Answering

Knowledge from Campbell Biology in Inquire

Most-used biology textbook in the US both high school and college

Total Core Sentences 28878

56 Chapters 262 Sections ndash Avg 47 sectionschapter 515

sentenceschapter 109 sentencessection

ndash Relevant sentences in book (extrapolation from Chaps 6910) 49 14150 sentences

750 Core library concepts

~6000 total concepts in taxonomy ndash ~3000 concepts in Chaps 2-22

~65000 knowledge base assertions

~50000 suggested questions

4 hourssentence authoring time ndash Design planning encoding testing

refining new relation development

60-70 new encoding relationsyear

Inquirersquos Existing Knowledge Entry Process

Multiple 3-person teams

Highly accurate but costly ndash ~4 person-hours per

sentence

ndash Intensive testing and knowledge refinement

Biology Teams and AI experts

represent each sentence

Senior

SME

Senior KR

Expert

Concept

Map

Planning

Concept

Map Entry

Concept

Map

Testing

Content

Review

KR

Review

Assess amp

Correct

Knowledge Entry Group

Delhi India

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

Universal Truth Authoring Concept Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

UT Authoring cMap Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Planning (51)

Testing (39)

Encoding (10)

This is the well-known Knowledge Acquisition Problem ndash AI authoring is too expensive too slow not scalable

Three Possible Solutions ndash Automatic Machine ParsingEditing

bull Not good enough for biology textbook sentences

bull Error rates are too high

bull Need humans in the loop

ndash Mechanical Turk Authoring bull Biology expertise is difficult to get

bull Mechanical Turk uses individuals but the Knowledge Entry task appears to require coordination judgment discussion and working together

ndash Social Authoring and Crowdsourcing bull Wikipedia showed this could work for text

bull In 2008 Vulcan started a pilot project to explore social knowledge authoring

How Can We Improve Knowledge Entry

19

Crowdsourcing for Better Knowledge Acquisition

20

edit wow I can change the web

letrsquos share and publish knowledge

to make an [[encyclopedia]]

Success of Wikis

27

Actual number of articles on enwikipediaorg (thick

blue line) compared with a Gompertz model that leads

eventually to a maximum of about 44 million articles

(thin green line)

A Key Feature of Wiki

28

This distinguishes wikis from other publication tools

Consensus in Wikis Comes from

Collaboration ndash ~17 editspage on average in

Wikipedia (with high variance)

ndash Wikipediarsquos Neutral Point of View

Convention ndash Users follow customs and

conventions to engage with articles effectively

29

Software Support Makes Wikis Successful

Trivial to edit by anyone Tracking of all changes one-

step rollback Every article has a ldquoTalkrdquo page

for discussion Notification facility allows anyone

to ldquowatchrdquo an article Sufficient security on pages

logins can be required A hierarchy of administrators

gardeners and editors Software Bots recognize certain

kinds of vandalism and auto-revert or recognize articles that need work and flag them for editors

30

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

What is Inquire

Inquire is a concept for a new kind of electronic textbook ndash Based on Campbell and Reese

Biology

Inquire is a standard electronic textbook but also ndash Contains a full underlying knowledge

base of the bookrsquos contents

ndash Answers the readerrsquos questions and provides tailored instruction

ndash Core technology applies to other areas of biology and other exact sciences (chemistry physics geology etc)

bull Is less applicable in history art languages philosophy etc

ndash Should support other languages

Inquirersquos Design Goals

An Inquire user can radic READ the textbook contents

dynamically interactively

radic ASK questions and get explanations on any subject in the book

ndash LEARN and master the subject through individualized tutoring

ndash CREATE and explore their own conceptualizations of the material

A Calculator for Biology ndash Simple questions are answered

from specific knowledge in the text

ndash Fully embedded in the textbook

Building a Question-Answering Textbook

Inquire Application

Knowledge Authoring

Question Answering

Educational Impact

Factual

Relationship

Biology Core Themes

All Questions

Domain experts

(biologists) enter

knowledge from

Campbellrsquos Biology

textbook into a

structured logical

knowledge base

The knowledge

base is connected

to a electronic

version of the

textbook running

on an iPad that

enables students

to ask questions

about the material

as they read

Measure retention

and understanding

for students using

Inquire against

students using

either a paper or

electronic version

of the textbook

Inquire answers a

wide range of

questions by

performing logical

inference on the

knowledge entered

into the knowledge

base

Inquire Demo

Digital Textbook iPad Application ndash eBook reader (with standard features)

Knowledge-based features ndash Glossary pages (from KB)

ndash Suggested questions and answers

ndash Free-form QA dialog

AURA Knowledge Server ndash Concept taxonomy from Campbell Biology

ndash AURA Knowledge Base (21 chapters)

ndash Suggested question generation

ndash Reasoning and question answering system

ndash AAAI Best Video of 2012

httpwwwyoutubecomwatchv=fTiW31MBtfA

Inquire vs Siri vs Watson vs Search Engines

Project Halorsquos Inquire ndash Subject Matter biology

ndash Questions Question suggestion and simple English queries

ndash Answers Formatted data and textbook content exact answers

Applersquos Siri ndash Subject Matter Actions on iPhone plus Wolfram Alpha searches

ndash Questions Voice commands short dialogs

ndash Answers Performs tasks Alpha-formatted pages

IBMrsquos Watson ndash Subject Matter General knowledge trivia attempting medical

ndash Questions Jepoardy-style ldquoreversedrdquo questions

ndash Answers Single word or short phrase answers

Google Yahoo Bing Baidu ndash Subject Matter Open domain web knowledge

ndash Questions keywords and search queries

ndash Answers web page listings

13

Three Main Components of Inquire

Task Encoding knowledge sentence-by-sentence

ndash Define a computer language for capturing knowledge

ndash Represent background knowledge and explicit knowledge

Systems AURA Semantic Media Wiki (SMW)

Metrics ndash Coverage of book at

various depths

ndash Time per sentence

RampD thrusts ndash SMW Lower authoring

costs by crowdsourcing

ndash SILK Capturing more knowledge

Task Map a student question to an answerable query

ndash English language interpretation

ndash Suggest good questions to student

Systems AURA Question Processor

Metrics ndash Usefulness of suggested

questions

ndash Accuracy of free-form question processing

RampD thrusts ndash Rephrasing

ndash Textual Entailment

ndash Learning suggested question mappings

ndash Question generation from the knowledge base

Task Generate a useful answer to a student question

ndash Derive an answer from the knowledge base

ndash English generation and media selection

ndash Answer presentation and relevance pruning

Systems AURA Cyc Text Analysis

Metrics ndash Answer accuracy

ndash Answer usefulness to student users

ndash Time to answer

RampD thrusts ndash Hybrid system Answer

more questions

Knowledge Entry Question Processing Question Answering

Knowledge from Campbell Biology in Inquire

Most-used biology textbook in the US both high school and college

Total Core Sentences 28878

56 Chapters 262 Sections ndash Avg 47 sectionschapter 515

sentenceschapter 109 sentencessection

ndash Relevant sentences in book (extrapolation from Chaps 6910) 49 14150 sentences

750 Core library concepts

~6000 total concepts in taxonomy ndash ~3000 concepts in Chaps 2-22

~65000 knowledge base assertions

~50000 suggested questions

4 hourssentence authoring time ndash Design planning encoding testing

refining new relation development

60-70 new encoding relationsyear

Inquirersquos Existing Knowledge Entry Process

Multiple 3-person teams

Highly accurate but costly ndash ~4 person-hours per

sentence

ndash Intensive testing and knowledge refinement

Biology Teams and AI experts

represent each sentence

Senior

SME

Senior KR

Expert

Concept

Map

Planning

Concept

Map Entry

Concept

Map

Testing

Content

Review

KR

Review

Assess amp

Correct

Knowledge Entry Group

Delhi India

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

Universal Truth Authoring Concept Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

UT Authoring cMap Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Planning (51)

Testing (39)

Encoding (10)

This is the well-known Knowledge Acquisition Problem ndash AI authoring is too expensive too slow not scalable

Three Possible Solutions ndash Automatic Machine ParsingEditing

bull Not good enough for biology textbook sentences

bull Error rates are too high

bull Need humans in the loop

ndash Mechanical Turk Authoring bull Biology expertise is difficult to get

bull Mechanical Turk uses individuals but the Knowledge Entry task appears to require coordination judgment discussion and working together

ndash Social Authoring and Crowdsourcing bull Wikipedia showed this could work for text

bull In 2008 Vulcan started a pilot project to explore social knowledge authoring

How Can We Improve Knowledge Entry

19

Crowdsourcing for Better Knowledge Acquisition

20

edit wow I can change the web

letrsquos share and publish knowledge

to make an [[encyclopedia]]

Success of Wikis

27

Actual number of articles on enwikipediaorg (thick

blue line) compared with a Gompertz model that leads

eventually to a maximum of about 44 million articles

(thin green line)

A Key Feature of Wiki

28

This distinguishes wikis from other publication tools

Consensus in Wikis Comes from

Collaboration ndash ~17 editspage on average in

Wikipedia (with high variance)

ndash Wikipediarsquos Neutral Point of View

Convention ndash Users follow customs and

conventions to engage with articles effectively

29

Software Support Makes Wikis Successful

Trivial to edit by anyone Tracking of all changes one-

step rollback Every article has a ldquoTalkrdquo page

for discussion Notification facility allows anyone

to ldquowatchrdquo an article Sufficient security on pages

logins can be required A hierarchy of administrators

gardeners and editors Software Bots recognize certain

kinds of vandalism and auto-revert or recognize articles that need work and flag them for editors

30

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Inquirersquos Design Goals

An Inquire user can radic READ the textbook contents

dynamically interactively

radic ASK questions and get explanations on any subject in the book

ndash LEARN and master the subject through individualized tutoring

ndash CREATE and explore their own conceptualizations of the material

A Calculator for Biology ndash Simple questions are answered

from specific knowledge in the text

ndash Fully embedded in the textbook

Building a Question-Answering Textbook

Inquire Application

Knowledge Authoring

Question Answering

Educational Impact

Factual

Relationship

Biology Core Themes

All Questions

Domain experts

(biologists) enter

knowledge from

Campbellrsquos Biology

textbook into a

structured logical

knowledge base

The knowledge

base is connected

to a electronic

version of the

textbook running

on an iPad that

enables students

to ask questions

about the material

as they read

Measure retention

and understanding

for students using

Inquire against

students using

either a paper or

electronic version

of the textbook

Inquire answers a

wide range of

questions by

performing logical

inference on the

knowledge entered

into the knowledge

base

Inquire Demo

Digital Textbook iPad Application ndash eBook reader (with standard features)

Knowledge-based features ndash Glossary pages (from KB)

ndash Suggested questions and answers

ndash Free-form QA dialog

AURA Knowledge Server ndash Concept taxonomy from Campbell Biology

ndash AURA Knowledge Base (21 chapters)

ndash Suggested question generation

ndash Reasoning and question answering system

ndash AAAI Best Video of 2012

httpwwwyoutubecomwatchv=fTiW31MBtfA

Inquire vs Siri vs Watson vs Search Engines

Project Halorsquos Inquire ndash Subject Matter biology

ndash Questions Question suggestion and simple English queries

ndash Answers Formatted data and textbook content exact answers

Applersquos Siri ndash Subject Matter Actions on iPhone plus Wolfram Alpha searches

ndash Questions Voice commands short dialogs

ndash Answers Performs tasks Alpha-formatted pages

IBMrsquos Watson ndash Subject Matter General knowledge trivia attempting medical

ndash Questions Jepoardy-style ldquoreversedrdquo questions

ndash Answers Single word or short phrase answers

Google Yahoo Bing Baidu ndash Subject Matter Open domain web knowledge

ndash Questions keywords and search queries

ndash Answers web page listings

13

Three Main Components of Inquire

Task Encoding knowledge sentence-by-sentence

ndash Define a computer language for capturing knowledge

ndash Represent background knowledge and explicit knowledge

Systems AURA Semantic Media Wiki (SMW)

Metrics ndash Coverage of book at

various depths

ndash Time per sentence

RampD thrusts ndash SMW Lower authoring

costs by crowdsourcing

ndash SILK Capturing more knowledge

Task Map a student question to an answerable query

ndash English language interpretation

ndash Suggest good questions to student

Systems AURA Question Processor

Metrics ndash Usefulness of suggested

questions

ndash Accuracy of free-form question processing

RampD thrusts ndash Rephrasing

ndash Textual Entailment

ndash Learning suggested question mappings

ndash Question generation from the knowledge base

Task Generate a useful answer to a student question

ndash Derive an answer from the knowledge base

ndash English generation and media selection

ndash Answer presentation and relevance pruning

Systems AURA Cyc Text Analysis

Metrics ndash Answer accuracy

ndash Answer usefulness to student users

ndash Time to answer

RampD thrusts ndash Hybrid system Answer

more questions

Knowledge Entry Question Processing Question Answering

Knowledge from Campbell Biology in Inquire

Most-used biology textbook in the US both high school and college

Total Core Sentences 28878

56 Chapters 262 Sections ndash Avg 47 sectionschapter 515

sentenceschapter 109 sentencessection

ndash Relevant sentences in book (extrapolation from Chaps 6910) 49 14150 sentences

750 Core library concepts

~6000 total concepts in taxonomy ndash ~3000 concepts in Chaps 2-22

~65000 knowledge base assertions

~50000 suggested questions

4 hourssentence authoring time ndash Design planning encoding testing

refining new relation development

60-70 new encoding relationsyear

Inquirersquos Existing Knowledge Entry Process

Multiple 3-person teams

Highly accurate but costly ndash ~4 person-hours per

sentence

ndash Intensive testing and knowledge refinement

Biology Teams and AI experts

represent each sentence

Senior

SME

Senior KR

Expert

Concept

Map

Planning

Concept

Map Entry

Concept

Map

Testing

Content

Review

KR

Review

Assess amp

Correct

Knowledge Entry Group

Delhi India

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

Universal Truth Authoring Concept Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

UT Authoring cMap Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Planning (51)

Testing (39)

Encoding (10)

This is the well-known Knowledge Acquisition Problem ndash AI authoring is too expensive too slow not scalable

Three Possible Solutions ndash Automatic Machine ParsingEditing

bull Not good enough for biology textbook sentences

bull Error rates are too high

bull Need humans in the loop

ndash Mechanical Turk Authoring bull Biology expertise is difficult to get

bull Mechanical Turk uses individuals but the Knowledge Entry task appears to require coordination judgment discussion and working together

ndash Social Authoring and Crowdsourcing bull Wikipedia showed this could work for text

bull In 2008 Vulcan started a pilot project to explore social knowledge authoring

How Can We Improve Knowledge Entry

19

Crowdsourcing for Better Knowledge Acquisition

20

edit wow I can change the web

letrsquos share and publish knowledge

to make an [[encyclopedia]]

Success of Wikis

27

Actual number of articles on enwikipediaorg (thick

blue line) compared with a Gompertz model that leads

eventually to a maximum of about 44 million articles

(thin green line)

A Key Feature of Wiki

28

This distinguishes wikis from other publication tools

Consensus in Wikis Comes from

Collaboration ndash ~17 editspage on average in

Wikipedia (with high variance)

ndash Wikipediarsquos Neutral Point of View

Convention ndash Users follow customs and

conventions to engage with articles effectively

29

Software Support Makes Wikis Successful

Trivial to edit by anyone Tracking of all changes one-

step rollback Every article has a ldquoTalkrdquo page

for discussion Notification facility allows anyone

to ldquowatchrdquo an article Sufficient security on pages

logins can be required A hierarchy of administrators

gardeners and editors Software Bots recognize certain

kinds of vandalism and auto-revert or recognize articles that need work and flag them for editors

30

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Building a Question-Answering Textbook

Inquire Application

Knowledge Authoring

Question Answering

Educational Impact

Factual

Relationship

Biology Core Themes

All Questions

Domain experts

(biologists) enter

knowledge from

Campbellrsquos Biology

textbook into a

structured logical

knowledge base

The knowledge

base is connected

to a electronic

version of the

textbook running

on an iPad that

enables students

to ask questions

about the material

as they read

Measure retention

and understanding

for students using

Inquire against

students using

either a paper or

electronic version

of the textbook

Inquire answers a

wide range of

questions by

performing logical

inference on the

knowledge entered

into the knowledge

base

Inquire Demo

Digital Textbook iPad Application ndash eBook reader (with standard features)

Knowledge-based features ndash Glossary pages (from KB)

ndash Suggested questions and answers

ndash Free-form QA dialog

AURA Knowledge Server ndash Concept taxonomy from Campbell Biology

ndash AURA Knowledge Base (21 chapters)

ndash Suggested question generation

ndash Reasoning and question answering system

ndash AAAI Best Video of 2012

httpwwwyoutubecomwatchv=fTiW31MBtfA

Inquire vs Siri vs Watson vs Search Engines

Project Halorsquos Inquire ndash Subject Matter biology

ndash Questions Question suggestion and simple English queries

ndash Answers Formatted data and textbook content exact answers

Applersquos Siri ndash Subject Matter Actions on iPhone plus Wolfram Alpha searches

ndash Questions Voice commands short dialogs

ndash Answers Performs tasks Alpha-formatted pages

IBMrsquos Watson ndash Subject Matter General knowledge trivia attempting medical

ndash Questions Jepoardy-style ldquoreversedrdquo questions

ndash Answers Single word or short phrase answers

Google Yahoo Bing Baidu ndash Subject Matter Open domain web knowledge

ndash Questions keywords and search queries

ndash Answers web page listings

13

Three Main Components of Inquire

Task Encoding knowledge sentence-by-sentence

ndash Define a computer language for capturing knowledge

ndash Represent background knowledge and explicit knowledge

Systems AURA Semantic Media Wiki (SMW)

Metrics ndash Coverage of book at

various depths

ndash Time per sentence

RampD thrusts ndash SMW Lower authoring

costs by crowdsourcing

ndash SILK Capturing more knowledge

Task Map a student question to an answerable query

ndash English language interpretation

ndash Suggest good questions to student

Systems AURA Question Processor

Metrics ndash Usefulness of suggested

questions

ndash Accuracy of free-form question processing

RampD thrusts ndash Rephrasing

ndash Textual Entailment

ndash Learning suggested question mappings

ndash Question generation from the knowledge base

Task Generate a useful answer to a student question

ndash Derive an answer from the knowledge base

ndash English generation and media selection

ndash Answer presentation and relevance pruning

Systems AURA Cyc Text Analysis

Metrics ndash Answer accuracy

ndash Answer usefulness to student users

ndash Time to answer

RampD thrusts ndash Hybrid system Answer

more questions

Knowledge Entry Question Processing Question Answering

Knowledge from Campbell Biology in Inquire

Most-used biology textbook in the US both high school and college

Total Core Sentences 28878

56 Chapters 262 Sections ndash Avg 47 sectionschapter 515

sentenceschapter 109 sentencessection

ndash Relevant sentences in book (extrapolation from Chaps 6910) 49 14150 sentences

750 Core library concepts

~6000 total concepts in taxonomy ndash ~3000 concepts in Chaps 2-22

~65000 knowledge base assertions

~50000 suggested questions

4 hourssentence authoring time ndash Design planning encoding testing

refining new relation development

60-70 new encoding relationsyear

Inquirersquos Existing Knowledge Entry Process

Multiple 3-person teams

Highly accurate but costly ndash ~4 person-hours per

sentence

ndash Intensive testing and knowledge refinement

Biology Teams and AI experts

represent each sentence

Senior

SME

Senior KR

Expert

Concept

Map

Planning

Concept

Map Entry

Concept

Map

Testing

Content

Review

KR

Review

Assess amp

Correct

Knowledge Entry Group

Delhi India

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

Universal Truth Authoring Concept Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

UT Authoring cMap Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Planning (51)

Testing (39)

Encoding (10)

This is the well-known Knowledge Acquisition Problem ndash AI authoring is too expensive too slow not scalable

Three Possible Solutions ndash Automatic Machine ParsingEditing

bull Not good enough for biology textbook sentences

bull Error rates are too high

bull Need humans in the loop

ndash Mechanical Turk Authoring bull Biology expertise is difficult to get

bull Mechanical Turk uses individuals but the Knowledge Entry task appears to require coordination judgment discussion and working together

ndash Social Authoring and Crowdsourcing bull Wikipedia showed this could work for text

bull In 2008 Vulcan started a pilot project to explore social knowledge authoring

How Can We Improve Knowledge Entry

19

Crowdsourcing for Better Knowledge Acquisition

20

edit wow I can change the web

letrsquos share and publish knowledge

to make an [[encyclopedia]]

Success of Wikis

27

Actual number of articles on enwikipediaorg (thick

blue line) compared with a Gompertz model that leads

eventually to a maximum of about 44 million articles

(thin green line)

A Key Feature of Wiki

28

This distinguishes wikis from other publication tools

Consensus in Wikis Comes from

Collaboration ndash ~17 editspage on average in

Wikipedia (with high variance)

ndash Wikipediarsquos Neutral Point of View

Convention ndash Users follow customs and

conventions to engage with articles effectively

29

Software Support Makes Wikis Successful

Trivial to edit by anyone Tracking of all changes one-

step rollback Every article has a ldquoTalkrdquo page

for discussion Notification facility allows anyone

to ldquowatchrdquo an article Sufficient security on pages

logins can be required A hierarchy of administrators

gardeners and editors Software Bots recognize certain

kinds of vandalism and auto-revert or recognize articles that need work and flag them for editors

30

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Inquire Demo

Digital Textbook iPad Application ndash eBook reader (with standard features)

Knowledge-based features ndash Glossary pages (from KB)

ndash Suggested questions and answers

ndash Free-form QA dialog

AURA Knowledge Server ndash Concept taxonomy from Campbell Biology

ndash AURA Knowledge Base (21 chapters)

ndash Suggested question generation

ndash Reasoning and question answering system

ndash AAAI Best Video of 2012

httpwwwyoutubecomwatchv=fTiW31MBtfA

Inquire vs Siri vs Watson vs Search Engines

Project Halorsquos Inquire ndash Subject Matter biology

ndash Questions Question suggestion and simple English queries

ndash Answers Formatted data and textbook content exact answers

Applersquos Siri ndash Subject Matter Actions on iPhone plus Wolfram Alpha searches

ndash Questions Voice commands short dialogs

ndash Answers Performs tasks Alpha-formatted pages

IBMrsquos Watson ndash Subject Matter General knowledge trivia attempting medical

ndash Questions Jepoardy-style ldquoreversedrdquo questions

ndash Answers Single word or short phrase answers

Google Yahoo Bing Baidu ndash Subject Matter Open domain web knowledge

ndash Questions keywords and search queries

ndash Answers web page listings

13

Three Main Components of Inquire

Task Encoding knowledge sentence-by-sentence

ndash Define a computer language for capturing knowledge

ndash Represent background knowledge and explicit knowledge

Systems AURA Semantic Media Wiki (SMW)

Metrics ndash Coverage of book at

various depths

ndash Time per sentence

RampD thrusts ndash SMW Lower authoring

costs by crowdsourcing

ndash SILK Capturing more knowledge

Task Map a student question to an answerable query

ndash English language interpretation

ndash Suggest good questions to student

Systems AURA Question Processor

Metrics ndash Usefulness of suggested

questions

ndash Accuracy of free-form question processing

RampD thrusts ndash Rephrasing

ndash Textual Entailment

ndash Learning suggested question mappings

ndash Question generation from the knowledge base

Task Generate a useful answer to a student question

ndash Derive an answer from the knowledge base

ndash English generation and media selection

ndash Answer presentation and relevance pruning

Systems AURA Cyc Text Analysis

Metrics ndash Answer accuracy

ndash Answer usefulness to student users

ndash Time to answer

RampD thrusts ndash Hybrid system Answer

more questions

Knowledge Entry Question Processing Question Answering

Knowledge from Campbell Biology in Inquire

Most-used biology textbook in the US both high school and college

Total Core Sentences 28878

56 Chapters 262 Sections ndash Avg 47 sectionschapter 515

sentenceschapter 109 sentencessection

ndash Relevant sentences in book (extrapolation from Chaps 6910) 49 14150 sentences

750 Core library concepts

~6000 total concepts in taxonomy ndash ~3000 concepts in Chaps 2-22

~65000 knowledge base assertions

~50000 suggested questions

4 hourssentence authoring time ndash Design planning encoding testing

refining new relation development

60-70 new encoding relationsyear

Inquirersquos Existing Knowledge Entry Process

Multiple 3-person teams

Highly accurate but costly ndash ~4 person-hours per

sentence

ndash Intensive testing and knowledge refinement

Biology Teams and AI experts

represent each sentence

Senior

SME

Senior KR

Expert

Concept

Map

Planning

Concept

Map Entry

Concept

Map

Testing

Content

Review

KR

Review

Assess amp

Correct

Knowledge Entry Group

Delhi India

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

Universal Truth Authoring Concept Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

UT Authoring cMap Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Planning (51)

Testing (39)

Encoding (10)

This is the well-known Knowledge Acquisition Problem ndash AI authoring is too expensive too slow not scalable

Three Possible Solutions ndash Automatic Machine ParsingEditing

bull Not good enough for biology textbook sentences

bull Error rates are too high

bull Need humans in the loop

ndash Mechanical Turk Authoring bull Biology expertise is difficult to get

bull Mechanical Turk uses individuals but the Knowledge Entry task appears to require coordination judgment discussion and working together

ndash Social Authoring and Crowdsourcing bull Wikipedia showed this could work for text

bull In 2008 Vulcan started a pilot project to explore social knowledge authoring

How Can We Improve Knowledge Entry

19

Crowdsourcing for Better Knowledge Acquisition

20

edit wow I can change the web

letrsquos share and publish knowledge

to make an [[encyclopedia]]

Success of Wikis

27

Actual number of articles on enwikipediaorg (thick

blue line) compared with a Gompertz model that leads

eventually to a maximum of about 44 million articles

(thin green line)

A Key Feature of Wiki

28

This distinguishes wikis from other publication tools

Consensus in Wikis Comes from

Collaboration ndash ~17 editspage on average in

Wikipedia (with high variance)

ndash Wikipediarsquos Neutral Point of View

Convention ndash Users follow customs and

conventions to engage with articles effectively

29

Software Support Makes Wikis Successful

Trivial to edit by anyone Tracking of all changes one-

step rollback Every article has a ldquoTalkrdquo page

for discussion Notification facility allows anyone

to ldquowatchrdquo an article Sufficient security on pages

logins can be required A hierarchy of administrators

gardeners and editors Software Bots recognize certain

kinds of vandalism and auto-revert or recognize articles that need work and flag them for editors

30

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Inquire vs Siri vs Watson vs Search Engines

Project Halorsquos Inquire ndash Subject Matter biology

ndash Questions Question suggestion and simple English queries

ndash Answers Formatted data and textbook content exact answers

Applersquos Siri ndash Subject Matter Actions on iPhone plus Wolfram Alpha searches

ndash Questions Voice commands short dialogs

ndash Answers Performs tasks Alpha-formatted pages

IBMrsquos Watson ndash Subject Matter General knowledge trivia attempting medical

ndash Questions Jepoardy-style ldquoreversedrdquo questions

ndash Answers Single word or short phrase answers

Google Yahoo Bing Baidu ndash Subject Matter Open domain web knowledge

ndash Questions keywords and search queries

ndash Answers web page listings

13

Three Main Components of Inquire

Task Encoding knowledge sentence-by-sentence

ndash Define a computer language for capturing knowledge

ndash Represent background knowledge and explicit knowledge

Systems AURA Semantic Media Wiki (SMW)

Metrics ndash Coverage of book at

various depths

ndash Time per sentence

RampD thrusts ndash SMW Lower authoring

costs by crowdsourcing

ndash SILK Capturing more knowledge

Task Map a student question to an answerable query

ndash English language interpretation

ndash Suggest good questions to student

Systems AURA Question Processor

Metrics ndash Usefulness of suggested

questions

ndash Accuracy of free-form question processing

RampD thrusts ndash Rephrasing

ndash Textual Entailment

ndash Learning suggested question mappings

ndash Question generation from the knowledge base

Task Generate a useful answer to a student question

ndash Derive an answer from the knowledge base

ndash English generation and media selection

ndash Answer presentation and relevance pruning

Systems AURA Cyc Text Analysis

Metrics ndash Answer accuracy

ndash Answer usefulness to student users

ndash Time to answer

RampD thrusts ndash Hybrid system Answer

more questions

Knowledge Entry Question Processing Question Answering

Knowledge from Campbell Biology in Inquire

Most-used biology textbook in the US both high school and college

Total Core Sentences 28878

56 Chapters 262 Sections ndash Avg 47 sectionschapter 515

sentenceschapter 109 sentencessection

ndash Relevant sentences in book (extrapolation from Chaps 6910) 49 14150 sentences

750 Core library concepts

~6000 total concepts in taxonomy ndash ~3000 concepts in Chaps 2-22

~65000 knowledge base assertions

~50000 suggested questions

4 hourssentence authoring time ndash Design planning encoding testing

refining new relation development

60-70 new encoding relationsyear

Inquirersquos Existing Knowledge Entry Process

Multiple 3-person teams

Highly accurate but costly ndash ~4 person-hours per

sentence

ndash Intensive testing and knowledge refinement

Biology Teams and AI experts

represent each sentence

Senior

SME

Senior KR

Expert

Concept

Map

Planning

Concept

Map Entry

Concept

Map

Testing

Content

Review

KR

Review

Assess amp

Correct

Knowledge Entry Group

Delhi India

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

Universal Truth Authoring Concept Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

UT Authoring cMap Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Planning (51)

Testing (39)

Encoding (10)

This is the well-known Knowledge Acquisition Problem ndash AI authoring is too expensive too slow not scalable

Three Possible Solutions ndash Automatic Machine ParsingEditing

bull Not good enough for biology textbook sentences

bull Error rates are too high

bull Need humans in the loop

ndash Mechanical Turk Authoring bull Biology expertise is difficult to get

bull Mechanical Turk uses individuals but the Knowledge Entry task appears to require coordination judgment discussion and working together

ndash Social Authoring and Crowdsourcing bull Wikipedia showed this could work for text

bull In 2008 Vulcan started a pilot project to explore social knowledge authoring

How Can We Improve Knowledge Entry

19

Crowdsourcing for Better Knowledge Acquisition

20

edit wow I can change the web

letrsquos share and publish knowledge

to make an [[encyclopedia]]

Success of Wikis

27

Actual number of articles on enwikipediaorg (thick

blue line) compared with a Gompertz model that leads

eventually to a maximum of about 44 million articles

(thin green line)

A Key Feature of Wiki

28

This distinguishes wikis from other publication tools

Consensus in Wikis Comes from

Collaboration ndash ~17 editspage on average in

Wikipedia (with high variance)

ndash Wikipediarsquos Neutral Point of View

Convention ndash Users follow customs and

conventions to engage with articles effectively

29

Software Support Makes Wikis Successful

Trivial to edit by anyone Tracking of all changes one-

step rollback Every article has a ldquoTalkrdquo page

for discussion Notification facility allows anyone

to ldquowatchrdquo an article Sufficient security on pages

logins can be required A hierarchy of administrators

gardeners and editors Software Bots recognize certain

kinds of vandalism and auto-revert or recognize articles that need work and flag them for editors

30

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Three Main Components of Inquire

Task Encoding knowledge sentence-by-sentence

ndash Define a computer language for capturing knowledge

ndash Represent background knowledge and explicit knowledge

Systems AURA Semantic Media Wiki (SMW)

Metrics ndash Coverage of book at

various depths

ndash Time per sentence

RampD thrusts ndash SMW Lower authoring

costs by crowdsourcing

ndash SILK Capturing more knowledge

Task Map a student question to an answerable query

ndash English language interpretation

ndash Suggest good questions to student

Systems AURA Question Processor

Metrics ndash Usefulness of suggested

questions

ndash Accuracy of free-form question processing

RampD thrusts ndash Rephrasing

ndash Textual Entailment

ndash Learning suggested question mappings

ndash Question generation from the knowledge base

Task Generate a useful answer to a student question

ndash Derive an answer from the knowledge base

ndash English generation and media selection

ndash Answer presentation and relevance pruning

Systems AURA Cyc Text Analysis

Metrics ndash Answer accuracy

ndash Answer usefulness to student users

ndash Time to answer

RampD thrusts ndash Hybrid system Answer

more questions

Knowledge Entry Question Processing Question Answering

Knowledge from Campbell Biology in Inquire

Most-used biology textbook in the US both high school and college

Total Core Sentences 28878

56 Chapters 262 Sections ndash Avg 47 sectionschapter 515

sentenceschapter 109 sentencessection

ndash Relevant sentences in book (extrapolation from Chaps 6910) 49 14150 sentences

750 Core library concepts

~6000 total concepts in taxonomy ndash ~3000 concepts in Chaps 2-22

~65000 knowledge base assertions

~50000 suggested questions

4 hourssentence authoring time ndash Design planning encoding testing

refining new relation development

60-70 new encoding relationsyear

Inquirersquos Existing Knowledge Entry Process

Multiple 3-person teams

Highly accurate but costly ndash ~4 person-hours per

sentence

ndash Intensive testing and knowledge refinement

Biology Teams and AI experts

represent each sentence

Senior

SME

Senior KR

Expert

Concept

Map

Planning

Concept

Map Entry

Concept

Map

Testing

Content

Review

KR

Review

Assess amp

Correct

Knowledge Entry Group

Delhi India

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

Universal Truth Authoring Concept Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

UT Authoring cMap Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Planning (51)

Testing (39)

Encoding (10)

This is the well-known Knowledge Acquisition Problem ndash AI authoring is too expensive too slow not scalable

Three Possible Solutions ndash Automatic Machine ParsingEditing

bull Not good enough for biology textbook sentences

bull Error rates are too high

bull Need humans in the loop

ndash Mechanical Turk Authoring bull Biology expertise is difficult to get

bull Mechanical Turk uses individuals but the Knowledge Entry task appears to require coordination judgment discussion and working together

ndash Social Authoring and Crowdsourcing bull Wikipedia showed this could work for text

bull In 2008 Vulcan started a pilot project to explore social knowledge authoring

How Can We Improve Knowledge Entry

19

Crowdsourcing for Better Knowledge Acquisition

20

edit wow I can change the web

letrsquos share and publish knowledge

to make an [[encyclopedia]]

Success of Wikis

27

Actual number of articles on enwikipediaorg (thick

blue line) compared with a Gompertz model that leads

eventually to a maximum of about 44 million articles

(thin green line)

A Key Feature of Wiki

28

This distinguishes wikis from other publication tools

Consensus in Wikis Comes from

Collaboration ndash ~17 editspage on average in

Wikipedia (with high variance)

ndash Wikipediarsquos Neutral Point of View

Convention ndash Users follow customs and

conventions to engage with articles effectively

29

Software Support Makes Wikis Successful

Trivial to edit by anyone Tracking of all changes one-

step rollback Every article has a ldquoTalkrdquo page

for discussion Notification facility allows anyone

to ldquowatchrdquo an article Sufficient security on pages

logins can be required A hierarchy of administrators

gardeners and editors Software Bots recognize certain

kinds of vandalism and auto-revert or recognize articles that need work and flag them for editors

30

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Knowledge from Campbell Biology in Inquire

Most-used biology textbook in the US both high school and college

Total Core Sentences 28878

56 Chapters 262 Sections ndash Avg 47 sectionschapter 515

sentenceschapter 109 sentencessection

ndash Relevant sentences in book (extrapolation from Chaps 6910) 49 14150 sentences

750 Core library concepts

~6000 total concepts in taxonomy ndash ~3000 concepts in Chaps 2-22

~65000 knowledge base assertions

~50000 suggested questions

4 hourssentence authoring time ndash Design planning encoding testing

refining new relation development

60-70 new encoding relationsyear

Inquirersquos Existing Knowledge Entry Process

Multiple 3-person teams

Highly accurate but costly ndash ~4 person-hours per

sentence

ndash Intensive testing and knowledge refinement

Biology Teams and AI experts

represent each sentence

Senior

SME

Senior KR

Expert

Concept

Map

Planning

Concept

Map Entry

Concept

Map

Testing

Content

Review

KR

Review

Assess amp

Correct

Knowledge Entry Group

Delhi India

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

Universal Truth Authoring Concept Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

UT Authoring cMap Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Planning (51)

Testing (39)

Encoding (10)

This is the well-known Knowledge Acquisition Problem ndash AI authoring is too expensive too slow not scalable

Three Possible Solutions ndash Automatic Machine ParsingEditing

bull Not good enough for biology textbook sentences

bull Error rates are too high

bull Need humans in the loop

ndash Mechanical Turk Authoring bull Biology expertise is difficult to get

bull Mechanical Turk uses individuals but the Knowledge Entry task appears to require coordination judgment discussion and working together

ndash Social Authoring and Crowdsourcing bull Wikipedia showed this could work for text

bull In 2008 Vulcan started a pilot project to explore social knowledge authoring

How Can We Improve Knowledge Entry

19

Crowdsourcing for Better Knowledge Acquisition

20

edit wow I can change the web

letrsquos share and publish knowledge

to make an [[encyclopedia]]

Success of Wikis

27

Actual number of articles on enwikipediaorg (thick

blue line) compared with a Gompertz model that leads

eventually to a maximum of about 44 million articles

(thin green line)

A Key Feature of Wiki

28

This distinguishes wikis from other publication tools

Consensus in Wikis Comes from

Collaboration ndash ~17 editspage on average in

Wikipedia (with high variance)

ndash Wikipediarsquos Neutral Point of View

Convention ndash Users follow customs and

conventions to engage with articles effectively

29

Software Support Makes Wikis Successful

Trivial to edit by anyone Tracking of all changes one-

step rollback Every article has a ldquoTalkrdquo page

for discussion Notification facility allows anyone

to ldquowatchrdquo an article Sufficient security on pages

logins can be required A hierarchy of administrators

gardeners and editors Software Bots recognize certain

kinds of vandalism and auto-revert or recognize articles that need work and flag them for editors

30

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Inquirersquos Existing Knowledge Entry Process

Multiple 3-person teams

Highly accurate but costly ndash ~4 person-hours per

sentence

ndash Intensive testing and knowledge refinement

Biology Teams and AI experts

represent each sentence

Senior

SME

Senior KR

Expert

Concept

Map

Planning

Concept

Map Entry

Concept

Map

Testing

Content

Review

KR

Review

Assess amp

Correct

Knowledge Entry Group

Delhi India

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

Universal Truth Authoring Concept Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

UT Authoring cMap Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Planning (51)

Testing (39)

Encoding (10)

This is the well-known Knowledge Acquisition Problem ndash AI authoring is too expensive too slow not scalable

Three Possible Solutions ndash Automatic Machine ParsingEditing

bull Not good enough for biology textbook sentences

bull Error rates are too high

bull Need humans in the loop

ndash Mechanical Turk Authoring bull Biology expertise is difficult to get

bull Mechanical Turk uses individuals but the Knowledge Entry task appears to require coordination judgment discussion and working together

ndash Social Authoring and Crowdsourcing bull Wikipedia showed this could work for text

bull In 2008 Vulcan started a pilot project to explore social knowledge authoring

How Can We Improve Knowledge Entry

19

Crowdsourcing for Better Knowledge Acquisition

20

edit wow I can change the web

letrsquos share and publish knowledge

to make an [[encyclopedia]]

Success of Wikis

27

Actual number of articles on enwikipediaorg (thick

blue line) compared with a Gompertz model that leads

eventually to a maximum of about 44 million articles

(thin green line)

A Key Feature of Wiki

28

This distinguishes wikis from other publication tools

Consensus in Wikis Comes from

Collaboration ndash ~17 editspage on average in

Wikipedia (with high variance)

ndash Wikipediarsquos Neutral Point of View

Convention ndash Users follow customs and

conventions to engage with articles effectively

29

Software Support Makes Wikis Successful

Trivial to edit by anyone Tracking of all changes one-

step rollback Every article has a ldquoTalkrdquo page

for discussion Notification facility allows anyone

to ldquowatchrdquo an article Sufficient security on pages

logins can be required A hierarchy of administrators

gardeners and editors Software Bots recognize certain

kinds of vandalism and auto-revert or recognize articles that need work and flag them for editors

30

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

Universal Truth Authoring Concept Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

UT Authoring cMap Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Planning (51)

Testing (39)

Encoding (10)

This is the well-known Knowledge Acquisition Problem ndash AI authoring is too expensive too slow not scalable

Three Possible Solutions ndash Automatic Machine ParsingEditing

bull Not good enough for biology textbook sentences

bull Error rates are too high

bull Need humans in the loop

ndash Mechanical Turk Authoring bull Biology expertise is difficult to get

bull Mechanical Turk uses individuals but the Knowledge Entry task appears to require coordination judgment discussion and working together

ndash Social Authoring and Crowdsourcing bull Wikipedia showed this could work for text

bull In 2008 Vulcan started a pilot project to explore social knowledge authoring

How Can We Improve Knowledge Entry

19

Crowdsourcing for Better Knowledge Acquisition

20

edit wow I can change the web

letrsquos share and publish knowledge

to make an [[encyclopedia]]

Success of Wikis

27

Actual number of articles on enwikipediaorg (thick

blue line) compared with a Gompertz model that leads

eventually to a maximum of about 44 million articles

(thin green line)

A Key Feature of Wiki

28

This distinguishes wikis from other publication tools

Consensus in Wikis Comes from

Collaboration ndash ~17 editspage on average in

Wikipedia (with high variance)

ndash Wikipediarsquos Neutral Point of View

Convention ndash Users follow customs and

conventions to engage with articles effectively

29

Software Support Makes Wikis Successful

Trivial to edit by anyone Tracking of all changes one-

step rollback Every article has a ldquoTalkrdquo page

for discussion Notification facility allows anyone

to ldquowatchrdquo an article Sufficient security on pages

logins can be required A hierarchy of administrators

gardeners and editors Software Bots recognize certain

kinds of vandalism and auto-revert or recognize articles that need work and flag them for editors

30

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Knowledge Entry Process

3) Encoding Planning -- 35 time

Group Common UTs ID KRKE Issues

ID Already Encoded Write How to Encode

Pre-Planning QA Check

Status Labeling Encoding Complete KR Issue (Closed)

2) Reaching Consensus -- 14 time

UT Authoring cMap Chosen QA Check

1) Determining Relevance -- 2 time

Highlighting Diagram Analysis QA Check

Status Labeling Relevant Irrelevant (Closed)

6) Question-Based Testing -- 14 time

Use Minimal Test Suite Reasoning JIRA Issues Filed

Encoder Fills KB Gaps

QA Check with Screenshots of ldquoPassing Comparison and Relationship Questions

5) Key Term Review -- 25 time

KR Evaluated by Modeling Expert and Biologist Encoder Makes Changes

KR Evaluated by Modeling Expert and Biologist

QA Check

4) Encoding -- 10 time

Encode File JIRA Issues QA Check

Status Labeling Encoding Complete KE Issue

Peach = EVS Light Blue = SRI

Planning (51)

Testing (39)

Encoding (10)

This is the well-known Knowledge Acquisition Problem ndash AI authoring is too expensive too slow not scalable

Three Possible Solutions ndash Automatic Machine ParsingEditing

bull Not good enough for biology textbook sentences

bull Error rates are too high

bull Need humans in the loop

ndash Mechanical Turk Authoring bull Biology expertise is difficult to get

bull Mechanical Turk uses individuals but the Knowledge Entry task appears to require coordination judgment discussion and working together

ndash Social Authoring and Crowdsourcing bull Wikipedia showed this could work for text

bull In 2008 Vulcan started a pilot project to explore social knowledge authoring

How Can We Improve Knowledge Entry

19

Crowdsourcing for Better Knowledge Acquisition

20

edit wow I can change the web

letrsquos share and publish knowledge

to make an [[encyclopedia]]

Success of Wikis

27

Actual number of articles on enwikipediaorg (thick

blue line) compared with a Gompertz model that leads

eventually to a maximum of about 44 million articles

(thin green line)

A Key Feature of Wiki

28

This distinguishes wikis from other publication tools

Consensus in Wikis Comes from

Collaboration ndash ~17 editspage on average in

Wikipedia (with high variance)

ndash Wikipediarsquos Neutral Point of View

Convention ndash Users follow customs and

conventions to engage with articles effectively

29

Software Support Makes Wikis Successful

Trivial to edit by anyone Tracking of all changes one-

step rollback Every article has a ldquoTalkrdquo page

for discussion Notification facility allows anyone

to ldquowatchrdquo an article Sufficient security on pages

logins can be required A hierarchy of administrators

gardeners and editors Software Bots recognize certain

kinds of vandalism and auto-revert or recognize articles that need work and flag them for editors

30

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

This is the well-known Knowledge Acquisition Problem ndash AI authoring is too expensive too slow not scalable

Three Possible Solutions ndash Automatic Machine ParsingEditing

bull Not good enough for biology textbook sentences

bull Error rates are too high

bull Need humans in the loop

ndash Mechanical Turk Authoring bull Biology expertise is difficult to get

bull Mechanical Turk uses individuals but the Knowledge Entry task appears to require coordination judgment discussion and working together

ndash Social Authoring and Crowdsourcing bull Wikipedia showed this could work for text

bull In 2008 Vulcan started a pilot project to explore social knowledge authoring

How Can We Improve Knowledge Entry

19

Crowdsourcing for Better Knowledge Acquisition

20

edit wow I can change the web

letrsquos share and publish knowledge

to make an [[encyclopedia]]

Success of Wikis

27

Actual number of articles on enwikipediaorg (thick

blue line) compared with a Gompertz model that leads

eventually to a maximum of about 44 million articles

(thin green line)

A Key Feature of Wiki

28

This distinguishes wikis from other publication tools

Consensus in Wikis Comes from

Collaboration ndash ~17 editspage on average in

Wikipedia (with high variance)

ndash Wikipediarsquos Neutral Point of View

Convention ndash Users follow customs and

conventions to engage with articles effectively

29

Software Support Makes Wikis Successful

Trivial to edit by anyone Tracking of all changes one-

step rollback Every article has a ldquoTalkrdquo page

for discussion Notification facility allows anyone

to ldquowatchrdquo an article Sufficient security on pages

logins can be required A hierarchy of administrators

gardeners and editors Software Bots recognize certain

kinds of vandalism and auto-revert or recognize articles that need work and flag them for editors

30

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Crowdsourcing for Better Knowledge Acquisition

20

edit wow I can change the web

letrsquos share and publish knowledge

to make an [[encyclopedia]]

Success of Wikis

27

Actual number of articles on enwikipediaorg (thick

blue line) compared with a Gompertz model that leads

eventually to a maximum of about 44 million articles

(thin green line)

A Key Feature of Wiki

28

This distinguishes wikis from other publication tools

Consensus in Wikis Comes from

Collaboration ndash ~17 editspage on average in

Wikipedia (with high variance)

ndash Wikipediarsquos Neutral Point of View

Convention ndash Users follow customs and

conventions to engage with articles effectively

29

Software Support Makes Wikis Successful

Trivial to edit by anyone Tracking of all changes one-

step rollback Every article has a ldquoTalkrdquo page

for discussion Notification facility allows anyone

to ldquowatchrdquo an article Sufficient security on pages

logins can be required A hierarchy of administrators

gardeners and editors Software Bots recognize certain

kinds of vandalism and auto-revert or recognize articles that need work and flag them for editors

30

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

edit wow I can change the web

letrsquos share and publish knowledge

to make an [[encyclopedia]]

Success of Wikis

27

Actual number of articles on enwikipediaorg (thick

blue line) compared with a Gompertz model that leads

eventually to a maximum of about 44 million articles

(thin green line)

A Key Feature of Wiki

28

This distinguishes wikis from other publication tools

Consensus in Wikis Comes from

Collaboration ndash ~17 editspage on average in

Wikipedia (with high variance)

ndash Wikipediarsquos Neutral Point of View

Convention ndash Users follow customs and

conventions to engage with articles effectively

29

Software Support Makes Wikis Successful

Trivial to edit by anyone Tracking of all changes one-

step rollback Every article has a ldquoTalkrdquo page

for discussion Notification facility allows anyone

to ldquowatchrdquo an article Sufficient security on pages

logins can be required A hierarchy of administrators

gardeners and editors Software Bots recognize certain

kinds of vandalism and auto-revert or recognize articles that need work and flag them for editors

30

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Success of Wikis

27

Actual number of articles on enwikipediaorg (thick

blue line) compared with a Gompertz model that leads

eventually to a maximum of about 44 million articles

(thin green line)

A Key Feature of Wiki

28

This distinguishes wikis from other publication tools

Consensus in Wikis Comes from

Collaboration ndash ~17 editspage on average in

Wikipedia (with high variance)

ndash Wikipediarsquos Neutral Point of View

Convention ndash Users follow customs and

conventions to engage with articles effectively

29

Software Support Makes Wikis Successful

Trivial to edit by anyone Tracking of all changes one-

step rollback Every article has a ldquoTalkrdquo page

for discussion Notification facility allows anyone

to ldquowatchrdquo an article Sufficient security on pages

logins can be required A hierarchy of administrators

gardeners and editors Software Bots recognize certain

kinds of vandalism and auto-revert or recognize articles that need work and flag them for editors

30

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

A Key Feature of Wiki

28

This distinguishes wikis from other publication tools

Consensus in Wikis Comes from

Collaboration ndash ~17 editspage on average in

Wikipedia (with high variance)

ndash Wikipediarsquos Neutral Point of View

Convention ndash Users follow customs and

conventions to engage with articles effectively

29

Software Support Makes Wikis Successful

Trivial to edit by anyone Tracking of all changes one-

step rollback Every article has a ldquoTalkrdquo page

for discussion Notification facility allows anyone

to ldquowatchrdquo an article Sufficient security on pages

logins can be required A hierarchy of administrators

gardeners and editors Software Bots recognize certain

kinds of vandalism and auto-revert or recognize articles that need work and flag them for editors

30

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Consensus in Wikis Comes from

Collaboration ndash ~17 editspage on average in

Wikipedia (with high variance)

ndash Wikipediarsquos Neutral Point of View

Convention ndash Users follow customs and

conventions to engage with articles effectively

29

Software Support Makes Wikis Successful

Trivial to edit by anyone Tracking of all changes one-

step rollback Every article has a ldquoTalkrdquo page

for discussion Notification facility allows anyone

to ldquowatchrdquo an article Sufficient security on pages

logins can be required A hierarchy of administrators

gardeners and editors Software Bots recognize certain

kinds of vandalism and auto-revert or recognize articles that need work and flag them for editors

30

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Software Support Makes Wikis Successful

Trivial to edit by anyone Tracking of all changes one-

step rollback Every article has a ldquoTalkrdquo page

for discussion Notification facility allows anyone

to ldquowatchrdquo an article Sufficient security on pages

logins can be required A hierarchy of administrators

gardeners and editors Software Bots recognize certain

kinds of vandalism and auto-revert or recognize articles that need work and flag them for editors

30

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

However Finding Information Is hellip

Wikipedia has articles abouthellip

bull hellip all movies with cast director

budget running time grosshellip

hellip all German cars with engine

size accelerating datahellip

Can you find

Sci-Fi movies made between

2000-2008 that cost less than

$10000 and made $30000

Skyscrapers with 50+ floors

and built after 2000 in

Shanghai (or Chinese cities

with 1000000+ people)

31

Or German(Porsche) cars that

accelerate from 0-100kmh in 5

seconds

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Can Search Solve the Problem

32

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Answer is Hidden Deeply in

33

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

To Find More Info

All Porsche vehicles made in

Germany that accelerate from 1-

100 kmh less than 4 seconds

Sci-Fi movies made between

2000-2008 that cost less than

$10M and gross more than $30M

A map showing where all

Mercedes-Benz vehicles are

manufactured

All skyscrapers in China (Japan

Thailandhellip) of 50 (406070) floors

or more and built in year 2000

(20012002) and after sorted by

built year floorshellip grouped by

cities regionshellip

35

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

What is a Semantic Wiki

A wiki that has an underlying model of the

knowledge described in its pages

To allow users to make their knowledge explicit and formal

Semantic Web Compatible

36

Semantic Wiki

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Two Perspectives

Wikis for Metadata

Metadata for Wikis

37

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

A SIMPLE EXAMPLE SEMANTIC MOVIES

Sci-Fi Movie Wiki Live Demo

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Basics of Semantic Wikis

Still a wiki with regular wiki features

ndash CategoryTags Namespaces Title Versioning

Typed Content (built-ins + user created eg categories)

ndash PageCard Date Number URLEmail String hellip

Typed Links (eg properties)

ndash ldquocapital_ofrdquo ldquocontainsrdquo ldquoborn_inrdquohellip

Querying Interface Support

ndash Eg ldquo[[CategoryMember]] [[Agelt30]]rdquo (in SMW)

41

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Characteristics of Semantic Wikis

42

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

What is the Promise of Semantic Wikis

Semantic Wikis facilitate Consensus over Data

Combine low-expressivity data authorship with the best features of traditional wikis

User-governed user-maintained user-defined

Easy to use as an extension of text authoring

43

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

One Key Helpful Feature of Semantic Wikis

Semantic Wikis are ldquoSchema-Lastrdquo Databases require DBAs and schema design

Semantic Wikis develop and maintain the schema in the wiki

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

List of Semantic Wikis

AceWiki

ArtificialMemory

Wagn - Ruby on Rails-based

KiWi ndash Knowledge in a Wiki

Knoodl ndash Semantic

Collaboration tool and

application platform

Metaweb - the software that

powers Freebase

OntoWiki

OpenRecord

PhpWiki

Semantic MediaWiki - an

extension to MediaWiki that

turns it into a semantic wiki

Swirrl - a spreadsheet-based

semantic wiki application

TaOPis - has a semantic wiki

subsystem based on Frame

logic

TikiWiki CMSGroupware

integrates Semantic links as a

core feature

zAgile Wikidsmart - semantically

enables Confluence

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

SEMANTIC EXTENSIONS Semantic MediaWiki and

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Semantic MediaWiki Software

Born at AIFB in 2005

ndash Open source (GPL) well documented

ndash Vulcan kicks off SMW+ in 2007

Active development

ndash Commercial support available

ndash World-wide community

International Conferences

ndash Last SMWCon 425-27 2012 in Carlsbad CA

ndash Next SMWCon 1024-26 2012 in Koln Germany

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Semantic MediaWiki (SMW) Markup Syntax

Beijing University is located in

[[Has locationBeijing]] with

[[Has population30000|about 30 thousands]]

students

In page PropertyHas locationrdquo [[Has typePage]]

In page PropertyHas populationrdquo [[Has typenumber]]

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Special Properties

ldquoHas Typerdquo is a pre-defined ldquospecialrdquo property for meta-

data

ndash Example [[Has typeString]]

ldquoAllowed Valuesrdquo is another special property

ndash [[Allows valueLow]]

ndash [[Allows valueMedium]]

ndash [[Allows valueHigh]]

In Halo Extensions there are domain and range support

ndash RDFs expressivity

ndash Semantic Gardening extension also supports ldquoCardinalityrdquo

49

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Define Classes

50

Beijing is a city in [[Has

countryChina]] with population

[[Has population2200000]]

[[CategoryCities]]

Categories are used to define classes because they are better for class inheritance

The Jin Mao Tower (金茂大厦) is an 88-story landmark supertall skyscraper in hellip

[[Categories 1998 architecture | Skyscrapers in Shanghai | Hotels in Shanghai | Skyscrapers over 350 meters | Visitor attractions in Shanghai | Landmarks in Shanghai | Skidmore Owings and Merrill buildings]]

CategorySkyscrapers in China Category Skyscrapers by country

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Possible Database-style Query over Data

51

ask

[[CategorySkyscrapers]]

[[Located inChina]]

[[Floor countgt50]]

[[Year builtlt2000]]

hellip

Ex Skyscrapers in China higher than

50 stories built before 2000

ASKSPARQL query target

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Semantic MediaWiki Stack

MediaWiki (XAMPP)

Extension Semantic MediaWiki

More Extensions and Applications

52

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

MediaWiki SMW+

Halo Extension

Usability extension to Semantic MediaWiki

Increases user consensus

Increases use of semantic data

Semantic MediaWiki

Core Semantic Wiki engine

Authoring of explicit knowledge in content

Basic reasoning capabilities

SMW+

Shrink wrap suite of open source software products

Comes with ready to use ontology

Easy to procure and install

MediaWiki

Powerful Wiki engine

Basic CMS feature set

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

SMW Extensions ndashTowards Great Apps

56

bull Halo Extensions Wiki Object Model Semantic Forms Notification hellip

Data IO

bull Semantic Toolbar Semantic Drilldown Faceted Searchhellip

Query and Browsing

bull Semantic Result Printers Tree View Exhibit Flash chartshellip

Visualization

bull HaloACL Deployment Triplestore Connector Simple Ruleshellip

bull Semantic WikiTags and Subversion Integration extensions

bull Linked Data Extension with R2R and SILK from FUBerlin

Other useful extensions

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Semantic Wikipedia

Ultrapedia An SMW proof of concept built to explore general knowledge acquisition in a wiki

Domain German Cars (in Wikipedia)

Wikipedia merged with the power of a database

Help Readers and Writers Be More Productive

57

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Standard View of the Wiki Data

httpwikingvulcancomupindexphpPorsche_996

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Dynamic View of the Acceleration Data

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Graph View of the Acceleration Data

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Dynamic Mapping and Charting

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Information Discovery via Visualization

62

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

FROM WIKI TO APPLICATION We Started With a Wiki Ended with an App

63

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Build Social Semantic Web Apps

64

Fill in data (content)

Develop schema (ontology)

Design UI skin form amp templates

Choose extensions

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

65

Video Semantic Wikis for A New Problem (2008)

Social tag-based characterization

Keyword search over tag data

Inconsistent semantics

Easy to engineer

Increasing technical complexity rarr

larr Increasing User Participation

Algorithm-based object characterization

Database-style search

Consistent semantics

Extremely difficult to engineer

Social database-style characterization

Database search + wiki text search

Semantic consistency via wiki mechanisms

Easy to engineer

Semantic

Entertainment

Wiki

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Semantic Seahawks Football Wiki

66

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Semantic Entertainment Query Result Highlight Reel

Commercial LookFeel

Play-by-play video search

Highlight reel generation

Search on crowd-defined patterns (ldquotouchdowns with big hitsrdquo)

Tree-based navigation widget

Very favorable economics

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

AGILE APPLICATION DEVELOPMENT With Semantic MediaWiki+

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

72

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

MINI-OLYMPIC APPLICATION IN 2 DAYS

From Agile to Quick Agile

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Application Mini-Olympics

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

77

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

78

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

79

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Requirements for Wiki ldquoDevelopersrdquo

One need not

ndash Write code like a hardcore programmer

ndash Design setup RDBMS or make frequent

schema changes

ndash Possess knowledge of a senior system

admin

Instead one need

ndash Configure the wiki with desired extensions

ndash Design and evolve the data model

(schema)

ndash Design Content

bull Customize templates forms styles skin etc

80

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

We Can Build

81

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Effectiveness of SMW as a Platform Choice

Packaged Software

Very quick to

obtain

N Hard to customize

N Expensive

Microsoft Project

Version One

Microsoft

SharePoint

Custom Development

N Slow to develop

Extremely flexible

N High cost to develop

and maintain

NET Framework

J2EE hellip

Ruby on rails

82

SMW+ Suite

Still quick to

program

Easy to customize

Low-moderate cost

Mini-Olympics

Scrum Wiki

RPI map

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Neurowiki An SMW to Acquire Neuroscience Knowledge

Can we use SMW to acquire relationship information ndash From data focus (movies) to relationship focus (Neurowiki) ndash A Neurowiki user cannot contribute data without also contributing

relationships

Experimental goals ndash Create an SMW instance for acquiring both instance and relationship

information between data elements ndash Integrate SMW with the Berlin LDIF system

Product goals ndash A community-driven collaborative framework for neuroscience data

integration bull Support for community-built neuroscience vocabularies bull A library of visualizations and simple analytics

ndash Neuroscientist users can createmodify pages and metadata bull Scientists can own the evolving knowledge structure bull Low barrier for scientist to publish data

ndash Support scientific discovery

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

NEUROWIKI DEMO Using SMW+ to crowdsource knowledge of mapping and linking

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

AURA Wiki Bringing Together Inquire and SMW+

Reduce costs and increase scale of AURA authoring ndash A real high-quality application of SMW+

ndash Focus on creating universally-quantified sentences (UTs)

bull Example ldquoGlycolysis which occurs in the cytosol begins the degradation process by breaking glucose into two molecules of a compound called pyruvaterdquo

ndash UT Glycolysis occurs in Cytosol (concept Glycolysis)

ndash UT During cellular respiration glycolysis is the first step (concept Cellular Respiration)

ndash UT During glycolysis breakdown of 1 glucose results in 2 pyruvate molecules (concept Glycolysis)

ndash Use these UTs as inputs to the AURA authoring process

Use SMW-acquired statements as inputs to the process ndash Potential savings of over 50 of AURA authoring time

Use the social semantic web for real AI authoring 85

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

AURAwiki Strategy

Provide a community of authors with everything they need to collaborate on high-quality UT authoring ndash Top-down Encoding Flow

bull Suggestions and commentary from KE (Knowledge Engineers) and SMEs previous contributions

bull Supported by ontological terms and relationships questions guidelines testing

ndash Bottom-up Encoding Flow bull UT suggestions based upon algorithms (ReVerb by University of Washington)

bull All suggestions validated via Mechanical Turk before use in AURAWiki

ndash Build a ldquoKnowledge Integration Centerrdquo

Current Page Designs ndash Main Page

ndash User Page

ndash Authoring Page

Will perform initial testing by September 15

86

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Conclusion

Inquire is a new application of AI technology ndash Question-answering in textbooks

ndash Precise embedded answers over a complex subject matter

Semantic wikis and SMW+ use social and crowd phenomena as a new way to address the knowledge acquisition problem ndash Crowdsourcing addresses the cost and update problems

ndash Validate using Inquire authoring process

87

From text (textbook sentences)

to data (knowledge encodings)

to knowledge (use in a QA system)

The Social Semantic Web in Action

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Project Halo Team

DIQA

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

89

Thank You

wwwvulcancom

wwwprojecthalocom

wwwinquireprojectcom

wwwsmwpluscom

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

Quiz and Homework Scores

P-value from 2 tailed t-tests HW scores Full vs Ablated 012 Full vs textbook 002 Ablated vs Text 052

Quiz scores Full vs Ablated 0002 Full vs textbook 005 Ablated vs Text 018

81

88

74 75 71

81

HW SCORE QUIZ SCORE

A

B

C

D

n=24

n=25

n=23

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution

0

10

20

30

40

50

A B C D F

o

f st

ud

ents

Inquire Ablated Textbook

textbook

Quiz Score Distribution