Mining the web to improve semantic-based multimedia search and digital libraries

Mining the web to improve semantic-based multimedia search and digital libraries. http://gate.ac.uk/ http://nlp.shef.ac.uk/ Horacio Saggion, Kalina Bontcheva, University of Sheffield. 21 November 2006, IST Event 2006. Web Mining and Semantic Web: Networking with industry and academia. [This work has been partially supported by the SEKT (http://sekt.semanticweb.org/), PrestoSpace (http://www.prestospace.org) and TAO (http://www.tao-project.eu/) projects]


Page 1: Mining the web to improve semantic-based multimedia search and digital libraries

http://gate.ac.uk/ http://nlp.shef.ac.uk/

Horacio Saggion, Kalina Bontcheva

University of Sheffield

21 November 2006, IST Event 2006

Web Mining and Semantic Web: Networking with industry and academia

[This work has been partially supported by the SEKT (http://sekt.semanticweb.org/), PrestoSpace (http://www.prestospace.org) and TAO (http://www.tao-project.eu/) projects]

Page 2: Web mining and semantic annotation: why?

• Semantic annotation produces explicit representation of knowledge, given content
– Knowledge is often implicit in the data sources
– …or hard to extract automatically to a sufficient accuracy
• Frequently knowledge can be mined from the web and merged with the original content to improve semantic search and reasoning capabilities

Page 3: Web mining and semantic annotation: how?

• GATE is a widely used open-source infrastructure for text mining (http://gate.ac.uk):
– Ten years old, with 1000s of users at 100s of sites
– Supports major document formats and languages
– Helps build semantic annotation components
– Integrate these with content and knowledge mined from the web
– Create, test, and deploy these into an end-to-end application (some examples next)

Page 4: RichNews: Multimedia Annotation

• The problem:
– Access to archive material in the BBC is provided by some form of semantic annotation and indexing
– Manual annotation is time consuming (up to 10x real time) and expensive
• Rich News (developed within the PrestoSpace project) aims to (partially) automate the annotation of news programs
– Developed on BBC TV and radio news
– Involving a human in the loop is possible if desired
• Recordings of broadcasts go in one end
• Index of semantic metadata describing each news story comes out the other

http://gate.ac.uk/sale/www05/web-assisted-annotation.pdf

Page 5: Web mining in RichNews

• Why web mining:
– Speech recognition produces poor quality transcripts with many mistakes
– Closed captions/subtitles not always available
– These news stories can also be found on the BBC and other web sites
• The solution:
– Obtain key terms from the ASR transcripts
– Search the web for related stories from same date
– Find best matching stories
– Obtain semantic annotations from this richer text
– Merge with semantic annotations on transcript to obtain more precise knowledge, grounded in the video stream

http://gate.ac.uk/sale/www05/web-assisted-annotation.pdf
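
The five solution steps can be illustrated with a minimal Python sketch. It is not the RichNews implementation: frequency-based key terms, cosine matching, the in-memory `candidates` list standing in for web search results, the dictionary keys, and the "web annotations override transcript annotations" merge rule are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the slides).
STOPWORDS = {"the", "and", "for", "from", "with", "that", "this",
             "has", "have", "was", "are", "about", "said", "after"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens longer than two letters, drop stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if len(w) > 2 and w not in STOPWORDS]


def key_terms(transcript, k=5):
    """Step 1: pick the k most frequent content words as key terms."""
    return [term for term, _ in Counter(tokenize(transcript)).most_common(k)]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def find_best_story(transcript, broadcast_date, candidates):
    """Steps 2-3: keep candidates from the same date, return the best match.

    `candidates` stands in for web search results; each is a dict with the
    hypothetical keys "date", "text" and "annotations".
    """
    same_day = [c for c in candidates if c["date"] == broadcast_date]
    if not same_day:
        return None
    tvec = Counter(tokenize(transcript))
    return max(same_day, key=lambda c: cosine(tvec, Counter(tokenize(c["text"]))))


def merge_annotations(segment, story):
    """Steps 4-5: combine annotations, keeping the segment's time offsets.

    Web annotations override transcript ones on conflict (an assumption).
    """
    merged = dict(segment["annotations"])
    if story is not None:
        merged.update(story["annotations"])
    return {"start": segment["start"], "end": segment["end"], "annotations": merged}


# A noisy ASR segment ("sheffeld") and three made-up candidate stories.
segment = {
    "start": 125.0, "end": 210.5,
    "text": "flooding in sheffeld has closed roads flooding damage reported after heavy rain",
    "annotations": {"sheffeld": "Unknown"},
}
candidates = [
    {"date": "2006-11-21", "text": "Heavy rain causes flooding in Sheffield, roads closed",
     "annotations": {"Sheffield": "Location"}},
    {"date": "2006-11-21", "text": "Football results from the weekend",
     "annotations": {"Football": "Sport"}},
    {"date": "2006-11-20", "text": "Flooding in Sheffield roads closed heavy rain damage",
     "annotations": {"Sheffield": "Location"}},
]
terms = key_terms(segment["text"])
story = find_best_story(segment["text"], "2006-11-21", candidates)
result = merge_annotations(segment, story)
```

In this toy run the misrecognised "sheffeld" stays in the transcript annotations, while the matched web story contributes a correctly spelled `Sheffield` location that remains tied to the segment's start and end times.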

Page 6: RichNews Example

Page 7: TAO – Augmenting Software Artefacts with Semantics

• TAO project – http://www.tao-project.eu
• Transitioning Applications to Ontologies
• Case study on augmenting software artefacts with semantics
• Learning ontologies from multiple software artefacts
• Knowledge about a software project often spread across different sources on the web:
– Source code, discussion messages, bug descriptions, documentation
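
One simple way to picture "learning from multiple software artefacts" is to look for terms that recur across artefact types. The sketch below is only an illustration under that assumption and does not describe the TAO project's actual method; the function names, the artefact texts, and the "appears in at least two sources" threshold are all hypothetical.

```python
import re
from collections import defaultdict


def split_identifier(name):
    """Split camelCase or snake_case identifiers into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", name)
    return [p.lower() for p in parts]


def candidate_concepts(artefacts, min_sources=2):
    """Return words (longer than three letters) seen in at least
    `min_sources` different artefact types, as candidate ontology concepts."""
    sources = defaultdict(set)
    for kind, text in artefacts.items():
        for token in re.findall(r"[A-Za-z_]+", text):
            for word in split_identifier(token):
                if len(word) > 3:
                    sources[word].add(kind)
    return sorted(w for w, kinds in sources.items() if len(kinds) >= min_sources)


# Made-up artefact snippets, one per source type named on the slide.
artefacts = {
    "source code": "class AnnotationSet { Document getDocument(); }",
    "bug descriptions": "Annotation offsets are wrong after the document is edited",
    "documentation": "A corpus is a collection of documents",
}
concepts = candidate_concepts(artefacts)
```

Here `annotation` and `document` surface because they occur both in identifiers and in a bug description; a real system would also need stemming (to link "documents" with "document") and relation extraction.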

Page 8: New Challenges

• Moving towards mining and semantically annotating Web 2.0
– Opinion mining from blogs and discussion forums
– Mining wikis
– Social network analysis
• Mining multimedia content
• Initial experiments in ongoing projects, but we need further work on this emerging social-oriented web

Page 9: Thank you!

These slides:
• http://gate.ac.uk/sale/talks/ist06/ist-event06.ppt

Further details:
– RichNews: http://gate.ac.uk/sale/www05/web-assisted-annotation.pdf
– SEKT: http://gate.ac.uk/sale/iswc06/iswc06.pdf
– TAO: http://www.tao-project.eu