A WEB SYSTEM FOR ONTOLOGY-BASED MULTIMEDIA ANNOTATION, BROWSING AND SEARCH
M. Bertini, G. Becchi, A. Del Bimbo, A. Ferracani, D. Pezzatini
University of Florence - MICC, Firenze, Italy
ABSTRACT
In this paper we present a complete system for semantic and syntactic annotation, browsing and search of multimedia data, based on a service-oriented architecture, with web-based interfaces developed following the Rich Internet Application paradigm.
The system has been designed to be: i) flexible and extendable, allowing users to select only the services they need or to add their own tools to the multimedia processing pipelines; ii) distributed, with services that can be executed in a cloud computing infrastructure and accessed through web applications; iii) user-friendly, with interfaces that look and behave uniformly on every platform and offer a level of interaction similar to that of desktop applications.
Extensive user trials in a real-world setup, performed by archive and broadcaster professionals, have shown the efficacy and usability of the proposed solution.
Index Terms— Multimedia database, multimedia authoring,
content analysis, content-based retrieval.
1. INTRODUCTION
Recently, two surveys to gather user requirements for video annotation and search systems have been conducted within the EU-funded research projects VidiVideo1 and IM3I2. More than 50 professionals working in broadcasters, national video archives, photographic archives and cultural heritage organizations participated in the surveys. One of the main outcomes is that multimedia annotation and management systems have to be web-based. In fact, this requirement was deemed “mandatory” by 75% of the interviewees and “desirable” by another 20% [1, 2]. Other interesting results are that controlled lexicons and ontologies are widely used, respectively by 64% and 39% of the interviewees, and that 71% of users requested the possibility to combine search mechanisms that account for structured (e.g. metadata, controlled lexicons and ontologies) and unstructured data (e.g. free text and transcriptions). However, most of the annotation and search systems developed by the scientific multimedia community are desktop applications [3, 4, 5, 6, 7, 8] whose search and browsing tools are designed for participation in scientific competitions, like TRECVID and VideoOlympics, rather than for end-users, like broadcaster and video archive professionals. Recently, some video search engines have been designed as web applications [9, 10, 11] because of the convenience of using browsers as clients that access a common search engine.
To satisfy the needs expressed by the surveys we have developed
a system that offers an integrated service-oriented environment for
1http://www.vidivideo.info
2http://www.im3i.eu
processing, analysing, indexing, tagging, and searching multimedia
content, at the syntactic and semantic level.
2. THE SYSTEM
The system presented in this paper3 provides a service-oriented architecture (SOA) that allows for multiple viewpoints of multimedia data inside repositories, providing better ways to reuse, repurpose and share rich media. This paves the way for a multimedia information management platform that is more flexible, adaptable, and customizable. In fact, a SOA provides methods for systems development and integration by packaging system functionalities as interoperable services, which are the building blocks of the system. A SOA infrastructure allows different applications to communicate with one another, in a loosely coupled way, by passing data in a shared format or by orchestrating the activity of the services. One of the outcomes of this architectural choice is that deploying the system in existing infrastructures and workflows does not require redesigning them, since it becomes possible to simply complement them, adding only the services that are required. This latter point is particularly important when considering organizations like broadcasters or national video archives, which cannot completely redesign their existing systems.
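The loose coupling described above can be sketched as two hypothetical services exchanging a message in a shared JSON format; the field names and service names below are our own illustrative assumptions, not the system's actual schema or API:

```python
import json

def make_annotation_message(media_id, concepts):
    """Producer service: packages concept annotations as JSON so any
    other service can consume them without knowing the producer."""
    return json.dumps({
        "media_id": media_id,
        "annotations": [{"concept": c, "confidence": s} for c, s in concepts],
    })

def index_service(message):
    """Consumer service: parses the shared format and returns the
    concepts it would add to the search index."""
    data = json.loads(message)
    return {a["concept"] for a in data["annotations"] if a["confidence"] >= 0.5}

# Services only agree on the message format, not on each other's internals.
msg = make_annotation_message("video-42", [("car", 0.9), ("road", 0.3)])
high_confidence = index_service(msg)  # only confident annotations are indexed
```

Because the contract is only the message format, either side can be replaced or moved to another host without touching the other, which is what makes complementing an existing infrastructure possible.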
An overview of the system architecture, composed of four main layers, is shown in Fig. 1: the Interface layer, the Authoring layer, the SOA Architecture layer and the Analysis layer. Communication between the analysis and interface layers is routed through the architecture layer, which also takes care of the main repository functions.
The Analysis layer is responsible for extracting low-level features and semantic annotations from media files, through a series of processing pipelines that can be executed in a cloud of servers, orchestrated by dedicated services. The Interface and Authoring layers are composed of several components, ranging from specialized interfaces for annotation and search to basic UI widgets. A main component in the figure is the authoring layer. This component is devoted to the composition and creation of search, browsing, and editing interfaces for end-users, combining ready-made interface building blocks.
Automatic multimedia annotation is performed by user-definable processing pipelines; the system provides a number of services for syntactic and semantic audio and video content annotation. These services can be combined in processing pipelines, to create more complex services. For example, a video annotation pipeline, which can be created, modified and managed using some of the services provided by this system, is shown in Fig. 2.
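The idea of chaining services into a user-definable pipeline can be sketched as follows; the stage names and media fields are illustrative stand-ins for the system's actual services, not its real API:

```python
# Each stage is a service that takes a media description (a dict)
# and returns an enriched copy of it.

def transcode(media):
    return dict(media, format="mp4")

def segment(media):
    return dict(media, shots=[(0, 120), (120, 300)])

def bow_annotate(media):
    return dict(media, concepts=["outdoor"])

def make_pipeline(*stages):
    """Chain services so the output of each stage feeds the next."""
    def run(media):
        for stage in stages:
            media = stage(media)
        return media
    return run

# A pipeline like the one in Fig. 2 is then just an ordered list of
# services, which users can reorder or extend with their own tools.
video_pipeline = make_pipeline(transcode, segment, bow_annotate)
result = video_pipeline({"id": "video-42", "format": "mov"})
```

Composing pipelines this way means a more complex service is itself just another stage that can be reused in other pipelines.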
Annotation of visual content is performed using an implemen-
tation of the Bag-of-Visual-Words paradigm, based on a fusion of
MSER [12], SURF [13] and SIFT [14] features and the Pyramid
3Available for testing at: URL hidden for double blind review
[Fig. 1 diagram: Interface layer and Authoring layer (end-user interfaces, authoring environment, corporate CMS, semantic and syntactic search/browse); SOA Architecture layer (system SOA, system repository, database, media storage, file storage); Analysis layer (video/image analysis pipeline, audio analysis pipeline, user-defined pipelines, local SOA architecture).]
Fig. 1. Overall view of the system architecture.
[Fig. 2 diagram: ingestion/transcoding → segmentation → BoW annotation → video streaming transcoding and CBIR indexing; audio extraction feeds the audio processing pipeline.]
Fig. 2. Example of the automatic annotation pipeline for videos, built using services provided by the system. Users can create their own processing pipelines by combining the services provided by the system, or other existing pipelines.
Matching Kernel [15]. Audio annotation is based on a fusion of timbral texture features, like ZCR, MFCCs, chroma and spectral features, and SVM classifiers. Content-based retrieval is performed using rhythm and pitch features for audio and MPEG-7 features for visual data, in particular using a combination of Scalable Color, Color Layout and Edge Histogram descriptors. To address the problem of scalability in large-scale archives, these features are indexed using the approximate similarity search approach presented in [16]. Semantic-level search and browsing is performed through a search engine that uses the ontology design presented in [17]; ontology-based reasoning using concept relations, subsumption and WordNet synonyms is employed for query expansion. The graph of the ontology concepts is also used to browse the media archives (Fig. 4). The search engine also works with free text annotations and transcriptions, and can be used as a web service or through specific web applications. Other services and specialized interfaces allow tagging and syntactic-level content-based retrieval. Publishing functionalities are provided by a set of services and interfaces of the authoring platform. This platform allows importing and publishing existing media repositories and authoring web-based environments that let end-users interact with the repositories. Authors can create elaborate workflow patterns and
Fig. 3. Screenshots of some of the annotation tools: top) AJAX tool for tagging, ontology-based annotation and audio transcription of videos; bottom) adding geographical metadata to concept annotations.
Fig. 4. Screenshot of the browse application: the concept cloud is used to start browsing, the graph shows a reduced view of the ontology around a selected concept. Users can inspect instances of the concept stored in the system or search them in other repositories like YouTube and Flickr.
search interfaces that can be embedded in a variety of commercial CMS systems.
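The ontology-based query expansion mentioned above (subsumption plus WordNet synonyms) can be sketched with a toy concept hierarchy and a small synonym map standing in for WordNet; both tables below are illustrative assumptions, not the actual ontology of [17]:

```python
# Toy subsumption hierarchy and synonym map (stand-in for WordNet).
SUBCLASSES = {"vehicle": ["car", "truck"]}
SYNONYMS = {"vehicle": ["conveyance"], "car": ["automobile"]}

def expand_query(concept):
    """Expand a query concept with its transitively subsumed concepts
    and with synonyms of every concept reached, so that a query for
    'vehicle' also matches media annotated with 'car' or 'automobile'."""
    expanded = {concept}
    for sub in SUBCLASSES.get(concept, []):
        expanded |= expand_query(sub)  # follow subsumption transitively
    for c in list(expanded):
        expanded.update(SYNONYMS.get(c, []))
    return expanded
```

Running the expanded set against the index instead of the literal query term is what lets semantic search recover results annotated at a different level of the ontology than the one the user typed.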
Using AJAX, Flash/Flex, Silverlight and other Rich Internet Application (RIA) technologies [18] makes it possible to develop web applications that are highly responsive [19] and allow more advanced interaction. The quality of interaction, made essential for users by modern desktop applications and operating systems, is achieved by means of drag&drop, advanced widgets and advanced multimedia
Fig. 5. Screenshots of some of the search tools: top) advanced ontology-based video search (Google-like search is also available); bottom) CBIR and image tagging. Video keyframes can be used to select visually similar videos and images.
support, which are not available in traditional web-based applications [20]. Other benefits are improved server performance, since part of the computational burden is distributed to the client, and easy distribution of new versions of the application, which is downloaded by the clients every time it is used. All the web applications of the system have been developed according to the RIA paradigm. In particular, the applications of the Interface and Authoring layers are developed in AJAX and Flash/Flex, while data is exchanged using SOAP, RSS and JSON for metadata and RTMP for video streaming. Figs. 3, 4 and 5 show some screenshots of the manual annotation (to check automatic annotations, add metadata or create ground truth annotations to train new automatic concept detectors), browse, search (using different modalities) and tagging/CBIR tools.
3. EXPERIMENTS
The system presented in this paper has been thoroughly tested in several field trials with the participation of 19 multimedia archive and broadcaster professionals in The Netherlands, Hungary, Italy and Germany. The system was running on our servers while users were at the venues of their organizations, using the same PCs they use for their daily work.
The goal of the field trials was to assess the usability of the system, in particular letting the users interact with the search engine and its interfaces, to pose semantic- and syntactic-level queries, but also to annotate, automatically and manually, some videos. The methodology used follows the practices defined in the ISO 9241 standard, and considered the following four factors: usability, effectiveness, efficiency and satisfaction. A set of activities that involved the various interfaces was selected. These activities allowed testing several aspects of both the automatic and manual annotation system (this activity was performed by a subset of 6 users) and the features of the search/browse engines using different search modalities (structured/unstructured/similarity-based). The trials were followed by a debriefing of the users, who had to fill in a questionnaire to evaluate their impressions of the system and the perceived effectiveness and usability. Given that such systems are not yet in widespread use, and that the interfaces of these types of systems may require understanding the meaning and scope of various widgets, a user manual has been prepared for the users, to let them obtain a basic understanding of the system. In addition to the short manual, a simple system walkthrough (about 10 minutes long) was presented to the users by test monitors. These monitors have also taken observational notes and recorded verbal feedback from users during the tests. These notes and the questionnaires have been considered in a second stage of system design to improve the overall usability, considering interface and workflow design.
Fig. 6. Overview of usability evaluation for the search tests: overall usability of the system, and usability of the combination of search modalities.
The overall experience is very positive and the system proved to be easy to use, despite the objective difficulty of interacting with a complex system for which the testers received only very limited training. Fig. 6 reports two results for the search activities. Users appreciated the combination of different interfaces and functions. The search modality that proved most suitable for the majority of the users is the advanced interface, because it allows building queries with Boolean/temporal relations between concepts and concept relations, and because of the possibility of using geographical and video metadata, which is appealing for professional archivists.
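A Boolean/temporal query of the kind supported by the advanced interface can be sketched over a toy annotation store; the data layout below (concept, start time, end time) is our assumption for illustration, not the system's actual schema:

```python
# Toy store: per-video concept annotations with time intervals (seconds).
ANNOTATIONS = {
    "video-1": [("car", 0, 10), ("person", 5, 12)],
    "video-2": [("car", 0, 10), ("person", 30, 40)],
}

def both_present(video, a, b):
    """Boolean AND: both concepts occur somewhere in the video."""
    concepts = {c for c, _, _ in ANNOTATIONS[video]}
    return a in concepts and b in concepts

def co_occur(video, a, b):
    """Temporal relation: some interval of a overlaps some interval of b."""
    ia = [(s, e) for c, s, e in ANNOTATIONS[video] if c == a]
    ib = [(s, e) for c, s, e in ANNOTATIONS[video] if c == b]
    return any(s1 < e2 and s2 < e1 for s1, e1 in ia for s2, e2 in ib)
```

The distinction matters for archivists: both toy videos contain a car and a person, but only in the first do the two appear on screen at the same time.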
Also the usability of the annotation components, both automatic
Fig. 7. Overview of usability evaluation for the annotation tests: overall usability of the automatic annotation system, usability of the manual
annotation tool.
and manual, was satisfactory, although some concerns remain regarding the precision of automatic annotation, which is still too low for the high standards of archivists. Fig. 7 reports two results for the automatic and manual annotation tools. None of the users had any previous work experience with automatic video annotation systems, but they were trained in using a manual annotation tool developed within their organization.
In general, the comments recorded during the trials and those gathered with the anonymous questionnaire have shown a high degree of satisfaction with the system, and have provided interesting hints for further improvement of the interfaces that, in part, have already been taken into account for further development.
4. CONCLUSIONS
In this paper we have presented a system, based on a SOA back-end and a RIA front-end, that has been jointly designed by industrial and academic partners of EU-funded research projects. The system architecture makes it easily deployable, also in organizations that have a well-established multimedia management workflow.
The system provides functionalities for the management of automatic multimedia analysis pipelines, manual annotation tools, searching and browsing tools, and authoring interfaces. It has been thoroughly tested in a real-world setup by industry professionals, with good results, and is still under active development within the scope of an EU-funded technology transfer project.
5. REFERENCES
[1] “Deliverable D7.6 - validation of the user interface of the VidiVideo system,” Tech. Rep., VidiVideo consortium, 2009.
[2] “Deliverable D2.1 - initial user requirements study,” Tech. Rep., IM3I consortium, 2009.
[3] J. Pickens, J. Adcock, M. Cooper, and A. Girgensohn, “FXPAL interactive search experiments for TRECVID 2008,” in Proc. of the TRECVID Workshop, 2008.
[4] A. Natsev, W. Jiang, M. Merler, J.R. Smith, J. Tesic, L. Xie, and R. Yan, “IBM Research TRECVid-2008 video retrieval system,” in Proc. of the TRECVID Workshop, 2008.
[5] J. Cao, Y.-D. Zhang, B.-L. Feng, L. Bao, L. Pang, and J.-T. Li, “TRECVID 2009 of MCG-ICT-CAS,” in Proc. of the TRECVID Workshop, 2009.
[6] C.G.M. Snoek, K.E.A. van de Sande, O. de Rooij, B. Huurnink, J.R.R. Uijlings, M. van Liempt, M. Bugalho, I. Trancoso, F. Yan, M.A. Tahir, K. Mikolajczyk, J. Kittler, M. de Rijke, J.-M. Geusebroek, T. Gevers, M. Worring, D.C. Koelma, and A.W.M. Smeulders, “The MediaMill TRECVID 2009 semantic video search engine,” in Proc. of the TRECVID Workshop, Gaithersburg, USA, November 2009.
[7] O. de Rooij and M. Worring, “Browsing video along multiple threads,” IEEE Transactions on Multimedia (TMM), vol. 12, no. 2, pp. 121–130, 2010.
[8] Y.-T. Zheng, S.-Y. Neo, X. Chen, and T.-S. Chua, “VisionGo: towards true interactivity,” in Proc. of CIVR, 2009.
[9] M. Bertini, G. D’Amico, A. Ferracani, M. Meoni, and G. Serra, “Sirio, Orione and Pan: an integrated web system for ontology-based video search and annotation,” in Proc. of ACM MM, 2010.
[10] W. Bailer, W. Weiss, G. Kienast, G. Thallinger, and W. Haas, “A video browsing tool for content management in postproduction,” International Journal of Digital Multimedia Broadcasting, 2010.
[11] S. Vrochidis, A. Moumtzidou, P. King, A. Dimou, V. Mezaris, and I. Kompatsiaris, “VERGE: A video interactive retrieval engine,” in Proc. of CBMI, 2010.
[12] J. Matas, O. Chum, M. Urban, and T. Pajdla, “Robust wide-baseline stereo from maximally stable extremal regions,” Image and Vision Computing, vol. 22, no. 10, pp. 761–767, 2004.
[13] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “SURF: Speeded Up Robust Features,” Computer Vision and Image Understanding (CVIU), vol. 110, no. 3, pp. 346–359, 2008.
[14] D.G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision (IJCV), vol. 60, no. 2, pp. 91–110, 2004.
[15] K. Grauman and T. Darrell, “The pyramid match kernel: Efficient learning with sets of features,” Journal of Machine Learning Research (JMLR), vol. 8, pp. 725–760, 2007.
[16] G. Amato and P. Savino, “Approximate similarity search in metric spaces using inverted files,” in Proc. of InfoScale, 2008.
[17] L. Ballan, M. Bertini, A. Del Bimbo, and G. Serra, “Video annotation and retrieval using ontologies and rule learning,” IEEE MultiMedia, vol. 17, no. 4, pp. 80–88, Oct.-Dec. 2010.
[18] P. Fraternali, G. Rossi, and F. Sanchez-Figueroa, “Rich internet applications,” IEEE Internet Computing, vol. 14, pp. 9–12, 2010.
[19] T. Leighton, “Improving performance on the internet,” Communications of the ACM, vol. 52, pp. 44–51, February 2009.
[20] P. Fraternali, S. Comai, A. Bozzon, and G.T. Carughi, “Engineering rich internet applications with a model-driven approach,” ACM Transactions on the Web (TWEB), vol. 4, pp. 7:1–7:47, April 2010.