Pascal Visualization Challenge Blaž Fortuna, IJS Marko Grobelnik, IJS Steve Gunn, US.

Pascal Visualization Challenge

Blaž Fortuna, IJS

Marko Grobelnik, IJS

Steve Gunn, US

Part I: Challenge details

ePrints

Database of around 1600 papers published by Pascal members

Papers are described by:

Authors (unique Pascal Id)

Title

Abstract (most papers)

Publish date (some papers only have the year)

Challenge Goal

Two main goals:

to test and compare different text visualization methods, ideas and algorithms on a common dataset,

to contribute to the Pascal dissemination and promotion activities by using data about scientific publications from Pascal’s EPrints server.

Task

Visualize and present the Pascal ePrints data in a novel way which enables:

discovering main areas covered by the papers and people in Pascal,

discovering how areas and people develop through time,

helping researchers with recommendations on which papers to read,

helping to find the right reviewers for new papers.

Data

Raw XML file from the Pascal ePrints server

Processed data for easier use:

Bag-of-words (TextGarden, Matlab)

Graph (Matlab, Pajek)

Data processed for different possible scenarios.

Raw XML file

Cleaned data from Pascal ePrints server.

Data is given as a list of papers. Each paper is described by:

Title

Abstract

Year of publication

List of authors

Each author is described by a unique Pascal Id and an institution.

<paper id="2080" year="2006">
  <title>Synthesis of Maximum…</title>
  <abstract>In this presentation…</abstract>
  <subjects>
    <subject id="CS">Computati…</subject>
    <subject id="LO">Learning…</subject>
    <subject id="TA">Theory …</subject>
  </subjects>
  <authors>
    <author id="452" institution_id="1">Sandor Szedmak</author>
    <author id="1" institution_id="1">John Shawe-Taylor</author>
  </authors>
  <institutions>
    <institution id="1">Universit…</institution>
  </institutions>
</paper>
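A record of this shape can be read with Python's standard xml.etree.ElementTree module. The following is a minimal sketch, not the challenge's own tooling: the wrapping <papers> root element, the SAMPLE string and the function name parse_papers are assumptions made for illustration, and the real file may use a different root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one <paper> wrapped in an assumed <papers> root element.
SAMPLE = """
<papers>
  <paper id="2080" year="2006">
    <title>Synthesis of Maximum…</title>
    <abstract>In this presentation…</abstract>
    <subjects>
      <subject id="CS">Computati…</subject>
      <subject id="LO">Learning…</subject>
      <subject id="TA">Theory …</subject>
    </subjects>
    <authors>
      <author id="452" institution_id="1">Sandor Szedmak</author>
      <author id="1" institution_id="1">John Shawe-Taylor</author>
    </authors>
    <institutions>
      <institution id="1">Universit…</institution>
    </institutions>
  </paper>
</papers>
"""


def parse_papers(xml_text):
    """Return one dict per <paper> element found in the XML text."""
    root = ET.fromstring(xml_text.strip())
    papers = []
    for paper in root.iter("paper"):
        authors = [
            {
                "id": author.get("id"),
                "institution_id": author.get("institution_id"),
                # Author names may be surrounded by whitespace in the raw file.
                "name": (author.text or "").strip(),
            }
            for author in paper.iter("author")
        ]
        papers.append({
            "id": paper.get("id"),
            "year": int(paper.get("year")),
            "title": paper.findtext("title", default=""),
            "abstract": paper.findtext("abstract", default=""),
            "subjects": [s.get("id") for s in paper.iter("subject")],
            "authors": authors,
        })
    return papers


papers = parse_papers(SAMPLE)
```

Stripping the author text matters here because the names in the raw file can carry leading or trailing line breaks.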

Bag-of-words

Covered scenarios:

Document == Paper

Document == Author

Document == Institution

Available formats:

TextGarden: text file where one line equals one document

Matlab: data available in the form of a sparse Term-Document matrix

TextGarden (www.textmining.net):

Format: Document_name !Subject DocumentList

Example: Support_Vector_Machine_to_synthesise_kernels !Machine_Vision !Theory_and_Algorithms Support Vector Machine to synthesise kernels -- Suppose we are given two sets of …
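Judging only from the single example line above, a TextGarden line could be split into a name, a list of subjects and the document text as sketched below. The function name parse_textgarden_line is hypothetical, and the rule "leading tokens starting with ! are subjects" is inferred from the example rather than taken from TextGarden documentation.

```python
def parse_textgarden_line(line):
    """Split one line into (document name, subjects, text).

    Assumed layout, inferred from the slide's example: the first token is the
    document name, the following tokens that start with '!' are subjects, and
    everything after them is the document text.
    """
    tokens = line.split()
    name = tokens[0]
    subjects = []
    i = 1
    while i < len(tokens) and tokens[i].startswith("!"):
        subjects.append(tokens[i][1:])  # drop the leading '!'
        i += 1
    text = " ".join(tokens[i:])
    return name, subjects, text


line = (
    "Support_Vector_Machine_to_synthesise_kernels "
    "!Machine_Vision !Theory_and_Algorithms "
    "Support Vector Machine to synthesise kernels -- Suppose we are given two sets of …"
)
name, subjects, text = parse_textgarden_line(line)
```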

Matlab:

The sparse matrix is saved in a text file and can be read into Matlab with:

X = spconvert(load('papers.dat'));

Documents are columns in the matrix.

Names of columns (document names) and rows (words) are provided.
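Outside Matlab, the same text file can be read with a few lines of standard-library Python. This sketch assumes the file holds whitespace-separated "row column value" triplets with 1-based indices, which is the layout spconvert expects; the function name read_triplets and the sample lines are made up for illustration.

```python
from collections import defaultdict


def read_triplets(lines):
    """Return {column: {row: value}} from 'row column value' lines.

    Indices are kept 1-based, as in the file. Columns are documents and rows
    are words, so each inner dict is one document's bag of words.
    """
    columns = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        # Indices may be written as floats (e.g. 1.0000e+00), so go via float.
        row, col = int(float(parts[0])), int(float(parts[1]))
        value = float(parts[2])
        if value != 0.0:
            # Repeated (row, column) entries are summed.
            columns[col][row] = columns[col].get(row, 0.0) + value
    return dict(columns)


# Made-up sample; with a real file use: read_triplets(open("papers.dat"))
sample = ["1 1 2", "3 1 1", "2 2 4", "3 2 0"]
docs = read_triplets(sample)
```

Zero-valued entries are skipped because such a line can serve only to fix the matrix dimensions.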

Graph

Covered scenarios:

Vertex == Word, Edge == Co-Appearance

Vertex == Author, Edge == Co-Authors

Vertex == Institution, Edge == Collaboration

Available formats:

Matlab: data available in the form of a sparse adjacency matrix

Pajek: software for network analysis

Matlab:

The sparse matrix is saved in a text file and can be read into Matlab with:

X = spconvert(load('words.dat'));

Names of vertices (words, authors, institutions) are provided.

Pajek:

Can be downloaded from: vlado.fmf.uni-lj.si/pub/networks/pajek
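To illustrate the Vertex == Author, Edge == Co-Authors scenario, the sketch below counts shared papers for each author pair and writes the result in Pajek's plain-text .net layout (a *Vertices section followed by an *Edges section). The author names and function names are placeholders, and weighting an edge by the number of shared papers is an assumption; the slides do not state how edge weights were defined in the provided files.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(papers):
    """Count, for each unordered author pair, the papers they share."""
    edges = Counter()
    for authors in papers:
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


def to_pajek(edges):
    """Render weighted undirected edges as Pajek .net text."""
    names = sorted({name for pair in edges for name in pair})
    index = {name: i for i, name in enumerate(names, start=1)}  # Pajek ids start at 1
    lines = [f"*Vertices {len(names)}"]
    lines += [f'{index[n]} "{n}"' for n in names]
    lines.append("*Edges")
    lines += [f"{index[a]} {index[b]} {w}" for (a, b), w in sorted(edges.items())]
    return "\n".join(lines)


# Placeholder author lists, one list per paper.
papers = [["Author A", "Author B"], ["Author A", "Author B", "Author C"]]
edges = coauthor_edges(papers)
net = to_pajek(edges)
```

Authors who never share a paper with anyone do not appear as vertices in this sketch, since the vertex list is derived from the edges.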

Submissions

The results can be: images, movies, Web sites, VRML files, executables (Windows, Linux), etc.

For an interactive tool, also provide a video showing the use of the tool on the Pascal ePrints data.

Evaluation

Usability of visualization – The goal is to assess the usability of a particular visualization in different practical contexts.

Innovativeness – The goal is to estimate how innovative the ideas used for the visualization are.

Aesthetics of the image – Here we aim to identify the "nicest" images from the challenge.

General voting by Pascal researchers over the web on "who likes what".

Since all the criteria are subjective, we will hire experts to judge the quality.

Each of the criteria will generate a separate ranking.

Part II: Examples

Visualization example 1/2: Document Atlas

Bag-of-words approach: Document == Author

An author is described by the sum of all the abstracts from the papers he or she co-authored.

We construct separate profiles for papers from 2004 and papers from 2005.
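The profile construction described above can be sketched as follows. This is a simplified illustration rather than the Document Atlas implementation: it uses raw word counts with a crude tokenizer, whereas the actual tool may apply stop-word removal or term weighting, and the paper records and function name are made up.

```python
import re
from collections import Counter, defaultdict


def author_profiles(papers):
    """Map (author, year) to the summed word counts of that author's abstracts."""
    profiles = defaultdict(Counter)
    for paper in papers:
        words = re.findall(r"[a-z]+", paper["abstract"].lower())
        for author in paper["authors"]:
            # Every co-author receives the full abstract in his or her profile.
            profiles[(author, paper["year"])].update(words)
    return profiles


# Made-up records; in practice these would come from the ePrints XML file.
papers = [
    {"year": 2004, "authors": ["a1", "a2"], "abstract": "Kernel methods for text"},
    {"year": 2004, "authors": ["a1"], "abstract": "Text mining with kernel machines"},
    {"year": 2005, "authors": ["a1"], "abstract": "Graph visualization"},
]
profiles = author_profiles(papers)
```

Keying the profiles by (author, year) keeps the 2004 and 2005 profiles of the same author separate, which is what the later comparison over time needs.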

Dimensionality reduction

Documents are mapped from bag-of-words space to two dimensions in two steps:

Latent Semantic Indexing: 13,000 dim => 110 dim

Multidimensional Scaling: 110 dim => 2 dim
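The two steps can be written out in their textbook form: LSI as a truncated singular value decomposition and metric MDS as stress minimization. The slides do not say which MDS variant or which distance was used, so the Euclidean distance and the raw stress function below are assumptions.

```latex
% Step 1 (LSI): rank-k truncated SVD of the term-document matrix X,
% whose columns are documents.
X \approx U_k \Sigma_k V_k^{\top}, \qquad k = 110

% Document j is represented by column j of \Sigma_k V_k^{\top}.
z_j = \Sigma_k V_k^{\top} e_j \in \mathbb{R}^{110}

% Step 2 (MDS): place each document at y_j in the plane so that map
% distances match the distances in the 110-dimensional LSI space.
\min_{y_1, \dots, y_n \in \mathbb{R}^{2}}
  \sum_{i < j} \bigl( \lVert y_i - y_j \rVert - d_{ij} \bigr)^{2},
\qquad d_{ij} = \lVert z_i - z_j \rVert
```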

The background reflects the density of documents

Background words

Each part of the map is assigned the keyword that is most representative of the documents in that area.

We get a “map” of the topics covered within the documents.

In the case of the Pascal ePrints data, areas on the map correspond to the research areas covered within the Pascal Network.
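One simple way to pick a keyword for a part of the map is sketched below. The slides do not specify how "most representative" is measured, so this sketch uses an assumed stand-in: it sums the word weights of all documents within a given radius of a map point and returns the heaviest word. The function name, the radius parameter and the sample documents are hypothetical.

```python
import math
from collections import Counter


def region_keyword(point, docs, radius):
    """Return the highest-weight word among documents within `radius` of `point`.

    `docs` is a list of ((x, y), {word: weight}) pairs, where (x, y) is the
    document's position on the two-dimensional map.
    """
    totals = Counter()
    for (x, y), weights in docs:
        if math.hypot(x - point[0], y - point[1]) <= radius:
            totals.update(weights)  # adds the weights word by word
    if not totals:
        return None  # no documents in this part of the map
    # Iterating over sorted words makes ties resolve alphabetically.
    return max(sorted(totals), key=totals.get)


# Made-up map positions and word weights.
docs = [
    ((0.0, 0.0), {"kernel": 3, "svm": 1}),
    ((0.1, 0.0), {"kernel": 1, "text": 2}),
    ((5.0, 5.0), {"bayesian": 4}),
]
near_origin = region_keyword((0.0, 0.0), docs, 1.0)
far_corner = region_keyword((5.0, 5.0), docs, 1.0)
```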

Time dynamics

For each author we have a profile for each of the years 2004 and 2005.

By showing the difference we can see how authors’ research focus developed between 2004 and 2005.
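The slides do not say how the difference between the two profiles is computed or drawn. As an assumed term-level stand-in, the sketch below compares the relative word frequencies of an author's 2004 and 2005 profiles and reports which word gained the most; the profiles and function names are made up.

```python
from collections import Counter


def normalize(profile):
    """Turn word counts into relative frequencies that sum to 1."""
    total = sum(profile.values())
    return {w: c / total for w, c in profile.items()} if total else {}


def profile_shift(old, new):
    """Per-word change in relative frequency from `old` to `new`."""
    p, q = normalize(old), normalize(new)
    return {w: q.get(w, 0.0) - p.get(w, 0.0) for w in set(p) | set(q)}


# Made-up profiles for one author.
p2004 = Counter({"kernel": 3, "text": 1})
p2005 = Counter({"kernel": 1, "graph": 3})
shift = profile_shift(p2004, p2005)
rising = max(shift, key=shift.get)  # word that gained the most weight
```

Normalizing first keeps an author who simply published more papers in 2005 from looking as if every topic had grown.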

Co-Authorships

Live Demo

Visualization example 2/2: IST World

Web portal developed within the IST World EU project

Uses search and visualization methods to:

discover the main research areas and collaborations within the PASCAL organizations

produce recommendations on which papers to read (e.g. papers on image recognition, or the kernel trick)

find the right reviewers for a new paper (e.g. a paper on "brain computer interface") and assess their competence
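How IST World ranks candidate reviewers is not described in the slides. A common baseline, shown here purely as an assumed stand-in, is to compare the new paper's text with each author's bag-of-words profile using cosine similarity. The author names, texts and function names are made up.

```python
import math
import re
from collections import Counter


def bag(text):
    """Lower-cased bag of words with a crude tokenizer."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count mappings."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_reviewers(paper_text, author_texts):
    """Return (author, score) pairs, best match first."""
    query = bag(paper_text)
    scores = {name: cosine(query, bag(text)) for name, text in author_texts.items()}
    # Sort by descending score, then by name so ties are stable.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up author profiles (concatenated abstracts would be used in practice).
author_texts = {
    "author_1": "brain computer interface signal classification",
    "author_2": "kernel methods for text mining",
    "author_3": "computer vision and image recognition",
}
ranking = rank_reviewers("a brain computer interface study", author_texts)
```

The same scoring could serve paper recommendation by ranking papers against a reader's profile instead of ranking authors against a paper.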

Research areas

Institutions are placed on the map of research areas from the Pascal Network.

The example shows which areas are closely related to JSI.

Collaborations

Collaboration of institutions

Collaboration of authors working on “text mining”

Paper Recommendation

Competence Search

Live Demo

Thank you!